BCPS 2013: Biostatistics
Can only take a limited number of values within a given range.
Data of categories only, unordered, with no indication of relative severity. Data cannot be arranged in an ordering scheme. (Gender, Race, Religion, mortality, disease state)
Placed in categories and rank ordered, distance between categories may not be the same. Ex: pain is low, moderate, high
Variable that can have unlimited number of possible values; can take on any value within a given range. Ex.
Data are ranked in a specific order with a consistent change in magnitude between units; zero point is arbitrary (degrees Fahrenheit)
Interval data with an absolute zero (temperature in kelvins, HR, BP, time, distance)
The arithmetic average of a distribution, obtained by adding the scores and then dividing by the number of scores. Should only be used with Continuous and normally distributed data. Very sensitive to outliers
A value found by ordering a group of data from least to greatest and choosing the middle value of the group. Also called the 50th percentile. Can be used with Ordinal and Continuous data. Insensitive to outliers.
The number that occurs most often in a set of data
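A minimal sketch of the three measures of central tendency, using Python's standard `statistics` module. The scores are hypothetical and include one outlier to show how it pulls the mean but not the median.

```python
from statistics import mean, median, mode

# Hypothetical scores; 40 is a deliberate outlier.
scores = [2, 3, 3, 4, 5, 6, 40]

avg = mean(scores)     # arithmetic average, pulled upward by the outlier
mid = median(scores)   # middle value of the sorted data (50th percentile)
most = mode(scores)    # value that occurs most often

print(avg, mid, most)
```

Here the mean lands well above most of the scores, while the median stays near the center of the data.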
A measure of variability that indicates the typical distance between the scores and their mean (the square root of the variance). Use only for continuous data that are normally distributed. +/- 1 SD = 68% of sample values, +/- 2 SD = 95% of sample values, +/- 3 SD = 99.7% of sample values
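The percentages for a normal distribution can be checked with a short sketch. It assumes a perfectly normal distribution, represented by `NormalDist` from the standard library; `share_within` is a helper name introduced only for this illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, SD 1

def share_within(k):
    """Fraction of a normal distribution lying within +/- k SD of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"+/- {k} SD: {share_within(k) * 100:.1f}%")
```

The values come out at roughly 68%, 95%, and 99.7%.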
Coefficient of Variation
A measure of dispersion calculated by dividing a distribution's standard deviation by its mean. CV = (SD / mean) x 100%
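As a sketch of the calculation, the sample SD (n - 1 denominator) is divided by the mean and expressed as a percentage. The values are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical measurements.
values = [4.0, 5.0, 6.0, 5.0, 5.0]

sd = stdev(values)             # sample standard deviation (n - 1 denominator)
cv = sd / mean(values) * 100   # coefficient of variation, in percent

print(f"SD = {sd:.3f}, CV = {cv:.1f}%")
```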
The difference between the greatest and least numbers in a set of data. Sensitive to outliers.
The point value in a distribution in which a value is larger than some percentage of the other values in the sample. The 75th percentile lies at a point at which 75% of the other values are smaller
IQR interquartile range
The difference between the 25th and 75th percentiles; contains 50% of data
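A small sketch of the range and IQR on hypothetical data. Quartile conventions differ between textbooks and software; this example assumes the "inclusive" method of `statistics.quantiles`, so other tools may give slightly different cut points.

```python
from statistics import quantiles

# Hypothetical, already sorted data.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

value_range = max(data) - min(data)   # sensitive to outliers

# Cut points for the 25th, 50th, and 75th percentiles.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1                          # spread of the middle 50% of the data

print(value_range, q1, q2, q3, iqr)
```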
Gaussian distribution. Approximate distribution of scores expected when a sample is taken from a large population, drawn as a frequency polygon that often takes the form of a bell-shaped curve, called the normal curve. The mean and the median will be about equal
Data has a tendency to group together, to have a common mean from which they individually vary. Normally distributed data are termed parametric.
Standard Error of the Mean
An estimate of how far a sample mean is expected to vary from the true population mean. If the mean is 10 and the standard error of the mean is 2, then the true mean is likely to fall somewhere between 8 and 12, or 10 +/- 2. Can be estimated by dividing the SD by the square root of n (sample size)
A range of numbers that encompasses the value that would be obtained if an experiment was performed many times (necessary because the value might change slightly each time). 95% CI ~ mean +/- 1.96 x SEM, where Z = 1.96 corresponds to a two-sided p of 0.05. Ex. Baseline birth weight in a group with mean +/- SD of 1.18 +/- 0.4 kg and 95% CI (1.07, 1.29), meaning there is 95% certainty that the true mean weight of the entire population studied lies between 1.07 and 1.29 kg
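The birth weight example can be reproduced with a short sketch. The card does not give the sample size, so `n = 51` below is an assumption chosen only because it yields the quoted interval; it is not a value from the source.

```python
from math import sqrt
from statistics import NormalDist

mean_kg = 1.18
sd_kg = 0.4
n = 51  # ASSUMPTION: sample size is not stated in the card

sem = sd_kg / sqrt(n)             # standard error of the mean
z = NormalDist().inv_cdf(0.975)   # about 1.96 for a 95% interval

lower = mean_kg - z * sem
upper = mean_kg + z * sem

print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

A larger n shrinks the SEM and therefore narrows the interval.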
Counted or ranked data, nominal or ordinal level
-does not assume a normal distribution
-do not necessarily require interval level data
-are statistical methods that do not make assumptions about population
The hypothesis that states there is no difference between two or more sets of data. Stating opposite of what you expect to find. If the null hypothesis is rejected, a statistically significant difference exists (unlikely attributable to chance)
The hypothesis that states there is a difference between two or more sets of data.
a statistical test based on mathematical derivations that include assumptions about parameters of populations from which the samples were drawn. The t test, ANOVA, and tests involving the Pearson correlation are parametric, - statistical significance tests that make assumptions about population from which the data were drawn
-Assumes the variable's normal distribution in the population (i.e., the normal curve)
*t-test, z-test, F-test/ANOVA, paired t-test, etc.
To compare two groups. Appropriate when researchers randomly assign participants to the research groups. A statistical test that compares two group means, or compares a group mean to a population mean, to assess whether the differences between means are reliable.
1) Repeated measures t-test (paired t-test): compares two sets of scores in which the scores are paired, as in a before and after design. Ex. LDL today vs 3 months later.
2) Independent groups t-test (2 sample, unpaired): compares two groups where there is no connection between the scores. Ex. women's LDL vs men's LDL.
3) One sample t-test: compares a single group's mean to the population mean. Ex. LDL in a group vs national average.
A test for differences in the means of paired samples (related populations)
One sample t-test
Used to determine if a single sample mean is different from a known population mean
Two sample t-test
1. Samples are Independent 2. Data in each sample are independent 3. Both populations are Normal
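The three t-test variants differ mainly in how the standard error is built. The sketch below computes only the t statistic for each (a p value would additionally need the t distribution, e.g. from a statistics package). The two-sample version assumes equal variances (pooled); function names and data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev, variance

def one_sample_t(sample, population_mean):
    """t statistic for a sample mean vs a known population mean (df = n - 1)."""
    sem = stdev(sample) / sqrt(len(sample))
    return (mean(sample) - population_mean) / sem

def paired_t(before, after):
    """t statistic for paired data: a one-sample test on the differences."""
    diffs = [b - a for b, a in zip(before, after)]
    return one_sample_t(diffs, 0)

def two_sample_t(group_a, group_b):
    """Pooled-variance t statistic for two independent groups (df = n_a + n_b - 2)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) +
                  (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    se = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / se

# Hypothetical LDL values before and after treatment in the same 5 patients.
ldl_before = [130, 142, 128, 150, 136]
ldl_after = [120, 138, 125, 140, 130]
print(round(paired_t(ldl_before, ldl_after), 3))
```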
Analysis of variance - a test for the significance of differences among three or more means; parametric.
One way ANOVA
an inferential statistical test for comparing the means of three or more groups using a between-participants design and one independent variable. Ex Group 1, Group 2, Group 3
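A one-way ANOVA compares the variability between group means with the variability within groups. The following sketch computes the F statistic by hand for three hypothetical groups; turning F into a p value would require the F distribution, which is not included here.

```python
from statistics import mean

def one_way_anova_f(groups):
    """F statistic: mean square between groups / mean square within groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical data for Group 1, Group 2, Group 3.
f_stat = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(round(f_stat, 2))
```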
Two way ANOVA
Hypothesis test used when there are two independent variables used to create groups;
- one continuous dependent variable is measured and a mean is computed for each group. Ex Young age Group 1 Group 2 Group 3 vs Old age Group 1 Group 2 Group 3
Repeated Measures ANOVA
Statistical test that allows a researcher to determine if differences in the same interval/ratio variable occur over three or more measurements of the variable. Group 1 - Measurement 1, Measurement 2, Measurement 3
Wilcoxon rank sum and Mann-Whitney U test
Statistical test used for evaluating ordinal and continuous data. Nonparametric test. Compares 2 independent samples. (Related to the t-test.)
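One way to see what the U statistic measures is to count, over every pair of observations from the two samples, how often a value from one sample exceeds a value from the other. This is a sketch of the statistic only (no p value), with ties counted as half; the function name and data are hypothetical.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b): pairwise 'wins' for each sample, ties count 0.5."""
    u_a = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u_a += 1
            elif a == b:
                u_a += 0.5
    u_b = len(sample_a) * len(sample_b) - u_a
    return u_a, u_b

# Hypothetical pain scores in two independent groups.
print(mann_whitney_u([1, 2, 3], [4, 5, 6]))  # complete separation
print(mann_whitney_u([1, 4, 5], [2, 3, 6]))  # overlapping samples
```

A U near 0 for one sample means its values are almost always lower than those of the other sample.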
Sign test or Wilcoxon signed rank test
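The sign test ignores the magnitude of change and uses only its direction; the Wilcoxon signed rank test also takes the ranked magnitude of each change into account.
Sign test: count patients improved, compare to probabilities according to chance. Both tests compare 2 matched or paired samples. (Related to the paired t-test.)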
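The sign test reduces to a binomial calculation: if chance alone were operating, each patient would be equally likely to improve or worsen. A minimal sketch of the exact two-sided p value follows, assuming patients with no change have already been dropped; the counts are hypothetical.

```python
from math import comb

def sign_test_p(improved, worsened):
    """Exact two-sided sign test p value under a 50/50 chance of each direction."""
    n = improved + worsened          # patients with no change are excluded
    k = max(improved, worsened)
    # Probability of a split at least this lopsided in one direction.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical: 8 of 10 patients improved, 2 worsened.
p_value = sign_test_p(8, 2)
print(round(p_value, 4))
```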
Chi square test
For nominal data - compares frequencies or proportions between groups to see if they're different. Compares expected and observed proportions between two or more groups. Ex. 80% men, 20% women in one group versus 70% men, 30% women in a second group.
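The example proportions can be turned into a worked chi-square calculation. The sketch assumes 100 people per group (the card gives only percentages), uses no continuity correction, and computes the p value with a shortcut that is valid only for 1 degree of freedom, i.e. a 2x2 table.

```python
from math import erfc, sqrt

def chi_square_2x2(table):
    """Pearson chi-square statistic and p value for a 2x2 table (1 df)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected

    p_value = erfc(sqrt(stat / 2))  # chi-square tail area, 1 df only
    return stat, p_value

# ASSUMPTION: 100 people per group; rows are groups, columns are men/women.
chi2, p = chi_square_2x2([[80, 20], [70, 30]])
print(round(chi2, 3), round(p, 3))
```

With these assumed counts the difference between 80/20 and 70/30 does not reach p < 0.05.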
Fisher Exact test
nonparametric test used instead of chi-square if sample size is small or some cells in contingency table have no observations
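For small tables the exact probability can be computed directly from the hypergeometric distribution. The sketch below uses one common two-sided definition (sum the probabilities of all tables no more likely than the observed one); other definitions exist. The tables are hypothetical.

```python
from math import comb

def fisher_exact_two_sided(table):
    """Two-sided Fisher exact p value for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(k):
        # Probability of k in the top-left cell with all margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / comb(total, col1)

    observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance guards against floating point ties.
    return sum(prob(k) for k in range(lowest, highest + 1)
               if prob(k) <= observed * (1 + 1e-9))

print(round(fisher_exact_two_sided([[3, 1], [1, 3]]), 4))
print(round(fisher_exact_two_sided([[5, 0], [1, 4]]), 4))  # table with an empty cell
```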
Type I error
alpha error - An error caused by rejecting the null hypothesis when it is true; has a probability of alpha. Practically, a Type I error occurs when the researcher concludes that a relationship or difference exists in the population when in reality it does not exist.
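The probability of obtaining a result at least as extreme as the one observed if the null hypothesis were true; often loosely described as the probability that the association is due to chance (<.05 = statistically significant). The smaller the p value, the less likely the result would have happened by chance alone.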
Type II error
beta error - (acceptable rate = 0.1 to 0.2) The error of failing to reject a null hypothesis when in fact it is false (also called a "false negative"). You conclude there is no difference or effect when there actually is one.
A measure of how much ability exists to find a significant effect, when one truly exists, using a specific statistical tool. Mathematically, power is a direct function of the Type II error rate: power = 1 - β
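The link between alpha, beta, and power can be illustrated with a simplified case. The sketch assumes a two-sided one-sample z-test with known SD, where `effect_size` is the true difference expressed in SD units; this setup and the function name are assumptions for illustration, not part of the card.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(effect_size, n, alpha=0.05):
    """Power of a two-sided one-sample z-test (effect size in SD units)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    shift = effect_size * sqrt(n)                # how far the true mean moves the test statistic
    # Probability the statistic falls beyond either critical value.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

power = z_test_power(effect_size=0.5, n=32)
beta = 1 - power   # Type II error rate
print(round(power, 3), round(beta, 3))
```

With no true effect (`effect_size=0`) the function returns alpha, the Type I error rate, which is the expected behavior.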
Examines the strength of the association between two variables. A statistical measure that indicates the extent to which two factors vary together and thus how well one factor can be predicted from the other. Correlations can be positive or negative.
Examines the ability of one or more variables to predict another variable. The relation between selected values of x and observed values of y (from which the most probable value of y can be predicted for any value of x). Prediction model: Y = mX + b
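For a single predictor, the slope m and intercept b of the prediction line can be estimated by ordinary least squares. This is a minimal sketch with hypothetical data; `fit_line` is a name introduced for illustration.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares slope (m) and intercept (b) for Y = mX + b."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical predictor (x) and outcome (y) values.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
m, b = fit_line(xs, ys)
print(f"Y = {m:.1f}X + {b:.1f}")
```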
A correlation statistic used primarily for two sets of data that are of the ratio or interval scale. The most commonly used correlational technique. Usually described as the degree of association between the two variables. Does not imply that one variable is dependent on the other (regression analysis will do that). Ranges from -1 to 1.
Spearman Rank correlation
Nonparametric test that quantifies the strength of an association between 2 variables without assuming normally distributed data. Can be used for ordinal or non-normally distributed continuous data.
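The rank-based coefficient can be sketched as the ordinary (Pearson) correlation applied to the ranks of the data rather than the raw values. The example uses hypothetical data with a curved but perfectly monotonic relationship, where the two coefficients differ; helper names are introduced for illustration.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient, between -1 and 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def average_ranks(values):
    """Rank from 1 upward; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Hypothetical data: y rises with x, but not in a straight line.
dose = [1, 2, 3, 4, 5]
response = [1, 4, 9, 16, 25]
print(round(pearson_r(dose, response), 3), round(spearman_rho(dose, response), 3))
```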
Coefficient of determination - accuracy of prediction. How well the independent variable predicts the dependent variable. Can range between 0 and 1. If r squared = 0.8 then 80% of the variability in Y is "explained" by the variability in X
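The "explained variability" reading can be made concrete by comparing the residual variation around a fitted line with the total variation in Y. The sketch fits a simple least-squares line to hypothetical data and computes r squared as 1 - SS_residual / SS_total.

```python
from statistics import mean

def r_squared(xs, ys):
    """Share of the variability in Y explained by a least-squares line on X."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) /
             sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar

    # Variation left over after prediction vs total variation in Y.
    ss_residual = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_total = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_residual / ss_total

# Hypothetical data.
r2 = r_squared([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r2, 2))
```

In this example 60% of the variability in Y is "explained" by X.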
Event history (time-to-event) methods, used when the dependent variable of interest is a time interval (e.g., time from onset of disease to death).
Estimates survival function: uses survival times to estimate the proportion of people who would survive a given length of time under the same circumstances.
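This kind of estimate is usually called the Kaplan-Meier (product-limit) estimate. A minimal sketch follows: at each time where a death occurs, the running survival proportion is multiplied by the fraction of at-risk patients who survived that time. Patients who are censored (lost to follow-up or still alive at the end) leave the at-risk group without counting as deaths. The data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival proportion)] at each time with at least one death.

    times  : follow-up time for each patient
    events : 1 if the patient died at that time, 0 if censored
    """
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, e in zip(times, events) if x == t and e)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
    return curve

# Hypothetical follow-up in months; the patients at 3 (one of two) and 8 are censored.
curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
for t, s in curve:
    print(t, round(s, 2))
```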
test used to compare survival analysis curves between 2 or more groups
Cox proportional hazards model
multivariate survival analysis controlling for other factors. Allows calculation of hazard ratio. (and CI)