CPH Exam - Biostatistics

Biostatistics includes a set of principles and methods which allow us to ______, _______ and ______ important public health problems.
assess, analyze and solve
The application of biostatistics involves what?
The application of biostatistics involves developing clear research questions, designing studies to collect relevant information or data, applying appropriate techniques to analyze those data and drawing meaningful conclusions from data appropriately accounting for uncertainty.
T/F: In every study, it is not important to define the population of interest.
False
T/F: The population is the collection of all units (usually people) that we wish to make inferences about. The appropriate population depends on the research question.
True
T/F: Using a probability sampling technique maximizes the likelihood the sample is representative of the larger population.
True
T/F: Probability sampling is important to minimize bias, a systematic error in a study that leads to systematically incorrect estimates. Bias is caused by the investigators in the design or conduct of the study.
True
Summary measures based on population data are called ________ and estimates derived from sample data are called _______.
Summary measures based on population data are called parameters and estimates derived from sample data are called statistics.
N is used to denote the ________and is an example of a parameter, n denotes the ________ and is an example of a statistic
N is used to denote the population size and is an example of a parameter, n denotes the sample size and is an example of a statistic.
T/F: In applied biostatistical analysis, we use sample statistics to estimate unknown population parameters.
True
Name three general classifications of variables
discrete, continuous and time to event variables
Discrete variables
Variables that assume only a finite number of values, for example, whether or not a participant is taking lipid lowering treatment (yes or no), their blood type (A, B, AB or O), or symptom severity (none, mild, moderate, or severe).
Discrete variables with two response options are called______________
Discrete variables with two response options are called dichotomous
Discrete variables with more than 2 unordered response options are called __________
Discrete variables with more than 2 unordered response options are called categorical/nominal
Continuous variables
Sometimes called quantitative or measurement variables, these can take on any value within a range of plausible values; total serum cholesterol level, height, weight and systolic blood pressure are examples. (By contrast, time to event variables reflect the time to a particular event such as a heart attack, cancer remission or death.)
Ordered response options are called __________
Ordered response options are called ordinal variables
Continuous variables are summarized using measures of _____________ and ___________.
Continuous variables are summarized using measures of central tendency and variability
The ______ and ___________are generally appropriate to describe central tendency and variability, respectively.
The mean and standard deviation are generally appropriate to describe central tendency and variability, respectively.
The mean is computed:
n = the sample size or the number of participants in the sample
The standard deviation is computed as follows:
T/F: When there are extreme values in the sample (called outliers), the mean may be inflated or deflated depending on whether the extreme values are high or low.
True
When there are outliers, what ranges are appropriate to describe central tendency and variability, respectively?
When there are outliers, the median (middle value) and interquartile range (third quartile, Q3, 75th percentile - first quartile, Q1, 25th percentile) are appropriate to describe central tendency and variability, respectively.
Popular guide for assessing outliers:
Outliers are values either above Q3+1.5(Q3-Q1) or below Q1-1.5(Q3-Q1), where Q1 and Q3 are the first and third quartiles, respectively.
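As a sketch of these summary measures, the mean, median, IQR and the 1.5(Q3-Q1) outlier fences can be computed with Python's standard library (the data below are made up; note that quartile conventions differ slightly across software packages):

```python
import statistics

# Hypothetical sample with one high outlier
data = [2, 4, 4, 5, 5, 6, 7, 8, 9, 50]

mean = statistics.mean(data)          # inflated by the high outlier
sd = statistics.stdev(data)           # sample standard deviation (n - 1 denominator)
median = statistics.median(data)      # robust to the outlier
q1, _, q3 = statistics.quantiles(data, n=4)  # quartile method varies by package
iqr = q3 - q1

# Popular outlier fences: below Q1 - 1.5(IQR) or above Q3 + 1.5(IQR)
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in data if x < lower or x > upper]
print(mean, median, outliers)
```

Here the mean (10) is pulled well above the median (5.5) by the single extreme value, which the fences flag as an outlier.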
T/F: A box and whisker plot is a popular graphical display for a continuous variable
True: The "box" contains the middle 50% of the distribution (i.e., Q1 is the bottom of the box and Q3 is the top of the box) and the median is the horizontal line in between. The range of the data (minimum to maximum) is indicated by the whiskers (vertical lines). Some computer packages will indicate outliers in the sample in the box and whisker plot.
When comparing groups with respect to a continuous variable, we typically compare the ________ between groups.
When comparing groups with respect to a continuous variable, we typically compare the means between groups.
What do µ1 and µ2 represent?
Let µ1 and µ2 be the true, unknown population means of the two groups that we wish to compare.
How is the difference in the true means (µ1- µ2) estimated?
The difference in the true means (µ1- µ2) is estimated by the difference in the observed sample means ( X1-X2 ).
Categorical and ordinal variables are best summarized by the ________ and proportion or _______________ of participants in each response category.
Categorical and ordinal variables are best summarized by the frequency (count) and proportion or relative frequency (frequency/n) of participants in each response category.
T/F: Graphical displays are also useful for summarizing discrete data.
True
What type of chart is used to summarize categorical or nominal variable?
A bar chart is used to summarize a categorical or nominal variable
What type of chart is used to summarize ordinal data?
A histogram is used to summarize ordinal data
What do p1 and p2 represent?
Let p1 and p2 be the true, unknown population proportions in the two groups that we wish to compare.
How is the difference in the true proportions (p1-p2) estimated?
The difference in the true proportions (p1-p2) is estimated by the difference in the observed sample proportions.

Also called the risk difference.
Relative Risk Ratio:
RR = (p-hat 1)/(p-hat 2), estimated using the observed sample data.
Odds ratio:
OR = [p-hat 1/(1 - p-hat 1)] / [p-hat 2/(1 - p-hat 2)], estimated using the observed sample data.
T/F: The relative risk is more intuitive to interpret than the odds ratio.
True
T/F: in case-control studies, it is not possible to estimate a relative risk because the case-control design involves selecting participants on the basis of their outcome status.
True
T/F: in case-control studies, the odds ratio can be estimated and used to summarize the association between the exposure or risk factor and the outcome.
True
T/F: When the proportion of outcome events is low, the odds ratio is close in value to the relative risk.
True
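A small worked example with made-up counts shows how the risk difference, relative risk and odds ratio are estimated from a 2x2 table, and that the OR is close in value to the RR when the outcome is rare:

```python
# Hypothetical 2x2 table: events and sample sizes in two groups
x1, n1 = 10, 1000   # events among exposed
x2, n2 = 5, 1000    # events among unexposed

p1, p2 = x1 / n1, x2 / n2
risk_difference = p1 - p2
relative_risk = p1 / p2
odds_ratio = (p1 / (1 - p1)) / (p2 / (1 - p2))
print(relative_risk, odds_ratio)
```

Because the event proportions (1% and 0.5%) are low, the odds ratio (about 2.01) nearly equals the relative risk (2.0).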
A common way to summarize time to event data is through a ___________
A common way to summarize time to event data is through a hazard rate
What is the hazard rate?
The hazard rate is the rate of a particular outcome conditional on time. To compare two groups with respect to the time to event, we use the hazard ratio, which is the ratio of the hazard rates for each group.
What is a probability?
A probability is a number between 0 and 1 (inclusive) that represents the likelihood that a particular outcome occurs.
A random variable is a variable whose value is determined by _______.
A random variable is a variable whose value is determined by chance.
T/F: A probability distribution is a table or a function that links each value of a random variable to its likelihood of occurring (probability).
True
The normal distribution is characterized by its _______ and ___________.
The normal distribution is characterized by its mean and standard deviation.
A normal distribution is one in which the mean is equal to the __________ (and also to the mode or the most frequent value).
A normal distribution is one in which the mean is equal to the median (and also to the mode or the most frequent value).
Approximately 95% of the values in a normal distribution lie between the mean minus _____ times the standard deviation and the mean plus _______ times the standard deviation.
Approximately 95% of the values in a normal distribution lie between the mean minus two times the standard deviation and the mean plus two times the standard deviation.
Approximately all of the values lie between the mean plus or minus _______ times the standard deviation.
Approximately all of the values lie between the mean plus or minus three times the standard deviation.
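These empirical-rule percentages can be checked with Python's standard library NormalDist (the mean of 100 and SD of 15 below are arbitrary choices):

```python
from statistics import NormalDist

dist = NormalDist(mu=100, sigma=15)  # hypothetical mean and standard deviation

# Probability of a value falling within mean +/- 2 SD and mean +/- 3 SD
within_2sd = dist.cdf(100 + 2 * 15) - dist.cdf(100 - 2 * 15)
within_3sd = dist.cdf(100 + 3 * 15) - dist.cdf(100 - 3 * 15)
print(round(within_2sd, 3), round(within_3sd, 3))
```

About 95.4% of values lie within two standard deviations of the mean and about 99.7% within three, matching the rules above.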
T/F: Many continuous variables follow a normal distribution.
True
Central Limit Theorem:
The central limit theorem is an important theorem in statistics. It states that regardless of the distribution of the population (normal or otherwise), if we take simple random samples from the population and compute the sample mean (X-bar) for each sample, then, provided the sample size is large (usually n ≥ 30 is sufficient), the distribution of the sample means is approximately normal. This is important because many of the procedures for statistical inference are based on the assumption that the outcome follows a normal distribution.
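A quick simulation illustrates the central limit theorem: even for a heavily skewed (exponential) population, means of samples of size 30 pile up around the population mean. The seed, rate parameter and number of samples below are arbitrary:

```python
import random
import statistics

random.seed(42)

# Population is highly skewed (exponential with rate 0.5, so mean 1/0.5 = 2),
# yet means of samples of n = 30 cluster symmetrically around 2.
sample_means = [
    statistics.mean(random.expovariate(0.5) for _ in range(30))
    for _ in range(2000)
]

grand_mean = statistics.mean(sample_means)
print(round(grand_mean, 2))  # close to the population mean of 2
```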
Name an important assumption in statistical procedures:
An important assumption is independence of the observations or data points. Some procedures are also based on the assumption that the outcome of interest is approximately normally distributed.
Name the two general areas of statistical inference:
Name the two general areas of statistical inference: estimation and hypothesis testing.
In estimation, we generate a confidence interval (CI) estimate of an unknown _________ _____________ (e.g., the mean in a single population, the difference in proportions in two independent samples) based on sample data appropriately accounting for sampling variability.
In estimation, we generate a confidence interval (CI) estimate of an unknown population parameter (e.g., the mean in a single population, the difference in proportions in two independent samples) based on sample data appropriately accounting for sampling variability.
The form of the confidence interval is _________ _________ ± margin of error, where the _________ _________ is the best estimate of the unknown parameter and the margin of error is the product of the confidence level and the standard error (or the variability of the point estimate).
The form of the confidence interval is point estimate ± margin of error, where the point estimate is the best estimate of the unknown parameter and the margin of error is the product of the confidence level and the standard error (or the variability of the point estimate).
How do you generate a confidence interval?
To generate a confidence interval, we select a confidence level; usually confidence levels of 90%, 95% or 99% are used. Confidence interval estimates are interpreted as a range of plausible values for an unknown population parameter with a probability attached.
In hypothesis testing, we formally compare __________ _______________ based on sample data again accounting for sampling variability.
In hypothesis testing, we formally compare population parameters based on sample data again accounting for sampling variability.
We set up competing hypotheses, called the ________ and ________ hypotheses.
We set up competing hypotheses, called the null and research hypotheses.
What does the null hypothesis reflect?
The null hypothesis reflects the "no difference" or "no effect" situation
What does the research hypothesis suggest?
The research or alternative hypothesis reflects the anticipated or hypothesized difference or effect.
T/F: A test statistic is computed which summarizes the sample information as it relates to the null hypothesis.
True
Hypothesis tests produce a p-value, which is:
Hypothesis tests produce a p-value, which is: The probability of observing a test statistic as extreme or more extreme than that observed if the null hypothesis is true.
T/F: A small p-value (e.g., p<0.05) suggests that there is less than a 5% probability of observing a test statistic as extreme or more extreme than that observed in the study sample and would likely lead to rejection of the null hypothesis in favor of the research hypothesis.
True
The significance criterion, denoted ____, is the probability of a Type I error (rejecting the null when in fact it is true) and should always be selected before looking at the data.
The significance criterion, denoted α, is the probability of a Type I error (rejecting the null when in fact it is true) and should always be selected before looking at the data.
T/F: The p-value or exact significance level should always be reported so that the reader can judge the significance of the findings (or lack thereof).
True
T/F: When a test of hypothesis is statistically significant (e.g., p < 0.05), we can be comfortable that the data support the research hypothesis because we control the probability that this conclusion is in error (by selecting a small Type I error rate, i.e. the significance level).
True
T/F: If we fail to reject the null hypotheses (e.g., p > 0.05), it may be that in fact there is no effect or that we have committed a Type II error.
True
A Type II error is:
A Type II error is failing to reject the null hypothesis when it is actually false; the Type II error rate is the probability of making this error.
Unfortunately, we cannot simply set a Type II error rate at the outset as it depends on the Type I error rate, the sample size and the size of the effect.
When a test fails to reject the null hypothesis it may be that this conclusion is in error.
What is the best safeguard against a Type II error?
The best safeguard is to plan the study carefully to ensure that the sample size is large enough to minimize the Type II error rate or to maximize the power of the test (power is the probability of rejecting a null hypothesis when in fact it is false and is equal to 1-Type II error rate). It is important to note that failing to reject the null hypothesis is not equivalent to proving the null to be true.
How are procedures organized?
Procedures are organized according to the number of comparison groups under investigation (e.g., one, two, more than two) and the nature of the outcome variable (e.g., continuous, dichotomous, discrete, time to event).
When are one sample procedures most useful?
One sample procedures are most useful when investigating new techniques or technologies and an estimate of the unknown mean or proportion in the population is needed.
The confidence interval estimate provides a range of plausible values for the __________ _________ or _________ that can be useful, for example, in planning future studies (e.g., comparative studies).
The confidence interval estimate provides a range of plausible values for the unknown mean or proportion that can be useful, for example, in planning future studies (e.g., comparative studies).
The formula for a confidence interval for the population mean is:
X-bar ± t(s/√n), where
X-bar is the mean in the study sample
t is the value from the t distribution reflecting the desired confidence level (e.g., 95%)
s is the standard deviation in the study sample
n is the sample size
Standard error equation?
SE = s/√n is the standard error (or the variability of the sample mean).
The above is appropriate with a sample of n independent participants and when the characteristic under study is approximately normally distributed.
If the sample size is large (e.g., n ≥ 30), then the distributional assumption can be relaxed.
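A sketch of this confidence interval with made-up summary statistics; because the sample here is large, the z value from the standard normal distribution stands in for the t value (the exact t value would come from a t table or a statistics package):

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical summary statistics (large n, so z approximates t)
xbar, s, n = 120.0, 15.0, 100
z = NormalDist().inv_cdf(0.975)  # about 1.96 for 95% confidence

margin = z * s / sqrt(n)         # margin of error = critical value * standard error
ci = (xbar - margin, xbar + margin)
print(ci)
```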
If the outcome is dichotomous and it is of interest to generate a CI estimate for the unknown proportion, the formula is:
p-hat ± Z√(p-hat(1 - p-hat)/n), where
p-hat is the proportion in the study sample
Z is the value from the standard normal distribution reflecting the desired confidence level
n is the sample size.
The above is appropriate as long as the sample has at least 5 independent participants in each of the dichotomous response categories. If there are fewer than 5 participants in one or both of the response categories, then an exact procedure should be used
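A sketch of the proportion CI with hypothetical counts, including the check that each of the two response categories contains at least 5 participants:

```python
from math import sqrt
from statistics import NormalDist

x, n = 40, 200                    # hypothetical: 40 positive responses out of 200
assert x >= 5 and n - x >= 5      # at least 5 in each dichotomous response category

p_hat = x / n
z = NormalDist().inv_cdf(0.975)   # 95% confidence
margin = z * sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - margin, p_hat + margin)
print(ci)
```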
In the hypothesis testing approach, a one sample test can be used, for example, to compare the ________ ___________in a study sample to a known _______ or the observed proportion in a study sample to a known proportion when the primary outcome is continuous or dichotomous, respectively.
In the hypothesis testing approach, a one sample test can be used, for example, to compare the observed mean in a study sample to a known mean or the observed proportion in a study sample to a known proportion when the primary outcome is continuous or dichotomous, respectively.
The tests for the mean and proportion use the test statistics t = (X-bar - µ0)/(s/√n) and Z = (p-hat - p0)/√(p0(1 - p0)/n), respectively.
One sample tests can be criticized, however, because the comparators (i.e., µ0, p0) are often based on ________ _________ and might not represent an appropriate comparison.
One sample tests can be criticized, however, because the comparators (i.e., µ0, p0) are often based on historical data and might not represent an appropriate comparison.
T/F: Comparisons are often more appropriate when the comparison groups are evaluated in parallel or concurrently
True
If the sample size is small and the distribution of the outcome is highly non-normal, then a ________ ______________(which does not assume normality), such as the Wilcoxon signed rank test should be used
If the sample size is small and the distribution of the outcome is highly non-normal, then a nonparametric test (which does not assume normality), such as the Wilcoxon signed rank test should be used
When the primary outcome is continuous and there are two independent comparison groups (e.g., patients assigned to an active drug versus placebo, male versus female participants), then a _________ __________ for the difference in means can be produced or a test for a difference in means can be performed.
When the primary outcome is continuous and there are two independent comparison groups (e.g., patients assigned to an active drug versus placebo, male versus female participants), then a confidence interval for the difference in means can be produced or a test for a difference in means can be performed.
The appropriate use of the two independent samples t test assumes 3 things:
1. that there are independent participants in each of two independent comparison groups
2. that the outcome is approximately normally distributed
3. that the variances in the groups are comparable.
The confidence interval for the difference in means is given by the following:
(X-bar 1 - X-bar 2) ± t·Sp√(1/n1 + 1/n2), where X-bar 1 and X-bar 2 are the means in the study samples,
t is the value from the t distribution reflecting the desired confidence level (e.g., 95%),
Sp is the pooled standard deviation
n1 and n2 are the respective sample sizes.
Pooled standard deviation (appropriate when the population variances are assumed to be equal and computed by combining the variances in the two study samples):
Sp = √[((n1 - 1)s1² + (n2 - 1)s2²)/(n1 + n2 - 2)]
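With made-up summary statistics for two groups, the pooled standard deviation and the CI for the difference in means look like this (the t value of 1.98 for df = 98 is an approximate value one would read from a t table):

```python
from math import sqrt

# Hypothetical summary statistics for two independent groups
n1, xbar1, s1 = 50, 125.0, 14.0
n2, xbar2, s2 = 50, 118.0, 16.0

# Pooled standard deviation (assumes equal population variances)
sp = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

t = 1.98  # approximate 95% t value for df = n1 + n2 - 2 = 98 (from a t table)
margin = t * sp * sqrt(1 / n1 + 1 / n2)
ci = (xbar1 - xbar2 - margin, xbar1 - xbar2 + margin)
print(round(sp, 2), ci)
```

Since this interval excludes 0, the difference in means would be statistically significant at α=0.05.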
In the hypothesis testing approach, a two independent samples t test can be used, with test statistic t = (X-bar 1 - X-bar 2)/(Sp√(1/n1 + 1/n2)).
The confidence interval and test of hypothesis are two different approaches to making the comparison between _______.
The confidence interval and test of hypothesis are two different approaches to making the comparison between means.
The 95% confidence interval provides the range of plausible values for the _______ ______ ___________, whereas the test of hypothesis examines the _________ ___________of the difference (i.e., interpretation of the p-value).
The 95% confidence interval provides the range of plausible values for the difference in means, whereas the test of hypothesis examines the statistical significance of the difference (i.e., interpretation of the p-value).
T/F: If a 95% confidence interval for the difference in means does not include 0 (i.e., the null value) then there is evidence of a statistically significant difference in means at α=0.05.
True
If the sample sizes are small and the distribution of the outcome is highly non-normal, then a nonparametric test which does not assume normality, such as the ________ _________ ________ _______or the _________ __________ _______ __________ should be used
If the sample sizes are small and the distribution of the outcome is highly non-normal, then a nonparametric test which does not assume normality, such as the Wilcoxon rank sum test or the Mann Whitney U test should be used
When data are matched or paired, then the analysis is focused on __________ __________. For example, suppose a study is conducted in which measures of body mass index are taken on n participants at the start of the study (baseline) and then again after 6 weeks of exposure to an exercise program. Suppose the objective is to assess the change in body mass index in response to the exercise program. Because two measurements are taken on each participant, we violate the assumption of independence of observations. The procedure is to compute difference scores on each participant by subtracting the measurements (e.g., baseline-6 weeks).
When data are matched or paired, then the analysis is focused on difference scores.
A confidence interval for the mean difference or a test for the mean difference in the population can be conducted. The confidence interval formula is:
X-bar d ± t(sd/√n), where X-bar d is the mean of the difference scores in the study sample,
t is the value from the t distribution reflecting the desired confidence level,
sd is the standard deviation of the difference scores in the study sample
n is the sample size (i.e., the number of independent participants, equal to the number of pairs).
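A sketch of the paired analysis with made-up baseline and 6-week BMI values; note that every summary statistic is computed on the difference scores (the t value of 2.365 for df = 7 is from a t table):

```python
from math import sqrt
import statistics

# Hypothetical BMI measured at baseline and after 6 weeks on n = 8 participants
baseline = [27.1, 30.2, 25.4, 28.8, 31.0, 26.5, 29.3, 27.9]
week6    = [26.5, 29.4, 25.1, 28.0, 30.1, 26.0, 28.6, 27.2]

# The analysis is based on difference scores, not the raw measurements
diffs = [b - w for b, w in zip(baseline, week6)]
xbar_d = statistics.mean(diffs)
sd_d = statistics.stdev(diffs)

n = len(diffs)
t = 2.365  # 95% t value for df = n - 1 = 7 (from a t table)
margin = t * sd_d / sqrt(n)
ci = (xbar_d - margin, xbar_d + margin)
print(round(xbar_d, 3), ci)
```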
The hypothesis testing approach uses the test statistic t = X-bar d/(sd/√n), with the null hypothesis that the mean difference is 0.
T/F: It is very important to note that in the paired t test (and confidence interval for paired data) summary statistics (i.e., the mean and standard deviation) are based on difference scores.
True
When the primary outcome is dichotomous and there are two independent comparison groups, then a confidence interval for the difference in proportions, for the _______ ________ or for the _________ ________ can be produced.
When the primary outcome is dichotomous and there are two independent comparison groups, then a confidence interval for the difference in proportions, for the relative risk or for the odds ratio can be produced.
The confidence interval for the difference in proportions is:
(p-hat 1 - p-hat 2) ± Z√(p-hat 1(1 - p-hat 1)/n1 + p-hat 2(1 - p-hat 2)/n2), where p-hat 1 and p-hat 2 are the proportions in the study samples,
Z is the value from the standard normal distribution reflecting the desired confidence level
n1 and n2 are the respective sample sizes.
The confidence interval for the relative risk is usually computed in two steps:
The first step involves estimating a confidence interval for the natural log of the relative risk
The second step involves taking the antilog (e^x) of each limit to produce a confidence interval for the relative risk.
The formula to estimate a confidence interval for the natural log of the relative risk:
ln(RR-hat) ± Z√((n1 - x1)/(x1·n1) + (n2 - x2)/(x2·n2)), where p-hat 1 and p-hat 2 are the proportions in the study samples,
x1 and x2 are the numbers of positive responses in the respective comparison groups,
Z is the value from the standard normal distribution reflecting the desired confidence level
n1 and n2 are the respective sample sizes.
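The two-step CI for the relative risk can be sketched with hypothetical counts: build the interval on the log scale, then exponentiate each limit:

```python
from math import sqrt, log, exp

# Hypothetical data: x events out of n in each group
x1, n1 = 30, 100
x2, n2 = 15, 100

p1, p2 = x1 / n1, x2 / n2
rr = p1 / p2

# Step 1: confidence interval for ln(RR)
z = 1.96  # 95% confidence
se_log_rr = sqrt((n1 - x1) / (x1 * n1) + (n2 - x2) / (x2 * n2))
log_ci = (log(rr) - z * se_log_rr, log(rr) + z * se_log_rr)

# Step 2: antilog (e^x) of each limit gives the CI for the RR itself
ci = (exp(log_ci[0]), exp(log_ci[1]))
print(round(rr, 2), ci)
```

The interval excludes 1 (the null value), so this hypothetical relative risk of 2.0 would be statistically significant at α=0.05.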
Confidence interval for the odds ratio - estimating a confidence interval for the natural log of the odds ratio using:
ln(OR-hat) ± Z√(1/x1 + 1/(n1 - x1) + 1/x2 + 1/(n2 - x2)), where p-hat 1 and p-hat 2 are the proportions in the study samples
x1 and x2 are the numbers of positive responses in the respective comparison groups
Z is the value from the standard normal distribution reflecting the desired confidence level
n1 and n2 are the respective sample sizes.
Equality of proportions test:
In the hypothesis testing approach, a test can be performed to assess equality of proportions, with test statistic Z = (p-hat 1 - p-hat 2)/√(p-hat(1 - p-hat)(1/n1 + 1/n2)). The null hypothesis of equality of proportions (risk difference = 0) is equivalent to a relative risk or an odds ratio equal to 1.
How is p-hat computed?
p-hat is computed by summing all of the positive responses and dividing by the total sample size,
The 95% confidence interval provides the range of plausible values for the difference in proportions, for the ______ _______or for the ______ _______, whereas the test of hypothesis examines the statistical significance of the difference in proportions, the ________ ______ or the _________ _______ (i.e., interpretation of the p-value).
The 95% confidence interval provides the range of plausible values for the difference in proportions, for the relative risk (RR) or for the odds ratio (OR), whereas the test of hypothesis examines the statistical significance of the difference in proportions, the RR or the OR (i.e., interpretation of the p-value).
If a 95% confidence interval for the difference in proportions does not include ____ (i.e., the null value) or if a 95% confidence interval for the relative risk or odds ratio does not include ___ (i.e., the null value), then there is evidence of a statistically significant difference in proportions at α=0.05.
If a 95% confidence interval for the difference in proportions does not include 0 (i.e., the null value) or if a 95% confidence interval for the relative risk or odds ratio does not include 1 (i.e., the null value), then there is evidence of a statistically significant difference in proportions at α=0.05.
When there are more than two independent groups, the procedure to test for differences in means of a continuous outcome is __________ ________ __________.
When there are more than two independent groups, the procedure to test for differences in means of a continuous outcome is analysis of variance (ANOVA).
T/F: In ANOVA, there are k (≥2) independent groups and again variances among groups are assumed to be equal.
True
The procedure for testing the equality of means in ANOVA uses the F statistic:
F = [Σ nj(X-bar j - X-bar)²/(k - 1)] / [ΣΣ(X - X-bar j)²/(N - k)],
where X-bar j is the mean of the jth study sample,
X-bar is the overall mean (computed by pooling all study samples),
nj is the sample size in the jth group,
k is the number of groups
N is the total sample size (pooling all study samples).
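The ANOVA F statistic can be computed by hand from between-group and within-group sums of squares; the three small groups below are made up for illustration:

```python
import statistics

# Hypothetical continuous outcome in k = 3 independent groups
groups = [
    [4.0, 5.0, 6.0, 5.5],
    [7.0, 8.0, 6.5, 7.5],
    [5.0, 6.0, 5.5, 6.5],
]

k = len(groups)
N = sum(len(g) for g in groups)
grand_mean = statistics.mean(x for g in groups for x in g)

# Between-group sum of squares: sum of n_j * (group mean - grand mean)^2
ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: squared deviations from each group's own mean
ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)

f_stat = (ss_between / (k - 1)) / (ss_within / (N - k))
print(round(f_stat, 2))
```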
T/F: A rejection of the null hypothesis in ANOVA is evidence that not all means are equal. It is possible that some of the means are equal to one another, just not all of them.
True
chi-square goodness of fit test - procedure for testing whether the distribution of responses in the study sample is different from the pre-specified distribution:
χ² = Σ(O - E)²/E, where O represents the observed frequency or number of responses in each category of the discrete variable
E represents the expected frequency in each response category
The expected frequency is determined under the assumption that the null hypothesis is true
The procedure is appropriate when the expected cell frequency in each cell is at least 5
If any of the expected frequencies are below 5, exact procedures are needed
chi-square test of independence procedure testing:
When there are two or more independent groups and the outcome is discrete (with 2 or more response options), the procedure to test for differences in proportions among the groups is χ² = Σ(O - E)²/E,
where O represents the observed frequency in each cell of the two-way table
E represents the expected frequency in each cell of the two-way table (under the assumption that the null hypothesis is true).
It can be shown that the expected frequencies in each cell are computed as follows: E=(row total)*(column total)/n.
The chi-square test of independence is suitable to test for differences in __________ across comparison groups.
The chi-square test of independence is suitable to test for differences in proportions across comparison groups.

The procedure is appropriate when the expected cell frequency in each cell is at least 5. If any of the expected frequencies are below 5, exact procedures are needed.
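A sketch of the chi-square test of independence for a hypothetical 2x3 table, computing each expected frequency as E = (row total)(column total)/n:

```python
# Hypothetical observed frequencies (rows = comparison groups,
# columns = response categories)
observed = [
    [30, 20, 10],
    [20, 25, 15],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi_sq = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n  # E = (row total)(column total)/n
        assert e >= 5  # procedure requires expected cell frequencies of at least 5
        chi_sq += (o - e) ** 2 / e
print(round(chi_sq, 2))
```

The resulting statistic would be compared to a chi-square distribution with (rows - 1)(columns - 1) degrees of freedom to obtain a p-value.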
What do multivariable statistical methods allow us to do?
Multivariable statistical methods allow us to consider the impact or relationships among several variables simultaneously.

Typically this is done using a procedure called regression analysis.
What is regression analysis?
Regression analysis is a widely used procedure in biostatistics to relate an outcome, or dependent, variable (denoted Y) to one or more independent or predictor variables (denoted X1, X2, etc).
What are confounders?
Confounders are other variables that are related to the risk factor of interest and also to the outcome that may mask or enhance an association between the risk factor of interest and the outcome.
Graphical displays, specifically _______ _________, are a useful way to summarize associations between continuous predictor and outcome variables.
Graphical displays, specifically scatter plots, are a useful way to summarize associations between continuous predictor and outcome variables.
Regression models differ depending on the classification of the outcome or dependent variable. If the outcome is continuous, a _______ _______ model is appropriate.
Regression models differ depending on the classification of the outcome or dependent variable. If the outcome is continuous, a linear regression model is appropriate.
Regression models differ depending on the classification of the outcome or dependent variable. If the outcome is dichotomous, a ______ _________ model is appropriate.
Regression models differ depending on the classification of the outcome or dependent variable. If the outcome is dichotomous, a logistic regression model is appropriate.
Regression models differ depending on the classification of the outcome or dependent variable. If the outcome is time to event then a ________ ___________ _________model is appropriate.
Regression models differ depending on the classification of the outcome or dependent variable. If the outcome is time to event then a Cox proportional hazards model is appropriate.
Discrete variables can also be considered as predictor variables in a regression model as a set of ________ __________.
Discrete variables can also be considered as predictor variables in a regression model as a set of dummy variables.
T/F: A positive regression coefficient indicates that the predictor is positively associated with the outcome, whereas a negative regression coefficient indicates that the predictor is inversely associated with the outcome.
True
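As an illustration of a regression coefficient's sign, a least-squares simple linear regression on made-up age and blood pressure values yields a positive slope (blood pressure rises with age in this fabricated sample):

```python
import statistics

# Hypothetical data: age (predictor X) and systolic blood pressure (outcome Y)
age = [45, 50, 55, 60, 65, 70, 75]
sbp = [118, 121, 126, 128, 134, 136, 141]

xbar, ybar = statistics.mean(age), statistics.mean(sbp)

# Least-squares slope and intercept for simple linear regression
slope = sum((x - xbar) * (y - ybar) for x, y in zip(age, sbp)) / \
        sum((x - xbar) ** 2 for x in age)
intercept = ybar - slope * xbar

# Positive slope: the predictor is positively associated with the outcome
print(round(slope, 3), round(intercept, 1))
```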
Example 1: A clinical trial is conducted to test the efficacy of a new drug for hypertension. The new drug is compared to a placebo in a trial involving n=80 participants. The primary outcome is systolic blood pressure measured after 4 weeks on the assigned drug. At 4 weeks participants are also classified as meeting the criteria for hypertension or not. The table below shows characteristics of study participants measured at baseline (prior to randomization) as well as information on the outcomes measured 4 weeks post-randomization.
Q.1 Was the randomization successful?
To assess whether the randomization was successful we compare baseline characteristics between the placebo and new drug groups. This can be done informally by comparing the means of continuous characteristics - for example the mean ages seem comparable (75.2 versus 74.7 years) as do the mean BMIs (26.1 versus 26.9). To assess whether the observed differences (0.5 year difference in mean ages and 0.8 unit difference in mean BMI) are beyond what would be expected by chance, we can conduct formal tests of hypothesis. In each case a t test for the difference in means is run and p-values are produced. With regard to the dichotomous characteristics (e.g., sex, diabetes status and smoking status) we can again informally compare proportions between groups or use a two sample test for equality of proportions to make a formal, statistical comparison. To assess if there is a statistically significant difference in baseline blood pressure categories, a chi-square test is used. Suppose all of the tests described are conducted and the p-values for the tests are shown below.
Example 1: A clinical trial is conducted to test the efficacy of a new drug for hypertension. The new drug is compared to a placebo in a trial involving n=80 participants. The primary outcome is systolic blood pressure measured after 4 weeks on the assigned drug. At 4 weeks participants are also classified as meeting the criteria for hypertension or not. The table below shows characteristics of study participants measured at baseline (prior to randomization) as well as information on the outcomes measured 4 weeks post-randomization.
Q.2 Which, if any, of the baseline characteristics are statistically significantly different between groups?
Using a 5% significance criterion, none of the characteristics are statistically significantly different between groups. However, the proportion of participants who are diabetic (32.5% versus 20.0%) is approaching statistical significance (p=0.09).
Example 1 (continued):
Q.3 Is there statistical evidence of efficacy of the new drug?
The 95% confidence interval for the difference in mean systolic blood pressures between groups is probably the most informative summary of the effect of treatment. On average, systolic blood pressures of participants treated with the new drug are 7 units lower than those of participants receiving placebo. The 95% confidence interval suggests that systolic blood pressures are lower by anywhere from 0.5 to 13.6 units in participants treated with the new drug as compared to placebo. The investigators may also wish to present data on the difference in proportions of participants who meet the criteria for hypertension following 4 weeks of treatment (i.e., 28% of those treated with the new drug as compared to 52% of those receiving placebo). The main conclusion of the trial, however, should be based on the primary outcome, systolic blood pressure.
Example 1 (continued):
Q.4 Is confounding an issue and if so, how should it be handled?
Based on the comparisons of baseline characteristics, it does not appear that there are any meaningful differences between participants assigned to the new drug as compared to the placebo and thus confounding is not likely an issue. In general, randomization (a key component of a clinical trial) minimizes the likelihood of confounding. Unfortunately, in some trials there are important differences that must be reconciled despite the fact that participants are randomized. This is generally handled with multivariable regression models that assess the effect of treatment while controlling or adjusting for confounders.
Example 2:
A case-control study is performed to examine the relationship between the concentration of plasma antioxidant vitamins and cancer risk. For the study, twenty five participants with stomach cancer are selected along with 25 frequency-matched controls (participants free of stomach cancer). The table below shows data for participants classified as having insufficient concentrations of plasma antioxidant vitamins in each study group.
Q.1 What measure would be best to quantify the association between insufficient concentrations of plasma antioxidant vitamins and cancer?
Because this is a case-control study the odds ratio is the most appropriate measure of effect. The estimate of the odds ratio is 1.4 (the odds of insufficient concentrations of plasma antioxidant vitamins are 1.4 times higher in persons with cancer as compared to persons without cancer).
Example 2 (continued):
Q.2 Is there a statistically significant association between insufficient concentrations of plasma antioxidant vitamins and cancer?
The 95% confidence interval for the odds ratio is 0.8 to 2.4. Because the confidence interval includes the null value (one), we cannot infer that this association is statistically significant.
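A sketch of how the odds ratio and its 95% confidence interval are computed from a 2x2 table. The cell counts below are hypothetical (the study's table is not shown), so the results will not exactly match the 1.4 and 0.8 to 2.4 quoted above.

```python
import math

# Hypothetical 2x2 counts (25 cases, 25 controls); columns are insufficient /
# sufficient plasma antioxidant vitamin concentrations. Counts are assumed.
a, b = 14, 11   # cases:    insufficient, sufficient
c, d = 12, 13   # controls: insufficient, sufficient

odds_ratio = (a * d) / (b * c)

# Woolf (large-sample) 95% CI on the log-odds scale
se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)
lo = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
hi = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
print(f"OR = {odds_ratio:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
```

As in the example, an interval that spans 1 means the association cannot be declared statistically significant.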
Example 2 (continued):
Q.3 Is bias or confounding an issue here and if so, how would these issues be handled?
Bias can be an issue in case-control studies - in fact there are several possible sources of bias, including misclassification and recall bias. If bias is present, the study results are in question and may not be correctable analytically. Confounding may also be an issue. In this study, controls were matched to cases on age, which is a likely confounder. If there are other characteristics related to both the exposure (concentration of plasma antioxidant vitamins) and the outcome (stomach cancer), these would also be potential confounders, and it would be necessary to account for them using multivariable logistic regression analysis.
Example 2 (continued):
Q.4 Is a case-control study the best design to assess the relationship between the concentration of plasma antioxidant vitamins and cancer risk?
From a statistical standpoint, a clinical trial is always preferable. However, it may not be an optimal or feasible design to answer the research question of interest. A clinical trial would involve randomizing participants free of stomach cancer to receive sufficient or insufficient antioxidant vitamins (which may or may not be ethical, depending on what is known about the health effects of antioxidant vitamins) and following them for a sufficiently long period of time so that adequate numbers develop stomach cancer. Adequate in this instance refers to a sufficiently large number of cases of stomach cancer to ensure precision in the statistical analyses. The time it might take to observe a sufficient number of cases might render the clinical trial impractical. A cohort study is another option; it has many advantages, including the ability to establish temporal associations, but it suffers from the same disadvantage that it may take too long to observe a sufficient number of cases of stomach cancer. Thus, the case-control design is likely a reasonable approach to this research question.
Example 3
An open label study (where participants are aware of the treatment they are taking) is run to assess the time to pain relief in patients with arthritis following treatment. In this study, all patients are followed until they experience pain relief (based on pilot studies, pain relief is experienced in patients within 6 hours of treatment). The following linear regression equations are estimated relating time to pain relief measured in minutes (outcome variable) to participant's age (in years), sex (coded 1 for males and 0 for females) and severity of disease (a score ranging from 0 to 100 with higher scores indicative of more severe arthritis). In this analysis, the three predictors (age, sex and severity of disease) are statistically significant, p<0.05, when considered individually and simultaneously.
Time to Pain Relief = -24.2 + 0.9 Age
Time to Pain Relief = 11.8 + 19.3 Male Sex
Time to Pain Relief = 3.2 + 0.4 Severity
Time to Pain Relief = -19.8 + 0.5 Age + 10.9 Male Sex + 0.2 Severity

Q.1 How and to what extent is the expected time to pain relief associated with age, sex and severity of disease?
In models relating each predictor one-at-a-time to time to pain relief, age, male sex and severity of disease are all positively associated with time to pain relief. Older patients are more likely to report longer times to pain relief as are men and patients with higher disease severity scores (e.g., each additional year of age is associated with an increase of 0.9 minutes in time to pain relief, men report times to pain relief of approximately 19.3 minutes longer than women, and each additional scale point in disease severity is associated with a 0.4 minute increase in time to pain relief).
Example 3 (continued):

Q.2 Suppose we are interested in assessing the association between sex and time to pain relief. Is there evidence of confounding by age or severity?
In the unadjusted model (relating male sex to time to pain relief ignoring other factors), on average males report times to pain relief of approximately 19.3 minutes longer than women. However, in the multivariable model adjusting for age and disease severity the impact of male sex is reduced. Holding age and severity of disease constant, males report pain relief of approximately 10.9 minutes longer than women. When the regression coefficients in an unadjusted and adjusted model change substantially (some investigators use a criterion of 10% or more), there is evidence of confounding.
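The 10%-change screen for confounding mentioned above can be applied directly to the coefficients quoted in this example (19.3 unadjusted versus 10.9 adjusted for the effect of male sex):

```python
# Change-in-estimate screen for confounding, using the regression coefficients
# for male sex quoted in Example 3.
unadjusted = 19.3   # coefficient for male sex, ignoring other factors
adjusted = 10.9     # coefficient for male sex, adjusted for age and severity

pct_change = 100 * abs(unadjusted - adjusted) / unadjusted
confounding_suspected = pct_change > 10   # common (if informal) criterion
print(f"coefficient changed by {pct_change:.1f}% -> confounding suspected: {confounding_suspected}")
```

Here the coefficient changes by well over 10%, so age and/or severity are flagged as confounders of the sex effect.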
Example 4
A longitudinal cohort study is conducted to assess the association between soda consumption and development of Type II (adult onset) diabetes. A cohort of 250 participants between the ages of 30 and 49 who are free of diabetes enroll in the study. Sociodemographic and clinical risk factors are measured in each participant at the start of the study. The primary risk factor of interest is soda consumption and participants are asked to report the number of standard sodas (12 ounces) they consume in a typical day (including caffeinated and decaffeinated, diet and regular sodas). Each participant is followed for 10 years for the development of diabetes. For analytic purposes, soda consumption is categorized as 0, 1, 2, or 3+ sodas per day. Of interest is whether there is an increased risk of developing diabetes in participants who consume 3+ sodas per day as compared to participants who consume less. Summary data on selected sociodemographic and clinical variables as well as the outcome (diabetes status) classified by soda consumption are shown below.

Q.1 Is there an association between soda consumption and incident diabetes?
There appears to be an increasing risk of incident diabetes associated with increased soda consumption. Approximately 7% of persons who do not drink soda develop diabetes over a 10 year period as compared to 9%, 17% and 33% of persons who drink 1, 2 and 3 or more sodas per day, respectively.
Example 4 (continued):

Q.2 What is the best measure of effect to summarize the association between soda consumption (considering 3+ sodas per day versus < 3 sodas per day) and incident diabetes?
With a dichotomous risk factor, the relative risk is the best measure of association. In this study, n=210 participants drink fewer than 3 sodas per day and 40 drink 3 or more per day. Among those drinking fewer than 3 sodas per day, 23/210 = 11% develop diabetes, compared to 33% of those drinking 3 or more. This translates to a relative risk of 3 (i.e., persons who drink 3 or more sodas per day are 3 times more likely to develop diabetes than persons who drink fewer than 3 sodas per day).
Example 4 (continued):
Q.3 Is the association between soda consumption and incident diabetes statistically significant?
The 95% confidence interval for the relative risk is 1.6 to 5.6. Because the confidence interval does not include the null value (one), we can infer that this association is statistically significant. We can also conduct a test for equality of proportions of diabetes which produces p<0.001.
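Using the counts recoverable from the summaries above (23 of 210 in the under-3 group; 33% of 40, taken here as 13/40, in the 3+ group), the relative risk and its large-sample confidence interval can be sketched as follows. Small rounding differences mean the interval may not exactly match the 1.6 to 5.6 quoted.

```python
import math

# Counts from Example 4's summaries; 13/40 is inferred from the reported 33%.
a, n1 = 13, 40    # 3+ sodas/day: diabetes events, total
c, n2 = 23, 210   # <3 sodas/day: diabetes events, total

rr = (a / n1) / (c / n2)

# Large-sample 95% CI on the log scale
se_log_rr = math.sqrt(1/a - 1/n1 + 1/c - 1/n2)
lo = math.exp(math.log(rr) - 1.96 * se_log_rr)
hi = math.exp(math.log(rr) + 1.96 * se_log_rr)
print(f"RR = {rr:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
```

Because the lower limit exceeds the null value of 1, the association is statistically significant.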
Example 4 (continued):
Q.4 Is there evidence of confounding and if so, how should it be handled?
There appear to be substantial differences in age, sex and percent obese among the comparison groups defined by soda consumption. Persons who drink 3 or more sodas per day are younger and more likely to be male and obese than persons who drink fewer or no sodas. To account for these differences appropriately, a multivariable logistic regression model should be estimated relating incident diabetes to soda consumption, adjusting for age, sex and obesity status.
adjusted rate (adjustment):
A summarizing procedure for a statistical measure in which the effects of differences in composition of the populations being compared have been minimized by statistical methods. Examples are adjustment by regression analysis and by standardization. Adjustment is often performed on rates or relative risks, commonly because of differing age distributions in populations that are being compared. The mathematical procedure commonly used to adjust rates for age differences is direct or indirect standardization.
alpha (α)
The probability of a type I error, the error of rejecting a true null hypothesis, i.e. declaring a difference exists when it does not.
alternative hypothesis:
The hypothesis against which the null hypothesis is tested; it is the statement accepted when the null hypothesis is rejected. More generally: 1. A supposition, arrived at from observation or reflection, that leads to refutable predictions. 2. Any conjecture cast in a form that will allow it to be tested and refuted.
analysis of covariance (ANCOVA):
Originally used for an extension of the analysis of variance that allows for the possible effects of continuous concomitant variables (covariates) on the response variable, in addition to the effects of the factor or treatment variables. It is usually assumed that covariates are unaffected by treatments and that their relationship to the response is linear. If such a relationship exists, then including covariates in this way decreases the error mean square and hence increases the sensitivity of the analysis. The term now appears to be used more generally for almost any analysis seeking to assess the relationship between a response variable and a number of explanatory variables.
analysis of variance (ANOVA):
The separation of variance attributable to one cause from the variance attributable to others. By partitioning the total variance of a set of observations into parts due to particular factors, for example, sex, treatment group, etc, and comparing variances (mean squares) by way of F-tests, differences between means can be assessed. The simplest analysis of this type involves a one-way design, in which N subjects are allocated, usually at random, to the k different levels of a single factor. The total variation in the observations is then divided into a part due to differences between level means (the between groups sum of squares) and a part due to the differences between subjects in the same group (the within groups sum of squares, also known as the residual sum of squares). These terms are usually arranged as an analysis of variance table.

If the means of the populations represented by the factor levels are the same, then, within the limits of random variation, the between groups mean square and within groups mean square should be the same. Whether this is so can, if certain assumptions are met, be assessed by a suitable F-test. The assumptions are that the response variable is normally distributed in each population and that the populations have the same variance. ANOVA is essentially an example of a generalized linear model with an identity link function and normally distributed errors.
Bayes' theorem:
A procedure for revising and updating the probability of some event in the light of new evidence. The theorem originates in an essay by the Reverend Thomas Bayes. In its simplest form the theorem may be written in terms of conditional probabilities as:

Pr(Bj | A) = Pr(A | Bj) Pr(Bj) / [Pr(A | B1) Pr(B1) + ... + Pr(A | Bk) Pr(Bk)]
where Pr( A | Bj ) denotes the conditional probability of event A conditional on event Bj and B1 , B2 ,...,Bk are mutually exclusive and exhaustive events.
The theorem gives the probabilities of the Bj when A is known to have occurred. The quantity Pr( Bj ) is termed the prior probability and Pr( Bj | A ) the posterior probability .
Pr( A | Bj ) is equivalent to the (normalized) likelihood, so that the theorem may be restated as: posterior is proportional to (prior) x (likelihood).
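A small numeric illustration of the theorem with two exhaustive events (the prevalence, sensitivity and false-positive rate below are assumed values for a hypothetical diagnostic test, not taken from the text):

```python
# B1 = "has disease", B2 = "disease-free", A = "positive test" (rates assumed)
pr_b1, pr_b2 = 0.01, 0.99          # prior probabilities (exhaustive, exclusive)
pr_a_b1, pr_a_b2 = 0.95, 0.05      # Pr(A | Bj): sensitivity, false-positive rate

# Posterior: Pr(B1 | A) = Pr(A|B1)Pr(B1) / [Pr(A|B1)Pr(B1) + Pr(A|B2)Pr(B2)]
posterior = (pr_a_b1 * pr_b1) / (pr_a_b1 * pr_b1 + pr_a_b2 * pr_b2)
print(f"Pr(disease | positive test) = {posterior:.3f}")
```

Even with a sensitive test, a rare disease yields a modest posterior probability, which is why the prior matters.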
beta (β):
The probability of a type II error, the error of failing to reject a false null hypothesis, i.e. declaring that a difference does not exist when in fact it does.
bias:
In general terms, deviations of results or inferences from the truth, or processes leading to such deviation. More specifically, the extent to which the statistical method used in a study does not estimate the quantity thought to be estimated, or does not test the hypothesis to be tested. In estimation, bias is usually measured by the difference between a parameter estimate and its expected value.
binary variable (binary observation):
Observations which occur in one of two possible states, these often being labeled 0 and 1. Such data are frequently encountered in medical investigations; commonly occurring examples include 'dead/alive', 'improved/not improved' and 'depressed/not depressed.' Data involving this type of variable often require specialized techniques for their analysis, such as logistic regression.
binomial distribution:
The distribution of the number of 'successes', X, in a series of n independent Bernoulli trials where the probability of success at each trial is p and the probability of failure is q = 1 - p. Specifically the distribution is given by:

Pr(X = x) = [n! / (x!(n - x)!)] p^x q^(n-x),   x = 0, 1, 2, ..., n
The mean, variance, skewness and kurtosis of the distribution are as follows:
mean = np
variance = npq
skewness = (q - p)/(npq)^(1/2)
kurtosis = 3 - (6/n) + 1/(npq)
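A quick check of the pmf and the mean/variance formulas above, sketched with standard library tools for an arbitrary choice of n and p:

```python
from math import comb

# Binomial pmf, mean, and variance for n trials with success probability p
n, p = 10, 0.3
q = 1 - p

pmf = [comb(n, x) * p**x * q**(n - x) for x in range(n + 1)]

mean = n * p                 # np
variance = n * p * q         # npq
print(f"sum of pmf = {sum(pmf):.6f}, mean = {mean}, variance = {variance}")
```

The probabilities sum to 1, as any proper distribution must.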
Biostatistics:
A branch of science which applies statistical methods to biological problems. The science of biostatistics encompasses the design of biological experiments, especially in medicine and health sciences.
bivariate:
Involving two variables measured on each unit, e.g. height and weight. The term appears in names such as 'bivariate binomial distribution', the joint distribution of two binomial outcomes.
blinded study (blinding):
A procedure used in clinical trials to avoid the possible bias that might be introduced if the patient and/or doctor knew which treatment the patient is receiving. If neither the patient nor doctor are aware of which treatment has been given the trial is termed double-blind. If only one of the patient or doctor is unaware, the trial is called single-blind. Clinical trials should use the maximum degree of blindness that is possible, although in some areas, for example, surgery, it is often impossible for an investigation to be double-blind.
Bonferroni correction:
A procedure for guarding against an increase in the probability of a type I error when performing multiple significance tests. To maintain the probability of a type I error at some selected value (α), each of the m tests to be performed is judged against a significance level (α/m ). For a small number of simultaneous tests (up to five) this method provides a simple and acceptable answer to the problem of multiple testing. It is however highly conservative and not recommended if large numbers of tests are to be applied, when one of the many other multiple comparison procedures available is generally preferable.
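A minimal sketch of the correction described above; the p-values are made up for illustration:

```python
# Bonferroni correction: judge each of m tests at alpha/m to keep the
# family-wise type I error rate at alpha. The p-values below are hypothetical.
alpha = 0.05
p_values = [0.012, 0.030, 0.004, 0.250, 0.041]
m = len(p_values)

threshold = alpha / m
significant = [p for p in p_values if p < threshold]
print(f"per-test threshold = {threshold:.3f}, significant p-values: {significant}")
```

Note that three of the five p-values fall below 0.05 but only one survives the corrected threshold of 0.01, which is the conservatism the entry warns about.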
case-control study
(Syn: case comparison study, case compeer study, case history study, case referent study, retrospective study) The observational epidemiologic study of persons with the disease (or other outcome variable) of interest and a suitable control (comparison, reference) group of persons without the disease.
categorical data:
Categorical data represent types of data which may be divided into groups. Examples of categorical variables are race, sex, age group, and educational level. While the latter two variables may also be considered in a numerical manner by using exact values for age and highest grade completed, it is often more informative to categorize such variables into a relatively small number of groups.
censored observation:
An observation Xi on some variable of interest is said to be censored if it is known only that Xi ≤ Li (left-censored) or Xi ≥ Ui (right-censored), where Li and Ui are fixed values. Such observations arise most frequently in studies where the variable of main interest is the time until a particular event occurs (for example, time to death) and, at the completion of the study, the event of interest has not happened to a number of subjects.
Central Limit Theorem:
If a random variable Y has population mean µ and population variance σ², then the sample mean ȳ, based on n observations, has approximately a normal distribution with mean µ and variance σ²/n, for sufficiently large n. The theorem occupies an important place in statistical theory. In short, the Central Limit Theorem states that if the sample size is large enough, the distribution of sample means can be approximated by a normal distribution, even if the original population is not normally distributed.
Chi-Square Distribution:
The Chi-Square distribution arises from a normally distributed population with variance σ²: for randomly selected independent samples of size n with computed sample variance s², the sample statistic is X² = (n - 1)s²/σ². The chi-square distribution is skewed, its values can be zero or positive but not negative, and it is different for each number of degrees of freedom. Generally, as the number of degrees of freedom increases, the chi-square distribution approaches a normal distribution.
Chi-square statistic:
A statistic having, at least approximately, a chi-squared distribution.
Chi-square test for trend:
A test applied to a two-dimensional contingency table in which one variable has two categories and the other has k ordered categories, to assess whether there is a difference in the trend of the proportions in the two groups. The result of using the ordering in this way is a test that is more powerful than using the chi-squared statistic to test for independence.
clinical trial :
(Syn: therapeutic trial) A research activity that involves the administration of a test regimen to humans to evaluate its efficacy and safety. The term is subject to wide variation in usage, from the first use in humans without any control treatment to a rigorously designed and executed experiment involving test and control treatments and randomization. Several phases of clinical trials are distinguished:
Phase I trial: Safety and pharmacologic profiles. The first introduction of a candidate vaccine or a drug into a human population to determine its safety and mode of action. In drug trials, this phase may include studies of dose and route of administration. Phase I trials usually involve fewer than 100 healthy volunteers.
Phase II trial: Pilot efficacy studies. Initial trial to examine efficacy, usually in 200 to 500 volunteers; with vaccines, the focus is on immunogenicity, and with drugs, on demonstration of safety and efficacy in comparison to other existing regimens. Usually, but not always, subjects are randomly allocated to study and control groups.
Phase III trial: Extensive clinical trial. This phase is intended for complete assessment of safety and efficacy. It involves larger numbers, perhaps thousands, of volunteers, usually with random allocation to study and control groups, and may be a multicenter trial.
Phase IV trial: With drugs, this phase is conducted after the national drug registration authority (e.g., the Food and Drug Administration in the United States) has approved the drug for distribution or marketing. Phase IV trials may include research designed to explore a specific pharmacologic effect, to establish the incidence of adverse reactions, or to determine the effects of long-term use. Ethical review is required for phase IV clinical trials, but not for routine post-marketing surveillance.
coefficient of variation (CV):
The measure of spread for a set of data defined as 100 x (standard deviation / mean):
CV = (s / x̄) x 100 (sample)
CV = (σ / µ) x 100 (population)
Originally proposed as a way of comparing the variability in different distributions, but found to be sensitive to errors in the mean. Simpler definition: the ratio of the standard deviation to the mean. This is meaningful only if the variable is measured on a ratio scale.
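The sample version of the formula can be sketched with the standard library; the data values are arbitrary:

```python
import statistics

# Sample coefficient of variation: 100 x (standard deviation / mean)
data = [12.0, 15.0, 11.0, 14.0, 13.0]   # arbitrary illustrative values
cv = 100 * statistics.stdev(data) / statistics.mean(data)
print(f"CV = {cv:.1f}%")
```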
cohort study :
(Syn: concurrent, follow-up, incidence, longitudinal, prospective study) The analytic method of epidemiologic study in which subsets of a defined population can be identified who are, have been, or in the future may be exposed or not exposed, or exposed in different degrees, to a factor or factors hypothesized to influence the probability of occurrence of a given disease or other outcome.
complementary event:
Mutually exclusive events A and B for which
Pr(A) + Pr(B) = 1

where Pr denotes probability.
conditional probability:
The probability that an event occurs given the outcome of some other event, usually written Pr(A | B). For example, the probability of a person being colour blind given that the person is male is about 0.1, and the corresponding probability given that the person is female is approximately 0.0001. It is not, of course, necessary that Pr(A | B) = Pr(B | A); the probability of having spots given that a patient has measles, for example, is very high, while the probability of measles given that a patient has spots is much lower. If Pr(A | B) = Pr(A), then the events A and B are said to be independent.
confidence interval (CI):
A range of values, calculated from the sample observations, that is believed, with a particular probability, to contain the true value of a population parameter. A 95% confidence interval, for example, implies that were the estimation process repeated again and again, then 95% of the calculated intervals would be expected to contain the true parameter value. Note that the stated probability level refers to properties of the interval and not to the parameter itself which is not considered a random variable.
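A minimal sketch of a 95% interval for a population mean, using the large-sample z critical value of 1.96 (for small samples a t critical value would be used instead); the data are arbitrary illustrative values:

```python
import math
import statistics

# Large-sample 95% CI for a mean: xbar +/- 1.96 * (s / sqrt(n))
data = [128, 135, 121, 140, 132, 125, 138, 130, 127, 133]   # illustrative
n = len(data)
xbar = statistics.fmean(data)
se = statistics.stdev(data) / math.sqrt(n)

lo, hi = xbar - 1.96 * se, xbar + 1.96 * se
print(f"mean = {xbar:.1f}, 95% CI ({lo:.1f}, {hi:.1f})")
```

The interpretation follows the entry above: over repeated sampling, 95% of intervals constructed this way would contain the true mean.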
confounding variable: A confounding variable (also confounding factor, lurking variable, a confound, or confounder)
is an extraneous variable in a statistical model that correlates (positively or negatively) with both the dependent variable and the independent variable. The methodologies of scientific studies therefore need to control for these factors to avoid what is known as a type I error: a 'false positive' conclusion that the dependent variable is in a causal relationship with the independent variable. Such a relation between two observed variables is termed a spurious relationship. Confounding is thus a major threat to the validity of inferences made about cause and effect, i.e. internal validity, as the observed effects may be attributable to the confounder rather than the independent variable.
By definition, a confounding variable is associated with both the probable cause and the outcome. The confounder must not lie in the causal pathway between the cause and the outcome: if A is thought to be the cause of disease C, the confounding variable B may not be solely caused by behaviour A, and behaviour B shall not always lead to behaviour C. An example: being female does not always lead to smoking tobacco, and smoking tobacco does not always lead to cancer. Therefore, any study that tries to elucidate the relation between being female and cancer should take smoking into account as a possible confounder. In addition, a confounder is always a risk factor that has a different prevalence in two risk groups (e.g. females/males). (Hennekens, Buring & Mayrent, 1987).
contingency table (or two-way frequency table):
The table arising when observations on a number of categorical variables are cross-classified. Entries in each cell are the number of individuals with the corresponding combination of variable values. Most common are two-dimensional tables involving two categorical variables.

The analysis of such two-dimensional tables generally involves testing for the independence of the two variables using the familiar chi-squared statistics. Three- and higher-dimensional tables are now routinely analyzed using log-linear models.
continuous data:
result from infinitely many possible values that correspond to some continuous scale that covers a range of values without gaps, interruptions or jumps, e.g. blood pressure.
controlled trial:
A Phase III clinical trial in which an experimental treatment is compared with a control treatment, the latter being either the current standard treatment or a placebo.
correlation coefficient r (Pearson product moment):
An index that quantifies the linear relationship between a pair of variables. In a bivariate normal distribution, for example, the parameter ρ. An estimator of ρ obtained from n sample values of the two variables of interest, (x1, y1), (x2, y2), ..., (xn, yn), is Pearson's product moment correlation coefficient, r, given by

r = Σ(xi − x̄)(yi − ȳ) / √[ Σ(xi − x̄)² · Σ(yi − ȳ)² ]
The coefficient takes values between -1 and 1, with the sign indicating the direction of the relationship and the numerical magnitude its strength. Values of -1 and 1 indicate that the sample values fall on a straight line. A value of zero indicates the lack of any linear relationship between the two variables.
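The sample coefficient r can be computed directly from its definition; a minimal Python sketch (the data values are hypothetical):

```python
import math

def pearson_r(xs, ys):
    """Pearson product moment correlation for two equal-length samples."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

Data lying exactly on an increasing straight line give r = 1; on a decreasing line, r = −1.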
covariate:
Often used simply as an alternative name for explanatory variables, but perhaps more specifically to refer to variables that are not of primary interest in an investigation, but are measured because it is believed that they are likely to affect the response variable and consequently need to be included in analyses and model building.
cox regression model (Proportional Hazards Model):
A statistical model used in survival analysis developed by D.R. Cox in 1972 asserting that the effect of the study factors on the hazard rate in the study population is multiplicative and does not change over time.
critical value:
The value with which a statistic calculated from sample data is compared in order to decide whether a null hypothesis should be rejected. The value is related to the particular significance level chosen.
crossover rate:
The proportion of patients in a clinical trial transferring from the treatment decided by an initial random allocation to an alternative one.
cross-sectional study:
(Syn: disease frequency survey, prevalence study) A study that examines the relationship between diseases (or other health-related characteristics) and other variables of interest as they exist in defined population at one particular time.
degrees of freedom:
An elusive concept that occurs throughout statistics. Essentially the term means the number of independent units of information in a sample relevant to the estimation of a parameter or calculation of a statistic. For example, in a two-by-two contingency table with a given set of marginal totals, only one of the four cell frequencies is free and the table has therefore a single degree of freedom. In many cases the term corresponds to the number of parameters in a model. Also used to refer to a parameter of various families of distributions, for example, Student's t-distribution and the F-distribution.
dependent variable (response or outcome variable):
The variable of primary importance in investigations since the major objective is usually to study the effects of treatment and/or other explanatory variables on this variable and to provide suitable models for the relationship between it and the explanatory variables.
descriptive statistics:
A general term for methods of summarizing and tabulating data that make their main features more transparent. For example, calculating means and variances and plotting histograms.
dichotomous observation:
A nominal measure with two outcomes (examples are gender male or female; survival yes or no); also called binary. See dichotomous data.
dichotomous scale:
one that arranges items into either of two mutually exclusive categories, e.g. yes/no, alive/dead.
discrete data:
result when the number of possible values is either a finite number or a "countable" number.
Discrete variable:
a countable and finite variable, for example grade: 1, 2, 3, ..., 12.
population:
In statistics this term is used for any finite or infinite collection of 'units', which are often people but may be, for example, institutions, events, etc.
double-blinded trial:
A procedure used in clinical trials to avoid the possible bias that might be introduced if the patient and/or doctor knew which treatment the patient is receiving. If neither the patient nor doctor are aware of which treatment has been given the trial is termed double-blind.
dummy coding :
Dummy coding provides one way of using categorical predictor variables in various kinds of estimation models (see also effect coding), such as, linear regression. Dummy coding uses only ones and zeros to convey all of the necessary information on group membership. http://www.ats.ucla.edu/stat/mult_pkg/faq/general/dummy.htm
dummy variable (indicator variable):
in statistics, a variable taking only one of two possible values, one (usually 1) indicating the presence of a condition, and the other (usually 0) indicating the absence of the condition, used mainly in regression analysis.
effect or effect size:
a measure of the strength of the relationship between two variables. In scientific experiments, it is often useful to know not only whether an experiment has a statistically significant effect, but also the size of any observed effects. In practical situations, effect sizes are helpful for making decisions. Effect size measures are the common currency of meta-analysis studies that summarize the findings from a specific area of research
Effective sample size:
The sample size after dropouts, deaths and other specified exclusions from the original sample.
Expected frequencies:
A term usually encountered in the analysis of contingency tables. Such frequencies are estimates of the values to be expected under the hypothesis of interest. In a two-dimensional table, for example, the values under independence are calculated from the product of the appropriate row and column totals divided by the total number of observations.
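A short Python sketch of this row-total × column-total / grand-total calculation (the example table used in the test is hypothetical):

```python
def expected_frequencies(table):
    """Expected cell counts under independence for a contingency table:
    row total * column total / grand total, cell by cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]
```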
experiment (in probability):
A probability experiment involves performing a number of trials to measure the chance of the occurrence of an event or outcome. http://www.uic.edu/classes/upp/upp503/sanders4-5.pdf
experiment:
A study in which the investigator intentionally alters one or more factors under controlled conditions in order to study the effects of doing so.
experimental study :
A study in which conditions are under the direct control of the investigator. In epidemiology, a study in which a population is selected for a planned trial of a regimen whose effects are measured by comparing the outcome of the regimen in the experimental group with the outcome of another regimen in a control group.
explanatory variable:
The variables appearing on the right-hand size of the equations defining, for example, multiple regression or logistic regression, and which seek to predict or 'explain' the response variable. Also commonly known as the independent variables, although this is not to be recommended since they are rarely independent of one another.
factor:
An event, characteristic, or other definable entity that brings about a change in a health condition or other defined outcome.
factor analysis:
A set of statistical methods for analyzing the correlations among several variables in order to estimate the number of fundamental dimensions that underlie the observed data and to describe and measure those dimensions. Used frequently in the development of scoring systems for rating scales and questionnaires.
factorial designs:
Designs which allow two or more questions to be addressed in an investigation. The simplest factorial design is one in which each of two treatments or interventions is either present or absent, so that subjects are divided into four groups: those receiving neither treatment, those having only the first treatment, those having only the second treatment and those receiving both treatments. Such designs enable possible interactions between factors to be investigated. A very important special case of a factorial design is that where each of k factors of interest has only two levels; these are usually known as 2^k factorial designs. A single replicate of a 2^k design is sometimes called an unreplicated factorial.
false-negative:
The proportion of cases in which a diagnostic test indicates disease is absent in patients who have the disease.
false-positive:
The proportion of cases in which a diagnostic test indicates disease is present in disease-free patients
F distribution (variance ratio distribution):
The distribution of the ratio of two independent quantities each of which is distributed like a variance in normally distributed samples. So named in honor of R.A. Fisher who first described the distribution.
Fisher's exact test:
An alternative procedure to use of the chi-squared statistic for assessing the independence of two variables forming a two-by-two contingency table particularly when the expected frequencies are small. The method consists of evaluating the sum of the probabilities associated with the observed table and all possible two-by-two tables that have the same row and column totals as the observed data but exhibit more extreme departure from independence. The probability of each table is calculated from the hypergeometric distribution.
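The hypergeometric probability of a single two-by-two table, given its margins, can be sketched in Python as follows; a full Fisher test would sum such probabilities over all tables at least as extreme (the cell counts in the test are hypothetical):

```python
from math import comb

def table_prob(a, b, c, d):
    """Hypergeometric probability of the 2x2 table with cells
    a, b (row 1) and c, d (row 2), conditional on its margins."""
    n = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)
```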
Fisher's z-transformation:
A transformation of the sample correlation coefficient r given by z = ½ ln[(1 + r)/(1 − r)]. The statistic z has an approximately normal sampling distribution, which makes it useful for constructing confidence intervals and significance tests for correlation coefficients.
frequency (occurrence):
a general term describing the frequency or occurrence of a disease or other attribute or event in a population without distinguishing between incidence and prevalence.
frequency distribution:
lists data values (either individually or by groups of intervals), along with their corresponding frequencies (or counts).
cumulative frequency distribution:
The tabulation of a sample of observations in terms of numbers falling below particular values. The empirical equivalent of the cumulative probability distribution.
frequency table:
a way of summarizing data; used as a record of how often each value (or set of values) of a variable occurs. A frequency table is used to summarize categorical, nominal, and ordinal data. It may also be used to summarize continuous data once the data is divided into categories.
F-test:
A test for the equality of the variances of two populations having normal distributions, based on the ratio of the variances of a sample of observations taken from each. Most often encountered in the analysis of variance, where testing whether particular variances are the same also tests for the equality of a set of means.
gold standard trials:
A term usually retained for those clinical trials in which there is random allocation to treatments, a control group and double-blinding.
goodness of fit:
Degree of agreement between an empirically observed distribution and a mathematical or theoretical distribution.
goodness-of-fit test:
A statistical test of the hypothesis that data have been randomly sampled or generated from a population that follows a particular theoretical distribution or model. The most common such tests are chi-square tests.
hazard:
Inherent capability of an agent or situation to have an adverse effect. A factor or exposure that may adversely affect health.
hazard rate (force of morbidity, instantaneous incidence rate):
A theoretical measure of the risk of an occurrence of an event, e.g. death or new disease, at a point in time, t, defined mathematically as the limit, as Δt approaches zero, of the probability that an individual well at time t will experience the event by t + Δt, divided by Δt.
histogram:
A graphical representation of a set of observations in which class frequencies are represented by the areas of rectangles centred on the class interval. If the latter are all equal, the heights of the rectangles are also proportional to the observed frequencies.
historical controls:
A group of patients treated in the past with a standard therapy, used as the control group for evaluating a new treatment on current patients. Although used fairly frequently in medical investigations, the approach is not to be recommended since possible biases, due to other factors that may have changed over time, can never be satisfactorily eliminated.
homogeneity (homogeneous):
A term that is used in statistics to indicate the equality of some quantity of interest (most often a variance), in a number of different groups, populations, etc.
homoscedasticity:
homo- means "same" and -scedastic means "scattered"; homoscedasticity therefore means the constancy of the variance of a measure over the levels of the factors under study.
hypothesis testing:
A general term for the procedure of assessing whether sample data is consistent or otherwise with statements made about the population.
incidence:
A measure of the rate at which people without a disease develop the disease during a specific period of time. Calculated as the number of new cases during the period divided by the population at risk, it measures the appearance of disease. More generally, the number of new events, e.g. new cases of a disease in a specified population, within a specified period of time. The term incidence is sometimes wrongly used to denote incidence rate.
Independence:
Two events are said to be independent if the occurrence of one is in no way predictable from the occurrence of the other. Two variables are said to be independent if the distribution of values of one is the same for all values of the other.
independent variable (explanatory variables):
The variables appearing on the right-hand side of the equations defining, for example, multiple regression or logistic regression, and which seek to predict or 'explain' the response variable. Using the term independent variable is not recommended since they are rarely independent of one another.
inference (statistical):
The process of drawing conclusions about a population on the basis of measurements or observations made on a sample of individuals for the population.
interaction:
A term applied when two (or more) explanatory variables do not act independently on a response variable. The graphic below shows an example from a 2 x 2 factorial design. In statistics, interaction is also the necessity for a product term in a linear model.
intercept:
The parameter in an equation derived from a regression analysis corresponding to the expected value of the response variable when all the explanatory variables are zero.
interquartile range:
A measure of spread given by the difference between the first and third quartiles of a sample.
Inter-rater reliability (observer variation, inter-rater agreement, Concordance):
the degree of agreement among raters. It gives a score of how much homogeneity or consensus there is in the ratings given by judges. It is useful in refining the tools given to human judges, for example by determining if a particular scale is appropriate for measuring a particular variable. If various raters do not agree, either the scale is defective or the raters need to be re-trained. There are a number of statistics which can be used to determine inter-rater reliability. Different statistics are appropriate for different types of measurement. Some options are: joint-probability of agreement, Cohen's kappa and the related Fleiss' kappa, inter-rater correlation, concordance correlation coefficient and intra-class correlation.
intervention study:
A study in which conditions are under the direct control of the investigator. In epidemiology, a study in which a population is selected for a planned trial of a regimen whose effects are measured by comparing the outcome of the regimen in the experimental group with the outcome of another regimen in a control group.
Kaplan-Meier estimate (product limit method):
A nonparametric method of compiling life or survival tables. This combines calculated probabilities of survival and estimates to allow for censored observations, which are assumed to occur randomly. The intervals are defined as ending each time an event (death, withdrawal) occurs and are therefore unequal.
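A minimal Python sketch of the product-limit calculation, assuming an event indicator of 1 for death and 0 for censoring (ties at the same time are handled by successive multiplication, which yields the same product as grouping; the data are hypothetical):

```python
def kaplan_meier(times, events):
    """Product-limit survival estimates. events[i] is 1 for a death,
    0 for a censored observation at times[i]. Returns (time, S(t)) pairs."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    at_risk = len(times)
    s = 1.0
    curve = []
    for i in order:
        if events[i] == 1:
            # survival drops only at event times, by the factor
            # (at risk - deaths) / at risk
            s *= (at_risk - 1) / at_risk
            curve.append((times[i], s))
        at_risk -= 1  # censored subjects simply leave the risk set
    return curve
```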
Kappa:
A measure of the degree of nonrandom agreement between observers or measurements of the same categorical variable: κ = (Po − Pe)/(1 − Pe), where Po is the proportion of times the measurements agree, and Pe is the proportion of times they can be expected to agree by chance alone. If the measurements agree more often than expected by chance, kappa is positive; if concordance is complete, kappa = 1; if there is neither more nor less than chance concordance, kappa = 0; if the measurements disagree more than expected by chance, kappa is negative.
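The kappa calculation can be sketched in Python starting from a square agreement table (the counts in the tests are hypothetical):

```python
def cohens_kappa(table):
    """Cohen's kappa for two raters from a square agreement table,
    where table[i][j] is the count rated category i by rater 1
    and category j by rater 2."""
    n = sum(sum(row) for row in table)
    # observed agreement: proportion on the diagonal
    po = sum(table[i][i] for i in range(len(table))) / n
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    # chance agreement from the marginal proportions
    pe = sum(row[i] * col[i] for i in range(len(table))) / n ** 2
    return (po - pe) / (1 - pe)
```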
kurtosis:
the extent to which a unimodal distribution is peaked.
least squares:
A principle of estimation, attributable to Gauss, in which the estimates of a set of parameters in a statistical model are those quantities that minimize the sum of squared differences between the observed values of the dependent variable and the values predicted by the model.
level of significance:
The level of probability at which it is agreed that the null hypothesis will be rejected. Conventionally set at 0.05.
life table analysis:
A procedure often applied in prospective studies to examine the distribution of mortality and/or morbidity in one or more diseases in a cohort study of patients over a fixed period of time. For each specific increment in the follow-up period, the number entering the period, the number leaving during the period, and the number either dying from the disease (mortality) or developing the disease (morbidity), are all calculated. It is assumed that an individual who has not completed the follow-up period is exposed for half this period, thus enabling the data for those 'leaving' and those 'staying' to be combined into an appropriate denominator for the estimation of the percentage dying from or developing the disease. The advantage of this approach is that all patients, not only those who have been involved for an extended period, can be included in the estimation process.
likelihood function:
A function constructed from a statistical model and a set of observed data that gives the probability of the observed data for various values of the unknown model parameters. The parameter values that maximize the probability are the maximum likelihood estimates of the parameters.
likelihood ratio:
The ratio of the likelihood of observing data under actual conditions, to observing these data under the other, e.g., "ideal" conditions; or comparison of various model conditions to assess which model provides the best fit. Likelihood ratios are used to appraise screening and diagnostic tests in clinical epidemiology.
likelihood ratio test:
A statistical test based on the ratio of the maximum value of the likelihood function under one statistical model to the maximum value under another statistical model; the models differ in that one includes and the other excludes one or more parameters.
linear regression (of Y on X):
a form of regression analysis in which observational data are modeled by a function which is a linear combination of the model parameters and depends on one or more independent variables. In simple linear regression the model function represents a straight line. The results of data fitting are subject to statistical analysis. The data consist of n values taken from observations of the dependent variable (response variable) y. The independent variables are also called regressors, exogenous variables, input variables and predictor variables. In simple linear regression the data model is written as yi = β0 + β1xi + εi, where εi is an observational error. β0 (intercept) and β1 (slope) are the parameters of the model.
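The least squares estimates of the intercept and slope have a closed form; a minimal Python sketch with hypothetical data:

```python
def least_squares(xs, ys):
    """Closed-form least squares estimates (b0, b1) for the simple
    linear model y = b0 + b1*x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # slope: sum of cross-deviations over sum of squared x-deviations
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
         sum((x - mx) ** 2 for x in xs)
    # intercept: the fitted line passes through (mean x, mean y)
    b0 = my - b1 * mx
    return b0, b1
```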
logistic regression (logistic model):
A statistical model of an individual's risk (probability of disease y) as a function of a risk factor x: P(y | x) = 1/(1 + e^−(α + βx)), where e is the (natural) exponential function. This model has a desirable range, 0 to 1, and other attractive statistical features. In the multiple logistic model, the term βx is replaced by a linear term involving several factors, e.g., β1x1 + β2x2 if there are two factors x1 and x2.
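A one-line Python sketch of the logistic function, illustrating that its range is 0 to 1 (the parameter values used in the tests are hypothetical):

```python
import math

def logistic_p(alpha, beta, x):
    """P(y | x) = 1 / (1 + exp(-(alpha + beta*x))); always in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * x)))
```

When α + βx = 0 the modeled risk is exactly 0.5.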
logit (log-odds):
the logarithm of the ratio of frequencies of two different categorical outcomes such as healthy versus sick.
logit model:
A linear model for the logit (natural log of the odds) of disease as a function of a quantitative factor X:
Logit(disease given X = x) = α + βx. This model is mathematically equivalent to the logistic model.
log-linear model:
A statistical model that uses an analysis of variance type of approach for the modeling of frequency counts in contingency tables.
logrank test:
A nonparametric test for comparing the survival experience of two or more groups of subjects, based on the differences between the observed and expected numbers of events in each group across the distinct event times.
longitudinal study (cohort study):
Studies that give rise to longitudinal data. The defining characteristic of such a study is that subjects are measured repeatedly through time.
Mantel-Haenszel estimate:
Mantel and Haenszel provided an adjusted odds ratio as an estimate of relative risk that may be derived from grouped and matched sets of data. It is now known as the Mantel-Haenszel estimate. The statistic may be regarded as a type of weighted average of the individual odds ratios, derived from stratifying a sample into a series of strata that are internally homogeneous with respect to confounding factors. The Mantel-Haenszel summarization method can also be extended to the summarization of rate ratios and rate differences from follow-up studies. An estimator of the assumed common odds ratio in a series of two-by-two contingency tables arising from different populations, for example, occupation, country of origin, etc.
Mantel-Haenszel test:
A summary chi-square test developed by Mantel and Haenszel for stratified data and used when controlling for confounding.
Marginals:
the row and column totals of a contingency table.
matching (or matched groups):
The process of making a study group and a comparison group comparable with respect to extraneous factors. Often used in retrospective studies when selecting cases and controls to control variation in a response variable due to sources other than those immediately under investigation. Several kinds of matching can be identified, the most common of which is when each case is individually matched with a control subject on the matching variables, such as age, sex, occupation, etc. When the variable on which the matching takes place is continuous it is usually transformed into a series of categories (e.g. age), but a second method is to say that two values of the variable match if their difference lies between defined limits. This method is known as caliper matching. Also important is group or category matching in which the distributions of the extraneous factors are made similar in the groups to be compared.
maximum likelihood estimate:
the value for an unknown parameter that maximizes the probability of obtaining exactly the data that were observed. Used to solve logistic regression.
McNemar test:
A test for comparing proportions in data involving paired samples. The test statistic is

X² = (b − c)² / (b + c)

where b is the number of pairs for which the individual receiving treatment A has a positive response and the individual receiving treatment B does not, and c is the number of pairs for which the reverse is the case. If the probability of a positive response is the same in each group, then X² has a chi-squared distribution with a single degree of freedom.
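Given the discordant pair counts b and c, the statistic is a one-line computation; a minimal Python sketch with hypothetical counts:

```python
def mcnemar_statistic(b, c):
    """McNemar chi-squared statistic from the two discordant
    pair counts of a paired study."""
    return (b - c) ** 2 / (b + c)
```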
mean:
A measure of location or central value for a continuous variable. For a definition of the population value see expected value.
For a sample of observations x1, x2, ..., xn the measure is calculated as:

x̄ = (x1 + x2 + ... + xn) / n

Most useful when the data have a symmetric distribution and do not contain outliers.
mean-squared error:
the expected value of the square of the difference between an estimator and the true value of a parameter. If the estimator is unbiased then the mean squared error (MSE) is simply the variance of the estimator. For a biased estimator the MSE is equal to the sum of the variance and the square of the bias.
Measurement error:
A mismatch between an estimated value and its true value. Can be observed when using multiple measures of the same entity or concept.
measurement scale:
The range of possible values for a measurement (e.g. the set of possible responses to a question, the physically possible range for a set of body weights). Measurement scales can be classified according to the quantitative character of the scale:

- dichotomous scale - one that arranges items into either of two mutually exclusive categories, e.g. yes/no, alive/dead.
- nominal scale - classification into unordered qualitative categories, e.g. race, religion, country of birth. Measurements of individual attributes are purely nominal scales, as there is no inherent order to their categories.
- ordinal scale - classification into ordered qualitative categories, e.g. grade, where the values have a distinct order but their categories are qualitative in that there is no natural (numerical) distance between their possible values.
- interval scale - assignment of values with a natural distance between them, so that a given distance (interval) between two values in one region of the scale represents the same difference as the same distance between two values in another region of the scale. Examples include Celsius and Fahrenheit temperature, date of birth.
- ratio scale - a ratio is an interval scale with a true zero point, so that ratios between values are meaningfully defined. Examples are absolute temperature, weight, height, blood count, and income, as in each case it is meaningful to speak of one value as being so many times greater or less than another value.
measures of central tendency:
A general term for the several measures that describe where the middle of a set of values or measurements lies. The principal measures of central tendency are the mean, median, and mode.
median:
The value in a set of ranked observations that divides the data into two parts of equal size. When there is an odd number of observations the median is the middle value. When there is an even number of observations the measure is calculated as the average of the two central values. Provides a measure of location of a sample that is suitable for asymmetric distributions and is also relatively insensitive to the presence of outliers.
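A minimal Python sketch of the odd/even rule described above:

```python
def median(xs):
    """Middle value of the sorted sample; average of the two middle
    values when the number of observations is even."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2
```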
meta-analysis:
A collection of techniques whereby the results of two or more independent studies are statistically combined to yield an overall answer to a question of interest. The rationale behind this approach is to provide a test with more power than is provided by the separate studies themselves. The procedure has become increasingly popular in the last decade or so but it is not without its critics particularly because of the difficulties of knowing which studies should be included and to which population final results actually apply.
mode:
The most frequently occurring value in a set of observations. Occasionally used as a measure of location.
multicollinearity:
in multiple regression analysis, a situation in which at least some of the independent variables are highly correlated with each other. Such a situation can result in inaccurate estimates of the parameters in the regression model.
multinomial distribution:
the probability distribution associated with the classification of each of a sample of individuals into one of several mutually exclusive and exhaustive categories. When the number of categories is two, the distribution is called binomial.
multiple comparison test:
Procedures for detailed examination of the differences between a set of means, usually after a general hypothesis that they are all equal has been rejected. No single technique is best in all situations and a major distinction between techniques is how they control the possible inflation of the type I error.
multiple regression:
Regression of a response variable on two or more explanatory variables, the fitted model being a linear combination of those variables; used to assess the effect of each explanatory variable while adjusting for the others.
multivariate analysis:
a set of techniques used when the variation in several variables has to be studied simultaneously. In statistics any analytic method that allows the simultaneous study of two or more dependent variables.
multivariate data:
Data for which each observation consists of values for more than one random variable. For example, measurements on blood pressure, temperature and heart rate for a number of subjects. Such data are usually displayed in the form of an n × q data matrix X = (xij), where n is the number of subjects, q the number of variables and xij the observation on variable j for subject i.
mutually exclusive events:
Events that cannot occur jointly.
nominal scale:
classification into unordered qualitative categories, e.g. race, religion, country of birth. Measurements of individual attributes are purely nominal scales, as there is no inherent order to their categories.
nonparametric method (distribution free methods):
Statistical techniques of estimation and inference that are based on a function of the sample observations, the probability distribution of which does not depend on a complete specification of the probability distribution of the population from which the sample was drawn. Consequently the techniques are valid under relatively general assumptions about the underlying population. Often such methods involve only the ranks of the observations rather than the observations themselves. Examples are Wilcoxon's signed rank test and Friedman's two way analysis of variance. In many cases these tests are only marginally less powerful than their analogues which assume a particular population distribution (usually a normal distribution), even when that assumption is true. Also commonly known as distribution free methods, although the terms are not completely synonymous.
nonrandomized clinical trial:
A clinical trial in which a series of consecutive patients receive a new treatment and those that respond (according to some pre-defined criterion) continue to receive it. Those patients that fail to respond receive an alternative, usually the conventional, treatment. The two groups are then compared on one or more outcome variables. One of the problems with such a procedure is that patients who respond may be healthier than those who do not respond, possibly resulting in an apparent but not real benefit of treatment.
normal distribution:
A continuous probability distribution whose density is a symmetric, bell-shaped curve completely determined by two parameters, the mean μ and the variance σ². Many statistical methods assume that the observations, or the errors in a model, follow a normal distribution.
null hypothesis:
The 'no difference' or 'no association' hypothesis to be tested (usually by means of a significance test) against an alternative hypothesis that postulates non-zero difference or association.
observational study:
A study in which the objective is to uncover cause-and-effect relationships but in which it is not feasible to use controlled experimentation, in the sense of being able to impose the procedure or treatments whose effects it is desired to discover, or to assign subjects at random to different procedures. Surveys and most epidemiologic studies fall into this class. Since the investigator does not control the assignment of treatments there is no way to ensure that similar subjects receive different treatments. The classical example of such a study that successfully uncovered evidence of an important causal relationship is the smoking and lung cancer investigation of Doll and Hill.
observer variation (error):
variation (or error) due to failure of the observer to measure or identify a phenomenon accurately. Observer variation erodes scientific credibility whenever it appears. There are two varieties of observer variation: inter observer variation, i.e. the amount observers vary from one another when reporting on the same material, and intra observer variation, i.e. the amount one observer varies between observations when reporting more than once on the same material.
odds:
the ratio of the probability of occurrence of an event to that of nonoccurrence (a binary variable), or the ratio of the probability that something is so to the probability that it is not so.
odds ratio (OR):
The ratio of two odds for a binary variable in two groups of subjects, for example, males and females. If the two possible states of the variable are labeled 'success' and 'failure' then the odds ratio is a measure of the odds of a success in one group relative to that in the other. When the odds of a success in each group are identical then the odds ratio is equal to one. Usually estimated as:

OR = (a × d) / (b × c)

where a, b, c and d are the appropriate frequencies in the two-by-two contingency table formed from the data.
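The cross-product estimate can be illustrated with a short stdlib sketch; the 2×2 counts below are invented for the example:

```python
# Hypothetical 2x2 table counts (invented for illustration):
#              outcome+  outcome-
# exposed         a=20      b=80
# unexposed       c=10      d=90
a, b, c, d = 20, 80, 10, 90

odds_exposed = a / b              # odds of the outcome among the exposed
odds_unexposed = c / d            # odds among the unexposed
odds_ratio = (a * d) / (b * c)    # cross-product estimate, ad/bc

# the cross-product form equals the ratio of the two odds
print(round(odds_ratio, 4))                     # 2.25
print(round(odds_exposed / odds_unexposed, 4))  # 2.25
```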
one-tailed test (one-sided test):
A significance test for which the alternative hypothesis is directional; for example, that one population mean is greater than another. The choice between a one-sided and two-sided test must be made before any test statistic is calculated.
ordinal scale:
classification into ordered qualitative categories, e.g. grade, where the values have a distinct order but their categories are qualitative in that there is no natural (numerical) distance between their possible values.
outliers:
observations differing so widely from the rest of the data as to lead one to suspect that a gross error may have been committed, or suggesting that these values come from a different population. Statistical handling of outliers varies and is difficult.
paired t-test (matched pair t-test):
A Student's t-test applied to the differences between paired or matched observations, for example, measurements made on the same subjects before and after a treatment. It tests the null hypothesis that the mean of the within-pair differences is zero.
Parameter:
A numerical characteristic of a population or a model. The probability of a 'success' in a binomial distribution, for example.
parametric test:
a statistical test that depends upon assumptions about the distribution of the data, e.g. that the data are normally distributed.
percentage:
a way of expressing a number as a fraction of 100 (per cent meaning "per hundred").
Percentile:
The set of divisions that produce exactly 100 equal parts in a series of continuous values, such as blood pressure, weight, height, etc. Thus a person with blood pressure above the 80th percentile has a greater blood pressure value than over 80% of the other recorded values.
placebo:
A treatment designed to appear exactly like a comparison treatment, but which is devoid of the active component.
point estimate (estimation):
The process of providing a numerical value for a population parameter on the basis of information collected from a sample. If a single figure is calculated for the unknown parameter the process is called point estimation. If an interval is calculated which is likely to contain the parameter, then the procedure is called interval estimation.
Poisson distribution:
The probability distribution of the number of occurrences, X, of some random event, in an interval of time or space, given by

Pr(X = x) = e^(−λ) λ^x / x!,  x = 0, 1, ...

The mean and variance of the distribution are both λ. The skewness of the distribution is 1/√λ, and its kurtosis is 3 + (1/λ). The distribution is positively skewed for all values of λ.
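A minimal stdlib sketch of the pmf, checking numerically that the probabilities sum to one and that the mean and variance both equal λ (the choice λ = 3 is arbitrary):

```python
import math

def poisson_pmf(x: int, lam: float) -> float:
    """Pr(X = x) = exp(-lam) * lam**x / x!"""
    return math.exp(-lam) * lam ** x / math.factorial(x)

lam = 3.0
xs = range(100)  # truncating the infinite support; the tail beyond 100 is negligible
total = sum(poisson_pmf(x, lam) for x in xs)
mean = sum(x * poisson_pmf(x, lam) for x in xs)
var = sum((x - mean) ** 2 * poisson_pmf(x, lam) for x in xs)
print(round(total, 6), round(mean, 6), round(var, 6))  # 1.0 3.0 3.0
```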
population:
In statistics this term is used for any finite or infinite collection of 'units', which are often people but may be, for example, institutions, events, etc.
post hoc comparisons:
Analyses not explicitly planned at the start of a study but suggested by an examination of the data. Such comparisons are generally performed only after obtaining a significant overall F value.
power:
The probability of rejecting the null hypothesis when it is false. Power gives a method of discriminating between competing tests of the same hypothesis, the test with the higher power being preferred. It is also the basis of procedures for estimating the sample size needed to detect an effect of a particular magnitude. Mathematically, power is 1 − β, where β is the probability of a type II error.
predictive value:
(positive and negative): In screening and diagnostic tests, the probability that a person with a positive test is a true positive (i.e., does have the disease) is referred to as the "predictive value of a positive test." The predictive value of a negative test is the probability that a person with a negative test does not have the disease. The predictive value of a screening test is determined by the sensitivity and specificity of the test, and by the prevalence of the condition for which the test is used.
predictive value of a negative test:
the probability that a person with a negative test does not have the disease.
predictive value of a positive test:
the probability that a person with a positive test is a true positive (i.e. does have the disease).
prevalence:
The proportion of a population that has a particular disease or attribute at a specified point in time (point prevalence) or during a specified period (period prevalence). Unlike incidence, prevalence counts existing as well as newly occurring cases.
principal component analysis:
a statistical method to simplify the description of a set of interrelated variables. Its general objectives are data reduction and interpretation; there is no separation into dependent and independent variables; the original set of correlated variables is transformed into a smaller set of uncorrelated variables called the principal components. Often used as the first step in a factor analysis.
probability:
A measure associated with an event A and denoted by Pr(A) which takes a value such that 0 ≤ Pr(A) ≤ 1. Essentially the quantitative expression of the chance that an event will occur. In general the higher the value of Pr(A) the more likely it is that the event will occur. If the event cannot happen Pr(A) = 0; if an event is certain to happen Pr(A) = 1. Numerical values can be assigned in simple cases by one of the following two methods:
1. If the sample space can be divided into subsets of n (n ≥ 2) equally likely outcomes and the event A is associated with r (0 ≤ r ≤ n) of these, then Pr(A) = r / n.
2. If an experiment can be repeated a large number of times, n, and in r cases the event A occurs, then r / n is called the relative frequency of A. If this approaches a limit as n → ∞, this limit is Pr(A).
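The relative-frequency idea in the second method is easy to demonstrate by simulation; the coin-flip setup and the seed below are arbitrary choices for the sketch:

```python
import random

random.seed(0)  # fixed seed so the run is reproducible
n = 100_000
heads = sum(random.random() < 0.5 for _ in range(n))
rel_freq = heads / n

# with n this large, the relative frequency settles very close to Pr(heads) = 0.5
print(rel_freq)
```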
probability distribution:
For a discrete random variable, a mathematical formula that gives the probability of each value of the variable. See, for example, binomial distribution and Poisson distribution. For a continuous random variable, a curve described by a mathematical formula which specifies, by way of areas under the curve, the probability that the variable falls within a particular interval. Examples include the normal distribution and the exponential distribution. In both cases the term probability density may also be used. (A distinction is sometimes made between 'density' and 'distribution', when the latter is reserved for the probability that the random variable falls below some value. In this dictionary, however, the latter will be termed the cumulative probability distribution, and 'probability distribution' and 'probability density' will be used synonymously.)
proportion:
A type of ratio in which the numerator is included in the denominator.
proportional hazards model (Cox's proportional hazards):
A method that allows the hazard function to be modeled on a set of explanatory variables without making restrictive assumptions about the dependence of the hazard function on time. The model involved is

h(t) = a(t) exp(β1x1 + β2x2 + ... + βqxq)

where x1, x2, ..., xq are the explanatory variables of interest, and h(t) the hazard function. The so-called baseline hazard function, a(t), is an arbitrary function of time. For any two individuals at any point in time the ratio of the hazard functions is a constant. Because the baseline hazard function, a(t), does not have to be specified explicitly, the procedure is essentially a distribution free method. Estimates of the parameters in the model, i.e. β1, β2, ..., βq, are usually obtained by maximum likelihood estimation, and depend only on the order in which events occur, not on the exact times of their occurrence.
prospective study (cohort study):
Studies in which individuals are followed-up over a period of time. A common example of this type of investigation is where samples of individuals exposed and not exposed to a possible risk factor for a particular disease, are followed forward in time to determine what happens to them with respect to the illness under investigation. At the end of a suitable time period a comparison of the incidence of the disease amongst the exposed and non-exposed is made. A classical example of such a study is that undertaken among British doctors in the 1950s, to investigate the relationship between smoking and death from lung cancer. All clinical trials are prospective.
P-value:
the probability that a test statistic would be as extreme as or more extreme than observed if the null hypothesis were true.
qualitative data:
1. observations or information characterized by measurement on a categorical scale, i.e. a dichotomous (non-numeric) or nominal scale, or if the categories are ordered, an ordinal scale. Examples are sex, hair color, death or survival. 2. systematic non-numerical observations by sociologists, anthropologists, etc. using approved methods such as participant observation or key informants.
quantiles:
Divisions of a probability distribution or frequency distribution into equal, ordered subgroups, for example, quartiles or percentiles.
quartile:
The values that divide a frequency distribution or probability distribution into four equal parts.
random error or variation:
The variation in a data set unexplained by identifiable sources.
randomization (randomized experiment):
Allocation of individuals to groups, e.g., for experimental and control regimens, by chance.
randomized controlled trial (RCT):
an epidemiologic experiment in which subjects in a population are randomly allocated into groups, usually called study and control groups, to receive or not receive an experimental preventive or therapeutic procedure, maneuver, or intervention. The results are assessed by rigorous comparison of rates of disease, death, recovery, or other appropriate outcome in the study and control groups. RCTs are generally regarded as the most scientifically rigorous method of hypothesis testing available in epidemiology.
random sample:
Either a set of n independent and identically distributed random variables, or a sample of n individuals selected from a population in such a way that each sample of the same size is equally likely.
random variable:
A variable, the values of which occur according to some specified probability distribution.
range:
The difference between the largest and smallest observations in a data set. Often used as an easy-to-calculate measure of the dispersion in a set of observations but not recommended for this task because of its sensitivity to outliers and the fact that its value increases with sample size.
ranks:
The relative positions of the members of a sample with respect to some characteristic.
rate:
A measure of the frequency of some phenomenon of interest, given by

rate = (number of events in a specified period) / (population at risk during that period)

(The resulting value is often multiplied by some power of ten to convert it to a whole number.)
ratio:
The value obtained by dividing one quantity by another: a general term of which rate, proportion, percentage, etc., are subsets. The important difference between a proportion and a ratio is that the numerator of a proportion is included in the population defined by the denominator, whereas this is not necessarily so for a ratio.
receiver operating characteristic (ROC or relative operating characteristic) curve:
a graphic means for assessing the ability of a screening test to discriminate between healthy and diseased persons. The term receiver operating characteristic comes from psychometry, where the characteristic operating response of a receiver-individual to faint stimuli or nonstimuli was recorded.
regression:
As used by Francis Galton (1822-1911), one of the founders of modern biology and biometry, in his book Hereditary Genius (1869), this meant the tendency of offspring of exceptional parents to possess characteristics closer to the average for the general population. Hence "regression to the mean," i.e. the tendency of individuals at the extremes to have values nearer to the mean on repeated measurement. Can also be a synonym for regression analysis in statistics.
regression analysis:
given data on a dependent variable y and one or more independent or predictor variables x1, x2, etc., regression analysis involves finding the "best" mathematical model (within some restricted class of models) to describe y as a function of the x's, or to predict y from the x's. The most common form is a linear model; in epidemiology, the logistic and proportional hazards models are also common.
regression coefficient (multiple regression):
The coefficient of an explanatory (independent) variable in a regression equation, giving the estimated change in the dependent variable per unit change in that explanatory variable. In multiple regression, each coefficient is interpreted with the other explanatory variables in the model held constant.
relative risk (RR or risk ratio):
A measure of the association between exposure to a particular factor and risk of a certain outcome, calculated as

RR = (incidence rate among exposed) / (incidence rate among nonexposed)

Thus a relative risk of 5, for example, means that an exposed person is 5 times as likely to have the disease as one who is not exposed. Relative risk does not measure the probability that someone with the factor will develop the disease. The disease may be rare among both the nonexposed and the exposed.
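The ratio of the two incidences can be computed directly; the cohort counts below are invented for the sketch:

```python
# Hypothetical cohort counts (invented for illustration):
#              disease+  disease-  total
# exposed         30        70      100
# unexposed       10        90      100
risk_exposed = 30 / 100      # cumulative incidence among the exposed
risk_unexposed = 10 / 100    # cumulative incidence among the unexposed

rr = risk_exposed / risk_unexposed
print(round(rr, 4))  # 3.0: exposed subjects are 3 times as likely to develop disease
```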
reliability:
The extent to which the same measurements of individuals obtained under different conditions yield similar results. Reliability refers to the degree to which the results obtained by a measurement procedure can be replicated. Lack of reliability may arise from divergences between observers or instruments of measurement or instability of the attribute being measured.
repeated-measures design:
Repeated measures is a type of analysis of variance that generalizes Student's t test for paired samples. It is used when two or more measurements of the same type are made on the same subject. Analysis of variance is characterized by the use of factors, which are composed of levels. Repeated measures analysis of variance involves two types of factors: between-subjects factors and within-subjects factors. The repeated measures make up the levels of the within-subjects factor. For example, suppose each subject has his/her reaction time measured under three different conditions. The conditions make up the levels of the within-subjects factor. Depending on the study, subjects may be divided into groups according to levels of other factors called between-subjects factors. Each subject is observed at only a single level of a between-subjects factor. For example, if subjects were randomized to aerobic or stretching exercise, form of exercise would be a between-subjects factor. The levels of a within-subjects factor change as we move within a subject, while levels of a between-subjects factor change only as we move between subjects.
Residual:
The difference between an observed value and the value fitted or predicted by a statistical model, for example, the observed value of the dependent variable minus the value predicted by a regression equation. Examination of residuals is a standard way of checking model assumptions.
response variable:
The variable of primary importance in investigations, since the major objective is usually to study the effects of treatment and/or other explanatory variables on this variable and to provide suitable models for the relationship between the explanatory variables and the response.
retrospective study:
A general term for studies in which all the events of interest occur prior to the onset of the study and findings are based on looking backward in time. Most common is the case-control study, in which comparisons are made between individuals who have a particular disease or condition (the cases) and individuals who do not have the disease (the controls). A sample of cases is selected from the population of individuals who have the disease of interest and a sample of controls is taken from among those individuals known not to have the disease. Information about possible risk factors for the disease is then obtained retrospectively for each person in the study by examining past records, by interviewing each person and/or interviewing their relatives, or in some other way. In order to make the cases and controls otherwise comparable, they are frequently matched on characteristics known to be strongly related to both disease and exposure leading to a matched case-control study. Age, sex and socioeconomic status are examples of commonly used matching variables. Also commonly encountered is the retrospective cohort study, in which a past cohort of individuals are identified from previous information, for example, employment records, and their subsequent mortality or morbidity determined and compared with the corresponding experience of some suitable control group.
risk factor:
An aspect of personal behavior or lifestyle, an environmental exposure, or an inborn or inherited characteristic which is thought to be associated with a particular disease or condition.
risk ratio:
The ratio of two risks, usually exposed/not exposed.
sample:
a selected subset of a population. A sample may be random or nonrandom and may be representative or nonrepresentative. Several types of samples exist:
area sample - a method of sampling that can be used when the numbers in the population are unknown. The total area to be sampled is divided into subareas, e.g. by means of a grid that produces squares on a map; these subareas are then numbered and sampled, using a table of random numbers.
cluster sample - each unit selected is a group of persons (all persons in a city block, a family, a school, etc.) rather than an individual.
grab sample (sample of convenience) - samples selected by easily employed but basically nonprobabilistic methods. It is improper to generalize from the results of a survey based upon such a sample, for there is no way of knowing what types of bias may have been present.
probability (random) sample -all individuals have a known chance of selection. They may all have an equal chance of being selected, or, if a stratified sampling method is used, the rate at which individuals from several subsets are sampled can be varied so as to produce greater representation of some classes than others.
simple random sample - a form of sampling design in which n distinct units are selected from the N units in the population in such a way that every possible combination of n units is equally likely to be the sample selected. With this type of sampling design the probability that the ith population unit is included in the sample is the same for each unit, i.e. the inclusion probability is equal for all units. Designs other than this one may also give each unit equal probability of being included, but only here does each possible sample of n units have the same probability.
stratified random sample - this involves dividing the population into distinct subgroups according to some important characteristic, such as age or socioeconomic status, and selecting a random sample out of each subgroup. If the proportion of the sample drawn from each of the subgroups or strata, is the same as the proportion of the total population contained in each stratum, then all strata will be fairly represented with regard to numbers of persons in the sample.
systematic sample - the procedure of selecting according to some simple, systematic rule, such as all persons whose names begin with specified alphabetic letters, born on certain dates, or located at specified points on a list. A systematic sample may lead to errors that invalidate generalizations.
sampling distribution:
The probability distribution of a statistic calculated from a random sample of a particular size. For example, the sampling distribution of the arithmetic mean of samples of size n taken from a normal distribution with mean μ and standard deviation σ is a normal distribution also with mean μ but with standard deviation σ/√n.
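The σ/√n result can be checked by simulation: draw many samples of the same size, compute each sample mean, and compare the spread of those means with the theoretical value. The population parameters, sample size, and seed below are arbitrary choices for the sketch:

```python
import math
import random
import statistics

random.seed(1)
mu, sigma, n = 100.0, 15.0, 25   # arbitrary population parameters and sample size

# draw many samples of size n from N(mu, sigma) and keep each sample mean
means = [statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
         for _ in range(20_000)]

se_theory = sigma / math.sqrt(n)          # 15 / 5 = 3.0
se_empirical = statistics.stdev(means)    # should land close to 3.0
print(se_theory, se_empirical)
```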
scatterplot (Synonym for scatter diagram):
A two-dimensional plot of a sample of bivariate observations. The diagram is an important aid in assessing what type of relationship links the two variables. An example is shown below.
sensitivity:
An index of the performance of a diagnostic test, calculated as the percentage of individuals with a disease who are correctly classified as having the disease, i.e. the conditional probability of having a positive test result given having the disease. A test is sensitive to the disease if it is positive for most individuals having the disease. For a two-by-two table of test result against true disease status:

                  Diseased    Not diseased
Test positive        a             b
Test negative        c             d

a = diseased individuals detected by the test (true positives)
b = nondiseased individuals positive by the test (false positives)
c = diseased individuals not detectable by the test (false negatives)
d = nondiseased individuals negative by the test (true negatives)

Sensitivity = a/(a + c)
Specificity = d/(b + d)
Predictive value (positive test result) = a/(a + b)
Predictive value (negative test result) = d/(c + d)
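All four indices can be computed together from the a, b, c, d counts; the screening numbers below are hypothetical:

```python
# a: true positives, b: false positives, c: false negatives, d: true negatives
a, b, c, d = 90, 50, 10, 850   # hypothetical screening-test counts

sensitivity = a / (a + c)   # Pr(test+ | disease present)
specificity = d / (b + d)   # Pr(test- | disease absent)
ppv = a / (a + b)           # Pr(disease | test+)
npv = d / (c + d)           # Pr(no disease | test-)

print(round(sensitivity, 3), round(specificity, 3), round(ppv, 3), round(npv, 3))
# 0.9 0.944 0.643 0.988
```

Note how the predictive values, unlike sensitivity and specificity, shift with the prevalence implied by the column totals.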
Specificity:
An index of the performance of a diagnostic test, calculated as the percentage of individuals without the disease who are classified as not having the disease, i.e. the conditional probability of a negative test result given that the disease is absent. A test is specific if it is positive for only a small percentage of those without the disease.
Slope:
used to describe the measurement of the steepness, incline, gradient, or grade of a straight line. A higher slope value indicates a steeper incline. The slope is defined as the ratio of the "rise" divided by the "run" between two points on a line, or in other words, the ratio of the altitude change to the horizontal distance between any two points on the line. The slope of a line in the plane containing the x and y axes is generally represented by the letter m, and is defined as the change in the y coordinate divided by the corresponding change in the x coordinate, between two distinct points on the line. This is described by the following equation:

m = Δy / Δx

If y is a linear function of x, then the coefficient of x is the slope of the line created by plotting the function. Therefore, if the equation of the line is given in the form y = mx + b then m is the slope. This form of a line's equation is called the slope-intercept form, because b can be interpreted as the y-intercept of the line, the y-coordinate where the line intersects the y-axis.
standard deviation:
A measure of dispersion or variation. The most commonly used measure of the spread of a set of observations. Equal to the positive square root of the variance.
standard error (SE):
The standard deviation of the sampling distribution of a statistic. For example, the standard error of the sample mean of n observations is σ/√n, where σ is the standard deviation of the population from which the observations were drawn.
standardization:
A set of techniques used to remove as much as possible the effects of age or other confounding variables when comparing two or more populations. The common method uses weighted averaging of rates of age, sex or some other confounding variable(s) according of some specified distribution of these variables.
statistic:
A numerical characteristic of a sample. For example, the sample mean and sample variance.
statistical significance:
Statistical methods allow an estimate to be made of the probability of the observed or greater degree of association between independent and dependent variables under the null hypothesis. From this estimate, in a sample of given size, the statistical "significance" of a result can be stated. Usually the level of statistical significance is stated by the p value.
Statistical test:
a procedure that is intended to decide whether a hypothesis about the distribution of one or more populations or variables should be rejected or accepted. Statistical tests may be parametric or nonparametric.
stem and leaf plot:
A method of displaying data in which each observation is split into two parts labeled the 'stem' and the 'leaf'. A tally of the leaves corresponding to each stem has the shape of a histogram but also retains the actual observation values.
stepwise regression (selection models in regression):
A series of methods for selecting 'good' (although not necessarily the best) subsets of explanatory variables when using regression analysis. The three most commonly used of these methods are forward selection, backward elimination and a combination of both of these known as stepwise regression. The criterion used for assessing whether or not a variable should be added to an existing model in forward selection or removed from an existing model in backward elimination is, essentially, the change in the residual sum-of-squares produced by the inclusion or exclusion of the variable. Specifically, in forward selection, an 'F-statistic' known as the F-to-enter is calculated as:

F = (RSSm − RSSm+1) / (RSSm+1 / (n − m − 2))

and compared with a preset value; calculated Fs greater than the preset value lead to the variable under consideration being added to the model. (RSSm and RSSm+1 are the residual sums of squares when models with m and m + 1 explanatory variables have been fitted.) In backward selection a calculated F less than a corresponding F-to-remove leads to a variable being removed from the current model. In the stepwise procedure, variables are entered as with forward selection, but after each addition of a new variable, those variables currently in the model are considered for removal by the backward elimination process. In this way it is possible that variables included at some earlier stage might later be removed, because the presence of new variables has made their contribution to the regression model no longer important. It should be stressed that none of these automatic procedures for selecting variables is foolproof and they must be used with care.
student's t-distribution:
The probability distribution of the ratio of a standard normal variable to the square root of an independent chi-squared variable divided by its degrees of freedom. It is symmetric about zero, heavier-tailed than the normal distribution for small degrees of freedom, and approaches the standard normal distribution as the degrees of freedom increase. It is the null distribution of the statistic used in the one-sample, paired and two-sample t-tests.
subjective probability (personal probability):
A radically different approach to allocating probabilities to events than, for example, the commonly used long-term relative frequency approach. In this approach, probability represents a degree of belief in a proposition, based on all the available information. Two people with different information may therefore assign different probabilities to the same proposition. The only constraint is that a single person's probabilities should not be inconsistent with one another.
sums of squares:
a concept in inferential statistics and descriptive statistics. More properly, it is "the sum of the squared deviations". Mathematically, it is an unscaled, or unadjusted, measure of variability. When scaled for the number of degrees of freedom, it estimates the variance, or spread of the observations about their mean value. The distance from any point in a collection of data to the mean of the data is the deviation. This can be written as Xi − X̄, where Xi is the ith data point and X̄ is the estimate of the mean. If all such deviations are squared, then summed, as in

Σ (Xi − X̄)²

we have the "sum of squares" for these data.
Survey:
an investigation in which information is systematically collected but in which the experimental method is not used. A population survey may be conducted by face-to-face inquiry, self-completed questionnaires, telephone, postal service, or in some other way.
survival analysis:
a class of statistical procedures for estimating the survival function and for making inferences about the effects on it of treatments, prognostic factors, exposures, and other covariates.
symmetric distribution:
A probability distribution or frequency distribution that is symmetrical about some central value.
target population:
The collection of individuals, items, measurements, etc., about which it is required to make inferences. Often the population actually sampled differs from the target population and this may result in misleading conclusions being made. The target population requires a clear precise definition, and that should include the geographical area (country, region, town, etc.) if relevant, the age group and gender.
test statistic:
A statistic used to assess a particular hypothesis in relation to some population. The essential requirement of such a statistic is a known distribution when the null hypothesis is true.
transformation:
A change in the scale of measurement for some variable(s). Examples are the square root transformation and logarithm transformation.
t-test (T-distribution):
the t-distribution is the distribution of a quotient of independent random variables, the numerator of which is a standard normal variate and the denominator of which is the positive square root of the quotient of a chi-square distributed variate and its number of degrees of freedom. The t-test uses a statistic that, under the null hypothesis, has the t-distribution to test whether two means differ significantly, or to test linear regression or correlation coefficients.
two-sample t-test:
a hypothesis test for answering questions about the mean where the data are collected from two random samples of independent observations, each from an underlying normal distribution
two-tailed test:
a statistical significance test in which the alternative hypothesis is nondirectional, so that departures from the null hypothesis in either direction from the central value(s) are taken into account.
two-way analysis of variance (factorial AOV):
The two-way analysis of variance is an extension to the one-way analysis of variance. There are two independent variables (hence the name two-way). The two independent variables in a two-way ANOVA are called factors. The idea is that there are two variables, or factors, which affect the dependent variable. Each factor will have two or more levels within it, and the degrees of freedom for each factor is one less than the number of levels. The same assumptions apply for one-way analysis of variance.
Type I error:
The error of rejecting a true null hypothesis; i.e. declaring a difference exists when it does not.
Type II error:
the error of failing to reject a false null hypothesis; i.e. declaring a difference does not exist when it in fact does.
unbiased (bias):
In general terms, deviations of results or inferences from the truth, or processes leading to such deviation. More specifically, the extent to which the statistical method used in a study does not estimate the quantity thought to be estimated, or does not test the hypothesis to be tested. In estimation, bias is usually measured by the difference between a parameter estimate's expected value and the true value of the parameter. An estimator for which this difference is zero is said to be unbiased.
Variable:
Some characteristic that differs from subject to subject or from time to time. Any attribute, phenomenon, or event that can have different values.
variance:
In a population, the second moment about the mean. An unbiased estimator of the population value is provided by s² given by

s² = Σ (xi − x̄)² / (n − 1)

where x1, x2, ..., xn are the n sample observations and x̄ is the sample mean.
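A small numeric check of the unbiased estimator, with arbitrary data, against the stdlib implementation:

```python
import statistics

x = [2, 4, 4, 4, 5, 5, 7, 9]              # arbitrary sample
xbar = sum(x) / len(x)                    # sample mean = 5.0
ss = sum((xi - xbar) ** 2 for xi in x)    # sum of squared deviations = 32.0
s2 = ss / (len(x) - 1)                    # unbiased sample variance, 32/7

print(xbar, ss, round(s2, 4))  # 5.0 32.0 4.5714
```

The n − 1 divisor (rather than n) is what makes the estimator unbiased, and it matches `statistics.variance`, which also divides by n − 1.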
weighted average:
An average of quantities to which have been attached a series of weights in order to make proper allowance for their relative importance. For example, a weighted arithmetic mean of a set of observations x1, x2, ..., xn, with weights w1, w2, ..., wn, is given by (w1x1 + w2x2 + ... + wnxn) / (w1 + w2 + ... + wn).
weighted sample:
a sample that is not strictly proportional to the distribution of classes in the universe population. A weighted sample has been adjusted to include larger proportions of some than other parts of the population because those parts accorded greater "weight" would otherwise not have sufficient numbers in the sample to lead to generalizable conclusions, or because they are considered to be more important, more interesting, more worthy of detailed study or other reasons.
z-score (standard scores):
Variable values transformed to zero mean and unit variance.
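The transformation is (x − mean) / SD; a quick stdlib sketch with arbitrary scores confirms the standardized values have zero mean and unit variance:

```python
import statistics

x = [50, 60, 70, 80, 90]        # arbitrary raw scores
mean = statistics.fmean(x)      # 70.0
sd = statistics.stdev(x)        # sample standard deviation

z = [(xi - mean) / sd for xi in x]   # z-scores

# the transformed values have mean 0 and unit variance
print(statistics.fmean(z), statistics.variance(z))
```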
z-test:
A test of a hypothesis that uses a test statistic having a standard normal distribution when the null hypothesis is true; for example, a test of a population mean when the population standard deviation is known, or when the sample is large enough for the sample standard deviation to be treated as known.
z-transformation (Fisher's Z transformation):
A transformation of the sample correlation coefficient r given by z = ½ ln[(1 + r)/(1 − r)]. The sampling distribution of z is approximately normal with variance 1/(n − 3), which makes the transformation useful for constructing confidence intervals and significance tests for correlation coefficients.
The lengths of stay for six patients were 0, 0, 1, 2, 2, and 16 days. Which is (are) the best measure(s) to summarize these data?
(A) Mean
(B) Median
(C) Median and standard deviation
(D) Mean and standard deviation
(E) Median and range
(E) Median and range
Because the data are skewed and have an outlier, the median and range would best summarize the data.
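The outlier's effect is easy to see by computing the summaries directly for the six stays:

```python
import statistics

los = [0, 0, 1, 2, 2, 16]   # lengths of stay in days; 16 is an outlier

median = statistics.median(los)   # middle of the ordered data, robust to the outlier
mean = statistics.fmean(los)      # pulled upward by the 16-day stay
rng = max(los) - min(los)         # range conveys the full spread, including the outlier

print(median, mean, rng)  # 1.5 3.5 16
```

The mean (3.5) exceeds every value but the outlier, which is why the median and range summarize these skewed data better.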
An epidemiologist attempts to predict the weight of an elderly person from demispan. She randomly chooses 70 elderly subjects in a particular geographic area and records their weight and demispan measurements in the form of (xi, yi) for i = 1, ..., 70. Given that the value of the Pearson correlation coefficient is zero, what can be deduced?
(A) There is no relation between weight and demispan
(B) There is an almost perfect relationship between weight and demispan
(C) There could be some nonlinear relationship between weight and demispan
(D) There is a strong negative relationship between weight and demispan
(E) All pairs of values of weight and demispan are practically identical
(C) There could be some nonlinear relationship between weight and demispan

The justification is that the Pearson correlation measures only linear relationships. A value of zero means there is no linear relation, but there could be a nonlinear one.
For example, if the points are (-3,9), (-2,4), (-1,1), (0,0), (1,1), (2,4), (3,9), then the Pearson correlation is zero even though Y = X² exactly.
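This example can be verified directly (standard library only, computing Pearson's r from its definition):

```python
import math

# The seven points from the example: a perfect parabola y = x^2
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

# Pearson r = covariance / (sd_x * sd_y), up to common factors of n
cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
r = cov / (sd_x * sd_y)

print(r)  # 0.0 -- no linear association, despite a deterministic relationship
```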
Which of the following statistical tests is not considered a nonparametric test?

A) Kruskal-Wallis test
(B) Wilcoxon rank-sum test
(C) Tukey's test
(D) Mann-Whitney test
(C) Tukey's test

There are actually two Tukey's tests. One is a post hoc procedure for ANOVA, and the other is a test for additivity used in ANOVA. Neither is a nonparametric test.
A researcher is designing a new questionnaire to examine patient stress levels on a scale of 0 to 5. What type of outcome variable is being collected?

(A) Ratio
(B) Nominal
(C) Interval
(D) Ordinal
(E) Binary
(D) Ordinal
Data are at the ordinal level of measurement if they can be arranged in some order, but differences between data values either cannot be determined or are meaningless.
If the chances for a second event to occur stay the same, regardless of the outcome of a first event, then the two events are:
(A) Equally likely
(B) Independent
(C) Indeterminate
(D) Mutually exclusive
(B) Independent
Two events A and B are independent if the occurrence of one does not affect the probability of the occurrence of the other. If A and B are not independent, they are considered dependent.
In simple linear regression, what is a method of determining the slope and intercept of the best-fitting line?
(A) Least squares
(B) R-square
(C) Minimum error
(D) Least error
(E) Regression
(A) Least squares
Simple linear regression involves data on a dependent variable y and a single independent variable x (multiple regression uses several predictors x1, x2, etc.). Regression analysis involves finding the "best" mathematical model (within some restricted class of models) to describe y as a function of the x's or to predict y from the x's. The regression line is the presentation of the regression equation. Residuals are used to determine the best-fitting line; a residual is the observed value minus the value predicted by the regression line. A straight line satisfies the least-squares property if the sum of the squares of the residuals is the smallest sum possible.
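A minimal sketch of the closed-form least-squares estimates, using made-up data that happen to lie exactly on a line:

```python
# Closed-form least-squares estimates for simple linear regression.
# The data are hypothetical and lie exactly on y = 2x, so the fit is exact.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# slope = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

# Residuals: observed minus predicted; least squares minimizes their squared sum
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

print(slope, intercept)                # 2.0 0.0
print(sum(r ** 2 for r in residuals))  # 0.0 for this exact-fit data
```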
In a group of individuals, the probability of characteristic C is 0.4, and the probability of characteristic D is 0.2. The probability of their intersection is 0.10. Which of the following statements is correct?
(A) Characteristics C and D are independent
(B) Characteristics C and D are not independent
(C) Characteristics C and D are mutually exclusive
(D) Characteristics C and D are independent and mutually exclusive
(E) Not enough information is given about the relationship between the two variables to ascertain the answer
(B) Characteristics C and D are not independent
Two events C and D are independent if and only if P(C and D) = P(C) × P(D). Here P(C) × P(D) = 0.4 × 0.2 = 0.08, but the stated probability of the intersection is 0.10. Because 0.10 ≠ 0.08, the events are not independent.
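A one-line check of this rule, using only the probabilities given in the question:

```python
import math

p_c, p_d = 0.4, 0.2
p_intersection = 0.10  # given P(C and D)

# Under independence we would need P(C and D) == P(C) * P(D) = 0.08
expected_if_independent = p_c * p_d
independent = math.isclose(p_intersection, expected_if_independent)

print(round(expected_if_independent, 2))  # 0.08
print(independent)                        # False -> C and D are not independent
```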
If all of the numbers in a list increase by 2, then the standard deviation is:
(A) Increased by 2
(B) Increased by 4
(C) Unchanged
(D) Cannot be determined without the actual list of numbers
(C) Unchanged
Adding a constant to every number in a list shifts each value, and therefore the mean, by that constant, but it leaves the spread of the data, and therefore the standard deviation, unchanged.
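A quick demonstration with an arbitrary small list:

```python
import statistics

# Any small list works; these numbers are illustrative
data = [1, 3, 5, 2, 4]
shifted = [x + 2 for x in data]

print(statistics.mean(data), statistics.mean(shifted))      # the mean shifts by 2
print(statistics.stdev(data) == statistics.stdev(shifted))  # True: spread unchanged
```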
The sensitivity of a particular screening test for a disease is 95%, and the specificity is 90%. Which of the following statements is most correct?
(A) Of 100 people sampled from a population with the disease, the test will correctly detect 95 individuals as positive for the disease
(B) Of 100 people sampled from a population with the disease, the test will correctly detect 90 individuals
(C) If a person tests positive, the probability of having the disease is 0.95
(D) If a person has the disease, there is a 5% chance that the test will be negative
(E) If a person does not have the disease, there is a 5% chance that the test will be positive
(A) Of 100 people sampled from a population with the disease, the test will correctly detect 95 individuals as positive for the disease
(D) If a person has the disease, there is a 5% chance that the test will be negative
Both (A) and (D) are correct. Sensitivity is the proportion of truly diseased people in the screened population who are identified as diseased by the screening test; it is the probability of correctly diagnosing a case, i.e., the probability that any given case will be identified by the test (the true positives). Specificity is the proportion of truly non-diseased people who are so identified by the screening test; it is the probability of correctly identifying a non-diseased person with the screening test (the true negatives).
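These definitions can be made concrete with a hypothetical screening of 100 diseased and 100 non-diseased people, consistent with the stated 95% sensitivity and 90% specificity:

```python
# Hypothetical 2x2 screening counts (not from the question's data)
tp, fn = 95, 5    # diseased: true positives, false negatives
tn, fp = 90, 10   # non-diseased: true negatives, false positives

sensitivity = tp / (tp + fn)          # P(test + | disease)
specificity = tn / (tn + fp)          # P(test - | no disease)
false_negative_rate = fn / (tp + fn)  # P(test - | disease) = 1 - sensitivity

print(sensitivity, specificity, false_negative_rate)  # 0.95 0.9 0.05
```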
Which is the most correct statement about a scatterplot?
(A) It is used to determine whether to perform a linear regression
(B) It shows the relationship between any two variables
(C) It is used to compare the means of two variables
(D) It is a useless plot when the relationship between two variables is nonlinear
(E) It is used to investigate the relationship between two continuous variables
(E) It is used to investigate the relationship between two continuous variables
A scatterplot diagram is a plot of paired (x,y) data with a horizontal x-axis and a vertical y-axis. A scatterplot can be used to investigate the relationship between two continuous variables as well as to identify outliers within a data set.
The Central Limit Theorem states that:
(A) The sample mean is unbiased
(B) The sample mean is approximately normal
(C) The parent population of the sample distribution is normally distributed
(D) The sample standard deviation is approximately normal
(E) Both statements (A) and (C) can be deduced from the Central Limit Theorem
(B) The sample mean is approximately normal
The Central Limit Theorem states that if the sample size is large enough, the distribution of the sample means can be approximated by a normal distribution, even if the original population is not normally distributed. In other words, the distribution of the sample means approaches a normal distribution as the sample size increases.
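A small simulation sketch of this behavior (standard library only; the seed, sample size, and number of replicates are arbitrary choices):

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Population: a heavily skewed exponential distribution with mean 1.
# Draw many samples of size 50 and record each sample's mean.
sample_means = [
    statistics.mean(random.expovariate(1.0) for _ in range(50))
    for _ in range(2000)
]

# The sample means pile up symmetrically around the population mean of 1,
# even though the parent population is far from normal.
print(round(statistics.mean(sample_means), 2))
```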
Assume that a researcher has measured weight in a sample of 100 overweight adults before and after a diet and exercise program conducted at the local health department's weekly Eat Healthy-Be Fit community program. To determine whether the mean weight decreased six weeks after the exercise program compared to the initial baseline measures, the researcher should:
(A) Compute the correlation coefficient, r, and determine the association between being overweight and the community program
(B) Conduct a t-test for independent samples
(C) Conduct a t-test for dependent samples
(D) Conduct a chi-square test for association
(E) Not estimate the decrease because there was no control group for the program
(C) Conduct a t-test for dependent samples
A t-test is a hypothesis test for comparing population means. In this case, the samples are dependent (paired) because both measurements are taken on the same individuals.
Now assume that the researcher has measured weight in a sample of 200 overweight adults who have been randomized to receive either the diet and exercise program (cited in the previous question) or no program (i.e., to serve as a control group). All subjects are weighed at baseline and again six weeks later. Choosing from the following analysis options, which is the most appropriate way to determine whether the diet program had an impact on weight loss?
(A) Conduct an independent t-test on the six-week follow-up weight measures between the diet and exercise group and the control group
(B) Conduct a paired t-test (baseline and six-week weight) in the exercise program group and then again in the control group and compare the two P-values to determine which group had a statistically significant change
(C) Conduct a survival analysis and compute the hazard ratio to determine whether the community program is protective against weight gain
(D) Conduct an analysis of covariance using the baseline to six-week weight change scores as the dependent variable and the diet and exercise program versus control group as the independent variable
(E) Conduct an analysis of covariance using the weight at six weeks as the dependent variable, the diet and exercise program versus control group as the independent variable, and the baseline weight as a covariate
(E) Conduct an analysis of covariance using the weight at six weeks as the dependent variable, the diet and exercise program versus control group as the independent variable, and the baseline weight as a covariate
Analysis of covariance (ANCOVA) is a technique that involves a multiple regression model in which the study factors of interest are all treated as nominal variables. The variables being controlled for in the model (the covariates) may be measurements of any scale. We want to use the baseline weight as a covariate to adjust for, or control for, any confounding of baseline weight.
If a population has a standard deviation σ, then the standard deviation of the mean of 100 randomly selected items from this population is:
(A) σ
(B) 100 σ
(C) σ /10
(D) σ /100
(E) 0.1
(C) σ/10
The standard deviation of the mean (the standard error) is given by σ/√n; here n = 100, so it equals σ/10.
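The arithmetic, using a hypothetical value for σ:

```python
import math

sigma = 12.0   # hypothetical population standard deviation
n = 100

standard_error = sigma / math.sqrt(n)  # sigma / sqrt(n)
print(standard_error)  # 1.2, i.e., sigma / 10 when n = 100
```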
Select the most correct statement concerning relative risk and odds ratios.
(A) A relative risk of 10 has the same strength of association as a relative risk of 0.1.
(B) If the confidence interval for the relative risk does not contain 0, there is an association.
(C) It is possible to calculate a relative risk when data are from a case-control study.
(D) At least one variable should be normally distributed to calculate a relative risk.
(E) Coefficients from logistic regression analysis yield relative risk.
(A) A relative risk of 10 has the same strength of association as a relative risk of 0.1.
The risk ratio measures the increased risk for developing a disease after being exposed to a risk factor compared to not being exposed, and is given by RR = risk for the exposed / risk for the unexposed; it is often referred to as the relative risk. A relative risk of 0.1 is the reciprocal of a relative risk of 10, so the two represent associations of equal strength in opposite directions (one harmful, one protective).
Assume that a linear regression analysis is performed. Which of the following results would justify trying a different method of analysis for the data?
(A) The constant is not significant
(B) The slope coefficient = 0.001
(C) The r² = 0.99
(D) The r² = 0.001
(E) Plotting the residuals against the dependent variable gives a random cloud of points
(D) The r² = 0.001.
The r² value indicates the amount of variance in the criterion variable Y that is accounted for by the variation in the predictor variable X. In linear regression analysis, the set of predictor variables x1, x2, ... is used to explain variability of the criterion variable Y. The r² value falls between 0 and 1, with values closer to 1 explaining more of the variability; an r² of 0.001 means the model explains almost none of the variability, which suggests trying a different method of analysis.
A type I error is defined as:
(A) The probability of rejecting the null hypothesis when the null hypothesis is true
(B) The probability of rejecting the alternative hypothesis when the null hypothesis is true
(C) The probability of rejecting the null hypothesis when the alternative hypothesis is true
(D) The probability of rejecting the alternative hypothesis when the alternative hypothesis is true
(A) The probability of rejecting the null hypothesis when the null hypothesis is true
This is simply the definition of type I error.
References: Pagano and Gauvreau, p. 234 | Rosner, p. 228 | Moye, p. xxi
The idea that z has a standard normal distribution when the sample size is large is justified by:
(A) Chebyshev's Inequality
(B) Central Limit Theorem
(C) Bonferroni Inequality
(D) Box-Cox Transformation
(B) Central Limit Theorem
This is a standard result. The Central Limit Theorem states that the sum (or mean) of a large number of independent random variables with finite variance is approximately normally distributed, regardless of the underlying distribution.
References: Rosner, p. 174 | Pagano, p. 197 | van Belle et al., p. 83
An investigator measures a continuous variable on four independent, disjointed groups of people and would like to know whether the means of each group differ. Which statistical test should the investigator use to answer this question?
(A) Logistic regression
(B) Cox regression
(C) Chi-squared test
(D) Analysis of variance
(D) Analysis of variance
Analysis of variance is the only choice that is appropriate for continuous outcome variables. Logistic regression is used for binary outcomes; Cox regression is used for survival outcomes; and chi-squared tests are used for pairs of categorical variables.
References: Rosner, Chapter 12 | Pagano, Chapter 12 | van Belle et al., Chapter 10
An investigator would like to assess the association between two categorical variables, but a cross-tabulation of the variables reveals that some cells contain counts equal to zero. Which statistical test would be most appropriate in this situation?
(A) Scheffe's test
(B) Fisher's exact test
(C) McNemar's test
(D) Student t-test
(B) Fisher's exact test
Fisher's exact test is often used to test for association between two categorical variables when there are small cell counts (e.g., expected cell counts are less than or equal to 5) in a table.
The proportion of people with a disease who are correctly identified by a screening test as having the disease is called:
(A) Sensitivity
(B) Specificity
(C) Positive predictive value
(D) Negative predictive value
(A) Sensitivity
Sensitivity is defined as the proportion of truly diseased people in the screened population who are identified as having the disease by the screening test (true positives).
The probability that an event occurs if some condition is met is called a:
(A) Joint probability
(B) Simple probability
(C) Conditional probability
(D) Marginal probability
(C) Conditional probability
A conditional probability is the probability of an event assuming that another event occurred.
Which of the following does not describe a measure of the variability of a continuous variable?
(A) Standard deviation
(B) Interquartile range
(C) Confidence interval
(D) Kurtosis
(D) Kurtosis
In probability theory and statistics, kurtosis is a measure of the "peakedness" of the probability distribution of a real-valued random variable. Higher kurtosis means more of the variance is due to infrequent extreme deviations versus frequent modestly sized deviations.
The z-score measures the relative position of one observation relative to others in a data set. What components are needed to compute a z-score?
(A) Median and range
(B) Mean and range
(C) Mean and standard deviation
(D) Median and standard deviation
(C) Mean and standard deviation
A z-score measures the distance between an observation and the mean, measured in units of standard deviation.
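This computation can be sketched directly (the observation, mean, and standard deviation below are hypothetical):

```python
# z-score: how many standard deviations an observation lies from the mean
def z_score(x, mean, sd):
    return (x - mean) / sd

# Hypothetical example: values from a distribution with mean 100 and SD 15
print(z_score(130, 100, 15))  # 2.0 (two SDs above the mean)
print(z_score(85, 100, 15))   # -1.0 (one SD below the mean)
```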
In the construction of Box-plots, the upper and lower fences are used to detect which of the following summary?
(A) Outliers
(B) Maximum number
(C) Minimum number
(D) Quartiles
(A) Outliers
The lower fence is defined as: Q1 - 1.5(IQR). The upper fence is defined as: Q3 + 1.5(IQR) where Q1 and Q3 are the lower and upper quartiles and IQR is the interquartile range. The upper and lower fences are boundaries to detect any measurements beyond those fences which are called outliers.
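These fence formulas can be applied with Python's standard library (Python 3.8+; note that quartile conventions differ slightly across software packages), reusing the length-of-stay data from earlier in this set:

```python
import statistics

data = [0, 0, 1, 2, 2, 16]  # length-of-stay data from an earlier card

# Quartiles via the default "exclusive" method (conventions vary by package)
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Any measurement beyond the fences is flagged as an outlier
outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(outliers)  # [16]
```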
A 911 emergency operator is flooded with calls during the daily rush hour period. What is the distribution that best describes this data set?
(A) Normal
(B) Binomial
(C) Hypergeometric
(D) Poisson
(D) Poisson
The Poisson distribution is used to model data that represent the number of occurrences of a specified event in a given unit of time or space.
The ability to reject the null hypothesis when the null is in fact false is called?
(A) Type I error
(B) Type II error
(C) Power
(D) Level of significance
(C) Power
The power of a statistical test is defined as the probability of rejecting the null hypothesis when the null hypothesis is false; power = 1 - β, where β is the probability of a type II error.
A clinical experiment with four treatment groups was analyzed using an ANOVA and a significant difference in the population means is found. Which of the following is a natural next step?
(A) Tukey's or a similar method of pairwise comparison
(B) Multiple t-test comparison
(C) Check model assumptions
(D) Power analysis
(A) Tukey's or a similar method of pairwise comparison
Once a significant difference among the population means is found after performing an ANOVA, we next examine pairwise comparisons to further identify the nature of the differences while adjusting for the multiple comparisons via Tukey's method or a similar method.
A doctor would like to estimate a patient's weight based on their age and gender. Age and gender are known as?
(A) Response variables
(B) Dependent variables
(C) Outcome variables
(D) Independent variables
(D) Independent variables
In regression, the variables used to predict the response variable are independent predictor variables.
Suppose the least squares line resulting from a simple linear regression analysis between weight (y in pounds) and height (in inches) is as follows: y-hat = 135+4x . The interpretation of this line is: If the height is increased by 1 inch on average the weight is expected to:
(A) Increase by 4 pounds
(B) Decrease by 4 pounds
(C) Increase by 1 pound
(D) Decrease by 1 pound
(A) Increase by 4 pounds
The basic model in simple linear regression is given by y = β0 + β1x + ε, and the slope β1 can be interpreted as the change in the mean of y for a unit change in x. Here the slope is 4, so weight is expected to increase by 4 pounds with each one-inch increase in height.
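This interpretation can be checked directly with the fitted line from the question:

```python
# The fitted line from the question: y-hat = 135 + 4x
# (weight in pounds, height in inches)
def predicted_weight(height_inches):
    return 135 + 4 * height_inches

# Increasing height by one inch changes the prediction by exactly the slope
print(predicted_weight(66) - predicted_weight(65))  # 4
```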
One can describe the F-distribution as a sampling distribution of the ratio of which of the following:
(A) Two normal population means
(B) Two sample variances provided that the samples sizes are large
(C) Two normal population variances
(D) Two sample variances provided that the samples are independently drawn from two normal populations with equal variances
(D) Two sample variances provided that the samples are independently drawn from two normal populations with equal variances
This answer is just the definition of the F statistic, which is typically used for comparing two population variances. If the parent populations are independently and normally distributed, then the F statistic is calculated as F = var1/var2 (or F = var2/var1), where the numerator is the larger of the two variances. This ratio has an F-distribution with degrees of freedom n1 - 1 and n2 - 1, where n1 and n2 are the sample sizes.
The assumption of a t-test for the difference between the means of two independent populations is that the respective:
(A) Sample variances are equal
(B) Sample sizes are equal
(C) Populations are approximately normal
(D) Sample variances are equal, sample sizes are equal and populations are approximately normal
(C) Populations are approximately normal
One of the assumptions for the t-test for two independent populations is normality.
Reference: Rosner p.304.
Suppose a researcher calculates a confidence interval for a population mean based on a sample size of 9. Which of the following assumptions have been made?
(A) The sampling distribution of z is normal
(B) The sampled population is approximately normal
(C) The population standard deviation is known
(D) No assumptions have been made
(B) The sampled population is approximately normal
In general, the assumption behind a 95% confidence interval is that the sample mean is approximately normally distributed, which the central limit theorem guarantees for large samples. With a sample of only 9, however, the central limit theorem does not apply, so the sampled population itself needs to be approximately normal.
What does the abbreviation IQR stand for?
(A) Independent query re-sampling
(B) Interrelation quantity rescaled
(C) Interquartile range
(D) Integrated quantal relation
(E) Inter-quantile relationship
(C) Interquartile range
The interquartile range is calculated by subtracting the 25th percentile of the data from the 75th percentile and is a measure of variability that is less influenced by extreme values.
The following test would be used to compare mean ages between groups:
(A) Chi-square goodness of fit test
(B) Chi-square test of independence
(C) Two independent samples t test
(D) Analysis of variance
(C) Two independent samples t test
The outcome of interest is age, which is a continuous variable, and interest lies in comparing mean ages between two independent groups (participants assigned to the placebo as compared to participants assigned to the experimental group). The two independent samples t test is used to compare means of a continuous variable between two independent groups.
The following test would be used to compare educational levels between groups:
(A) Chi-square goodness of fit test
(B) Chi-square test of independence
(C) Two independent samples t test
(D) Analysis of variance
(B) Chi-square test of independence
The outcome of interest is educational level, measured here as a 3-level ordinal variable, and interest lies in comparing the proportions of participants in each educational category between groups (participants assigned to the placebo as compared to participants assigned to the experimental group). The data can be organized into a 3x2 table for analysis.
The following test would be used to compare prevalent diabetes between groups:
(A) Chi-square goodness of fit test
(B) Chi-square test of independence
(C) Two independent samples t test
(D) Analysis of variance
(B) Chi-square test of independence
The outcome of interest is prevalent diabetes status, a dichotomous or indicator variable, and interest lies in comparing the proportions of participants with diabetes between groups (participants assigned to the placebo as compared to participants assigned to the experimental group). The data can be organized into a 2x2 table for analysis.
A longitudinal cohort study is conducted to assess risk factors for diabetes. Participants free of diabetes at the start of the study are followed for 10 years for the development of diabetes.

What test would be used to assess whether age is related to incident diabetes?
(A) Chi-square test of independence
(B) Two independent samples t test
(C) Analysis of variance
(D) Correlation analysis
(B) Two independent samples t test
The goal of the analysis is to compare mean ages (age is a continuous variable) between two independent groups (persons who develop diabetes over the 10 year follow-up as compared to those who do not).
A longitudinal cohort study is conducted to assess risk factors for diabetes. Participants free of diabetes at the start of the study are followed for 10 years for the development of diabetes.

What test would be used to assess whether BMI (measured as normal weight, overweight and obese) is related to incident diabetes?
(A) Chi-square test of independence
(B) Two independent samples t test
(C) Analysis of variance
(D) Correlation analysis
(A) Chi-square test of independence
The outcome of interest is incident diabetes, a dichotomous or indicator variable, and the predictor is BMI. BMI is generally measured as a continuous variable but here participants are classified into one of three ordinal categories. The goal of the analysis is to compare the proportions of participants who develop diabetes among the BMI categories. The data can be organized into a 3x2 table for analysis.
A longitudinal cohort study is conducted to assess risk factors for diabetes. Participants free of diabetes at the start of the study are followed for 10 years for the development of diabetes.

What test would be used to assess whether sex is related to incident diabetes?
(A) Chi-square test of independence
(B) Two independent samples t test
(C) Analysis of variance
(D) Correlation analysis
(A) Chi-square test of independence
In this analysis the outcome is dichotomous (incident diabetes - yes or no) as is the predictor (sex). The data can be organized in a 2x2 table and the chi-square test of independence is used to assess whether there is a difference in the proportions of men and women who develop diabetes.
A longitudinal cohort study is conducted to assess risk factors for diabetes. Participants free of diabetes at the start of the study are followed for 10 years for the development of diabetes.

Consider the study described above and suppose that the outcome is change in blood glucose level over 10 years, what test would be used to assess whether age is related to change in blood glucose level?
(A) Chi-square test of independence
(B) Two independent samples t test
(C) Analysis of variance
(D) Correlation analysis
(D) Correlation analysis
The goal of this analysis is to assess the relationship between two continuous variables - age and change in blood glucose level. Correlation analysis is one technique to quantify the direction (positive or negative) and strength of the association, if one exists.
A longitudinal cohort study is conducted to assess risk factors for diabetes. Participants free of diabetes at the start of the study are followed for 10 years for the development of diabetes.

Consider the study described above and suppose that the outcome is change in blood glucose level over 10 years, what test would be used to assess whether BMI (measured as normal weight, overweight and obese) is related to change in blood glucose level?
(A) Chi-square test of independence
(B) Two independent samples t test
(C) Analysis of variance
(D) Correlation analysis
(C) Analysis of variance
The outcome of interest is change in blood glucose level, a continuous variable, and we wish to test whether there is a difference in mean changes in blood glucose levels among three independent groups (persons of normal weight, overweight, and obese). Analysis of variance is used to test for a difference in more than two independent means.
A longitudinal cohort study is conducted to assess risk factors for diabetes. Participants free of diabetes at the start of the study are followed for 10 years for the development of diabetes.

Consider the study described above and suppose that the outcome is change in blood glucose level over 10 years, what test would be used to assess whether sex is related to change in blood glucose level?
(A) Chi-square test of independence
(B) Two independent samples t test
(C) Analysis of variance
(D) Correlation analysis
(B) Two independent samples t test
The outcome of interest is change in blood glucose level, a continuous variable, and we wish to test whether there is a difference in mean changes in blood glucose levels between men and women (two independent groups).
A clinical trial is conducted to assess the efficacy of a new drug for increasing HDL cholesterol. A 95% confidence interval for the difference in increase in HDL cholesterol levels over 12 weeks between patients assigned to the new drug or to placebo is (-2.45, 5.97). Which of the following statements is most correct?
(A) The drug is effective in increasing HDL cholesterol
(B) There is no significant difference in HDL cholesterol levels measured at 12 weeks
(C) There is no significant difference in increases in HDL cholesterol levels measured over 12 weeks
(D) The mean increase in HDL cholesterol is substantially higher in patients receiving the new drug
(C) There is no significant difference in increases in HDL cholesterol levels measured over 12 weeks
The outcome of interest is the increase (or change) in HDL over 12 weeks. Because the confidence interval for the difference (between patients assigned to the new drug versus placebo) in increase in HDL includes zero (the null value), we do not have evidence of a statistically significant difference in increase in HDL between groups.
Suppose we are counting the number of patients out of 100 that have a particular disease XYZ with the probability of any one individual having the disease is 0.40.
Assuming all the patients are independent and the chance of disease is the same for each patient, what is the expected number of patients with the disease XYZ?
(A) 50
(B) 40
(C) 30
(D) 20
(B) 40
Here we have a binomial experiment with 100 identical trials and the probability of success on a single trial equal to 0.40. Therefore the expected number of patients with disease XYZ (successes) is 100 × 0.40 = 40.
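The expectation (and, for reference, the variance) of a binomial count:

```python
# Mean and variance of a Binomial(n, p) count
n, p = 100, 0.40

expected_cases = n * p          # E[X] = n * p
variance = n * p * (1 - p)      # Var[X] = n * p * (1 - p) = 24

print(expected_cases)  # 40.0
```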
Given the following diagram with Group 1 (triangle) and Group 2 (diamond):

Both scenarios produce identical pairs of group means. What factors make it easier to detect the difference between the two group means?
(A) Larger within group variability in scenario 1
(B) Smaller within group variability in scenario 2
(C) Smaller within group variability and larger between group variability in scenario 1
(D) Smaller within group variability and larger between group variability in scenario 2
(C) Smaller within group variability and larger between group variability in scenario 1
In scenario 2 there is more variability within the groups, causing the two groups to overlap, which makes the identical difference in the means more difficult to detect. This basic example is formalized by the analysis of variance, in which the total variability in an experiment is partitioned into the sum of squares for treatment and the sum of squares for error.
Based on the ANOVA table, how is the F-test computed from the components in the ANOVA table?
(A) MST/(k - 1)
(B) MSE/(n - 1)
(C) SST/SSE
(D) MST/MSE
(D) MST/MSE
The test statistic for a one-way ANOVA is given by F = MST/MSE, where MST is the mean square for treatment and MSE is the mean square for error. We reject the null hypothesis for large values of F, using a right-tailed statistical test. When the null is true, this test statistic has an F-distribution with df1 = k - 1 and df2 = n - k.
Let S denote the sample size and P denote the population size. Which of the following statement is most correct?
(A) S can be larger or smaller than P
(B) S is always equal to P
(C) S is always smaller than or equal to P
(D) S can be larger than P
(C) S is always smaller than or equal to P
The sample is a subset of the population, so the sample size is always smaller than or equal to the population size.
Making inferences regarding certain characteristics of the population based on the sample data is referred to as:
(A) Random sample
(B) Statistical inference
(C) Descriptive statistics
(D) Histograms and bar charts
(B) Statistical inference
By definition statistical inference is the use of statistics to make inferences concerning some unknown aspect of a population.
Which of the following statements is correct regarding the sum of frequencies for all categories in the above summary table?
(A) It is always equal to the number of categories in the given data set
(B) It is always equal to one
(C) It is always equal to a number between 0 and 1
(D) It is always equal to the number of individuals in the given data set
(D) It is always equal to the number of individuals in the given data set
A frequency distribution is a tabular summary of data showing the frequency (or number) of items in each of several nonoverlapping categories; therefore, the sum of the frequencies among all categories will always equal the number of elements (individuals) in the data set.
The last category in a cumulative relative frequency distribution will have a cumulative relative frequency equal to?
(A) 0
(B) 1
(C) The total number of individuals in the given data set
(D) The total number of categories in the given data set
(B) 1
A cumulative relative frequency distribution is a tabular summary of a set of data showing the relative frequency of items less than or equal to the upper class limit of each class. The relative frequency is defined as the fraction or proportion of the total number of items. To find the cumulative relative frequencies, add all the previous relative frequencies to the relative frequency for the current row. Thus the last entry of the cumulative relative frequency column is one, indicating that one hundred percent of the data has been accumulated.
The sum of the deviations of the individual observations from their mean is?
(A) 1
(B) 0
(C) Less than 0
(D) Greater than 1
(B) 0
The sum of the deviation of the individual data elements from their mean is always equal to zero. This is why we use the sum of squared deviations.
Example: 1, 3, 5, 2, 4 with mean 3; the deviations are -2, 0, 2, -1 and 1, which sum to 0.
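The example above can be checked directly in Python:

```python
data = [1, 3, 5, 2, 4]
mean = sum(data) / len(data)           # mean = 3
deviations = [x - mean for x in data]  # [-2, 0, 2, -1, 1]

# The deviations from the mean always sum to zero...
assert sum(deviations) == 0

# ...which is why dispersion measures use the sum of squared deviations
sum_sq = sum(d ** 2 for d in deviations)
```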
The probability distribution for all possible values of a given sample statistic is called?
(A) Parameter
(B) Random sample
(C) Sampling distribution
(D) Sample space
(C) Sampling distribution
The sampling distribution of a statistic is the distribution of values of the statistic over all possible samples of size n that could have been selected from the reference population.
Reference: Rosner Chapter 6
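A small simulation makes the definition concrete: draw many samples of size n from a reference population and look at the distribution of the sample mean over those samples (the population here is synthetic, for illustration only):

```python
import random
import statistics

random.seed(0)

# Hypothetical reference population (mean ~50, SD ~10)
population = [random.gauss(50, 10) for _ in range(10_000)]

# Sampling distribution of the sample mean: the distribution of the
# statistic over repeated samples of size n from the population
n = 25
sample_means = [
    statistics.mean(random.sample(population, n)) for _ in range(2_000)
]

# Its center is near the population mean, and its spread (the standard
# error) is near sigma / sqrt(n) = 10 / 5 = 2
center = statistics.mean(sample_means)
spread = statistics.stdev(sample_means)
```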
The goal of an ANOVA statistical analysis is to determine whether or not:
(A) The means of two samples are different
(B) The means of more than two samples are different
(C) The means of two or more populations are different
(D) None of the above
(C) The means of two or more populations are different
One of the simplest experimental designs is the completely randomized design, in which random samples are selected independently from each of g populations. An analysis of variance is used to test whether the g population means are all the same or whether at least one mean differs from the others.
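The F statistic behind this test can be sketched in plain Python: partition the variation into a between-group (treatment) piece and a within-group (error) piece, convert each to a mean square, and take their ratio (the helper name and sample data are illustrative, not from the source):

```python
import statistics

def one_way_anova_f(groups):
    """F statistic for a completely randomized design: between-group
    mean square divided by within-group mean square."""
    g = len(groups)                           # number of populations
    n = sum(len(grp) for grp in groups)       # total observations
    grand_mean = sum(sum(grp) for grp in groups) / n

    # Between-group (treatment) sum of squares, df = g - 1
    ss_treat = sum(len(grp) * (statistics.mean(grp) - grand_mean) ** 2
                   for grp in groups)
    # Within-group (error) sum of squares, df = n - g
    ss_error = sum((x - statistics.mean(grp)) ** 2
                   for grp in groups for x in grp)

    ms_treat = ss_treat / (g - 1)
    ms_error = ss_error / (n - g)
    return ms_treat / ms_error

# Identical group means give F = 0; well-separated means give a large F
f_same = one_way_anova_f([[1, 2, 3], [1, 2, 3], [1, 2, 3]])
f_diff = one_way_anova_f([[1, 2, 3], [11, 12, 13]])
```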
Which one of the following items does not represent the value of biostatistics in assessing the health problems of the population and determining their extent?
(A) Finding patterns in the collected data
(B) Summarizing and presenting the information to best describe the target population
(C) Deciding what information to gather to help identify the health problems
(D) Accounting for possible inaccuracies in responses and measurements
(D) Accounting for possible inaccuracies in responses and measurements
A biostatistician's responsibility within a collaborating research team is to aid in the research design, analysis and interpretation of the data. (A), (B), and (C) all describe tasks that would fall within a biostatistician's expertise area. A biostatistician would not be able to account for possible inaccuracies in the data. This is because a biostatistician only has access to the information contained within the data at hand and does not have information concerning the underlying reasoning for inaccuracies in the data.
If the biostatistician uses sampling and estimation methods to monitor how well regulators are complying with policy, determine possible interventions and/or preventive measures of health problems, and set regulations, what function(s) is he/she addressing?
(A) Assurance
(B) Assurance and policy development
(C) Policy development and assessment
(D) Assessment
(B) Assurance and policy development
The numerator and denominator degrees of freedom (shown with * in the above table) are:
(A) 3 and 16
(B) 4 and 15
(C) 16 and 3
(D) 15 and 4
(B) 4 and 15
In the ANOVA table, each source of variation, when divided by its appropriate degrees of freedom, provides an estimate of the variation in the experiment. The total sum of squares (Total SS) involves n squared observations, so its degrees of freedom is (n - 1). Also, the sum of squares for the treatments involves k (the number of populations) squared observations, so its degrees of freedom is (k - 1).
Finally, the sum of squares for error is a direct extension of the pooled estimate, and its degrees of freedom is (n - k). The sources of variation and their respective degrees of freedom are combined to obtain the mean squares as MS = SS/df. Therefore MS(treat) = SS(treat) / df(treat), or df(treat) = SS(treat) / MS(treat) = 100/25 = 4. Similarly, df(error) = SS(error) / MS(error) = 60/4 = 15.
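The arithmetic in the explanation (rearranging MS = SS/df to df = SS/MS) is simple enough to verify directly:

```python
# Values from the question's ANOVA table: SS = sum of squares,
# MS = mean square. Since MS = SS / df, it follows that df = SS / MS.
ss_treat, ms_treat = 100, 25
ss_error, ms_error = 60, 4

df_treat = ss_treat / ms_treat  # numerator degrees of freedom
df_error = ss_error / ms_error  # denominator degrees of freedom
```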
When considering a contingency table test with 6 rows and 6 columns, this implies that the number of degrees of freedom for the test must be which of the following:
(A) 25
(B) 6
(C) 12
(D) 36
(A) 25
The degrees of freedom for a contingency table test are given by (r-1)(c-1), where r and c are the number of rows and columns. Thus we have (6-1)(6-1) = 25.
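As a one-line helper (the function name is illustrative):

```python
def contingency_df(rows, cols):
    """Degrees of freedom for a chi-square test on an r x c table:
    (r - 1)(c - 1)."""
    return (rows - 1) * (cols - 1)
```

For the question above, contingency_df(6, 6) gives (6-1)(6-1) = 25; a 2x2 table has a single degree of freedom.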
When performing a nonparametric Wilcoxon rank-sum test, the first step is to combine the data values in the two samples and assign a rank of '1' to which of the following:
(A) The largest observation
(B) The middle observation
(C) The smallest observation
(D) The most frequently occurring observation
(C) The smallest observation
The ranking procedure for the Wilcoxon rank-sum test is to first combine the data from the two groups, and order the values from the lowest to the highest.
Reference: Rosner Chapter 9 for more details
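The ranking step can be sketched in Python (the sample values are made up and contain no ties, so no midranks are needed):

```python
# Hypothetical data from two groups, with no tied values
group1 = [12, 7, 15]
group2 = [9, 20, 5]

# Step 1 of the Wilcoxon rank-sum test: pool the data and rank from
# smallest (rank 1) to largest
combined = sorted(group1 + group2)
ranks = {value: rank for rank, value in enumerate(combined, start=1)}

# The rank sum for one group is the building block of the test statistic
w1 = sum(ranks[x] for x in group1)
```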
Tchebysheff's Theorem states that, for any k ≥ 1 and regardless of the shape of a population's frequency distribution, the proportion of observations falling within k standard deviations of the mean is:
(A) At least 1 - (1/k²)
(B) At most 1 - (1/k²)
(C) At least 1 - (1/k)²
(D) At most 1 - (1/k)²
(A) At least 1 - (1/k²)
References: Tchebysheff's Theorem (Mendenhall, Beaver, and Beaver)
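The bound can be checked empirically on a deliberately skewed data set (the values below are made up; the theorem holds for any distribution shape):

```python
import statistics

# Hypothetical skewed data with one large outlier
data = [1, 1, 2, 2, 3, 3, 4, 5, 8, 21]
mean = statistics.mean(data)
sd = statistics.pstdev(data)  # population standard deviation

k = 2
within = sum(1 for x in data if abs(x - mean) <= k * sd) / len(data)

# Tchebysheff: at least 1 - 1/k^2 of observations lie within
# k standard deviations of the mean (here, at least 75% for k = 2)
assert within >= 1 - 1 / k**2
```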