5_Biostatistics/Other Non Pharm Material
Terms in this set (157)
A variable with observed values that may be considered outcomes of an experiment and whose values cannot be anticipated with certainty before the experiment is conducted
2 types: Discrete or continuous
Discrete variables can take only a limited number of values within a given range.
Is random and discrete
Ranked in a specific order but with no consistent level of magnitude of difference between ranks (e.g., NYHA [New York Heart Association] functional class describes the functional status of
patients with heart failure, and subjects are classified in increasing order of symptoms: I, II, III, IV; Likert-type scales)
Is random and discrete
Classified into groups in an unordered manner and with no indication of relative severity
(e.g., male/female sex, mortality [dead or alive], disease presence [yes or no], race, marital status)
Is random and discrete
Measure of central tendency - In most cases, means and standard deviations
(SDs) should not be reported with ordinal data. What is a common incorrect use of means and SDs to show ordinal data?
Is a random variable; the opposite of a discrete variable. (Discrete variables are sometimes referred to as counting variables.)
1. Continuous variables can take on any value within a given range.
Is random and continuous
Data are ranked in a specific order with a consistent change in magnitude between units; the zero point is arbitrary (e.g., degrees Fahrenheit)
Is random and continuous
Like "interval" but with an absolute zero (e.g., degrees Kelvin, heart rate, blood pressure, time, distance)
Used to summarize and describe data that are collected or generated in research studies. This is done both visually and numerically.
1. Visual methods: frequency distribution, histogram, scatterplot
2. Numerical method, measures of central tendency: Arithmetic mean (i.e. average), median, and mode
3. Numerical method of describing data, data spread or variability: standard deviation, range, percentiles
Is a type of descriptive statistics
Frequency distribution, histogram, scatterplot
Descriptive Statistics, Numerical method (central tendency): Arithmetic mean (i.e. average)
i. Sum of all values divided by the total number of values
ii. Should generally be used only for continuous and normally distributed data
iii. Very sensitive to outliers; the mean is pulled toward the tail that contains the outliers
iv. Most commonly used and most understood measure of central tendency
v. Geometric mean
Descriptive Statistics, Numerical method (central tendency): Median
i. Midpoint of the values when placed in order from highest to lowest. Half of the observations are above and half are below. When there is an even number of observations, the median is the mean of the two middle values.
ii. Also called 50th percentile
iii. Can be used for ordinal or continuous data (especially good for skewed populations)
iv. Insensitive to outliers
Descriptive Statistics, Numerical method (central tendency): Mode
i. Most common value in a distribution
ii. Can be used for nominal, ordinal, or continuous data
iii. Sometimes, there may be more than one mode (e.g., bimodal, trimodal).
iv. Does not help describe meaningful distributions with a large range of values, each of which occurs infrequently
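The three measures of central tendency above can be sketched with Python's standard-library statistics module (the data here are invented for illustration):

```python
import statistics

data = [2, 3, 3, 4, 5, 6, 20]  # invented data with one outlier (20)

statistics.mean(data)    # about 6.14: pulled toward the outlier
statistics.median(data)  # 4: insensitive to the outlier
statistics.mode(data)    # 3: the most common value
```

Note how the single outlier (20) pulls the mean well above the median, illustrating item iii for the mean.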
Descriptive Statistics, Numerical method (measures of data spread or variability): Standard Deviation
i. Measure of the variability about the mean; most common measure used to describe
the spread of data
ii. Square root of the variance (average squared difference of each observation from the mean);
returns variance back to original units (nonsquared)
iii. Appropriately applied only to continuous data that are normally or near normally distributed
or that can be transformed to be normally distributed
iv. By the empirical rule, 68% of the sample values are found within ±1 SD, 95% are found
within ±2 SD, and 99.7% are found within ±3 SD.
v. The coefficient of variation relates the mean and the SD (SD/mean × 100%).
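A minimal sketch of the sample variance, SD, and coefficient of variation, using invented data and the standard-library statistics module:

```python
import statistics

data = [4.0, 6.0, 8.0, 10.0, 12.0]  # invented data

mean = statistics.mean(data)     # 8.0
var = statistics.variance(data)  # sample variance (n - 1 denominator): 10.0
sd = statistics.stdev(data)      # square root of the variance, in original units
cv = sd / mean * 100             # coefficient of variation, as a percentage
```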
Numerical method (measures of data spread or variability): Range
i. Difference between the smallest and largest values in a data set; does not give much information by itself
ii. Easy to compute (simple subtraction)
iii. Size of range is very sensitive to outliers.
iv. Often reported as the actual values rather than the difference between the two extreme values
Descriptive Statistics, Numerical method (measures of data spread or variability): Percentiles
i. The point (value) in a distribution at which a value is larger than some percentage of the
other values in the sample. Can be calculated by ranking all data in a data set
ii. The 75th percentile lies at a point at which 75% of the other values are smaller.
iii. Does not assume the population has a normal distribution (or any other distribution)
iv. The interquartile range (IQR) is an example of the use of percentiles to describe the middle
50% values. The IQR encompasses the 25th-75th percentile.
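Quartiles and the IQR can be sketched with the standard-library statistics.quantiles function (its default "exclusive" method interpolates between ranked values; the data are invented):

```python
import statistics

data = list(range(1, 11))  # 1 through 10, invented for illustration

# Default "exclusive" method; n=4 returns the 25th, 50th, and 75th percentiles
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1  # interquartile range: spread of the middle 50% of values
```

Here q1 = 2.75, q2 = 5.5 (the median), and q3 = 8.25, so the IQR is 5.5; none of this assumes a normal distribution.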
1. Conclusions or generalizations made about a population (large group) from the study of a sample
of that population
2. Choosing and evaluating statistical methods depend, in part, on the type of data used.
3. An educated statement about an unknown population is commonly referred to in statistics as an inference.
4. Statistical inference can be made by estimation or hypothesis testing.
1. Binomial distribution
2. Poisson distribution
Discrete distribution: Binomial distribution
Describes the behavior of a count variable X if the following conditions apply:
1: The number of observations n is fixed.
2: Each observation is independent.
3: Each observation represents one of two outcomes ("success" or "failure").
4: The probability of "success" p is the same for each outcome.
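A minimal sketch of the binomial probability mass function under the four conditions above (the function name is ours, not from any library):

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k): probability of exactly k successes in n independent
    trials, each with the same success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

binomial_pmf(2, 10, 0.5)  # P(exactly 2 successes in 10 fair-coin trials)
```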
Discrete distribution: Poisson distribution
1: The event is something that can be counted in whole numbers.
2: Occurrences are independent, so that one occurrence neither diminishes nor increases the chance of another.
3: The average frequency of occurrence for the time period in question is known.
4: It is possible to count how many events have occurred (e.g., the number of times a firefly lights up in a garden in a given 5 seconds) but meaningless to ask how many such events have not occurred.
The last point sums up the contrast with the binomial situation, where the probability of each of two mutually exclusive events (p and q) is known.
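Likewise, a sketch of the Poisson probability of observing k events when the average rate per period is known (lam, for λ, is an illustrative parameter name; the function is ours):

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) when events occur independently at a known average
    rate lam per time period."""
    return lam ** k * exp(-lam) / factorial(k)

poisson_pmf(0, 2.0)  # chance of seeing no events when 2 are expected on average
```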
Normal (Gaussian) Distribution
1. Most common model for population distributions
2. Symmetric or "bell-shaped" frequency distribution
3. Landmarks for continuous, normally distributed data
a. μ: Population mean (equal to 0 for the standard normal distribution)
b. σ: Population SD (equal to 1 for the standard normal distribution)
c. x̄ and s represent the sample mean and SD.
4. When measuring a random variable in a large enough sample of any population, some values will occur more often than will others.
5. A visual check of a distribution can help determine whether it is normally distributed (whether it appears symmetric and bell shaped). Need the data to perform these checks
a. Frequency distribution and histograms (visually look at the data; you should do this anyway)
b. Median and mean will be about equal for normally distributed data (most practical and easiest to use).
c. Formal test: Kolmogorov-Smirnov test
d. More challenging to evaluate this when we do not have access to the data (when we are reading a paper), because most papers do not present all data or both the mean and median
6. The parameters mean and SD define a normally distributed population.
7. Probability: The likelihood that any one event will occur given all the possible outcomes
8. Estimation and sampling variability
a. One method that can be used to make an inference about a population parameter
b. Separate samples (even of the same size) from a single population will give slightly different estimates of the mean.
c. The distribution of means from random samples approximates a normal distribution.
i. The mean of this "distribution of means" is equal to the unknown population mean, μ.
ii. The SD of the means is estimated by the standard error of the mean (SEM).
iii. Like any normal distribution, 95% of the sample means lie within ±2 SEM of the population mean.
d. The distribution of means from these random samples is about normal regardless of the
underlying population distribution (central limit theorem). You will get slightly different mean and SD values each time you repeat this experiment.
e. The SEM is estimated with a single sample by dividing the SD by the square root of the sample
size (n). The SEM quantifies uncertainty in the estimate of the mean, not variability in the sample. Important for hypothesis testing and 95% confidence interval (CI) estimation
f. Why is all of this information about the difference between the SEM and SD worth knowing?
i. Calculation of CIs. (95% CI is approximately the mean ± 2 times the SEM.)
ii. Hypothesis testing
iii. Deception (e.g., makes results look less "variable," especially when used in graphic format)
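The SEM formula from item e above can be sketched in one line; note that quadrupling the sample size only halves the SEM (the function name is ours):

```python
from math import sqrt

def sem(sd, n):
    """Standard error of the mean: SD divided by the square root of n."""
    return sd / sqrt(n)

sem(10, 25)   # 2.0
sem(10, 100)  # 1.0 -- quadrupling n halves the SEM
```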
9. Recall the previous example about high-density lipoprotein cholesterol (HDL-C) and green tea. From the calculated values in section III, do these data appear to be normally distributed?
Commonly Reported as a Way to Estimate a Population Parameter
CIs Can also be Used for Any Sample Estimate. Estimates derived from categorical data such as risk, risk differences, and risk ratios are often presented with the CI and will be discussed below.
95% CIs are the most commonly reported CIs. In repeated samples, 95% of all CIs include true population value (i.e., the likelihood/ confidence [or probability] that the population value is contained within the interval). In some cases, 90% or 99% CIs are reported. Why are 95% CIs most often reported?
a. Assume a baseline birth weight in a group (n=51) with a mean ± SD of 1.18 ± 0.4 kg.
b. 95% CI is about equal to the mean ± 1.96 × SEM (or 2 × SEM). In reality, it depends on the distribution being used and is a bit more complicated.
c. What is the 95% CI? (1.07, 1.29), meaning there is 95% certainty that the true mean of the entire
population studied will be between 1.07 and 1.29 kg.
d. What is the 90% CI? The 90% CI is calculated to be (1.09, 1.27). Of note, the 95% CI will always be wider than the 90% CI for any given sample. Therefore, the wider the CI, the more likely it is to encompass the true population mean. In general, the "more confident" we wish to be, the wider the interval must be.
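The birth-weight example in items a-d can be checked directly (1.96 and 1.645 are the standard normal multipliers for 95% and 90% intervals; using z rather than t here is the same approximation the text uses):

```python
from math import sqrt

mean, sd, n = 1.18, 0.4, 51      # baseline birth weight example
sem = sd / sqrt(n)               # standard error of the mean

ci95 = (mean - 1.96 * sem, mean + 1.96 * sem)
ci90 = (mean - 1.645 * sem, mean + 1.645 * sem)

print(round(ci95[0], 2), round(ci95[1], 2))  # 1.07 1.29
print(round(ci90[0], 2), round(ci90[1], 2))  # 1.09 1.27
```

As stated, the 90% interval sits entirely inside the 95% interval for the same sample.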
The differences between the SD, SEM, and CIs should be noted when interpreting the literature because they are often used interchangeably. Although it is common for CIs to be confused with SDs, the information each provides is quite different and must be assessed correctly.
CIs Instead of Hypothesis Testing
1. Hypothesis testing and calculation of p-values tell us (ideally) whether there is, or is not, a statistically significant difference between groups, but they do not tell us anything about the
magnitude of the difference.
2. CIs help us determine the importance of a finding(s), which we can apply to a situation.
3. CIs give us an idea of the magnitude of the difference between groups as well as the statistical significance.
4. CIs are a "range" of data, together with a point estimate of the difference.
5. Wide CIs
a. Many results are possible, either larger or smaller than the point estimate provided by the study.
b. All values contained in the CI are statistically plausible.
6. If the estimate is the difference between two continuous variables: A CI that includes zero (no difference between two variables) can be interpreted as not statistically significant (a p-value of 0.05 or greater). There is no need to show both the 95% CI and the p-value.
7. The interpretation of CIs for odds ratios and relative risks is somewhat different. In that case, a value of 1 indicates no difference in risk, and if the CI includes 1, there is no statistical difference. (See the discussion of case-control/cohort in other sections for how to interpret CIs for odds ratios and relative risks.)
Hypothesis Testing: Null Hypothesis (H0)
No difference between groups being compared (treatment A
equals treatment B)
Hypothesis Testing: Alternative Hypothesis (HA)
Opposite of null hypothesis; states that there is a difference
(treatment A does not equal treatment B)
Null v. alternative hypothesis
The structure or the manner in which the hypothesis is written dictates which statistical test is used. Two-sample t-test: H0: Mean 1 = Mean 2
Used to assist in determining whether any observed differences between groups can be explained by chance
Tests for statistical significance (hypothesis testing) determine whether the data are consistent with H0 (no difference)
The results of the "hypothesis testing" will indicate whether enough "evidence" exists for H0 to be rejected.
a. If H0 is "rejected": Statistically significant difference between groups (unlikely attributable to chance)
b. If H0 is "not rejected": No statistically significant difference between groups (any "apparent" differences may be attributable to chance). Note that we are not concluding that the treatments are equal.
Types of Hypothesis Testing. These are situations in which two groups are being compared. There are numerous other examples of situations these procedures could be applied to.
Hypothesis Testing: Nondirectional-Difference
Are the means different?
H0: Mean1 = Mean2
HA: Mean1 ≠ Mean2
or H0: Mean1 − Mean2 = 0
HA: Mean1 − Mean2 ≠ 0
Traditional two-sided t-test
Hypothesis Testing: Nondirectional-Equivalence
Are the means practically equivalent?
H0: |Mean1 − Mean2| ≥ Δ
HA: |Mean1 − Mean2| < Δ
Two one-sided tests (TOST) procedure
Confidence intervals
Hypothesis Testing: Directional-Superiority
Is mean 1 > mean 2? (or some other similarly worded question)
H0: Mean1 ≤ Mean2
HA: Mean1 > Mean2
H0: Mean1 − Mean2 ≤ 0
HA: Mean1 − Mean2 > 0
Traditional one-sided t-test
Confidence intervals
Hypothesis Testing: Directional-Noninferiority
Is mean 1 no more than a certain amount lower than mean 2?
H0: Mean1 − Mean2 ≤ −Δ
HA: Mean1 − Mean2 > −Δ
To Determine What Is Sufficient Evidence to Reject H0: Set the a priori significance level (α) and generate the decision rule.
1. Developed after the research question has been stated in hypothesis form
2. Used to determine the level of acceptable error caused by a false positive (also known as level of significance)
a. Convention: A priori α is usually 0.05.
b. Critical value is calculated, capturing how extreme the sample data must be to reject H0.
Perform the Experiment and Estimate the Test Statistic.
1. A test statistic is calculated from the observed data in the study, which is compared with the critical value.
2. Depending on this test statistic's value, H0 is "not-rejected" (often referred to as fail to reject) or rejected.
3. In general, the test statistic and critical value are not presented in the literature; instead, p-values are generally reported and compared with a priori α values to assess statistical significance.
p-value: Probability of obtaining a test statistic as extreme as, or more extreme than, the one actually observed, assuming H0 is true
4. Because computers are used in these tests, this step is often transparent; the p-value estimated in the statistical test is compared with the a priori α (usually 0.05), and the decision is made.
Choosing the Appropriate Statistical Test Depends on the Following:
1. Type of data (nominal, ordinal, or continuous)
2. Distribution of data (e.g., normal)
3. Number of groups
4. Study design (e.g., parallel, crossover)
5. Presence of confounding variables
6. One-tailed versus two-tailed
7. Parametric versus nonparametric tests
a. Parametric tests
b. Nonparametric tests
Parametric tests assume the following:
i. Data being investigated have an underlying distribution that is normal or close to normal.
Or more correctly: Randomly drawn from a parent population with a normal distribution.
Remember how to estimate this? (mean ~ median)
ii. Data measured are continuous data, measured on either an interval or a ratio scale.
iii. Parametric tests assume that the data being investigated have variances that are homogeneous between the groups investigated. This is often referred to as homoscedasticity.
Are used when data are not normally distributed or do not meet other criteria for parametric tests (e.g., discrete data)
Parametric Test: One-sample t-test
Compares the mean of the study sample with the population mean
Group 1 v. Known population mean
Parametric Test: Two-sample t-test
AKA independent-samples or unpaired t-test
Compares the means of two independent samples. This is an independent samples test.
i. Equal variance test
(a) Rule of thumb for variances: If the ratio of larger variance to smaller variance is greater
than 2, we generally conclude the variances are different.
(b) Formal test for differences in variances: F test
(c) Adjustments can be made for cases of unequal variance.
ii. Unequal variance
Group 1 v. Group 2
Parametric Test: Two-sample t-test- Equal Variance Test
(a) Rule of thumb for variances: If the ratio of larger variance to smaller variance is greater than 2, we generally conclude the variances are different.
(b) Formal test for differences in variances: F test
(c) Adjustments can be made for cases of unequal variance.
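The variance rule of thumb and the equal-variance (pooled) two-sample t statistic can be sketched as follows (function names ours, illustrative data; a real analysis would also need the degrees of freedom, n1 + n2 − 2, and a t distribution to obtain a p-value):

```python
import statistics
from math import sqrt

def variance_ratio(a, b):
    """Rule-of-thumb check: ratio of the larger to the smaller sample variance."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return max(va, vb) / min(va, vb)

def pooled_t(a, b):
    """Equal-variance (pooled) two-sample t statistic."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)  # pooled variance
    return (statistics.mean(a) - statistics.mean(b)) / sqrt(sp2 * (1 / na + 1 / nb))
```

If variance_ratio exceeds about 2, the rule of thumb above suggests an unequal-variance (Welch) adjustment instead of the pooled statistic.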
Parametric Test: Paired t-test
Compares the mean difference of paired or matched samples. This is a related samples test.
Within Group 1, comparing measurement 1 with measurement 2
Parametric Test: Common error
Use of multiple t-tests with more than two groups
Parametric Test: Analysis of variance (ANOVA)
A more generalized version of the t-test that can apply to more than two groups
Post hoc tests
Compares the means of three or more groups in a study; also known as single-factor ANOVA. This is an independent samples test.
Ex: Compare Group 1, Group 2, and Group 3
Two-way ANOVA
Another factor, such as age, is added to an already studied factor.
Younger groups (Groups 1, 2, and 3) compared with older groups (Groups 1, 2, and 3)
Repeated-measures ANOVA
This is a related samples test.
Group 1 includes related measurements of measure 1, 2, and 3.
Post hoc tests
Multiple-comparison procedures are used to determine which groups actually differ from each other
Examples: Tukey HSD (honestly significant difference), Bonferroni, Scheffé
Analysis of covariance (ANCOVA)
Provides a method to explain the influence of a categorical
variable (independent variable) on a continuous variable (dependent variable) while statistically controlling for other variables (confounding)
These tests may also be used for continuous data that do not meet the assumptions of the t-test or ANOVA.
Tests for independent samples:
1) Wilcoxon rank sum test, Mann-Whitney U test, or Wilcoxon Mann-Whitney test
2) Kruskal-Wallis one-way ANOVA by ranks
Tests for related or paired samples
1) Sign test and Wilcoxon signed rank test
2) Friedman ANOVA by ranks
Sign test and Wilcoxon signed rank test
Compares two matched or paired samples (related to a paired t-test)
Tests for related or paired samples
Friedman ANOVA by ranks
Compares three or more matched/paired groups
Tests for related or paired samples
Wilcoxon rank sum test, Mann-Whitney U test, or Wilcoxon Mann-Whitney test
Compare two independent samples (related to a t-test)
Tests for independent samples
Kruskal-Wallis one-way ANOVA by ranks
i. Compares three or more independent groups (related to one-way ANOVA)
ii. Post hoc testing
iii. Tests for independent samples
Chi square test
Fisher exact test
Chi-square (χ2) test
Compares expected and observed proportions between two or more groups
a. Test of independence
b. Test of goodness of fit
Fisher exact test
Specialized version of the chi-square test for small groups (cells) containing fewer than five predicted observations
Controls for the influence of confounders
Type II decision error
The null hypothesis is false, but you fail to reject it.
The probability of making this error is termed beta.
1. Concluding that no difference exists when one truly does (not rejecting H0 when it should be rejected)
2. It has become a convention to set β to between 0.20 and 0.10.
Type I decision error
The null hypothesis is true, but you reject it.
The probability of making this error is defined as the significance level α.
1. Convention is to set the α to 0.05, effectively meaning that, 1 in 20 times, a type I error will occur when the H0 is rejected. So, 5.0% of the time, a researcher will conclude that there is a statistically
significant difference when one does not actually exist.
2. The calculated chance that a type I error has occurred is called the p-value.
3. The p-value tells us the likelihood of obtaining a given (or a more extreme) test result if the H0 is true.
When the α level is set a priori, H0 is rejected when p is less than α. In other words, the p-value tells us
the probability of being wrong when we conclude that a true difference exists (false positive).
4. A lower p-value does not mean the result is more important or more meaningful, but only that it is statistically significant and not likely attributable to chance.
Power (1 − β)
The probability of making a correct decision when H0 is false; the ability to detect differences between groups if one actually exists
Dependent on the following factors:
a. Predetermined α
b. Sample size n
c. The size of the difference between the outcomes you wish to detect. Often not known before conducting the experiment, so to estimate the power of your test, you will have to specify how large a change is worth detecting
d. The variability of the outcomes that are being measured
e. Items c and d are generally determined from previous data and/or the literature.
Power is decreased by the following (in addition to the above criteria):
a. Poor study design
b. Incorrect statistical tests (use of nonparametric tests when parametric tests are appropriate)
Statistical power analysis and sample size calculation
a. Related to above discussion of power and sample size
b. Sample size estimates should be performed in all studies a priori.
c. Necessary components for estimating appropriate sample size
i. Acceptable type II error rate (usually 0.10-0.20)
ii. Observed difference in predicted study outcomes that is clinically significant
iii. The expected variability in item ii
iv. Acceptable type I error rate (usually 0.05)
v. Statistical test that will be used for primary end point
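Components i-iv above combine into the standard approximation for comparing two means, n per group = 2(zα/2 + zβ)²σ²/Δ². A sketch (the function name is ours; the default z values correspond to α = 0.05 two-sided and 80% power):

```python
from math import ceil

def n_per_group(delta, sd, z_alpha=1.96, z_beta=0.8416):
    """Approximate n per group for a two-sample comparison of means.
    delta: smallest clinically important difference (item ii).
    sd: expected variability of the outcome (item iii).
    Defaults give alpha = 0.05 two-sided (item iv) and power = 0.80 (item i)."""
    return ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2)

n_per_group(delta=0.5, sd=1.0)  # 63 subjects per group
```

Note how halving the detectable difference quadruples the required sample size.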
Statistical significance versus clinical significance
a. As stated earlier, the size of the p-value is not necessarily related to the clinical importance of the result. Smaller values mean only that "chance" is less likely to explain observed differences.
b. Statistically significant does not necessarily mean clinically significant.
c. Lack of statistical significance does not mean that results are not clinically important.
d. When considering nonsignificant findings, consider sample size, estimated power, and observed variability.
Correlation Versus Regression
1. Correlation examines the strength of the association between two variables. It does not necessarily
assume that one variable is useful in predicting the other.
2. Regression examines the ability of one or more variables to predict another variable.
1. The strength of the relationship between two variables that are normally distributed, ratio or interval scaled, and linearly related is measured with a correlation coefficient.
2. Often referred to as the degree of association between the two variables
3. Does not necessarily imply that one variable is dependent on the other (regression analysis will do that)
4. Pearson correlation (r) ranges from −1 to +1 and can take any value in between: −1 Perfect negative, linear relationship, 0 No linear relationship, and +1 Perfect positive linear relationship
5. Hypothesis testing is performed to determine whether the correlation coefficient is different from zero. This test is highly influenced by sample size.
Pearls About Correlation
1. The closer the magnitude of r to 1 (either + or −), the more highly correlated the two variables. The weaker the relationship between the two variables, the closer r is to 0.
2. There is no agreed-on or consistent interpretation of the value of the correlation coefficient. It is dependent on the environment of the investigation (laboratory vs. clinical experiment).
3. Pay more attention to the magnitude of the correlation than to the p-value because the p-value is influenced by sample size.
4. Crucial to the proper use of correlation analysis is the interpretation of the graphic representation of the two variables. Before using correlation analysis, it is essential to generate a scatterplot of the two variables to visually examine the relationship.
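A sketch of the Pearson r computation from its definition (sums of squared deviations and cross-products; the function name and data are ours):

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # cross-products
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

pearson_r([1, 2, 3, 4], [2, 4, 6, 8])  # 1.0: perfect positive linear relationship
pearson_r([1, 2, 3, 4], [8, 6, 4, 2])  # -1.0: perfect negative linear relationship
```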
Spearman Rank Correlation
Nonparametric test that quantifies the strength of an association between two variables but does not assume a normal distribution of continuous data. Can be used for ordinal data or nonnormally distributed continuous data
1. A statistical technique related to correlation. There are many different types; for simple linear regression: One continuous outcome (dependent) variable and one continuous independent (causative) variable
2. Two main purposes of regression: (1) Development of prediction model and (2) accuracy of prediction
3. Prediction model: Making predictions of the dependent variable from the independent variable; Y = mx + b (dependent variable = slope × independent variable + intercept)
4. Accuracy of prediction: How well the independent variable predicts the dependent variable. Regression analysis determines the extent of variability in the dependent variable that can be explained by the independent variable.
a. Coefficient of determination (r2) is the measure describing this relationship. Values of r2 can range from 0 to 1.
b. An r2 of 0.80 could be interpreted as saying that 80% of the variability in Y is "explained" by the variability in X.
c. This does not provide a mechanistic understanding of the relationship between X and Y, but rather, a description of how clearly such a model (linear or otherwise) describes the relationship between the two variables.
d. Like the interpretation of r, the interpretation of r2 is dependent on the scientific arena (e.g., clinical research, basic research, social science research) to which it is applied
5. For simple linear regression, two statistical tests can be used.
a. To test the hypothesis that the y-intercept differs from zero
b. To test the hypothesis that the slope of the line is different from zero
6. Regression is useful in constructing predictive models. The literature is full of examples of predictions. The process involves developing a formula for a regression line that best fits the observed data.
7. Like correlation, there are many different types of regression analysis.
a. Multiple linear regression: One continuous dependent variable and two or more continuous independent variables
b. Simple logistic regression: One categorical response variable and one continuous or categorical explanatory variable
c. Multiple logistic regression: One categorical response variable and two or more continuous or categorical explanatory variables
d. Nonlinear regression: Variables are not linearly related (or cannot be transformed into a linear relationship). This is where our pharmacokinetic equations come from.
e. Polynomial regression: Any number of response and continuous variables with a curvilinear relationship (e.g., cubed, squared)
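The simple linear regression quantities described above (slope, intercept, and r2) can be sketched from their least-squares definitions (function name and data are ours):

```python
def linreg(x, y):
    """Least-squares fit y = m*x + b, plus the coefficient of
    determination r2 (fraction of variability in y explained by x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (c - my) for a, c in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    m = sxy / sxx              # slope
    b = my - m * mx            # intercept
    syy = sum((c - my) ** 2 for c in y)
    r2 = sxy ** 2 / (sxx * syy)
    return m, b, r2

linreg([1, 2, 3], [2, 4, 6])  # slope 2.0, intercept 0.0, r2 = 1.0
```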
Multiple linear regression
One continuous dependent variable and two or more continuous independent variables
Simple logistic regression
One categorical response variable and one continuous or categorical explanatory variable
Multiple logistic regression
One categorical response variable and two or more continuous or categorical explanatory variables
Nonlinear regression
Variables are not linearly related (or cannot be transformed into a linear relationship). This is where our pharmacokinetic equations come from.
Polynomial regression
Any number of response and continuous variables with a curvilinear relationship (e.g., cubed, squared)
Kaplan-Meier method
a. Uses survival times (or censored survival times) to estimate the proportion of people who would survive a given length of time under the same circumstances
b. Allows the production of a table ("life table") and a graph ("survival curve")
c. We can visually evaluate the curves, but we need a test to evaluate them formally.
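A sketch of the product-limit ("life table") calculation behind the survival curve (function name ours; at each event time, survival is multiplied by the fraction of at-risk subjects who survive, while censored subjects simply leave the risk set):

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate (a sketch, not a library routine).
    times: event or censoring time for each subject.
    events: 1 if the event occurred at that time, 0 if censored."""
    at_risk = len(times)
    s = 1.0
    curve = []  # (time, estimated survival) at each event time
    # Sort by time; at tied times, process events before censorings.
    for t, e in sorted(zip(times, events), key=lambda p: (p[0], -p[1])):
        if e:  # an event: survival drops by the fraction of at-risk subjects lost
            s *= (at_risk - 1) / at_risk
            curve.append((t, s))
        at_risk -= 1  # events and censorings both leave the risk set
    return curve

kaplan_meier([1, 2, 3, 4], [1, 1, 0, 1])  # one censored subject at t = 3
```

The censored subject at t = 3 contributes no event, but its departure shrinks the risk set, so the final event at t = 4 drops estimated survival to zero (only one subject remained at risk).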
Log-rank test
Compares the survival distributions between (two or more) groups.
a. This test precludes an analysis of the effects of several variables or the magnitude of difference between groups or the CI (see below for Cox proportional hazards model).
b. H0: No difference in survival between the two populations
c. Log-rank test uses several assumptions.
i. Random sampling and subjects chosen independently
ii. Consistent criteria for entry or end point
iii. Baseline survival rate does not change as time progresses.
iv. Censored subjects have the same average survival time as uncensored subjects.
Cox proportional hazards model
a. Most popular method to evaluate the impact of covariates; reported (graphically) like Kaplan-Meier
b. Investigates several variables at a time
c. Actual method of construction/calculation is complex.
d. Compares survival in two or more groups after adjusting for other variables
e. Allows calculation of a hazard ratio (and CI)
Studies the Time Between Entry in a Study and Some Event (e.g., death, myocardial infarction)
1. Censoring makes survival methods unique; considers that some subjects leave the study for reasons
other than the "event" (e.g., lost to follow-up, end of study period)
2. Considers that all subjects do not enter the study at the same time
3. Standard methods of statistical analysis such as t-tests and linear or logistic regression may not be appropriately applied to survival data because of censoring.
More than one patient with a similar experience or many case reports combined into a case series
Determines the association between exposures/factors and disease/condition development.
Allows an estimation of the risk of outcome (and the RR between the exposure groups). Study outcome of interest in those with and without the exposure of interest.
Advantages: Can control for confounding factors to a greater extent, easier to plan for data collection
Disadvantages: More expensive and time intensive, loss of subject follow-up, difficult to study rare diseases/conditions at a reasonable cost
Population-based, cross-sectional study: Prevalence of serious eye disease and visual impairment in a north London population
Cross-sectional analysis of data from a large cohort study: Maternal characteristics and migraine pharmacotherapy during pregnancy
Advantages: Easy design, "snapshot in time," all data collected at one time, studies are accomplished by questionnaire, interview, or other available biomedical information (e.g., laboratory values).
Disadvantages: Does not allow the study of a factor (or factors) in individual subjects over time, just at the time of assessment; difficult to study rare conditions
Measure of the probability of developing a disease
Incidence rate: Number of new cases of disease per population in a specified time
Calculated by dividing the number of individuals who develop a disease during a given period by the
number of individuals who were at risk of developing a disease during the same period
Measure of the number of individuals who have a condition/disease at any given time
Point prevalence: Prevalence on a given date
Period prevalence: Prevalence in a period (e.g., year, month)
Relative Risk Ratio/ Odds Ratio
RR: Risk of disease is lower in the exposed group
OR: Odds of exposure is lower in the diseased group
RR: Risk of disease in the two groups is the same
OR: Odds of exposure in the two groups is the same
RR: Risk of disease is greater in the exposed group
OR: Odds of exposure is greater in the diseased group
0.75: 25% reduction in the risk/odds
1.0: No difference in risk/odds
1.5: 50% increase in the risk/odds
3: 3-fold (or 200%) increase in the risk/odds
a. RR = [A/(A + B)]/[C/(C + D)]
b. OR = (A/C)/(B/D) or (A x D)/(B x C)
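Using the standard 2×2 table labels assumed by the formulas above (A = exposed with disease, B = exposed without disease, C = unexposed with disease, D = unexposed without disease), a quick sketch with invented counts:

```python
def relative_risk(a, b, c, d):
    """RR = [A/(A + B)] / [C/(C + D)]: risk in exposed over risk in unexposed."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """OR = (A x D) / (B x C): the cross-product form of (A/C)/(B/D)."""
    return (a * d) / (b * c)

relative_risk(20, 80, 10, 90)  # 2.0: risk is doubled in the exposed group
odds_ratio(20, 80, 10, 90)     # 2.25
```

Note the OR (2.25) overstates the RR (2.0) here; the two converge only when the outcome is rare.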
Design allows assessment of causality.
a. Sufficient cause
b. Necessary cause
c. Risk factor
Minimizes bias through randomization and/or stratification
b. Block randomization
c. Stratification
d. Cluster randomization
a. Placebo controlled
b. Active controlled
c. Historical control
Either subjects or investigators are unaware of subject assignment to active/control.
Both subjects and investigators are unaware of subject assignment to active/control.
Both subjects and investigators are unaware of subject assignment to active/control; in addition, an analysis group is unaware.
Two placebos necessary to match active and control therapies
Everyone is aware of subject assignment to active/control.
Primary End Points
The primary end point is one of the most important decisions to make in the design of a clinical study.
A composite end point combines several end points
Surrogate End Points
Do not always predict clinical outcomes
Variables thought to be associated with clinical outcomes
Designed to detect a difference between experimental treatments. This is the typical
design in a clinical trial.
Designed to investigate whether a treatment is not clinically worse (not less effective than stated margin, or inferior) than an existing treatment.
Compares outcomes on the basis of initial group assignment or "as randomized." The allocation to
groups was how they were "intended to be treated," even though they may not have taken the medication
for the duration of the study, dropped out, or did not comply with the protocol.
Subjects who do not adhere to the allocated treatment are not included in the final analysis; only those who completed the trial and adhered to the protocol (based on some predetermined definition [e.g., 80% adherence]) are included.
Subjects are analyzed by the actual intervention received. If subjects were in the active treatment group
but did not take active treatment, the data would be analyzed as if they were in the placebo group.
Summary that uses explicit methods to perform a comprehensive literature search, critically appraise it,
and synthesize the world literature on a specific topic
Systematic review that uses mathematical/statistical techniques to summarize the results of the evaluated studies
Absolute and Relative Differences
Absolute differences or absolute changes
Relative differences or relative changes
Absolute differences are more important than relative differences, although the authors of many clinical studies highlight relative differences because they are numerically larger. Why? Larger numbers are more convincing to practitioners and patients. Most drug advertisements (both directly to patients and to health care professionals) quote relative differences.
Number Needed to Treat (NNT)
Another means to characterize changes or differences in absolute risk. Definition: The reciprocal of the absolute risk reduction (ARR)
NNT = 1/(ARR).
Rounding up to the next whole number is the most conservative approach
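A minimal sketch of the NNT calculation, with hypothetical event rates:

```python
import math

# NNT = 1/ARR, rounded UP to the next whole number (the most
# conservative approach). Event rates are hypothetical.

def nnt(control_event_rate, treated_event_rate):
    arr = control_event_rate - treated_event_rate  # absolute risk reduction
    return math.ceil(1 / arr)

# Control event rate 10%, treated event rate 7%: ARR = 0.03,
# 1/0.03 = 33.3, rounded up:
print(nnt(0.10, 0.07))  # 34
```

Rounding up matters: truncating to 33 would overstate the benefit, since 33 treated patients do not quite yield one prevented event.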
Differences in cost among comparable therapies are evaluated
Outcome: Clinical units or cost per unit health outcome (outcome examples: years of life saved, number
of symptom-free days, blood glucose, blood pressure, etc.)
Useful to measure the cost impact when health outcomes are improved
Assigns utility weights to outcomes so the impact can be measured in relation to cost (outcome example: quality-adjusted life-years [QALYs])
Compares outcomes related to mortality when mortality may not be the most important outcome
Cost-Benefit Analysis
Monetary value is placed on both therapy costs and beneficial health outcomes.
Allows analysis of both the cost of treatment and the costs saved with beneficial outcomes
Proportion of True Positives That Are Correctly Identified by a Test; a test with a high sensitivity means that a negative test can rule OUT the disorder.
Sensitivity = TP/(TP + FN)
Proportion of True Negatives That Are Correctly Identified by a Test; a test with high specificity means that a positive test can rule IN the disorder.
Specificity = TN/(TN + FP)
Positive Predictive Value
Proportion of Patients with a Positive Test Result Who Actually HAVE the Disease
Positive predictive value = TP/(TP + FP)
Negative Predictive Value
Proportion of Patients with a Negative Test Result Who Actually DO NOT HAVE the Disease
Negative predictive value = TN/(TN + FN)
Positive likelihood ratio
sensitivity/(1 − specificity)
Negative likelihood ratio
(1 − sensitivity)/specificity
the field of study that evaluates the behavior or welfare of individuals, firms, and markets relevant to the use of pharmaceutical products, services, and programs
Clinical outcomes are changes in biomedical and physical events.
Economic outcomes are changes in the use of resources.
Humanistic outcomes are changes in patient status or quality of life.
Each item is priced separately (e.g., the drug, syringe, catheter, time to prepare, etc.)
Microcosts are time-intensive to collect and can have a broad range of variation, so many providers, such as hospitals, use a technique to estimate these costs, called a cost-to-charge ratio.
An average cost for the sum of the individual items is combined into a cost for a unit of resource (e.g., an intensive care unit [ICU] hour). Aggregation can be a simple average or can entail complex processes.
The charge is the price of the resource unit (the billed amount); cost is what is paid for the components (the amount expended).
Direct medical costs
included in government health care expenditure data
(which is the only aspect included in the GDP as health care)
Drugs (being compared or used to treat adverse effects or failure), devices for administration, devices for monitoring, laboratory values for monitoring, clinic visits, hospital days, ICU hours, surgery, and emergency department and recovery room minutes, labor costs (with benefits) if not included in the previous categories, and other relevant resources
Direct nonmedical costs
Direct nonmedical costs are those that would not be expended in the absence of the disease but
are not considered medical purchases.
These include transportation, child care, special diets (not including medical supplements),
modification of the home, lodging, and meals away from home.
Pharmacoeconomic analyses include only the costs of lost productivity caused by morbidity and mortality.
Cost is lost workdays (hours), which is called absenteeism. The cost of absenteeism may be directly measurable if sick leave is paid or a replacement is used or can be estimated by the cost of not getting the work completed
Randomized controlled trials and piggyback studies
Evidence-based medicine uses a hierarchy in which the randomized controlled trial is considered strong evidence because of its internal validity.
However, a randomized controlled trial usually does not have a primary outcome that is economic.
Often, an economic study will run concurrently, or economic data will be collected while the study is being conducted (called a piggyback study)
Randomized pragmatic studies
To obtain more generalizability (increased external validity) while still producing high-quality evidence that can be applied in practice, randomized pragmatic studies are increasingly being conducted, especially as part of comparative effectiveness research, and usually include economic outcomes
Cohort studies are similar but without the randomization; the lack of randomization can make the study groups very different because the decision to use the intervention of interest may be reserved for specific patients (sicker, wealthier)
Retrospective cohort and case-control economic studies use historical information that was collected not for use in a study but for other reasons, such as health care documentation and reimbursement
A commonly used study design in pharmacoeconomics is modeling, which uses information from multiple sources, including clinical studies, epidemiologic studies, databases (including census and claims data), and electronic medical records, to create an electronic road map of the disease being studied
Decision analysis model
Decision tree analysis is a visual road map to the range of outcomes
from the comparison of interventions. The decision tree is a simplification of the major components leading to the outcome. Decision trees create a linear progression to the outcome. For many decisions, this simplistic model can give reasonable results for making a decision
Markov model is used when the analysis must consider the complexity of the disease. The differences between this model and the decision tree include the inclusion of the concept of time and the ability to revert to an earlier health state, not just linear progression.
Similar to Markov models, Bayesian models use existing data to predict the probability of outcomes in the future. The main difference is that Bayesian models can do this even when important data elements are missing or inconclusive.
Discrete event simulation
This newer, more naturalistic model may be used more in the future. Discrete event simulation does not mandate mutually exclusive events (or states) or fixed time cycles.
Monte Carlo technique:
The results from the models are produced using a hypothetical group of subjects (cohort) and running a sample of the cohort through the model. The probabilities approach the true outcomes as the number of samples run approaches infinity.
A sample size is selected (e.g., 1000 subjects) and run through the model. Two bootstrapping techniques can be used: "with replacement," where an individual may be included in multiple samples, or "without replacement," where the individuals in each sample are unique
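The two sampling approaches can be sketched with the standard library; the cohort here is a hypothetical list of subject IDs.

```python
import random

# Sketch of sampling "with replacement" vs. "without replacement",
# drawing samples of 100 from a hypothetical cohort of 1,000 subjects.
random.seed(42)  # fixed seed so the run is reproducible
cohort = list(range(1000))

# "With replacement": an individual may be drawn more than once.
sample_with = [random.choice(cohort) for _ in range(100)]

# "Without replacement": individuals within a sample are unique.
sample_without = random.sample(cohort, 100)

print(len(sample_with), len(sample_without))  # 100 100
print(len(set(sample_without)) == 100)        # True: no duplicates
```

`random.sample` guarantees uniqueness within a draw, while repeated `random.choice` calls do not, which is exactly the with/without-replacement distinction described above.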
Jackknifing samples from the population by randomly deleting samples, without replacement, from the original population. The model is run until the population is depleted.
Unless all data elements are actually collected from the same patients, costs and probabilities of outcomes are estimates, and a concept called sensitivity analysis is used to test the strength of those estimates
One-way (univariate) sensitivity analysis varies one estimate at a time while all others are held constant. A graph, often called a tornado diagram, is created, with the spread of valuations for each estimate being presented; the estimate showing the most effect is placed at the top.
Scenario (multivariate) analysis varies two or more estimates at the same time (creating scenarios).
Threshold analysis changes one or more estimates until the decision changes.
Probabilistic sensitivity analysis draws randomly from the distribution of the estimate, running the model multiple times.
When the costs and benefits occur over a period that exceeds 1 year, economic theory states that discounting must be done. This concept is based on the preference of society (and individuals) for paying later and receiving benefits now. Controversies exist about the choice of discount factor and whether the costs and benefits should be discounted at the same rate.
Discount factor = 1/(1 + r)^t, where r is the discount rate and t is the number of years in the future
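Applying the discount factor to a future amount gives its present value; the dollar amount, rate, and horizon below are hypothetical.

```python
# Present value of a future cost or benefit:
# PV = amount / (1 + r)**t. Values are hypothetical.

def present_value(amount, rate, years):
    return amount / (1 + rate) ** years

# $1,000 of benefit received 3 years from now, at a 3% discount rate:
print(round(present_value(1000, 0.03, 3), 2))  # 915.14
```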
Cost of illness
If outcomes are not the output of the analysis, the evaluation is considered a partial analysis. Usually, this would result in a list of costs and is considered a cost analysis.
Cost consequence analyses often are a list of costs and of consequences without direct comparisons; however, some researchers consider cost consequence analyses a version of cost-benefit analyses.
Intended to assess the burden of illness, or of a specific illness, on society. Whether COI studies are full or partial analyses is debatable because COI studies do not separate out alternatives, nor are outcomes evaluated (although costs of outcomes are included). These analyses often provide a framework for full evaluations such as cost-benefit and cost-effectiveness analyses. Cost-of-illness studies do not measure the effectiveness or efficiency of resource use, and the many methods used to generate them make comparisons very difficult.
Cost Effectiveness Analysis
Compares the relevant costs and outcomes (benefits or conse-
quences) of two competing therapies, with costs presented in monetary units and outcomes in their natural units (units of effectiveness).
Cost-benefit analysis measures both costs and benefits in monetary units. This allows comparisons of alternatives across different health interventions or programs and between health and other social projects. Benefits are often costs averted, so care must be taken not to double-count one as the other.
The cost-effectiveness ratio is expressed as cost per case cured, cost per life-year saved, or similar ratios. The decision is to choose the lowest cost per unit of effectiveness. The advantage of this method is that value (health outcome per dollar spent) can be directly presented. The disadvantage is that only alternatives with the same outcome can be compared.
When a single unit of resource outcome is used, an appropriate "sample size" might be the number needed to treat or the number needed to harm (depending on what the outcome is measuring), in that gain of one benefit or prevention of one harm for the comparator would be the basis of the sample size selection.
Another difference, in addition to the units of outcome, between cost-benefit and cost-effectiveness is that with cost-benefit, any positive net benefit is selected. With cost-effectiveness, a "worth" or minimum value has to be established, and the cost-effectiveness ratio has to be equal to or less than the threshold amount, which is analogous to a willingness to pay (WTP) from the payer's perspective.
Average cost-effectiveness ratio
The average cost-effectiveness ratio is the total costs of one alternative divided by the effectiveness for those costs; this is independent of other alternatives
Incremental cost-effectiveness ratio
The incremental cost-effectiveness ratio (ICER) is the ratio of the difference in costs between the intervention and the comparator to the difference in units of outcome (analogous to a net cost-effectiveness); this is the result with the most meaning because it provides the cost per one additional unit of outcome
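A minimal sketch of the ICER calculation, with hypothetical costs and outcomes:

```python
# ICER = (cost_new - cost_comparator) / (effect_new - effect_comparator):
# the extra cost per one additional unit of outcome. Values are hypothetical.

def icer(cost_new, cost_old, effect_new, effect_old):
    return (cost_new - cost_old) / (effect_new - effect_old)

# New therapy: $50,000 and 4.5 QALYs; comparator: $20,000 and 4.0 QALYs
print(icer(50_000, 20_000, 4.5, 4.0))  # 60000.0 -> $60,000 per QALY gained
```

The resulting $60,000 per QALY gained would then be compared against the payer's willingness-to-pay threshold.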
Marginal: Incremental marginal cost-effectiveness compares an intervention with itself (e.g., two doses of the drug or two sequential laboratory tests). A threshold is still needed to determine the cost-effectiveness.
Value of information: Another variant in health care economics is the value of information. This technique determines a value for perfect information (i.e., no uncertainty exists for any parameter). If the cost to conduct more research is less than this calculated value, a study should be conducted.
The complete cost-utility analysis is the most expensive economic analysis technique because of the time required from both researchers and subjects to collect the utilities. It should be used only when quality of life is the outcome of interest or is one of the outcomes of interest
QALY: The QALY is a function of quality multiplied by quantity of life, which are independent. The life-year is simply the change in survival (the measure of mortality)
Utilities: Health is the construct (surrogate) for being able to participate in life at the level desired; other non-health-related aspects of quality of life are not considered here.
Utility determination: Although multi-attribute utility instruments (also called MAUIs), such as the EQ-5D, HUI-3, or SF-6D, are being used more often, three direct methods of determining utilities are commonly used. A MAUI must be validated against one of the direct methods, usually standard gamble, and may include a state worse than death (range −1.0 to 1.0).
Standard gamble: A probability (p) of full health is presented (with death being 1 − p)
Time trade-off assumes that people with a loss of health would be willing to give up part of their life span to live in perfect health.
The rating scale (usually a visual analog scale similar to the 100-mm pain scale, where the distance between each millimeter mark is equal) has been used, but because it does not offer a choice between two alternatives, it has been considered an indirect technique similar to PROs rather than a utility generator. The rater places a mark at the point where he or she believes the scenario belongs relative to several other scenarios, with the top being "full health" and the bottom being "death."
• Perfect Health = 1.0
• Dialysis center for 1 month = 0.85
• Ambulatory dialysis for 8 years = 0.65
• Kidney transplant = 0.58
• Dialysis center for 8 years = 0.56
• Ambulatory dialysis for life = 0.40
• Dialysis center for life = 0.32
• Death = 0.0
Questions (QALY = utility × time in state)
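The QALY formula above can be applied across a sequence of health states; the utilities come from the example list, and the patient path is hypothetical.

```python
# QALY = utility x time in state; total QALYs sum over the states a
# patient passes through. Utilities are from the example list above;
# the patient path is hypothetical.

def qalys(states):
    """states: iterable of (utility, years_in_state) pairs."""
    return sum(utility * years for utility, years in states)

# 8 years of ambulatory dialysis (0.65), then 2 years after a kidney
# transplant (0.58):
print(round(qalys([(0.65, 8), (0.58, 2)]), 2))  # 6.36
```

So 10 calendar years in these states are worth 6.36 quality-adjusted years.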
PATIENT-REPORTED OUTCOMES (PRO)
"any report of the status of a patient's health condition that comes directly from the patient, without interpretation of the patient's response by a clinician or anyone else."
Done using questionnaires (also called instruments, surveys, tools, measures, or tests). These instruments evaluate patients' ability to function in various areas of their lives and reflect their experiences with their disease and their care. The questionnaires can contain one question or many; the Medical Outcomes Study: Measures of Quality of Life Core Survey (MOS), used in a 1989 landmark study, consisted of 116 items
The main test of reliability is using Cronbach α to test internal consistency.
The test is really only useful if the items are reflective indicators
Tested over time (longitudinal or test-retest reliability) to determine whether the instrument results are the same if no change occurred, or shifted in the correct direction if a change occurred
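The Cronbach α internal-consistency test mentioned above can be sketched from first principles; the item scores below are hypothetical Likert responses (rows = subjects, columns = items), and values near 1 indicate high internal consistency.

```python
# Cronbach alpha = k/(k-1) * (1 - sum(item variances) / variance(totals)),
# where k is the number of items. Scores are hypothetical.

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)  # sample variance

def cronbach_alpha(scores):
    k = len(scores[0])  # number of items
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical responses: 5 subjects x 3 items on a 1-5 scale
scores = [
    [4, 5, 4],
    [3, 3, 4],
    [5, 5, 5],
    [2, 3, 2],
    [4, 4, 5],
]
print(round(cronbach_alpha(scores), 3))  # 0.918
```

Here the three items rise and fall together across subjects, so α is high; unrelated items would drive it toward zero.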
For an instrument to be determined valid, it has to be tested in large sample sizes for several components of validity. Even a validated instrument for one group has to be revalidated for another group. This must also be done if the method of administration is changed (e.g., from pen-and-paper to computer-assisted)
The simplest type of validity is face validity, or the appearance of validity obtained by examining the questions.
Constructs (latent traits or factors) for humanistic outcomes (e.g., HRQoL) use the survey items in a manner similar to surrogates for clinical outcomes
Criterion validity compares the outcome of the item with another way of measuring the same construct that has been validated and is, preferably, the gold standard for that measurement (this is difficult because various definitions for HRQoL exist)
Discriminant validity often uses known groups, those with the disease and those without or different age groups, to measure whether the instrument can make this discrimination.
Determined using factor analysis, a statistical technique to assess convergent and divergent validity.
Original articles that have not been interpreted, condensed, or evaluated (except by peer review) by others
Index or abstract of the primary literature and tertiary literature found in journals, with the goal of
directing the user to the primary literature.
MEDLINE EMBASE PubMed Google Scholar IDIS
Journal watch LexisNexis BIOSIS
Cochrane Library Current Contents CINAHL
Established knowledge or consensus of opinion; works that summarize, discuss, criticize,
etc., the primary literature
General textbooks (e.g., Pharmacotherapy, Applied Therapeutics, Briggs' Drugs in Pregnancy and Lactation, Meyler's Side Effects of Drugs)
General product information (e.g., American Hospital Formulary Service, Drug Facts and Comparisons, Physicians' Desk Reference, Drug Information Handbook, Clinical Pharmacology, UpToDate)
Electronic textbooks and databases (McGraw-Hill ACCESS Pharmacy, STAT!Ref, Lippincott Health, etc.)
IRB Common Rule
Obtaining and documenting informed consent
b. IRB membership, function, operations, review of research, and recordkeeping
c. Additional protections for certain vulnerable research subjects: Pregnant women, prisoners,
children, individuals with impaired capacity
d. Ensuring compliance by research institutions
At least five members:
ii. Scientific member
iii. Nonscientific member
iv. Layperson unaffiliated with the institution
Information must be presented to the individual (or representative) to enable that person to make a voluntary decision to participate as a research subject.
i. Description of any reasonably foreseeable risks or discomforts
ii. Description of any benefits to the subject or to others that may reasonably be expected
iii. Disclosure of appropriate alternative procedures or courses of treatment, if any
iv. Statement describing the extent, if any, to which confidentiality of records identifying the subject will be maintained
v. For research involving more than minimal risk, an explanation about whether any compensation, and an explanation about whether any medical treatments, will be available if injury occurs
vi. Contact information for answers to questions about the research and research subjects' rights; whom to contact if the subject has a research-related injury
vii. A statement that participation is voluntary; refusal to participate will involve no penalty or loss of benefits, and the subject may discontinue participation at any time without penalty
Informed consent may be waived if the research involves no more than minimal risk and could not practicably be carried out without the waiver
HIPAA Protected Information
All geographic subdivisions smaller than a state, including street address, city, county, precinct, and zip code, and their equivalent geocodes, except for the initial three digits of a zip code, if one of the following applies according to the current publicly available data from the Bureau of the Census:
a. The geographic unit formed by combining all zip codes with the same three initial digits contains
more than 20,000 people, and
b. The initial three digits of a zip code for all such geographic units containing 20,000 or fewer people
is changed to 000.
All elements of dates (except year) for dates directly related to an individual, including birth date,
admission date, discharge date, and date of death; and all ages older than 89 years and all elements of dates (including year) indicative of such age, except that such ages and elements may be aggregated into a single category of age 90 years or older
Electronic mail addresses
Social Security numbers
Medical record numbers
Health plan beneficiary numbers
Certificate and license numbers
Vehicle identifiers and serial numbers, including license plate numbers
Medical device identifiers and serial numbers
Internet universal resource locators
Internet protocol (IP) addresses
Biometric identifiers (fingerprints and voiceprints)
Full-face photographic images and comparable images
Any other unique identifying number, characteristic, or code (may assign a code for de-identified
information to be re-identified)