23_Stats/ DI/ study design
Terms in this set (176)
Con of tertiary literature?
lag time for updates;
interpretation is dependent on author opinion;
often incomplete
Tertiary literature
works that summarize, discuss, criticize, etc.,
the primary literature
- guidelines, texts, ACCESS pharmacy
Secondary literature
Indexing and abstracting services for primary and tertiary literature found in journals; provide a rapid method to search for primary literature
- Medline, PubMed
Evidence-based medicine?
The conscientious, explicit, and judicious use of current best evidence in making decisions about the care of individual pts, integrating clinical experience with the best available evidence from a systematic search
Criticisms of EBM?
cookbook medicine - reduced clinician autonomy; too difficult to apply to individuals; limited data to suggest improved care
Quality improvement vs research definitions
if the results of a project are presented outside an organization, it is defined as research, if only used internally, and not meant to contribute to generalized knowledge, then QI
Nuremberg code?
1947, Germany; subjects need to give informed consent and benefits must outweigh risks
Helsinki Code?
Governs research ethics, defines rules, basis for good clinical practices used today
Tuskegee syphilis study?
increased pt risks, heightened awareness of the need to protect human subjects and to ensure their informed voluntary consent
Belmont report?
summarized basic ethical principles identified in deliberations
Code of federal regulations (CFR) DHHS title 45 part 46 protection of human subjects
none
When would you get expedited IRB review?
minimal risk to participants, or a minor change to a previously approved study
HIPAA Title II: preventing health care fraud and abuse; administrative simplification; medical liability reform
privacy rules, transaction and code set rules, security rules, enforcement rules, unique identifier rule (NPI)
Can you use or share pt information?
work-related: trmt, payment, healthcare operations, authorized by the pt
mean
average
very sensitive to outliers
median
midpoint (50th percentile)
good for skewed populations
mode
most common value
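A minimal sketch of the three measures above, using Python's standard `statistics` module. The length-of-stay values are hypothetical, chosen only to show how a single outlier pulls the mean but not the median.

```python
import statistics

# Hypothetical lengths of stay (days); the last value is an outlier.
stays = [2, 3, 3, 4, 5, 30]

mean_stay = statistics.mean(stays)      # average; sensitive to the outlier
median_stay = statistics.median(stays)  # midpoint (50th percentile)
mode_stay = statistics.mode(stays)      # most common value

print(mean_stay, median_stay, mode_stay)
```

With skewed data like this, the median stays near the bulk of the values while the mean is dragged toward the outlier.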
standard deviation
variability about the mean
Applied to continuous data that are normally or near-normally distributed
About 68% of the sample values are within ±1 SD, 95% are within ±2 SD, and 99.7% are within ±3 SD.
parametric
normally distributed data
Internal validity
Is study design legit to come to its conclusions?
External validity
Can you generalize the study results outside the study setting?
Benefit of a case report/case series
allows hypothesis generation (e.g., fluoroquinolones and QTc prolongation)
Observational study design? Drawback of this method?
Does not involve investigator intervention, only observation; these studies investigate associations, not causes
Case-control study (or retrospective study)
Study the exposure or what happened in a grp that has the outcome of interest (Ex. lung cancer, did they work in oil fields)
Minimize bias in case control study?
Cases and controls are selected in the same way
Measure of association in a case-control study?
OR - odds of exposure to a factor in those with a condition or disease compared to those who do not have condition
An OR of 1 indicates no difference in risk; if the CI includes 1, there is no statistically significant difference.
Cohort (another observational design) definition?
Determine association between exposure/factors and a disease/condition.
Have study sample, some have a risk factor and others don't, some have the outcome being looked at and some don't in each grp
Cross-sectional study?
Prevalence study, identify prevalence or characteristics of a condition in a grp of individuals (how many pregnant women are using headache medication)
Incidence?
Measure of the probability of developing a disease
Number of new cases per population in specified time (divide number of individuals who develop a disease during a time period by the number of individuals who were at risk)
Prevalence?
number of individuals who have condition/disease at any given time
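A short sketch of how incidence and prevalence might be calculated. All counts are hypothetical, and expressing prevalence as a proportion of the surveyed population is an assumption made for illustration.

```python
# Hypothetical counts for one year of follow-up.
at_risk_at_start = 1000  # disease-free individuals at risk during the year
new_cases = 25           # individuals who developed the disease that year
existing_cases = 80      # individuals with the disease on the survey date
population = 1080        # everyone surveyed on that date (assumed)

incidence = new_cases / at_risk_at_start   # new cases per person at risk
prevalence = existing_cases / population   # proportion with disease right now

print(f"Incidence: {incidence:.1%} per year")
print(f"Prevalence: {prevalence:.1%}")
```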
Null hypothesis (H0)
No difference between grps
Hypothesis testing determines whether the data are consistent with H0 (no difference).
Alternative hypothesis (Ha)
states that there is a difference
level of significance
level of acceptable error caused by a false positive
- α usually 0.05.
Parametric tests
- student t-test
- ANOVA
- Linear regression
Student t-test
- One-sample test: Grp 1 vs known population mean
- Two-sample, independent samples, or unpaired test: Grp 1 vs Grp 2
- Paired test: Measure 1 vs 2 in grp 1
ANOVA
generalized version of the t-test that can apply to > 2 grps
Non-parametric tests
Used for continuous data that do not meet the assumptions of the t-test or ANOVA.
- Wilcoxon rank sum + Mann-Whitney
Nominal data
- Chi-square test - 2+ grps
- Fisher exact test - small samples (expected cell count < 5)
Alpha error
Type 1 error
- probability of making error = α
- incorrect rejection of true null hypothesis ("false positive")
- can only be considered if statistical difference is found
- α = 0.05: when H0 is true, a type I error will occur 1 in 20 times. So, 5% of the time, you will conclude that there is a statistically significant difference when one does not actually exist.
- the "p-value" is the calculated probability of observing a difference at least as large as the one found if H0 is true (often loosely described as the chance that a type I error has occurred)
Beta error
Type II error
- probability of making error = β
- failure to reject false null hypothesis ("false negative")
- convention to set β between 0.20 and 0.10
Power
(1 − β)
- ability to detect differences between grps if one actually exists
- α: risk of error you will tolerate when rejecting H0
Regression
- statistical technique related to correlation
- how well the independent variable predicts the dependent variable
Coefficient of determination (r2)
- range from 0 to 1.
- r2 of 0.80 = 80% of the variability in Y is "explained" by the variability in X.
Calculate OR/RR?
= (odds or risk in experimental grp) / (odds or risk in control grp)
- RR: incidence of disease in exposed group divided by incidence of disease in unexposed group
RR < 1
RR > 1
RR of < 1 = negative association - event less likely to occur in the experimental grp than in the control.
RR of > 1 = positive association - event more likely to occur in the experimental grp.
RR and OR of 0.75?
25% reduction in the risk/odds
RR and OR 1.5?
50% increase in risk/odds
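The interpretation of RR and OR can be sketched from a 2x2 table. The function names and the counts below are hypothetical and only illustrate the arithmetic.

```python
def relative_risk(a, b, c, d):
    """RR from a 2x2 table.
    a, b = events / non-events in the experimental grp
    c, d = events / non-events in the control grp
    """
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """OR from the same table: odds in experimental grp / odds in control grp."""
    return (a / b) / (c / d)


# Hypothetical counts: 30 of 100 pts with an event on trmt, 40 of 100 on control.
rr = relative_risk(30, 70, 40, 60)
odds = odds_ratio(30, 70, 40, 60)
print(f"RR = {rr:.2f}, OR = {odds:.2f}")
```

Here the RR is 0.75 (a 25% reduction in risk) while the OR is about 0.64; the two measures diverge when the event is common.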
Absolute vs relative difference
Absolute differences are more important than relative differences, although the authors of many clinical studies highlight the differences observed in their trials with relative differences because they are larger.
Among high-risk pts in trial 1, the event rate in the control group (placebo) is 40 per 100 pts, and the event rate in the trmt group is 30 per 100 pts.
ARR (risk difference) is the simple difference in the event rates (40% - 30% = 10%).
RRR is the difference between the event rates in relative terms. Here, the event rate in the trmt group is 25% less than the event rate in the control group (i.e., the 10% absolute difference expressed as a proportion of the control rate is 10/40 or 25% less).
Absolute risk reduction (ARR)
Absolute Risk of Control - Absolute Risk of Active Group (expressed as a Percentage)
Relative risk reduction (RRR)
RRR = (difference between the 2 grps) / (untreated grp rate), expressed as a ratio of 2 percentages
NNT
= 1/ARR
- applied to clinical outcomes with dichotomous data (yes/no, alive/dead, MI/no MI)
- provided only for significant effects because it is difficult to interpret the CIs for nonsignificant results.
- number of pts to whom a clinician would need to administer a particular trmt to prevent 1 pt from having an AE over a defined period of time.
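A minimal sketch of ARR, RRR, and NNT using the trial 1 event rates quoted above (40 per 100 with placebo, 30 per 100 with trmt). `Fraction` is used only to keep the arithmetic exact, and rounding the NNT up to a whole pt is a common convention assumed here.

```python
from fractions import Fraction
import math

control_rate = Fraction(40, 100)    # placebo: 40 events per 100 pts
treatment_rate = Fraction(30, 100)  # trmt: 30 events per 100 pts

arr = control_rate - treatment_rate  # absolute risk reduction
rrr = arr / control_rate             # relative risk reduction
nnt = math.ceil(1 / arr)             # number needed to treat, rounded up

print(f"ARR = {float(arr):.0%}, RRR = {float(rrr):.0%}, NNT = {nnt}")
```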
Kaplan-Meier method
Uses survival times to estimate the proportion of people who would survive a given length of time under the same circumstances
sensitivity
Proportion of true positives correctly identified
specificity
Proportion of true negatives correctly identified
Log-rank test
Compare survival distributions between 2+ grps
Nominal
2 independent samples: χ² or Fisher exact test
2 related samples: McNemar test
> 2 independent samples: χ²
Ordinal
2 independent samples: Wilcoxon rank sum, Mann-Whitney
2 related samples: Wilcoxon signed rank
> 2 independent samples: Kruskal-Wallis
Continuous (no factors)
2 independent samples: t-test
2 related samples: paired t-test
> 2 independent samples: ANOVA
Continuous variable
infinite number of values possible
Two types of continuous variables?
interval (change in units is consistent)
+
ratio (similar to interval, except have an absolute zero)
Discrete variable? 2 types
dichotomous or categorical
Nominal variable?
discrete variables classified into grps with no particular order (sex and mortality)
Ordinal variable?
discrete variable allowing rank but without a consistent size in between (pain score)
Range? IQR?
distance between min and max, distance between 75th and 25th percentile (middle 50%)
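Range and IQR can be sketched with the standard `statistics` module. The scores are hypothetical, and different software uses slightly different quartile methods, so cut points may differ a little from other tools.

```python
import statistics

# Hypothetical scores, listed in order for readability.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]

data_range = max(scores) - min(scores)  # distance between min and max

# quantiles(n=4) returns the 25th, 50th, and 75th percentile cut points.
q1, q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1                           # spread of the middle 50%

print(data_range, q1, q3, iqr)
```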
Test for nominal, 2 sample, independent data
χ²
Test for nominal, 2 sample, related data
McNemar
Test for nominal >2 sample, independent data
χ²
Test for nominal >2 sample, related data
Cochran Q
Test for ordinal, 2 sample, independent data
Wilcoxon rank sum or Mann-Whitney U
Test for ordinal, 2 sample, related data
Wilcoxon signed rank or sign test
Test for ordinal, >2 sample, independent data
Kruskal-Wallis
Test for ordinal, >2 sample, related data
Friedman ANOVA
Test for continuous, 2 sample, independent data
Unpaired, 2-sample, t-test
Test for continuous, 2 sample, related data
Paired t-test
Test for continuous, >2 sample, independent data
ANOVA
Test for continuous, >2 sample, related data
Repeated-measures ANOVA
Calculate NNT
1/ARR
Calculate RRR
(Control - Active) / Control = ARR / Control
How to control for confounders
Collect as much data as possible
Reported result for case-control trial
Odds-ratio
An OR whose CI includes 1 is non-significant
Reported result for cohort trial
Relative risk
An RR whose CI includes 1 is non-significant
Prevalence
Measures the rate of disease as a 'snapshot' in time
Incidence
Measures the rate of disease development
Expressed per person-time (e.g., cases per person-year)
Does OR and RR measure cause or association?
Association
Most common type of equivalence trial
Bioequivalence
In order to detect a smaller difference in trmts, does sample size need to be larger or smaller
Larger
Common tests for meta-analysis heterogeneity
χ² or Cochran Q
Sensitivity
Proportion of true positives that are correctly identified
Specificity
Proportion of true negatives that are correctly identified
Positive predictive value
Proportion of pts with a positive test that are given a correct diagnosis
Negative predictive value
Proportion of patients with a negative test that are given a correct diagnosis
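The four diagnostic-test measures above can be sketched from a 2x2 table of test results against a reference standard. The counts are hypothetical.

```python
# Hypothetical results against a reference standard.
tp, fn = 90, 10  # diseased pts: test positive / test negative
tn, fp = 80, 20  # non-diseased pts: test negative / test positive

sensitivity = tp / (tp + fn)  # true positives correctly identified
specificity = tn / (tn + fp)  # true negatives correctly identified
ppv = tp / (tp + fp)          # positive tests that are correct
npv = tn / (tn + fn)          # negative tests that are correct

print(f"Sens {sensitivity:.0%}, Spec {specificity:.0%}, "
      f"PPV {ppv:.0%}, NPV {npv:.0%}")
```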
Test able to form predictive models
Regression analysis
r2 ranges from 0 to 1
Test able to form association between variables
Correlation
Results range from -1 to 1
Type of data for Pearson correlation
Continuous
Type of data for Spearman correlation
Ordinal or non-normally distributed continuous data
Measure of correlation
r value (range -1 to 1)
Measure of regression
r2 value (range 0-1)
y=mx+b
y=slope(x)+intercept
Typical post hoc tests
Tukey HSD
Bonferroni
Scheffé
Newman-Keuls
Discrete Variables
Can only take a limited number of values within a given range.
1. Nominal
2. Ordinal
Nominal Data
Discrete Variable
Data of categories only, unordered, with no indication of relative severity. Data cannot be arranged in an ordering scheme (Gender, Race, Religion, mortality, disease state)
Ordinal Data
Discrete Variable
Placed in categories and rank ordered, distance between categories may not be the same. Ex: pain is low, moderate, high
Continuous Variables
Variable that can have unlimited number of possible values; can take on any value within a given range. Ex.
1. Interval
2. Ratio
Interval Data
Continuous Variable
Data are ranked in a specific order with a consistent change in magnitude between units; zero point is arbitrary (degrees Fahrenheit)
Ratio Data
Continuous Variable
Interval data with an absolute zero (degrees Kelvin, HR, BP, time, distance)
Standard Deviation
A measure of variability that indicates the average difference between the scores and their mean. Use only for continuous data that are normally distributed.
+/- 1 SD = about 68% of sample values, +/- 2 SD = 95% of sample values, +/- 3 SD = 99.7% of sample values
Coefficient of Variation
Measure of dispersion calculated by dividing a distribution's standard deviation by its mean: CV = (SD / mean) × 100%
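A brief sketch of the coefficient of variation, assuming the sample SD is the one intended. The measurements are hypothetical.

```python
import statistics

values = [8, 10, 12]  # hypothetical measurements

sd = statistics.stdev(values)  # sample standard deviation
mean = statistics.mean(values)
cv_percent = sd / mean * 100   # SD expressed as a percentage of the mean

print(f"CV = {cv_percent:.0f}%")
```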
IQR interquartile range
The difference between the 25th and 75th percentiles; contains 50% of data
Normal Distribution
Gaussian dist. Approx distribution of scores expected when a sample is taken from a large population, drawn as a frequency polygon that often takes the form of a bell-shaped curve, called the normal curve. Mean and median will be about equal.
Parametric Data
Data has a tendency to group together, to have a common mean from which they individually vary. Normally distributed data are termed parametric.
Standard Error of the Mean
An estimate of the uncertainty in a sample mean. If the mean is 10 and the standard error of the mean is 2, then the true mean is likely to fall somewhere between 8 and 12 (10 +/- 2). Can be estimated by dividing the SD by the square root of n (sample size): SEM = SD / √n
Confidence Interval
A range of numbers that encompasses the value that would be obtained if an experiment was performed many times (necessary because the value might change slightly each time). Calculated as mean - Z × SEM to mean + Z × SEM; for a 95% CI (α = 0.05), Z = 1.96. Ex. Baseline birth weight in a group with mean ± SD of 1.18 ± 0.4 kg. 95% CI ≈ mean ± 1.96 × SEM = (1.07, 1.29), meaning there is 95% certainty that the true mean weight of the entire population studied is between 1.07 and 1.29 kg.
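The birth-weight example above can be reproduced with a short sketch. The sample size is not stated in the card; n = 51 is an assumption chosen because it yields the quoted interval.

```python
import math

mean_kg = 1.18  # from the example above
sd_kg = 0.4     # from the example above
n = 51          # ASSUMED: not given in the source; chosen to reproduce the CI
z = 1.96        # Z for a 95% CI

sem = sd_kg / math.sqrt(n)  # standard error of the mean
lower = mean_kg - z * sem
upper = mean_kg + z * sem

print(f"SEM = {sem:.3f} kg, 95% CI = ({lower:.2f}, {upper:.2f})")
```

A larger n shrinks the SEM and therefore narrows the interval.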
Nonparametric test
counted or ranked data, nominal or ordinal level, -Assumes no normal distribution
-do not necessarily require interval level data
-are statistical methods that do not make assumptions about population
-assume randomization
*chi-square
Null Hypothesis
No difference between two or more sets of data. Stating opposite of what you expect to find. If the null hypothesis is rejected, a statistically significant difference exists (unlikely attributable to chance)
Alternative Hypothesis
There is a difference between two or more sets of data.
Parametric test
a statistical test based on mathematical derivations that include assumptions about parameters of populations from which the samples were drawn. The t test, ANOVA, and tests involving the Pearson correlation are parametric, - statistical significance tests that make assumptions about population from which the data were drawn
-Assumes the variable's normal distribution in the population (i.e., the normal curve)
*t-test, z-test, f-test/ANOVA, paired t-test, etc.
Student t-test
To compare 2 groups. Appropriate when researchers randomly assign participants to the research groups. A statistical test that compares 2 group means or compares a group mean to a population mean to assess whether the differences between means are reliable.
1) the repeated measures t-test (paired t-test)—to compare two sets of scores in which the scores are paired, as in a before and after design. LDL today vs 3-mos later
2) Independent Groups T-test (2 sample, unpaired): Lets us compare two groups where there is no connection between the scores. women LDL vs men LDL
3) One sample t-test - Compare a single group's mean to the population mean. LDL in a group vs national avg.
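As a rough illustration of the paired t-test (case 1 above), the statistic can be computed by hand as the mean difference divided by its standard error. The LDL values are hypothetical.

```python
import math
import statistics

# Hypothetical LDL values (mg/dL) for the same 4 pts, today and 3 months later.
ldl_today = [130, 140, 150, 160]
ldl_later = [120, 135, 140, 150]

# Paired design: analyze the within-pt differences.
diffs = [before - after for before, after in zip(ldl_today, ldl_later)]
mean_diff = statistics.mean(diffs)
se_diff = statistics.stdev(diffs) / math.sqrt(len(diffs))

t_stat = mean_diff / se_diff
df = len(diffs) - 1

print(f"t = {t_stat:.2f}, df = {df}")
```

The statistic is then compared with the critical t value for the degrees of freedom (3.182 for df = 3 at a two-sided α of 0.05).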
Paired t-test
A test for differences in the means of paired samples (related populations)
One sample t-test
Used to determine if a single sample mean is different from a known population mean
Two sample t-test
1. Samples are Independent
2. Data in each sample are independent
3. Both populations are Normal
ANOVA
analysis of variance - a test for the significance of differences among three or more means; parametric.
One way ANOVA
an inferential statistical test for comparing the means of three or more groups using a between-participants design and one independent variable. Ex Group 1, Group 2, Group 3
Two way ANOVA
Hypothesis test used when there are two independent variables used to create groups;
- one continuous dependent variable is measured and a mean is computed for each group. Ex Young age Group 1 Group 2 Group 3 vs Old age Group 1 Group 2 Group 3
Repeated Measures ANOVA
Statistical test that allows a researcher to determine if differences in the same interval/ratio variable occur over three or more measurements of the variable. Group 1 - Measurement 1, Measurement 2, Measurement 3
Wilcoxon Rank and Mann Whitney U-test
Statistical test used for evaluating ordinal and continuous data. Nonparametric test. Compares 2 independent samples. (related to t-test)
Sign test or Wilcoxon signed rank test
The sign test ignores the magnitude of change and considers only its direction; the Wilcoxon signed rank test also ranks the magnitudes.
Count patients improved, compare to probabilities according to chance. Compares 2 matched or paired samples. (related to paired t test)
Chi square test
For nominal data - compares frequencies or proportions between groups to see if they're different. Compares expected and observed proportions between two or more groups. Ex. 80% men and 20% women in one group versus 70% men and 30% women in a second group.
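A minimal sketch of the chi-square statistic for the example above. The group size of 100 each is an assumption (the card gives only percentages), the function name is hypothetical, and no continuity correction is applied.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for observed counts (a list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if group and category were unrelated.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    return chi2


# ASSUMED 100 pts per group: 80 men / 20 women vs 70 men / 30 women.
observed = [[80, 20], [70, 30]]
chi2 = chi_square_statistic(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(f"chi-square = {chi2:.2f}, df = {df}")
```

With these assumed counts the statistic falls below 3.84, the critical value for df = 1 at α = 0.05, so the difference would not be statistically significant.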
Fisher Exact test
nonparametric test used instead of chi-square if sample size is small or some cells in contingency table have no observations
Type I error
alpha error - An error caused by rejecting the null hypothesis when it is true; has a probability of alpha. Practically, a Type I error occurs when the researcher concludes that a relationship or difference exists in the population when in reality it does not exist.
p value
The probability of observing an association at least as strong as the one found if chance alone were operating, i.e., if the null hypothesis is true (< 0.05 = statistically significant)
Type II error
beta error (acceptable rate = 0.1 to 0.2) - error of failing to reject a null hypothesis when in fact it is false ("false negative"). You conclude there is no difference or effect when one actually exists.
Statistical power
A measure of how much ability exists to find a significant effect using a specific statistical tool. Mathematically, power is a direct function of the Type II error rate: power = 1 - β
Correlation
Examines the strength of the association between two variables. A statistical measure that indicates the extent to which two factors vary together and thus how well one factor can be predicted from the other. Correlations can be positive or negative.
Regression
Examines the ability of one or more variables to predict another variable. The relation between selected values of x and observed values of y (from which the most probable value of y can be predicted for any value of x). Prediction model = Y=mx+b
Pearson correlation
A correlation statistic used primarily for two sets of data that are of the ratio or interval scale. The most commonly used correlational technique. Usually described as the degree of association between the two variables. Does not imply that one variable is dependent on the other (regression analysis will do that). Ranges from -1 to 1.
Spearman Rank correlation
Nonparametric test that quantifies the strength of an association between 2 variables without assuming normally distributed continuous data. Can be used for ordinal or non-normally distributed continuous data.
r squared
Coefficient of determination - accuracy of prediction. How well the independent variable predicts the dependent variable. Can range between 0 and 1. If r squared = 0.8 then 80% of the variability in Y is "explained" by the variability in X
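A hand-rolled sketch relating Pearson r, r squared, and the regression line y = mx + b for a single predictor. The data are hypothetical.

```python
import math

x = [1, 2, 3, 4, 5]  # hypothetical independent variable
y = [2, 4, 5, 4, 5]  # hypothetical dependent variable

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)

r = sxy / math.sqrt(sxx * syy)        # Pearson correlation, -1 to 1
r_squared = r ** 2                    # coefficient of determination, 0 to 1
slope = sxy / sxx                     # m in y = mx + b
intercept = mean_y - slope * mean_x   # b in y = mx + b

print(f"r = {r:.2f}, r2 = {r_squared:.2f}, y = {slope:.1f}x + {intercept:.1f}")
```

Here r squared is 0.6, so 60% of the variability in y is "explained" by the variability in x.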
Survival analysis
An event-history method used when the dependent variable of interest is a time interval (e.g., time from onset of disease to death).
Kaplan-Meier method
Estimates survival function: uses survival times to estimate the proportion of people who would survive a given length of time under the same circumstances.
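A simplified sketch of the Kaplan-Meier product-limit calculation. The function name and the follow-up data are hypothetical; pts censored at an event time are counted as still at risk at that time, which is the usual convention.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time with >= 1 event.
    events: 1 = event (e.g., death) observed, 0 = censored.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            # Multiply by the proportion surviving past this time.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # events and censored pts both leave the risk set
    return curve


# Hypothetical follow-up (months) for 5 pts; 0 marks a censored pt.
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)

for t, s in curve:
    print(f"t = {t}: S(t) = {s:.2f}")
```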
Log-rank test
test used to compare survival analysis curves between 2 or more groups
Cox proportional hazards model
multivariate survival analysis controlling for other factors. Allows calculation of hazard ratio. (and CI)
Informed consent
informed consent is a process not a form, pt must make a voluntary decision to participate as a research subject
95% CI
95% CI is ~equal to the mean ± 1.96 × SEM (or 2 × SEM).
What is the 95% CI? (1.07, 1.29), meaning there is 95% certainty that the true mean of the entire population studied will be between 1.07 and 1.29 kg.
How do you increase power in a research study?
Increase sample size, decrease variability in measurements among subjects, study an outcome that happens more frequently, increase effect size