BCPS Stats/Study Design
Dependent= outcome of interest, change in response to an intervention
only take a limited number of values within a given range, qualitative values
Two types: nominal and ordinal
Types of random variables
Discrete (nominal, ordinal);
Continuous (interval, ratio)
Nominal variables "named"
categorical or grouped with NO specific order, no indication of relative severity, no numerical or quantitative value.
Ex. sex (M/F), mortality (dead/alive), disease presence (Y/N), race, marital status
ranked in a specific ORDER but no consistent magnitude.
Ex. NYHA class, severity scores, disease classes
take any value within a given range, counting variables, quantitative values
Two types: interval and ratio
values ranked in a specific order with a consistent change in magnitude between units, NO absolute zero
Ex. temperature (F)
values ranked in a specific order with a consistent change in magnitude between units, Absolute zero
Ex. temperature (kelvin), heart rate, BP, time, distance, height, weight
Types of statistics
Descriptive and inferential
Describes numerically/visually the data collected from the study, confined to the study population only.
Ex. mean, median, mode
Average, used for continuous and normally distributed data (not ordinal data), very sensitive to outliers
Midpoint, aka 50th percentile, used for continuous or ordinal data, insensitive to outliers
used for nominal, ordinal, or continuous. can have more than one (bimodal, trimodal). does not describe meaningful distribution with a large range of values.
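The contrast between the three measures of central tendency can be seen in a small sketch. The length-of-stay values below are invented for illustration; only Python's standard `statistics` module is used.

```python
import statistics

# Hypothetical length-of-stay values (days), invented for illustration.
stays = [2, 3, 3, 4, 5]
stays_with_outlier = stays + [50]  # one extreme value added

# The mean shifts markedly when the outlier is added...
mean_before = statistics.mean(stays)
mean_after = statistics.mean(stays_with_outlier)

# ...while the median barely moves.
median_before = statistics.median(stays)
median_after = statistics.median(stays_with_outlier)

# A data set can have more than one mode (bimodal here).
modes = statistics.multimode([1, 1, 2, 3, 3])

print(f"mean:   {mean_before:.2f} -> {mean_after:.2f}")
print(f"median: {median_before} -> {median_after}")
print(f"modes:  {modes}")
```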
measure of variability about the mean; square root of the variance (average squared difference of each observation from the mean). For normally distributed data, about 68% of sample values fall within 1 SD of the mean, 95% within 2 SD, and 99.7% within 3 SD.
Coefficient of variation= SD/mean x100%
Smaller SD means that other values fall closer to the sample mean, whereas values are farther from the sample mean with a larger SD.
SEM (standard error of the mean)
Estimate of the SD of the sample means (SD of multiple sample means, smaller than SD). Quantifies uncertainty in the estimate of the mean, not variability in the sample. Divide the SD by the square root of the sample size (n): SEM= SD/sqrt(n).
Confidence Intervals are based on this value
(95% CI= mean +/- 1.96 x SEM, often rounded to 2 x SEM)
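A minimal sketch of how SD, coefficient of variation, SEM, and an approximate 95% CI relate to one another. The blood pressure readings are hypothetical, and the multiplier 1.96 is the normal-approximation value that the rule of thumb rounds to 2; with a sample this small, a t-distribution multiplier would usually be preferred.

```python
import math
import statistics

# Hypothetical systolic BP readings (mmHg), invented for illustration.
sbp = [118, 122, 130, 126, 134, 120, 128, 124]

n = len(sbp)
mean = statistics.mean(sbp)
sd = statistics.stdev(sbp)   # sample SD (divides by n - 1)
cv = sd / mean * 100         # coefficient of variation, in percent
sem = sd / math.sqrt(n)      # standard error of the mean

# Approximate 95% CI using the normal multiplier 1.96 (an assumption;
# small samples would normally use a t-distribution value instead).
ci_low = mean - 1.96 * sem
ci_high = mean + 1.96 * sem

print(f"mean={mean:.2f} SD={sd:.2f} CV={cv:.1f}% SEM={sem:.2f}")
print(f"95% CI: {ci_low:.2f} to {ci_high:.2f}")
```

The SEM is always smaller than the SD for n > 1, which is why error bars based on SEM look tighter than those based on SD.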
IQR (Interquartile range)
Describes middle 50% of values, 25th-75th percentile
Predictions/inferences about a population as a whole from the study population. Test hypotheses and determine relationships.
Ex. hypothesis testing, regressions, correlations, analysis of variance
Probability of obtaining a result at least as extreme as the one observed if the null hypothesis is true (i.e., by chance alone). When <0.05, there is a <5% probability that the result occurred by chance; the result is statistically significant and the null hypothesis is rejected.
Way to estimate a population parameter, in repeated samples 95% of all CIs include true population value (likelihood that that population value/mean is contained within the interval). Give an idea of the magnitude of the difference between groups and the statistical significance. Give a range and point estimate of the difference.
Confidence level= 1-alpha (alpha= type 1 error rate), e.g., alpha of 0.05 gives a 95% CI.
If encompass 0 for a difference= not statistically significant
If encompass 1 for a ratio= not statistically significant
Wider than 90% CI, the wider the CI the more likely it is to encompass the true population mean
Give an idea of statistical significance but not of magnitude.
Null hypothesis (H0)
No difference between groups (treatment A equals treatment B).
If you reject the null hypothesis= statistically significant difference bt the groups.
If not rejected= No statistically significant difference bt the groups (but not saying they are equal)
Alternative hypothesis (HA)
There is a difference bt groups (treatment A does not equal treatment B).
Alpha= a priori significance level
Developed after the research question has been stated in hypothesis form. Used to determine level of acceptable error caused by the FALSE POSITIVE. Generally 0.05
Type 1 error (alpha error)
Reject Ho (null hypothesis) saying there is a difference when Ho is true (there is no difference). Convention to set alpha at 0.05, so 5% of the time a researcher will conclude that there is a stat sig difference when one does not actually exist. False positive (drug is concluded to be better than placebo when it is not).
Type 2 error (beta error)
Fail to reject Ho (null hypothesis), saying there is no difference when Ho is false (there is a difference). Beta usually set between 0.1 and 0.2. False negative (drug concluded to have no benefit over placebo when it actually has benefit).
Power= 1-beta. Probability of making a correct decision when Ho is false; ability to detect differences if they actually exist. Higher statistical power means a true difference is less likely to be missed (fewer type 2 errors). Generally 80-90% acceptable. Used to determine sample size needed to minimize type 2 errors; too small a sample may not detect differences, and too large a sample may find even small, clinically unimportant differences significant.
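How alpha, power, variability, and the size of the difference to detect drive sample size can be sketched with the common normal-approximation formula for comparing two means, n per group = 2 x [(z for alpha + z for power) x SD / difference]^2. The function name `n_per_group` and the input values are illustrative assumptions, not a substitute for a formal sample size calculation.

```python
import math
from statistics import NormalDist


def n_per_group(sd, delta, alpha=0.05, power=0.80):
    """Approximate subjects per group for a two-sided comparison of two means.

    sd    : assumed common standard deviation (hypothetical input)
    delta : smallest difference between means worth detecting
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided type 1 error
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - beta
    return math.ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


# Raising power from 80% to 90% increases the required sample size.
print(n_per_group(sd=10, delta=5, power=0.80))
print(n_per_group(sd=10, delta=5, power=0.90))
```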
Strength of ASSOCIATION between two variables. Correlation coefficient (r) used, -1= perfect negative linear relationship, 0=no linear relationship, 1=perfect positive linear relationship
Ability of one or more variables to PREDICT another variable. Prediction model (y=mx+b) of continuous variables, dependent and independent; coefficient of determination is r squared (0-1). Ex. r2 of 0.8 means that 80% of the variability in Y is explained by the variability in X.
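A short sketch of correlation (r), the coefficient of determination (r squared), and the least-squares line y=mx+b, written out by hand so each term is visible. The dose and response values, and the helper names `pearson_r` and `fit_line`, are made up for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fit_line(x, y):
    """Least-squares slope (m) and intercept (b) for y = m*x + b."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return slope, my - slope * mx


# Hypothetical dose (independent) and response (dependent) values.
dose = [1, 2, 3, 4, 5]
response = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(dose, response)
slope, intercept = fit_line(dose, response)
print(f"r={r:.3f}  r^2={r * r:.3f}  y={slope:.2f}x+{intercept:.2f}")
```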
Studies time bt entry into a study and some event (MI, death). Censoring accounts for subj who leave the study for reasons other than the event and for subj who do not all enter the study at the same time. Cannot apply standard stat analysis like t-test, linear or logistic regression bc of censoring.
Kaplan-Meier (proportion, censored survival times, visualize the curve but need to test to evaluate them formally), Log-rank test (survival distribution of 2+ groups), Cox proportional hazards model (most popular, for multiple variables, HR and 95% CI).
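The way Kaplan-Meier handles censored survival times can be sketched in a few lines: at each event time, survival is multiplied by (1 - events / number still at risk), and censored subj simply leave the at-risk set without counting as events. The function name `kaplan_meier` and the follow-up data are hypothetical; a real analysis would use a dedicated statistics package.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each event time.

    times  : follow-up time for each subject
    events : 1 if the event occurred at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all subjects whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        # Both events and censored subjects leave the at-risk set.
        at_risk -= n_leaving
    return curve


# Hypothetical follow-up (months); 0 marks a censored subject.
curve = kaplan_meier(times=[2, 3, 3, 5, 8], events=[1, 1, 0, 1, 0])
for t, s in curve:
    print(f"t={t}: S(t)={s:.2f}")
```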
Normal distribution (mean=median), continuous data (interval, ratio), assume the data being investigated has variances that are homogenous bt groups investigated (homoscedasticity)
Don't meet parametric criteria, not normal distribution, discrete data
Hierarchy of study designs
Systematic reviews/meta-analysis> RCT> Cohort> Case-control> Cross-sectional> Case series/case reports> reviews
Systematic, non-random variation in study methodology and conduct, introducing error in outcome interpretation, can occur in all aspects of the study design.
Types= selection (subj chosen for case and control group differ in characteristics that alter outcome of the study); observational/information (inaccurate recording); recall (birth defects secondary to medications); interviewer (interviews not conducted same or by the same person); misclassification (differential, non-differential).
Variable that affects the independent/dependent variable, altering ability to determine the true effect on the measured outcome; hide or exaggerate true association. Can be minimized via: design (randomization, restriction, matching) and analysis (stratification, multivariate analysis), collecting a lot of info and good inclusion/exclusion criteria.
Case reports/case series
Document/describe experiences, not a study, allows hypothesis generation, easy, inexpensive, does not est causality/association. Case report=one patient. Case series= more than one patient.
No investigator intervention, for ASSOCIATION, (impact) not causation. Three kinds: cross-sectional, case control, cohort
Cross-Sectional studies (prevalence study)
Identify prevalence of characteristics of a condition in a group.
ADV=easy design, snapshot in time/all data collected at one time, use questionnaire/interview/or biomedical info (lab values)
DisADV=cannot study factors over time/just at the time of assessment, hard to study rare dx
Determine the association between exposure/risk factors and disease/outcome. Retrospective. Useful for rare diseases or ones that take a long time to develop. Ex. ASA use and Reye syndrome.
ADV=inexpensive, quick, allow investigation of several possible exposures/associations.
DisADV=hard to control for confounding, obs/recall bias, selection bias (case and control matching is difficult).
Use Odds Ratio
Outcome/Disease -> riskfactor/exposure
Determine the association between exposure/risk factors and disease/outcome DEVELOPMENT. Allows estimation of the risk of outcome (and RR bt exposure groups). Describes the incidence or natural hx of a disease and measures it in time sequence. Good study when randomization is unethical.
Ex. Framingham study.
Retrospective vs. Prospective
Uses Risk Ratio
Riskfactor/exposure -> outcome/disease
Retrospective (historical) Cohort studies
Begin and end in the present but look backward to collect info.
ADV=less$, less time, no loss to follow-up, can investigate issues not able to by RCT (or ethical/safety issues)
DisADV=only good as data available, hard to control for confounding via non statistical approaches, recall bias
Prospective (longitudinal) Cohort studies
Begin in present and progress forward; outcomes occur in the future.
ADV=can control for confounding to a greater extent, easier to plan for data collection
DisADV=more$, more time, loss of subj follow-up, hard to study rare dx at a good cost
Odds vs. Risk
Risk is the probability that a person who has not developed the event will develop the event.
Odds is the probability of the event occurring compared with the probability that it will not occur.
Odds and Risk Ratios
For obs study only show association (not cause)
<1 = negative association (odds of exposure lower in dx group; risk of dx lower in exposed group)
1 = no association
>1 = positive association (odds of exposure greater in dx group; risk of dx greater in exposed group)
Odds of exposure to a factor in those with a condition or disease compared with those who do not have the condition or disease.
Used more for retrospective
Risk Ratio/Relative Risk (RR)
Risk of an event/dx relative to the exposure, risk of someone developing a condition when exposed compared with someone who has not been exposed.
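Risk, odds, RR, and OR can all be read off one 2x2 table, as in the sketch below. The counts are invented, and the 95% CI for the OR uses the common log-based (Woolf) approximation, which is an assumption of this example rather than something the cards specify.

```python
import math

# Hypothetical 2x2 table, invented for illustration.
#                 disease   no disease
# exposed            a=20        b=80
# unexposed          c=10        d=90
a, b, c, d = 20, 80, 10, 90

risk_exposed = a / (a + b)      # probability of dx if exposed
risk_unexposed = c / (c + d)    # probability of dx if not exposed
odds_exposed = a / b            # dx vs. no dx among exposed

relative_risk = risk_exposed / risk_unexposed
odds_ratio = (a * d) / (b * c)

# Approximate 95% CI for the OR on the log scale (Woolf method).
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
or_low = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
or_high = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)

print(f"RR={relative_risk:.2f}  OR={odds_ratio:.2f}")
print(f"OR 95% CI: {or_low:.2f} to {or_high:.2f}")
# A ratio CI that includes 1 is not statistically significant.
print("significant" if or_low > 1 or or_high < 1 else "not significant")
```

Note that the OR (2.25) is farther from 1 than the RR (2.0); the two converge only when the outcome is rare.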
Hazard Ratio (HR)
Used in trials with time-to-event (or survival) analysis. Assumes that the ratio is constant over time. Type of RR: HR applies at any given point in time during the trial, whereas RR is cumulative at the end of the trial.
Relative risk reduction (RRR)
The proportional difference in rates of negative outcomes bt experimental and control groups. How much the risk is reduced in the tx group relative to the control group. Usually a larger number than ARR, so studies may report it instead of ARR, which can be misleading.
= [Control event rate-experimental event rate]/control event rate = 1-RR
Absolute risk reduction (ARR)
The absolute mathematical difference in rates of negative outcomes bt experimental and control groups
More imp than relative risk reduction.
= [Control event rate-experimental event rate]
Number needed to treat (NNT)
The number of ppl who need to be treated with the intervention for a certain period of time in order to achieve the desired outcome in one patient.
= 1/ARR (ARR expressed as a decimal). Round up to nearest whole number. (For number needed to harm, round down.)
Applied outcomes with dichotomous data (yes/no, alive/dead). Assumes the baseline risk is the same for all pt (or unrelated to RR).
Only calculated for statistically significant effects.
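A worked sketch of ARR, RRR, and NNT for a hypothetical trial. The sign convention here is control event rate minus experimental event rate, so a beneficial treatment gives positive values, and NNT is taken as 1/ARR rounded up. Exact fractions are used so the rounding step is not disturbed by floating-point error; all counts are invented.

```python
import math
from fractions import Fraction

# Hypothetical trial counts, invented for illustration.
control_events, control_n = 30, 150   # control event rate (CER) = 20%
treated_events, treated_n = 21, 150   # experimental event rate (EER) = 14%

cer = Fraction(control_events, control_n)
eer = Fraction(treated_events, treated_n)

arr = cer - eer           # absolute risk reduction
rrr = arr / cer           # relative risk reduction (equals 1 - RR)
nnt = math.ceil(1 / arr)  # number needed to treat, rounded up

print(f"ARR={float(arr):.2%}  RRR={float(rrr):.0%}  NNT={nnt}")
```

The same trial reads as a 30% relative reduction but only a 6% absolute reduction, which is why RRR reported alone can overstate the benefit.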
Probability of developing disease.
Incidence rate: # new cases of dx per population in a specified time.
= (ppl who develop dx during given time period)/(ppl who were at risk of developing a disease during same time period)
Number of ppl who have dx during a given time.
Point prevalence= prevalence on a given date.
Period prevalence= prevalence over a given time period.
Investigator makes intervention and evaluates cause/effect. Allows assessment of causality (sufficient cause, necessary cause, risk factor). Minimize bias (randomization, stratification). Treatment controls (placebo, active, historical). Blinding (triple= subj/investigator/analysis group unaware; double dummy= two placebos to match active and control therapies; open label= everyone aware). Crossover designs provide practical and statistical efficiency. Factorial design answers two research questions in one study.
Intention to treat analysis
Compares outcomes on the basis of initial group assignment or as randomized. Mimics real life clinical practice. Gives conservative estimate/may underestimate. Preferred type in superiority trial.
Per protocol analysis
Compares only those who completed the trial and adhered to the protocol. Gives info on tx efficacy, can be an overestimate, not generalizable to all patients (non-adherent)
As treated analysis
Subj analyzed by the actual intervention received. If subj in tx group did not take the active tx then data analyzed as if in placebo group. This analysis destroys randomization process for those non-adherent. These results should be interpreted with caution.
Literature review that collects and critically analyzes multiple research studies or papers. Meta-Analysis uses statistical methods to summarise the results of these studies.
Results combined and displayed via forest plot. Funnel plot for publication bias (a symmetric inverted funnel suggests publication bias is less likely; asymmetry suggests possible publication bias). Assessment of heterogeneity (statistical: chi-square and Cochran Q).
STROBE- observational (cohort, case control)
PRISMA- systematic review, meta-analysis
Cost minimization analysis
Cost comparison with equal outcomes (already demonstrated equivalency)
ex. Comparing a brand to a generic drug equivalent
Cost effectiveness analysis (most common one used)
Comparison of alternative choices based on costs and effects simultaneously. Outcome measured in clinical units or cost per unit health outcome (ex. years of life saved, number of symptom free days, BG, BP). Useful to measure health impact when health outcomes are improved.
Cannot compare different types of outcomes, ex cost effectiveness of a diabetes program (BG) vs asthma program (exacerbations).
Cost utility analysis (type of CEA)
Outcomes/consequences measured in "weighted" time gained (quality adjusted years or months of life), compares outcomes related to mortality when mortality may not be the most important outcome
Cost benefit analysis
Compares outcomes & costs using monetary units. Analysis of both cost of treatment and costs saved with beneficial outcome. Both costs and benefits expressed in dollars.
Proportion of true positives that are correctly identified by a test. High sensitivity means a negative test can rule OUT the disorder.
=1-Type II error
Proportion of true negatives that are correctly identified by a test. High specificity means a positive test can rule IN the disorder
=1-type I error
Positive Predictive Value
Proportion of patients with a positive test who actually HAVE the disease
Negative Predictive Value
Proportion of patients with a negative test who actually DON'T HAVE the disease
Positive Likelihood ratio
Negative likelihood ratio
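The diagnostic test measures above can all be computed from one 2x2 table of test result against true disease status. The sketch below uses the standard definitions, including positive LR = sensitivity/(1-specificity) and negative LR = (1-sensitivity)/specificity; the counts are hypothetical.

```python
# Hypothetical diagnostic test results, invented for illustration.
tp, fn = 90, 10    # diseased pts: test positive / test negative
fp, tn = 20, 180   # non-diseased pts: test positive / test negative

sensitivity = tp / (tp + fn)   # diseased pts correctly testing positive
specificity = tn / (tn + fp)   # non-diseased pts correctly testing negative
ppv = tp / (tp + fp)           # positive tests that are true disease
npv = tn / (tn + fn)           # negative tests that are truly disease-free

lr_positive = sensitivity / (1 - specificity)
lr_negative = (1 - sensitivity) / specificity

print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f}")
print(f"PPV={ppv:.3f} NPV={npv:.3f}")
print(f"LR+={lr_positive:.1f} LR-={lr_negative:.2f}")
```

Unlike sensitivity and specificity, PPV and NPV change with disease prevalence in the tested population, so they apply only to populations similar to the one in the table.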
Nuremberg Code (1947)
Subj give informed voluntary consent; benefit should outweigh risk
Declaration of Helsinki (1964)
Governs international research ethics, defines rules for research combined with clinical care; basis for good clinical practice today
Tuskegee Syphilis Study (1972)
The study increased risks for subj; heightened awareness of the need to protect human subj
Belmont Report (1978)
Statement of basic ethical principles and guidelines that assist in resolving the ethical problems that surround the conduct of research with human subj
Code of Federal Regulations (CFR 1981)
Regulations set by dept of health and human services and FDA based on the Belmont Report
Common Rule (1991)
Obtain/document consent, IRB membership, protection for certain subj (pregnant women, prisoners, children), all federally funded research must provide assurance on how it will protect human subj rights/welfare
Three levels of IRB review
Exemption from full IRB review; Expedited IRB review (cross sectional, case control); Full IRB review (RCT, cross sectional requiring bronchoscopy after admin of rx)