BCPS Statistics
Terms in this set (89)
NOIR
Data types: nominal, ordinal, interval, ratio
Nominal
Yes-or-no variables and other unordered categories. Ex: sex, mortality, disease presence, race, marital status
Ordinal
Ranked in order, but with no consistent magnitude of difference between ranks. Ex: NYHA heart failure class
*means and SD should not be reported
Interval
Data ranked in order with a consistent change in magnitude between units. The zero point is arbitrary, so values can fall below zero. Ex: Fahrenheit temperature
Ratio
Measurable and continuous, with an absolute zero, so values cannot fall below zero. Ex: HR, BP, time, distance
Visual methods of describing data (3)
Frequency distribution, histogram, scatterplot
Measures of central tendency
Mean, median, mode
Mean
Generally used for continuous and normally distributed data
Median
Midpoint of values when placed in order from highest to lowest
*ordinal or continuous data
Mode
Most common value
* used for nominal, ordinal or continuous
*does not help describe distributions that have a large range of values, each occurring infrequently
2 types of discrete variables
Nominal and ordinal
Nominal data
Unordered data. Ex-sex, mortality, disease presence, race, marital status
Ordinal data
Ranked in order, but with no consistent magnitude of difference between ranks. Ex: classes 1 and 2 on the HF scale do not behave like the ordinary numbers 1 and 2, because the interval between classes does not mean the same thing at each step.
Common error with ordinal data
Means and SD should not be reported
Interval scale
Data ranked in a specific order with a consistent change in magnitude between units. The zero point is arbitrary. Ex: degrees Fahrenheit, where a negative temperature is possible
Ratio scale
Interval but with an absolute zero
-degrees kelvin
-age
-hr
-BP
-time
-distance
Visual methods of describing data (3)
-frequency distribution
-histogram
-scatterplot
Numerical methods of describing data (3)
-mean
-median
-mode
Mean used for what type of data?
-continuous and
-normally distributed
Median AKA
50th percentile
Median used for what type of data?
-ordinal or
-continuous
-especially good for skewed populations
-insensitive to outliers
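A minimal sketch of why the median suits skewed data. The lengths of stay below are invented for illustration; a single extreme value moves the mean far more than the median.

```python
import statistics

# Hypothetical lengths of stay in days (invented for illustration).
stays = [2, 3, 3, 4, 5]
stays_with_outlier = stays + [60]  # add one extreme value

mean_before = statistics.mean(stays)
median_before = statistics.median(stays)
mean_after = statistics.mean(stays_with_outlier)
median_after = statistics.median(stays_with_outlier)

# The mean jumps from 3.4 to about 12.8; the median only moves from 3 to 3.5.
print(mean_before, median_before)
print(round(mean_after, 1), median_after)
```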
Continuous variables AKA and types of
-counting variables
-interval & ratio
Mode used for what types of data?
-most common value in a distribution
-nominal, ordinal or continuous
Standard deviation is
Most common measure used to describe a spread of data
Standard deviation data has to be...
-continuous
-normally distributed
Variance equation
Standard deviation squared
Coefficient of variation meaning
Relates the mean and the SD
Coefficient of variation equation
SD/mean * 100%
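The relationships among SD, variance, and coefficient of variation can be checked on a small data set. This is only a sketch: the heart rates are invented, and the sample (n - 1) form of the SD is assumed.

```python
import statistics

# Hypothetical heart rates in beats/min (invented for illustration).
heart_rates = [60, 70, 80, 90, 100]

mean_hr = statistics.mean(heart_rates)
sd = statistics.stdev(heart_rates)           # sample SD (n - 1 denominator)
variance = statistics.variance(heart_rates)  # equals SD squared
cv_percent = sd / mean_hr * 100              # coefficient of variation

print(f"mean={mean_hr}, SD={sd:.2f}, variance={variance}, CV={cv_percent:.1f}%")
```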
Standard deviation sample values
For normally distributed data, about 68% of the sample values fall within 1 SD of the mean, 95% within 2 SD, and 99.7% within 3 SD
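These proportions follow from the normal distribution and can be reproduced with Python's standard library. The helper name `share_within` is made up for this sketch.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```

More precisely, the shares are roughly 68.3%, 95.4%, and 99.7%.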
Variance describes
The variability
Range
-The difference between the smallest and largest value in a data set.
-does not provide a lot of info
-sensitive to outliers
3 ways to measure variability
-standard deviation
-range
-percentiles
Percentiles and population reporting
-iqr
What does iqr encompass?
-the 25th to 75th percentile
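A short sketch of computing the IQR. The scores are invented, and `statistics.quantiles` uses its default interpolation rule here; other software may use a different rule and report slightly different quartiles.

```python
import statistics

# Hypothetical ordinal scores on a 0-10 scale (invented for illustration).
scores = [1, 2, 2, 3, 4, 5, 5, 6, 7, 9]

# quantiles(n=4) returns the three quartile cut points: Q1, median, Q3.
q1, q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1  # spread of the middle 50% of values

print(f"median={q2}, IQR runs from {q1} to {q3} (width {iqr})")
```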
Which measure of central tendency should NOT be used for ordinal data?
-means and sd should not be used
Best measure of central tendency for ordinal data
Median and IQR
Best measurement of central tendency for continuous data
Means and SD
How to tell if data normally distributed?
Inspect the distribution visually, or compare the mean and median: if they are close, the data are likely normally distributed
Formal test to tell if data is normally distributed
Kolmogorov-Smirnov test
Another indicator data isn't normally distributed
If the SD is large relative to the mean, the data are likely not normally distributed
Parametric data
The mean and SD are the parameters that define a normally distributed population
SEM
Quantifies the uncertainty in the estimate of the mean
SEM calculation
SD/square root of the n
Why do you need to calculate SEM?
-calculate confidence intervals
-hypothesis testing
-deception (the SEM is smaller than the SD, so reporting it makes data look less variable)
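A sketch of the SEM formula and one of its uses, a 95% confidence interval for the mean. The LDL values are invented, and the multiplier 1.96 is a normal approximation assumed for simplicity; with a sample this small a t critical value would give a somewhat wider interval.

```python
import math
import statistics

# Hypothetical LDL values in mg/dL (invented for illustration).
ldl = [100, 110, 120, 130, 140, 150, 160, 170, 180]

n = len(ldl)
mean_ldl = statistics.mean(ldl)
sd = statistics.stdev(ldl)
sem = sd / math.sqrt(n)  # SEM = SD / square root of n

# Approximate 95% CI using z = 1.96 (an assumption of this sketch).
ci_low = mean_ldl - 1.96 * sem
ci_high = mean_ldl + 1.96 * sem

print(f"SD={sd:.1f}, SEM={sem:.1f}, 95% CI {ci_low:.1f} to {ci_high:.1f}")
```

The SEM is always smaller than the SD for n > 1, which is why it describes the precision of the mean and not the spread of the individual values.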
Do both the CI and the p-value need to be reported?
No
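Is a CI for a difference between groups that includes zero statistically significant?
No (for ratios such as RR or OR, the no-effect value is 1 instead of zero)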
Null hypothesis (Ho)
No difference between groups being compared. If this is rejected there is a statistically significant difference
Alternative hypothesis (Ha)
Opposite of the null hypothesis. It states that there is a difference
Parametric test assumptions
-normally distributed
-continuous data
-variances are approximately equal
Non parametric tests
-Data is not normally distributed
-discrete data
Student t test-parametric or non parametric
-parametric
Student t test (one-sample)
-compares the mean of the study sample with the known population mean
Ex: LDL values of meeting attendees versus the population average
3 types of t tests
-Student t test
-2 sample, independent sample or unpaired test
-paired t test
2 sample, independent or unpaired test
Compares the means of two independent samples
Paired t-test
Compares the mean difference of paired or matched samples
-Ex: 2 measurements from each person at different time periods
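A sketch of how the paired t statistic is built from within-person differences. The blood pressure values are invented, and only the statistic and its degrees of freedom are computed; turning it into a p-value would need a t distribution table or a statistics package.

```python
import math
import statistics

# Hypothetical systolic BP (mm Hg) for the same 5 people before and after
# treatment (invented for illustration).
before = [150, 142, 160, 155, 148]
after = [144, 140, 151, 150, 145]

# The paired t test works on the within-person differences.
diffs = [b - a for b, a in zip(before, after)]
mean_diff = statistics.mean(diffs)
sem_diff = statistics.stdev(diffs) / math.sqrt(len(diffs))

t_statistic = mean_diff / sem_diff
degrees_of_freedom = len(diffs) - 1

print(f"mean difference={mean_diff}, t={t_statistic:.2f}, df={degrees_of_freedom}")
```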
Paired t-test parametric or non parametric test
Parametric
Anova
-compares the means of three or more groups in a study
Anova test parametric or non parametric test
Parametric
Anova limitations
-a statistically significant p value only tells us that at least one group differs from the others. It does not tell us which group is different, so a post hoc analysis needs to be done
Post hoc done
-after Anova shows a statistically significant difference
Post hoc tests (4)
-Tukey HSD
-bonferroni
-scheffe
-Newman-keuls
Non parametric tests
-wilcoxon rank sum or Mann Whitney U (same test)
-Kruskal-wallis one way Anova
Non parametric tests, type of data and distribution
-ordinal or counting data
-data not normally distributed
Wilcoxon rank sum/Mann Whitney u test
-2 independent samples
-not normally distributed
-related to the t-test
Kruskal-wallis one way Anova
-related to one way Anova
-not normally distributed data
Test done after Kruskal-Wallis for post hoc testing
Wilcoxon rank sum test
Nominal data tests
-chi squared tests
-fisher exact test
Chi square test
-nominal data
-commonly seen in baseline characteristics tables
Fishers exact
-specialized version of the chi square test for small samples (cells with expected counts <5)
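A sketch of the chi-square statistic for a 2x2 table, with a check on expected cell counts. The counts are invented, no continuity correction is applied, and the "expected count below 5" cutoff is the common rule of thumb assumed here for switching to the Fisher exact test.

```python
# Hypothetical 2x2 table of counts (rows: treatment, control;
# columns: event, no event). Invented for illustration.
observed = [[20, 80],
            [35, 65]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell under the null hypothesis of no association.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Small expected counts point toward the Fisher exact test instead.
use_fisher = any(e < 5 for row in expected for e in row)

print(f"chi-square={chi_square:.2f}, use Fisher exact instead: {use_fisher}")
```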
Type I error
-concluding a difference exists when one truly does not
-its probability is the significance level
-alpha usually set at 0.05
Type II error
-concluding no difference exists when one truly does
-beta usually set between 0.20 and 0.10
Power equation
1-beta
Power of the study
Our ability to detect statistical differences when they do exist
P-value
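Probability of obtaining results at least as extreme as those observed if the null hypothesis is true (loosely, the probability the results happened by chance alone)
Conventionally compared with a significance level of 0.05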
Lower P value
-does not mean the finding is more important, only that it is less likely to be due to chance
Correlation
The strength of the association between 2 variables
Regression
The ability of one or more independent variables to predict the dependent variable
Pearson Correlation
-measured with a correlation coefficient
-degree of association between two variables
-normally distributed data
-continuous data
-its test of significance is influenced by sample size
Pearson Correlation Value
-ranges from -1 to +1
-+1: perfect positive linear relationship. The closer the magnitude is to 1, the more highly correlated the two variables.
--1: perfect negative linear relationship
-its value is denoted r
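A sketch of the Pearson coefficient computed from its definition. The function name `pearson_r` and the data are made up for illustration; the last case shows why r is a poor summary of a non-linear relationship.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

dose = [1, 2, 3, 4, 5]
r_positive = pearson_r(dose, [2, 4, 6, 8, 10])   # perfect positive line
r_negative = pearson_r(dose, [10, 8, 6, 4, 2])   # perfect negative line
# A perfect U-shaped (non-linear) relationship still gives r = 0.
r_curved = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])

print(r_positive, r_negative, r_curved)
```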
Do not use correlation
When there is a non linear relationship
Kaplan Meier
-Studies the time between entry in a study and some event
-not all subjects enter the study at the same time
2 tests used with Kaplan Meier (survival) analysis
-log rank test
-cox proportional hazards model
Cox proportional Hazards model
-most popular method to evaluate the impact of covariates
-allows calculation of a hazard ratio (and CI)
Regression
-most effective way to develop models to predict outcomes or variables
-makes predictions
Correlation analysis
-used to assess the association between two or more variables
-does not make predictions
RR (relative risk): incidence in the exposed group / incidence in the control group
Ex: RR = 0.80 means the incidence of MI in the aspirin group is 80% of that in the control group
RRR: relative risk reduction (1 - RR)
Ex: with an RR of 0.80, aspirin decreased the risk of MI by 20%
The RRR is basically what matters.
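The aspirin example can be reproduced with a few lines of arithmetic. The event counts are invented and chosen only so that the relative risk comes out at 0.80.

```python
# Hypothetical trial counts chosen so that RR = 0.80, matching the
# aspirin example; the numbers themselves are invented.
events_aspirin, n_aspirin = 80, 1000
events_control, n_control = 100, 1000

incidence_aspirin = events_aspirin / n_aspirin
incidence_control = events_control / n_control

relative_risk = incidence_aspirin / incidence_control
relative_risk_reduction = 1 - relative_risk

print(f"RR={relative_risk:.2f}, RRR={relative_risk_reduction:.0%}")
```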
Odds ratio:
estimates the RR in retrospective studies such as case-control studies.
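A sketch of the odds ratio from a 2x2 table. The counts are invented, and a case-control layout (cases versus controls, exposed versus unexposed) is assumed for the example.

```python
# Hypothetical counts (invented for illustration).
#            cases  controls
exposed = (30, 20)
unexposed = (70, 80)

a, b = exposed
c, d = unexposed

# Odds of exposure among cases divided by odds of exposure among controls,
# which simplifies to the cross-product (a * d) / (b * c).
odds_ratio = (a * d) / (b * c)

print(f"OR={odds_ratio:.2f}")
```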
RR calculated for what type of studies
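Prospective studies such as cohort studies and clinical trials (retrospective case-control studies use the odds ratio instead).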