Chapter 18: Biostatistics - COMMUNITY AND PUBLIC HEALTH
Pieces of information collected by a researcher
-A variable that has no numeric representation
-If there are only two categories it is called: dichotomous
-Counted only in whole numbers
-# of children you have
-Qualitative or quantitative
Discrete data can be either:
nominal-named categories only
ordinal- categories in order
-Values range along a continuum, can be broken down into smaller and smaller units.
Continuous data is either
interval- same as ordinal, but with equal distances between values (no true zero point)
ratio- same as interval with true zero point
Nominal and ordinal are _______.
Interval and ratio are ________.
Measures of central tendency
-Describes the central tendency of sample group
--The middle point of our data
--The purpose of central tendency is to identify a single score that serves as the best representative for an entire distribution, usually a score from the center of the distribution
--mean, median, mode
Extreme scores can affect the mean.
-Exact middle score in an ordered distribution of scores
-Not affected by extreme scores
-The point where 50% of the scores are above it and 50% of the scores are below it.
-Most frequently occurring number
-Can be unimodal, bimodal, multimodal, or have no mode at all
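A short sketch of the three measures, using Python's standard statistics module. The scores are made up for illustration; 95 plays the role of an extreme score, so the mean is pulled above the median.

```python
import statistics

# Hypothetical test scores (not from the chapter); 95 is an extreme score.
scores = [70, 72, 75, 75, 78, 80, 95]

mean = statistics.mean(scores)        # sum of the scores / number of scores
median = statistics.median(scores)    # middle score of the ordered list
modes = statistics.multimode(scores)  # list of the most frequent value(s)

print(mean, median, modes)
```

multimode returns a list, so a bimodal or multimodal set of scores simply yields more than one value.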
25th percentile=lower quartile or Q1
50th percentile=median or Q2
75th percentile=upper quartile or Q3
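A minimal sketch of finding the quartiles with the standard statistics module. The scores are hypothetical, and the "inclusive" method is only one of several quartile conventions; other conventions can give slightly different Q1 and Q3 values.

```python
import statistics

# Hypothetical ordered scores (not from the chapter).
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 splits the data into four equal parts, giving three cut points.
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")

print(q1, q2, q3)  # Q2 is the same value as the median
```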
Positively skewed distribution
Negatively skewed distribution
Measures of dispersion aka measures of variation
Order the test scores from least to greatest, then subtract the lowest from the greatest
-This is the average of the squared differences from the mean
-For each number in your distribution, subtract the mean and then square the result
-Then average the squared numbers
the square root of the variance
-Measures how much variation there is from the average.
-It is the average distance from the mean
-A low standard deviation means the scores are more tightly clustered around the mean
-A large standard deviation means the scores are more spread out
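The steps for range, variance, and standard deviation can be sketched as follows. The scores are made up. Dividing by the number of scores follows the "average of the squared differences" wording above; many textbooks divide by n - 1 instead when working with a sample, which is an assumption to check against the course material.

```python
import math

# Hypothetical scores (not from the chapter).
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: greatest score minus lowest score.
score_range = max(scores) - min(scores)

# Variance: subtract the mean from each score, square it, then average.
mean = sum(scores) / len(scores)
squared_diffs = [(x - mean) ** 2 for x in scores]
variance = sum(squared_diffs) / len(scores)

# Standard deviation: square root of the variance.
std_dev = math.sqrt(variance)

print(score_range, variance, std_dev)
```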
Approximately __% will score within 1 standard deviation of the mean
Approximately __% will score within 2 standard deviations of the mean
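One way to check these percentages, assuming the scores follow a normal distribution (the cards do not state this, so treat it as an assumption), is to ask the standard normal curve how much area lies within 1 and 2 standard deviations of the mean.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

# Area under the curve between -1 and +1 SD, and between -2 and +2 SD.
within_1_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)
within_2_sd = standard_normal.cdf(2) - standard_normal.cdf(-2)

print(f"{within_1_sd:.1%} within 1 SD, {within_2_sd:.1%} within 2 SD")
```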
-Determines if a relationship exists between variables
Examples: height and weight, hot weather and ice cream sales, demand for a product and its price
-Correlation does not imply causation
-The statistic used to measure correlation
signified by r
-Value of r signifies the strength and the direction of the relationship
-as one variable increases, the other also increases
Example: as the number of minutes spent lifting weights increases, the amount of weight a person is able to lift also increases
-Income and education
-Inverse relationship between 2 variables
-As the value for one variable goes up, the value for the other variable goes down
-Amount of education and years spent in jail
GPA and amount of hours spent watching tv
-Either 1 (positive) or -1 (negative)
-No correlation at all=0
-Some consider r > .7 satisfactory to say there is a significant relationship
Pearson product moment correlation coefficient
-Both variables are continuous
-At least interval scaled
-Have a linear relationship
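A minimal sketch of computing r by hand for two continuous variables. The function name pearson_r and the data are hypothetical, chosen to echo the weight-lifting example earlier in the set.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two variables vary together around their means.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How each variable varies on its own.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(ss_x * ss_y)

# Hypothetical data: minutes spent lifting and weight lifted.
minutes = [10, 20, 30, 40, 50]
weight = [50, 55, 65, 70, 80]

r = pearson_r(minutes, weight)
print(round(r, 3))  # close to +1: a strong positive correlation
```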
Spearman rank-order correlation coefficient
Used to correlate two ordinal variables
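A sketch of the rank-order idea, using the common shortcut formula rho = 1 - 6(sum of d squared) / (n(n squared - 1)), where d is the difference between the two ranks for each subject. This shortcut assumes there are no tied values; the helper names and data are hypothetical.

```python
def ranks(values):
    """Rank 1 = smallest value. Assumes no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_rho(xs, ys):
    """Spearman rank-order correlation (no-ties shortcut formula)."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - (6 * d_squared) / (n * (n ** 2 - 1))

# Hypothetical ordinal ratings from two examiners for five patients.
examiner_a = [1, 2, 3, 4, 5]
examiner_b = [2, 1, 4, 3, 5]

rho = spearman_rho(examiner_a, examiner_b)
print(rho)
```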
Low positive correlation
spread out on either side of a / line on the graph (see slide 32)
U shaped across y axis (slide 33)
Low negative correlation
spread out around a \ line (slide 34)
dots everywhere, no line (slide 35)
Presentation of the data: bar graph displays what?
nominal or ordinal data
Frequency polygon represents what?
What is a histogram?
-A type of bar graph that is used to represent continuous data
-No spaces between the bars to indicate that it is continuous data
Frequency distribution presents data how?
in a way that shows the number of times each score occurs in the group of scores
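A small sketch of building a frequency distribution with the standard library's Counter. The scores are made up.

```python
from collections import Counter

# Hypothetical scores (not from the chapter).
scores = [3, 1, 2, 3, 3, 2, 1, 3]

# Count how many times each score occurs.
frequency = Counter(scores)

for score in sorted(frequency):
    print(score, frequency[score])
```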
What are inferential statistics? Two types?
Statistical tests used to test a hypothesis
hypothesis testing under certain criteria:
-Data is continuous
-Adequate sample size is used
-Population distribution is normal
-Group variances are equal
Parametric statistical tests
-T-test or Student's t-test: used to analyze the difference between TWO means
-ANOVA (analysis of variance): used to compare differences among THREE or more means
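As an illustration of what ANOVA computes, here is a sketch of the one-way F statistic: variation between the group means divided by variation within the groups. The function name and the three groups are hypothetical, and turning F into a p value (which needs the F distribution) is left out.

```python
def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA on a list of groups."""
    all_values = [x for group in groups for x in group]
    grand_mean = sum(all_values) / len(all_values)

    # Between-group sum of squares: group means vs. the grand mean.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    # Within-group sum of squares: each score vs. its own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical scores for three groups.
f_stat = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f_stat)
```

A large F suggests the group means differ more than the spread inside the groups would explain.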
T-test for independent samples compares what?
compares the means of two groups in an experiment
-The groups are randomly drawn from a population, unrelated
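A sketch of the independent-samples t statistic, assuming equal group variances (the pooled-variance form). The function name and data are hypothetical; looking up the p value for t with n1 + n2 - 2 degrees of freedom is not shown.

```python
import math
import statistics

def independent_t(group_a, group_b):
    """Pooled-variance t statistic for two unrelated groups."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1)
    var_b = statistics.variance(group_b)
    # Pool the two variances, weighting each by its degrees of freedom.
    pooled = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled * (1 / n_a + 1 / n_b))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error

# Hypothetical plaque scores for a control and an experimental group.
t_independent = independent_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
print(t_independent)
```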
T-test for dependent samples (t-test for correlated samples) is used for what? Describe the groups involved.
to analyze data when only one independent variable is used
-One group before and after treatment
-Two groups who are matched on the basis of a variable known to be correlated with the independent variable
-Example: in fluoride toothpaste studies, experimental and control group are frequently matched for baseline DMF, age and gender because these variables correlate with dental caries.
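For the one-group, before-and-after case, the dependent-samples t statistic works on the difference within each pair. A minimal sketch with hypothetical data and a hypothetical function name:

```python
import math
import statistics

def paired_t(before, after):
    """t statistic for paired (before/after or matched) scores."""
    differences = [b - a for b, a in zip(before, after)]
    mean_diff = statistics.mean(differences)
    # Standard error of the mean difference.
    standard_error = statistics.stdev(differences) / math.sqrt(len(differences))
    return mean_diff / standard_error

# Hypothetical scores for the same five subjects before and after treatment.
before = [10, 12, 9, 11, 13]
after = [8, 11, 7, 10, 9]

t_paired = paired_t(before, after)
print(round(t_paired, 3))
```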
Nonparametric tests most useful for what?
data on the nominal or ordinal scale
-Sample size may be small
-Variables are discrete
-Used to determine whether a significant difference exists between frequency counts of nominal data by comparing observed frequencies with expected frequencies
-What you expect to happen and what actually happens
Chi square test of the independence of categorical variables compares what? similar to what?
-two or more data sets from different sample groups
-Similar to independent samples t-test
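A sketch of the chi square statistic for a table of counts. Expected counts come from the row and column totals (row total x column total / grand total). The table and the function name are hypothetical.

```python
def chi_square_independence(table):
    """Chi square statistic for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi_square += (observed - expected) ** 2 / expected
    return chi_square

# Hypothetical counts: rows = two sample groups, columns = caries yes/no.
chi2_independence = chi_square_independence([[10, 20], [20, 10]])
print(round(chi2_independence, 3))
```

The degrees of freedom for this test are (rows - 1) x (columns - 1), which is 1 for a 2 x 2 table.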
Chi square test for goodness of fit compares what?
Compare observed frequency in one group to the expected frequencies
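The comparison of observed and expected frequencies can be sketched as the sum of (observed - expected) squared / expected across the categories. The counts and the function name are made up for illustration.

```python
def chi_square_goodness_of_fit(observed, expected):
    """Chi square statistic comparing observed with expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical: 50 patients, expected to split evenly between two categories.
observed = [30, 20]
expected = [25, 25]

chi2_fit = chi_square_goodness_of_fit(observed, expected)
print(chi2_fit)
```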
Fisher test is used for
Used in place of the chi square test when the sample is small (commonly when an expected cell frequency is less than five)
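A sketch of how an exact test can work on a small 2 x 2 table: with the row and column totals held fixed, add up the probability of the observed table and of every table that is more extreme in the same direction. This is the one-sided version; the function names and counts are hypothetical.

```python
from math import comb

def table_probability(a, b, c, d):
    """Probability of one 2 x 2 table [[a, b], [c, d]] with fixed totals."""
    return comb(a + b, a) * comb(c + d, c) / comb(a + b + c + d, a + c)

def fisher_one_sided(a, b, c, d):
    """One-sided exact p value: observed table plus more extreme tables."""
    p_value = 0.0
    while b >= 0 and c >= 0:
        p_value += table_probability(a, b, c, d)
        # Shift one count toward the a and d cells, keeping totals fixed.
        a, b, c, d = a + 1, b - 1, c - 1, d + 1
    return p_value

# Hypothetical small sample: 8 subjects in a 2 x 2 table.
p_exact = fisher_one_sided(3, 1, 1, 3)
print(round(p_exact, 4))
```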
Describe nonparametric tests
Kruskal-Wallis and Friedman matched pairs tests are the nonparametric equivalents of the ANOVA
What is power analysis?
1. Too few participants may mean what?
2. Too many participants may...
-A determination of how many subjects are needed for your study
1. Too few participants may mean that the results can't be generalized to the population
2. Too many participants may yield a statistically significant result that has no clinical significance
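One common textbook approximation for the number of subjects, when comparing two group means, is n per group = 2 x ((z for alpha + z for power) x SD / difference) squared. The sketch below uses that normal-curve approximation; the function name, the 80% power default, and the numbers are assumptions, not values from the chapter.

```python
import math
from statistics import NormalDist

def participants_per_group(difference, sd, alpha=0.05, power=0.80):
    """Approximate subjects needed per group to detect a mean difference."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_power = NormalDist().inv_cdf(power)
    n = 2 * ((z_alpha + z_power) * sd / difference) ** 2
    return math.ceil(n)  # round up to whole participants

# Hypothetical: detect a difference of 0.5 when the SD is 1.0.
n_needed = participants_per_group(difference=0.5, sd=1.0)
print(n_needed)
```

Halving the difference to be detected roughly quadruples the number of participants needed.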
-Infers the true value of an unknown population parameter
-A range around a measurement that indicates how precise that measurement is
-Example: a confidence interval of 95% indicates we are 95% certain that the true population value lies within the stated range
95% confidence interval for the mean plaque index of 1.16 in a sample of school children: we are 95% certain that the mean plaque index of the population is between 1.08 and 1.24
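A sketch of how such an interval can be built: sample mean plus or minus about 1.96 standard errors. The mean of 1.16 comes from the example above; the standard error of 0.0408 is hypothetical (the example does not give it) and was picked so the result lands near 1.08 to 1.24.

```python
from statistics import NormalDist

sample_mean = 1.16       # mean plaque index from the example
standard_error = 0.0408  # hypothetical; not stated in the example

# z value that leaves 2.5% in each tail of the normal curve (about 1.96).
z = NormalDist().inv_cdf(0.975)

lower = sample_mean - z * standard_error
upper = sample_mean + z * standard_error

print(round(lower, 2), round(upper, 2))
```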
-How likely is it that you have come to a false conclusion?
-P value = the probability of obtaining findings at least as extreme as those in the study by chance alone, if the null hypothesis is true
-P value of .05 is commonly accepted as statistically significant in oral health research.
-Also see .01 and .001
-If p value is larger than .05 results are not statistically significant
P value is calculated from the test statistic using the
sampling distribution of the test statistic and the size of the sample
It can be calculated with a mathematical formula, but is usually done by computer
Regardless of what statistical test was used, p value is computed the same way
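As one concrete case, here is a sketch of turning a test statistic into a two-sided p value, assuming the statistic follows the standard normal (z) distribution. Other tests use the same idea with their own sampling distribution (t, F, chi square). The function name and z values are hypothetical.

```python
from statistics import NormalDist

def two_sided_p(z):
    """Two-sided p value for a z statistic."""
    # Area in both tails beyond |z| under the standard normal curve.
    return 2 * (1 - NormalDist().cdf(abs(z)))

p_large_z = two_sided_p(2.5)  # a statistic far from 0
p_small_z = two_sided_p(1.0)  # a statistic close to 0

print(round(p_large_z, 4), p_large_z < 0.05)
print(round(p_small_z, 4), p_small_z < 0.05)
```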
Type I error=
Alpha error=null hypothesis rejected, but it is actually true
Type II error=
Beta error=null hypothesis accepted but it is actually false and should have been rejected