Lecture: Biostatistics
Descriptive Methods
______: Shape, Center,
Spread, Relative Position
______: Correlation, regression
______: Multiple regression
Univariate
Bivariate
Multivariate
_________
Applied to comparing
means: t-test, ANOVA, dependent-groups t-test
Other statistical methods
inferential statistics
There are three main measures of central tendency:
____
___
___
Mean
Median
Mode
The purpose of measures of _____ is to identify the location of the center of various distributions.
central tendency
The ____ is the observation that occurs the most frequently.
In the data mentioned in our example, we simply locate the observation that occurs the most frequently.
In this case, 16 occurs 9 times, hence 16 is the MODE for the data shown in the example.
mode
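The ____ is the middle observation in the data.
This means that 50% of the data is below the median and 50% of the data is above the median.
To find the median, organize the data in order from the smallest to the largest observation, then take the middle value (or the average of the two middle values when the number of observations is even).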
median
The ____ is the arithmetic average of all the observations in the data.
The mean is found by adding up all of the observations and dividing by the total number of observations, either N or n depending on whether you are dealing with the population or a sample.
mean
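As a minimal sketch of the three measures defined above, the following uses Python's standard statistics module. The scores list is hypothetical and invented for illustration only; the example data set these cards refer to is not reproduced here.

```python
import statistics

# Hypothetical observations, invented for illustration only.
scores = [12, 14, 16, 16, 16, 18, 21]

mode_value = statistics.mode(scores)      # most frequently occurring observation
median_value = statistics.median(scores)  # middle value of the sorted data
mean_value = statistics.mean(scores)      # sum of the observations divided by n

print(mode_value, median_value, round(mean_value, 2))
```

For roughly symmetric data like this, the three values land close to one another.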
Relationship between mean, median, and mode.
The ___ of a distribution and whether any outliers are present have an effect on the '____' of the mean, median, and mode.
When a distribution is symmetric with no outliers, the mean, median, and mode will generally have values close to each other.
shape
closeness
When a distribution is skewed to the _____ the relationship between mean, median and mode is usually described by:
mode < median < mean (mode is the smallest and mean is the largest).
right
When a distribution is skewed to the ____ the opposite is generally true:
mean < median < mode (mean is the smallest and mode is the largest).
left
The best measure of central tendency for skewed data is the _____.
This is because the ___is resistant to the more extreme values in a data set.
Even though the _____ is a poor measure of central tendency, it is the only measure of central tendency that can be used for categorical data.
median
median
mode
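The skew relationship and the median's resistance to extreme values can be checked on a small data set. This is only a sketch: the right_skewed list is hypothetical, chosen so that most values are small with a long tail to the right.

```python
import statistics

# Hypothetical right-skewed data, invented for illustration only.
right_skewed = [1, 2, 2, 2, 3, 4, 5, 9, 20]

mode_value = statistics.mode(right_skewed)
median_value = statistics.median(right_skewed)
mean_value = statistics.mean(right_skewed)
print(mode_value, median_value, round(mean_value, 2))  # mode < median < mean

# Replace the largest value with a far more extreme one.
with_outlier = right_skewed[:-1] + [200]
median_shift = statistics.median(with_outlier) - median_value
mean_shift = statistics.mean(with_outlier) - mean_value
print(median_shift, round(mean_shift, 2))  # the median stays put; the mean moves
```

The extreme value drags the mean upward while the median is unchanged, which is why the median is preferred for skewed data.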
Summary: The Mode is appropriate when...
The observation that is most frequently observed is desired.
A quick estimate of ____ is desired.
The data is _____.
Do not use the ___ when:
The data is ____, highly skewed, or ___, because in these situations the mode may provide an extremely poor estimate of the ____ of the distribution.
central tendency
categorical
mode
multi-modal
uniform
center
A more accurate measure of central tendency such as the __ or ___ is available.
mean or median
The median is appropriate when ...
The center or the middle value of the data set is desired.
One needs to determine whether additional data points fall either above or below the midpoint.
The data is ____. Outliers exist that will affect the ____.
highly skewed
mean
Do not use the MEDIAN when:
The data is ____, because the mean is then preferred.
symmetrical
The Mean is appropriate when....
The data is ___ or at least not really skewed. When the data is roughly symmetrical, the mean, median, and mode are all somewhat decent measures of central tendency.
Do not use when:
____
Outliers exist and affect the ___ more than an acceptable amount.
symmetrical
skewed.
MEASURES OF VARIABILITY (4)
Range
Interquartile range
Variance
Standard Deviation
The most elementary measure of variation is _____.
RANGE
____- is defined as the difference between the largest and smallest values.
Example: 1,3,4,5,5,6,7,11 Range: 11-1 = 10
Range
The Nth percentile is defined as the value such that N percent of the values lie below it.
Quartiles and interquartile range
The lower quartile ( Q1) is defined as the ____ percentile; thus, ___ percent of the measures are above the lower quartile.
25th
75
The middle quartile ( Q2) is defined as the ____ , which is, in fact, the median of all the measures.
50th percentile
The upper quartile ( Q3) is defined as the _____ ; thus, only 25 percent of the measures are above the upper quartile.
75th percentile
is the value for Q3 - Q1. Like the range, it is a single value.
Example: 1, 3, 4, 5, 5, 6, 7, 11 IQR: 6 - 4 =2
The interquartile range ( IQR)
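Quartiles can be computed under several conventions, so the IQR for a small data set depends on the one chosen. The card's answer of 2 takes Q1 = 4 and Q3 = 6. As a sketch, the two interpolating methods offered by Python's statistics.quantiles give somewhat different values for the same data. The helper name iqr is an assumption made for this illustration.

```python
import statistics

data = [1, 3, 4, 5, 5, 6, 7, 11]  # example data from the card above

def iqr(values, method):
    """Return Q3 - Q1 using the given statistics.quantiles method."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method=method)
    return q3 - q1

for method in ("inclusive", "exclusive"):
    q1, _median, q3 = statistics.quantiles(data, n=4, method=method)
    print(method, "Q1 =", q1, "Q3 =", q3, "IQR =", iqr(data, method))
```

Whichever convention is used, the IQR remains a single value describing the spread of the middle half of the data.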
_____- is defined as the distance of the measurements away from the mean.
deviation
_____- is defined as the average of the squared differences from the Mean.
variance
The ____ is the square root of the variance:
σ = √21,704 = 147.32... ≈ 147
Thus, using the Standard Deviation we have a "standard" way of knowing what is normal, and what is extra large or extra small.
Standard Deviation
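The chain from deviation to variance to standard deviation can be sketched step by step. The data behind the σ = √21,704 figure is not given in these cards, so this example reuses the list from the range card. It divides by the number of observations (the population form), matching the "average of the squared differences" definition above.

```python
import math
import statistics

data = [1, 3, 4, 5, 5, 6, 7, 11]  # data reused from the range example

data_range = max(data) - min(data)                       # largest minus smallest
mean_value = statistics.mean(data)
deviations = [x - mean_value for x in data]              # distance of each value from the mean
variance = sum(d ** 2 for d in deviations) / len(data)   # average squared deviation
std_dev = math.sqrt(variance)                            # square root of the variance

print(data_range, mean_value, variance, round(std_dev, 2))
```

Dividing by n - 1 instead of n gives the sample variance, which is what statistics.variance computes.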
Example: Your score on a recent test was 0.5 standard deviations above the average. Assuming the scores are normally distributed, what percentage of people scored lower than you did?
Answer: Between 0 and 0.5 standard deviations is 19.1%. Less than 0 is 50% (left half of the curve). So the total less than you is: 50% + 19.1% = 69.1%
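The areas in this answer can be checked against the standard normal curve. This sketch uses NormalDist from Python's statistics module and assumes the test scores follow a normal distribution.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

below_mean = standard_normal.cdf(0)                       # left half of the curve
mean_to_half_sd = standard_normal.cdf(0.5) - below_mean   # area between z = 0 and z = 0.5
below_score = standard_normal.cdf(0.5)                    # everything below z = 0.5

print(f"{below_mean:.1%} + {mean_to_half_sd:.1%} = {below_score:.1%}")
# prints: 50.0% + 19.1% = 69.1%
```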
...
________- are a method for showing the frequency with which certain classes of values occur.
For instance, suppose you have the following list of values: 12, 13, 21, 27, 33, 34, 35, 37, 40, 40, 41.
look in notes for example*
Stem-and-leaf plots
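One common layout for the list of values above uses the tens digit as the stem and the units digit as the leaf. The sketch below builds that layout; the function name stem_and_leaf is an assumption for this illustration, and the example in the notes may be laid out differently.

```python
from collections import defaultdict

values = [12, 13, 21, 27, 33, 34, 35, 37, 40, 40, 41]  # list from the card above

def stem_and_leaf(numbers):
    """Group two-digit values by tens digit (stem), collecting units digits (leaves)."""
    plot = defaultdict(list)
    for number in sorted(numbers):
        stem, leaf = divmod(number, 10)
        plot[stem].append(leaf)
    return dict(plot)

plot = stem_and_leaf(values)
for stem in sorted(plot):
    print(stem, "|", " ".join(str(leaf) for leaf in plot[stem]))
# prints:
# 1 | 2 3
# 2 | 1 7
# 3 | 3 4 5 7
# 4 | 0 0 1
```

The length of each row shows how often values in that class occur, here most often in the 30s.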
______- The midpoints of the tops of the rectangles in a histogram (one per class interval) are joined together by straight lines, which gives a polygon.
frequency polygon
_____- is a way of summarizing a set of data measured on an interval scale.
The picture produced consists of the most extreme values in the data set (maximum and minimum values), the lower and upper quartiles, and the median.
boxplot (box and whisker plot)
line graph-
...
(based on pic in notes)
A public opinion survey explored the relationship between age and support for increasing the minimum wage.
In the 21 to 40 age group, what percentage supports increasing the minimum wage?
(A) 12.5%
(B) 20%
(C) 25%
(D) 50%
(E) 75%
50%
State the _____
Formulate an ____
Analyze _____
Interpret _____
hypothesis
analysis plan
sample data
results
_____- a supposition or proposed explanation made on the basis of limited evidence as a starting point for further investigation
hypothesis
The null hypothesis is often denoted ___ (read "H-nought").
H0
______- (H0) is a hypothesis which the researcher tries to disprove, reject or nullify
The null hypothesis
example of hypothesis and null hypothesis
H1: Tomato plants exhibit a higher rate of growth when planted in compost rather than in soil.
H0: Tomato plants do not exhibit a higher rate of growth when planted in compost rather than soil.
...
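The four steps (state the hypothesis, formulate an analysis plan, analyze sample data, interpret results) can be sketched for the tomato example. Everything specific here is an assumption: the growth figures are invented for illustration, and the analysis uses an exact one-sided permutation test rather than the t-test listed earlier, chosen only because it needs nothing beyond the standard library. The .05 significance level is the conventional threshold.

```python
from itertools import combinations

ALPHA = 0.05  # analysis plan: conventional significance level

# Hypothetical growth rates (cm per week), invented for illustration only.
compost = [5.1, 5.6, 5.9, 6.2, 6.4]
soil = [4.0, 4.3, 4.6, 4.8, 5.0]

def mean_difference(group_a, group_b):
    return sum(group_a) / len(group_a) - sum(group_b) / len(group_b)

def permutation_p_value(group_a, group_b):
    """One-sided exact permutation test: share of relabelings whose
    mean difference is at least as large as the observed one."""
    observed = mean_difference(group_a, group_b)
    pooled = group_a + group_b
    indices = range(len(pooled))
    as_extreme = 0
    total = 0
    for chosen in combinations(indices, len(group_a)):
        relabeled_a = [pooled[i] for i in chosen]
        relabeled_b = [pooled[i] for i in indices if i not in chosen]
        total += 1
        if mean_difference(relabeled_a, relabeled_b) >= observed:
            as_extreme += 1
    return as_extreme / total

# Analyze the sample data, then interpret the result against ALPHA.
p_value = permutation_p_value(compost, soil)
reject_null = p_value < ALPHA
print(f"p = {p_value:.4f}; reject H0: {reject_null}")
```

With these made-up numbers every compost plant outgrows every soil plant, so only 1 of the 252 possible relabelings is as extreme as the observed split and H0 is rejected in favor of H1.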
_______: The probability of incorrectly rejecting a true statistical null hypothesis. It occurs when we are observing a difference when in truth there is none, thus indicating a test of poor specificity.
By convention, this probability, alpha (α), must be kept at or below .05.
Type I error (Alpha)
_____- represents a "false positive" for the researcher's theory. From society's standpoint, such false positives are particularly undesirable. Follow-up research will usually not replicate the (incorrect) original work, and much confusion and frustration will result.
Type I error
_____- the error of failing to reject a null hypothesis when in fact we should have rejected it.
In other words, this is the error of failing to observe a difference when in truth there is one, thus indicating a test of _______.
E.g., if a drug designed to improve a medical condition is found (incorrectly) not to produce an improvement relative to a control group, a worthwhile therapy will be lost, at least temporarily, and an experimenter's worthwhile idea will be discounted.
Type II error (Beta)
poor sensitivity
____- The null hypothesis is true, and we mistakenly reject it (____)
Type I
false positive
____- The null hypothesis is false, but we fail to reject it (____)
Type II
false negative
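The two error types are the two ways a test decision can disagree with the truth about the null hypothesis. As a small sketch (the function name classify_decision is an assumption for this illustration), the four possible outcomes can be tabulated as follows.

```python
def classify_decision(null_is_true: bool, null_rejected: bool) -> str:
    """Label a test outcome given the (usually unknown) truth about H0."""
    if null_is_true and null_rejected:
        return "Type I error (false positive)"
    if not null_is_true and not null_rejected:
        return "Type II error (false negative)"
    return "correct decision"

for null_is_true in (True, False):
    for null_rejected in (True, False):
        outcome = classify_decision(null_is_true, null_rejected)
        print(f"H0 true: {null_is_true!s:5}  rejected: {null_rejected!s:5}  -> {outcome}")
```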
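Specifically, the ______ is the probability, assuming the null hypothesis is true, of obtaining a result at least as extreme as the one observed. Informally, it is described as the probability of error that is involved in accepting our observed result as valid, that is, as "representative of the population."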
p-value
If the ___ is less than the significance level, we reject the ____.
p-value
null hypothesis
If the test statistic falls within the region of acceptance, the ____ is not rejected (often loosely described as being "accepted").
null hypothesis
Consider a study that shows a new therapy to be superior to the existing therapy. _____ calculates the probability that the results observed in a study may have been merely a chance finding.
Statistical significance
________- Is the difference between new and old therapy found in the study large enough for you to alter your practice? Because there is always a leap of faith in applying the results of a study to your patients (who, after all, were not in the study), perhaps a small improvement in the new therapy is not sufficient to cause you to alter your clinical approach. Note that you would almost certainly not alter your approach if the study results were not statistically significant (i.e. could well have been due to chance).
clinical significance
