Statistical Tests, BIOE 315 Final
Terms in this set (132)
Do SAT tests differ for low, middle, and high income students?
*Compares observed frequencies to expected frequencies
* Nonparametric test
* Can analyze data on 1 or 2 dimensions
Is the distribution of sex and voting behavior due to chance? Or is there a difference between the sexes on voting behavior?
t - test
Look at differences between two groups on a variable such as male/female; undergrad / grad, etc.
Do males and females differ in the amount of time they spend shopping in a given month?
*Tests the significance of group differences between two or more groups.
*The IV has 2 or more categories.
*This test only determines that there is a difference between the groups, but it doesn't tell which is different.
Same as ANOVA, but adds control of one or more covariates that may influence the DV.
Do SAT scores differ for low, middle, and high income students after controlling for single/dual parenting?
Does ethnicity affect reading achievement, math achievement, and overall scholastic achievement among 6th graders?
If there are two or more dependent variables, what test do you use?
Same as MANOVA, but adds control of one or more covariates that may influence the DV
Does ethnicity affect reading achievement, math achievement, and overall scholastic achievement among 6th graders after controlling for social class?
The Chi-Square, Mann Whitney U test and the Wilcoxon Matched Pairs Test are all
Nonparametric tests are used to analyze data collected on variables that have been measured on ___ and ____ scales.
This type of test requires the data to be reported as frequencies, the expected frequency for any one category to be no less than 5, and the observations to be independent.
You want to compare two groups (males and females) on their responses on a 3-point rating scale. What is the appropriate statistical test?
a)Paired samples t-test
b)Independent samples t-test
What test would you use to compare the number of people who prefer one of four political candidates:
b)Independent samples t-test
a)Type I error.
A rejection of the null hypothesis when in reality it should have been accepted is described as a
a)Type I error.
b)Type II error.
d)test of significance.
a)t test for independent samples
Mr. Marino has identified two groups of students to participate in his study examining the effectiveness of using algebra tiles. One group will use these manipulatives while a second group will receive a traditional lecture approach. Which test should be used to test the differences between the mean scores for the two classes?
a)t test for dependent samples
b)t test for independent samples
d)Scheffé post hoc comparison
t test for dependent samples
Mr. Arroyo is interested in the gains made by his students from pretest to posttest. Which statistical test of significance is appropriate for him to use?
b)t test for dependent samples
c) t test for independent samples
d)factorial analysis of variance
Type of ANOVA used when an experiment involves more than one independent variable. This analysis can separate the effects of different levels of different variables. a) ANOVA b) MANOVA c) ANCOVA d) factorial analysis of variance
You want to compare three groups on a measure of extraversion. You are satisfied that the dependent variable is normally distributed. What is the appropriate statistical test?
a)Paired samples t-test
d)Independent samples t-test
Mann-Whitney U Test
Test like a t test that is for ordinal (rank) data; ex. compare students in grades 9-12 in terms of educational goals. Uncorrelated means / uncorrelated / unmatched means.
Statement that there is no relationship between an IV and DV
Continuous normal distribution
The ____ test is used to determine whether 2 sample means are significantly different.
The ___ is used when determining whether sample means from more than 2 groups are significant.
ANCOVA (or covariance)
A test for two or more groups while controlling for extraneous variables (covariates) is
____ is used when there's more than one level of a single IV.
interval or ratio
Pearson correlation (r) is used for ____ or _____ data.
Spearman correlation is used for ____ data.
Statistics Experimental Method
Hypothesis is tested & how ---> Data ---> Descriptive Statistics & Inferential Statistics
Numeric Variables ( 2 types)
Ratio and Interval
Numbers start at 0 and move up
2 types of Ratio Variables
numeric but not in order from 0 (temp eg.)
Categorical Variables (2 types)
ordinal and nominal
categorical variables that are related
categorical variables that are not related
Measures of Accuracy
Accuracy- ability to hit target
Precision- ability to hit same spot over & over
Sample Vs. Population
Eg sample: respondents to a poll
Eg. population: every student at Lehigh
cumulative frequency distribution
sum of values from frequency distribution; used to determine frequency up to a certain threshold
-can only be used for numeric or ordinal data
-can be absolute or relative
-very commonly used in pharma/drug studies
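A minimal sketch of the cumulative frequency idea above, assuming a made-up frequency table (the bin labels and counts are hypothetical, not from the course). It shows both the absolute and the relative version.

```python
from itertools import accumulate

# Hypothetical frequency table: counts per ordered bin (illustrative values only).
bins = ["0-10", "10-20", "20-30", "30-40"]
freq = [4, 7, 6, 3]

# Absolute cumulative frequency: running total of the counts.
cum_abs = list(accumulate(freq))

# Relative cumulative frequency: running total divided by the sample size.
n = sum(freq)
cum_rel = [c / n for c in cum_abs]

for label, a, r in zip(bins, cum_abs, cum_rel):
    print(f"up to bin {label}: {a} observations ({r:.2f} of sample)")
```

The last relative value is always 1, since every observation falls at or below the top bin.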
Types of studies ( 2 types)
Experimental and Observational
+ Hard numbers
+ Run stats
- Hard to do some things
- more resources
+ dig through data and pull out
+ unexpected correlations
+ less resources
+ wider array of possibilities
+ stays intensive
- poor control
mu = sum xi / N
x-bar = sum xi / n
frequency distribution (table)
-describes the frequency of specific measurement in a sample
-not necessarily normal
-should scale starting from 0
-if numeric and continuous bars should be touching (histogram)
-width of bars should be even
ideal # of intervals (bins) in a frequency distribution: # of intervals = 1 + ln(n)/ln(2)
alternative: set the number of intervals to twice the cube root of the number of observations in the frequency table
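A small sketch of the two interval-count rules above. Rounding up to a whole number of intervals is an assumption made here; the cards do not say how to round. The function names are hypothetical.

```python
import math

def log2_rule_bins(n):
    """Number of intervals from 1 + ln(n)/ln(2), rounded up (rounding is an assumption)."""
    return math.ceil(1 + math.log(n) / math.log(2))

def cube_root_rule_bins(n):
    """Number of intervals from twice the cube root of n, rounded up (rounding is an assumption)."""
    return math.ceil(2 * n ** (1 / 3))

n_obs = 100  # hypothetical number of observations
print("1 + ln(n)/ln(2) rule:", log2_rule_bins(n_obs))
print("2 * cube root rule:  ", cube_root_rule_bins(n_obs))
```

The two rules generally give different answers, so they are best treated as starting points rather than fixed requirements.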
poorly designed experiments can lead to sampling error
3 Types of Measures of central tendency (Mean)
Sample mean (x)
Population Mean (mu)
Weighted Mean (use frequency table)
-middle observation in set of observations
-if data is symmetric, median = mean
-median and mean are different if distribution is not symmetric
Advantages of Each Descriptive Stat
Mean: easy to find through sorting, tells about all data points, usually the best
Median: robust statistic, works well for a wider range of distributions
Mode: finding frequency adv. & which shows up most
-used to describe a population with two modes
-outliers make it less advantageous
- double peak on freq. dis plot
Type of Variable : best measure of central tendency
Nominal --> Mode
Ordinal --> Median
Interval/ Ratio (not skewed) --> Mean
Interval/Ratio (skewed) --> Median
Symmetric vs. (+) skewed vs. (-) skewed
Symmetric: Median = Mode = Mean
(+) Skew: Mode ---> Median --> Mean (L to R)
(-) Skew: Mean --> Median --> Mode (L to R)
smallest to largest numbers duh
-how much samples deviate from the mean
- population variance (sigma^2)
-SD is most common measure of variability
- larger sample size increases confidence in ability to tell difference between samples
assumes normal distribution, measure of variance
Coefficient of Variability (CoV)
CoV = (s / x-bar) * 100
standard dev/ mean
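A short sketch of the coefficient of variability, assuming the sample standard deviation (n - 1 denominator) is the one intended. The data values are hypothetical.

```python
import statistics

def coefficient_of_variability(values):
    """Sample SD divided by sample mean, expressed as a percentage."""
    s = statistics.stdev(values)     # sample SD, n - 1 in the denominator
    xbar = statistics.mean(values)   # sample mean
    return s / xbar * 100

sample = [10, 12, 14]  # hypothetical measurements
print(f"CoV = {coefficient_of_variability(sample):.2f}%")
```

Because the SD is scaled by the mean, the result is unitless, which makes it usable for comparing variability between measurements on different scales.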
Determining how probable an outcome is...
P3 * P4 .......etc.....
-arrangements of objects in a particular sequence
- Linearly (order matters): n! / (n-x)!
- Circularly: n! / ((n-x)! * x)
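The two counting formulas above, sketched as functions. Here n is the number of available objects and x the number arranged; the function names are hypothetical.

```python
import math

def linear_arrangements(n, x):
    """Ordered arrangements of x objects chosen from n: n! / (n - x)!"""
    return math.factorial(n) // math.factorial(n - x)

def circular_arrangements(n, x):
    """Arrangements of x of n objects around a circle: n! / ((n - x)! * x)"""
    return math.factorial(n) // (math.factorial(n - x) * x)

print(linear_arrangements(5, 3))    # 5 * 4 * 3 = 60
print(circular_arrangements(5, 3))  # 60 / 3 = 20
```

The circular count divides by x because rotating a circle of x objects does not produce a new arrangement.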
collections of items
each item in a set
two elements have no common elements; the sets are disjoint
Event probability (f/ n)
f (# of observations) / n (total events)
P( AnB): prob A and B
P(AuB): Prob A or B ( add them)
P(A|B): prob A given B
-shape of graph changes depending on value of sigma
-standard z score graph
-data must be normally distributed and you must know SD and mean
-used on smaller sample sizes typically
- Sx = s / n^(1/2), or use the variance to find the error: Sx = (s^2 / n)^(1/2)
- Ho is that means equal each other
- 2 categorical levels, 1 numerical (mean)
Type 1 Error
Reject Ho but Ho true
Type 2 Error
Fail to Reject Ho but Ho false
Correct decision in Error
Fail to reject Ho AND Ho true
Correct Rejection in Error
Reject Ho AND Ho false
Kurtosis: Deviations away from normality
(+) Leptokurtic: fewer values in shoulders, narrower
(-) Platykurtic: has more values in shoulders, wider
The Weibull Distribution
- changing the data to fit a straight line
continuous probability dist
- alpha = scale parameter
-failure rate remains consistent
Displaying Variability BOX PLOT
know outlier, range, mean, SD on box plot
Two sample t-test
compare two random samples
-the larger the n, the more certain you can be that the difference in variance is statistically significant
- EG men and women GPA at Lehigh
F - test
- finds variance between 2 samples
- F = larger S^2 / smaller S^2
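A sketch of the variance-ratio calculation above, using sample variances. It only computes F and the degrees of freedom; looking up the critical value or p-value is left to an F table. The samples are hypothetical.

```python
import statistics

def variance_ratio_f(sample_a, sample_b):
    """Return (F, df_numerator, df_denominator) with the larger variance on top."""
    var_a = statistics.variance(sample_a)  # sample variance, n - 1 denominator
    var_b = statistics.variance(sample_b)
    if var_a >= var_b:
        return var_a / var_b, len(sample_a) - 1, len(sample_b) - 1
    return var_b / var_a, len(sample_b) - 1, len(sample_a) - 1

group_1 = [1, 2, 3, 4, 5]   # hypothetical data
group_2 = [2, 4, 6, 8]      # hypothetical data
f, df_num, df_den = variance_ratio_f(group_1, group_2)
print(f"F = {f:.3f} with ({df_num}, {df_den}) degrees of freedom")
```

Putting the larger variance in the numerator guarantees F >= 1, so the degrees of freedom must follow whichever sample ended up on top.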
Mann- Whitney test ( U table)
- alternative to a two-sample t- test
-take values for both groups and rank
- is there a stat difference between heights of men and women
FOR NON NORMAL
can answer: Did the horse racing finishes differ by horse sex (male vs. female)?
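A minimal sketch of the ranking step described above: pool both groups, rank them (tied values share the average rank), and compute U from the rank sum of the first group. It returns the smaller of the two U values, which is the form usually compared against a U table; that choice is an assumption about which table convention is in use. The data are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..N, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def mann_whitney_u(group1, group2):
    """Smaller of U1 and U2, computed from the rank sum of group1."""
    n1, n2 = len(group1), len(group2)
    ranks = average_ranks(list(group1) + list(group2))
    r1 = sum(ranks[:n1])
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 - u1
    return min(u1, u2)

finish_males = [1, 4, 5]     # hypothetical finishing positions
finish_females = [2, 3, 6]   # hypothetical finishing positions
print("U =", mann_whitney_u(finish_males, finish_females))
```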
- data points are paired, like before and after treatments
- each sample measured twice (like coke vs. pepsi at a stand)
- dependent t-test
t = d-bar / (sd / n^.5)
(sd / n^.5) = standard error!!
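A sketch of the paired (dependent) t statistic above: take the per-subject differences, then divide their mean by their standard error. The before/after values are hypothetical.

```python
import math
import statistics

def paired_t(before, after):
    """Return (t, degrees of freedom) for paired measurements."""
    d = [a - b for b, a in zip(before, after)]  # per-subject differences
    n = len(d)
    d_bar = statistics.mean(d)                  # mean difference
    sd = statistics.stdev(d)                    # sample SD of the differences
    standard_error = sd / math.sqrt(n)
    return d_bar / standard_error, n - 1

before = [10, 12, 9, 11]   # hypothetical pre-treatment scores
after = [12, 15, 10, 14]   # hypothetical post-treatment scores
t, df = paired_t(before, after)
print(f"t = {t:.3f}, df = {df}")
```

The resulting t is compared against a t table with n - 1 degrees of freedom, where n is the number of pairs.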
ANOVA: analysis of variance
- MORE THAN 2 SAMPLES
- null hypothesis is that means equal each other
-used to test whether there are differences among more than 2 samples
- does not say which samples are different
- eg. compare mass of pigs with 4 different types of feed
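A sketch of the one-way ANOVA calculation, assuming the usual partition into between-group and within-group sums of squares. It returns F and its degrees of freedom only; the p-value would come from an F table. The pig masses are invented for illustration.

```python
import statistics

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    k = len(groups)
    n_total = len(all_values)

    # Between-group SS: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group SS: spread of observations around their own group mean.
    ss_within = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)

    df_between = k - 1
    df_within = n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical pig masses (kg) for 4 feeds.
feeds = [[60, 62, 64], [61, 63, 65], [66, 68, 70], [59, 60, 61]]
f, df_b, df_w = one_way_anova_f(feeds)
print(f"F = {f:.2f} with ({df_b}, {df_w}) degrees of freedom")
```

A large F only says that at least one group mean differs; it does not identify which one.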
Post Hoc Tests
- after running an ANOVA, post hoc tests are run to confirm differences between groups
- 3 types (Tukey, Tukey- Kramer, Dunnett)
all groups have the same number of samples in them
some groups have different numbers of samples
-comparing a control population to other populations
- only comparing multiple samples against a single control
-unlike other tests, it can be 1-tailed or 2-tailed
2 Factor ANOVA
eg. 4 drugs (factors) on men and women (samples), 8 combinations (cells)
-continuous, dependent variables, normally distributed
- 2 or more categorical, independent variables
- can look at effect of several factors using a single set of data
- simpler than performing a 1 way ANOVA for each factor
fixed effects in 2 factor ANOVA
-data has been gathered from all levels of the factor of interest
eg. purpose compare effects of 3 specific dosages of a drug on response
random effects in 2 factor ANOVA
-the factor has many possible levels; interest is in all possible levels, but only a random sample of levels is included in the data
- used when there are different subgroups within each of the different groups
-eg. which athletes have highest vertical leap, between soccer players at various positions, basketball players at various positions, and football players at various positions
MANOVA: multiple-variable ANOVA
ANOVA with several dependent variables
-cannot run multiple ANOVAs on the same data for different dependent variables
-textbook A influence on math and physics ability, textbook B influence on math and physics ability
-- math and physics ability are DEPENDENT variables
- advantages: one single test
- multiple types of MANOVA tests
types of MANOVA tests
1. Wilks Lambda: most common, lower values better
2. Pillai's Trace V: more powerful, general use
3. Lawley-Hotelling Trace U
4. Roy's Max root Theta: used when variables are uncorrelated
- two variables have ratio/interval scale
- 1 variable is dependent on the other, when they are not dependent you need a correlation
- yi = alpha + beta (xi)
when to perform a regression?
two or more continuous! variables, where one is dependent on the other and the relationship is linear and homoscedastic
Linear regression 4 criteria
1. for any value x, there exists a normally distributed y-value
2. variances of distributions of y must be equal
3. relationship between x and y is linear
4. x value measurements contain negligible error compared to y
F test for regression
F = regression MS/ residual MS
[regression SS/ DF] / [residual SS/ DF]
Ha= not equal
r ^2 for regression
regression SS/ total SS
- values between 0 and 1
- describe how close points are to the line
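A sketch tying the regression cards together for one predictor: a least-squares fit of yi = alpha + beta (xi), then r^2 = regression SS / total SS and F = regression MS / residual MS. With a single x, the regression DF is 1 and the residual DF is n - 2. The data and the function name are hypothetical.

```python
import statistics

def simple_linear_regression(x, y):
    """Least-squares fit of y = alpha + beta * x, with r^2 and the regression F."""
    n = len(x)
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    beta = sxy / sxx               # slope
    alpha = y_bar - beta * x_bar   # intercept

    total_ss = sum((yi - y_bar) ** 2 for yi in y)
    regression_ss = beta * sxy
    residual_ss = total_ss - regression_ss

    r_squared = regression_ss / total_ss
    f = (regression_ss / 1) / (residual_ss / (n - 2))
    return {"alpha": alpha, "beta": beta, "r2": r_squared, "F": f, "df": (1, n - 2)}

x = [1, 2, 3, 4, 5]   # hypothetical predictor values
y = [2, 4, 5, 4, 5]   # hypothetical responses
fit = simple_linear_regression(x, y)
print(fit)
```

An F near 1 means the line explains little more than residual noise, which matches an r^2 near 0.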
ANCOVA: analysis of covariance
-testing more than 2 slopes!
- not means, slopes (Beta value)!
- mean is essentially a data point- slope describes the whole line
-used to remove effects of Covariate= confounding factor (controlled for parts of experiment)
Pie Chart of Variance
Group: between group variance
Error: within group variance
Linear Correlation Analysis (r^2)
- need bivariate normal distribution (both variables normal)
-r ( correlation coefficient)
- between -1 and 1
- rho = population version of r
nu (degrees of freedom) = n - 2, where n = # pairs of data
-multiple different ways to adjust for type 1 error
-alpha/ #pairwise comparisons
- very conservative
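A small sketch of the alpha / # pairwise comparisons adjustment above (commonly known as the Bonferroni correction). It assumes every pair of groups is compared, so the number of comparisons is k choose 2 for k groups; the function name is hypothetical.

```python
import math

def adjusted_alpha(alpha, n_groups):
    """Return (per-comparison alpha, number of pairwise comparisons)."""
    n_comparisons = math.comb(n_groups, 2)  # every pair of groups compared once
    return alpha / n_comparisons, n_comparisons

per_test_alpha, m = adjusted_alpha(0.05, 4)
print(f"{m} comparisons, each tested at alpha = {per_test_alpha:.4f}")
```

The threshold shrinks quickly as groups are added, which is why the adjustment is described as very conservative.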
Assumptions for multiple regressions
-y values are random and independent
-yi are normally distributed
-xi are fixed effects with no error
-xi are not highly correlated
-more robust if n is large
CHI Squared (x^2)
-most useful to test goodness of fit because it works well with categorical data
-tests difference between two categorical variables
- does gender influence which holiday you prefer: beach or snow
- use p value
- Ho is they don't
-Ha is they do
- compare the p value to alpha = 0.05: if less, reject Ho; if more, fail to reject Ho
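A sketch of the chi-squared calculation for a table of counts: expected counts come from the row and column totals, and the statistic sums (observed - expected)^2 / expected over all cells. The p-value helper is valid only for 1 degree of freedom (a 2x2 table); larger tables need a chi-squared table or a statistics library. The survey counts are invented.

```python
import math

def chi_square_independence(table):
    """Return (chi-squared, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df

def p_value_df1(chi2):
    """Upper-tail p-value for chi-squared with exactly 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

# Hypothetical counts: rows = gender, columns = prefers beach / prefers snow.
observed = [[30, 10], [20, 40]]
chi2, df = chi_square_independence(observed)
print(f"chi-squared = {chi2:.3f}, df = {df}, p = {p_value_df1(chi2):.5f}")
```

With 1 degree of freedom, the critical value at alpha = 0.05 is 3.841, so a larger statistic means rejecting Ho.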
Yates correction of continuity for x^2
-used to prevent overestimation of statistical significance for small data
-typically employed when one cell has value below 5
-over correct/ overly conservative
-reduces the chi-squared value obtained and thus increases its p-value.
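A sketch of the continuity correction for a 2x2 table, assuming the standard form in which 0.5 is subtracted from each |observed - expected| before squaring. The counts are hypothetical and chosen so that some cells are small.

```python
def chi_square_2x2(table, yates=False):
    """Chi-squared for a 2x2 table of counts, optionally with Yates' correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            diff = abs(table[i][j] - expected)
            if yates:
                diff -= 0.5  # continuity correction shrinks each deviation
            chi2 += diff ** 2 / expected
    return chi2

small_table = [[3, 7], [8, 2]]  # hypothetical small-sample counts
print("uncorrected:", round(chi_square_2x2(small_table), 4))
print("with Yates: ", round(chi_square_2x2(small_table, yates=True), 4))
```

The corrected statistic is smaller, so its p-value is larger, which is the conservative behavior described above.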
Kolmogorov - Smirnov (KS) Goodness of Fit
-instead of determining if data fits ind. values ---> test if it fits an expected distribution
-cumulative trace of expected values to be compared to where they actually are
-treatment levels: fixed factors
-source: random factors
-performed model: mixed model ANOVA
-randomized blocks with repetition
- allows better data to be obtained with fewer subjects
Testing for randomness sigma ^2/ mu
-prob a given # of events occurring in a fixed interval of time
oftentimes you have uncertainty in measurements and then need to do math with those measurements
- eg. PCR
subset of AI
Supervised -- either classification (color) or regression (length)
Unsupervised -- either clustering (groups of clothes), dimension reduction (best outfits), or association (clothes most often worn together)
-technique used for analyzing multiple regression data that suffers from multicollinearity
quantifying data --> setup + and - controls --> turn quantified data into decision function
which study has higher probability of a type 2 error the 60 subject or the 100 subject?
which study has higher power, 60 subject or 100?
the power of a statistical test is its ability to correctly reject the null hypothesis when it is false
which study has higher probability of type 1 error, 60 or 100?
both have the same probability, since the significance level is the same
what is the shape of the F distribution?
skewed to the right (positively skewed)
F statistic in ANOVA, one tailed or two?
underlying assumptions for a one way ANOVA
1. variables are drawn from normally distributed populations
2. groups have equal variances
why do we use mu to represent mean in hypotheses?
because we are making assumptions about mean of the populations from which the sample was drawn
2 way Anova hypothesis:
-factor A: antidepressant vs no medication
-factor B: male vs female
set up each Ho
Ho: HR is the same with antidepressant vs no medication (mu_drug = mu_no medication)
Ho: there is no difference in HR between men and women
Ho: no interaction between factors A and B (ie: between gender and antidepressant use on the mean HR)
If the p-value is less than 0.05,
we reject the null hypothesis that there's no difference between the means and conclude that a significant difference does exist
if p value is greater than .05
fail to reject the null
The measurable effect, outcome, or response in which the research is interested. (thing being measured- leaf damage!)
Data has been gathered from all the levels of the factor that are of interest
(ex: light level, drug dosage)
The factor has many possible levels, interest is in all possible levels, but only a random sample of levels is included in the data
ex: trees ?
nested anova example: leaf damage, light level= factor A, trees= factor B
Factor A Ho: Leaf damage in the shade = leaf damage in sun
Factor B Ho: Leaf damage for all trees are equal
you have 1 thing being measured: days until flowers wilt
several different factors that affect it (refrigeration, citric acid, length)
what type of ANOVA do you run?
3 way anova
2 major assumptions of MANOVA
1. equal variances
2. bivariate normality
how to prove equal variance in MANOVA?
standard deviations should all be roughly similar (variance is closely related to standard deviation of the normal distribution)
how to prove bivariate normality in MANOVA
histograms showing they are all normally distributed (should be a freq plot)
linear regression hypothesis
Ha: b does not equal 0