Statistical Tests, BIOE 315 Final
Terms in this set (132)
ANOVA
Do SAT scores differ for low, middle, and high income students?
Chi-Square Test
*Compares observed frequencies to expected frequencies
* Nonparametric test
* Can analyze data on 1 or 2 dimensions
Chi-Square Test
Is the distribution of sex and voting behavior due to chance? Or is there a difference between the sexes on voting behavior?
t - test
Look at differences between two groups on a variable such as male/female; undergrad / grad, etc.
t-test
Do males and females differ in the amount of time they spend shopping in a given month?
ANOVA
*Tests the significance of group differences between two or more groups.
*The IV has 2 or more categories.
*This test only determines that there is a difference between the groups, but it doesn't tell which is different.
ANCOVA
Same as ANOVA, but adds control of one or more covariates that may influence the DV.
ANCOVA
Do SAT scores differ for low, middle, and high income students after controlling for single/dual parenting?
MANOVA
Does ethnicity affect reading achievement, math achievement, and overall scholastic achievement among 6th graders?
MANOVA
If there are two or more dependent variables, what test do you use?
MANCOVA
Same as MANOVA, but adds control of one or more covariates that may influence the DV
MANCOVA
Does ethnicity affect reading achievement, math achievement, and overall scholastic achievement among 6th graders after controlling for social class?
nonparametric
The Chi-Square test, the Mann-Whitney U test, and the Wilcoxon Matched Pairs test are all ____ tests.
nominal, ordinal
Nonparametric tests are used to analyze data collected on variables that have been measured on ___ and ____ scales.
Chi-Square test
This type of test requires data to be reported as frequencies, the expected frequency for any one category to be no less than 5, and the observations to be independent.
Mann-Whitney U-test
You want to compare two groups (males and females) on their responses on a 3-point rating scale. What is the appropriate statistical test?
a) Paired samples t-test
b) Independent samples t-test
c) Mann-Whitney U-test
d) One-way ANOVA
a) Chi-square test
What test would you use to compare the number of people who prefer one of four political candidates?
a) Chi-square test
b) Independent samples t-test
c) Mann-Whitney U-test
d) One-way ANOVA
a) Type I error.
A rejection of the null hypothesis when in reality the null hypothesis is true is described as a
a) Type I error.
b) Type II error.
c) standard error.
d) test of significance.
b) t test for independent samples
Mr. Marino has identified two groups of students to participate in his study examining the effectiveness of using algebra tiles. One group will use these manipulatives while a second group will receive a traditional lecture approach. Which test should be used to test the differences between the mean scores for the two classes?
a) t test for dependent samples
b) t test for independent samples
c) Chi square
d) Scheffé post hoc comparison
b) t test for dependent samples
Mr. Arroyo is interested in the gains made by his students from pretest to posttest. Which statistical test of significance is appropriate for him to use?
a) Gain scores
b) t test for dependent samples
c) t test for independent samples
d) ANOVA
d) factorial analysis of variance
Type of ANOVA used when an experiment involves more than one independent variable. This analysis can separate the effects of different levels of different variables.
a) ANOVA
b) MANOVA
c) ANCOVA
d) factorial analysis of variance
c) One-way ANOVA
You want to compare three groups on a measure of extraversion. You are satisfied that the dependent variable is normally distributed. What is the appropriate statistical test?
a) Paired samples t-test
b) Mann-Whitney U-test
c) One-way ANOVA
d) Independent samples t-test
Mann-Whitney U Test
Test like a t test but for ordinal (rank) data from two independent (uncorrelated, unmatched) groups; ex. compare students in two grade levels in terms of their ranked educational goals.
null hypothesis
Statement that there is no relationship between an IV and DV
Parametric tests
Assume the data come from a continuous, normally distributed population
t test
The ____ test is used to determine whether 2 sample means are significantly different.
ANOVA
The ___ is used to determine whether sample means from more than 2 groups are significantly different.
ANCOVA (or covariance)
A test for two or more groups while controlling for extraneous variables (covariates) is
ANOVA
____ is used when there's more than one level of a single IV.
interval or ratio
Pearson correlation (r) is used for ____ or _____ data.
ordinal
Spearman correlation is used for ____ data.
Statistics Experimental Method
Hypothesis is tested & how ---> Data ---> Descriptive Statistics & Inferential Statistics
Numeric Variables ( 2 types)
Ratio and Interval
Ratio Variables
Have a true zero; numbers start at 0 and move up (e.g., mass, length)
2 types of Ratio Variables
continuous- measuring
discrete- counting
Interval Variables
numeric with equal intervals but no true zero, so values can go below 0 (e.g., temperature in °C)
Categorical Variables (2 types)
ordinal and nominal
Ordinal Variables
categorical variables whose categories have a natural order (ranked)
Nominal Variables
categorical variables whose categories have no natural order
Measures of Accuracy
Accuracy- ability to hit target
Precision- ability to hit same spot over & over
Sample Vs. Population
E.g., sample: respondents to a poll
E.g., population: every student at Lehigh
cumulative frequency distribution
running total of the frequencies in a frequency distribution; used to determine how many observations fall at or below a certain threshold
-can only be used for numeric or ordinal data
-can be absolute or relative
-very commonly used in pharma/drug studies
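A minimal Python sketch of absolute and relative cumulative frequency, assuming the counts are already listed in category order. The dose-level counts are hypothetical.

```python
from itertools import accumulate

def cumulative_frequency(freqs):
    """Absolute cumulative frequency: running total of the counts."""
    return list(accumulate(freqs))

def relative_cumulative_frequency(freqs):
    """Relative cumulative frequency: running total divided by the total count."""
    total = sum(freqs)
    return [running / total for running in accumulate(freqs)]

# Hypothetical counts for four ordered dose levels (lowest to highest)
freqs = [2, 5, 8, 5]
print(cumulative_frequency(freqs))           # [2, 7, 15, 20]
print(relative_cumulative_frequency(freqs))  # [0.1, 0.35, 0.75, 1.0]
```

Reading the relative version: 75% of observations fall at or below the third dose level.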
Types of studies ( 2 types)
Experimental and Observational
Experimental Studies
+ Hard numbers
+ Run stats
+ Reproducible
+ Causation
- Hard to do some things
- more resources
Observational Studies
+ dig through data and pull out
+ unexpected correlations
+ less resources
+ wider array of possibilities
+ stays intensive
- poor control
parameter
μ = Σx_i / N (describes a population)
statistic
x̄ = Σx_i / n (describes a sample)
frequency distribution (table)
-describes the frequency of specific measurement in a sample
-not necessarily normal
-y-axis (frequency) should start at 0
-if data are numeric and continuous, bars should touch (histogram)
-bars should have equal widths
Sturges Rule
ideal number of intervals (bins) in a frequency distribution: k = 1 + ln(n)/ln(2) = 1 + log₂(n), where n = number of observations
Rice Rule
sets the number of intervals to twice the cube root of the number of observations: k = 2·n^(1/3)
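A short Python sketch of both rules. Rounding up to a whole number of intervals is an assumption here; conventions differ.

```python
import math

def sturges_bins(n):
    """Sturges' rule: k = 1 + ln(n)/ln(2) = 1 + log2(n), rounded up."""
    return math.ceil(1 + math.log2(n))

def rice_bins(n):
    """Rice rule: k = 2 * n^(1/3), rounded up."""
    return math.ceil(2 * n ** (1 / 3))

# Compare the two rules for a few sample sizes
for n in (50, 100, 1000):
    print(n, sturges_bins(n), rice_bins(n))
```

Rice grows faster with n, so it suggests more bins than Sturges for large data sets.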
outliers
observations far from the rest of the data; poorly designed experiments can produce them through sampling error
Descriptive Statistics
Mean
Median
Mode
--------
Range
Percentiles
Variance
Variability
3 Types of Means (measure of central tendency)
Sample Mean (x̄)
Population Mean (mu)
Weighted Mean (use frequency table)
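A minimal sketch of the weighted mean from a frequency table; the table values are hypothetical. Expanding the table back into raw observations gives the same mean.

```python
import statistics

def weighted_mean(values, freqs):
    """Mean from a frequency table: sum(f_i * x_i) / sum(f_i)."""
    return sum(x * f for x, f in zip(values, freqs)) / sum(freqs)

# Hypothetical frequency table: value 1 seen 3 times, 2 seen 5 times, 3 seen 2 times
values, freqs = [1, 2, 3], [3, 5, 2]
print(weighted_mean(values, freqs))  # 1.9

# Same answer as expanding the table into raw observations
raw = [x for x, f in zip(values, freqs) for _ in range(f)]
print(statistics.mean(raw))
```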
Median
-middle observation in set of observations
-if data is symmetric, median = mean
-median and mean are different if distribution is not symmetric
Advantages of Each Descriptive Stat
Mean: uses every data point; usually the best for symmetric numeric data
Median: robust statistic (resistant to outliers), found by sorting; works well for skewed distributions
Mode: shows which value occurs most often; the only option for nominal data
Bi-modal distributions
-used to describe a population with two modes
-outliers make it less advantageous
- double peak on frequency distribution plot
Type of Variable : best measure of central tendency
Nominal --> Mode
Ordinal --> Median
Interval/ Ratio (not skewed) --> Mean
Interval/Ratio (skewed) --> Median
Symmetric vs. (+) skewed vs. (-) skewed
Symmetric: Median = Mode = Mean
(+) Skew: Mode ---> Median --> Mean (L to R)
(-) Skew: Mean --> Median --> Mode (L to R)
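A quick check of the (+) skew ordering using Python's `statistics` module on a small hypothetical right-skewed sample:

```python
import statistics

# Hypothetical right-skewed (positively skewed) sample with a long upper tail
data = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]

mode = statistics.mode(data)      # 2
median = statistics.median(data)  # 3.0 (average of the two middle values)
mean = statistics.mean(data)      # 4.6 (pulled toward the tail by 9 and 15)

# For (+) skew, reading left to right: mode -> median -> mean
assert mode < median < mean
```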
Range
difference between the largest and smallest values
Variance
-how much samples deviate from the mean (average squared deviation)
- population variance (σ²)
-sample variance (s²)
-SD is most common measure of variability
- larger sample size increases confidence in ability to tell difference between samples
Standard Deviation
square root of the variance, in the same units as the data; interpreting it (e.g., ~68% of values within ±1 SD) assumes a normal distribution
Coefficient of Variability (CoV)
CoV = (s / x̄) × 100%
standard dev/ mean
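A sketch of sample vs. population variance and the CoV with Python's `statistics` module. It assumes the CoV uses the sample SD (n - 1 denominator); the measurements are hypothetical.

```python
import statistics

def coefficient_of_variation(sample):
    """CoV = (s / x̄) * 100, using the sample SD (n - 1 denominator)."""
    return statistics.stdev(sample) / statistics.mean(sample) * 100

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements, mean = 5
print(statistics.variance(sample))    # sample variance s^2 (divides by n - 1)
print(statistics.pvariance(sample))   # population variance sigma^2 (divides by N)
print(round(coefficient_of_variation(sample), 1))  # 42.8
```

Because the CoV is unitless, it can compare spread between measurements on different scales.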
Determining how probable a combined outcome of independent events is...
P = P1 × P2 × P3 × P4 × ... etc.
Permutations
-arrangements of objects in a particular sequence
- Linearly (order matters): n!/ (n-x)!
- Circularly: n! / [(n - x)! · x]
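Both permutation formulas can be checked with `math.perm` (Python 3.8+), which computes n! / (n - x)!. Dividing by x for the circular case follows the notes' formula; n = 5, x = 3 is only an illustration.

```python
import math

def linear_permutations(n, x):
    """Ordered arrangements of x objects taken from n in a line: n! / (n - x)!"""
    return math.perm(n, x)

def circular_permutations(n, x):
    """Arrangements of x objects taken from n around a circle: n! / [(n - x)! * x]"""
    return math.perm(n, x) // x  # exact: a product of x consecutive integers is divisible by x

print(linear_permutations(5, 3))    # 60
print(circular_permutations(5, 3))  # 20
```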
sets
collections of items
elements
each item in a set
mutually exclusive
two sets have no elements in common; the sets are disjoint
Event probability (f/ n)
f (# of times the event occurs) / n (total # of observations or trials)
P(A∩B): prob A and B
P(A∪B): prob A or B = P(A) + P(B) - P(A∩B) (just add them if A and B are mutually exclusive)
P(A|B): prob A given B
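A sketch of the three probability rules on one roll of a fair die, using `fractions.Fraction` to keep exact values. The events A and B are hypothetical examples.

```python
from fractions import Fraction

outcomes = set(range(1, 7))  # one roll of a fair six-sided die
A = {2, 4, 6}                # hypothetical event A: roll is even
B = {4, 5, 6}                # hypothetical event B: roll is greater than 3

def prob(event):
    """Event probability f / n: favorable outcomes over total outcomes."""
    return Fraction(len(event), len(outcomes))

p_and = prob(A & B)               # P(A∩B) = 2/6 = 1/3
p_or = prob(A) + prob(B) - p_and  # P(A∪B) = 1/2 + 1/2 - 1/3 = 2/3
p_a_given_b = p_and / prob(B)     # P(A|B) = (1/3) / (1/2) = 2/3

# The general "or" rule matches a direct count of A∪B = {2, 4, 5, 6}
assert p_or == prob(A | B)
```

Plain addition would give 1 here, double-counting the shared outcomes {4, 6}, which is why A and B must be mutually exclusive before you just add.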
Normal distributions
-shape of graph changes depending on value of sigma
-standard normal (z score) curve has mean 0 and SD 1
z score
-data must be normally distributed and you must know SD and mean
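A minimal z-score sketch; `statistics.NormalDist` (Python 3.8+) gives the area under the standard normal curve. The test score, mean, and SD are hypothetical.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Standard deviations between x and the mean: z = (x - mu) / sigma."""
    return (x - mu) / sigma

# Hypothetical: a score of 130 on a test with mean 100 and SD 15
z = z_score(130, 100, 15)       # 2.0
p_below = NormalDist().cdf(z)   # area under the standard normal curve left of z
print(z, round(p_below, 4))     # 2.0 0.9772
```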
t-test
-used on smaller sample sizes typically
- standard error: s_x̄ = s / √n = (s² / n)^(1/2)
- Ho is that means equal each other
- 1 categorical variable with 2 levels (groups), 1 numerical variable (compare means)
Type 1 Error
Reject Ho but Ho true
Type 2 Error
Fail to Reject Ho but Ho false
Correct Decision (no error)
Fail to reject Ho AND Ho true
Correct Rejection (no error)
Reject Ho AND Ho false
Kurtosis: Deviations away from normalcy
(+) Leptokurtic: fewer values in shoulders, narrower
Normal (mesokurtic)
(-) Platykurtic: has more values in shoulders, wider
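The Weibull Distribution
- a Weibull plot transforms the data so they fit a straight line
continuous probability distribution
- alpha = scale parameter
- failure rate stays constant only when the shape parameter = 1; shape > 1 means failure rate increases over time, shape < 1 means it decreases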
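A sketch of the two-parameter Weibull CDF and hazard (failure rate), plus the log transform behind a Weibull plot. The parameter names `scale` (alpha in these notes) and `shape` are labels assumed for illustration.

```python
import math

def weibull_cdf(t, scale, shape):
    """Probability of failure by time t: F(t) = 1 - exp(-(t / scale) ** shape)."""
    return 1 - math.exp(-((t / scale) ** shape))

def weibull_hazard(t, scale, shape):
    """Instantaneous failure rate: h(t) = (shape / scale) * (t / scale) ** (shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)

def weibull_plot_point(t, f):
    """Linearizing transform: ln(-ln(1 - F)) = shape * ln(t) - shape * ln(scale)."""
    return math.log(t), math.log(-math.log(1 - f))

# With shape = 1 the hazard is the same at every t (constant failure rate)
print(weibull_hazard(1.0, 2.0, 1.0), weibull_hazard(5.0, 2.0, 1.0))
```

On the transformed axes the points fall on a line whose slope equals the shape parameter.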
Displaying Variability BOX PLOT
know outliers, range (whiskers), median, and quartiles (box = IQR) on a box plot
Two sample t-test
compare two random samples
-the larger the n, the more certain you can be that a difference in means is statistically significant
- EG men and women GPA at Lehigh
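A minimal pooled-variance (Student's) two-sample t sketch. It assumes both samples are normal with equal variances (an F-test is one way to check that); the toy numbers stand in for something like GPAs.

```python
import math
import statistics

def two_sample_t(a, b):
    """Pooled-variance two-sample t statistic and its df (na + nb - 2)."""
    na, nb = len(a), len(b)
    # Pooled variance: weighted average of the two sample variances
    sp2 = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = math.sqrt(sp2 * (1 / na + 1 / nb))  # standard error of the difference in means
    t = (statistics.mean(a) - statistics.mean(b)) / se
    return t, na + nb - 2

t, df = two_sample_t([4, 5, 6], [1, 2, 3])  # toy data
```

Compare |t| against the critical t for `df` degrees of freedom from a t table.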
F - test
- compares the variances of 2 samples
- F = larger s² / smaller s²
Fcrit: look up in an F table using df_num (larger-variance sample) and df_den (smaller-variance sample)
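A sketch of the variance-ratio F statistic along with the degrees of freedom needed to look up F_crit in a table. The samples are hypothetical.

```python
import statistics

def f_ratio(a, b):
    """F = larger sample variance / smaller sample variance, with (df_num, df_den)."""
    va, vb = statistics.variance(a), statistics.variance(b)
    if va >= vb:
        return va / vb, len(a) - 1, len(b) - 1
    return vb / va, len(b) - 1, len(a) - 1

# Hypothetical samples with variances 2.5 and 1.0
f, df_num, df_den = f_ratio([1, 2, 3, 4, 5], [2, 3, 4])
# Reject equal variances if f exceeds F_crit(df_num, df_den) from an F table
```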
Mann- Whitney test ( U table)
- alternative to a two-sample t- test
-pool values from both groups and rank them
- is there a stat difference between heights of men and women
For non-normal data (or ranks)
can answer: Did the horse racing finishes differ by horse sex (male vs. female)?
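A minimal Mann-Whitney U sketch following the steps above: pool, rank (ties get the average rank), sum ranks for one group, convert to U. The smaller U is compared with a U table; the heights are hypothetical.

```python
def mann_whitney_u(a, b):
    """Mann-Whitney U for two independent samples (lists); tied values get average ranks."""
    pooled = sorted(a + b)
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        # Positions i..j-1 hold ranks i+1..j; tied values share their mean rank
        rank_of[pooled[i]] = (i + 1 + j) / 2
        i = j
    n1, n2 = len(a), len(b)
    r1 = sum(rank_of[v] for v in a)  # rank sum for group a
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    return min(u1, u2)  # compare with the critical value in a U table

# Hypothetical heights (cm); every man is taller than every woman, so U = 0
men = [178, 182, 175, 190]
women = [165, 170, 172, 168]
print(mann_whitney_u(men, women))
```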
paired t-test
- data points are paired, like before and after treatments
- each sample measured twice (like coke vs. pepsi at a stand)
- dependent t-test
t = d̄ / (s_d / √n)
(s_d / √n) = standard error!!
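A paired t sketch: compute within-pair differences, then t = d̄ / (s_d / √n) with n - 1 df. The before/after values are hypothetical.

```python
import math
import statistics

def paired_t(before, after):
    """Paired t = d̄ / (s_d / sqrt(n)) on within-pair differences; returns (t, df)."""
    d = [y - x for x, y in zip(before, after)]
    n = len(d)
    se = statistics.stdev(d) / math.sqrt(n)  # standard error of the mean difference
    return statistics.mean(d) / se, n - 1

# Hypothetical before/after scores for 4 subjects
t, df = paired_t([10, 12, 9, 11], [12, 13, 11, 14])
```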
ANOVA: analysis of variance
- MORE THAN 2 SAMPLES
- null hypothesis is that means equal each other
-used to test whether there are differences among more than 2 samples
- does not say which samples are different
- eg. compare mass of pigs with 4 different types of feed
Post Hoc Tests
- after running an ANOVA, post hoc tests are run to determine which groups differ
- 3 types (Tukey, Tukey- Kramer, Dunnett)
Tukey test
all groups have the same number of samples in them
Tukey Kramer
groups have different numbers of samples
Dunnett
-comparing a control population to other populations
- only comparing multiple samples against a single control
-unlike other tests, it can be 1-tailed or 2-tailed
2 Factor ANOVA
eg. drug (4 levels) × sex (2 levels: men, women) = 8 combinations (cells)
-dependent variable is continuous and normally distributed
- 2 or more categorical, independent variables
- can look at effect of several factors using a single set of data
- simpler than performing a 1 way ANOVA for each factor
fixed effects in 2 factor ANOVA
-data has been gathered from all levels of the factor of interest
eg. purpose is to compare the effects of 3 specific dosages of a drug on response
random effects in 2 factor ANOVA
-the factor has many possible levels and interest is in all of them, but only a random sample of levels is included in the data
Nested ANOVA
- used when there are different subgroups within each of the different groups
-eg. which athletes have highest vertical leap, between soccer players at various positions, basketball players at various positions, and football players at various positions
MANOVA: multivariate ANOVA
ANOVA with several dependent variables
-running separate ANOVAs on the same data for each dependent variable inflates the Type 1 error rate
-textbook A influence on math and physics ability, textbook B influence on math and physics ability
-- math and physics ability are DEPENDENT variables
- advantages: one single test
- multiple types of MANOVA tests
types of MANOVA tests
1. Wilks' Lambda: most common, lower values indicate larger group differences
2. Pillai's Trace V: more powerful, general use
3. Lawley-Hotelling Trace U
4. Roy's Max root Theta: used when variables are uncorrelated
Regression Analysis
- two variables have ratio/interval scale
- 1 variable is dependent on the other, when they are not dependent you need a correlation
- yi = alpha + beta (xi)
when to perform a regression?
two or more continuous variables, where one depends on the other and the relationship is linear and homoscedastic
Linear regression 4 criteria
1. for any value x, there exists a normally distributed y-value
2. variances of distributions of y must be equal
3. relationship between x and y is linear
4. x value measurements contain negligible error compared to y
F test for regression
F = regression MS/ residual MS
or
[regression SS/ DF] / [residual SS/ DF]
Ho: b1 = 0 (slope is zero; no linear relationship)
Ha: b1 does not equal 0
r ^2 for regression
regression SS/ total SS
- values between 0 and 1
- describe how close points are to the line
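A least-squares sketch for y = alpha + beta·x that also returns r² = regression SS / total SS and the regression F = regression MS / residual MS (df 1 and n - 2). The x/y values are hypothetical.

```python
import statistics

def simple_regression(x, y):
    """Least-squares fit y = a + b*x; returns (a, b, r_squared, F)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx      # slope (beta)
    a = my - b * mx    # intercept (alpha)
    ss_total = sum((yi - my) ** 2 for yi in y)
    ss_resid = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_reg = ss_total - ss_resid
    r_squared = ss_reg / ss_total
    f = (ss_reg / 1) / (ss_resid / (len(x) - 2))  # regression df = 1, residual df = n - 2
    return a, b, r_squared, f

a, b, r2, f = simple_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```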
ANCOVA: analysis of covariance
-testing more than 2 slopes!
- not means, slopes (Beta value)!
- mean is essentially a data point- slope describes the whole line
-used to remove the effect of a covariate (a confounding factor that is controlled for in the experiment)
Pie Chart of Variance
Group: between group variance
Error: within group variance
CoV
Linear Correlation Analysis (r^2)
- need bivariate normal distribution (both variables normal)
-r ( correlation coefficient)
- between -1 and 1
- ρ (rho) = population version of r
ν (degrees of freedom) = n - 2, where n = # pairs of data
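A Pearson r sketch returning r and its df (n - 2). Python 3.10+ also ships `statistics.correlation`; the manual version shows the formula. Data are hypothetical.

```python
import math
import statistics

def pearson_r(x, y):
    """Pearson correlation r (between -1 and 1) and df = n - 2, n = number of pairs."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy), len(x) - 2

r, df = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```

Squaring r for a simple linear fit gives the same r² as regression SS / total SS.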
Bonferroni Correction
-adjusts for the inflated Type 1 error rate when making multiple comparisons
-adjusted alpha = alpha / # pairwise comparisons
- very conservative
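A Bonferroni sketch; the group count is hypothetical. All pairwise comparisons among k groups number k(k - 1)/2.

```python
def bonferroni_alpha(alpha, n_comparisons):
    """Bonferroni-adjusted per-comparison alpha: alpha / number of comparisons."""
    return alpha / n_comparisons

k = 4                 # hypothetical number of groups
m = k * (k - 1) // 2  # all pairwise comparisons: 6
adjusted = bonferroni_alpha(0.05, m)
print(adjusted)       # each comparison must reach p < adjusted to be significant
```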
Assumptions for multiple regressions
-y values are random and independent
-yi are normally distributed
-homoscedasticity
-xi are fixed effects with no error
-xi are not highly correlated
-more robust if n is large
CHI Squared (x^2)
-most useful to test goodness of fit because it works well with categorical data
-tests whether two categorical variables are associated (test of independence)
- does gender influence which holiday you prefer: beach or snow
- use p value
- Ho is they don't
-Ha is they do
- compare p value to alpha = 0.05: if less, reject Ho; if more, fail to reject Ho
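A sketch of the chi-square test of independence for the beach/snow example, using hypothetical counts. Expected counts come from row total × column total / grand total, and df = (rows - 1)(columns - 1). The 3.841 cutoff is the standard chi-square critical value for df = 1 at alpha = 0.05.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic for a contingency table (list of rows) and its df."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical counts: rows = gender, columns = preferred holiday (beach, snow)
observed = [[30, 10],
            [20, 20]]
chi2, df = chi_square_independence(observed)
print(chi2 > 3.841)  # True -> reject Ho at alpha = 0.05
```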
Yates correction of continuity for x^2
-used to prevent overestimation of statistical significance for small data
-typically employed when an expected cell count is below 5
-over correct/ overly conservative
-reduces the chi-squared value obtained and thus increases its p-value.
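A sketch of the Yates-corrected statistic for a 2×2 table, using hypothetical counts. Subtracting 0.5 from each |O - E| shrinks chi-square, which is why the p-value goes up.

```python
def chi_square_yates_2x2(table):
    """2x2 chi-square with Yates' correction: sum of (|O - E| - 0.5)^2 / E."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand
            chi2 += (abs(table[i][j] - expected) - 0.5) ** 2 / expected
    return chi2

# Hypothetical 2x2 counts (e.g., gender x preferred holiday)
print(chi_square_yates_2x2([[30, 10], [20, 20]]))  # smaller than the uncorrected value
```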
Kolmogorov - Smirnov (KS) Goodness of Fit
-instead of determining if data fits ind. values ---> test if it fits an expected distribution
-cumulative trace of expected values to be compared to where they actually are
Repeated Measures
-treatment levels: fixed factors
-source: random factors
-model used: mixed model ANOVA
-randomized blocks with repetition
- gets better data with fewer subjects
Testing for randomness: σ² / μ (variance-to-mean ratio)
Uniform: <1
Random: =1
Clumped: >1
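A sketch of the variance-to-mean ratio using the population variance to mirror σ²/μ (many texts use the sample s² instead). The `tol` band for calling a ratio "random" is an arbitrary, hypothetical cutoff; a formal test would compare (n - 1)s²/mean against a chi-square distribution. Counts per quadrat are hypothetical.

```python
import statistics

def dispersion_index(counts):
    """Variance-to-mean ratio sigma^2 / mu for counts per sampling unit (e.g., per quadrat)."""
    return statistics.pvariance(counts) / statistics.mean(counts)

def classify_pattern(counts, tol=0.1):
    """<1 uniform, =1 random, >1 clumped; tol is a hypothetical band around 1."""
    ratio = dispersion_index(counts)
    if ratio < 1 - tol:
        return "uniform"
    if ratio > 1 + tol:
        return "clumped"
    return "random"

print(classify_pattern([2, 2, 2, 2]))  # uniform (variance 0)
print(classify_pattern([0, 2, 4, 2]))  # random (variance 2, mean 2)
print(classify_pattern([0, 0, 0, 8]))  # clumped (variance 12, mean 2)
```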
Poisson Distribution
-prob a given # of events occurring in a fixed interval of time
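A Poisson probability sketch, P(X = k) = λ^k e^(-λ) / k!. For a Poisson process the variance equals the mean (λ), which is why σ²/μ = 1 signals randomness. The rate of 3 events per hour is hypothetical.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) = lam**k * e**(-lam) / k! for lam expected events per interval."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

# Hypothetical: an average of 3 events per hour; chance of exactly 2 in one hour
print(round(poisson_pmf(2, 3), 4))  # 0.224
```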
Error propagation
often you have uncertainty in your measurements and then need to do math with those measurements
- eg. PCR
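The notes don't give formulas, so this sketch assumes the standard rules for independent errors: absolute uncertainties add in quadrature for sums/differences, and relative uncertainties add in quadrature for products/quotients. The pipetting volumes are hypothetical.

```python
import math

def sum_uncertainty(*sigmas):
    """Uncertainty of a sum or difference of independent values: sqrt(sum of sigma^2)."""
    return math.sqrt(sum(s ** 2 for s in sigmas))

def product_relative_uncertainty(*pairs):
    """Relative uncertainty of a product or quotient: sqrt(sum of (sigma / value)^2)."""
    return math.sqrt(sum((s / v) ** 2 for v, s in pairs))

# Hypothetical PCR setup: 10.0 ± 0.3 uL and 5.0 ± 0.4 uL pipetted into one tube
total_sigma = sum_uncertainty(0.3, 0.4)                               # about 0.5 uL
ratio_rel = product_relative_uncertainty((10.0, 0.3), (5.0, 0.4))     # for 10.0 / 5.0
```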
Machine Learning
subset of AI
eg shirts
Supervised -- either classification (color) or regression (length)
Unsupervised -- either clustering (groups of clothes), dimension reduction (best outfits), or association (clothes most often wear)
Ridge regression
-technique used for analyzing multiple regression data that suffers from multicollinearity
classifying data
quantifying data --> setup + and - controls --> turn quantified data into decision function
which study has higher probability of a type 2 error the 60 subject or the 100 subject?
60 subject
which study has higher power, 60 subject or 100?
the power of a statistic is the ability to correctly reject the null hypothesis when it is false.
100 subject
which study has higher probability of type 1 error, 60 or 100?
both have the same probability, since the significance level is the same
what is the shape of the F distribution?
skewed to the right (positively skewed)
F statistic in ANOVA, one tailed or two?
one
underlying assumptions for a one way ANOVA
1. variables are drawn from normally distributed populations
2. groups have equal variances
why do we use u to represent mean in hypotheses?
because we are making assumptions about mean of the populations from which the sample was drawn
2 way Anova hypothesis:
-factor A: antidepressant vs no medication
-factor B: male vs female
testing HR
set up each Ho
Factor A
Ho: HR is the same with antidepressant vs no medication (u drug = u no medication)
Factor B
Ho: there is no difference in HR between men and women
Factor AxB
Ho: no interaction between factors A and B (ie: between gender and antidepressant use on the mean HR)
If the p-value is less than 0.05,
we reject the null hypothesis that there's no difference between the means and conclude that a significant difference does exist
if p value is greater than .05
fail to reject the null
dependent variable
The measurable effect, outcome, or response in which the research is interested. (thing being measured- leaf damage!)
fixed factor
Data has been gathered from all the levels of the factor that are of interest
(ex: light level, drug dosage)
random factor
The factor has many possible levels, interest is in all possible levels, but only a random sample of levels is included in the data
ex: trees
nested anova example: leaf damage, light level= factor A, trees= factor B
form hypothesis
Factor A Ho: Leaf damage in the shade = leaf damage in sun
Factor B Ho: Leaf damage for all trees are equal
you have 1 thing being measured: days until flowers wilt
several different factors that affect it (refrigeration, citric acid, length)
what type of ANOVA do you run?
3 way anova
2 major assumptions of MANOVA
1. equal variances
2. bivariate normality
how to prove equal variance in MANOVA?
standard deviations should all be roughly similar (variance is the square of the standard deviation)
how to prove bivariate normality in MANOVA
histograms showing they are all normally distributed (should be a freq plot)
linear regression hypothesis
Ho: b1=0
Ha: b1 does not equal 0