Biostats Assumptions and Interpretations
Which assumptions are these:
1. Data come from an SRS (simple random sample)
2. The sampling distribution of x-bar is approximately Normal (Central Limit Theorem)
3. The population standard deviation (sigma) must be known
Z-Score
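A minimal sketch of the z procedure these assumptions support, assuming sigma is known. The function name `z_test` and the sample numbers are hypothetical illustrations, not part of the cards.

```python
from math import sqrt
from statistics import NormalDist

def z_test(xbar, mu0, sigma, n):
    """Return (z, two-sided p-value) for a one-sample z test.

    Assumes an SRS, an approximately Normal sampling distribution
    of x-bar, and a known population standard deviation sigma.
    """
    z = (xbar - mu0) / (sigma / sqrt(n))
    # Two-sided p-value: twice the tail area beyond |z|.
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided

# Hypothetical numbers: sample mean 52, null mean 50, sigma 10, n = 25.
z, p = z_test(xbar=52.0, mu0=50.0, sigma=10.0, n=25)
print(round(z, 3), round(p, 3))
```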
Which assumptions are these:
1. Data come from an SRS
2. The population (or the sampling distribution of x-bar) is approximately Normal
3. The population standard deviation is not given (the sample standard deviation s is used instead)
T-Score
Which Assumptions Are these:
1. The data must come from a random sample or a randomized experiment
2. Observations are independent (and, essentially, the population should be much larger than the sample -> )
3. The sampling distribution of p-hat is approximately normal
Population Proportion
Which Assumptions are these:
1. Data come from SRS
2. Observations are independent
3. The two groups and samples are independent
4. The sampling distribution of p-hat1 - p-hat2 should be approximately Normal
Two-Sample Assumptions
Which Assumptions Are these:
1. Data come from an SRS
2. All observations are independent
3. The k levels of the categorical variable are mutually exclusive (every observation belongs to only ONE of the k levels)
4. All the expected counts are greater than or equal to 1, and no more than 20% (or 1 in every 5) can be less than 5
Chi Square
Which Assumptions are these:
1. The k samples must be independent SRS. The individuals in each sample are completely unrelated.
2. Each population represented by the k samples must be Normally distributed. However, the test is robust to deviations from Normality (skew, mild outliers) for large-enough samples.
3. The ANOVA F-test requires that all k populations have the same standard deviation.
- Rule of thumb: (largest sd)/(smallest sd) < 2
ANOVA
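The equal-standard-deviation rule of thumb for ANOVA can be checked directly from the samples. This is a rough sketch; the helper name `equal_sd_ok` and the example data are hypothetical.

```python
from statistics import stdev

def equal_sd_ok(groups, max_ratio=2.0):
    """Rule of thumb: (largest sd) / (smallest sd) should be < 2.

    `groups` is a list of samples, each with at least two values
    and a nonzero standard deviation.
    """
    sds = [stdev(g) for g in groups]
    return max(sds) / min(sds) < max_ratio

# Hypothetical samples with similar spread, then one with a much larger spread.
print(equal_sd_ok([[1, 2, 3, 4, 5], [2, 3, 5, 6, 7]]))
print(equal_sd_ok([[1, 2, 3, 4, 5], [10, 20, 30, 40, 50]]))
```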
Which Assumptions are these:
1. The observations are independent and come from a random sample
2. The relationship is linear
3. The standard deviation of y (sigma) is the same for all values of x
4. The response variable y varies Normally around its mean
* Check these using the residuals = yi - y-hat
Linear Regression
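Because the regression assumptions are checked on the residuals (observed y minus predicted y), it can help to see how they are computed. The sketch below fits a least-squares line by hand; the function names and the data are hypothetical.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for y-hat = intercept + slope * x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return slope, intercept

def residuals(x, y):
    """Residual for each point: observed y minus predicted y-hat."""
    slope, intercept = fit_line(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Hypothetical data.
x = [1, 2, 3, 4]
y = [2, 4, 5, 8]
print(fit_line(x, y))
print(residuals(x, y))
```

Plotting these residuals against x is the usual way to look for curvature (linearity) and changing spread (constant sigma); least-squares residuals always sum to zero.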
What are nonparametric tests?
Alternative tests we can use when the assumptions of the parametric tests (such as Normality) are violated
Ex: When data are counts/frequencies rather than continuous variables
-Count/frequency data can't be normally distributed
-Ex: # of majors in each department in College of Liberal Arts (i.e., Psych, Engl, Soci, Phil, Comm, etc.)
What is the one sample normal test?
one-sample t-test
One sample Rank test?
Wilcoxon Signed Rank test
When do you use a Wilcoxon Signed Rank test?
- One sample
- Quantitative variable
- σ is unknown
- You would otherwise use a one-sample t procedure
- Normality fails
What is a matched pairs test?
Apply one sample test to differences within pairs
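As a sketch of the matched pairs idea, the code below reduces each pair to a single difference and then computes a one-sample t statistic on those differences. The function name and the before/after values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(before, after):
    """One-sample t statistic applied to the within-pair differences.

    Returns (t, degrees of freedom) for H0: mean difference = 0.
    """
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1

# Hypothetical measurements on the same four subjects.
before = [10, 12, 9, 11]
after = [12, 15, 10, 14]
t, df = paired_t_statistic(before, after)
print(round(t, 2), df)
```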
What is a Two Independent Sample NORMAL test?
Two Sample T-Test
What is a Two Independent Sample Ranked Test?
Wilcoxon rank sum test (Mann-Whitney Test)
When do you use a Wilcoxon rank sum test (Mann-Whitney Test)?
-One Quantitative and One Categorical Variable
- K(levels) = 2
- Normality Fails
What is a several independent samples Normal test?
One-way ANOVA F Test
What is a several independent samples Rank test?
Kruskal Wallis Test
When do you use a Kruskal-Wallis Test?
-One Quantitative & One Categorical Variable
- Levels are greater than or equal to 2
- Normality Fails
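The cards for quantitative data pair each "Normal" test with a rank-based alternative. One way to summarize that pairing is the small lookup below. It is only a sketch: the function name is hypothetical, and sending exactly two samples to the two-sample tests (rather than ANOVA or Kruskal-Wallis) is an assumption of this example.

```python
def choose_test(k, normality_ok):
    """Suggest a test for a quantitative response.

    k: number of independent samples (1 also covers matched-pair differences).
    normality_ok: True if the Normality condition is reasonable.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if k == 1:
        return "one-sample t-test" if normality_ok else "Wilcoxon signed rank test"
    if k == 2:
        return "two-sample t-test" if normality_ok else "Wilcoxon rank sum (Mann-Whitney) test"
    return "one-way ANOVA F test" if normality_ok else "Kruskal-Wallis test"

print(choose_test(2, normality_ok=False))
print(choose_test(4, normality_ok=True))
```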
What is a Two Categorical Variable Test?
Two Sample proportions & Chi Square
What makes a two sample proportion test?
Both variables have 2 levels, and one of them is success/failure
What makes a chi Square test of independence?
-Any other combos
-It has relationship between two variables
What makes a Fisher's exact test?
-Any other combos
-It has relationship between two variables
- If expected counts assumption not satisfied
What is a one-variable categorical test?
-Ask: what is K (the number of levels)?
-One sample proportion and chi square goodness of fit test
What makes a one sample proportion test?
K=2 success vs failures
What makes a chi square goodness of fit test?
K is greater than or equal to and invested in all levels
Multivariable linear regression
Response variable - continuous
Explanatory variables - continuous or categorical
Fisher's Exact Test (Chi-Square Test)
Response variable - categorical
Explanatory variable - categorical
Logistic Regression
Response variable - categorical (binary)
Explanatory variables - continuous or categorical
What is the Hypothesis test for Z Score
One-sided:
H0: μ = 1
Ha: μ < 1 or μ > 1
Two-sided:
H0: μ = 1
Ha: μ ≠ 1 (2P)
What is the Hypothesis test for T Score
One-sided:
H0: μ = 1
Ha: μ < 1 or μ > 1
Two-sided:
H0: μ = 1
Ha: μ ≠ 1 (2P)
Match Pairs (difference)
H0: Md = 0 (no effect / no difference)
Ha: Md > 0 (one-sided)
What is the Hypothesis test for Two Sample test?
H0: p = p0 (or p ≤ p0, p ≥ p0), where p0 is the given value we are testing
Ha: p < p0, p > p0, or p ≠ p0 (one-sided vs. two-sided alternative)
What is the Hypothesis test for chi square?
H0: There is no association between the two variables
Ha: There is an association between the two variables
Hypothesis- Goodness of Fit
H0: Each category's proportion equals its hypothesized proportion
Ha: At least one of the proportions is different
Hypothesis for ANOVA
H0: μ1 = μ2 = μ3
Ha: At least one population mean (μi) is different
Hypothesis for Linear Regression
H0: β1 = 0
Ha: β1 ≠ 0 (two-tailed)
Extra Information for T Score
When n < 15, the data must be close to Normal and without outliers.
When 15 ≤ n ≤ 40, mild skewness is acceptable, but no outliers.
When n > 40, the t statistic will be valid even with strongly skewed data.
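These sample-size guidelines amount to a three-way lookup on n. A small sketch follows, assuming the middle band covers n from 15 through 40; the function name is hypothetical.

```python
def t_procedure_guideline(n):
    """Return the robustness guideline for t procedures at sample size n."""
    if n < 15:
        return "data must be close to Normal, no outliers"
    if n <= 40:
        return "mild skewness acceptable, no outliers"
    return "valid even with strong skewness"

for n in (10, 25, 60):
    print(n, "->", t_procedure_guideline(n))
```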
Extra Info for Two Sample
Two Sample Rule of Thumb for Confidence Interval
# of successes ≥ 10
# of failures ≥ 10
Two Sample Rule of Thumb for Hypothesis Test
# of successes ≥ 5
# of failures ≥ 5
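The success/failure rule of thumb is easy to check in code. In this sketch the minimum is 10 for a confidence interval and 5 for a hypothesis test, as on the card; the function names and the counts are hypothetical.

```python
def counts_ok(successes, n, minimum):
    """True if both successes and failures in one sample reach the minimum."""
    failures = n - successes
    return successes >= minimum and failures >= minimum

def two_sample_counts_ok(x1, n1, x2, n2, procedure):
    """Check both samples; procedure is 'ci' (minimum 10) or 'test' (minimum 5)."""
    minimum = {"ci": 10, "test": 5}[procedure]
    return counts_ok(x1, n1, minimum) and counts_ok(x2, n2, minimum)

# Hypothetical counts: 8 of 30 successes in sample 1, 12 of 30 in sample 2.
print(two_sample_counts_ok(8, 30, 12, 30, "ci"))
print(two_sample_counts_ok(8, 30, 12, 30, "test"))
```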
Extra Info for Chi Square
Large test statistic = small p-value = reject the null hypothesis
Small test statistic = large p-value = fail to reject the null hypothesis
The p-value is the area to the right of the test statistic
All expected counts have values ≥ 1.0
No more than 20% of the k expected counts have values < 5
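A short sketch of the expected-count condition together with the chi-square statistic it protects. The function names and the counts are hypothetical, and the p-value (the area to the right of the statistic) is left to a table or software.

```python
def expected_counts_ok(expected):
    """All expected counts >= 1 and no more than 20% of them below 5."""
    all_at_least_1 = all(e >= 1.0 for e in expected)
    share_below_5 = sum(e < 5 for e in expected) / len(expected)
    return all_at_least_1 and share_below_5 <= 0.20

def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical goodness-of-fit data: 100 observations, 4 equally likely levels.
observed = [18, 22, 20, 40]
expected = [25, 25, 25, 25]
print(expected_counts_ok(expected))
print(chi_square_statistic(observed, expected))
```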
Chi Square Goodness of Fit and Chi Square Independent
Chi Square Independence takes 2 categorical variables and tests whether there is an association between the two
Pvalue Interpretation
If the null hypothesis were true and the study were repeated many times, we would expect to see a test statistic of ___ or more extreme PVALUE percent of the time.
slope interpretation
For every 1-unit increase in (PREDICTOR VARIABLE), the predicted mean of (RESPONSE VARIABLE) (increases/decreases) by SLOPE.
r squared interpretation
r squared is the proportion of the variation in y that is explained by x. Example: The regression model explains not even 10% of the variation in y.
y-intercept interpretation
When (x=0 context) the predicted (y-context) is (y-intercept).
r (correlation) interpretation
The correlation coefficient r measures the strength and direction of the linear relationship between the quantitative variables x and y.
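To connect r and r squared, the sketch below computes the correlation coefficient by hand and squares it. The function name and the data are hypothetical.

```python
from math import sqrt

def correlation(x, y):
    """Pearson correlation coefficient r between x and y."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    return sxy / sqrt(sxx * syy)

# Hypothetical data with a strong positive linear association.
x = [1, 2, 3, 4]
y = [2, 4, 5, 8]
r = correlation(x, y)
print(round(r, 3))       # strength and direction of the linear relationship
print(round(r ** 2, 3))  # proportion of variation in y explained by x
```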
