BIOE 315 Final
Terms in this set (82)
Statistics Experimental Method
Hypothesis (and how it will be tested) ---> Data ---> Descriptive Statistics & Inferential Statistics
Numeric Variables (2 types)
Ratio and Interval
Ratio Variables
numeric scale with a true zero: values start at 0 and move up, so ratios are meaningful (e.g., mass)
2 types of Ratio Variables
continuous- measuring
discrete- counting
Interval Variables
numeric with equal intervals but no true zero, so the scale does not start at 0 (e.g., temperature in °C)
Categorical Variables (2 types)
ordinal and nominal
Ordinal Variables
categorical variables whose categories have a natural order (e.g., low, medium, high)
Nominal Variables
categorical variables whose categories have no inherent order (e.g., blood type)
Measures of Accuracy
Accuracy- ability to hit target
Precision- ability to hit same spot over & over
Sample Vs. Population
e.g., sample: respondents to a poll
e.g., population: every student at Lehigh
cumulative frequency distribution
running total of the frequencies in a frequency distribution; used to determine how many observations fall at or below a certain threshold
-can only be used for numeric or ordinal data
-can be absolute (counts) or relative (proportions)
-very commonly used in pharma/drug studies
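A minimal Python sketch of the absolute and relative versions, assuming a made-up frequency table of patients responding at each dose (the doses and counts are hypothetical, not course data):

```python
from itertools import accumulate

# Hypothetical frequency table: number of patients responding at each dose (mg)
doses = [10, 20, 30, 40, 50]
freqs = [3, 7, 12, 5, 3]

cumulative = list(accumulate(freqs))     # absolute: running total of frequencies
n = cumulative[-1]                       # total number of observations
relative = [c / n for c in cumulative]   # relative: running total as a proportion

for dose, c, r in zip(doses, cumulative, relative):
    print(f"<= {dose} mg: {c} ({r:.0%})")
```

Reading the last column answers questions like "what fraction of patients responded at or below 30 mg?"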
Types of studies (2 types)
Experimental and Observational
Experimental Studies
+ Hard numbers
+ Run stats
+ Reproducible
+ Causation
- some things are hard (impractical or unethical) to test experimentally
- require more resources
Observational Studies
+ can dig through existing data and pull out patterns
+ unexpected correlations
+ less resources
+ wider array of possibilities
+ stays intensive
- poor control over confounding variables
parameter (describes a population)
$\mu = \sum x_i / N$
statistic (describes a sample)
$\bar{x} = \sum x_i / n$
frequency distribution (table)
-describes how often each specific measurement occurs in a sample
-not necessarily normal
-the frequency axis should start at 0
-if data are numeric and continuous, bars should touch (histogram)
-bars should all have equal width
Sturges Rule
ideal # of intervals in a frequency distribution: $k = 1 + \log_2 n$ (rounded up), where n = number of observations
Rice Rule
sets the number of intervals to twice the cube root of the number of observations: $k = 2\sqrt[3]{n}$ (rounded up)
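Both rules for picking the number of intervals can be sketched in a few lines of Python. Rounding up to a whole number of intervals is an assumption here (a common convention); the sample sizes in the loop are arbitrary:

```python
import math

def sturges_bins(n: int) -> int:
    """Sturges' rule: k = 1 + log2(n), rounded up to a whole number of intervals."""
    return math.ceil(1 + math.log2(n))

def rice_bins(n: int) -> int:
    """Rice rule: k = 2 * n^(1/3), rounded up to a whole number of intervals."""
    return math.ceil(2 * n ** (1 / 3))

for n in (30, 100, 500):
    print(n, sturges_bins(n), rice_bins(n))
```

For larger n the Rice rule suggests more intervals than Sturges' rule, because a cube root grows faster than a logarithm.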
outliers
observations far from the rest of the data; they may reflect real variation or error (poorly designed experiments can lead to sampling error)
Descriptive Statistics
Measures of central tendency: Mean, Median, Mode
Measures of variability: Range, Percentiles, Variance
3 Types of Mean (measure of central tendency)
Sample mean ($\bar{x}$)
Population mean ($\mu$)
Weighted mean (use frequency table): $\bar{x}_w = \sum f_i x_i / \sum f_i$
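A short Python sketch of the weighted mean from a frequency table, using a hypothetical table of litter sizes; it also checks that the result matches expanding the table back into raw observations:

```python
# Hypothetical frequency table: litter size (x) and number of sows with that size (f)
values = [8, 9, 10, 11, 12]
freqs = [2, 5, 8, 4, 1]

# Weighted mean: sum(f_i * x_i) / sum(f_i)
weighted_mean = sum(f * x for x, f in zip(values, freqs)) / sum(freqs)

# Same answer as expanding the table back into raw observations
expanded = [x for x, f in zip(values, freqs) for _ in range(f)]
plain_mean = sum(expanded) / len(expanded)

print(weighted_mean, plain_mean)
```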
Median
-middle observation in set of observations
-if data is symmetric, median = mean
-median and mean are different if distribution is not symmetric
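A quick check with Python's `statistics` module shows the mean and median agreeing for symmetric data and separating for skewed data. Both data sets are made up for illustration:

```python
from statistics import mean, median

symmetric = [2, 4, 6, 8, 10]
right_skewed = [2, 3, 3, 4, 5, 30]   # hypothetical data with one very large value

print(mean(symmetric), median(symmetric))        # equal for symmetric data
print(mean(right_skewed), median(right_skewed))  # mean pulled toward the long right tail
```

The single large value drags the mean well above the median, which is why the median is preferred for skewed interval/ratio data.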
Advantages of Each Descriptive Stat
Mean: easy to calculate, uses every data point, usually the best (for symmetric data)
Median: found by sorting; robust statistic, works well for many kinds of distributions (not pulled by outliers or skew)
Mode: shows which value occurs most frequently; works for any type of data
Bimodal distributions
-used to describe a population with two modes
-outliers make it less advantageous
-double peak on a frequency distribution plot
Type of Variable: best measure of central tendency
Nominal --> Mode
Ordinal --> Median
Interval/ Ratio (not skewed) --> Mean
Interval/Ratio (skewed) --> Median
Symmetric vs. (+) skewed vs. (-) skewed
Symmetric: Median = Mode = Mean
(+) Skew: Mode ---> Median --> Mean (L to R)
(-) Skew: Mean --> Median --> Mode (L to R)
Range
difference between the largest and smallest values
Variance
-how much samples deviate from the mean (average squared deviation)
-population: $\sigma^2 = \sum (x_i - \mu)^2 / N$
-sample: $s^2 = \sum (x_i - \bar{x})^2 / (n - 1)$
-SD is the most common measure of variability
-larger sample size increases confidence in the ability to tell a difference between samples
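The only difference between the population and sample versions is the divisor (N vs. n - 1). A minimal Python sketch on hypothetical measurements, computing both by hand and then with the `statistics` module:

```python
from statistics import pvariance, variance, stdev

data = [4.0, 6.0, 5.0, 7.0, 8.0]  # hypothetical measurements

# By hand: sum of squared deviations from the mean
n = len(data)
x_bar = sum(data) / n
ss = sum((x - x_bar) ** 2 for x in data)
pop_var = ss / n           # sigma^2: divide by N (whole population measured)
sample_var = ss / (n - 1)  # s^2: divide by n - 1 (data are a sample)

# The standard library gives the same values
print(pop_var, pvariance(data))
print(sample_var, variance(data), stdev(data))  # stdev = square root of the sample variance
```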
Standard Deviation
square root of the variance ($\sigma$ or $s$), in the same units as the data; most easily interpreted when the data are normally distributed
Coefficient of Variability (CoV)
$\text{CoV} = (s / \bar{x}) \times 100\%$
standard deviation as a percentage of the mean
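A one-function Python sketch of the CoV, assuming the sample SD is used (as in $s / \bar{x}$); the masses are hypothetical. Because the units cancel, CoV can compare variability between measurements on different scales:

```python
from statistics import mean, stdev

def coefficient_of_variation(data):
    """CoV = (sample SD / sample mean) * 100, as a percentage."""
    return stdev(data) / mean(data) * 100

mass_g = [98.0, 102.0, 100.0, 101.0, 99.0]  # hypothetical masses (g)
print(round(coefficient_of_variation(mass_g), 2))
```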
Determining how probable an outcome is (independent events all occurring)...
$P = P_1 \times P_2 \times P_3 \times P_4 \times \cdots$
Permutations
-arrangements of objects in a particular sequence
- Linearly (order matters): $\frac{n!}{(n-x)!}$
- Circularly: $\frac{n!}{(n-x)!\,x}$
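Both permutation formulas in Python, using `math.perm(n, x)` (which computes n!/(n - x)!, available since Python 3.8). The circular version divides by x because rotating a circular arrangement does not create a new one:

```python
import math

def linear_permutations(n: int, x: int) -> int:
    """Ordered arrangements of x objects chosen from n: n! / (n - x)!"""
    return math.perm(n, x)

def circular_permutations(n: int, x: int) -> int:
    """Arrangements of x of n objects around a circle: n! / ((n - x)! * x)"""
    return math.perm(n, x) // x

print(linear_permutations(5, 3))    # 60 = 5 * 4 * 3
print(circular_permutations(5, 3))  # 20 = 60 / 3
```

When all n objects are seated (x = n), the circular formula reduces to (n - 1)!.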
sets
collections of items
elements
each item in a set
mutually exclusive
two sets have no elements in common (the sets are disjoint); mutually exclusive events cannot occur together
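Event probability ($f/n$)
f (# of times the event is observed) / n (total # of trials)
$P(A \cap B)$: prob A and B
$P(A \cup B)$: prob A or B $= P(A) + P(B) - P(A \cap B)$ (just add them if A and B are mutually exclusive)
$P(A \mid B)$: prob A given B $= P(A \cap B) / P(B)$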
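These rules can be checked on a single roll of a fair die, a hypothetical example chosen because every probability is an exact fraction. Python's `fractions.Fraction` keeps the arithmetic exact:

```python
from fractions import Fraction

# One roll of a fair six-sided die
outcomes = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # event A: roll is even
B = {4, 5, 6}   # event B: roll is greater than 3

def p(event):
    """Event probability f / n: favorable outcomes over total outcomes."""
    return Fraction(len(event), len(outcomes))

p_a_and_b = p(A & B)                  # P(A and B): outcomes in both sets
p_a_or_b = p(A) + p(B) - p_a_and_b    # P(A or B): subtract the overlap counted twice
p_a_given_b = p_a_and_b / p(B)        # P(A given B) = P(A and B) / P(B)

# Independent events: multiply, e.g. two sixes on two separate rolls
p_two_sixes = Fraction(1, 6) * Fraction(1, 6)

print(p_a_and_b, p_a_or_b, p_a_given_b, p_two_sixes)
```

Because A and B overlap ({4, 6}), simply adding P(A) + P(B) would give 1, which overcounts.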
Normal distributions
-center is set by $\mu$; the spread (shape) of the graph changes depending on the value of $\sigma$
-standard normal (z score) graph: $\mu = 0$, $\sigma = 1$
z score
$z = (x - \mu) / \sigma$: number of SDs a value lies from the mean
-data must be normally distributed and you must know the SD and mean
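A short Python sketch of a z score and the area to its left under the standard normal curve, using `statistics.NormalDist` (Python 3.8+). The mean of 120 and SD of 10 are hypothetical:

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations x lies from the mean."""
    return (x - mu) / sigma

# Hypothetical: a normally distributed measurement with mu = 120, sigma = 10
z = z_score(135, 120, 10)
prob_below = NormalDist(mu=0, sigma=1).cdf(z)   # area left of z under the standard normal curve
print(z, round(prob_below, 4))
```

This replaces looking the value up in a z table; `NormalDist(120, 10).cdf(135)` gives the same area without standardizing first.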
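t-test
-typically used on smaller sample sizes
-standard error: $s_{\bar{x}} = \sqrt{s^2 / n} = s / \sqrt{n}$ (use the SD or the variance to find the error)
-Ho is that the means equal each other
-one categorical variable with 2 levels (groups), 1 numerical variable (compare means)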
Type 1 Error
Reject Ho but Ho true
Type 2 Error
Fail to Reject Ho but Ho false
Correct Decision (no error)
Fail to reject Ho AND Ho true
Correct Rejection (no error)
Reject Ho AND Ho false
Kurtosis: how far the peak and tails deviate from a normal shape
(+) Leptokurtic: fewer values in shoulders, narrower peak, heavier tails
Mesokurtic: normal
(-) Platykurtic: more values in shoulders, wider, flatter peak
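The Weibull Distribution
-data are often transformed (Weibull plot) so they fall on a straight line
-alpha = scale parameter
-failure rate remains constant only when the shape parameter = 1; it increases over time if shape > 1 and decreases if shape < 1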
Displaying Variability: BOX PLOT
know how a box plot shows the median, quartiles (box = IQR), range (whiskers), and outliers; some box plots also mark the mean
Two sample t-test
compare the means of two random samples
-the larger the n, the more certain you can be that a difference in means is statistically significant
-e.g., men's vs. women's GPA at Lehigh
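A minimal Python sketch of the two-sample t statistic, assuming the pooled-variance (Student's) version, which requires roughly equal variances; the Welch version for unequal variances is a different formula. The GPAs are hypothetical, and the p value or critical t would still come from a t table:

```python
import math
from statistics import mean, variance

def two_sample_t(a, b):
    """Student's two-sample t statistic using a pooled variance (assumes equal variances)."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var / n1 + pooled_var / n2)  # standard error of the difference in means
    t = (mean(a) - mean(b)) / se
    return t, n1 + n2 - 2  # t statistic and degrees of freedom

group_1 = [3.1, 3.4, 2.9, 3.6, 3.3]  # hypothetical GPAs
group_2 = [3.0, 2.8, 3.2, 2.7, 2.9]
t, df = two_sample_t(group_1, group_2)
print(round(t, 3), df)  # compare |t| with the critical t for this df (and alpha) from a t table
```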
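F - test
-tests whether 2 samples have equal variances
-$F = s^2_{\text{larger}} / s^2_{\text{smaller}}$
$F_{crit}$: read from an F table using $df_{num}$ (n - 1 of the larger-variance sample) and $df_{den}$ (n - 1 of the smaller-variance sample)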
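The variance-ratio F statistic in Python, putting the larger variance on top so F is always at least 1. The data are hypothetical; the critical value still has to be looked up for the returned (df_num, df_den):

```python
from statistics import variance

def f_test(a, b):
    """Variance-ratio F = larger s^2 / smaller s^2, returned with (df_num, df_den)."""
    va, vb = variance(a), variance(b)
    if va >= vb:
        return va / vb, len(a) - 1, len(b) - 1
    return vb / va, len(b) - 1, len(a) - 1

group_1 = [3.1, 3.4, 2.9, 3.6, 3.3]  # hypothetical data
group_2 = [3.0, 2.8, 3.2, 2.7, 2.9]
f, df_num, df_den = f_test(group_1, group_2)
print(round(f, 3), df_num, df_den)  # look up F_crit for (df_num, df_den) in an F table
```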
Mann-Whitney test (U table)
- nonparametric alternative to a two-sample t-test
- take values for both groups and rank them together
- e.g., is there a statistical difference between heights of men and women?
FOR NON-NORMAL DATA
can answer: Did the horse racing finishes differ by horse sex (male vs. female)?
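A minimal Python sketch of the U statistic: rank both groups together (ties get the average rank), sum the ranks of one group, and convert to U. It returns the smaller of U1 and U2, which many U tables use; the heights are hypothetical:

```python
def rank_with_ties(values):
    """Assign ranks 1..N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average position in the run, converted to a 1-based rank
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(a, b):
    """Smaller of U1 and U2 from the combined ranking of groups a and b."""
    ranks = rank_with_ties(list(a) + list(b))
    n1, n2 = len(a), len(b)
    r1 = sum(ranks[:n1])          # rank sum of the first group
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    return min(u1, u2)

heights_m = [178, 182, 175, 190, 185]  # hypothetical heights (cm)
heights_f = [165, 170, 175, 160, 168]
print(mann_whitney_u(heights_m, heights_f))
```

Only the ranks matter, not the raw values, which is why the test does not need normally distributed data.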
paired t-test
- data points are paired, like before and after treatments
- each subject measured twice (like coke vs. pepsi at a stand)
- dependent t-test
$t = \bar{d} / (s_d / \sqrt{n})$
$s_d / \sqrt{n}$ = standard error!!
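The paired t statistic works on the within-pair differences d, so it is really a one-sample t-test on d. A minimal Python sketch with hypothetical before/after readings:

```python
import math
from statistics import mean, stdev

def paired_t(before, after):
    """Paired t: t = mean(d) / (s_d / sqrt(n)), where d are the within-pair differences."""
    d = [a - b for a, b in zip(after, before)]
    n = len(d)
    se = stdev(d) / math.sqrt(n)  # standard error of the mean difference
    return mean(d) / se, n - 1    # t statistic and degrees of freedom

before = [140, 152, 138, 147, 160]  # hypothetical readings before treatment
after = [132, 145, 135, 140, 150]   # same subjects after treatment
t, df = paired_t(before, after)
print(round(t, 3), df)
```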
ANOVA: analysis of variance
- MORE THAN 2 SAMPLES
- null hypothesis is that all group means equal each other
-used to test whether there are differences among more than 2 samples
- does not say which samples are different
- eg. compare mass of pigs with 4 different types of feed
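A minimal Python sketch of one-way ANOVA for the pig-feed example, splitting the total variation into between-group and within-group sums of squares. The masses are hypothetical, and the F value would still be compared with F_crit for (df_between, df_within):

```python
from statistics import mean

def one_way_anova(*groups):
    """Return F = MS_between / MS_within and its degrees of freedom."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical pig masses (kg) under 4 feeds
feed_a = [60, 62, 61]
feed_b = [65, 66, 64]
feed_c = [58, 59, 60]
feed_d = [63, 62, 64]
f, df_b, df_w = one_way_anova(feed_a, feed_b, feed_c, feed_d)
print(f, df_b, df_w)
```

A large F says at least one mean differs, but not which one; that is the job of a post hoc test.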
Post Hoc Tests
- after running an ANOVA, post hoc tests are run to identify which groups differ from each other
- 3 types (Tukey, Tukey- Kramer, Dunnett)
Tukey test
used when all groups have the same number of samples in them
Tukey-Kramer
used when groups have different numbers of samples
Dunnett
-comparing a control population to other populations
- only comparing multiple samples against a single control
-unlike other tests, it can be 1-tailed or 2-tailed
2 Factor ANOVA
e.g., 4 drugs (factor 1) tested on men and women (factor 2) = 8 combinations (cells)
-continuous, normally distributed dependent variable
- 2 or more categorical, independent variables (factors)
- can look at the effect of several factors using a single set of data
- simpler than performing a 1-way ANOVA for each factor
fixed effects in 2 factor ANOVA
-data has been gathered from all levels of the factor of interest
e.g., purpose: compare the effects of 3 specific dosages of a drug on response
random effects in 2 factor ANOVA
-the factor has many possible levels and interest is in all of them, but only a random sample of levels is included in the data
Nested ANOVA
- used when there are different subgroups within each of the different groups
-eg. which athletes have highest vertical leap, between soccer players at various positions, basketball players at various positions, and football players at various positions
MANOVA: multivariate ANOVA
ANOVA with several dependent variables
-used instead of running multiple ANOVAs on the same data (one per dependent variable), which inflates type 1 error
-e.g., influence of textbook A vs. textbook B on math and physics ability
-- math and physics ability are DEPENDENT variables
- advantage: one single test
- multiple types of MANOVA tests
types of MANOVA tests
1. Wilks' Lambda: most common, lower values better
2. Pillai's Trace (V): more powerful, general use
3. Lawley-Hotelling Trace (U)
4. Roy's Max root Theta: used when variables are uncorrelated
Regression Analysis
- both variables measured on a ratio/interval scale
- 1 variable is dependent on the other; when neither is dependent on the other, use a correlation instead
- $y_i = \alpha + \beta x_i$
when to perform a regression?
two or more continuous! variables, where one is dependent on the other, with a linear relationship that is homoscedastic
Linear regression 4 criteria
1. for any value x, there exists a normally distributed y-value
2. variances of distributions of y must be equal
3. relationship between x and y is linear
4. x value measurements contain negligible error compared to y
F test for regression
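$F = \text{regression MS} / \text{residual MS}$
or
$F = [\text{regression SS} / DF] / [\text{residual SS} / DF]$ (simple regression: regression DF = 1, residual DF = n - 2)
Ho: $\beta = 0$ (no linear relationship)
Ha: $\beta \neq 0$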
$r^2$ for regression
$r^2$ = regression SS / total SS
- values between 0 and 1
- proportion of the variation in y explained by the regression; the closer to 1, the closer the points are to the line
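A minimal Python sketch tying the regression cards together: a least-squares fit of $y_i = \alpha + \beta x_i$, then the sums of squares behind the F test and $r^2$. The dose/response data are hypothetical:

```python
from statistics import mean

def simple_linear_regression(x, y):
    """Least-squares fit y = alpha + beta*x, plus the regression F and r^2."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta = sxy / sxx                     # slope
    alpha = y_bar - beta * x_bar         # intercept
    total_ss = sum((yi - y_bar) ** 2 for yi in y)
    regression_ss = beta * sxy           # equivalently sxy**2 / sxx
    residual_ss = total_ss - regression_ss
    f = (regression_ss / 1) / (residual_ss / (n - 2))  # df: 1 (regression) and n - 2 (residual)
    r_squared = regression_ss / total_ss
    return alpha, beta, f, r_squared

dose = [1, 2, 3, 4, 5]                 # hypothetical x
response = [2.0, 4.1, 5.9, 8.2, 9.8]   # hypothetical y
alpha, beta, f, r_squared = simple_linear_regression(dose, response)
print(round(alpha, 3), round(beta, 3), round(f, 1), round(r_squared, 4))
```

A large F (compared with F_crit for df 1 and n - 2) rejects Ho: $\beta = 0$, and $r^2$ near 1 means the points sit close to the fitted line.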
ANCOVA: analysis of covariance
-testing more than 2 slopes!
- not means, slopes (beta values)!
- a mean is essentially a single data point; a slope describes the whole line
- used to remove the effects of a covariate (a confounding factor that is controlled for statistically rather than by the experiment)
Pie Chart of Variance
Group: between group variance
Error: within group variance
CoV
Linear Correlation Analysis ($r^2$)
- need a bivariate normal distribution (both variables normal)
- r (correlation coefficient)
- between -1 and 1
- $\rho$ = population version of r
$\nu = n - 2$ degrees of freedom (n = # pairs of data)
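A minimal Python sketch of Pearson's r, plus the usual t statistic for testing Ho: $\rho = 0$ with $\nu = n - 2$ degrees of freedom. The height/mass pairs are hypothetical:

```python
import math
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient r (between -1 and 1)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def r_t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)), compared with a t table at nu = n - 2."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

height_cm = [150, 160, 170, 180, 190]  # hypothetical pairs
mass_kg = [52, 57, 68, 70, 83]
r = pearson_r(height_cm, mass_kg)
print(round(r, 3), round(r ** 2, 3), round(r_t_statistic(r, len(height_cm)), 2))
```

Unlike regression, neither variable is treated as dependent: swapping x and y gives the same r.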
Bonferroni Correction
-adjusts alpha to control type 1 error when making multiple comparisons
-$\alpha_{adj} = \alpha / m$, where m = # of pairwise comparisons
- very conservative
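A tiny Python sketch of the Bonferroni adjustment when every pair of k groups is compared, so m = C(k, 2) comparisons (using `math.comb`, Python 3.8+):

```python
from math import comb

def bonferroni_alpha(alpha: float, n_groups: int) -> float:
    """Adjusted alpha when every pair of groups is compared: alpha / C(k, 2)."""
    comparisons = comb(n_groups, 2)
    return alpha / comparisons

print(bonferroni_alpha(0.05, 4))  # 4 groups -> 6 pairwise comparisons
```

Each comparison must now reach p < 0.0083 instead of p < 0.05, which is why the correction is considered conservative.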
Assumptions for multiple regressions
-y values are random and independent
-yi are normally distributed
-homoscedasticity
-xi are fixed effects with no error
-xi are not highly correlated
-more robust if n is large
Chi-Squared ($\chi^2$)
-most useful to test goodness of fit because it works well with categorical data
-tests whether two categorical variables are associated
- e.g., does gender influence which holiday you prefer: beach or snow?
- use p value
- Ho: they are independent (gender does not influence preference)
- Ha: they are associated (it does)
- compare the p value to 0.05: if less, reject Ho; if more, fail to reject Ho
Yates correction for continuity for $\chi^2$
-used to prevent overestimation of statistical significance for small samples
-typically employed (2x2 tables) when a cell has an expected count below 5
-can overcorrect (overly conservative)
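A minimal Python sketch of the $\chi^2$ statistic for a 2x2 table, with an optional Yates correction. The counts are hypothetical; expected counts come from row total × column total / grand total. Not letting the correction push |O - E| below zero is an implementation choice made here. With df = 1 and α = 0.05 the critical value is 3.841:

```python
def chi_square_2x2(table, yates=False):
    """Chi-squared statistic for a 2x2 contingency table (df = 1).

    Expected count for each cell = row total * column total / grand total.
    With yates=True, |O - E| is reduced by 0.5 (not below 0) before squaring.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed - expected)
            if yates:
                diff = max(diff - 0.5, 0.0)
            chi2 += diff ** 2 / expected
    return chi2

# Hypothetical counts: rows = gender, columns = (beach, snow)
counts = [[30, 10],
          [20, 20]]
print(round(chi_square_2x2(counts), 3), round(chi_square_2x2(counts, yates=True), 3))
```

The corrected statistic is smaller, which is the conservative behavior described on the card.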
Kolmogorov-Smirnov (KS) Goodness of Fit
-instead of testing fit to individual values ---> tests whether the data fit an expected distribution
-compares the cumulative distribution of the observed values with the expected cumulative distribution; the statistic D is the largest gap between them
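A minimal Python sketch of the one-sample KS statistic D, assuming (for illustration only) that the expected distribution is uniform on [0, 1]. The observed cumulative proportion is a step function, so both sides of each step are checked:

```python
def ks_statistic(data, cdf):
    """One-sample KS statistic D: largest gap between the observed cumulative
    proportion and the expected cumulative distribution cdf(x)."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        expected = cdf(x)
        # the observed step function jumps from (i - 1)/n to i/n at x
        d = max(d, i / n - expected, expected - (i - 1) / n)
    return d

def uniform_cdf(x):
    """Expected cumulative distribution: uniform on [0, 1] (assumed for the example)."""
    return min(max(x, 0.0), 1.0)

sample = [0.1, 0.4, 0.45, 0.8, 0.9]  # hypothetical observations
print(round(ks_statistic(sample, uniform_cdf), 3))
```

D is then compared with a critical value from a KS table for the sample size.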
Repeated Measures
-treatment levels: fixed factors
-source: random factors
-model used: mixed model ANOVA
-randomized blocks with repetition
-allows you to get better data with fewer subjects
Testing for randomness: variance-to-mean ratio $\sigma^2 / \mu$
Uniform: <1
Random: =1
Clumped: >1
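A short Python sketch of the variance-to-mean ratio, estimating $\sigma^2 / \mu$ with the sample variance over the sample mean. The quadrat counts are hypothetical; a formal test of whether the ratio differs from 1 is not shown:

```python
from statistics import mean, variance

def dispersion_index(counts):
    """Variance-to-mean ratio s^2 / x_bar: <1 uniform, about 1 random, >1 clumped."""
    return variance(counts) / mean(counts)

uniform = [2, 3, 2, 3, 2, 3, 2, 3]  # hypothetical counts per quadrat, evenly spread
clumped = [0, 0, 0, 9, 0, 0, 8, 0]  # hypothetical counts concentrated in two quadrats

print(round(dispersion_index(uniform), 3), round(dispersion_index(clumped), 3))
```

A ratio near 1 matches the Poisson distribution, whose variance equals its mean.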
Poisson Distribution
-probability of a given # of events occurring in a fixed interval of time (or space) when events happen independently at a constant average rate
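The Poisson probability $P(X = k) = \lambda^k e^{-\lambda} / k!$ in Python, assuming a hypothetical average of 3 events per interval:

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) = lam^k * e^(-lam) / k!, where lam is the mean number of events per interval."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

# Hypothetical: a clinic averages 3 arrivals per hour
probs = [poisson_pmf(k, 3.0) for k in range(6)]
for k, p in enumerate(probs):
    print(k, round(p, 4))
```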
Error propagation
measurements often carry uncertainty, and that uncertainty carries through (propagates) when you do math with the measurements
- e.g., PCR
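A minimal Python sketch of two standard propagation rules, assuming the uncertainties are independent and random (a common textbook assumption, not necessarily the course's exact method): for sums or differences the absolute uncertainties add in quadrature, and for products (or quotients) the relative uncertainties add in quadrature. The volumes and concentrations are hypothetical:

```python
import math

def propagate_sum(sigma_a, sigma_b):
    """q = a + b (or a - b): absolute uncertainties add in quadrature."""
    return math.hypot(sigma_a, sigma_b)

def propagate_product(a, sigma_a, b, sigma_b):
    """q = a * b: relative uncertainties add in quadrature; returns the absolute uncertainty of q."""
    q = a * b
    return abs(q) * math.hypot(sigma_a / a, sigma_b / b)

# Hypothetical: two pipetted volumes (uL) added together
print(propagate_sum(3.0, 4.0))
# Hypothetical: concentration 10 +/- 0.3 times volume 5 +/- 0.2
print(propagate_product(10.0, 0.3, 5.0, 0.2))
```

Quadrature assumes the errors partly cancel; if they are correlated, the uncertainties should not be combined this way.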
Machine Learning
subset of AI
e.g., shirts
Supervised: either classification (color) or regression (length)
Unsupervised: either clustering (groups of clothes), dimension reduction (best outfits), or association (clothes most often worn together)
Ridge regression
-technique used for analyzing multiple regression data that suffers from multicollinearity (highly correlated x variables)
classifying data
quantifying data --> setup + and - controls --> turn quantified data into decision function