Biostats Final Terms
I have one categorical variable. what graph?
bar graph
i have one numerical variable what graph?
histogram or cumulative frequency distribution
i have 2 categorical variables what graph?
grouped bar graph or mosaic plots
i have one numerical and one categorical variable what graph?
grouped histogram, cumulative frequency distribution, or line plot
2 numerical variables what graph?
scatter plot, line plot, map
If i multiply data by K what will happen to sd?
multiply by K (by |K| if K is negative); it does not stay the same
if multiply data by K what will happen to mean?
multiply by K
if multiply data by K what will happen to median?
multiply by K but stays in same position relative to data
if multiply data by K what will happen to variance?
multiply by K^2
if multiply data by K what will happen to coefficient of variation?
stay the same (sd and mean are both multiplied by K, so the ratio does not change)
if multiply data by K what will happen to IQR?
multiply by K
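A quick way to check the multiply-by-K cards is to try them in R. This is a minimal sketch: the vector `y`, the constant `k`, and the helper `cv` are made up for illustration and are not course data.

```r
# Made-up data and an arbitrary positive constant (assumptions for illustration)
y <- c(2, 4, 4, 5, 7, 9, 11)
k <- 3

# Coefficient of variation in percent (hypothetical helper)
cv <- function(x) 100 * sd(x) / mean(x)

scaled <- k * y

# Ratio of each statistic after scaling to the same statistic before scaling
ratios <- c(
  mean     = mean(scaled) / mean(y),
  median   = median(scaled) / median(y),
  sd       = sd(scaled) / sd(y),
  variance = var(scaled) / var(y),
  IQR      = IQR(scaled) / IQR(y),
  CV       = cv(scaled) / cv(y)
)
print(ratios)
```

Mean, median, sd, and IQR should each come out k times larger, the variance k^2 times larger, and the CV unchanged.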
the power of a significance test is the probability of what?
correctly rejecting a false null hypothesis (power = 1 - Pr(type II error))
type II error is when
you fail to reject a false null hypothesis
type I error is when
you reject a true null hypothesis
dbinom(x, n, p)
this command gives the probability of getting exactly x successes with the entered # of trials (n) and probability of success (p)
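A small sketch of what `dbinom` returns, checked against the binomial formula. The numbers (3 successes, 10 trials, p = 0.5) are hypothetical and chosen only to keep the arithmetic easy.

```r
# Hypothetical example: exactly 3 successes in 10 trials with success probability 0.5
p_exact <- dbinom(3, size = 10, prob = 0.5)

# Same value from the binomial formula: choose(n, x) * p^x * (1 - p)^(n - x)
p_formula <- choose(10, 3) * 0.5^3 * (1 - 0.5)^(10 - 3)

print(c(dbinom = p_exact, formula = p_formula))
```

Both values should agree, since `dbinom` gives the probability of exactly that many successes, not a cumulative probability.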
sample of convenience
when a person samples whatever is most convenient to them instead of sampling at random
volunteer bias
when volunteers are not selected independently and randomly, so they can differ systematically from the population
what 2 questions do I ask if I have numerical data?
discrete? or continuous?
what 2 questions do I ask if I have categorical data?
ordinal? or nominal?
if i have 2 numerical variables for data..what plot do I use?
scatterplot, or line graph (to show trend over time)
how do I calculate variance?
it is the sd ^ 2
what is equation for CV and what does CV tell me?
equation : CV = 100% X s / ȳ (sd divided by the mean)
tells me: ratio of sd to the mean...so how much variation is there in my data relative to the mean?
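The variance and CV cards can be sketched in R as follows. The data vector `y` is made up for illustration.

```r
# Made-up sample (assumption for illustration)
y <- c(12, 15, 11, 18, 14)

s  <- sd(y)               # sample standard deviation
v  <- var(y)              # sample variance, equal to s^2
cv <- 100 * s / mean(y)   # coefficient of variation, in percent

print(c(sd = s, variance = v, cv_percent = cv))
```

Because the CV is unitless, it can be used to compare variability between measurements recorded on different scales.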
what are measurements found on box plot?
min and max (whiskers), Q1, Q2 (the median), Q3; box = IQR
what does IQR tell me?
the spread of the middle 50% of the data around the median (Q3 - Q1)
weird 1 - rule
at least one: 1 - Pr(don't have it/none/didn't happen)
none: 1 - Pr(at least one has it/did happen)
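A sketch of the "1 minus" rule in R, assuming independent trials with the same probability each time. The values `p = 0.1` and `n = 5` are hypothetical.

```r
# Hypothetical setup: 5 independent trials, each with probability 0.1 of the event
p <- 0.1
n <- 5

p_none         <- (1 - p)^n      # Pr(event never happens)
p_at_least_one <- 1 - p_none     # Pr(event happens at least once)

# Cross-check with the binomial distribution: 1 - Pr(X = 0)
check <- 1 - dbinom(0, size = n, prob = p)

print(c(none = p_none, at_least_one = p_at_least_one, check = check))
```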
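probabilities and OR
add event probabilities (if the events are mutually exclusive; otherwise also subtract Pr(A and B))
probabilities and AND
multiply event probabilities (if the events are independent)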
when you see this card write out LAW of TOTAL PROBABILITY
Pr(A) = Pr(A | B1) X Pr(B1) + Pr(A | B2) X Pr(B2) + ... summed over every B, where the B's are mutually exclusive and cover all possibilities
boxplot commands
boxplot(response~explanatory, data=data)
or boxplot(data$column)
mosaic plot commands
mosaicplot(response~explanatory, data=data)
what does P value tell me?
the p value gives the probability of obtaining observations as extreme as or more so than x=_____, from a random sample of size n, if the null hypothesis is true
pnorm(x, mean= , sd= )
gives you the area to the left of x; the chart in the back of the book gives the area to the right
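A minimal sketch of left and right areas with `pnorm`. The distribution (mean 100, sd 15) and the cutoff 115 are hypothetical values.

```r
# Hypothetical normal distribution: mean 100, sd 15, cutoff 115 (one sd above the mean)
left  <- pnorm(115, mean = 100, sd = 15)                      # area to the left of 115
right <- pnorm(115, mean = 100, sd = 15, lower.tail = FALSE)  # area to the right of 115

print(c(left = left, right = right, total = left + right))
```

Setting `lower.tail = FALSE` gives the right-hand area directly, which matches what a right-tail table reports.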
empirical rule
about 95% = data is within 2 sds of the mean
about 68% = data is within 1 sd of the mean
about 99.7% = data is within 3 sds of the mean
qnorm(.68)
this tells me where I should cut things off so I have 68% of values to the left (use this to find z*)
qnorm(percentile, mean, sd)
gives the value at the 75th percentile or whatever percentile you want (enter the percentile as a proportion, like .75)
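A sketch of `qnorm` in both of the uses above. The mean 100 and sd 15 are hypothetical, and the 95% confidence level for z* is an assumed example.

```r
# Value at the 75th percentile of a hypothetical Normal(mean = 100, sd = 15)
q75 <- qnorm(0.75, mean = 100, sd = 15)

# z* for a two-sided 95% interval: leave 2.5% in each tail,
# so ask for the cutoff with 97.5% of the area to its left
z_star <- qnorm(0.975)

# qnorm undoes pnorm: the area to the left of q75 is 0.75 again
area_back <- pnorm(q75, mean = 100, sd = 15)

print(c(q75 = q75, z_star = z_star, area_back = area_back))
```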
what command do I use for graphing a binom?
plotDist("binom", params= c(50, .5), col="Black")
rule of thumb for binoms
both np and n(1-p) have to be at least 10; then the binom is approximately normal with mean = np and sd = sqrt(np(1-p))
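A sketch that checks the rule of thumb and compares the exact binomial probability with the normal approximation. The values `n = 50`, `p = 0.5`, and the cutoff 30 are hypothetical, and the 0.5 continuity correction is an assumption added here, not something stated on the card.

```r
# Hypothetical binomial: 50 trials, success probability 0.5
n <- 50
p <- 0.5

# Rule of thumb: both np and n(1 - p) at least 10
rule_ok <- (n * p >= 10) && (n * (1 - p) >= 10)

mu    <- n * p                   # mean of the approximating normal
sigma <- sqrt(n * p * (1 - p))   # sd of the approximating normal

exact    <- pbinom(30, size = n, prob = p)       # exact Pr(X <= 30)
approx_p <- pnorm(30.5, mean = mu, sd = sigma)   # normal approximation with continuity correction (assumption)

print(c(rule_ok = rule_ok, exact = exact, approx = approx_p))
```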
pbinom(x, n, p)
gives probability of everything less than or equal to x (for greater than or equal to x, use 1 - pbinom(x - 1, n, p))
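A sketch of which tail `pbinom` covers, using hypothetical values (x = 10, 50 trials, p = 0.5).

```r
# Lower tail: Pr(X <= 10) for a hypothetical Binomial(50, 0.5)
lower <- pbinom(10, size = 50, prob = 0.5)

# Same thing by adding up the individual probabilities for 0, 1, ..., 10
by_hand <- sum(dbinom(0:10, size = 50, prob = 0.5))

# Upper tail: Pr(X >= 10) is everything above 9
upper <- pbinom(9, size = 50, prob = 0.5, lower.tail = FALSE)

print(c(lower = lower, by_hand = by_hand, upper = upper))
```

Note the shift from 10 to 9 in the upper tail: because the distribution is discrete, "greater than or equal to 10" means "strictly greater than 9".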
What CI methods are used for z proportion tests?
Agresti-Coull and Wald (both equations given on final)
chisq.test's rules of thumb
- no expected count less than one
- no more than 20% of expected counts < 5
- if these don't hold use fisher.test
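A sketch of checking these rules of thumb before trusting `chisq.test`. The 2 x 2 table and its labels are made up for illustration, and `correct = FALSE` (no continuity correction) is an assumption chosen so the statistic matches the hand formula.

```r
# Made-up 2 x 2 table of counts (assumption for illustration)
tab <- matrix(c(12, 5, 8, 15), nrow = 2,
              dimnames = list(group = c("A", "B"), outcome = c("yes", "no")))

res      <- chisq.test(tab, correct = FALSE)
expected <- res$expected   # expected counts under the null hypothesis

# Rules of thumb: no expected count below 1, at most 20% of expected counts below 5
ok <- all(expected >= 1) && mean(expected < 5) <= 0.20

# Use the chi-squared p-value if the rules hold, otherwise fall back to Fisher's exact test
p_value <- if (ok) res$p.value else fisher.test(tab)$p.value

print(expected)
print(c(rules_ok = ok, p_value = p_value))
```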
use t test when...
- don't know population standard deviation, but have s
- small sample size (n < 40)
- df = n - 1
use z standardization test when...
-z value between sample and p values
use binom dist when...
-known # trials
-S or F for each trial
-same probability for each trial
- from the table, add up the tail probabilities and multiply by 2 (to cover both sides)
For Z test proportions
when mean = np
Reject when (t, z, X^2)
- reject the null when the calculated value is bigger than the star (critical) value; for two-sided t and z tests compare the absolute value
standard error measures what?
the precision of our mean measurement
when using random variable equations/properties and you are given an individual sd and want to know the sd of a group mean use this equation:
sd of group mean = sd of individual / sqrt of n
when calculating expected counts what do you divide row total X column total by? what do you divide (o-e)^2 by?
to find expected values divide by grand total
- when calculating X^2 divide by expected count
what is df for X^2?
(#rows - 1) X (#columns - 1)
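The expected-count, X^2, and df cards can be worked by hand in R. This is a sketch on a made-up 2 x 2 table; the counts are assumptions for illustration.

```r
# Made-up observed counts (assumption for illustration)
observed <- matrix(c(12, 5, 8, 15), nrow = 2)

# Expected count for each cell = row total X column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# X^2 = sum over cells of (observed - expected)^2 / expected
x2 <- sum((observed - expected)^2 / expected)

# df = (#rows - 1) X (#columns - 1)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

# P-value: area to the right of x2 under the chi-squared distribution
p_value <- pchisq(x2, df = df, lower.tail = FALSE)

print(c(x2 = x2, df = df, p_value = p_value))
```

The hand-computed statistic should match `chisq.test(observed, correct = FALSE)`, which skips the continuity correction R applies to 2 x 2 tables by default.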
true or false? : each t distribution follows 68-95-99.7% rule (empirical rule)
false
T or F? if x truly has a normal distribution then Pr(x=k), the probability that a randomly chosen X value equals a specified number K is zero
TRUE
if I decrease the sample size that makes the margin of error...
larger
a decrease in the level of confidence makes the margin of error...
smaller
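A sketch of both margin-of-error cards, assuming a z-based interval for a mean with a known sd. The helper `margin` and all the numbers are hypothetical.

```r
# Hypothetical helper: margin of error = z* X sd / sqrt(n) (z-based, sd assumed known)
margin <- function(sd, n, conf) {
  z_star <- qnorm(1 - (1 - conf) / 2)   # critical value for a two-sided interval
  z_star * sd / sqrt(n)
}

m_base     <- margin(sd = 10, n = 100, conf = 0.95)  # starting point
m_small_n  <- margin(sd = 10, n = 25,  conf = 0.95)  # smaller sample size
m_low_conf <- margin(sd = 10, n = 100, conf = 0.90)  # lower confidence level

print(c(base = m_base, smaller_n = m_small_n, lower_conf = m_low_conf))
```

Cutting n from 100 to 25 doubles the margin because the margin depends on 1 / sqrt(n), while lowering the confidence level shrinks z* and so shrinks the margin.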
1 sample t test assumptions
independent, random, and identically distributed
