Stats Exam
Terms in this set (73)
Individuals
objects described by a set of data. May be people, but they also may be animals or things.
Variable
Any characteristic of an individual. Can take different values for different individuals.
categorical variable
places an individual into one of several groups or categories
quantitative variable
takes numerical values for which arithmetic operations such as adding and averaging make sense
distribution
tells us what values the variable takes and how often it takes these values
spread
give the lowest and highest value in data set
outliers
are there any values that stand out as unusual
center
what is the approximate midpoint or typical value of the data (only an estimate)
shape
does the graph show symmetry, or is it skewed in one direction
time plot
plots each observation against the time at which it was measured
Mean
add the values of the observations and divide by the number of observations
Median
midpoint of the distribution
five number summary
consists of smallest observation, first quartile, the median, third quartile, and largest observation, written in order from smallest to largest
Quartiles
To calculate, arrange the observations in increasing order and locate the median in the ordered list. Q1 is the median of the observations below the overall median; Q3 is the median of the observations above it.
IQR
distance between first and third quartiles (Q3-Q1)
Outliers equation
an observation is flagged as an outlier if it falls below Q1 - 1.5 x IQR or above Q3 + 1.5 x IQR
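A minimal Python sketch of the five number summary and the 1.5 x IQR outlier rule. It assumes quartiles are found by splitting the ordered data at the median (excluding the median itself when the count is odd); the function names and the data are made up for illustration.

```python
from statistics import mean, median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) by splitting the ordered data at the median."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]           # observations below the median's position
    upper = xs[half + n % 2:]   # observations above the median's position
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

def iqr_outliers(data):
    """Return the values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low or x > high]

scores = [2, 4, 4, 5, 6, 7, 8, 9, 30]   # made-up data
summary = five_number_summary(scores)
outliers = iqr_outliers(scores)
print("mean:", mean(scores))
print("five number summary:", summary)
print("outliers:", outliers)
```

In this data the value 30 pulls the mean above the median, which is why the median is often preferred as a measure of center for skewed data.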
boxplot
graph of five number summary, with outliers plotted individually
Standard Deviation
square root of the variance, where the variance is the average of the squared deviations of the observations from their mean (dividing by n - 1 for a sample).
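A short Python sketch of the standard deviation calculation, assuming the sample version that divides the sum of squared deviations by n - 1 before taking the square root. The helper name `sample_sd` and the data are illustrative only; the result is compared against `statistics.stdev` from the standard library.

```python
from math import sqrt
from statistics import stdev

def sample_sd(data):
    """Sample standard deviation: sqrt of (sum of squared deviations / (n - 1))."""
    m = sum(data) / len(data)
    squared_devs = [(x - m) ** 2 for x in data]
    variance = sum(squared_devs) / (len(data) - 1)
    return sqrt(variance)

values = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up data
sd = sample_sd(values)
print("by hand:", sd)
print("statistics.stdev:", stdev(values))
```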
Density Curve
curve that is always on or above the horizontal axis and has area exactly 1 underneath it. Describes the overall pattern of a distribution.
normal distribution
mound-shaped and symmetric, based on continuous variable, adheres to 68-95-99.7 rule
z score
standardized value. (observed value - mean) / standard deviation.
standard normal distribution
the normal distribution N(0,1) with a mean 0 and standard deviation 1
Inverse normal calculations
working backwards from area, we find z, then x. Value of z is found in table A in reverse.
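A hedged Python sketch of a z score and an inverse normal calculation. It uses `statistics.NormalDist` (Python 3.8+) in place of Table A; the mean of 100 and standard deviation of 15 are assumed values chosen only for illustration.

```python
from statistics import NormalDist

mu, sigma = 100, 15   # assumed population mean and standard deviation

def z_score(x, mu, sigma):
    """Standardize x: how many standard deviations it lies from the mean."""
    return (x - mu) / sigma

standard_normal = NormalDist()          # N(0, 1)

# Forward: from x to z to the area (proportion) below x.
z = z_score(130, mu, sigma)
area_below = standard_normal.cdf(z)

# Inverse: from an area to z, then back to x.
z_90 = standard_normal.inv_cdf(0.90)    # z with 90% of the area below it
x_90 = mu + z_90 * sigma

# Check the 68-95-99.7 rule for one, two, and three standard deviations.
rule = [standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)]

print(z, round(area_below, 4), round(z_90, 4), round(x_90, 2))
print([round(p, 3) for p in rule])
```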
response variable
measures an outcome of a study. Sometimes referred to as the dependent variable
explanatory variable
attempts to explain the observed outcomes. Sometimes referred to as the independent variable
scatterplot
shows the relationship between two quantitative variables measured on the same individuals.
form
linear or curved
direction
positive, negative, or neither
strength
weak, moderate, strong
positive association
when one variable increases, the other increases
negative association
when one variable increases, the other decreases
Correlation
measures strength and direction of the relationship between two quantitative variables. Usually represented by 'r'
regression line
straight line that describes how a response variable y changes as an explanatory variable x changes. Often used to predict values of y for given values of x
least squares regression line
line that makes the sum of squares of the vertical distances from the data points to the line as small as possible
r-squared
square of the correlation coefficient; represents the fraction of the variation in the y-variable that is accounted for by the least-squares regression of y on x.
residual
the difference between an observed value of y and the value predicted by the regression line
residual plot
scatterplot of each x-value and its residual value. Used to determine whether a linear equation is a good model for a set of data. If it exhibits randomness, then a line is a GOOD model for data. If exhibits pattern, then a line is NOT a good model for data
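A minimal Python sketch tying together correlation, the least squares regression line, r-squared, and residuals. The formulas are the standard ones based on sums of squared deviations; the function name `least_squares` and the data are made up for illustration.

```python
from math import sqrt

def least_squares(xs, ys):
    """Return (r, slope, intercept) for the least squares line y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    r = sxy / sqrt(sxx * syy)            # correlation
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar    # line passes through (x_bar, y_bar)
    return r, slope, intercept

xs = [1, 2, 3, 4, 5]   # explanatory variable (made-up data)
ys = [2, 4, 5, 4, 5]   # response variable (made-up data)

r, slope, intercept = least_squares(xs, ys)
predicted = [intercept + slope * x for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]   # observed - predicted

print("r =", round(r, 4), " r^2 =", round(r ** 2, 4))
print("line: y =", round(intercept, 2), "+", round(slope, 2), "x")
print("residuals:", [round(e, 2) for e in residuals])
```

Plotting `residuals` against `xs` gives the residual plot; the residuals from a least squares line always sum to zero, so what matters is whether they show a pattern.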
influential point
a point whose removal has a large effect on the correlation and/or the regression line
population
in a statistical study, the entire group of individuals we want information about
census
collects data from every individual in the population
sample
subset of individuals in the population from which we actually collect data
bias
if the design of a study consistently underestimates or overestimates the value you want to know
convenience sample
chooses individuals who are easiest to reach.
voluntary response sample
consists of people who choose themselves by responding to a general invitation. Shows bias because people with strong opinions are most likely to respond
simple random sample
of size n consists of n individuals from the population chosen in such a way that every set of n individuals has an equal chance to be the sample actually selected
stratified random sample
classify the population into groups of similar individuals called strata, then choose a separate SRS in each stratum and combine these SRSs to form the sample
cluster sample
start by classifying the population into groups of individuals that are located near each other, called clusters. Then choose SRS of the clusters. All individuals in clusters are included in the sample
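A rough Python sketch contrasting a simple random sample, a stratified random sample, and a cluster sample. The population, the strata, and the clusters are hypothetical, and the seed is fixed only so the run is repeatable.

```python
import random

rng = random.Random(0)   # fixed seed so the example is repeatable
population = [f"student_{i}" for i in range(1, 21)]   # hypothetical population of 20

# Simple random sample: every set of 5 individuals is equally likely.
srs = rng.sample(population, 5)

# Stratified random sample: a separate SRS within each stratum, then combined.
strata = {"freshmen": population[:10], "seniors": population[10:]}   # assumed strata
stratified = [p for group in strata.values() for p in rng.sample(group, 2)]

# Cluster sample: an SRS of whole clusters; everyone in a chosen cluster is included.
clusters = [population[i:i + 5] for i in range(0, 20, 5)]   # assumed clusters of 5
chosen_clusters = rng.sample(clusters, 2)
cluster_sample = [p for cluster in chosen_clusters for p in cluster]

print("SRS:", srs)
print("stratified:", stratified)
print("cluster:", cluster_sample)
```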
undercoverage
occurs when some members of the population cannot be chosen in a sample
nonresponse
occurs when an individual chosen for the sample can't be contacted or refuses to participate
wording of questions
most important influence on the answers given to a survey
response bias
systematic pattern of incorrect responses in a sample survey
observational study
observes individuals and measures variable of interest but does not attempt to influence the responses
experiment
deliberately imposes some treatment on individuals to measure their responses. only source of fully convincing data to understand cause and effect
confounding
occurs when two variables are associated in such a way that their effects on a response variable cannot be distinguished from each other.
treatment
specific condition applied to individuals in an experiment
experimental units
smallest collection of individuals to which treatments are applied
comparison
use a design that compares two or more treatments
random assignment
use chance to assign experimental units to treatments. Doing so helps create roughly equivalent groups of experimental units by balancing the effects of other variables among the treatment groups
control
keep other variables that might affect the response the same for all groups.
replication
use enough experimental units in each group so that any differences in the effects of the treatments can be distinguished from chance differences between the groups
statistically significant
observed effect so large that it would rarely occur by chance
completely randomized design
treatments are assigned to all experimental units completely by chance
control group
receives an inactive treatment or existing baseline treatment
placebo effect
response to dummy treatment
double blind experiment
neither the subjects nor those who interact with them and measure the response variable know which treatment a subject received
block
group of experimental units that are known before the experiment to be similar in some way that is expected to affect the response to treatments
randomized block design
the random assignment of experimental units to treatments is carried out separately in each block
matched pairs design
randomized blocked experiment in which each block consists of a matching pair of similar experimental units. Chance is used to determine which unit in each pair gets treatment
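A small Python sketch of random assignment in a completely randomized design and in a randomized block design. The unit labels, the two treatments, and the blocking variable are all hypothetical; the helper `randomize` is an illustration, not a standard library function.

```python
import random

rng = random.Random(1)   # fixed seed so the example is repeatable
units = [f"unit_{i}" for i in range(1, 13)]   # 12 hypothetical experimental units
treatments = ["A", "B"]

def randomize(units, treatments, rng):
    """Shuffle the units, then deal them out to treatments in equal-sized groups."""
    shuffled = units[:]
    rng.shuffle(shuffled)
    return {u: treatments[i % len(treatments)] for i, u in enumerate(shuffled)}

# Completely randomized design: all units are assigned in one random shuffle.
crd = randomize(units, treatments, rng)

# Randomized block design: the random assignment is done separately in each block.
blocks = {"older": units[:6], "younger": units[6:]}   # assumed blocking variable
rbd = {}
for members in blocks.values():
    rbd.update(randomize(members, treatments, rng))

print("completely randomized:", crd)
print("randomized block:", rbd)
```

Blocking guarantees that each treatment appears equally often within each block, whereas the completely randomized design balances the blocking variable only on average.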
parameter
is a number that describes some characteristic of the population
statistic
a number that describes some characteristic of a sample
sampling distribution
distribution of all values taken by a statistic in all possible samples of the same size from the same population
unbiased estimator
if the mean of its sampling distribution is equal to the parameter being estimated
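A minimal Python sketch of a sampling distribution, built by listing every possible sample of size 2 from a tiny made-up population. It suggests that the sample mean is an unbiased estimator of the population mean, while the sample maximum is a biased estimator of the population maximum.

```python
from itertools import combinations
from statistics import mean

population = [2, 4, 6, 8, 10]   # made-up population
mu = mean(population)           # parameter: population mean

# All possible samples of size 2 (without replacement).
samples = list(combinations(population, 2))

# Sampling distribution of the sample mean (a statistic).
sample_means = [mean(s) for s in samples]
mean_of_means = mean(sample_means)

# Sampling distribution of the sample maximum, as an estimator of the population maximum.
sample_maxes = [max(s) for s in samples]
mean_of_maxes = mean(sample_maxes)

print("population mean:", mu, " mean of sample means:", mean_of_means)
print("population max:", max(population), " mean of sample maxes:", mean_of_maxes)
```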
point estimator
a statistic that provides an estimate of the population parameter