objects described by a set of data. May be people, but they also may be animals or things.
Any characteristic of an individual. Can take different values for different individuals.
places an individual into one of several groups or categories
takes numerical values for which arithmetic operations such as adding and averaging make sense
tells us what values the variable takes and how often it takes these values
gives the lowest and highest values in the data set
are there any values that stand out as unusual
what is the approximate center, or typical value, of the data (only an estimate)
does the graph show symmetry, or is it skewed in one direction
plots each observation of a variable against the time at which it was measured
add the values of the observations and divide by the number of observations
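A minimal sketch of the mean as just described: sum the observations, then divide by how many there are. The function name `mean` and the sample data are illustrative assumptions.

```python
def mean(values):
    """Sum the observations and divide by the number of observations."""
    if len(values) == 0:
        raise ValueError("mean needs at least one observation")
    return sum(values) / len(values)


# Hypothetical data: 2 + 4 + 4 + 5 + 7 + 8 = 30, and 30 / 6 = 5.0
print(mean([2, 4, 4, 5, 7, 8]))  # 5.0
```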
midpoint of the distribution
five-number summary
consists of smallest observation, first quartile, the median, third quartile, and largest observation, written in order from smallest to largest
To calculate the quartiles, arrange the observations in increasing order and locate the median in the ordered list. Q1 is the median of the observations below the median; Q3 is the median of the observations above the median.
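The quartile rule above can be sketched in Python as follows. This assumes the textbook convention that, when n is odd, the median itself is left out of both halves; statistical software often uses a different quartile method, so its Q1 and Q3 may differ slightly. The function names are hypothetical.

```python
def median(sorted_vals):
    """Median of a list that is already sorted in increasing order."""
    n = len(sorted_vals)
    mid = n // 2
    if n % 2 == 1:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Q1 is the median of the observations below the median and Q3 the median
    of those above it; for odd n the middle observation belongs to neither half.
    """
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    lower = data[: n // 2]         # observations below the median
    upper = data[(n + 1) // 2 :]   # observations above the median
    return (data[0], median(lower), median(data), median(upper), data[-1])


# Hypothetical data with n = 7
print(five_number_summary([13, 1, 5, 3, 9, 11, 7]))  # (1, 3, 7, 11, 13)
```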
distance between first and third quartiles (Q3-Q1)
1.5 × IQR rule: an observation is a suspected outlier if it falls below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR
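A small sketch of the 1.5 × IQR rule, assuming Q1 and Q3 have already been found. The helper names and the data are hypothetical.

```python
def outlier_fences(q1, q3):
    """Return the (lower, upper) fences of the 1.5 x IQR rule."""
    iqr = q3 - q1                          # interquartile range
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def flag_outliers(values, q1, q3):
    """Return observations lying below the lower fence or above the upper fence."""
    lower, upper = outlier_fences(q1, q3)
    return [x for x in values if x < lower or x > upper]


# Hypothetical data: sorted, its quartiles are Q1 = 3 and Q3 = 11, so IQR = 8
data = [1, 3, 5, 7, 9, 11, 40]
print(outlier_fences(3, 11))       # (-9.0, 23.0)
print(flag_outliers(data, 3, 11))  # [40]
```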
graph of five number summary, with outliers plotted individually
the average of the squared deviations of the observations from their mean. For a sample, the sum of the squared deviations is divided by n - 1 rather than n.
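A minimal sketch of the sample variance and standard deviation. It assumes the usual sample convention of dividing the sum of squared deviations by n - 1; dividing by n instead gives the population variance. Function names and data are illustrative.

```python
import math


def sample_variance(values):
    """s^2: sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)


def sample_sd(values):
    """Standard deviation s: the square root of the variance."""
    return math.sqrt(sample_variance(values))


# Hypothetical data: mean 5, squared deviations sum to 32, so s^2 = 32 / 7
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_variance(data))
print(sample_sd(data))
```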
curve that is always above the horizontal axis and has area exactly 1 underneath it. Describes the overall pattern of a distribution.
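mound-shaped and symmetric, describes a continuous variable, and follows the 68-95-99.7 rule: about 68%, 95%, and 99.7% of observations fall within 1, 2, and 3 standard deviations of the mean
standardized value: z = (observation - mean) / standard deviation. It tells how many standard deviations an observation lies above or below the mean.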
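The 68-95-99.7 rule can be checked numerically with Python's standard-library `statistics.NormalDist`. The helper `share_within` is a hypothetical name used only for this sketch.

```python
from statistics import NormalDist


def share_within(k, mu=0.0, sigma=1.0):
    """Proportion of a N(mu, sigma) distribution within k standard deviations of mu."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


for k in (1, 2, 3):
    print(k, round(share_within(k), 4))  # about 0.68, 0.95, and 0.997
```

The proportions do not depend on mu or sigma, which is why the rule applies to every normal distribution.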
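A one-line sketch of standardizing, assuming the mean and standard deviation are known. The name `z_score` and the numbers are illustrative.

```python
def z_score(x, mean, sd):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd


print(z_score(130, 100, 15))  # (130 - 100) / 15 = 2.0
print(z_score(85, 100, 15))   # (85 - 100) / 15 = -1.0
```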
standard normal distribution
the normal distribution N(0,1), with mean 0 and standard deviation 1
Inverse normal calculations
working backwards from an area, first find z, then x. The value of z is found by reading Table A in reverse; then x = mean + z × standard deviation.
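As a sketch, software can replace the reverse Table A lookup: `statistics.NormalDist().inv_cdf` returns the z with a given area to its left, and unstandardizing gives x. The helper name and the N(100, 15) example are assumptions for illustration.

```python
from statistics import NormalDist


def inverse_normal(area_left, mu, sigma):
    """Return (z, x) with P(X < x) = area_left for X ~ N(mu, sigma)."""
    z = NormalDist().inv_cdf(area_left)  # area -> z, like reading Table A in reverse
    x = mu + z * sigma                   # unstandardize: z -> x
    return z, x


# Hypothetical question: what score marks the 90th percentile of N(100, 15)?
z, x = inverse_normal(0.90, 100, 15)
print(round(z, 3), round(x, 2))  # z is about 1.282, x about 119.22
```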
measures an outcome of a study. Sometimes referred to as the dependent variable
attempts to explain the observed outcomes. Sometimes referred to as the independent variable
shows the relationship between two quantitative variables measured on the same individuals.
linear or curved
positive, negative, or neither
weak, moderate, strong
when one variable increases, the other tends to increase
when one variable increases, the other tends to decrease
measures the direction and strength of the linear relationship between two quantitative variables. Usually represented by 'r'
straight line that describes how a response variable y changes as an explanatory variable x changes. Often used to predict values of y for given values of x
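A minimal sketch of the correlation, assuming the common textbook formula r = (1 / (n - 1)) × sum of z_x × z_y over all pairs. The function name and data are hypothetical.

```python
import math


def correlation(xs, ys):
    """r = (1 / (n - 1)) * sum of z_x * z_y over all paired observations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    mx, my = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / (n - 1))
    if sx == 0 or sy == 0:
        raise ValueError("r is undefined when either variable has no spread")
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical paired data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(correlation(xs, ys), 4))  # 0.7746
```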
least squares regression line
line that makes the sum of squares of the vertical distances from the data points to the line as small as possible
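The least-squares line y-hat = a + b·x can be sketched with the standard closed-form solution: slope b = Sxy / Sxx (equivalently r × s_y / s_x) and intercept a = ȳ - b·x̄. Names and data are illustrative.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for y-hat = a + b * x, minimizing the sum of squared residuals."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x must take at least two different values")
    b = sxy / sxx      # slope
    a = my - b * mx    # the line passes through (x-bar, y-bar)
    return a, b


# Hypothetical paired data
a, b = least_squares_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(a, 4), round(b, 4))  # 2.2 0.6
```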
square of the correlation coefficient; represents the fraction (percentage) of the variation in the y-variable that is explained by the least-squares regression on the x-variable
the difference between an observed value of y and the value predicted by the regression line
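A short sketch of residuals (observed y minus predicted y) and of r² computed as 1 - SSE / SST. It assumes the line y-hat = 2.2 + 0.6x, which is the least-squares line for the hypothetical points used here (their correlation is about 0.7746, and 0.7746² ≈ 0.6). Function names are illustrative.

```python
def residuals(xs, ys, a, b):
    """Residual = observed y - predicted y, where predicted y = a + b * x."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def r_squared(xs, ys, a, b):
    """1 - SSE / SST: the fraction of the variation in y explained by the line."""
    my = sum(ys) / len(ys)
    sse = sum(e ** 2 for e in residuals(xs, ys, a, b))
    sst = sum((y - my) ** 2 for y in ys)
    return 1 - sse / sst


# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print([round(e, 2) for e in residuals(xs, ys, 2.2, 0.6)])  # [-0.8, 0.6, 1.0, -0.6, -0.2]
print(round(r_squared(xs, ys, 2.2, 0.6), 4))               # 0.6
```

For a least-squares line the residuals always sum to zero (up to rounding).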
scatterplot of each x-value against its residual. Used to determine whether a linear model is a good fit for a set of data. If the residuals are scattered randomly, then a line is a GOOD model for the data. If they show a pattern, then a line is NOT a good model for the data
a point whose removal would markedly change the correlation and/or the regression line
in a statistical study, the entire group of individuals we want information about
collects data from every individual in the population
subset of individuals in the population from which we actually collect data
the design of a study is biased if it systematically favors certain outcomes, consistently underestimating or overestimating the value you want to know
chooses individuals who are easiest to reach.
voluntary response sample
consists of people who choose themselves by responding to a general invitation. Such samples are biased because people with strong opinions are the most likely to respond
simple random sample
of size n consists of n individuals from the population chosen in such a way that every set of n individuals has an equal chance to be the sample actually selected
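Python's `random.sample` draws without replacement so that every subset of size n is equally likely, which matches the definition of an SRS. The wrapper name, the seed, and the roster are assumptions for illustration.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw an SRS of size n: every set of n individuals is equally likely."""
    rng = random.Random(seed)         # seed only to make a run reproducible
    return rng.sample(population, n)  # sampling without replacement


# Hypothetical roster of 30 students, labeled 1-30
roster = list(range(1, 31))
print(simple_random_sample(roster, 5, seed=7))
```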
stratified random sample
classify the population into groups of similar individuals, called strata; then choose a separate SRS in each stratum and combine these SRSs to form the sample
start by classifying the population into groups of individuals that are located near each other, called clusters. Then choose an SRS of the clusters. All individuals in the chosen clusters are included in the sample
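A sketch of stratified random sampling: a separate SRS inside each stratum, then the SRSs combined. The dictionary layout, per-stratum sizes, and strata labels are hypothetical.

```python
import random


def stratified_sample(strata, sizes, seed=None):
    """Take a separate SRS within each stratum, then combine them.

    strata: dict mapping stratum label -> list of individuals in that stratum
    sizes:  dict mapping stratum label -> SRS size for that stratum
    """
    rng = random.Random(seed)
    sample = []
    for label, members in strata.items():
        sample.extend(rng.sample(members, sizes[label]))
    return sample


# Hypothetical strata: 40 day students and 20 evening students
strata = {
    "day": [f"D{i}" for i in range(40)],
    "evening": [f"E{i}" for i in range(20)],
}
print(stratified_sample(strata, {"day": 4, "evening": 2}, seed=1))
```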
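A sketch of cluster sampling: the SRS is taken of whole clusters, and every individual in a chosen cluster is kept. The data layout and cluster labels are hypothetical.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Take an SRS of whole clusters and include everyone in each chosen cluster.

    clusters: dict mapping cluster label -> list of individuals in that cluster
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # SRS of cluster labels
    return [person for label in chosen for person in clusters[label]]


# Hypothetical clusters: 5 homerooms of 3 students each
clusters = {f"room{r}": [f"room{r}-s{i}" for i in range(3)] for r in range(5)}
print(cluster_sample(clusters, 2, seed=3))
```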
occurs when some members of the population cannot be chosen in a sample
occurs when an individual chosen for the sample can't be contacted or refuses to participate
wording of questions
most important influence on the answers given to a survey
systematic pattern of incorrect responses in a sample survey
observes individuals and measures variable of interest but does not attempt to influence the responses
deliberately imposes some treatment on individuals to measure their responses. only source of fully convincing data to understand cause and effect
occurs when two variables are associated in such a way that their effects on a response variable cannot be distinguished from each other.
specific condition applied to individuals in an experiment
smallest collection of individuals to which treatments are applied
use a design that compares two or more treatments
use chance to assign experimental units to treatments. Doing so helps create roughly equivalent groups of experimental units by balancing the effects of other variables among the treatment groups
keep other variables that might affect the response the same for all groups.
use enough experimental units in each group so that any differences in the effects of the treatments can be distinguished from chance differences between the groups
observed effect so large that it would rarely occur by chance
completely randomized design
treatments are assigned to all experimental units completely by chance
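One way to sketch a completely randomized design is to shuffle all the units and deal them out to the treatments in turn, so group sizes stay as equal as possible. The function name and the plant example are hypothetical.

```python
import random


def completely_randomized(units, treatments, seed=None):
    """Assign units to treatments purely by chance, keeping group sizes nearly equal."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)  # a random ordering of all experimental units
    k = len(treatments)
    # Deal the shuffled units out to the treatments in turn, like cards.
    return {t: shuffled[i::k] for i, t in enumerate(treatments)}


# Hypothetical experiment: 20 plants, fertilizer vs. no fertilizer
groups = completely_randomized(range(1, 21), ["fertilizer", "control"], seed=5)
print(groups)
```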
receives an inactive treatment or existing baseline treatment
response to dummy treatment
double blind experiment
neither the subjects nor those who interact with them and measure the response variable know which treatment a subject received
group of experimental units that are known before the experiment to be similar in some way that is expected to affect the response to treatments
randomized block design
the random assignment of experimental units to treatments is carried out separately in each block
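A sketch of a randomized block design: the same shuffle-and-deal assignment, but carried out separately inside each block so units never cross block boundaries. Block labels, treatments, and subjects are hypothetical.

```python
import random


def randomized_block(blocks, treatments, seed=None):
    """Carry out a separate random assignment to treatments inside each block.

    blocks: dict mapping block label -> list of experimental units in that block
    """
    rng = random.Random(seed)
    k = len(treatments)
    plan = {}
    for label, units in blocks.items():
        shuffled = list(units)
        rng.shuffle(shuffled)  # randomize within this block only
        plan[label] = {t: shuffled[i::k] for i, t in enumerate(treatments)}
    return plan


# Hypothetical blocks: subjects grouped by sex before the experiment
blocks = {"men": ["m1", "m2", "m3", "m4"], "women": ["w1", "w2", "w3", "w4"]}
print(randomized_block(blocks, ["new drug", "placebo"], seed=2))
```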
matched pairs design
randomized block experiment in which each block consists of a matching pair of similar experimental units. Chance is used to determine which unit in each pair gets each treatment
a number that describes some characteristic of the population
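A sketch of the chance step in a matched pairs design: a simulated coin flip for each pair decides which unit is treated. The function name, the tuple layout, and the pairs are hypothetical.

```python
import random


def matched_pairs(pairs, seed=None):
    """For each matched pair, flip a coin to decide which unit gets the treatment.

    Returns a list of (treated_unit, control_unit) tuples.
    """
    rng = random.Random(seed)
    plan = []
    for first, second in pairs:
        if rng.random() < 0.5:  # "heads": first unit is treated
            plan.append((first, second))
        else:                   # "tails": second unit is treated
            plan.append((second, first))
    return plan


# Hypothetical pairs: twins matched before the experiment
pairs = [("Ann", "Amy"), ("Ben", "Bob"), ("Cal", "Cy")]
print(matched_pairs(pairs, seed=4))
```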
a number that describes some characteristic of a sample
distribution of all values taken by a statistic in all possible samples of the same size from the same population
a statistic is unbiased if the mean of its sampling distribution is equal to the true value of the parameter being estimated
a statistic that provides an estimate of a population parameter
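A simulation sketch of a sampling distribution: draw many SRSs of the same size from one population and record each sample mean. The average of those means lands near the population mean, illustrating that the sample mean is an unbiased estimator. The population, sample size, number of repetitions, and seed are all assumptions.

```python
import random


def simulate_sampling_distribution(population, n, reps, seed=None):
    """Approximate the sampling distribution of the sample mean by drawing
    `reps` simple random samples of size n and recording each sample's mean."""
    rng = random.Random(seed)
    return [sum(rng.sample(population, n)) / n for _ in range(reps)]


population = list(range(1, 101))  # hypothetical population; its mean is 50.5
means = simulate_sampling_distribution(population, n=10, reps=5000, seed=42)
print(sum(means) / len(means))     # should land close to 50.5
```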