AP Stats Chapter 4

Population
In a statistical study, it is the entire group of individuals about which we want information
Sample
The part of the population from which we actually collect information. We use information from a sample to draw conclusions about the entire population
Sample survey
A study that uses an organized plan to choose a sample that represents some specific population
Identify the population and sample:
The student council at a high school surveys 100 students at the school to get their opinion on a change in bell schedule
Population: student body
Sample: 100 students surveyed
Convenience sample
A sample that results from choosing the individuals who are easiest to reach
Bias
The design of a statistical study shows bias if it systematically favors certain outcomes. A result of bad sampling. (Not always intentional)
Voluntary response samples
Consist of people who choose themselves by responding to a general appeal. Voluntary response samples show bias because people with strong opinions (often in the same direction) are most likely to respond
Random sampling
The use of chance to select a sample; it is the central principle of statistical sampling
Simple random sample (SRS)
An SRS of size n consists of n individuals from the population chosen in such a way that every set of n individuals has an EQUAL CHANCE to be the sample actually selected
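The definition above can be sketched in Python. This is a minimal illustration, not a required method: the roster `students` and the helper name `simple_random_sample` are made up for the example. `random.sample` draws without replacement, so every set of n individuals is equally likely to be chosen.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Return an SRS of size n drawn without replacement."""
    rng = random.Random(seed)  # seeded only so the example is repeatable
    return rng.sample(population, n)

# Hypothetical population: 500 labeled students
students = [f"student_{i:03d}" for i in range(1, 501)]
srs = simple_random_sample(students, 10, seed=1)
print(srs)
```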
Stratified random sample
First classify the population into strata. Then choose a separate SRS in each stratum and combine these to form the full sample
Strata
Smaller groups of similar individuals within the population
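A rough sketch of stratified random sampling, assuming a made-up school roster where grade level is the stratum. The names `roster`, `stratum_of`, and `stratified_random_sample` are hypothetical, and taking the same number from each stratum is just one possible choice.

```python
import random
from collections import Counter

def stratified_random_sample(population, stratum_of, n_per_stratum, seed=None):
    """Take a separate SRS inside each stratum, then combine them."""
    rng = random.Random(seed)
    strata = {}
    for individual in population:
        strata.setdefault(stratum_of(individual), []).append(individual)
    sample = []
    for label in sorted(strata):
        sample.extend(rng.sample(strata[label], n_per_stratum))
    return sample

# Hypothetical roster: (name, grade) with 100 students in each of grades 9-12
roster = [(f"s{i}", 9 + i % 4) for i in range(400)]
strat_sample = stratified_random_sample(roster, lambda person: person[1], 5, seed=3)
per_grade = Counter(grade for _, grade in strat_sample)
print(per_grade)
```

Every grade is guaranteed to appear in the sample, which an SRS of the whole school would not guarantee.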
Cluster sample
First divide the population into smaller groups. Ideally, these clusters should mirror the characteristics of the population. Then, choose an SRS of the clusters. All individuals in the chosen clusters are included in the sample
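A rough sketch of cluster sampling, assuming made-up homerooms as the clusters. The names `homerooms` and `cluster_sample` are hypothetical. Note the difference from stratified sampling: the SRS is taken of the clusters, and everyone in a chosen cluster is included.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Choose an SRS of clusters and keep every individual in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = [person for name in chosen for person in clusters[name]]
    return chosen, sample

# Hypothetical population: 20 homerooms of 25 students each
homerooms = {
    f"room_{r}": [f"room_{r}_student_{k}" for k in range(25)]
    for r in range(20)
}
chosen_rooms, cluster_members = cluster_sample(homerooms, 3, seed=2)
print(chosen_rooms, len(cluster_members))
```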
Table of random digits
Also known as Table D; a long string of random digits used to carry out random selection
How to generate an SRS using a table
1) Label: give each member of the population a numerical label of the same length
2) Table: read consecutive groups of digits of the appropriate length from Table D
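The two steps above can be sketched in Python. The digit string below is an arbitrary made-up line, not an actual row of Table D, and the function name is hypothetical. Skipping groups that match no label or repeat an earlier label is an assumption here, since the card lists only the labeling and reading steps.

```python
def srs_from_digit_table(digits, population_size, n):
    """Pick n labels by reading fixed-width groups from a line of digits."""
    digits = digits.replace(" ", "")
    width = len(str(population_size))  # all labels have the same length
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        group = digits[start:start + width]
        # Assumption: ignore groups outside 1..population_size and repeats
        if 1 <= int(group) <= population_size and group not in chosen:
            chosen.append(group)
        if len(chosen) == n:
            return chosen
    raise ValueError("ran out of digits before the sample was complete")

# Made-up line of digits; population labeled 01 through 30
line = "69051 64817 17835 17409 53405 96408 89"
labels = srs_from_digit_table(line, population_size=30, n=4)
print(labels)  # ['05', '16', '17', '09']
```

Reading in pairs gives 69, 05, 16, 48, 17, 17, 83, 51, 74, 09: the labels 05, 16, 17, and 09 are kept, and the second 17 is skipped as a repeat.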
Inference
The process of drawing conclusions about a population on the basis of sample data
Why random sampling?
-to eliminate bias in selecting samples from the list of available individuals
-the laws of probability allow trustworthy inference about the population.
---results from random samples come with a margin of error that sets bounds on the size of the likely error
---larger samples give better information about the population than smaller samples
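A small simulation can illustrate the last point. This is a sketch under assumptions: the population (60% "yes") is made up, and the standard deviation of the sample proportions is used as a stand-in for how far a sample result typically lands from the truth, not as a formal margin of error.

```python
import random
import statistics

def spread_of_sample_proportions(population, n, reps, rng):
    """Standard deviation of the sample proportion over many SRSs of size n."""
    proportions = [sum(rng.sample(population, n)) / n for _ in range(reps)]
    return statistics.pstdev(proportions)

rng = random.Random(4)
# Hypothetical population of 10,000 in which 60% answer "yes" (coded 1)
population = [1] * 6000 + [0] * 4000
spreads = {
    n: spread_of_sample_proportions(population, n, 500, rng)
    for n in (25, 100, 1000)
}
for n, spread in spreads.items():
    print(n, round(spread, 3))
```

The spread shrinks as n grows, which is why larger random samples give better information about the population.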
Undercoverage
Occurs when some groups in the population are left out of the process of choosing the sample
Non-response
Occurs when an individual chosen for the sample cannot be contacted or refuses to participate
Response bias
A systematic pattern of incorrect responses in a sample survey leads to response bias
Wording of questions
The most important influence on the answers given to a sample survey
Observational study
Observes individuals and measures variables of interest but does not attempt to influence the responses
experiment
Deliberately imposes some treatment on individuals to measure their responses
When the goal is to understand cause and effect, what is the only source of convincing data?
Experiments
when does confounding occur?
when two variables are associated in such a way that their effects on a response variable cannot be distinguished from each other
lurking variable
a variable that is not among the explanatory or response variables in a study, but that may influence the response variable
an experiment is a statistical study in which we actually do ______ to the experimental units to observe ______.
a treatment, the response
treatment
a specific condition applied to the individuals in an experiment - if an experiment has several explanatory variables, a treatment is a combination of specific values of these variables
experimental units (subjects)
the smallest collection of individuals to which treatments are applied - when units are humans, they are often called subjects
factors
the explanatory variables in an experiment. many experiments study the joint effects of several factors.
*each treatment is formed by combining a specific value (often called a level) of each of these factors
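As a quick illustration of how treatments come from factors and levels, here is a sketch with two made-up factors (ad length and number of repetitions). The factor names and the helper `build_treatments` are hypothetical.

```python
from itertools import product

def build_treatments(factors):
    """One treatment per combination of one level from each factor."""
    names = list(factors)
    combos = product(*(factors[name] for name in names))
    return [dict(zip(names, levels)) for levels in combos]

# Hypothetical factors: 2 levels x 3 levels = 6 treatments
factors = {"ad_length_seconds": [30, 90], "repetitions": [1, 3, 5]}
treatments = build_treatments(factors)
for treatment in treatments:
    print(treatment)
```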
what makes badly designed experiments yield worthless results?
confounding variables
comparative experiment
the remedy for confounding - some units receive one treatment, and similar units receive another
in an experiment, random assignment means...
that experimental units are assigned to the treatments at random, that is, using some sort of chance process
completely randomized design
the treatments are assigned to all the experimental units completely by chance
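One way to carry out a completely randomized design is to shuffle all the units and deal them out to the treatments. This is only a sketch: the unit names, the treatment names, and the choice of equal group sizes are assumptions made for the example.

```python
import random

def completely_randomized_design(units, treatments, seed=None):
    """Shuffle every unit, then deal them out to the treatments in turn."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    groups = {treatment: [] for treatment in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

# Hypothetical experiment: 30 subjects, 3 treatments
subjects = [f"subject_{i}" for i in range(30)]
crd_groups = completely_randomized_design(
    subjects, ["drug_A", "drug_B", "placebo"], seed=5
)
print({name: len(members) for name, members in crd_groups.items()})
```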
control group
the group that receives an inactive treatment or an existing baseline treatment - allows us to compare
PRINCIPLES FOR EXPERIMENTAL DESIGN
CONTROL: control for lurking variables that might affect the response by using a comparative design and ensuring that the only systematic difference between the groups is the treatment administered
RANDOM ASSIGNMENT: use impersonal chance to assign experimental units to treatments. this helps create roughly equivalent groups of experimental units by balancing the effects of uncontrolled lurking variables across the treatment groups
REPLICATION: use enough experimental units in each group so that any differences in the effects of the treatments can be distinguished from chance differences between the groups
do all subjects have to be treated the same in order to get logical, true experiment results?
yes - all subjects must be treated identically except for the actual treatments being compared.
placebo effect
a response to a dummy treatment
double blind experiment
neither the subjects nor those who interact with the subjects and measure the response variable know which treatment a subject received
statistically significant
an observed effect so large that it would rarely occur by chance
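"Would rarely occur by chance" can be checked by simulation: re-do the random assignment many times and see how often chance alone gives a difference at least as large as the one observed. The sketch below assumes made-up response data and a hypothetical function name; the card does not say how rare counts as "rarely," so no cutoff is built in.

```python
import random

def mean(values):
    return sum(values) / len(values)

def rerandomization_p_value(treatment, control, reps=2000, seed=None):
    """Share of chance re-assignments with a difference >= the observed one."""
    rng = random.Random(seed)
    observed = mean(treatment) - mean(control)
    pooled = list(treatment) + list(control)
    n_treatment = len(treatment)
    count = 0
    for _ in range(reps):
        rng.shuffle(pooled)  # pretend the group labels were assigned by chance
        diff = mean(pooled[:n_treatment]) - mean(pooled[n_treatment:])
        if diff >= observed:
            count += 1
    return observed, count / reps

# Hypothetical responses from a two-group experiment
treatment_scores = [14, 15, 17, 18, 20, 21]
control_scores = [8, 9, 10, 11, 12, 13]
observed_diff, p_value = rerandomization_p_value(
    treatment_scores, control_scores, seed=6
)
print(observed_diff, p_value)
```

A small proportion means the observed effect would rarely occur by chance alone.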
Does a statistically significant association in data from a well designed experiment imply causation?
yes
block
a group of experimental units that are known before the experiment to be similar in some way that is expected to affect the response to the treatments
randomized block design
the random assignment of experimental units to treatments is carried out separately within each block
*form blocks based on the most important unavoidable sources of variability among the experimental units
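A sketch of a randomized block design, assuming made-up plants blocked by light condition. The names `plants`, `block_of`, and `randomized_block_design` are hypothetical. The key point is that the shuffle happens separately inside each block.

```python
import random
from collections import Counter

def randomized_block_design(units, block_of, treatments, seed=None):
    """Randomly assign treatments separately within each block."""
    rng = random.Random(seed)
    blocks = {}
    for unit in units:
        blocks.setdefault(block_of(unit), []).append(unit)
    assignment = {}
    for label in sorted(blocks):
        members = list(blocks[label])
        rng.shuffle(members)  # chance is used inside the block only
        for i, unit in enumerate(members):
            assignment[unit] = treatments[i % len(treatments)]
    return assignment

# Hypothetical units: 8 plants in shade and 8 in sun
plants = [(f"plant_{i}", "shade" if i < 8 else "sun") for i in range(16)]
block_plan = randomized_block_design(
    plants, lambda plant: plant[1], ["fertilizer", "none"], seed=7
)
tally = Counter((plant[1], treatment) for plant, treatment in block_plan.items())
print(tally)
```

Each treatment appears equally often in each block, so the blocking variable cannot be confounded with the treatment.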
_____ what you can, _____ __ what you can't control, and _________ to create comparable groups.
Control, block on, randomize
matched pairs design
a randomized blocked experiment in which each block consists of a matching pair of similar experimental units. chance is used to determine which unit in each pair gets each treatment.

a matched pair may consist of a single unit that receives both treatments. since the order of treatments may affect results, chance is used to determine which treatment is applied first for each unit

ex: twins, different boot on each foot
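A sketch of the chance step in a matched pairs design, using made-up twin pairs. The names below are hypothetical. For a single unit that receives both treatments, the same shuffle could decide which treatment comes first.

```python
import random

def matched_pairs_assignment(pairs, treatments=("A", "B"), seed=None):
    """Within each pair, let chance decide which unit gets which treatment."""
    rng = random.Random(seed)
    plan = []
    for first, second in pairs:
        order = list(treatments)
        rng.shuffle(order)  # the "coin flip" for this pair
        plan.append({first: order[0], second: order[1]})
    return plan

# Hypothetical pairs of twins
twin_pairs = [(f"twin_{i}a", f"twin_{i}b") for i in range(1, 6)]
pairs_plan = matched_pairs_assignment(twin_pairs, seed=8)
for pair in pairs_plan:
    print(pair)
```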
scope of inference
who does it apply to, and how does it apply? - use yes/no table
t/f: observational studies that use random sampling can make inferences about the population
true: while observational studies can make inferences about the population, they cannot be used to make an inference about cause and effect
Scope of inference table
Were individuals randomly selected? No
Were individuals randomly assigned to groups? Yes
-Inference about population: No
-inference about cause and effect: Yes

Were individuals randomly selected? Yes
Were individuals randomly assigned to groups? No
-Inference about population: Yes
-inference about cause and effect: No
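The yes/no table can be written as a small lookup. This is only a sketch with a hypothetical function name. The two rows not shown on the card (both Yes, both No) are filled in by the same rule, which is an assumption here.

```python
def scope_of_inference(randomly_selected, randomly_assigned):
    """Random selection supports inference about the population;
    random assignment supports inference about cause and effect."""
    return {
        "population": randomly_selected,
        "cause_and_effect": randomly_assigned,
    }

# The two rows from the card
print(scope_of_inference(randomly_selected=False, randomly_assigned=True))
print(scope_of_inference(randomly_selected=True, randomly_assigned=False))
```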
lack of realism
can limit our ability to apply the conclusions of an experiment to the settings of greatest interest
Ex: testing on animals, then applying results to humans
is it possible to build a strong case for causation in the absence of experiments by considering data from observational studies?
sometimes - even when it is considered unethical to carry out an experiment, a strong case for cause and effect CAN be made, it is just more difficult
When an experiment cannot be performed for ethical reasons, what criteria may be used to show causation?
-the association is strong
-the association is consistent
-larger values of the explanatory variable are associated with stronger responses
-the alleged cause precedes the effect in time
-the alleged cause is plausible
basic data ethics
-all planned studies must be reviewed in advance by an INSTITUTIONAL REVIEW BOARD charged with protecting the safety and well-being of the subjects
-all individuals who are subjects in a study must give their INFORMED CONSENT before data are collected
-all individual data must be kept CONFIDENTIAL. only statistical summaries for groups of subjects may be made public
What is the difference between non response bias and voluntary response bias?
Non response: When certain subjects are chosen randomly to participate but do not respond (ex: mailing letters out to selected individuals, but only 32/50 respond)
Voluntary response: when a survey is open to anyone who chooses to respond (ex: online survey on a webpage)