The science of collecting, organizing, and interpreting data
The complete set of people or things being studied
A subset of the population from which data are actually obtained
Numbers describing characteristics of a sample found by summarizing the raw data
Actual measurements or observations collected from the sample
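A number that describes the population. For example, if the population is all of the students at some high school, it would be a number computed from every one of those students. Usually cannot be calculated, because data are rarely available for the whole population.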
Numerical and graphical summaries of data. Purpose is to summarize.
Drawing inferences from sample data to a population. Purpose is to make predictions or test a hypothesis.
The collection of data from every member of the population.
Simple Random Sample
A sample chosen by a method such that every member of the population, and every possible group of members of the same size, is equally likely to be selected. Ex. A raffle
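As a minimal Python sketch of a raffle-style draw (the 100 numbered tickets and the seed are made up for illustration), `random.sample` picks members without replacement, so every ticket has the same chance of being chosen:

```python
import random

# Hypothetical population: 100 raffle tickets numbered 1 to 100.
population = list(range(1, 101))

rng = random.Random(42)  # fixed seed only so the sketch is repeatable
# Draw 10 tickets without replacement; every ticket has the same chance.
srs = rng.sample(population, 10)
print(srs)
```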
Sample of Convenience
A sample that is not drawn by a well-defined random method. May differ systematically in some way from the population. If it is reasonable to believe that no important systematic difference exists, then it is acceptable to treat the sample as if it were a simple random sample.
Stratified Random Sampling
Population is divided into groups, called strata, then a simple random sample is drawn from each stratum.
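One way to sketch this in Python, assuming made-up strata of student IDs grouped by grade level and an arbitrary sample size of 5 per stratum:

```python
import random

# Hypothetical strata: student IDs grouped by grade level.
strata = {
    "freshman": list(range(0, 40)),
    "sophomore": list(range(40, 70)),
    "junior": list(range(70, 90)),
    "senior": list(range(90, 100)),
}

rng = random.Random(0)  # fixed seed only so the sketch is repeatable
# Draw a simple random sample of 5 from every stratum.
stratified = {name: rng.sample(members, 5) for name, members in strata.items()}
print(stratified)
```

Every stratum is guaranteed to appear in the result, which a single simple random sample of the whole population would not guarantee.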
Population is divided into groups (clusters), clusters are randomly sampled, and all the members of the selected clusters form the sample. Useful when the population is too large and spread out for simple random sampling to be feasible.
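A rough Python sketch of the same idea, assuming hypothetical city blocks as the clusters and households as their members:

```python
import random

# Hypothetical clusters: households on each city block.
clusters = {
    "block_A": ["A1", "A2", "A3"],
    "block_B": ["B1", "B2"],
    "block_C": ["C1", "C2", "C3", "C4"],
    "block_D": ["D1", "D2", "D3"],
}

rng = random.Random(1)  # fixed seed only so the sketch is repeatable
# Randomly choose 2 whole clusters ...
chosen = rng.sample(sorted(clusters), 2)
# ... and keep every member of the chosen clusters.
cluster_sample = [unit for name in chosen for unit in clusters[name]]
print(chosen, cluster_sample)
```

The randomness is applied to the clusters, not to the individual members, which is the difference from stratified sampling.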
Items are ordered and every nth item is chosen to be included in the sample. Ex. Interviewing every 5th person or inspecting every 3rd car on an assembly line.
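A short Python sketch of the assembly-line example, assuming a made-up list of 30 cars. The random starting position is a common refinement and is an assumption here, not part of the card:

```python
import random

# Hypothetical ordered list: 30 cars coming off an assembly line.
cars = [f"car_{i}" for i in range(1, 31)]

n = 3                         # inspect every 3rd car
rng = random.Random(7)        # fixed seed only so the sketch is repeatable
start = rng.randrange(n)      # random starting position among the first n
systematic = cars[start::n]   # then every nth item after that
print(systematic)
```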
Voluntary Response Samples
Often used by the media to try to engage the audience. Ex. A radio DJ invites people to call the station to say what they think. Are never reliable, because people who volunteer an opinion tend to hold stronger (and often more negative) opinions than is typical of the population.
Any problem in the design or conduct of a statistical study that tends to favor certain results.
Studies conducted with methods that tend to overestimate or underestimate the true value.
Studies conducted by a procedure that produces the correct result on average.
People who have an interest in the outcome of an experiment have an incentive to use biased methods. Ex. Many advertisements use biased data against their competitors.
Social Acceptability Bias
People are reluctant to admit to behavior that may reflect negatively on them. Ex. Who will you vote for in the 2016 election?
Leading Question Bias
Sometimes questions are worded in a way that suggests a particular response. Ex. Do you favor decreasing the heavy tax burden on middle class families?
The opinions of non-responders tend to differ from the opinions of those who do respond. As a result, surveys with many non-responders are often biased.
Occurs when some members of the population are more likely to be included in the sample than others. Ex. A phone survey that reaches landline users more easily than cell phone users.
A characteristic that differs from one subject to the next. Ex. Age, grade level, GPA
The values of the variables that we obtain.
All of the information collected.
Classify individuals into categories. Ex. Marital status of survey respondents
Tells how much or many of something there is, an actual number. Ex. Age of survey respondents
Type of QUALITATIVE variable. Have no natural ordering. Ex. Shirt color: Red, black, blue, other
Type of QUALITATIVE variable. Have a natural ordering, but no mathematical value. Ex. A, B, C, D, F... Excellent, Good, Fair, Poor...
Type of QUANTITATIVE variable. Variables whose possible values can be listed (possible values are countable). Ex. Number of people in line at the bank. Number of siblings someone has.
Type of QUANTITATIVE variable. Can take on any value in some interval (possible values are uncountable). Ex. Height, weight.
Individuals/things who are studied.
When the experimental units are people, they are referred to as this...
What is measured on each experimental unit.
The procedures applied to each experimental unit.
A study in which the investigator assigns the treatments to the experimental units at random.
The assignment to treatment groups is not made by the investigator. Ex. Health effects of long-term smoking
If neither the investigator nor the subjects know who has been assigned to which treatment. Ex. In an experiment to test the effectiveness of a new pain reliever, patients who know they are getting the drug may report their pain levels differently than those who know they are taking a placebo.
Makes it difficult to tell whether a difference in the outcome is due to the treatment or to some other difference between the treatment and control groups. Observational studies are more susceptible than randomized experiments.
Group of subjects (the cohort) is studied to determine whether various factors of interest are associated with an outcome.
Type of COHORT STUDY where the subjects are followed over time.
Type of COHORT STUDY where measurements are taken at one point in time.
Type of COHORT STUDY where subjects are sampled after the outcome has occurred.
A type of study where two samples are drawn. One sample consists of people who have the disease of interest (the cases), and the other consists of people who do not have the disease (the controls). The investigators look back in time to determine whether a factor of interest differs between the two groups. Ex. Are pesticides related to brain cancer in children?
The way in which a variable's values are spread over all possible values.
The number of times a value (or category) of a variable occurs in a data set.
Presented in a table that gives the frequency for each category.
The proportion of observations in a category. Often expressed as a percent. Equation: Relative Frequency = Frequency / Sum of all Frequencies
Relative Frequency Distribution
Presented in a table that gives the relative frequency for each category.
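To illustrate how frequencies and relative frequencies are tabulated, here is a small Python sketch using made-up shirt-color data for 10 respondents:

```python
from collections import Counter

# Hypothetical raw data: shirt colors of 10 survey respondents.
colors = ["red", "black", "blue", "red", "other",
          "black", "red", "blue", "red", "black"]

freq = Counter(colors)                               # frequency distribution
total = sum(freq.values())                           # sum of all frequencies
rel_freq = {c: f / total for c, f in freq.items()}   # relative frequencies

# One row per category: name, frequency, relative frequency as a percent.
for category, f in freq.most_common():
    print(f"{category:<6} {f:>2} {rel_freq[category]:.0%}")
```

The relative frequencies always add up to 1 (100%), which is a quick check on the table.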
A graphical representation of a frequency distribution.
Bar graph in which categories are presented in order of frequency, with the largest frequency on the left and the smallest frequency on the right.
Side-by-side Bar Graph
Bar graph that shows frequencies or relative frequencies in categories for more than one group at a time by placing group bars side-by-side within each category.
Stacked Bar Graph
Bar graph that shows frequencies or relative frequencies in categories for more than one group at a time by stacking the groups' bars on top of one another within each category.
A circle divided such that each wedge represents the RELATIVE frequency of a particular category.
Intervals of equal width that cover all values observed in the data set.
Lower Class Limit
The smallest value that can appear in that class.
Upper Class Limit
The largest value that can appear in that class.
The difference between consecutive LOWER CLASS LIMITS. Ex. 15 - 10 = 5
Graphical representation of frequency (or relative frequency) distribution for quantitative data. Rectangle for each class.
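A minimal Python sketch of how values are grouped into classes before the rectangles are drawn. The ages, the first lower class limit of 10, and the class width of 5 are all chosen only for illustration:

```python
from collections import Counter

# Hypothetical quantitative data: ages of 12 respondents.
ages = [12, 17, 21, 10, 14, 19, 23, 11, 16, 24, 13, 18]

lower = 10   # lower class limit of the first class (assumed)
width = 5    # class width: difference between consecutive lower class limits

# Map each value to the lower class limit of the class that contains it.
counts = Counter(lower + width * ((x - lower) // width) for x in ages)

for lo in sorted(counts):
    # For whole-number data the upper class limit is lo + width - 1.
    print(f"{lo}-{lo + width - 1}: {'#' * counts[lo]}")
```

Each printed row plays the role of one rectangle, with its length equal to the class frequency.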
When the first class has no lower limit or the last class has no upper limit. Ex. Age: "85 and older"
When a data set is approximately symmetric, the MEAN and MEDIAN are approximately EQUAL
A peak, or high point, of a histogram. The value that appears most frequently in a data set. Sometimes classified as a measure of center.
Only one mode
Two clearly distinct modes
A measure of center in a data set.
Ex. A sample of size n = 5 was taken from the population of exam scores for a large class. The scores are 78, 83, 92, 68, and 85. The sample mean is (78 + 83 + 92 + 68 + 85)/5 = 81.2.
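The same arithmetic as a short Python sketch, using the five exam scores from the example (the variable names are illustrative):

```python
scores = [78, 83, 92, 68, 85]   # exam scores from the example above

n = len(scores)                 # sample size
sample_mean = sum(scores) / n   # add the values, divide by how many there are
print(sample_mean)              # 81.2
```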
Another measure of center. Splits the data set in half, so that half the data values are less than the median and half the data values are greater than the median.
Finding the Median
Step 1: Arrange the data values in increasing order. Step 2: Determine n, the number of data values. Step 3: If n is odd, the median is the middle number. If n is even, the median is the average of the two middle numbers.
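The three steps can be sketched in Python as follows. The function name `median` and the sample scores are illustrative only:

```python
def median(values):
    ordered = sorted(values)          # Step 1: increasing order
    n = len(ordered)                  # Step 2: number of data values
    mid = n // 2
    if n % 2 == 1:                    # Step 3: odd n -> the middle number
        return ordered[mid]
    # Step 3: even n -> average of the two middle numbers
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([78, 83, 92, 68, 85]))       # 83
print(median([78, 83, 92, 68, 85, 90]))   # 84.0
```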
Its value is not affected much by extreme values (large or small) in the data set. The MEDIAN is resistant, but not the mean.
A measure of spread in a data set. The difference between the largest and the smallest value. Range = Maximum - Minimum
A measure of how far the values in a data set are from the mean, on the average.
Sample Variance Formula
s^2 = Σ(x - x̄)^2 / (n - 1), where x̄ is the sample mean and n is the sample size.
The square root of the variance. Standard deviation is not resistant and will be affected by extreme values.
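A hedged Python sketch of both quantities, assuming the usual sample formula with an n - 1 divisor and reusing the five exam scores from the mean example:

```python
import math
import statistics

scores = [78, 83, 92, 68, 85]        # five exam scores, used for illustration

n = len(scores)
mean = sum(scores) / n
# Sample variance: sum of squared deviations from the mean, divided by n - 1.
variance = sum((x - mean) ** 2 for x in scores) / (n - 1)
std_dev = math.sqrt(variance)        # standard deviation = square root of variance

print(round(variance, 2), round(std_dev, 2))
# Cross-check against the standard library, which also divides by n - 1.
print(round(statistics.variance(scores), 2), round(statistics.stdev(scores), 2))
```

Replacing 68 with a much smaller score and rerunning shows how strongly a single extreme value moves the standard deviation.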
Histogram with a single mode near the center of the data that is approximately symmetric.
Approx. 68% of the data will be within ONE STANDARD DEVIATION of the mean. Approx. 95% of the data will be within TWO STANDARD DEVIATIONS of the mean. Almost all (99.7%) of the data will be within THREE STANDARD DEVIATIONS of the mean.
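To see these percentages appear, the sketch below simulates bell-shaped data in Python and counts the share of values within 1, 2, and 3 standard deviations. The simulated mean of 100, standard deviation of 15, and sample size are assumptions made for illustration, not real data:

```python
import random
import statistics

rng = random.Random(3)  # fixed seed only so the sketch is repeatable
# Simulated bell-shaped data (an assumption for illustration).
data = [rng.gauss(100, 15) for _ in range(10_000)]

mean = statistics.mean(data)
sd = statistics.stdev(data)

def share_within(k):
    """Proportion of values within k standard deviations of the mean."""
    return sum(abs(x - mean) <= k * sd for x in data) / len(data)

for k in (1, 2, 3):
    print(k, round(share_within(k), 3))
```

The printed shares should land close to 0.68, 0.95, and 0.997, though not exactly, because the data are random.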
Coefficient of Variation (CV)
Tells how large the standard deviation is relative to the mean. It can be used to compare the spreads of data sets whose values have different units.
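A small Python sketch comparing the spread of two made-up data sets measured in different units, taking the CV as the standard deviation divided by the mean:

```python
import statistics

# Hypothetical data in different units.
heights_cm = [160, 165, 170, 175, 180]
weights_kg = [50, 60, 70, 80, 90]

def coefficient_of_variation(values):
    # Standard deviation divided by the mean; the units cancel out.
    return statistics.stdev(values) / statistics.mean(values)

cv_height = coefficient_of_variation(heights_cm)
cv_weight = coefficient_of_variation(weights_kg)
print(f"{cv_height:.3f} {cv_weight:.3f}")
```

In this made-up example the weights have the larger CV, so they are more spread out relative to their mean even though centimeters and kilograms cannot be compared directly.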
Tells how many standard deviations a value is from the mean of its population. Ex. A value one standard deviation above the mean has a Z-score of 1. A value two standard deviations below the mean has a Z-score of -2.
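As a quick Python sketch, assuming a hypothetical population with mean 100 and standard deviation 15 (the function name `z_score` is illustrative):

```python
def z_score(value, mean, std_dev):
    # Number of standard deviations the value lies above (+) or below (-) the mean.
    return (value - mean) / std_dev

# Hypothetical population with mean 100 and standard deviation 15.
print(z_score(115, 100, 15))   # 1.0
print(z_score(70, 100, 15))    # -2.0
```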
Minimum, 1st Quartile, Median, 3rd Quartile, Maximum
A value that is considerably larger or smaller than most of the values in a data set.
Interquartile Range (IQR)
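Found by subtracting the 1st quartile from the 3rd quartile. Used in one method for detecting outliers, which commonly flags values more than 1.5 × IQR below the 1st quartile or above the 3rd quartile.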
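A hedged Python sketch of the five-number summary, the IQR, and outlier detection. It assumes the common 1.5 × IQR rule and made-up data, and quartile conventions differ between textbooks and software, so other methods can give slightly different quartiles:

```python
import statistics

# Hypothetical data with one unusually large value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 50]

# Uses the default ("exclusive") method of statistics.quantiles.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1                        # 3rd quartile minus 1st quartile

# Assumed rule: anything beyond 1.5 * IQR from the quartiles is an outlier.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

five_number_summary = (min(data), q1, q2, q3, max(data))
print(five_number_summary, iqr, outliers)
```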
A graph that presents the five-number summary. Whiskers out to the Min and Max.
Whiskers out to the lowest/highest values that are not outliers. Symbols represent outliers.