Statistics

Statistics (Singular)
The science of collecting, organizing, and interpreting data
Population
The complete set of people or things being studied
Sample
A subset of the population from which data are actually obtained
Statistics (Plural)
Numbers describing characteristics of a sample found by summarizing the raw data
Raw data
Actual measurements or observations collected from the sample
Parameter
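A number that describes a characteristic of the population. For example, if the population is all of the students at some high school, the mean age of all those students is a parameter. A parameter usually cannot be calculated exactly, because data are rarely obtained from every member of the population.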
Descriptive Statistics
Numerical and graphical summaries of data. Purpose is to summarize.
Inferential Statistics
Drawing inferences from sample data about a population. Purpose is to make predictions or to test hypotheses about the population.
Census
The collection of data from every member of the population.
Simple Random Sample
A sample chosen by a method such that every possible sample of the same size is equally likely to be selected (so every member of the population is also equally likely to be selected).
Ex. A raffle
Sample of Convenience
A sample that is not drawn by a well-defined random method. It may differ systematically in some way from the population. If it is reasonable to believe that no important systematic difference exists, then it is acceptable to treat the sample as if it were a simple random sample.
Stratified Random Sampling
Population is divided into groups, called strata, then a simple random sample is drawn from each stratum.
Cluster Sampling
Population is divided into groups (clusters), clusters are randomly sampled, and all the members of the selected clusters form the sample. Useful when the population is too large and spread out for simple random sampling to be feasible.
Systematic Sampling
Items are ordered and every nth item is chosen to be included in the sample.
Ex. Interviewing every 5th person or inspecting every 3rd car on an assembly line.
Voluntary Response Samples
Often used by the media to try to engage the audience. For example, a radio DJ will invite people to call the station to say what they think.
They are never reliable sources, because people who volunteer an opinion tend to have stronger and more negative opinions than those in the typical population.
Bias
Any problem in the design or conduct of a statistical study that tends to favor certain results.
Biased
Studies conducted with methods that tend to overestimate or underestimate the true value.
Unbiased
Studies conducted by a procedure that produces the correct result on average.
Self-Interest Bias
People who have an interest in the outcome of an experiment have an incentive to use biased methods.
Ex. Many advertisements use biased data against their competitors.
Social Acceptability Bias
People are reluctant to admit to behavior that may reflect negatively on them.
Ex. Who will you vote for in the 2016 election?
Leading Question Bias
Sometimes questions are worded in a way that suggests a particular response.
Ex. Do you favor decreasing the heavy tax burden on middle class families?
Non-response Bias
The opinions of non-responders tend to differ from the opinions of those who do respond. As a result, surveys with many non-responders are often biased.
Sampling Bias
Occurs when some members in the population are more likely to be included in the sample than others.
Ex. A telephone survey that calls only landline numbers, so people who use only cell phones cannot be included.
Variable
A characteristic that differs from one subject to the next.
Ex. Age, grade level, GPA
Data
The values of the variables that we obtain.
Data Set
All of the information collected.
Qualitative Variables
Classify individuals into categories.
Ex. Marital status of survey respondents
Quantitative Variables
Tell how much or how many of something there is; the values are actual numbers.
Ex. Age of survey respondents
Nominal Variables
Type of QUALITATIVE variable. Have no natural ordering.
Ex. Shirt color: Red, black, blue, other
Ordinal Variables
Type of QUALITATIVE variable. Have a natural ordering, but have no mathematical value.
Ex. A, B, C, D, F... Excellent, Good, Fair, Poor...
Discrete Variables
Type of QUANTITATIVE variable. Variables whose possible values can be listed (possible values are countable).
Ex. Number of people in line at the bank. Number of siblings someone has.
Continuous Variables
Type of QUANTITATIVE variable. Can take on any value in some interval (possible values are uncountable).
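Ex. Height, weight. (A count such as the number of stars in each galaxy is technically discrete, but it has so many possible values that it is often treated as continuous.)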
Experimental Units
The individuals or things that are studied.
Subjects
What the experimental units are called when they are people.
Outcome/Response
What is measured on each experimental unit.
Treatments
The procedures applied to each experimental unit.
Randomized Experiment
A study in which the investigator assigns the treatments to the experimental units at random.
Observational Study
The assignment to treatment groups is not made by the investigator.
Ex. Health effects of long-term smokers
Double-Blind Experiments
Experiments in which neither the investigator nor the subjects know who has been assigned to which treatment.
Ex. In an experiment to test the effectiveness of a new pain reliever, patients who know they are getting the drug may report their pain levels differently than those who know they are taking a placebo.
Confounding
Makes it difficult to tell whether a difference in the outcome is due to the treatment or to some other difference between the treatment and control groups. Observational studies are more susceptible than randomized experiments.
Cohort Study
Group of subjects (the cohort) is studied to determine whether various factors of interest are associated with an outcome.
Prospective
Type of COHORT STUDY where the subjects are followed over time.
Cross-Sectional
Type of COHORT STUDY where measurements are taken at one point in time.
Retrospective
Type of COHORT STUDY where subjects are sampled after the outcome has occurred.
Case-Control
A type of study where two samples are drawn. One sample consists of people who have the disease of interest (the cases), and the other consists of people who do not have the disease (the controls). The investigators look back in time to determine whether a factor of interest differs between the two groups.
Ex. Are pesticides related to brain cancer in children?
Distribution
The way in which a variable's values are spread over all possible values.
Frequency
The number of times a particular value or category occurs in a data set.
Frequency Distribution
Presented in a table that gives the frequency for each category.
Relative Frequency
The proportion of observations in a category. Often expressed as a percent.
Equation:
Relative Frequency= Frequency/Sum of all Frequencies
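The relative frequency formula can be checked with a short Python sketch. The function name `relative_frequencies` and the shirt-color responses are hypothetical, made up only for illustration; `collections.Counter` does the counting.

```python
from collections import Counter

def relative_frequencies(values):
    """Return {category: relative frequency} for a list of categorical values."""
    counts = Counter(values)          # frequency of each category
    total = sum(counts.values())      # sum of all frequencies (= number of observations)
    return {category: freq / total for category, freq in counts.items()}

# Hypothetical survey responses for shirt color (made-up data)
shirts = ["red", "blue", "blue", "black", "blue", "red", "other", "blue"]
rel = relative_frequencies(shirts)
print(rel)  # {'red': 0.25, 'blue': 0.5, 'black': 0.125, 'other': 0.125}
```

The relative frequencies always add up to 1 (100%), which is a quick check on the arithmetic.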
Relative Frequency Distribution
Presented in a table that gives the relative frequency for each category.
Bar Graph
A graphical representation of a frequency distribution.
Pareto Chart
Bar graph in which the categories are arranged in order of frequency, with the largest frequency on the left and the smallest frequency on the right.
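The left-to-right bar order of a Pareto chart is just the categories sorted by frequency. A minimal sketch, assuming the same hypothetical shirt-color data as above; `Counter.most_common()` returns categories from most to least frequent, and ties keep the order in which they were first encountered.

```python
from collections import Counter

def pareto_order(values):
    """Return (category, frequency) pairs from largest to smallest frequency,
    which is the left-to-right order of the bars in a Pareto chart."""
    return Counter(values).most_common()

# Hypothetical shirt-color responses (made-up data)
shirts = ["red", "blue", "blue", "black", "blue", "red", "other", "blue"]
print(pareto_order(shirts))  # [('blue', 4), ('red', 2), ('black', 1), ('other', 1)]
```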
Side-by-side Bar Graph
Bar graph that shows frequencies or relative frequencies in categories for more than one group at a time by placing group bars side-by-side within each category.
Stacked Bar Graph
Bar graph that shows frequencies or relative frequencies in categories for more than one group at a time by stacking the group bars on top of one another within each category.
Pie Chart
A circle divided such that each wedge represents the RELATIVE frequency of a particular category.
Classes
Intervals of equal width that cover all values observed in the data set.
Lower Class Limit
The smallest value that can appear in that class.
Upper Class Limit
The largest value that can appear in that class.
Class Width
The difference between consecutive LOWER CLASS LIMITS.
Ex. If two consecutive lower class limits are 10 and 15, the class width is 15 - 10 = 5.
Histogram
Graphical representation of a frequency (or relative frequency) distribution for quantitative data. There is one rectangle for each class, and its height is the frequency (or relative frequency) of that class.
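The counts behind a histogram can be built from the class definitions above. This is a minimal sketch, assuming integer data and equal-width classes that together cover every value; the function `class_frequencies` and the ages are hypothetical. With a first lower class limit of 10 and a class width of 5, the classes are 10-14, 15-19, 20-24, and 25-29.

```python
def class_frequencies(data, first_lower_limit, width, num_classes):
    """Count the values in each equal-width class [lower, lower + width).

    Returns (lower class limit, frequency) pairs. Raises ValueError if a
    value falls outside all of the classes.
    """
    lowers = [first_lower_limit + i * width for i in range(num_classes)]
    counts = [0] * num_classes
    for x in data:
        index = int((x - first_lower_limit) // width)  # which class x belongs to
        if not 0 <= index < num_classes:
            raise ValueError(f"{x} is outside the classes")
        counts[index] += 1
    return list(zip(lowers, counts))

# Hypothetical ages (made-up data)
ages = [12, 14, 15, 17, 19, 20, 21, 23, 24, 28]
print(class_frequencies(ages, 10, 5, 4))  # [(10, 2), (15, 3), (20, 4), (25, 1)]
```

Each pair gives the lower class limit and the height of that class's rectangle.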
Open-Ended Classes
When the first class has no lower limit or the last class has no upper limit.
Ex. Age: "85 and older"
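Positively Skewed
The histogram has a long tail on the right; the mean is usually greater than the median.
Negatively Skewed
The histogram has a long tail on the left; the mean is usually less than the median.
Symmetric
The right half of the histogram is approximately a mirror image of the left half. When a data set is symmetric, the MEAN and MEDIAN are EQUAL.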
Mode
A peak, or high point, of a histogram. The value that appears most frequently in a data set. Sometimes classified as a measure of center.
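Using the "most frequent value" meaning of mode, Python's `statistics.multimode` (available since Python 3.8) returns every value tied for the highest frequency, so it also distinguishes unimodal from bimodal data. The data below are made up for illustration.

```python
from statistics import multimode

# Hypothetical data sets (made-up values)
scores = [3, 5, 5, 7, 8, 8, 9]
print(multimode(scores))        # [5, 8] -> two modes, so this data set is bimodal
print(multimode([1, 2, 2, 3]))  # [2]    -> one mode, so this data set is unimodal
```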
Unimodal
Only one mode
Bimodal
Two clearly distinct modes
Mean
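A measure of center in a data set: the sum of the data values divided by the number of values.
Sample Mean
x̄ = Σx / n, where x represents the data values and n is the sample size.
Ex. A sample of size n = 5 was taken from the population of exam scores for a large class. The scores are 78, 83, 92, 68, and 85.
The sample mean is (78 + 83 + 92 + 68 + 85) / 5 = 406 / 5 = 81.2.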
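The sample mean calculation for the exam scores 78, 83, 92, 68, and 85 can be reproduced in a couple of lines. The helper name `sample_mean` is hypothetical; Python's built-in `statistics.mean` gives the same result.

```python
def sample_mean(values):
    """Sample mean: the sum of the values divided by the sample size n."""
    return sum(values) / len(values)

scores = [78, 83, 92, 68, 85]  # exam scores from the example
print(sample_mean(scores))  # 81.2
```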
Median
Another measure of center. Splits the data set in half, so that half the data values are less than the median and half the data values are greater than the median.
Finding the Median
Step 1: Arrange the data values in increasing order
Step 2: Determine n, the number of data values
Step 3: If n is odd, the median is the middle number. If n is even, the median is the average of the two middle numbers.
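The three steps for finding the median translate directly into code. This is a sketch with a hypothetical function name; the second call adds a made-up sixth score (90) to show the even-n case.

```python
def median(values):
    """Median, following the three steps: sort, count, then pick the middle."""
    if not values:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(values)      # Step 1: arrange in increasing order
    n = len(ordered)              # Step 2: determine n
    mid = n // 2
    if n % 2 == 1:                # Step 3: n odd -> the middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # n even -> average of the two middle values

print(median([78, 83, 92, 68, 85]))      # 83
print(median([78, 83, 92, 68, 85, 90]))  # 84.0
```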
Resistant Statistic
Its value is not affected much by extreme values (large or small) in the data set. The MEDIAN is resistant, but not the mean.
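A quick way to see resistance is to replace one value with an extreme one and compare how the mean and median react. The second data set is hypothetical: it imagines the exam score 85 mistakenly entered as 850.

```python
from statistics import mean, median

scores = [78, 83, 92, 68, 85]
with_outlier = [78, 83, 92, 68, 850]  # hypothetical data-entry error: 85 typed as 850

print(mean(scores), median(scores))              # 81.2 83
print(mean(with_outlier), median(with_outlier))  # 234.2 83
```

The one extreme value nearly triples the mean, while the median does not change at all.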
Range
A measure of spread in a data set. The difference between the largest and the smallest value.
Range= Maximum- Minimum
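Applied to the exam scores from the sample mean example, the range formula gives 92 - 68 = 24. A one-line sketch (the function name is hypothetical):

```python
def data_range(values):
    """Range = Maximum - Minimum."""
    return max(values) - min(values)

print(data_range([78, 83, 92, 68, 85]))  # 24
```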
Variance
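A measure of how far the values in a data set are from the mean: roughly the average of the squared distances from the mean.
Sample Variance Formula
s² = Σ(x - x̄)² / (n - 1), where x̄ is the sample mean and n is the sample size.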
Standard Deviation
The square root of the variance. Standard deviation is not resistant and will be affected by extreme values.
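The sample variance divides the sum of squared deviations by n - 1, and the standard deviation is its square root. A minimal sketch using the exam scores from the sample mean example (the function names are hypothetical; `statistics.variance` and `statistics.stdev` compute the same sample quantities). By hand, the deviations from 81.2 are -3.2, 1.8, 10.8, -13.2, and 3.8; their squares sum to 318.8, and 318.8 / 4 = 79.7.

```python
import math

def sample_variance(values):
    """s^2 = sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)

def sample_std_dev(values):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(sample_variance(values))

scores = [78, 83, 92, 68, 85]
print(round(sample_variance(scores), 2))  # 79.7
print(round(sample_std_dev(scores), 2))   # 8.93
```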
Bell-Shaped Histogram
Histogram with a single mode near the center of the data that is approximately symmetric.
Empirical Rule
Approx. 68% of the data will be within ONE STANDARD DEVIATION of the mean.
Approx. 95% of the data will be within TWO STANDARD DEVIATIONS of the mean.
Almost all (99.7%) of the data will be within THREE STANDARD DEVIATIONS of the mean.
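The Empirical Rule turns a mean and standard deviation into three intervals. This sketch assumes hypothetical bell-shaped data with mean 100 and standard deviation 15; the function name is also hypothetical.

```python
def empirical_rule_intervals(mean, sd):
    """Intervals that, for bell-shaped data, contain about 68%, 95%, and 99.7% of the values."""
    return {
        "68%": (mean - 1 * sd, mean + 1 * sd),
        "95%": (mean - 2 * sd, mean + 2 * sd),
        "99.7%": (mean - 3 * sd, mean + 3 * sd),
    }

# Hypothetical bell-shaped data: mean 100, standard deviation 15
for share, (low, high) in empirical_rule_intervals(100, 15).items():
    print(f"about {share} of the data between {low} and {high}")
# about 68% of the data between 85 and 115
# about 95% of the data between 70 and 130
# about 99.7% of the data between 55 and 145
```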
Coefficient of Variation (CV)
Tells how large the standard deviation is relative to the mean: CV = standard deviation / mean (for a sample, s / x̄). Because the CV has no units, it can be used to compare the spreads of data sets whose values have different units.
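Because the CV is a unitless ratio, it can compare spread across measurements in different units. This sketch uses the sample version s / x̄ and hypothetical heights (cm) and weights (kg) of the same five people.

```python
import statistics

def coefficient_of_variation(values):
    """CV = sample standard deviation / sample mean (a unitless ratio)."""
    return statistics.stdev(values) / statistics.mean(values)

# Hypothetical heights (cm) and weights (kg) of the same five people
heights_cm = [160, 165, 170, 175, 180]
weights_kg = [55, 60, 70, 80, 85]

print(round(coefficient_of_variation(heights_cm), 3))  # 0.047
print(round(coefficient_of_variation(weights_kg), 3))  # 0.182
```

In this made-up example, the weights vary more than the heights relative to their means, even though centimeters and kilograms cannot be compared directly.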
Z-Score
Tells how many standard deviations a data value is from its population mean: z = (x - μ) / σ, where μ is the population mean and σ is the population standard deviation.
Ex. A value one standard deviation above the mean has a Z-score of 1.
A value two standard deviations below the mean has a Z-score of -2.
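The two z-score examples can be reproduced with the formula z = (x - μ) / σ. This sketch assumes a hypothetical population with mean 100 and standard deviation 15.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations x lies above (+) or below (-) the mean mu."""
    return (x - mu) / sigma

# Hypothetical population: mean 100, standard deviation 15
print(z_score(115, 100, 15))  # 1.0  (one standard deviation above the mean)
print(z_score(70, 100, 15))   # -2.0 (two standard deviations below the mean)
```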
Five-Number Summary
Minimum, 1st Quartile, Median, 3rd Quartile, Maximum
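A sketch of the five-number summary using Python's `statistics.quantiles` with its default method (`"exclusive"`). This is an assumption: textbooks and calculators use several different quartile rules, so Q1 and Q3 from this code may differ slightly from hand calculations on other data sets. For the exam scores below, it agrees with the median-of-each-half method.

```python
import statistics

def five_number_summary(values):
    """(Minimum, Q1, Median, Q3, Maximum), with quartiles from Python's default method."""
    q1, q2, q3 = statistics.quantiles(values, n=4)  # method="exclusive" by default
    return min(values), q1, q2, q3, max(values)

scores = [78, 83, 92, 68, 85]
print(five_number_summary(scores))  # (68, 73.0, 83.0, 88.5, 92)
```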
Outlier
A value that is considerably larger or smaller than most of the values in a data set.
Interquartile Range (IQR)
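Found by subtracting the 1st quartile from the 3rd quartile: IQR = Q3 - Q1. Used in one method for detecting outliers: a value is an outlier if it is less than Q1 - 1.5 × IQR or greater than Q3 + 1.5 × IQR.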
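The common 1.5 × IQR rule for outliers can be sketched as follows. As with any quartile calculation, this assumes Python's default `statistics.quantiles` method; other quartile rules can move the fences slightly. The ten scores are hypothetical.

```python
import statistics

def iqr_outliers(values):
    """Return (IQR, outliers) using the 1.5 x IQR rule."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default "exclusive" method
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return iqr, [x for x in values if x < lower_fence or x > upper_fence]

# Hypothetical scores (made-up data)
data = [68, 72, 75, 78, 80, 83, 85, 88, 92, 150]
iqr, outliers = iqr_outliers(data)
print(iqr, outliers)  # 14.75 [150]
```

Here Q1 = 74.25 and Q3 = 89.0, so the fences are 52.125 and 111.125, and only 150 falls outside them.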
Boxplot
A graph that presents the five-number summary. The whiskers extend out to the minimum and maximum.
Modified Boxplot
Whiskers out to the lowest/highest values that are not outliers. Symbols represent outliers.
