85 terms

Statistics (Singular)

The science of collecting, organizing, and interpreting data

Population

The complete set of people or things being studied

Sample

A subset of the population from which data are actually obtained

Statistics (Plural)

Numbers that describe characteristics of a sample, found by summarizing the raw data.

Raw data

Actual measurements or observations collected from the sample

Parameter

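A number that describes the population. For example, if the population is all of the students at some high school, a parameter describes all of those students. Usually cannot be calculated directly, because data are rarely available for the entire population.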

Descriptive Statistics

Numerical and graphical summaries of data. Purpose is to summarize.

Inferential Statistics

Drawing inferences from sample data to a population. Purpose is to make predictions or test a hypothesis.

Census

The collection of data from every member of the population.

Simple Random Sample

A sample chosen by a method in which every member of the population is equally likely to be selected.

Ex. A raffle
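
The raffle idea can be sketched in Python as below. This is only an illustration: the ticket numbers, the sample size, and the helper name `simple_random_sample` are assumptions, not part of the original cards.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members; each member is equally likely to be chosen."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(population, n)

# Hypothetical raffle: 50 numbered tickets, 5 winners.
tickets = list(range(1, 51))
winners = simple_random_sample(tickets, 5, seed=1)
print(winners)
```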

Sample of Convenience

A sample that is not drawn by a well-defined random method. May differ systematically in some way from the population. If it is reasonable to believe that no important systematic difference exists, then it is acceptable to treat the sample as if it were a simple random sample.

Stratified Random Sampling

Population is divided into groups, called strata, then a simple random sample is drawn from each stratum.
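
A minimal sketch of the idea, assuming the strata are grade levels and the same number of students is drawn from each one. The stratum names, student labels, and the function `stratified_sample` are hypothetical.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a simple random sample of n_per_stratum members from every stratum."""
    rng = random.Random(seed)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

# Hypothetical strata: students grouped by grade level.
strata = {
    "freshmen": [f"F{i}" for i in range(1, 21)],
    "sophomores": [f"S{i}" for i in range(1, 16)],
    "juniors": [f"J{i}" for i in range(1, 11)],
}
sample = stratified_sample(strata, 2, seed=7)
print(sample)
```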

Cluster Sampling

Population is divided into groups (clusters), clusters are randomly sampled, and all the members of the selected clusters form the sample. Useful when the population is too large and spread out for simple random sampling to be feasible.
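
The difference from stratified sampling is that whole clusters are selected at random and every member of a selected cluster is kept. A rough sketch, with made-up homeroom clusters and a hypothetical helper `cluster_sample`:

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick n_clusters clusters and keep every member of each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # sample the cluster names
    members = [m for name in chosen for m in clusters[name]]
    return chosen, members

# Hypothetical clusters: homerooms in a school.
clusters = {
    "Room 101": ["Ana", "Ben", "Cho"],
    "Room 102": ["Dev", "Eli"],
    "Room 103": ["Fay", "Gus", "Hal", "Ivy"],
    "Room 104": ["Jon", "Kim"],
}
chosen, members = cluster_sample(clusters, 2, seed=3)
print(chosen, members)
```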

Systematic Sampling

Items are ordered and every nth item is chosen to be included in the sample.

Ex. Interviewing every 5th person or inspecting every 3rd car on an assembly line.
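
The "every 3rd car" example could look like this in Python. The car numbering and the function name `systematic_sample` are assumptions made for illustration; in practice the starting point is often chosen at random, whereas here it is fixed.

```python
def systematic_sample(items, n, start=0):
    """Return every nth item of an ordered list, beginning at index `start`."""
    return items[start::n]

cars = list(range(1, 16))                     # cars numbered 1 to 15, in line order
inspected = systematic_sample(cars, 3, start=2)
print(inspected)                              # [3, 6, 9, 12, 15]
```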

Voluntary Response Samples

Samples in which people decide for themselves whether to take part. Often used by the media to try to engage the audience. Ex. A radio DJ invites listeners to call the station to say what they think.

Are never reliable sources because people who volunteer an opinion tend to have stronger and more negative opinions than the typical member of the population.

Bias

Any problem in the design or conduct of a statistical study that tends to favor certain results.

Biased

Studies conducted with methods that tend to overestimate or underestimate the true value.

Unbiased

Studies conducted by a procedure that produces the correct result on average.

Self-Interest Bias

People who have an interest in the outcome of an experiment have an incentive to use biased methods.

Ex. Many advertisements use biased data against their competitors.

Social Acceptability Bias

People are reluctant to admit to behavior that may reflect negatively on them.

Ex. Who will you vote for in the 2016 election?

Leading Question Bias

Sometimes questions are worded in a way that suggests a particular response.

Ex. Do you favor decreasing the heavy tax burden on middle class families?

Non-response Bias

The opinions of non-responders tend to differ from the opinions of those who do respond. As a result, surveys with many non-responders are often biased.

Sampling Bias

Occurs when some members in the population are more likely to be included in the sample than others.

Ex. Cell phone users

Variable

A characteristic that differs from one subject to the next.

Ex. Age, grade level, GPA

Data

The values of the variables that we obtain.

Data Set

All of the information collected.

Qualitative Variables

Classify individuals into categories.

Ex. Marital status of survey respondents

Quantitative Variables

Tell how much or how many of something there is. The values are actual numbers.

Ex. Age of survey respondents

Nominal Variables

Type of QUALITATIVE variable. Have no natural ordering.

Ex. Shirt color: Red, black, blue, other

Ordinal Variables

Type of QUALITATIVE variable. Have a natural ordering, but the categories have no mathematical value.

Ex. A, B, C, D, F... Excellent, Good, Fair, Poor...

Discrete Variables

Type of QUANTITATIVE variable. Variables whose possible values can be listed (possible values are countable).

Ex. Number of people in line at the bank. Number of siblings someone has.

Continuous Variables

Type of QUANTITATIVE variable. Can take on any value in some interval (possible values are uncountable).

Ex. Height, weight. (The number of stars in a galaxy is a count, so strictly speaking it is a discrete variable, even though counts that large are often treated as continuous.)

Experimental Units

The individuals or things that are studied.

Subjects

The name given to experimental units when they are people.

Outcome/Response

What is measured on each experimental unit.

Treatments

The procedures applied to each experimental unit.

Randomized Experiment

A study in which the investigator assigns the treatments to the experimental units at random.

Observational Study

A study in which the assignment to treatment groups is not made by the investigator.

Ex. Health effects of long-term smokers

Double-Blind Experiments

Experiments in which neither the investigator nor the subjects know who has been assigned to which treatment.

Ex. In an experiment to test the effectiveness of a new pain reliever, patients who know they are getting the drug may report their pain levels differently than those who know they are taking a placebo.

Confounding

A situation that makes it difficult to tell whether a difference in the outcome is due to the treatment or to some other difference between the treatment and control groups. Observational studies are more susceptible to confounding than randomized experiments.

Cohort Study

A study in which a group of subjects (the cohort) is studied to determine whether various factors of interest are associated with an outcome.

Prospective

Type of COHORT STUDY where the subjects are followed over time.

Cross-Sectional

Type of COHORT STUDY where measurements are taken at one point in time.

Retrospective

Type of COHORT STUDY where subjects are sampled after the outcome has occurred.

Case-Control

A type of study where two samples are drawn. One sample consists of people who have the disease of interest (the cases), and the other consists of people who do not have the disease (the controls). The investigators look back in time to determine whether a factor of interest differs between the two groups.

Ex. Are pesticides related to brain cancer in children?

Distribution

The way in which a variable's values are spread over all possible values.

Frequency

The number of times a particular value or category occurs in a data set.

Frequency Distribution

A table that gives the frequency for each category.

Relative Frequency

The proportion of observations in a category. Often expressed as a percent.

Equation:

Relative Frequency = Frequency / Sum of all Frequencies
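
As a quick check of the equation, the sketch below counts each category and divides by the total. The shirt-color observations are invented for illustration, and `relative_frequencies` is a hypothetical helper name.

```python
from collections import Counter

def relative_frequencies(observations):
    """Map each category to frequency / sum of all frequencies."""
    counts = Counter(observations)   # frequency of each category
    total = sum(counts.values())     # sum of all frequencies
    return {category: count / total for category, count in counts.items()}

shirts = ["red", "black", "blue", "red", "other", "red", "black", "blue"]
rel = relative_frequencies(shirts)
print(rel)  # {'red': 0.375, 'black': 0.25, 'blue': 0.25, 'other': 0.125}
```

Multiplying each value by 100 gives the percent form mentioned in the definition.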

Relative Frequency Distribution

A table that gives the relative frequency for each category.

Bar Graph

A graphical representation of a frequency distribution.

Pareto Chart

Bar graph in which categories are presented in order of frequency, with the largest frequency (tallest bar) on the left and the smallest frequency on the right.

Side-by-side Bar Graph

Bar graph that shows frequencies or relative frequencies in categories for more than one group at a time by placing group bars side-by-side within each category.

Stacked Bar Graph

Bar graph that shows frequencies or relative frequencies in categories for more than one group at a time by stacking the group bars on top of one another within each category.

Pie Chart

A circle divided such that each wedge represents the RELATIVE frequency of a particular category.

Classes

Intervals of equal width that cover all values observed in the data set.

Lower Class Limit

The smallest value that can appear in that class.

Upper Class Limit

The largest value that can appear in that class.

Class Width

The difference between consecutive LOWER CLASS LIMITS.

Ex. If two consecutive lower class limits are 10 and 15, the class width is 15 - 10 = 5.

Histogram

Graphical representation of a frequency (or relative frequency) distribution for quantitative data. A rectangle is drawn for each class, with its height showing that class's frequency or relative frequency.

Open-Ended Classes

When the first class has no lower limit or the last class has no upper limit.

Ex. Age: "85 and older"

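Positively Skewed

A histogram with a long right-hand tail. Also called skewed to the right.

Negatively Skewed

A histogram with a long left-hand tail. Also called skewed to the left.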

Symmetric

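A histogram whose right half is a mirror image of its left half. When a data set is symmetric, the MEAN and MEDIAN are EQUAL (approximately equal when the data are only approximately symmetric).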

Mode

A peak, or high point, of a histogram. The value that appears most frequently in a data set. Sometimes classified as a measure of center.

Unimodal

Only one mode

Bimodal

Two clearly distinct modes

Mean

A measure of center in a data set. Found by adding up all the values and dividing by the number of values.

Sample Mean

Ex. A sample size of n= 5 was taken from the population of exam scores for a large class. The scores are 78, 83, 92, 68, and 85.

The sample mean is 81.2.
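
The arithmetic behind this example can be checked with a few lines of Python (sum of the scores divided by n). The variable names are illustrative only.

```python
scores = [78, 83, 92, 68, 85]          # the n = 5 exam scores from the example
sample_mean = sum(scores) / len(scores)  # 406 / 5
print(sample_mean)  # 81.2
```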

Median

Another measure of center. Splits the data set in half, so that half the data values are less than the median and half the data values are greater than the median.

Finding the Median

Step 1: Arrange the data values in increasing order

Step 2: Determine n, the number of data values

Step 3: If n is odd, the median is the middle number. If n is even, the median is the average of the two middle numbers.
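
The three steps above translate almost line for line into code. This is a sketch, not a replacement for a library routine; the second data set (four scores) is made up to show the even case.

```python
def median(values):
    ordered = sorted(values)      # Step 1: arrange in increasing order
    n = len(ordered)              # Step 2: determine n
    mid = n // 2
    if n % 2 == 1:                # Step 3: n odd, take the middle number
        return ordered[mid]
    # Step 3: n even, average the two middle numbers
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([78, 83, 92, 68, 85]))  # 83
print(median([78, 83, 92, 68]))      # 80.5
```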

Resistant Statistic

A statistic whose value is not affected much by extreme values (large or small) in the data set. The MEDIAN is resistant, but the mean is not.
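
To see resistance in action, the sketch below adds one extreme value to a small data set and compares how far the mean and the median move. The scores and the extreme value of 850 (imagine a data-entry error) are hypothetical.

```python
from statistics import mean, median

scores = [78, 83, 92, 68, 85]
with_extreme = scores + [850]   # one hypothetical extreme value

print(mean(scores), median(scores))              # mean 81.2, median 83
print(mean(with_extreme), median(with_extreme))  # mean jumps to about 209.3, median is 84.0
```

The mean more than doubles, while the median moves by only one point.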

Range

A measure of spread in a data set. The difference between the largest and the smallest value.

Range = Maximum - Minimum

Variance

A measure of how far the values in a data set are from the mean, on the average.

Sample Variance Formula
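
The formula itself is missing from this card. The standard sample variance formula is given below as a reconstruction; the symbols are assumptions about the course's notation, with x_i the data values, \bar{x} the sample mean, and n the sample size.

```latex
s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}
```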

Standard Deviation

The square root of the variance. Standard deviation is not resistant and will be affected by extreme values.
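
A small worked example, assuming the sample versions of both quantities (division by n - 1). The data are the five exam scores used elsewhere in this set, and the function names are illustrative.

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def sample_std_dev(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))

scores = [78, 83, 92, 68, 85]
print(round(sample_variance(scores), 2))  # 79.7
print(round(sample_std_dev(scores), 2))   # 8.93
```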

Bell-Shaped Histogram

A histogram with a single mode near the center of the data that is approximately symmetric.

Empirical Rule

For a data set with a bell-shaped histogram, approx. 68% of the data will be within ONE STANDARD DEVIATION of the mean.

Approx. 95% of the data will be within TWO STANDARD DEVIATIONS of the mean.

Almost all (99.7%) of the data will be within THREE STANDARD DEVIATIONS of the mean.
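
The rule turns a mean and a standard deviation into three intervals. In the sketch below, the mean of 100 and standard deviation of 15 are made-up values, and the rule is only appropriate when the data are roughly bell-shaped.

```python
def empirical_rule_intervals(mean, std_dev):
    """Intervals of 1, 2, and 3 standard deviations around the mean."""
    return {
        "about 68%": (mean - std_dev, mean + std_dev),
        "about 95%": (mean - 2 * std_dev, mean + 2 * std_dev),
        "about 99.7%": (mean - 3 * std_dev, mean + 3 * std_dev),
    }

intervals = empirical_rule_intervals(mean=100, std_dev=15)
for share, (low, high) in intervals.items():
    print(f"{share} of the data lie between {low} and {high}")
```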

Coefficient of Variation (CV)

Tells how large the standard deviation is relative to the mean. It can be used to compare the spreads of data sets whose values have different units.
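
The CV is the standard deviation divided by the mean. The sketch below compares two data sets measured in different units; all of the numbers are hypothetical.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation expressed as a fraction of the mean."""
    return std_dev / mean

# Hypothetical summaries: heights in cm, weights in kg.
cv_height = coefficient_of_variation(std_dev=8.5, mean=170)
cv_weight = coefficient_of_variation(std_dev=10.5, mean=70)
print(round(cv_height, 3), round(cv_weight, 3))  # 0.05 0.15
```

Because the units cancel, the two values can be compared directly: here the weights are more spread out relative to their mean than the heights.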

Z-Score

Tells how many standard deviations a value is from its population mean.

Ex. A value one standard deviation above the mean has a Z-score of 1.

A value two standard deviations below the mean has a Z-score of -2.
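
Both examples follow from z = (value - mean) / standard deviation. The population mean of 100 and standard deviation of 15 used below are assumed values chosen for illustration.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that `value` lies from the mean."""
    return (value - mean) / std_dev

print(z_score(115, mean=100, std_dev=15))  # 1.0  (one standard deviation above)
print(z_score(70, mean=100, std_dev=15))   # -2.0 (two standard deviations below)
```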

Five-Number Summary

Minimum, 1st Quartile, Median, 3rd Quartile, Maximum

Outlier

A value that is considerably larger or smaller than most of the values in a data set.

Interquartile Range (IQR)

Found by subtracting the 1st quartile from the 3rd quartile. Used in one method for detecting outliers.
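
The card does not say which outlier method is meant. A common one, assumed here, flags any value more than 1.5 × IQR below the 1st quartile or above the 3rd quartile. The data and quartiles are hypothetical; the quartiles are passed in directly because textbooks and software compute them in slightly different ways.

```python
def outlier_boundaries(q1, q3):
    """Lower and upper boundaries under the assumed 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(values, q1, q3):
    """Values that fall outside the outlier boundaries."""
    lower, upper = outlier_boundaries(q1, q3)
    return [x for x in values if x < lower or x > upper]

# Hypothetical data; Q1 = 11 and Q3 = 16 are the medians of the lower and upper halves.
data = [2, 10, 12, 13, 14, 15, 17, 40]
print(outlier_boundaries(11, 16))   # (3.5, 23.5)
print(find_outliers(data, 11, 16))  # [2, 40]
```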

Boxplot

A graph that presents the five-number summary. The whiskers extend out to the minimum and maximum.

Modified Boxplot

A boxplot in which the whiskers extend only to the lowest and highest values that are not outliers. Separate symbols mark the outliers.