Statistics

the science of collecting, describing, and interpreting data

Data

observations (measurements or survey responses) that have been collected

Population

the complete collection of the people, objects, events, etc. that are to be analyzed

Sample

a subcollection of members selected from the population

Statistical Thinking includes:

1. context of the data

2. source of the data

3. sampling method

4. practical implications

5. conclusion

A parameter

a number describing some characteristic of a population

A statistic

a number describing some characteristic of a sample
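A small sketch of the difference, using a made-up population of quiz scores (all numbers are hypothetical). The population mean is a parameter. The mean of a random sample drawn from that population is a statistic that estimates it.

```python
import random
import statistics

# Hypothetical population: quiz scores of every student in one small class
population = [72, 85, 90, 66, 78, 95, 88, 70, 81, 77]

# Parameter: describes the whole population (population mean, mu)
mu = statistics.mean(population)

# Statistic: describes a sample taken from the population (sample mean, x-bar)
random.seed(1)  # fixed seed so the example is repeatable
sample = random.sample(population, 4)
x_bar = statistics.mean(sample)

print(f"parameter mu = {mu}")
print(f"statistic x-bar = {x_bar} (sample: {sample})")
```

Rerunning with a different seed changes x-bar but never mu, which is the whole point of the distinction.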

Discrete Data

numerical data whose possible values come in separate steps (e.g., whole numbers), with gaps between them

(age in whole years, # siblings, shoe sizes)

when data values are quantitative and the number of possible values is FINITE or COUNTABLE

Continuous Data

can take any value within a range, including fractions and decimals

(height, weight, volume)

infinitely many possible quantitative values (NOT COUNTABLE)

Reported Data

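self-reported: values that subjects report about themselves instead of values that are measured (may be less accurate)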

Qualitative Data

categorical (not a #)

(letter grade, ethnicity, eye color, hair color)

Observational Study

we observe and measure specific characteristics but don't attempt to MODIFY the subjects being studied

Experiment

apply some treatment, then observe its effects on the subjects

Random Sample

members from the population are selected so that each individual has an equal chance of selection

Simple Random Sample (of size n)

members of the population are selected so that every possible sample of size n has the same chance of being chosen

*If it's not random, it's not simple random

Cluster Sample

Divide the population into sections (clusters), randomly select one or more clusters, then select ALL members from those clusters

Stratified Sample

Divide the population into subgroups so that the subjects within the subgroups share some characteristic, then draw a sample from each subgroup

Systematic Sampling

select some starting point, then select every kth element of the population
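The four sampling methods above can be sketched with Python's `random` module. This is a minimal illustration on a hypothetical population of 20 students; the section letters, the freshman/senior split, and the sample sizes are all made up for the example.

```python
import random

# Hypothetical population: 20 students as (id, section A-D, year)
population = [(i, "ABCD"[i % 4], "freshman" if i < 10 else "senior") for i in range(20)]
random.seed(42)  # fixed seed so the example is repeatable

# Simple random sample of size n = 5: every possible group of 5 is equally likely
srs = random.sample(population, 5)

# Systematic sample: random starting point, then every kth member (k = 4 here)
k = 4
start = random.randrange(k)
systematic = population[start::k]

# Stratified sample: group by a shared characteristic (year), then sample from EACH group
strata = {}
for student in population:
    strata.setdefault(student[2], []).append(student)
stratified = [s for group in strata.values() for s in random.sample(group, 3)]

# Cluster sample: treat each section as a cluster, pick 2 clusters at random, keep ALL members
sections = sorted({student[1] for student in population})
chosen = random.sample(sections, 2)
cluster = [s for s in population if s[1] in chosen]
```

Note the contrast: stratified sampling takes some members from every group, while cluster sampling takes every member of only the chosen groups.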

Frequency Distribution

a list that pairs each data value (either individually or by group intervals) with its frequency
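A frequency distribution can be built by counting how often each value occurs. The sketch below uses Python's `collections.Counter` on hypothetical data, first value by value and then grouped into class intervals (the class width of 10 is an arbitrary choice for the example).

```python
from collections import Counter

# Hypothetical data: number of siblings reported by 12 students
siblings = [0, 1, 1, 2, 3, 1, 0, 2, 1, 4, 2, 1]
freq = Counter(siblings)
for value in sorted(freq):
    print(value, freq[value])  # each data value paired with its frequency

# Grouped version: hypothetical heights (cm) counted by class intervals of width 10
heights = [152, 158, 161, 165, 167, 170, 172, 174, 181, 188]
width = 10
classes = Counter((h // width) * width for h in heights)  # lower class limit of each height
for lower in sorted(classes):
    print(f"{lower}-{lower + width - 1}: {classes[lower]}")
```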

Normal Frequency Distribution

the frequencies start out low, increase to one or two high frequencies, then decrease to low frequencies again

roughly symmetric around the middle (bell shaped)

Relative Frequency Distribution

lists the relative frequency of each class instead of the frequency: relative frequency = class frequency ÷ sum of all frequencies

given in decimal form or as a percentage
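Converting frequencies to relative frequencies only takes a division by the total. A minimal sketch on hypothetical letter-grade data:

```python
from collections import Counter

# Hypothetical qualitative data: letter grades for 20 students
grades = ["A"] * 4 + ["B"] * 6 + ["C"] * 5 + ["D"] * 3 + ["F"] * 2
counts = Counter(grades)
n = sum(counts.values())

# Relative frequency = class frequency / sum of all frequencies
relative = {grade: count / n for grade, count in sorted(counts.items())}
for grade, rel in relative.items():
    print(f"{grade}: {rel:.2f} ({rel:.0%})")  # decimal form and percentage form
```

The relative frequencies always add up to 1 (100%), apart from rounding.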

Pie Chart

the amount of data that belongs in each category is shown as the corresponding proportion of a circle

Pareto Chart

a bar graph for qualitative (non-numerical) data with bars arranged high to low

Dot Plot

each piece of data is represented as a dot along a scale

Stem-and-Leaf Display

each piece of data is divided into 2 parts:

Leading digits = stem

Trailing digits = leaf
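A stem-and-leaf display can be sketched by splitting each value into tens (stem) and ones (leaf). This assumes non-negative whole numbers and a one-digit leaf; other splits are possible. `stem_and_leaf` is a helper written for this example.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Return the lines of a stem-and-leaf display (stem = tens, leaf = ones digit)."""
    stems = defaultdict(list)
    for x in sorted(data):  # sorting first keeps each row's leaves in order
        stems[x // 10].append(x % 10)
    lines = []
    for stem in range(min(stems), max(stems) + 1):  # include empty stems so gaps show
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}")
    return lines

# Hypothetical test scores
for line in stem_and_leaf([67, 72, 85, 75, 89, 89, 88, 90, 99, 100]):
    print(line)
```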

Bar Graph

uses bars of equal width to show the frequencies of categories of qualitative data

Multiple Bar Graph

has two or more sets of bars and is used to compare two or more sets of data

Histogram

a bar graph of a frequency distribution in which adjacent bars touch (no gaps); the horizontal scale shows the classes of quantitative data

Mean

the average of a group of #s: add the values and divide by how many there are (x̄ = Σx / n)

Median

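number that lies in the middle when the data are sorted by size (with an even number of values, the average of the two middle values). median = x̃ (x tilde); x̄ (x bar) is the mean, not the median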

Mode

value that occurs most often (a data set can have one mode, several modes, or no mode). mode = M

Mid-range

average of the max and the min
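The mean, median, mode, and mid-range of a small hypothetical data set, computed with Python's standard `statistics` module (`multimode` is used because a data set can have more than one mode):

```python
import statistics

# Hypothetical data set (includes a repeated value so there is a mode)
data = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(data)            # sum of values / number of values
median = statistics.median(data)        # n is even here, so average of the two middle values
modes = statistics.multimode(data)      # most frequent value(s), returned as a list
midrange = (max(data) + min(data)) / 2  # average of the max and the min

print(mean, median, modes, midrange)
```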

Weighted Mean

used to compute an average when the values carry different weights (think GPA, where grade points are weighted by credit hours): x̄ = Σ(w·x) / Σw
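A weighted mean multiplies each value by its weight, adds those products, and divides by the total weight. A minimal sketch of the GPA case, assuming a 4-point scale (A = 4, B = 3, C = 2) and hypothetical credit hours; `weighted_mean` is a helper written for this example.

```python
def weighted_mean(values, weights):
    """Sum of (weight * value) divided by the sum of the weights."""
    if len(values) != len(weights) or sum(weights) == 0:
        raise ValueError("need one weight per value and a nonzero total weight")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Hypothetical transcript: an A in a 3-credit class, a B in a 4-credit class, a C in a 1-credit class
grade_points = [4, 3, 2]
credit_hours = [3, 4, 1]
gpa = weighted_mean(grade_points, credit_hours)
print(gpa)  # 3.25
```

With equal weights the weighted mean reduces to the ordinary mean.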

Range

difference between the high and low data values

Standard Deviation

the square root of the variance
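Range, variance, and standard deviation of a hypothetical sample, computed by hand and checked against the `statistics` module. This sketch uses the sample variance, which divides by n - 1 (as `statistics.variance` and `statistics.stdev` do); the population versions `statistics.pvariance` and `statistics.pstdev` divide by n instead.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

data_range = max(data) - min(data)  # high value minus low value

x_bar = sum(data) / len(data)
# Sample variance: sum of squared deviations from the mean, divided by n - 1
variance = sum((x - x_bar) ** 2 for x in data) / (len(data) - 1)
# Standard deviation: square root of the variance
s = math.sqrt(variance)

# The statistics module gives the same sample values
assert math.isclose(variance, statistics.variance(data))
assert math.isclose(s, statistics.stdev(data))
print(data_range, round(variance, 4), round(s, 4))
```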

Range Rule of Thumb

values more than 2 standard deviations away from the mean are unusual; values within 2 standard deviations of the mean are usual. It also gives a rough estimate of the standard deviation: s ≈ range / 4
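A sketch of the rule on hypothetical pulse-rate data: compute x̄ ± 2s and flag anything outside that interval as unusual. The last line shows the other common use of the range rule of thumb, the rough estimate s ≈ range / 4. `usual_bounds` is a helper named for this example.

```python
import statistics

def usual_bounds(data):
    """Usual values lie within 2 sample standard deviations of the mean."""
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)
    return x_bar - 2 * s, x_bar + 2 * s

# Hypothetical pulse rates (beats per minute)
pulses = [60, 64, 68, 70, 72, 72, 74, 76, 80, 110]
low, high = usual_bounds(pulses)
unusual = [x for x in pulses if x < low or x > high]
print(f"usual range: {low:.1f} to {high:.1f}; unusual values: {unusual}")

# Rough estimate of s from the range alone
s_estimate = (max(pulses) - min(pulses)) / 4
```

Here 110 is flagged as unusual. The range/4 estimate (12.5) is only approximate; the actual sample standard deviation is about 13.7.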

The Empirical Rule

applies to data with a bell-shaped (approximately normal) distribution

1. about 68% of data falls within 1 standard deviation (S) of the mean

2. about 95% of data falls within 2 S of the mean

3. about 99.7% of data falls within 3 S of the mean
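The percentages in the Empirical Rule come from the normal distribution. Python's `statistics.NormalDist` can reproduce them from the standard normal curve's cumulative distribution function:

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)  # standard normal distribution

# Proportion of values within k standard deviations of the mean: P(-k < Z < k)
shares = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} standard deviation(s): {share:.1%}")
```

This prints 68.3%, 95.4%, and 99.7%, which the rule rounds to 68%, 95%, and 99.7%.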

Z-Scores

number of standard deviations that a given value x lies above or below the mean: z = (x - x̄) / s for a sample, z = (x - μ) / σ for a population
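A minimal z-score sketch on a hypothetical exam with mean 70 and standard deviation 8; `z_score` is a helper named for this example. A positive z is above the mean, a negative z is below it.

```python
def z_score(x, mean, sd):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical exam: mean 70, standard deviation 8
for score in (90, 62):
    z = z_score(score, 70, 8)
    label = "unusual" if abs(z) > 2 else "usual"  # same 2-standard-deviation cutoff as the range rule of thumb
    print(score, z, label)  # 90 2.5 unusual, then 62 -1.0 usual
```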

Percentiles

measures of location: the kth percentile (P_k) separates the lowest k% of the data from the highest (100 - k)%
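Two percentile calculations, sketched under a convention many introductory textbooks use (spreadsheets and `statistics.quantiles` may interpolate differently): the percentile of a value x is (number of values less than x) ÷ (total number of values) × 100, and P_k is found with the locator L = (k/100)·n on the sorted data. Both helper names are hypothetical.

```python
import math

def percentile_of(x, data):
    """Percentile of x: share of values less than x, times 100, rounded to a whole number."""
    return round(sum(1 for v in data if v < x) / len(data) * 100)

def kth_percentile(k, data):
    """Value of P_k for a whole number k from 1 to 99, using the locator L = k*n/100."""
    if not 1 <= k <= 99:
        raise ValueError("k must be between 1 and 99")
    values = sorted(data)
    n = len(values)
    if (k * n) % 100 == 0:
        # L is a whole number: average the Lth and (L+1)th values (1-based positions)
        loc = k * n // 100
        return (values[loc - 1] + values[loc]) / 2
    # L is not whole: round it up and take the value in that position
    return values[math.ceil(k * n / 100) - 1]

scores = [12, 15, 18, 20, 22, 25, 27, 30, 33, 40]  # hypothetical data
print(percentile_of(30, scores))    # 70
print(kth_percentile(25, scores))   # 18
print(kth_percentile(50, scores))   # 23.5
```

P_50 matches the median, as it should.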

Boxplot

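a graph of the five-number summary (min, Q1, median, Q3, max); useful for judging symmetry, skew, and possible normality, but offers less specific information than a histogram. The wider the spread (the higher S), the more varied and less predictable the data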
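A boxplot is drawn from the five-number summary. A minimal sketch using `statistics.quantiles` with its default 'exclusive' method; textbooks and other tools compute quartiles slightly differently, so Q1 and Q3 may not match a hand calculation exactly. `five_number_summary` is a helper named for this example.

```python
import statistics

def five_number_summary(data):
    """min, Q1, median, Q3, max: the values a boxplot is drawn from."""
    q1, q2, q3 = statistics.quantiles(data, n=4)  # default method='exclusive'
    return min(data), q1, q2, q3, max(data)

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical data
low, q1, med, q3, high = five_number_summary(data)
print(low, q1, med, q3, high)
print("box width (Q3 - Q1):", q3 - q1)  # wider box = more spread in the middle 50%
```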

Procedure

simple process that can be repeated and may result in different outcomes

Sample Space

set of ALL possible outcomes of an experiment

Event

any collection of possible outcomes of an experiment (a subset of the sample space)
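Sample space and event for rolling two six-sided dice (a procedure chosen only for illustration), with the probability computed under the assumption that all outcomes are equally likely:

```python
from fractions import Fraction
from itertools import product

# Sample space for rolling two six-sided dice: every ordered pair of faces
sample_space = list(product(range(1, 7), repeat=2))

# Event: the faces sum to 7 (a subset of the sample space)
event = [outcome for outcome in sample_space if sum(outcome) == 7]

# Equally likely outcomes: P(event) = (# outcomes in event) / (# outcomes in sample space)
p = Fraction(len(event), len(sample_space))
print(len(sample_space), len(event), p)  # 36 6 1/6
```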

Law of Large Numbers

as a procedure is repeated again and again, the relative frequency approximation to the probability of an event tends to approach the actual probability
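A quick simulation of the Law of Large Numbers with a fair coin, simulated with Python's `random` module and seeded so the run is repeatable. The relative frequency of heads should drift toward the actual probability 0.5 as the number of flips grows.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable
n_flips = 100_000
checkpoints = {10, 100, 1_000, 10_000, 100_000}
heads = 0
for flip in range(1, n_flips + 1):
    heads += random.random() < 0.5  # one fair flip; True adds 1, False adds 0
    if flip in checkpoints:
        print(f"{flip:>7} flips: relative frequency of heads = {heads / flip:.4f}")

relative = heads / n_flips
```

Early checkpoints can be far from 0.5; the later ones typically settle close to it.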