53 terms

Quantitative

Data that is represented by numerical values.

Categorical

Labels for the categories such as "male" and "female". The distribution of a categorical variable lists the categories and gives either the count or the percent of individuals who fall in each category.

Roundoff error

This may cause percentages to not add up to 100%. Roundoff errors don't point to mistakes in our work, just to the effect of rounding off results.
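
A small sketch of roundoff error, using made-up counts for three equally sized categories (the labels and counts are assumptions for illustration only). Each percent is rounded to one decimal place, and the rounded percents no longer total exactly 100.

```python
# Hypothetical counts for three categories (not from the notes).
counts = {"A": 1, "B": 1, "C": 1}
total = sum(counts.values())

# Round each category's percent to one decimal place, as a report might.
percents = {label: round(100 * n / total, 1) for label, n in counts.items()}

# Add up the rounded percents; the total falls slightly short of 100.
percent_sum = sum(percents.values())

print(percents)
print(round(percent_sum, 1))
```

The shortfall comes only from rounding each percent, not from a mistake in the counts.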

Pie charts

A pie chart must include all the categories that make up a whole. Use a pie chart only when you want to emphasize each category's relation to the whole. The percentages must total 100%.

Survey

Surveys are used to generalize.

Experiment

Used to observe a response.

Data

The variables.

Key questions

1.) Who are the individuals, and how many are there?

2.) What are the variables?

3.) Why were the data gathered?

4.) When, where, how, and by whom were the data produced?

How to calculate the angle of each slice of a pie chart:

(Percent as a decimal) x (360) = angle in degrees for that slice
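
The angle rule above can be sketched in a few lines. The function name `pie_angles` and the example shares are assumptions made up for illustration.

```python
def pie_angles(percents):
    """Return the angle in degrees for each category's slice.

    percents maps a category label to its percent of the whole (0-100).
    """
    return {label: (p / 100) * 360 for label, p in percents.items()}


# Hypothetical shares that total 100%.
shares = {"A": 50, "B": 25, "C": 25}
angles = pie_angles(shares)
print(angles)
```

Because the percents total 100%, the angles total 360 degrees.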

Bar graphs

These are easier to make and easier to read. They are more flexible than pie charts. Both graphs can display the distribution of a categorical variable, but a bar graph can also compare any set of quantities that are measured in the same units.

What do bar graphs and pie charts do?

Help an audience grasp data quickly.

Stemplot

Gives a quick picture of the shape of a distribution while including the actual numerical values on the graph. Stemplots work best for small numbers of observations that are all greater than 0. They do not work well for large data sets where each stem must hold a large number of leaves.
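
A minimal sketch of building a stemplot, assuming whole-number data greater than 0 where the last digit is the leaf and the remaining digits are the stem. The function name `stemplot` and the data are made up for illustration.

```python
from collections import defaultdict


def stemplot(values):
    """Return the rows of a stemplot for non-negative whole numbers.

    Stem = every digit except the last; leaf = the last digit.
    Stems with no leaves are still listed so gaps stay visible.
    """
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        leaves[stem].append(leaf)

    rows = []
    for stem in range(min(leaves), max(leaves) + 1):
        row = f"{stem:>2} | " + "".join(str(d) for d in leaves.get(stem, []))
        rows.append(row.rstrip())
    return rows


# Hypothetical data set.
for row in stemplot([12, 15, 21, 21, 28, 43]):
    print(row)
```

Every original value can be read back from the plot, which is what separates a stemplot from a histogram.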

Clusters

A congregation of values in one area.

Back-to-back stemplot

Used to compare two related distributions that share common stems.

Splitting stems

Doubling the number of stems with the leaves 0-4 on one stem and the leaves 5-9 on the next.

Trimming

Removing the last digit or digits before making a stemplot.

Mean

Average (x bar).

Median

Middle (Med).

Mode

Most recurring value.

SOCS

Shape

Outliers

Center

Spread

Histogram

Breaks the range of values of a variable into classes and displays only the count or percentage of the observations that fall into each class. You can choose any convenient number of classes, but you should always choose classes of equal width.

Frequencies

The number of individuals in each class.

Frequency table

A table of frequencies for all classes.
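
A sketch of a frequency table with equal-width classes, the same counts a histogram would draw as bars. The function name, its parameters, and the scores are assumptions made up for illustration; each class includes its lower boundary and excludes its upper one.

```python
def frequency_table(values, start, width, n_classes):
    """Count the observations in n_classes equal-width classes.

    Class i covers start + i*width <= x < start + (i+1)*width.
    Observations outside every class are ignored.
    Returns rows of (lower, upper, count, percent of all observations).
    """
    counts = [0] * n_classes
    for x in values:
        i = int((x - start) // width)
        if 0 <= i < n_classes:
            counts[i] += 1

    n = len(values)
    table = []
    for i, c in enumerate(counts):
        lower = start + i * width
        table.append((lower, lower + width, c, 100 * c / n))
    return table


# Hypothetical test scores.
scores = [52, 58, 61, 67, 68, 73, 75, 79, 84, 91]
table = frequency_table(scores, start=50, width=10, n_classes=5)
for lower, upper, count, percent in table:
    print(f"{lower}-{upper}: {count} ({percent:.0f}%)")
```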

Histogram vs. Bar Graph

Histogram: Shows the distribution of counts or percentages among the values of a single quantitative variable.

Bar Graph: Displays the distribution of a categorical variable.

Modes

Major peaks.

Unimodal

Distribution with one major peak.

Symmetric distribution

A distribution is symmetric if the values smaller and larger than its midpoint are mirror images of each other (a bell curve is one example).

Skewed

A distribution is skewed to the right if the right tail is much longer than the left tail, and skewed to the left if the left tail is much longer than the right tail.

Dealing with outliers

Identifying outliers is a matter of judgement. Look for points that are clearly apart from the body of the data, not just the most extreme observation in the distribution.

Ogive

Relative cumulative frequency graph. You can find the pth percentile using this type of graph. To find the center, draw a horizontal line from 50% on the vertical axis across to the curve, then drop a vertical line down to the horizontal axis.

Time plots

A time plot plots each observation against the time at which it was measured. Always put time on the horizontal scale. Displays of the distribution of a variable that ignore time order, such as stemplots and histograms, can be misleading when there is a systematic change over time.

Seasonal variation

A regular rise and fall that occurs each year.

Distribution

Tells us what values a variable takes and how often it takes these values.

What are the two common measures of center?

Mean and Median.

About the Mean as a value for center:

- The mean is sensitive to the influence of a few extreme observations.

- Not a resistant measure: the mean cannot resist the influence of extreme observations.

About the Median as a value for center:

- Formal version of the midpoint

- More resistant than the mean

Mean vs. Median

- If the distribution is exactly symmetric the mean and median are the same.

- In a skewed distribution, the mean is farther out in the long tail than is the median.

Range

The difference between the largest and the smallest observations.

pth Percentile

The value such that p percent of the observations fall at or below it.

- ex. 65th percentile

(0.65)(number of observations) = position in the ordered list

Round the answer up to the next whole number.
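
A sketch of the round-up rule from the example above, assuming the result is used as a position in the sorted list. Other percentile conventions exist and can give slightly different answers; the function name and data are made up for illustration.

```python
import math


def percentile(values, p):
    """Return the pth percentile using the round-up rule.

    position = (p/100) * n, rounded up, counted from 1 in the sorted data.
    """
    ordered = sorted(values)
    position = math.ceil(p * len(ordered) / 100)
    position = max(position, 1)  # guard so p = 0 still returns the minimum
    return ordered[position - 1]


# Hypothetical data with 10 observations.
data = [2, 4, 4, 5, 7, 8, 9, 10, 12, 15]

# (0.65)(10) = 6.5, rounded up to 7, so the answer is the 7th ordered value.
print(percentile(data, 65))
```

Here 7 of the 10 observations fall at or below the returned value, which is at least 65% of them.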

Quartiles

The first quartile is the 25th percentile and the third is the 75th percentile (Q1, Q3).

Boxplot

A graph of the five-number summary. Boxplots are best used for side-by-side comparison, as they show less information than a histogram or a stemplot.

Five-number summary

Minimum

Q1

Median

Q3

Maximum

Interquartile range (IQR)

The distance between the quartiles (the range of the center half of the data), IQR = Q3 - Q1. It is a more resistant measure of spread than the range: the IQR is not affected by changes in either tail of the distribution.
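
A sketch of the five-number summary and the IQR. It assumes one common textbook convention: Q1 is the median of the observations below the median's position and Q3 is the median of those above it. Software packages often use other quartile rules, so results may differ slightly. The function names and data are made up for illustration.

```python
def median(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum); needs at least 2 values."""
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[: n // 2]        # observations below the median's position
    upper = ordered[(n + 1) // 2:]   # observations above the median's position
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])


# Hypothetical data with one large value in the upper tail.
data = [1, 3, 4, 6, 7, 9, 10, 12, 30]
summary = five_number_summary(data)
minimum, q1, med, q3, maximum = summary
iqr = q3 - q1

print(summary)
print(iqr)
```

Replacing the largest value 30 with an even larger number changes the maximum and the range but leaves Q1, Q3, and the IQR unchanged.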

Rule for outliers

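H = 1.5 x IQR

Lower cutoff = Q1 - H

Upper cutoff = Q3 + H

An observation is a suspected outlier if it falls below the lower cutoff or above the upper cutoff.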
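
A sketch of the 1.5 x IQR rule, assuming the quartiles have already been found. The function name, the data, and the quartile values passed in are made up for illustration.

```python
def suspected_outliers(values, q1, q3):
    """Return the observations flagged by the 1.5 x IQR rule."""
    h = 1.5 * (q3 - q1)
    lower_cutoff = q1 - h
    upper_cutoff = q3 + h
    return [x for x in values if x < lower_cutoff or x > upper_cutoff]


# Hypothetical data with quartiles Q1 = 3.5 and Q3 = 11.0.
data = [1, 3, 4, 6, 7, 9, 10, 12, 30]

# H = 1.5 * 7.5 = 11.25, so the cutoffs are -7.75 and 22.25.
print(suspected_outliers(data, q1=3.5, q3=11.0))
```

A flagged point is only a suspect; deciding what to do with it is still a matter of judgement.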

Modified Boxplots

Plot suspected outliers individually.

Parallel Boxplots

Use the same number line for comparison.

Standard deviation

The most common numerical description of a distribution is the combination of the mean to measure center and the standard deviation to measure spread. The standard deviation s measures spread by looking at how far the observations are from their mean. The use of squared deviations renders s even more sensitive than the mean to a few extreme observations.

Sum of deviations

- The sum of the deviations of the observations from their mean is always equal to 0.

- Because of this, only (n-1) of the deviations can vary freely; (n-1) is called the "degrees of freedom".

Variance s^2

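The average of the squared deviations of the observations from their mean, found by dividing the sum of squared deviations by (n-1). The standard deviation s is the square root of the variance.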
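
A sketch of the variance and standard deviation computed step by step, dividing by (n-1). The function names and data are made up for illustration.

```python
import math


def deviations(values):
    """Deviations of the observations from their mean."""
    mean = sum(values) / len(values)
    return [x - mean for x in values]


def variance(values):
    """Sample variance s^2: sum of squared deviations divided by (n-1)."""
    return sum(d * d for d in deviations(values)) / (len(values) - 1)


def standard_deviation(values):
    """Sample standard deviation s: square root of the variance."""
    return math.sqrt(variance(values))


# Hypothetical data with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(sum(deviations(data)))     # the deviations sum to 0
print(variance(data))            # 32 / 7
print(standard_deviation(data))
```

Python's built-in `statistics.variance` and `statistics.stdev` use the same (n-1) divisor.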

Linear transformation

Changes the original variable x into the new variable xnew given by an equation of the form:

xnew = a + bx

Adding the constant a shifts all values of x upward or downward by the same amount. Multiplying by the positive constant b changes the size of the unit of measurement.

Linear transformations do not change the shape of the distribution.

Effect of a linear transformation

- Multiplying each observation by a positive number b multiplies both measures of center (mean and median) and measures of spread (IQR, range, and standard deviation) by b.

- Adding the same number a to each observation adds a to measures of center and to quartiles but does not change measures of spread.
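
A quick numerical check of these two effects for xnew = a + bx. The data and the constants a = 5 and b = 2 are made up for illustration.

```python
import statistics

# Hypothetical data and constants.
x = [10, 20, 30, 40, 100]
a, b = 5, 2
x_new = [a + b * v for v in x]


def value_range(values):
    """Largest observation minus smallest observation."""
    return max(values) - min(values)


# Measures of center are multiplied by b and then shifted by a.
print(statistics.mean(x), statistics.mean(x_new))
print(statistics.median(x), statistics.median(x_new))

# Measures of spread are multiplied by b and are not affected by a.
print(statistics.stdev(x), statistics.stdev(x_new))
print(value_range(x), value_range(x_new))
```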

Where is the mean relative to the median?

In a skewed distribution the mean is pulled toward the long tail, so it lies to the left or the right of the median. If the distribution is perfectly symmetric, the mean and the median are the same value.
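
A small illustration with three made-up data sets: one symmetric, one with a long right tail, and one with a long left tail.

```python
import statistics

# Hypothetical data sets that share the same median.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 3, 4, 50]   # one large value stretches the right tail
left_skewed = [-40, 2, 3, 4, 5]   # one small value stretches the left tail

for name, data in [("symmetric", symmetric),
                   ("right skewed", right_skewed),
                   ("left skewed", left_skewed)]:
    print(name, statistics.mean(data), statistics.median(data))
```

The median stays at 3 in all three sets, while the mean follows the extreme value. This is the sense in which the median is resistant and the mean is not.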

Trmean

The trimmed mean. A measure of center that is more resistant than the mean.
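
A sketch of a trimmed mean. The notes do not say how much is trimmed, so the 5% default from each end is an assumption, as are the function name and the data.

```python
def trimmed_mean(values, trim_fraction=0.05):
    """Mean after dropping a fraction of observations from each end.

    trim_fraction is the share removed from EACH tail (assumed default: 5%).
    The number dropped per tail is rounded to a whole observation.
    """
    ordered = sorted(values)
    k = round(trim_fraction * len(ordered))  # observations dropped per tail
    kept = ordered[k: len(ordered) - k]
    return sum(kept) / len(kept)


# Hypothetical data with one extreme observation.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]

print(sum(data) / len(data))      # ordinary mean, pulled up by 100
print(trimmed_mean(data, 0.10))   # drops 1 and 100 before averaging
```

Dropping the extremes before averaging is what makes this measure more resistant than the ordinary mean.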