AP Statistics Chapter 1: Test
Terms in this set (53)
Quantitative variable: Data represented by numerical values, for which arithmetic such as averaging makes sense.
Categorical variable: Takes values that are category labels, such as "male" and "female". The distribution of a categorical variable lists the categories and gives either the count or the percent of individuals who fall in each category.
Roundoff error: Rounding each percent may cause the percentages not to add up to exactly 100%. Roundoff errors don't point to mistakes in our work, just to the effect of rounding off results.
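A minimal Python sketch of a categorical variable's distribution, using hypothetical data (not from the source). The helper name `distribution` is made up for illustration. It lists each category with its count and its percent rounded to one decimal place, then shows that the rounded percents need not total exactly 100%.

```python
from collections import Counter

def distribution(values, decimals=1):
    """Return {category: (count, percent)} with each percent rounded to `decimals` places."""
    counts = Counter(values)
    n = len(values)
    return {cat: (c, round(100 * c / n, decimals)) for cat, c in counts.items()}

# Hypothetical data: three categories, one individual each
colors = ["red", "blue", "green"]
table = distribution(colors)
for cat, (count, pct) in table.items():
    print(f"{cat:>6}: {count} ({pct}%)")

# Each category is 33.3% after rounding, so the total is 99.9%, not 100%
total = sum(pct for _, pct in table.values())
print(f"Sum of rounded percents: {total:.1f}%")
```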
Pie chart: Must include all the categories that make up a whole. Use a pie chart only when you want to emphasize each category's relation to the whole. The percentages must total 100%.
Surveys are used to generalize.
Used to observe a response.
Questions to ask about a data set:
1.) Who are the individuals, and how many are there?
2.) What are the variables?
3.) Why were the data gathered?
4.) When, where, how, and by whom were the data produced?
How to calculate the angle (in degrees) of each slice of a pie chart:
(percent as a decimal) x 360 = angle of that category's slice
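The angle rule above as a short Python sketch. The function name `pie_angles` and the category percents are hypothetical; the percents are chosen to total 100%, so the slice angles total 360 degrees.

```python
def pie_angles(percents):
    """Convert each category's percent (0 to 100) to its slice angle in degrees."""
    return {cat: (pct / 100) * 360 for cat, pct in percents.items()}

# Hypothetical percents that total 100%
shares = {"A": 50, "B": 25, "C": 25}
angles = pie_angles(shares)
print(angles)                # {'A': 180.0, 'B': 90.0, 'C': 90.0}
print(sum(angles.values()))  # 360.0
```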
Bar graphs: Easier to make and easier to read than pie charts, and more flexible. Both graphs can display the distribution of a categorical variable, but a bar graph can also compare any set of quantities that are measured in the same units.
What do bar graphs and pie charts do?
Help an audience grasp data quickly.
Stemplot (stem-and-leaf plot): Gives a quick picture of the shape of a distribution while including the actual numerical values in the graph. Stemplots work best for small numbers of observations that are all greater than 0. They do not work well for large data sets, where each stem must hold a large number of leaves.
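A minimal Python sketch of building a stemplot, assuming non-negative whole-number data where the stem is every digit except the last and the leaf is the last digit. The function name `stemplot` and the scores are hypothetical.

```python
from collections import defaultdict

def stemplot(data):
    """Return stemplot lines: stem = all but the last digit, leaf = last digit."""
    leaves = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)
        leaves[stem].append(leaf)
    lines = []
    # List every stem in the range, even empty ones, so gaps in the data stay visible
    for stem in range(min(leaves), max(leaves) + 1):
        lines.append(f"{stem:>3} | {''.join(str(leaf) for leaf in leaves[stem])}")
    return lines

# Hypothetical data
scores = [12, 15, 21, 23, 23, 28, 34, 47, 49]
for line in stemplot(scores):
    print(line)
```

Sorting first keeps the leaves on each stem in increasing order, which makes the shape easier to read.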
Cluster: A concentration of values in one area.
Back-to-back stem plot
When you wish to compare two related distributions with common stems.
Splitting stems: Doubling the number of stems, with leaves 0-4 on the first copy of each stem and leaves 5-9 on the second.
Trimming: Removing the last digit or digits before making a stemplot.
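A Python sketch of splitting stems and trimming, assuming non-negative whole numbers. Both function names are hypothetical. `split_stem_lines` writes each stem twice (leaves 0-4, then leaves 5-9), and `trim` drops trailing digits before the stems and leaves are formed.

```python
def split_stem_lines(data):
    """Split stems: each stem appears twice, leaves 0-4 on the first line and 5-9 on the second."""
    rows = {}
    for x in sorted(data):
        stem, leaf = divmod(x, 10)
        half = 0 if leaf <= 4 else 1
        rows.setdefault((stem, half), []).append(str(leaf))
    lo, hi = min(rows)[0], max(rows)[0]
    lines = []
    for stem in range(lo, hi + 1):
        for half in (0, 1):
            lines.append(f"{stem:>3} | {''.join(rows.get((stem, half), []))}")
    return lines

def trim(x, digits=1):
    """Trimming: drop the last `digits` digits of a non-negative integer."""
    return x // 10 ** digits

print("\n".join(split_stem_lines([12, 15, 21, 23, 28])))
print(trim(1234), trim(1234, 2))  # 123 12
```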
Mean (x̄): The average of the observations: add them up and divide by the number of observations.
Mode: The most frequently occurring value.
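The mean and mode can be checked with Python's standard `statistics` module. The data below are hypothetical. `multimode` returns every value tied for most frequent, which handles distributions with more than one mode.

```python
from statistics import mean, multimode

data = [2, 3, 3, 5, 7]   # hypothetical observations
x_bar = mean(data)       # (2 + 3 + 3 + 5 + 7) / 5 = 4
modes = multimode(data)  # [3]
print(x_bar, modes)
```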
Histogram: Breaks the range of values of a quantitative variable into classes and displays only the count or percent of the observations that fall into each class. You can choose any convenient number of classes, but you should always choose classes of equal width.
Frequency: The number of individuals in each class.
Frequency table: A table of the frequencies for all classes.
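A Python sketch of the frequency table behind a histogram, assuming equal-width classes that include their lower endpoint and exclude their upper one. The function name and data are hypothetical.

```python
def frequency_table(data, start, width, num_classes):
    """Count observations in equal-width classes [start + k*width, start + (k+1)*width)."""
    counts = [0] * num_classes
    for x in data:
        k = int((x - start) // width)  # index of the class that contains x
        if not 0 <= k < num_classes:
            raise ValueError(f"{x} falls outside the classes")
        counts[k] += 1
    return [((start + k * width, start + (k + 1) * width), c) for k, c in enumerate(counts)]

# Hypothetical data, classes 10-<20, 20-<30, 30-<40, 40-<50
scores = [12, 15, 21, 23, 23, 28, 34, 47, 49]
for (low, high), count in frequency_table(scores, 10, 10, 4):
    print(f"{low} to <{high}: {count} ({100 * count / len(scores):.1f}%)")
```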
Histogram vs. Bar Graph
Histogram: Shows the distribution of counts or percents among the values of a single quantitative variable.
Bar Graph: Displays the distribution of a categorical variable.
Unimodal: A distribution with one major peak.
Symmetric: A distribution is symmetric if the values smaller and larger than its midpoint are mirror images of each other (for example, a bell curve).
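Skewed: A distribution is skewed to the right if its right tail (the larger values) is much longer than its left tail, and skewed to the left if its left tail is much longer than its right tail.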
Dealing with outliers
Identifying outliers is a matter of judgement. Look for points that are clearly apart from the body of the data, not just the most extreme observation in the distribution.
Ogive (relative cumulative frequency graph): You can find the pth percentile using this type of graph. To find the center (median), go across from 50% on the vertical axis to the graph, then draw a vertical line down to the horizontal axis and read off the value.
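A Python sketch of reading a percentile off an ogive. It assumes the graph plots cumulative percent at each class boundary, starts at 0% on the lowest boundary, and is drawn with straight segments, so reading the graph amounts to linear interpolation. The function names and data are hypothetical.

```python
def ogive_points(boundaries, counts):
    """Points of a relative cumulative frequency graph: (class boundary, cumulative percent)."""
    n = sum(counts)
    points = [(boundaries[0], 0.0)]
    running = 0
    for upper, c in zip(boundaries[1:], counts):
        running += c
        points.append((upper, 100 * running / n))
    return points

def percentile_from_ogive(points, p):
    """Go across from p% to the graph, then down to the horizontal axis (linear interpolation)."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= p <= y1 and y1 > y0:
            return x0 + (p - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("p must be between 0 and 100")

# Hypothetical classes 10-20, 20-30, 30-40, 40-50 with counts 2, 4, 1, 2
points = ogive_points([10, 20, 30, 40, 50], [2, 4, 1, 2])
print(percentile_from_ogive(points, 50))  # estimated median, about 26.25
```

The result is only an estimate: the ogive knows how many observations fall in each class, not where they sit inside it.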
Time plot: Displays of the distribution of a variable that ignore time order, such as stemplots and histograms, can be misleading when there is a systematic change over time. A time plot plots each observation against the time at which it was measured. Always put time on the horizontal scale.
Seasonal variation: A regular rise and fall that occurs each year.
Distribution of a variable: Tells us what values the variable takes and how often it takes these values.
What are the two common measures of center?
Mean and Median.
About the Mean as a value for center:
- The mean is sensitive to the influence of a few extreme observations.
- Not a resistant measure: the mean cannot resist the influence of extreme observations.
About the Median as a value for center:
- Formal version of the midpoint: half the observations are at or below it and half are at or above it
- More resistant than the mean
Mean vs. Median
- If the distribution is exactly symmetric the mean and median are the same.
- In a skewed distribution, the mean is farther out in the long tail than is the median.
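A quick Python check of the points above, using hypothetical values that are symmetric about 35. Replacing the largest value with an extreme one drags the mean far out toward the new long tail, while the median does not move.

```python
from statistics import mean, median

values = [30, 32, 35, 38, 40]          # hypothetical data, symmetric about 35
with_outlier = values[:-1] + [400]     # replace the largest value with an extreme one

print(mean(values), median(values))              # 35 35
print(mean(with_outlier), median(with_outlier))  # 107 35
```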
Range: The difference between the largest and smallest observations.
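pth percentile: The value such that p percent of the observations fall at or below it.
- Example (65th percentile): compute (0.65)(number of observations) and round the answer up. The result is the position of the 65th percentile when the observations are listed in order from smallest to largest.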
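The "multiply and round up" rule as a Python sketch. It assumes p is a whole-number percent and that the rounded-up result is a 1-based position in the ordered data. The function name is hypothetical. Statistical software often uses other percentile conventions, so its answers can differ slightly.

```python
import math

def percentile_by_position(data, p):
    """pth percentile: position = (p/100) * n rounded up, counted from the smallest value."""
    ordered = sorted(data)
    position = math.ceil(p * len(ordered) / 100)  # 1-based position in the ordered list
    position = max(position, 1)                   # guard the p = 0 case
    return ordered[position - 1]

# Hypothetical data: the values 1 through 20
print(percentile_by_position(range(1, 21), 65))  # (0.65)(20) = 13, so the 13th value: 13
```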
Quartiles: The first quartile (Q1) is the 25th percentile and the third quartile (Q3) is the 75th percentile.
Boxplot: A graph of the five-number summary (minimum, Q1, median, Q3, maximum). Boxplots are best used for side-by-side comparison because they show less detail than a histogram or a stemplot.
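A Python sketch of the five-number summary that a boxplot displays. It assumes the common textbook convention that Q1 and Q3 are the medians of the lower and upper halves of the ordered data, with the overall median left out of both halves when n is odd. This can differ slightly from the percentile rule or from software defaults. The function name and data are hypothetical, and at least two observations are needed.

```python
from statistics import median

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum); needs at least 2 observations."""
    xs = sorted(data)
    n = len(xs)
    lower = xs[: n // 2]          # values below the median position
    upper = xs[(n + 1) // 2 :]    # values above the median position
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

# Hypothetical data
scores = [12, 15, 21, 23, 23, 28, 34, 47, 49]
print(five_number_summary(scores))  # (12, 18.0, 23, 40.5, 49)
```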
Interquartile range (IQR)
The distance between the quartiles, IQR = Q3 - Q1 (the range of the middle half of the data). It is a more resistant measure of spread than the range because it is not affected by changes in either tail of the distribution.
Rule for outliers
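- Compute H = 1.5 x IQR.
- Find Q1 - H and Q3 + H.
- Any observation below Q1 - H or above Q3 + H is a suspected outlier.
- In a boxplot, plot suspected outliers individually.
- When comparing distributions with side-by-side boxplots, use the same number line.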
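The 1.5 x IQR rule as a Python sketch. The quartiles are passed in directly, and the values Q1 = 18 and Q3 = 40.5 are hypothetical. The function names are also made up for illustration.

```python
def outlier_fences(q1, q3):
    """1.5 x IQR rule: return (Q1 - H, Q3 + H) where H = 1.5 x IQR."""
    h = 1.5 * (q3 - q1)
    return q1 - h, q3 + h

def suspected_outliers(data, q1, q3):
    """Observations below Q1 - H or above Q3 + H."""
    low, high = outlier_fences(q1, q3)
    return [x for x in data if x < low or x > high]

# Hypothetical quartiles: IQR = 22.5, so H = 33.75
print(outlier_fences(18, 40.5))                          # (-15.75, 74.25)
print(suspected_outliers([12, 15, 21, 100], 18, 40.5))   # [100]
```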
Standard deviation (s): The most common numerical description of a distribution is the combination of the mean to measure center and the standard deviation to measure spread. The standard deviation measures spread by looking at how far the observations are from their mean. Because it uses squared deviations, s is even more sensitive than the mean to a few extreme observations.
Sum of deviations
- The sum of the deviations (x - x̄) is always equal to 0.
- Because of this, only n - 1 of the deviations can vary freely; n - 1 is called the "degrees of freedom".
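Variance (s²): The average of the squares of the deviations of the observations from their mean, found by dividing the sum of the squared deviations by n - 1. The standard deviation s is the square root of the variance.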
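A Python sketch of the variance and standard deviation computed step by step from the deviations, dividing by n - 1. The function name and data are hypothetical. Python's `statistics.variance` and `statistics.stdev` use the same n - 1 divisor.

```python
import math

def variance_and_sd(data):
    """Sample variance s^2 (sum of squared deviations / (n - 1)) and standard deviation s."""
    n = len(data)
    x_bar = sum(data) / n
    deviations = [x - x_bar for x in data]
    # The deviations always sum to 0 (up to floating-point roundoff)
    assert abs(sum(deviations)) < 1e-9
    s2 = sum(d ** 2 for d in deviations) / (n - 1)
    return s2, math.sqrt(s2)

# Hypothetical data: mean 5, squared deviations sum to 32, so s^2 = 32 / 7
s2, s = variance_and_sd([2, 4, 4, 4, 5, 5, 7, 9])
print(round(s2, 3), round(s, 3))
```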
Linear transformation: Changes the original variable x into the new variable xnew given by an equation of the form:
xnew = a + bx
Adding the constant a shifts all values of x upward or downward by the same amount. Multiplying by the positive constant b changes the size of the unit of measurement.
Linear transformations do not change the shape of the distribution.
Effect of a linear transformation
- Multiplying each observation by a positive number b multiplies both measures of center (mean and median) and measures of spread (IQR, range, and standard deviation) by b.
- Adding the same number a to each observation adds a to measures of center and to quartiles but does not change measures of spread.
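A Python check of these rules using the Celsius-to-Fahrenheit conversion xnew = 32 + 1.8x (a = 32, b = 1.8) on hypothetical temperatures. Measures of center follow a + b(center), while the standard deviation is only multiplied by b.

```python
import math
from statistics import mean, median, stdev

temps_c = [10.0, 15.0, 20.0, 25.0, 35.0]   # hypothetical data
temps_f = [32 + 1.8 * x for x in temps_c]  # xnew = a + bx with a = 32, b = 1.8

print(mean(temps_c), median(temps_c), stdev(temps_c))
print(mean(temps_f), median(temps_f), stdev(temps_f))

print(math.isclose(mean(temps_f), 32 + 1.8 * mean(temps_c)))    # True: center shifts and scales
print(math.isclose(median(temps_f), 32 + 1.8 * median(temps_c)))  # True
print(math.isclose(stdev(temps_f), 1.8 * stdev(temps_c)))       # True: spread only scales
```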
Where is the mean relative to the median?
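In a skewed distribution, the mean is pulled toward the long tail: it lies to the right of the median when the distribution is skewed right and to the left of the median when it is skewed left. If the distribution is perfectly symmetric, the mean and median are the same value.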
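Trimmed mean: A measure of center that is more resistant than the mean. It is found by dropping a fixed percent of the smallest and largest observations and averaging the rest.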
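A Python sketch of a trimmed mean on hypothetical data with one extreme value. It assumes the number of observations dropped from each end is rounded down, and `percent` must be below 50 so that some observations remain. The function name is hypothetical.

```python
import math
from statistics import mean

def trimmed_mean(data, percent):
    """Drop `percent`% of the observations from each end of the ordered data, then average the rest."""
    xs = sorted(data)
    k = math.floor(len(xs) * percent / 100)  # number dropped from each end
    return mean(xs[k : len(xs) - k])

values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # hypothetical data with one extreme value
print(mean(values))              # 14.5
print(trimmed_mean(values, 10))  # 5.5
```

Dropping just the single smallest and largest values removes the effect of the extreme observation 100, which is why the trimmed mean is more resistant.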