30 terms

In statistics, what is meant by individuals?

objects described by a set of data

In statistics, what is meant by a variable?

any characteristic of an individual

What is meant by exploratory data analysis?

Exploratory data analysis uses graphs and numerical summaries to describe the variables in a data set and the relations among them

What is the difference between a categorical variable and a quantitative variable?

A categorical variable places an individual into one of several groups or categories. A quantitative variable takes numerical values for which arithmetic operations such as adding and averaging make sense.

When is it useful to use a bar chart? Or a pie chart?

Bar charts and pie charts are both used to display categorical data. A pie chart can be used only when the categories together make up a whole, so that their shares total 100%.

What is meant by a distribution? How do you describe the overall pattern of a distribution?

The distribution of a variable tells us what values the variable takes and how often it takes these values. Describe the overall pattern by its shape, center, and spread, and note any outliers.

Define range

the difference between the largest and smallest values in a distribution

When is it better to use a histogram rather than a dotplot?

When you have many data values.

What is meant by frequency in a histogram?

The frequency is the number of observations (the count) that fall in each class.
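A minimal sketch of counting class frequencies, assuming equal-width classes of the form [lower, lower + width). The function name `class_frequencies`, its parameters, and the data are made up for illustration.

```python
from collections import Counter

def class_frequencies(data, start, width):
    """Count how many observations fall in each class [lower, lower + width)."""
    counts = Counter()
    for x in data:
        # Index of the class that contains x, counting classes from `start`.
        k = int((x - start) // width)
        counts[start + k * width] += 1
    # Keys are the lower bounds of the classes, in increasing order.
    return dict(sorted(counts.items()))

scores = [61, 64, 67, 70, 72, 73, 75, 78, 81, 88, 93]  # made-up data
print(class_frequencies(scores, start=60, width=10))
# {60: 3, 70: 5, 80: 2, 90: 1}
```

Each value in the result is the height of one histogram bar.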

When setting a window for constructing a histogram on the TI-83: What is the significance of Xscl? How do you choose the values of Xmin and Xmax?

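Xscl = width of each class ; Xmin = a value at or just below the smallest data value and Xmax = a value at or just above the largest data value, so that every observation falls inside the window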

Define outlier

an individual observation that falls outside the overall pattern of the graph

If a distribution is symmetric, what does its histogram look like? Skewed Right? Skewed Left?

Symmetric = right and left sides are approximately mirror images of each other ; skewed right = right side (larger values) extends much farther out than the left side ; skewed left = left side (smaller values) extends much farther out than the right side

How is the stemplot of a distribution related to its histogram?

A stemplot looks like a histogram turned on its side. On the histogram we lose the individual data values, but we keep the overall shape of the distribution.

When is it advantageous to split stems on a stemplot?

When each stem has many leaves

What is the purpose of a back-to-back stemplot?

To compare the shapes of 2 distributions

When is it useful to construct a time plot?

For variables that are measured over time, such as the height of a growing child or the price of a stock. A time plot can reveal trends and seasonal variation.

In statistics, what is the most common measurement of center?

arithmetic average, or mean

Explain how to calculate the mean, x̄.

To find the mean of a set of observations, add their values and divide by the number of observations.
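The rule above can be sketched in a few lines. The function name `mean` and the data are made up for illustration.

```python
def mean(observations):
    """Add the values and divide by the number of observations."""
    return sum(observations) / len(observations)

data = [2, 4, 4, 5, 10]  # made-up data
print(mean(data))  # 5.0
```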

Explain how to calculate the median, M.

The median M is the midpoint of a distribution, the number such that half the observations are smaller and the other half are larger. To find the median:

1) arrange all the observations in order of size, from smallest to largest

2) if the number of observations is odd, the median M is the center observation of the ordered list.

3) if the number of observations is even, the median M is the mean of the 2 center observations in the ordered list.
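The three steps above can be sketched as follows. The function name `median` and the data are made up for illustration.

```python
def median(observations):
    """Midpoint of the ordered list, following the three steps above."""
    ordered = sorted(observations)      # step 1: smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]             # step 2: odd count, center observation
    # step 3: even count, mean of the two center observations
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 1, 5]))     # 5
print(median([7, 1, 5, 3]))  # 4.0
```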

Explain why the median is resistant to extreme observations, but the mean is nonresistant

The median is resistant because it is based only on the middle one or two observations of the ordered list. The mean is sensitive to the influence of a few extreme observations. Even if there are no outliers, a skewed distribution will pull the mean toward the long tail.
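A small illustration with made-up numbers, using Python's standard `statistics` module: adding one extreme observation moves the mean a lot and the median only slightly.

```python
from statistics import mean, median

salaries = [30, 32, 35, 36, 40]   # made-up values, in thousands
with_outlier = salaries + [400]   # same data plus one extreme observation

print(mean(salaries), median(salaries))          # 34.6 35
print(mean(with_outlier), median(with_outlier))  # 95.5 35.5
```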

In statistics, what is meant by spread?

Spread measures the variability of the observations around the center. One of the most common ways to measure spread is to calculate the range of the data. Because the range depends only on the smallest and largest observations, it is sensitive to extreme values.

Explain how to calculate Q1 and Q3

1) arrange the observations in increasing order and locate the median M in the list.

2) Q1 is the median of the observations whose position in the ordered list is to the left of the location of the overall median.

3) Q3 is the median of the observations whose position in the ordered list is to the right of the location of the overall median.
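A sketch of the steps above, assuming at least two observations. When the count is odd, the overall median is left out of both halves. The names `median_of_sorted` and `quartiles` are made up for illustration, and software packages may use slightly different quartile rules.

```python
def median_of_sorted(ordered):
    """Median of a list that is already in increasing order."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(observations):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    ordered = sorted(observations)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]            # observations left of the median's location
    upper = ordered[half + n % 2:]    # observations right of the median's location
    return median_of_sorted(lower), median_of_sorted(upper)

print(quartiles([1, 3, 4, 6, 7, 9, 12]))  # (3, 9)
print(quartiles([1, 3, 4, 6, 7, 9]))      # (3, 7)
```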

What is the interquartile range?

the distance between the first and third quartiles, Q3 - Q1

How can we use IQR to determine outliers?

An observation is an outlier if it is more than 1.5*IQR above the third quartile or below the first quartile
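A minimal sketch of the 1.5*IQR rule, assuming Q1 and Q3 have already been found. The function name `iqr_outliers` and the data are made up for illustration.

```python
def iqr_outliers(observations, q1, q3):
    """Return the observations flagged by the 1.5*IQR rule."""
    iqr = q3 - q1
    low_cutoff = q1 - 1.5 * iqr    # anything below this is an outlier
    high_cutoff = q3 + 1.5 * iqr   # anything above this is an outlier
    return [x for x in observations if x < low_cutoff or x > high_cutoff]

data = [1, 3, 4, 6, 7, 9, 30]  # made-up data with Q1 = 3 and Q3 = 9
print(iqr_outliers(data, q1=3, q3=9))  # [30]
```

Here IQR = 6, so the cutoffs are -6 and 18, and only 30 falls outside them.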

What is the five-number summary?

The five-number summary is Minimum, Q1, Median, Q3, Maximum.

How do we use the five-number summary to make a modified boxplot?

A modified boxplot is a graph of the 5-number summary, with outliers plotted individually.

- a central box spans the quartiles

- a line in the box marks the median

- observations more than 1.5*IQR outside the central box are plotted individually

- lines extend from the box out to the smallest and largest observations that are not outliers
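The pieces listed above can be computed before drawing anything. This is a sketch that assumes Q1, the median, and Q3 are already known. The function name `modified_boxplot_parts` and the dictionary keys are made up for illustration.

```python
def modified_boxplot_parts(observations, q1, median, q3):
    """Collect the pieces needed to draw a modified boxplot."""
    iqr = q3 - q1
    low_cutoff = q1 - 1.5 * iqr
    high_cutoff = q3 + 1.5 * iqr
    # Observations that are not outliers decide where the lines end.
    inside = [x for x in observations if low_cutoff <= x <= high_cutoff]
    outliers = [x for x in observations if x < low_cutoff or x > high_cutoff]
    return {
        "box": (q1, q3),                         # central box spans the quartiles
        "median_line": median,                   # line inside the box
        "whiskers": (min(inside), max(inside)),  # lines stop at non-outliers
        "outliers": sorted(outliers),            # plotted individually
    }

data = [1, 3, 4, 6, 7, 9, 30]  # made-up data
parts = modified_boxplot_parts(data, q1=3, median=6, q3=9)
print(parts["whiskers"], parts["outliers"])  # (1, 9) [30]
```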

What does standard deviation measure? How do we calculate it?

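It measures spread around the mean and should only be used when the mean is chosen as the measure of center. To calculate it, add up the squared deviations of the observations from the mean, divide by n - 1 to get the variance, then take the square root.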

What is the relationship between variance and standard deviation?

The standard deviation s is the square root of the variance s².
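A short sketch of the calculation, assuming the usual sample formulas with divisor n - 1 (this needs at least two observations). The function names and the data are made up for illustration.

```python
from math import sqrt

def sample_variance(observations):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(observations)
    x_bar = sum(observations) / n
    return sum((x - x_bar) ** 2 for x in observations) / (n - 1)

def sample_std_dev(observations):
    """The standard deviation s is the square root of the variance."""
    return sqrt(sample_variance(observations))

data = [2, 4, 4, 5, 10]  # made-up data with mean 5
print(sample_variance(data))      # 9.0
print(sample_std_dev(data))       # 3.0
print(sample_std_dev([7, 7, 7]))  # 0.0
```

The last line shows the no-spread case: when every observation has the same value, s is zero.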

When does standard deviation equal zero?

when there is no spread. This happens only when all observations have the same value

Is standard deviation resistant or nonresistant to extreme observations? Explain.

s, like the mean, is not resistant. Strong skewness or a few outliers can make s very large.