Introduction to the Practice of Statistics 8th Edition Chapter 1
In the Normal distribution with mean μ and standard deviation σ: Approximately 68% of the observations fall within σ of the mean μ. Approximately 95% of the observations fall within 2σ of μ. Approximately 99.7% of the observations fall within 3σ of μ.
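The three percentages in this rule can be checked numerically. The sketch below is only an illustration: the helper name `proportion_within` is made up for this example, and it relies on Python's standard-library `statistics.NormalDist` to compute areas under a Normal curve.

```python
from statistics import NormalDist


def proportion_within(k, mu=0.0, sigma=1.0):
    """Proportion of a Normal(mu, sigma) distribution within k*sigma of mu.

    Hypothetical helper for illustration; not part of the textbook.
    """
    dist = NormalDist(mu, sigma)
    # Area to the left of the upper bound minus area to the left of the lower bound.
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


for k in (1, 2, 3):
    print(f"within {k} sigma: {proportion_within(k):.4f}")
```

The printed proportions are about 0.6827, 0.9545, and 0.9973, and they do not depend on the particular μ and σ chosen.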
The objects described by a set of data. They may be customers, companies, subjects in a study, or other objects.
A curve that describes the overall pattern of a distribution. The area under the curve and above any range of values is the proportion of all observations that fall in that range. It is always on or above the horizontal axis and has area exactly 1 underneath it.
A graph of the five-number summary with the following properties: 1. A central box spans the quartiles Q1 and Q3. 2. A line in the box marks the median M. 3. Lines extend from the box out to the smallest and largest observations or to a cutoff for suspected outliers.
Of a set of observations, this consists of the smallest observation, the first quartile, the median, the third quartile, and the largest observation, written in order from smallest to largest. In symbols, it is: Minimum Q1 M Q3 Maximum
Abbreviated IQR, this is the distance between the first and third quartiles. IQR = Q3 - Q1
A conversion of a numerical description of a distribution from one unit of measurement to another. Changes the original variable x into the new variable x_new given by an equation of the form x_new = a + bx
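As a rough illustration of the form a + bx, the sketch below converts made-up Celsius readings to Fahrenheit (a = 32, b = 1.8). The function name `linear_transform` and the data are assumptions for this example. It also checks two standard consequences: the mean is transformed the same way as the data, and the standard deviation is multiplied by |b|.

```python
from statistics import mean, stdev


def linear_transform(xs, a, b):
    """Apply x_new = a + b*x to every observation (illustrative helper)."""
    return [a + b * x for x in xs]


celsius = [10.0, 15.0, 20.0, 25.0]            # made-up readings
fahrenheit = linear_transform(celsius, a=32.0, b=1.8)

print(fahrenheit)
print(mean(fahrenheit), 32.0 + 1.8 * mean(celsius))   # same center, new units
print(stdev(fahrenheit), 1.8 * stdev(celsius))        # spread scales by |b|
```

Adding the constant a shifts the center but leaves the spread unchanged; only the multiplier b affects the standard deviation.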
Denoted "x-bar", this is the average value of a distribution. It can be found by adding the values of a set of observations and then dividing by the number of observations.
Denoted M, this is the midpoint of a distribution. Half of the observations are smaller than M, and the other half are larger than M.
A class of density curves that are symmetric, unimodal, and bell-shaped. They describe Normal distributions. The mean is at the center of the symmetric curve and is the same as the median.
Graph that shows the distribution of a categorical variable by representing each category as a bar. The bar heights show the category counts or percents.
Tells us what values a variable takes and how often it takes these values.
Variable that takes numerical values for which arithmetic operations such as adding and averaging make sense.
exploratory data analysis
Uses graphs and numerical summaries to describe the variables in a data set and the relations among them.
Variable that places a case into one of several groups or categories.
A special variable used in some data sets to distinguish the different cases.
A characteristic of a case.
Major peaks of a distribution. A distribution with one major peak is called unimodal.
Normal quantile plot
Plot that best assesses the adequacy of a Normal model for describing a distribution of data. A pattern on such a plot that deviates substantially from a straight line indicates that the data are not Normal.
Description of a distribution if the right tail (larger values) is much longer than the left tail (smaller values), or vice versa.
An individual value that falls outside the overall pattern of the distribution.
A graphical display of a quantitative variable that breaks the range of values of a variable into classes and displays only the count or percent of the observations that fall into each class. Chosen classes should always be of equal width.
Graph that shows the distribution of a categorical variable as a "pie" whose slices are sized by the counts or percents for the categories.
A display of the distribution of a quantitative variable that separates each observation into a stem and a one-digit leaf.
Description of a distribution if the values smaller and larger than its midpoint are mirror images of each other.
Display of the distribution of a variable that plots each observation against the time at which it was measured. Always put time on the horizontal scale of the plot and the variable you are measuring on the vertical scale.
The proportion of observations in a distribution that lie at or below a given value. When the distribution is given by a density curve, this is the area under the curve to the left of a given value.
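Both readings of this definition can be sketched in a few lines. The data and the function name `cumulative_proportion` are made up for illustration, and the density-curve case assumes a standard Normal curve evaluated with the standard-library `statistics.NormalDist`.

```python
from statistics import NormalDist


def cumulative_proportion(data, value):
    """Proportion of observations that lie at or below `value`."""
    return sum(1 for x in data if x <= value) / len(data)


data = [2, 4, 4, 5, 7, 9, 10, 12]              # made-up observations
print(cumulative_proportion(data, 5))          # 4 of the 8 values are <= 5

# For a density curve, the cumulative proportion is the area to the left.
# Here the curve is assumed to be the standard Normal.
print(NormalDist(0, 1).cdf(0))                 # half the area lies left of the mean
```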
Described by bell-shaped, symmetric, unimodal density curves. Completely specified by the mean μ and standard deviation σ. The mean is the center of symmetry, and σ is the distance from μ to the change-of-curvature points on either side.
standard Normal distribution
The Normal distribution N(0,1) with mean 0 and standard deviation 1.
A standardized value that tells us how many standard deviations the original observation falls away from the mean, and in what direction. Observations larger than the mean are positive when standardized, and observations smaller than the mean are negative.
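In symbols, standardizing an observation x gives z = (x - μ) / σ. A minimal sketch, with a hypothetical function name and made-up numbers (mean 100, standard deviation 15):

```python
def z_score(x, mu, sigma):
    """Standardize x: how many standard deviations it lies from the mean."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


print(z_score(130, mu=100, sigma=15))   # above the mean, so positive
print(z_score(85, mu=100, sigma=15))    # below the mean, so negative
```

With these numbers the two results are 2.0 and -1.0: the first observation is two standard deviations above the mean, the second one standard deviation below it.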
Used to describe the spread of a distribution. The first quartile, Q1, has one-fourth of the observations below it, and the third quartile, Q3, has three-fourths of the observations below it. Q1 is the median of the observations whose positions in the ordered list are to the left of the location of the overall median. Q3 is the median of the observations whose positions in the ordered list are to the right of the location of the overall median.
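The median-of-halves rule described here can be sketched as follows. The function name `quartiles` and the data are made up for illustration, and the sketch assumes at least two observations. Note that statistical software often uses slightly different quartile rules, so its results may not match this hand rule exactly.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, M, Q3) using the median-of-halves rule.

    Q1 is the median of the observations left of the overall median's
    location, and Q3 is the median of those to its right. When the number
    of observations is odd, the middle observation belongs to neither half.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + n % 2:]
    return median(lower), median(xs), median(upper)


data = [1, 3, 4, 6, 7, 9, 12]                  # made-up observations
q1, m, q3 = quartiles(data)
print("five-number summary:", min(data), q1, m, q3, max(data))
print("IQR:", q3 - q1)
```

For this data set the five-number summary is 1, 3, 6, 9, 12, and the IQR is 6.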
Denoted s, this measures the spread of a distribution by looking at how far the observations are from their mean. It is also the square root of the variance.
Denoted s². In a set of observations, this is the average of the squares of the deviations of the observations from their mean, where the sum of squared deviations is divided by n - 1 rather than n.
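A short sketch of the calculation, assuming the usual sample convention of dividing the sum of squared deviations by n - 1. The function name `sample_variance` and the data are made up for illustration; the result is compared against the standard-library `statistics.variance` and `statistics.stdev`, which use the same divisor.

```python
import math
from statistics import mean, stdev, variance


def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    x_bar = mean(xs)
    return sum((x - x_bar) ** 2 for x in xs) / (len(xs) - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]                # made-up observations
s2 = sample_variance(data)
s = math.sqrt(s2)                              # standard deviation

print(s2, variance(data))                      # hand calculation vs. library
print(s, stdev(data))
```

Here the mean is 5 and the squared deviations sum to 32, so s² = 32 / 7, and s is its square root.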