box plot: a graphical summary of data based on a five-number summary

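Chebyshev's Theorem: a theorem that can be used to make statements about the proportion of data values that must be within a specified number of standard deviations of the mean. At least (1 - 1/z^2) of the data values must be within z standard deviations of the mean, where z is any value greater than 1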
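
As a small illustration, the bound in Chebyshev's theorem (at least 1 - 1/z^2 of the values lie within z standard deviations of the mean, for any z > 1) can be evaluated directly. The function name below is made up for this sketch and is not from the textbook.

```python
def chebyshev_lower_bound(z: float) -> float:
    """Minimum proportion of data values within z standard deviations of the mean.

    Hypothetical helper for illustration; the bound applies only for z > 1.
    """
    if z <= 1:
        raise ValueError("Chebyshev's theorem requires z > 1")
    return 1 - 1 / z**2


# At least 75% of values lie within 2 standard deviations of the mean,
# whatever the shape of the distribution.
print(chebyshev_lower_bound(2))
print(chebyshev_lower_bound(3))
```

Unlike the empirical rule, this bound makes no assumption about the shape of the distribution, which is why it is a guaranteed minimum rather than an approximation.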

coefficient of variation: a measure of relative variability computed by dividing the standard deviation by the mean and multiplying by 100
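
A minimal sketch of the coefficient of variation, assuming a small made-up sample and the sample standard deviation (divisor n - 1):

```python
import statistics

# Illustrative values only, not taken from the source.
data = [46, 54, 42, 46, 32]

mean = statistics.mean(data)      # sum of values / number of observations
std_dev = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)

# Standard deviation expressed as a percentage of the mean.
coefficient_of_variation = std_dev / mean * 100
print(f"CV = {coefficient_of_variation:.1f}%")
```

Because the result is a percentage of the mean, it can be used to compare variability across data sets measured in different units or with very different means.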

correlation coefficient: a measure of linear association between two variables that takes on values between -1 and +1. Values near +1 indicate a strong positive linear relationship; values near -1 indicate a strong negative linear relationship; and values near zero indicate the lack of a linear relationship.

covariance: a measure of linear association between two variables. Positive values indicate a positive relationship; negative values indicate a negative relationship
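
The sketch below computes a sample covariance and the corresponding correlation coefficient by hand, so the relationship between the two is visible. The data are made up, and the sample (n - 1) versions of the formulas are assumed.

```python
import math

# Illustrative paired observations, not taken from the source.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sample covariance: average product of paired deviations, with an n - 1 divisor.
covariance = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)

# Sample standard deviations of each variable.
s_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / (n - 1))
s_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / (n - 1))

# Correlation coefficient: covariance rescaled to lie between -1 and +1.
correlation = covariance / (s_x * s_y)

print(covariance)
print(round(correlation, 4))
```

The covariance depends on the units of x and y, while the correlation coefficient does not, which is what makes the latter easier to interpret.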

Empirical rule: a rule that can be used to compute the approximate percentage of data values that are within one, two, and three standard deviations of the mean for data that exhibit a bell-shaped distribution (approximately 68%, 95%, and 99.7%, respectively).
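
The percentages behind the empirical rule can be checked against a normal distribution, the usual model for bell-shaped data. This is only a sketch: it uses `statistics.NormalDist` from the Python standard library, and the helper name is made up.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0, sigma=1)


def share_within(k: float) -> float:
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
```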

five-number summary: an exploratory data analysis technique that uses five numbers to summarize the data: smallest value, first quartile, median, third quartile, and largest value

grouped data: data available in class intervals as summarized by a frequency distribution. Individual values of original data are not available

interquartile range (IQR): a measure of variability, defined to be the difference between the third and first quartiles
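
A rough sketch of a five-number summary and the IQR on made-up data. Quartile conventions differ between textbooks and software; this example assumes the default ("exclusive") method of Python's `statistics.quantiles`, which may not match the textbook's procedure on every data set.

```python
import statistics

# Illustrative values only, not taken from the source.
data = [1, 2, 3, 4, 5, 6, 7]

# n=4 splits the data into four parts and returns the three cut points.
q1, median, q3 = statistics.quantiles(data, n=4)

five_number_summary = (min(data), q1, median, q3, max(data))
iqr = q3 - q1  # spread of the middle 50% of the data

print(five_number_summary)
print(iqr)
```

A box plot is drawn from exactly these five numbers, with the box spanning the first to third quartile.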

mean: a measure of central location computed by summing the data values and dividing by the number of observations

median: a measure of central location provided by the value in the middle when the data are arranged in ascending order.

mode: a measure of location, defined as the value that occurs with greatest frequency
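
The three measures of location defined above (mean, median, and mode) can be compared on one small made-up data set using Python's standard `statistics` module:

```python
import statistics

# Illustrative values only; the 7 pulls the mean above the median.
data = [1, 2, 2, 3, 7]

mean = statistics.mean(data)      # sum of values / number of observations
median = statistics.median(data)  # middle value of the sorted data
mode = statistics.mode(data)      # most frequently occurring value

print(mean, median, mode)
```

The large value affects the mean but not the median, which is why the median is often preferred when a data set contains extreme values.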

outlier: an unusually small or unusually large data value

percentile: a value such that at least p percent of the observations are less than or equal to this value and at least (100-p) percent of the observations are greater than or equal to this value. The 50th percentile is the median

point estimator: the sample statistic, such as sample mean, when used to estimate the corresponding population parameter

population parameter: a numerical value used as a summary measure for a population, e.g. population mean, population variance, population standard deviation

Quartiles: The 25th, 50th, and 75th percentiles, referred to as the first quartile, the second quartile (median), and the third quartile, respectively. The quartiles can be used to divide a data set into four parts, with each part containing approximately 25% of the data

range: a measure of variability, defined to be the largest value minus the smallest value

Sample statistic: a numerical value used as a summary measure for a sample, e.g. sample mean, sample variance, sample standard deviation

skewness: a measure of the shape of a data distribution. Data skewed to the left result in negative skewness; a symmetric data distribution results in zero skewness; and data skewed to the right result in positive skewness
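
The definition above does not give a formula, so the sketch below assumes one common choice for sample skewness, n / ((n - 1)(n - 2)) times the sum of cubed standardized deviations. The function name and data are made up for illustration.

```python
import statistics


def sample_skewness(data):
    """Sample skewness using an assumed formula: n/((n-1)(n-2)) * sum(z_i ** 3)."""
    n = len(data)
    if n < 3:
        raise ValueError("need at least three observations")
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in data)


right_skewed = [1, 2, 3, 4, 10]  # long tail to the right
left_skewed = [1, 7, 8, 9, 10]   # long tail to the left
symmetric = [1, 2, 3, 4, 5]

print(sample_skewness(right_skewed))  # positive
print(sample_skewness(left_skewed))   # negative
print(sample_skewness(symmetric))     # approximately zero
```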

standard deviation: a measure of variability computed by taking the positive square root of the variance

variance: a measure of variability based on the squared deviations of the data values about the mean
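
A minimal sketch of variance and standard deviation computed from squared deviations about the mean. The data are made up. The sample variance divides by n - 1 and the population variance divides by n; which one applies depends on whether the data are a sample or the whole population.

```python
import math

# Illustrative values only, not taken from the source.
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
mean = sum(data) / n

# Squared deviation of each value about the mean.
squared_deviations = [(x - mean) ** 2 for x in data]

sample_variance = sum(squared_deviations) / (n - 1)
population_variance = sum(squared_deviations) / n

# Standard deviation: positive square root of the variance.
sample_std_dev = math.sqrt(sample_variance)
population_std_dev = math.sqrt(population_variance)

print(sample_variance, sample_std_dev)
print(population_variance, population_std_dev)
```

The standard deviation is in the same units as the original data, which usually makes it easier to interpret than the variance.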

weighted mean: the mean obtained by assigning each observation a weight that reflects its importance
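
As a sketch, the weighted mean is the sum of each value times its weight, divided by the sum of the weights. The function name and numbers below are made up for illustration.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights (hypothetical helper)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Illustrative example: cost per unit weighted by the number of units purchased.
costs = [3, 4, 5]
units = [2, 1, 1]

print(weighted_mean(costs, units))
```

When every weight is equal, the result reduces to the ordinary mean.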

z-score: a value computed by dividing the deviation about the mean (x_i - x̄) by the standard deviation s. A z-score is referred to as a standardized value and denotes the number of standard deviations x_i is from the mean.
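
A short sketch of z-scores on made-up data, assuming the sample mean and sample standard deviation are used:

```python
import statistics

# Illustrative values only, not taken from the source.
data = [46, 54, 42, 46, 32]

mean = statistics.mean(data)
s = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)

# Each z-score is the deviation about the mean divided by the standard deviation.
z_scores = [(x - mean) / s for x in data]

for x, z in zip(data, z_scores):
    print(f"x = {x}, z = {z:+.2f}")
```

A negative z-score means the value is below the mean, and a positive z-score means it is above the mean.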

Essentials of Statistics for Business and Economics ch 3

