67 terms

quick review of some definitions key to understanding statistics (chapters 1-4)
book used: Essentials of Statistics for Business and Economics 6th edition

data

facts and figures collected, analyzed, and summarized for presentation and interpretation

data set

all the data collected in a particular study

elements

entities on which data are collected

variable

characteristic of interest for the elements

observation

the set of measurements obtained for a particular element

nominal scale

data for a variable consist of labels or names used to identify an attribute of the element.

ordinal scale

data exhibit the properties of nominal data, and the order or rank of the data is meaningful.

interval scale

data have all the properties of ordinal data, and the interval between values is expressed in terms of a fixed unit of measure.

always numerical

ratio scale

data have all the properties of interval data and ratio of values is meaningful.

scale must contain a zero value.

categorical data

data that can be grouped by specific categories.

uses either nominal or ordinal scale of measure.

quantitative data

data that use numerical values to indicate how much or how many.

uses either interval or ratio scale of measure.

cross-sectional data

data collected at the same time or approximately the same time.

time series data

data collected over several time periods (longitudinal).

descriptive statistics

summaries of data, which may be tabular, graphical or numerical.

common ways to express these summaries are bar charts and histograms.

population

the set of all elements of interest in a study

sample

a subset of the population

census

process of conducting a survey to collect data for the entire population

sample survey

process of conducting a survey to collect data for a sample

statistical inference

using data from a sample to make estimates and test hypotheses about the characteristics of a population

data mining

deals with methods for developing useful decision-making information from large databases.

frequency distribution

a tabular summary of data showing the number (frequency) of items in each of several non-overlapping classes
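
a minimal sketch in Python of building a frequency distribution, along with the relative and percent frequencies. the purchase data and variable names are made up for illustration and are not from the book.

```python
from collections import Counter

# hypothetical sample: one label per observation (illustrative data only)
purchases = ["cola", "diet", "cola", "water", "cola", "diet", "water", "cola"]

frequency = Counter(purchases)                        # class -> number of items
n = len(purchases)
relative = {k: v / n for k, v in frequency.items()}   # class -> proportion of items
percent = {k: 100 * v for k, v in relative.items()}   # class -> percent of items

for label in frequency:
    print(f"{label:6} {frequency[label]:3d} {relative[label]:.3f} {percent[label]:5.1f}%")
```

each observation falls in exactly one class, so the frequencies sum to n and the relative frequencies sum to 1.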

bar chart

graphical device for depicting categorical data summarized in a frequency, relative frequency, or percent frequency distribution.

each class is separate.

pie chart

graphical device for presenting relative frequency and percent frequency distribution for categorical data.

histogram

a common graphical presentation of quantitative data.

variable of interest on x-axis and frequency or relative frequency on the y-axis.

ogive

a graph of a cumulative distribution that shows data values on the x-axis and cumulative frequency, cumulative relative frequency, or cumulative percent frequency on the y-axis.

stem-and-leaf display

can be used to show both the rank order and the shape of the data simultaneously.

crosstabulation

a tabular summary of data for two variables

scatter diagram

a graphical presentation of the relationship between two quantitative variables

sample statistics

measures computed for data from a sample

population parameters

measures computed for data from a population

point estimator

sample statistics, such as mean, variance, and standard deviation, when used to estimate the corresponding population parameter.

mean

average value

median

value in the middle of the data arranged in ascending order

mode

value that occurs with the greatest frequency in a set of data
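
a short Python sketch of the three measures of location above, using the standard library's statistics module. the sample values are illustrative assumptions.

```python
import statistics

data = [46, 54, 42, 46, 32]         # illustrative sample values

mean = statistics.mean(data)        # sum of the values divided by how many there are
median = statistics.median(data)    # middle value once the data are sorted
mode = statistics.mode(data)        # value that occurs most often

print(mean, median, mode)
```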

percentile

provides information about how the data are spread over the interval from the smallest to the largest value

interquartile range

Q3-Q1

measure of variability that overcomes the dependency on extreme values

variance

the measure of variability that utilizes all the data

SIGMA((xi-xbar)^2)/(n-1) (SIGMA=summation) (xi=data value) (xbar=mean) (n=sample total)

standard deviation

positive square root of the variance

sqrt(SIGMA((xi-xbar)^2)/(n-1)) (sqrt=square root) (SIGMA=summation) (xi=data value) (xbar=mean) (n=sample total)

coefficient of variation

descriptive statistic that indicates how large the standard deviation is relative to the mean

((sd/xbar)*100)% (sd=standard deviation) (xbar=mean)
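
a minimal Python sketch tying together sample variance, standard deviation, and coefficient of variation. it assumes the sample (n-1) form of the variance, with the deviations squared before summing. the data values are illustrative.

```python
import math

data = [46, 54, 42, 46, 32]     # illustrative sample values
n = len(data)
xbar = sum(data) / n            # sample mean

# sample variance: sum of squared deviations from the mean, divided by n - 1
variance = sum((x - xbar) ** 2 for x in data) / (n - 1)

sd = math.sqrt(variance)        # standard deviation: positive square root of the variance
cv = sd / xbar * 100            # coefficient of variation, as a percent of the mean

print(variance, sd, round(cv, 1))
```

without the square, the deviations (xi-xbar) always sum to zero, which is why each one is squared first.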

skewness

an important numerical measure of the shape of a distribution

(n/((n-1)(n-2)))*SIGMA(((xi-xbar)/sd)^3) (n=sample total) (SIGMA=summation) (xi=data value) (xbar=mean) (sd=standard deviation)
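
a hedged Python sketch of the skewness formula, reading it as n/((n-1)(n-2)) times the sum of the cubed standardized deviations. the function name and both data sets are assumptions for illustration.

```python
import math

def sample_skewness(data):
    """Sample skewness: n/((n-1)(n-2)) * sum(((xi - xbar)/sd)^3)."""
    n = len(data)
    xbar = sum(data) / n
    sd = math.sqrt(sum((x - xbar) ** 2 for x in data) / (n - 1))
    return n / ((n - 1) * (n - 2)) * sum(((x - xbar) / sd) ** 3 for x in data)

symmetric = [1, 2, 3, 4, 5]      # balanced around the mean
right_tail = [1, 2, 3, 4, 20]    # one large value pulls the tail to the right

print(sample_skewness(symmetric), sample_skewness(right_tail))
```

symmetric data give a skewness of about zero, and a long right tail gives a positive value.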

z-score

measures the relative location of a particular value: the number of standard deviations it lies from the mean

(xi-xbar)/sd (xi=data value)(xbar=mean)(sd=standard deviation)
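
a quick Python sketch of z-scores for a small sample, assuming the sample standard deviation (n-1 divisor). the data values are illustrative.

```python
import math

data = [46, 54, 42, 46, 32]     # illustrative sample values
n = len(data)
xbar = sum(data) / n
sd = math.sqrt(sum((x - xbar) ** 2 for x in data) / (n - 1))

# z-score: how many standard deviations each value lies above (+) or below (-) the mean
z_scores = [(x - xbar) / sd for x in data]

print(z_scores)
```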

Chebyshev's Theorem

theorem enables us to make statements about the proportion of data values that must be within a specified number of standard deviations of the mean.

at least (1-(1/z^2)) of the data values must be within z standard deviations of the mean (z=number of standard deviations, z>1)
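
a small Python sketch of the Chebyshev bound. the function name is a made-up helper. the result is a minimum proportion, and the bound only applies for z greater than 1.

```python
def chebyshev_lower_bound(z):
    """Minimum proportion of values within z standard deviations of the mean."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem requires z > 1")
    return 1 - 1 / z ** 2

for z in (2, 3, 4):
    print(z, round(chebyshev_lower_bound(z), 4))
```

so at least 75% of values lie within 2 standard deviations of the mean, whatever the shape of the distribution.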

empirical rule

used to determine the percentage of data values that must be within a specified number of standard deviations of the mean.

data must approximate a bell-shaped distribution.
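
a hedged Python sketch of where the empirical rule's percentages come from. it assumes the bell shape is exactly a normal distribution, which real data only approximate. the helper name is made up.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def proportion_within(z):
    """Proportion of a normal distribution within z standard deviations of the mean."""
    return standard_normal.cdf(z) - standard_normal.cdf(-z)

for z in (1, 2, 3):
    print(z, round(proportion_within(z), 4))
```

this gives roughly 68%, 95%, and 99.7% for 1, 2, and 3 standard deviations.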

box-plot

graphical summary of data that is based on a five-number summary.

1. smallest value

2. first quartile (Q1)

3. median (Q2)

4. third quartile (Q3)

5. largest value
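
a minimal Python sketch of a five-number summary and the interquartile range. quartile conventions differ between textbooks and software, so the "inclusive" method used here is an assumption and may not match the book's quartiles on every data set. the data are illustrative.

```python
import statistics

data = [2, 4, 6, 8, 10, 12, 14, 16, 18]   # illustrative sample values

# quartiles split the sorted data into four parts (convention: "inclusive")
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

five_number_summary = (min(data), q1, q2, q3, max(data))
iqr = q3 - q1                              # interquartile range, Q3-Q1

print(five_number_summary, iqr)
```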

covariance

a measure of linear association between two variables.

positive values indicate a positive relationship.

negative values indicate a negative relationship.

Sxy=SIGMA((xi-xbar)(yi-ybar))/(n-1) (SIGMA=summation) (xi=data value of x) (xbar=mean of x) (yi=data value of y) (ybar=mean of y) (n=sample total)

correlation coefficient

measurement of the relationship between two variables that is not affected by the units of measurement for x & y.

Rxy=Sxy/(Sx*Sy) (Sxy=covariance) (Sx=standard deviation of x) (Sy=standard deviation of y)
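
a hedged Python sketch of sample covariance and the correlation coefficient, assuming the n-1 divisor throughout. the function names and paired data are made up for illustration.

```python
import math

def covariance(x, y):
    """Sample covariance: sum((xi - xbar)(yi - ybar)) / (n - 1)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)

def correlation(x, y):
    """Correlation coefficient: Sxy / (Sx * Sy)."""
    sx = math.sqrt(covariance(x, x))   # covariance of x with itself is its variance
    sy = math.sqrt(covariance(y, y))
    return covariance(x, y) / (sx * sy)

x = [1, 2, 3, 4, 5]                    # illustrative paired observations
y = [2, 4, 5, 4, 5]
x_rescaled = [100 * v for v in x]      # same data in different units

print(covariance(x, y), covariance(x_rescaled, y))
print(round(correlation(x, y), 4), round(correlation(x_rescaled, y), 4))
```

rescaling x multiplies the covariance by 100 but leaves the correlation unchanged, which is the sense in which it is not affected by the units of measurement.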

weighted mean

mean computed by giving each observation a weight that reflects its importance.

xbar=SIGMA(wi*xi)/SIGMA(wi) (SIGMA=summation) (wi=weight for observation i) (xi=value of observation i)
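
a short Python sketch of a weighted mean. the costs and quantities are illustrative assumptions, with the quantity purchased acting as the weight for each cost.

```python
cost_per_pound = [3.00, 3.40, 2.80]   # xi: illustrative values
pounds_bought = [1200, 500, 2750]     # wi: weight for each value

weighted_mean = (
    sum(w * x for w, x in zip(pounds_bought, cost_per_pound)) / sum(pounds_bought)
)
simple_mean = sum(cost_per_pound) / len(cost_per_pound)

print(round(weighted_mean, 4), round(simple_mean, 4))
```

the weighted mean sits closer to 2.80 than the simple mean does, because most of the pounds were bought at that cost.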

grouped data

data available in class intervals as summarized by a frequency distribution.

probability

a numerical measure of the likelihood that an event will occur

must be between 0 and 1, inclusive.

experiment

any process that generates well defined outcomes

sample space

set of all experimental outcomes

multiple-step experiment

if an experiment can be described as a sequence of k steps with n1 possible outcomes on the first step, n2 possible outcomes on the second step, and so on, then the total number of experimental outcomes is given by (n1)(n2)(n3)...(nk)
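
a minimal Python sketch of the counting rule for multiple-step experiments. it lists every outcome and checks the count against the product (n1)(n2)...(nk). the three steps are made up for illustration.

```python
from itertools import product
from math import prod

# hypothetical 3-step experiment: two coin tosses, then a step with 3 outcomes
steps = [["H", "T"], ["H", "T"], [1, 2, 3]]

outcomes = list(product(*steps))              # every possible sequence of step results
count_by_rule = prod(len(s) for s in steps)   # (n1)(n2)(n3) = 2 * 2 * 3

print(len(outcomes), count_by_rule)
```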

tree diagram

a graphical representation that helps in visualizing a multiple-step experiment

combination

allows one to count the number of experimental outcomes when the experiment involves selecting n objects from set of N objects

C=N!/(n!(N-n)!) (N=total number of objects) (n=number of objects selected)

permutations

allows one to compute the number of experimental outcomes when n objects are to be selected from a set of N objects where the order of selection is important.

N!/(N-n)! (N=total number of objects) (n=number of objects selected)
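
a short Python sketch comparing the two counting rules for selecting n objects from N. the values N=5 and n=2 are arbitrary illustrations. the factorial formulas are checked against math.comb and math.perm from the standard library.

```python
import math

N, n = 5, 2   # illustrative values: select 2 objects from 5

# combinations: order of selection does not matter
combinations = math.factorial(N) // (math.factorial(n) * math.factorial(N - n))

# permutations: order of selection matters
permutations = math.factorial(N) // math.factorial(N - n)

print(combinations, math.comb(N, n))
print(permutations, math.perm(N, n))
```

each combination of n objects can be ordered in n! ways, so there are n! times as many permutations as combinations.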

classical method

method used when all experimental outcomes are equally likely.

if n outcomes are possible, 1/n probability is assigned to each experimental outcome.

relative frequency method

method of assigning probabilities when data are available to estimate the proportion of the time the experimental outcome will occur if the experiment is repeated a large number of times.

think of frequency

subjective method

method of assigning probabilities most appropriate when one cannot realistically assume that the experimental outcomes are equally likely and when little relevant data are available.

think of different people assigning different probabilities to the same experimental outcomes.

event

a collection of sample points

the complement of A

defined to be the event consisting of all the sample points that are NOT in A.

denoted A^c

union of A & B

the event containing all sample points belonging to A or B or both.

denoted A U B.

P(A U B)=P(A)+P(B)-P(A intersect B)
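
a hedged Python sketch that treats events as sets of sample points and checks the addition law. the die-roll experiment and the events A and B are illustrative. probabilities are assigned by the classical method, with every outcome equally likely.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # one roll of a fair die
A = {2, 4, 6}                       # event: roll is even
B = {4, 5, 6}                       # event: roll is greater than 3

def prob(event):
    # classical method: each of the n outcomes has probability 1/n
    return Fraction(len(event), len(sample_space))

p_union = prob(A | B)                              # points in A or B or both
p_addition_law = prob(A) + prob(B) - prob(A & B)   # subtract the shared points once
p_complement = prob(sample_space - A)              # points NOT in A

print(p_union, p_addition_law, p_complement)
```

subtracting P(A intersect B) stops the shared sample points (4 and 6 here) from being counted twice.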

intersection of A & B

event containing only the sample points that A & B share.

denoted A intersect B.

mutually exclusive

events that have no sample points in common, so when one event occurs, the other cannot.

P(A intersect B)=0

conditional probability

the probability of an event given that another event has already occurred.

P(A|B)=P(A intersect B)/P(B)

joint probabilities

the probability of the intersection between two events.

marginal probabilities

the values located in the margins of the JPT, giving the probability of each event separately. (JPT=joint probability table)

independent events

two events that have no influence on each other.

P(A|B)=P(A) or P(B|A)=P(B)
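
a minimal Python sketch linking joint, marginal, and conditional probabilities and the independence check. the joint probability table is made up for illustration, and exact fractions are used so the comparison avoids rounding error.

```python
from fractions import Fraction as F

# hypothetical joint probability table: (A status, B status) -> joint probability
joint = {
    ("A", "B"): F(24, 100),
    ("A", "not B"): F(56, 100),
    ("not A", "B"): F(3, 100),
    ("not A", "not B"): F(17, 100),
}

# marginal probabilities: add the joint probabilities across a row or column
p_a = sum(p for (a, _), p in joint.items() if a == "A")
p_b = sum(p for (_, b), p in joint.items() if b == "B")

# conditional probability: P(A|B) = P(A intersect B) / P(B)
p_a_given_b = joint[("A", "B")] / p_b

# independence would require P(A|B) = P(A)
independent = p_a_given_b == p_a

print(p_a, p_b, p_a_given_b, independent)
```

in this table P(A|B) differs from P(A), so knowing that B occurred changes the probability of A and the events are not independent.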