The science of collecting, organizing, analyzing, interpreting, and presenting data.
A single measure used to summarize a sample data set.
The collection, organization, presentation, and summary of data.
Generalizing from a sample to a population, estimating unknown parameters, drawing conclusions, and making decisions.
Represents data collected through observation and experiments.
Making conclusions about a large population from a small sample.
Interpret a correlation as a specific causal link (A causes B, B causes A, some third factor causes both)
Generalization to Individuals
Significance versus Importance.
A particular collection of data values as a whole.
Subject (or individual)
An item for study (e.g., an employee in your company).
A characteristic of the subject or individual (e.g., an employee's income).
Each data value.
One variable. (Histograms, descriptive statistics, frequency tallies)
Two variables. (Scatter plots, correlations, regression modeling)
More than two variables. (Multiple regression, data mining, econometric modeling)
Time Series Data
Each observation in the sample represents a different equally spaced point in time.
Cross Sectional Data
Each observation represents a different individual unit (i.e. person) at the same point in time.
Qualitative, attribute, categorical, or classification data that can be coded numerically (e.g., 1 = Apple, 2 = Compaq, 3 = Dell, 4 = HP).
Ordinal data codes can be ranked (e.g., 1 = Frequently, 2 = Sometimes, 3 = Rarely, 4 = Never).
Data can not only be ranked but also have meaningful intervals between scale points (e.g., the difference between 60 and 70 is the same as the difference between 20 and 30).
Has all the properties of the other three data types, plus a meaningful zero point.
How statistics are computed.
From a sample of n items, chosen from a population of N items
Simple Random Sample
Every item in the population of N items has the same chance of being chosen in the sample of n items.
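A minimal sketch of this definition, assuming the population is available as a list; the function name, the ID-numbered population, and the seed are hypothetical and only there for illustration.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n items without replacement; every item has the same chance of selection."""
    rng = random.Random(seed)          # seeded only so the example is repeatable
    return rng.sample(population, n)

population = list(range(1, 101))       # hypothetical population of N = 100 item IDs
sample = simple_random_sample(population, 10, seed=42)
print(sample)
```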
Sample by choosing every kth item from a list, starting from a randomly chosen entry on the list.
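One way this could look in code, assuming the list is already in hand and the random start is drawn from the first `step` entries. The names `systematic_sample`, `step`, and `roster` are hypothetical.

```python
import random

def systematic_sample(items, step, seed=None):
    """Take every step-th item, starting from a randomly chosen entry."""
    rng = random.Random(seed)
    start = rng.randrange(step)        # random start among the first `step` entries
    return items[start::step]

roster = list(range(1, 101))           # hypothetical list of 100 numbered entries
picked = systematic_sample(roster, 10, seed=1)
print(picked)
```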
Utilizes prior information about the population.
A simple random sample of the desired size is taken.
One Stage Cluster
Sample consists of all elements in each of k randomly chosen subregions (clusters).
Two Stage Cluster
First choose k subregions (clusters), then choose a random sample of elements within each cluster.
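The two cluster designs above can be sketched together, assuming the clusters are given as a dictionary from cluster name to its elements. The function name, the `per_cluster` parameter, and the region data are hypothetical.

```python
import random

def cluster_sample(clusters, k, per_cluster=None, seed=None):
    """One-stage (per_cluster=None): keep every element of k random clusters.
    Two-stage: keep a random sample of per_cluster elements from each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), k)       # stage 1: pick k clusters
    result = {}
    for name in chosen:
        members = clusters[name]
        if per_cluster is None:
            result[name] = list(members)           # one-stage: take all elements
        else:
            size = min(per_cluster, len(members))  # stage 2: sample within cluster
            result[name] = rng.sample(members, size)
    return result

# Hypothetical population: 8 regions with 20 households each.
regions = {f"region_{i}": [f"r{i}_h{j}" for j in range(20)] for i in range(8)}
one_stage = cluster_sample(regions, k=3, seed=7)
two_stage = cluster_sample(regions, k=3, per_cluster=5, seed=7)
```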
A non-probability sampling method that relies on expertise of the sampler to choose items that are representative of the population.
A special kind of judgment sampling in which the interviewer chooses a certain number of people in each category.
Take advantage of whatever sample is available at that moment. Quick way to sample.
A panel of individuals chosen to be representative of a wider population.
A table formed by classifying n data values into k classes (bins)
Define the values to be included in each bin. Widths must all be the same, except when there are open-ended bins.
The number of observations within each bin.
k = 1 + 3.3 × log10(n)
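A rough sketch of how the bin count, bin limits, and frequencies fit together, assuming a base-10 logarithm, rounding k to the nearest integer, and equal-width bins spanning the data range. These choices, the function names, and the data values are assumptions made for illustration.

```python
import math

def suggested_bins(n):
    """Number of classes from k = 1 + 3.3 * log10(n), rounded to the nearest integer."""
    return round(1 + 3.3 * math.log10(n))

def frequency_distribution(data, k):
    """Classify the data into k equal-width bins; return (bin limits, frequencies)."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / k                          # assumes the values are not all equal
    counts = [0] * k
    for x in data:
        i = min(int((x - lo) / width), k - 1)      # the maximum goes in the last bin
        counts[i] += 1
    limits = [(lo + i * width, lo + (i + 1) * width) for i in range(k)]
    return limits, counts

# Hypothetical data set of n = 20 values.
data = [12, 15, 17, 21, 22, 22, 25, 28, 30, 31,
        33, 35, 36, 40, 41, 44, 47, 50, 52, 60]
k = suggested_bins(len(data))
limits, counts = frequency_distribution(data, k)
for (low, high), freq in zip(limits, counts):
    print(f"{low:6.1f} to {high:6.1f}: {freq}")
```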
For any population with mean μ and standard deviation σ, the percentage of observations that lie within k standard deviations of the mean must be at least 100[1 - 1/k^2] (for k > 1).
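The bound is easy to evaluate directly. A small sketch, with a hypothetical function name:

```python
def chebyshev_min_percent(k):
    """Minimum percent of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 100 * (1 - 1 / k**2)

for k in (2, 3):
    print(k, chebyshev_min_percent(k))
```

For k = 2 the bound is 75 percent, and for k = 3 it is about 88.9 percent, whatever the shape of the distribution.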
States that for data from a normal distribution, we expect the interval within k standard deviations of the mean to contain a known percentage of the data (about 68%, 95%, and 99.7% for k = 1, 2, and 3).
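As a check on those percentages, the share of a normal distribution within k standard deviations of the mean can be computed from the error function. This is a sketch with a hypothetical function name, not part of the original card.

```python
import math

def normal_percent_within(k):
    """Percent of a normal distribution within k standard deviations of the mean."""
    return 100 * math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(normal_percent_within(k), 2))
```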
Observations that lie so far from the mean that their distance from it is at least three times as large as the standard deviation.
Redefines each observation in terms of its distance from the mean, measured in standard deviations.
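A short sketch tying the last two cards together: each value is restated as a distance from the mean in standard deviations, and values at least three standard deviations away are flagged. It assumes the sample standard deviation is intended; the function names and data are hypothetical.

```python
import statistics

def z_scores(data):
    """Restate each observation as its distance from the mean in standard deviations."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)        # sample standard deviation (an assumption)
    return [(x - mean) / sd for x in data]

def flag_outliers(data, threshold=3):
    """Return observations at least `threshold` standard deviations from the mean."""
    return [x for x, z in zip(data, z_scores(data)) if abs(z) >= threshold]

# Hypothetical data: 19 values near 50 and one extreme value.
values = [48, 49, 50, 51, 52] * 3 + [49, 50, 51, 50, 120]
outliers = flag_outliers(values)
print(outliers)
```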
Data that have been divided into 100 groups.
Data that have been divided into 10 groups.
Data that have been divided into 5 groups.
Data that have been divided into 4 groups.
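The four groupings above can be illustrated with the standard library's `statistics.quantiles`, which returns the cut points between groups, so dividing into g groups yields g - 1 cut points. The data values are hypothetical, and the library's default interpolation method is only one of several conventions.

```python
import statistics

# Hypothetical sample of 20 exam scores.
scores = [52, 55, 58, 60, 61, 63, 65, 66, 68, 70,
          71, 73, 75, 78, 80, 82, 85, 88, 91, 95]

cut_points = {
    "percentiles": statistics.quantiles(scores, n=100),  # 100 groups
    "deciles": statistics.quantiles(scores, n=10),       # 10 groups
    "quintiles": statistics.quantiles(scores, n=5),      # 5 groups
    "quartiles": statistics.quantiles(scores, n=4),      # 4 groups
}
for name, cuts in cut_points.items():
    print(name, len(cuts), "cut points")
```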
An observational process whose results cannot be known in advance.
Any subset of outcomes in the sample space.
A single outcome.
Union of 2 Events
Consists of all outcomes in the sample space S that are contained either in event A or in event B or both.
General Law of Addition
P(A or B) = P(A) + P(B) - P(A and B)
Mutually Exclusive Events
Events A and B do not intersect.
Special Law of Addition
In the case of mutually exclusive events - P(A or B) = P(A) + P(B).
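Both addition laws can be checked on a small sample space. The sketch below assumes one roll of a fair die with equally likely outcomes; the events A, B, and C are hypothetical examples.

```python
from fractions import Fraction

sample_space = set(range(1, 7))        # one roll of a fair die

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

A = {2, 4, 6}                          # even number
B = {4, 5, 6}                          # greater than 3
C = {1, 3}                             # mutually exclusive with A

# General law: subtract the intersection so it is not counted twice.
p_a_or_b = prob(A) + prob(B) - prob(A & B)

# Special law: A and C do not intersect, so nothing is subtracted.
p_a_or_c = prob(A) + prob(C)

print(p_a_or_b, p_a_or_c)
```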
The probability of event A given that event B has occurred.
Multiplication Law for Independent Events
The probability of n independent events occurring simultaneously is P(A1 and A2 and ... and An) = P(A1) P(A2) ... P(An).
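A brief illustration, assuming three independent tosses of a fair coin (a hypothetical example): the probability of three heads is the product of the individual probabilities.

```python
import math
from fractions import Fraction

def prob_all(independent_probs):
    """P(A1 and A2 and ... and An) for independent events."""
    return math.prod(independent_probs)

three_heads = prob_all([Fraction(1, 2)] * 3)
print(three_heads)
```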
Coefficient of Variation
Compares dispersion in data sets with different units of measurement or different means.
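A sketch of the usual calculation, assuming the coefficient of variation is defined as 100 × (standard deviation / mean) using the sample standard deviation. The function name and both data sets are hypothetical.

```python
import statistics

def coefficient_of_variation(data):
    """CV = 100 * (standard deviation / mean), expressed as a percent."""
    mean = statistics.mean(data)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100 * statistics.stdev(data) / mean

heights_cm = [160, 165, 170, 175, 180]   # hypothetical heights in centimeters
weights_kg = [55, 62, 70, 78, 85]        # hypothetical weights in kilograms

print(round(coefficient_of_variation(heights_cm), 2))
print(round(coefficient_of_variation(weights_kg), 2))
```

Because the units cancel, the two percentages can be compared directly even though one data set is in centimeters and the other in kilograms.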
n = number of data values
k = percent to trim; n × k = x. Remove the x smallest and the x largest observations.
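The trimming steps can be sketched as follows, assuming k is given as a proportion (0.10 for 10 percent) and that n × k is truncated to a whole number when it is not an integer. Both choices, the function name, and the data are assumptions.

```python
def trimmed_mean(data, k):
    """Mean after removing the x = n * k smallest and x largest observations."""
    if not 0 <= k < 0.5:
        raise ValueError("k must be a proportion in [0, 0.5)")
    n = len(data)
    x = int(n * k)                     # number of observations to drop from each end
    kept = sorted(data)[x:n - x]
    return sum(kept) / len(kept)

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]   # hypothetical data with one extreme value
print(trimmed_mean(data, 0.10))
```

Here n = 10 and k = 0.10, so x = 1: the values 1 and 100 are removed and the remaining eight values average 5.5.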
Mr = (xMin + xMax)/2
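In code the formula is a one-liner. The function name and data are hypothetical.

```python
def midrange(data):
    """Midpoint between the smallest and largest observations."""
    return (min(data) + max(data)) / 2

print(midrange([3, 7, 10, 21]))
```

Since only the two extreme values enter the formula, a single unusual observation can move the midrange substantially.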
P(A given B) = (P(A and B)) / P(B)
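A worked instance of the formula, assuming one roll of a fair die with equally likely outcomes; the events A and B are hypothetical examples.

```python
from fractions import Fraction

sample_space = set(range(1, 7))        # one roll of a fair die

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

def conditional(a, b):
    """P(A given B) = P(A and B) / P(B); requires P(B) > 0."""
    if not b:
        raise ValueError("P(B) must be greater than zero")
    return prob(a & b) / prob(b)

A = {2, 4, 6}                          # even number
B = {4, 5, 6}                          # greater than 3
p_a_given_b = conditional(A, B)
print(p_a_given_b)
```

Knowing the roll is greater than 3 leaves the outcomes 4, 5, and 6, of which two are even, so P(A given B) = 2/3.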