Key Concepts:
What are descriptive statistics? Give an example.
Set of techniques that describe what has happened in the past
used to find patterns
What are inferential statistics? What do they help us do?
Inferential statistics are designed to address objectives, questions, and hypotheses in studies to allow inference from the study sample to the target population.
They help us to: identify relationships
examine predictions
determine differences among groups
What are predictive statistics? Give an example.
They use models constructed from past data to predict the future.
Example: next best offers
What are the 4 V's of big data?
Volume, Velocity, Variety, Veracity
What are the 5 steps of decision making?
1. Identify & define the problem
2. Determine the criteria for evaluating alternative solutions
3. Determine the set of alternative solutions
4. Evaluate the alternatives
5. Choose among the alternatives
On the Descriptive Statistics output, why would the mode be N/A?
If the variable is continuous, there are usually no exact duplicate values.
If every value occurs only once, there is no mode.
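A minimal sketch of the "no mode" case. The helper name `mode_or_none` and both data lists are hypothetical, chosen only for illustration.

```python
from collections import Counter

def mode_or_none(values):
    """Return the most frequent value, or None when every value occurs once."""
    counts = Counter(values)
    value, count = counts.most_common(1)[0]
    return value if count > 1 else None

# Hypothetical continuous measurements: no exact duplicates, so no mode.
continuous = [12.31, 12.47, 11.98, 13.02]
# Hypothetical discrete values: 3 occurs twice.
discrete = [1, 3, 3, 5]

print(mode_or_none(continuous))
print(mode_or_none(discrete))
```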
How do you know if the data has a positive or negative skew on the Descriptive Statistics output?
Look at the Skewness value: if the number is positive, the data has a positive skew; if the number is negative, the data has a negative skew.
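A sketch of how the sign of skewness can be checked by hand. It assumes the adjusted sample-skewness formula that many spreadsheet tools report; the function name and data are hypothetical.

```python
import math

def sample_skewness(values):
    """Adjusted sample skewness; needs at least 3 values that are not all equal."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / sd) ** 3 for x in values)

# Hypothetical data with one large value: long tail on the right.
right_tailed = [1, 2, 2, 3, 3, 3, 4, 20]
# The mirror image: long tail on the left.
left_tailed = [-x for x in right_tailed]

print(sample_skewness(right_tailed) > 0)  # positive skew
print(sample_skewness(left_tailed) < 0)   # negative skew
```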
T/F: There are no measures of association on the Descriptive Statistics output.
True
What does skew mean?
- a measure of the lack of symmetry in a distribution
- think of "short": if skewed to the right, the bars on the right are short, because the long, thin tail extends to the right ("short to the right")
How do you calculate the coefficient of variation?
SD / mean
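A short worked example of the ratio, assuming the sample standard deviation is intended. The data values are hypothetical.

```python
import statistics

# Hypothetical data values.
values = [10, 12, 14, 16, 18]

mean = statistics.mean(values)   # 14
sd = statistics.stdev(values)    # sample SD = sqrt(40 / 4) = sqrt(10)
cv = sd / mean                   # coefficient of variation

print(round(cv, 4))
```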
T/F: On the Descriptive Statistics output, the value labeled Confidence Level is the margin of error (MOE).
True
How is the upper confidence limit calculated?
Mean + MOE
On a box and whisker plot, what does the line mean? What does the X mean?
Line = median
X = mean
What are the 3 measures of central tendency?
1. Mean
2. Median
3. Mode
What are the 3 measures of variability?
range, variance, standard deviation
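The measures of central tendency and variability can be computed with Python's standard `statistics` module. This is a sketch on hypothetical data and assumes sample (n - 1) variance.

```python
import statistics

# Hypothetical data values.
data = [2, 4, 4, 5, 7, 8]

# Central tendency
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Variability
value_range = max(data) - min(data)
variance = statistics.variance(data)  # sample variance (divides by n - 1)
sd = statistics.stdev(data)           # square root of the sample variance

print(mean, median, mode, value_range, variance, sd)
```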
What are the 2 measures of association?
Correlation coefficient: a unitless measure of the strength and direction of the linear relationship between two variables (ranges from -1 to +1)
Covariance: a measure of how two random variables move together; its size depends on the variables' units (ranges from -∞ to +∞)
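A sketch contrasting the two measures, assuming the sample (n - 1) covariance. The function names and data are hypothetical. Rescaling y changes the covariance but leaves the correlation unchanged.

```python
import math

def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Correlation = covariance divided by the product of the two SDs."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

# Hypothetical data: y is exactly 2 * x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
ys_rescaled = [100 * y for y in ys]  # same relationship, different units

print(covariance(xs, ys), correlation(xs, ys))
print(covariance(xs, ys_rescaled), correlation(xs, ys_rescaled))
```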
How do you compute the z-score?
(x - mean) / SD
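A small sketch of the calculation; the parentheses matter, because the subtraction has to happen before the division. The function name and numbers are hypothetical.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

# Hypothetical values: observation 85, mean 70, SD 10.
z = z_score(85, 70, 10)
print(z)  # 1.5

# Without parentheses the division runs first and gives a different number.
wrong = 85 - 70 / 10
print(wrong)  # 78.0
```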
What is the empirical rule?
For approximately normal (bell-shaped) data:
1 SD: about 68% of observations are between -1 and 1 SD from the mean
2 SD: about 95% of observations are between -2 and 2 SD from the mean
3 SD: about 99.7% of observations are between -3 and 3 SD from the mean
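The percentages in the empirical rule can be checked against a standard normal distribution. This sketch uses `statistics.NormalDist` from the Python standard library and assumes the data are exactly normal.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Share of a normal distribution within k standard deviations of the mean.
shares = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within {k} SD: {share:.4f}")
```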
T/F: When the correlation coefficient is 0, there is no linear relationship between x and y.
True
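A caveat worth a sketch: a correlation of 0 rules out a linear relationship only. In the hypothetical data below, y is fully determined by x (y = x squared), yet the correlation is 0. The function name is an assumption for illustration.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: a perfect but nonlinear (U-shaped) relationship.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]

r = correlation(xs, ys)
print(r)
```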
What is a random variable?
a quantity whose values are not known with certainty
What is a continuous random variable? Give an example.
A continuous random variable can take any value within an interval, so it has infinitely (uncountably) many possible values.
For example, a random variable measuring the time taken for something to be done is continuous since there are an infinite number of possible times that can be taken.
What is a discrete random variable? Give an example.
A discrete random variable can take only a countable number of values.
Example: when flipping a coin 3 times, the number of heads is a discrete random variable with 4 possible values (0, 1, 2, 3).
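The coin example can be enumerated directly. This sketch assumes a fair coin and lists all 8 equally likely outcomes of 3 flips.

```python
from collections import Counter
from itertools import product

# All equally likely outcomes of 3 flips of a fair coin (assumed fair).
outcomes = list(product("HT", repeat=3))

# The random variable: number of heads in each outcome.
heads_counts = Counter(outcome.count("H") for outcome in outcomes)

# Probability of each possible value 0, 1, 2, 3.
distribution = {k: heads_counts[k] / len(outcomes) for k in sorted(heads_counts)}
print(distribution)
```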
What is a random experiment?
An experiment whose outcome is determined by chance: it can be repeated in exactly the same way, yet a different outcome may occur each time.
Conditional Probability and formula
the probability that one event happens given that another event is already known to have happened
P(B | A) = P(A and B) / P(A)
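A sketch of the formula on a single roll of a fair die. The events A ("roll is even") and B ("roll is greater than 3") are hypothetical choices for illustration.

```python
from fractions import Fraction

sample_space = range(1, 7)  # faces of a fair die (assumed fair)

A = {x for x in sample_space if x % 2 == 0}  # even roll: {2, 4, 6}
B = {x for x in sample_space if x > 3}       # roll above 3: {4, 5, 6}

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

# Probability of B given that A is known to have happened.
p_b_given_a = prob(A & B) / prob(A)
print(p_b_given_a)  # 2/3
```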
What does inference refer to?
Drawing conclusions about the population from the sample
What is a representative sample?
AKA random sample: a randomly selected sample of subjects from a larger population of subjects
What's the difference between a sample statistic and a parameter?
A sample statistic is computed from the sample and serves as the estimate of the parameter, which describes the whole population.
What's the difference between a point estimate and an interval estimate?
Point estimation gives us a particular value as an estimate of the population parameter.
Interval estimation gives us a range of values which is likely to contain the population parameter. This interval is called a confidence interval.
confidence interval
the range of values within which a population parameter is estimated to lie
Margin of error and formula
Reported as the Confidence Level value on the Descriptive Statistics output
MOE = z-score × (SD / √n), where n is the sample size
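A worked sketch of the margin of error and the resulting confidence limits. The sample summary values are hypothetical, and the z-score comes from the standard normal distribution, which assumes a large sample.

```python
import math
from statistics import NormalDist

# Hypothetical sample summary (not from the notes).
sample_mean = 50.0
sample_sd = 10.0
n = 100
confidence = 0.95

# z-score that leaves (1 - confidence) / 2 in each tail, about 1.96 for 95%.
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

standard_error = sample_sd / math.sqrt(n)   # SD / square root of n
margin_of_error = z * standard_error        # MOE

lower_limit = sample_mean - margin_of_error
upper_limit = sample_mean + margin_of_error

print(round(margin_of_error, 2), round(lower_limit, 2), round(upper_limit, 2))
```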
Critical t
Critical t = MOE (the Confidence Level value) / SE
standard error
the standard deviation of a sampling distribution
SE = SD / square root of sample size
Confidence level
The degree of certainty (e.g., 95%) that the confidence interval contains the true population parameter
null hypothesis
H0: The null hypothesis states that a population parameter (such as the mean or the standard deviation) is equal to a hypothesized value. The null hypothesis is often an initial claim.
alternative hypothesis
The alternative hypothesis states that a population parameter is smaller, greater, or different than the hypothesized value in the null hypothesis. The alternative hypothesis is what you might believe to be true or hope to prove true.
Type 1 error
Rejecting the null hypothesis when the null hypothesis is true
Type 2 error
Failing to reject the null hypothesis when the null hypothesis is false
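A simulation sketch of the Type 1 error rate. It assumes a two-sided z-test with known SD and a true null hypothesis (population mean 0), so every rejection is a Type 1 error; the sample size, trial count, and seed are arbitrary choices.

```python
import random
from statistics import NormalDist

random.seed(0)  # fixed seed so the run is repeatable

alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
n, trials, sigma = 30, 2000, 1.0

rejections = 0
for _ in range(trials):
    # The null hypothesis (mean = 0) is true for every simulated sample.
    sample = [random.gauss(0, sigma) for _ in range(n)]
    sample_mean = sum(sample) / n
    z = sample_mean / (sigma / n ** 0.5)
    if abs(z) > z_crit:
        rejections += 1  # rejected a true null: Type 1 error

type1_rate = rejections / trials
print(type1_rate)  # should land near alpha
```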
What is the difference between the standard deviation and the standard error?
The standard deviation (SD) measures the amount of variability of a set of data around its mean, while the standard error (SE) measures how far the mean of the data is likely to be from the true population mean.
A clothing company wants to determine the factors that drive sales via the 800 telephone number. The company has data regarding (1) Customer Age; (2) Customer Credit Score; (3) Wait Time (min) to get an operator and (4) Purchase Amount.
• State a null hypothesis with regard to Customer Age.
• State the alternative hypothesis.
• What would be an independent variable in a regression analysis? Dependent variable(s)?
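Null hypothesis: Customer Age has no relationship with Purchase Amount (its regression coefficient is 0).
Alternative hypothesis: Customer Age is related to Purchase Amount (its regression coefficient is not 0).
Independent: Customer Age (Customer Credit Score and Wait Time could also be used)
Dependent: Purchase Amount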
What is the purpose of a regression analysis?
To predict the outcome of the dependent variable based on values of the independent variable(s).
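A minimal sketch of a one-variable least-squares regression used for prediction. The ages and purchase amounts are made-up numbers, and `fit_line` is a hypothetical helper, not a method named in these notes.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: independent variable (age), dependent variable (purchase).
ages = [25, 35, 45, 55]
purchases = [40, 60, 80, 100]

slope, intercept = fit_line(ages, purchases)

# Predict the dependent variable for a new value of the independent variable.
predicted = intercept + slope * 50
print(slope, intercept, predicted)
```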
What are three ways of determining the significance of a coefficient?
(1) comparing the actual t to the critical t (significant if |actual t| > critical t);
(2) examining whether the p-value < α; and
(3) determining whether 0.00 is in the relevant confidence interval (significant if it is not).
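A sketch showing that the three checks agree. The coefficient and standard error are hypothetical, and the normal distribution stands in for the t distribution, which is an assumption that is reasonable only for large samples.

```python
from statistics import NormalDist

# Hypothetical regression output (not from the notes).
coefficient = 2.0
standard_error = 0.5
alpha = 0.05

# ASSUMPTION: large sample, so a normal critical value approximates critical t.
critical_value = NormalDist().inv_cdf(1 - alpha / 2)

# (1) Compare the actual t to the critical value.
actual_t = coefficient / standard_error
significant_by_t = abs(actual_t) > critical_value

# (2) Compare the two-sided p-value to alpha.
p_value = 2 * (1 - NormalDist().cdf(abs(actual_t)))
significant_by_p = p_value < alpha

# (3) Check whether 0 falls inside the confidence interval.
lower = coefficient - critical_value * standard_error
upper = coefficient + critical_value * standard_error
significant_by_ci = not (lower <= 0 <= upper)

print(significant_by_t, significant_by_p, significant_by_ci)
```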
Training data set
Used to estimate models with different explanatory variables. - RAW DATA
Validation data set
Apply these data to your model results and compute a summary statistic. This helps prevent overfitting. Select the specification that fits best.
Example:
Of the 300 obs, you may want to use 200 for validation and 100 for the test data set. Often, analyses only use training and validation sets. - SELECT SPECIFIC DATA FROM OVERALL/RAW
Testing data set
Apply the final specification to a pristine set of data to arrive at the final performance statistics. - FINAL
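A sketch of splitting one data set into training, validation, and test sets. The 200 and 100 sizes follow the example above; the total of 1000 observations, the seed, and the helper name `split_dataset` are assumptions for illustration.

```python
import random

def split_dataset(rows, n_validation, n_test, seed=0):
    """Shuffle a copy of the rows, then carve off test and validation sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    test = shuffled[:n_test]
    validation = shuffled[n_test:n_test + n_validation]
    training = shuffled[n_test + n_validation:]  # everything that remains
    return training, validation, test

# Hypothetical data set of 1000 observations (row ids only).
rows = list(range(1000))
training, validation, test = split_dataset(rows, n_validation=200, n_test=100)

print(len(training), len(validation), len(test))  # 700 200 100
```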
