Stats
Spring'15
Terms in this set (95)
Levels of Measurement
Nominal
Ordinal
Interval
Ratio
Nominal
differentiates between items or subjects based only on their names or (meta-)categories and other qualitative classifications they belong to.
Ordinal
allows for rank order (1st, 2nd, 3rd, etc.) by which data can be sorted, but still does not allow for relative degree of difference between them.
Interval
allows for the degree of difference between items, but not the ratio between them.
Ratio
takes its name from the fact that measurement is the estimation of the ratio between a magnitude of a continuous quantity and a unit magnitude of the same kind.
Measurements of central Tendency
Mode
Median
Mean
Mode
Most frequent score in a distribution
Median
ordering all scores of a distribution by their size and then determining the central score(s).
Mean
Average
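As a quick sketch (with made-up scores), Python's standard statistics module computes all three measures of central tendency directly:

```python
from statistics import mean, median, mode

scores = [2, 3, 3, 4, 5, 5, 5, 7]  # invented example data

print(mode(scores))    # most frequent score in the distribution
print(median(scores))  # central value after ordering: (4 + 5) / 2
print(mean(scores))    # arithmetic average
```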
Measures of Dispersion
Range
Interquartile Range
Variation- Absolute deviation - Variance -Standard Deviation
Comparison
Range
Difference between the highest and the lowest score
Interquartile Range
Difference between the highest score of the lower 25% of a distribution and the lowest score of the highest 25% of a distribution
Interquartile Range - Formula
IR=Q3-Q1
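A minimal illustration of IR = Q3 − Q1 using Python's standard library (the scores are invented; the "inclusive" quartile method is one of several interpolation conventions):

```python
from statistics import quantiles

scores = [1, 2, 3, 4, 5, 6, 7, 8]  # invented example data

# quantiles(..., n=4) returns the three quartile cut points [Q1, Q2, Q3]
q1, q2, q3 = quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1
print(q1, q3, iqr)
```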
Variation
A measure that takes each and every score into account: how much each score deviates from the mean of the scores.
Average Deviation
Average of all absolute deviations of individual scores from the arithmetic mean of the distribution. AD = (Σ|xi − AM|)/n
Deviation of a score from the mean
X - μ.
-If the mean is higher than X you get a negative score, which just tells you that the score is lower than the mean. We are not interested in whether the score is higher or lower, but in how far away the score is from the mean. What we need is a way of adding up the deviations so that they don't cancel each other out.
-There are two solutions:
1. Absolute deviation ||
2. Variance ( )²
Absolute Deviation
We use the absolute value of the scores to avoid cancellation.
Absolute deviation: |X-μ|
To find the average deviation we divide by the number of scores, denoted by N.
Mean Absolute Deviation= Σ |X-μ|/N
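The formula above, sketched in Python on invented scores:

```python
from statistics import mean

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # invented example data
mu = mean(scores)

# Mean absolute deviation: average of |X - mu| over all N scores
mad = sum(abs(x - mu) for x in scores) / len(scores)
print(mad)
```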
Variance
We square scores, as square of a number is always positive.
Variance is defined as the average of all squared deviations of individual scores from the arithmetic mean of a distribution.
Find the deviation of each score from the mean, square each deviation, add up the squared deviations, and divide this figure by the number of scores (N) to find the average of the squared deviations = variance = s² = Σ(X−μ)²/N
Variance of a Population (estimated from a sample)
Defined as the sum of all squared deviations of individual scores from the arithmetic mean of a distribution, divided by the sample size reduced by 1 (dividing by n − 1 rather than n corrects the bias that comes from estimating the mean from the same sample).
Estimated population variance = σ² = [Σ(xi − AM)²]/(n − 1)
Standard Deviation
Defined as the square root of the variance.
We need to do so, to invert the effect of the variance, in order to get the standard deviation and not the squared deviation.
s = √s² = √(Σ(X−μ)²/N)
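Variance and standard deviation computed by hand on invented scores, cross-checked against the standard library:

```python
from math import sqrt
from statistics import pstdev, pvariance, mean

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # invented example data
mu = mean(scores)

# Variance: average squared deviation from the mean (population form, /N)
variance = sum((x - mu) ** 2 for x in scores) / len(scores)
# Standard deviation: square root of the variance, undoing the squaring
std = sqrt(variance)

print(variance, std)
assert variance == pvariance(scores) and std == pstdev(scores)
```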
Z-Score or standard Score
The position of a score within a distribution of scores. Describes the distance of a raw score from the distribution mean in relation to standard deviation. Tells how many standard deviations a score is from the mean of the distribution.
z = (X − μ)/σ
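A minimal z-score helper (the IQ figures are just the conventional illustration of a scale with mean 100 and SD 15):

```python
# z = (X - mu) / sigma : distance from the mean in standard-deviation units
def z_score(x, mu, sigma):
    return (x - mu) / sigma

print(z_score(130, 100, 15))  # 2 SDs above the mean
print(z_score(85, 100, 15))   # 1 SD below the mean
```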
Population
the entire aggregation of items from which samples can be drawn.
Parameter for population characteristics.
Sample
A selection from a larger population that is statistically representative of that population.
Statistic for sample characteristics.
Standard Error of the Mean
-The standard deviation of the distribution of sample means. It is a measure of the standard (average) difference of a sample mean from the mean of all sample means of samples of the same size from the same population.
-To calculate the precision of the estimate of the population mean from a sample.
-How well the sample mean estimates the population
σAM=√(σ²/n)
σAM = standard error of the arithmetic mean
σ² = variance of the population
n = sample size.
The formula assumes that the variance of the population is known (rarely the case). Under the assumption that the sample is a random sample from the population, the population variance can, however, be estimated from the sample variance; the sample variance then has to be multiplied by a factor of n/(n − 1).
σAM = √([Σ(xi − AM)²] / (n(n − 1)))
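A sketch of the sample-based formula on invented data:

```python
from math import sqrt
from statistics import mean

sample = [4, 6, 6, 8, 10, 14]  # invented example data
n = len(sample)
am = mean(sample)

# Population variance estimated from the sample (divide by n - 1),
# then standard error of the mean: sigma_AM = sqrt(est_var / n)
est_var = sum((x - am) ** 2 for x in sample) / (n - 1)
sem = sqrt(est_var / n)
print(sem)
```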
Degrees of Freedom
The degrees of freedom is the number of scores we need to know before we can work out the rest using the information we already have. It is the number of scores that are free to vary in the analysis.
Number of scores that can vary.
Central Limit Theorem
The larger the samples we select, the closer the distribution of sample means approaches the normal distribution. A bell-shaped curve also emerges when two samples are drawn from any distribution of scores, the two sample means are calculated, and one is subtracted from the other.
If all samples of a particular size are selected from any population, the sampling distribution of the sample mean is approximately a normal distribution.
Confidence Interval
A range of values constructed from sample data so that the population parameter is likely to occur within that range at a specified probability.
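A rough 95% interval sketch on invented data (1.96 is the large-sample z value; for a sample this small a t critical value would really be needed):

```python
from math import sqrt
from statistics import mean, stdev

sample = [4, 6, 6, 8, 10, 14]  # invented example data
am = mean(sample)
sem = stdev(sample) / sqrt(len(sample))  # stdev divides by n - 1

# Approximate 95% confidence interval around the sample mean
lo, hi = am - 1.96 * sem, am + 1.96 * sem
print(lo, hi)
```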
Hypothesis
A predicted relationship between variables.
Types of Hypothesis:
-Null
-Alternative
-One-tailed
-Two-tailed
Null Hypothesis (H0)
all is correct/all is as expected.
The working hypothesis that parameters of a sample and a known population, or of two or more samples do not differ.
Hypothesis that predicts NO relationship between variables. The aim of research is to reject this hypothesis
Alternative Hypothesis (H1)
The working hypothesis that parameters of a sample and a known population, or of two or more samples do differ.
Hypothesis that we are trying to prove (opposite of the null hypothesis).
The hypothesis that states there is a difference between two or more sets of data.
One-tailed Hypothesis
A working Hypothesis that expects a sample parameter to be (e.g.) larger than a population parameter, or the parameter of one sample to be (e.g) larger than that of one or more other samples.
a hypothesis with direction *ex: A will increase B
Two-tailed Hypothesis
A working Hypothesis that expects a sample parameter to differ from a population parameter OR from the same parameter in one or more other samples.
Hypothesis that predicts a significant relationship or difference, but does not indicate the specific nature of the relationship.
Statistical significance
A statistical statement of how likely it is that an obtained result occurred by chance.
The significance level is the risk (probability) of erroneously claiming a relationship between an independent and a dependent variable when there is none. Statistical tests are undertaken so that this probability is chosen to be small, usually set at 0.05, indicating that this will occur no more than 5 times in 100. This sets the probability of making a Type I error.
Error in statistical Inference
Errors are inevitable because, when you secure knowledge by analysing sample data but want to generalise to populations, there can never be certainty.
Type I (α error)
Type II (β error)
Type I (α error)
-Finding a difference in the sample that does not exist in the population.
-Errors that you commit when- in studying a sample and deciding on the basis of the sample about a population- you accept the alternative hypothesis, although in the population the null hypothesis is true.
-Probability: α, conventionally set at 0.05.
Type II (β error)
-Failing to find in the sample a difference that does exist in the population.
-Error that you commit when you decide sample-based the null hypothesis is true, although in the population the alternative hypothesis is true.
-Probability: β.
statistical comparison of two means
Comparison of two means can take several forms.
- whether the mean of a given sample differs from that of a (known) population.
-whether the means of two samples differ on a given variable.
-whether two samples of related objects/people differ in their means.
-whether means on two equally-scaled variables differ in a given sample.
-whether the mean of a variable differs in a sample of people between time 1 and time 2.
t-test
Used to test whether the difference between a sample mean and a known population mean or two sample means from independent and from dependent samples is statistically significant.
A group of statistics used to determine whether a significant difference exists between the means of two sets of data.
Student t-distribution
Means of smaller samples [unlike those of larger samples which follow the normal distribution in accordance with the central limit theorem] follow another bell-shaped sampling distribution
a family of probability distributions described by continuous, symmetrical, bell-shaped curves which are slightly flatter than the normal curve, but that approach the normal curve as the sample size increases.
Independent Sample
Samples that originate from independent drawings of samples.
Sample data that is independent or not related to each other.
Dependent Samples
Can Come in three ways:
-as a result of a dependent drawing of samples.
-by measuring the same object/individual on more than one variable.
-by measuring the same object/individual more than one time.
Independent sample t-test
One-tailed H1
Two-tailed H1
Independent sample t-test One-tailed H1
-To test whether the mean of one of the two independently drawn samples is bigger or smaller than (> / <) the mean of the other sample.
Degrees of freedom=df=(n1-1)+(n2-1)
One-tailed Alternative hypothesis: μ1> μ2 | p≤0.05
Independent sample t-test Two-tailed H1
-To test whether mean of one of the two independently drawn samples differs from the mean of the other sample.
- No a priori expectation.
Two-tailed Alternative hypothesis: μ1 ≠ μ2 | p≤0.05
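A hand-rolled pooled-variance t statistic for two invented independent samples (only the statistic and df are computed here; the p-value would come from a t table or a stats library):

```python
from math import sqrt
from statistics import mean

# Invented scores for two independent groups of n = 8 each
g1 = [5, 7, 5, 3, 5, 3, 3, 9]
g2 = [8, 1, 4, 6, 6, 4, 1, 2]

def pooled_t(a, b):
    """Independent-samples t with pooled variance; df = (n1-1)+(n2-1)."""
    n1, n2 = len(a), len(b)
    m1, m2 = mean(a), mean(b)
    ss1 = sum((x - m1) ** 2 for x in a)
    ss2 = sum((x - m2) ** 2 for x in b)
    pooled_var = (ss1 + ss2) / (n1 - 1 + n2 - 1)
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, n1 - 1 + n2 - 1

t, df = pooled_t(g1, g2)
print(t, df)  # compare |t| against the critical value for these df
```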
Dependent sample t-test
One-tailed H1
Two-tailed H1
Repeated measures
Dependent sample test One-tailed H1
-To test whether the mean of one of the items of one of the dependent samples is bigger or smaller than (> / <) the mean of the item of the other sample.
One-tailed Alternative hypothesis: μ item 1 > μ item 2 | p≤0.01
Dependent sample test Two-tailed H1
-To test whether the mean of one of the items of one of the dependent samples differs from the mean of the item of the other sample.
Two-tailed Alternative hypothesis: μ item 1 ≠ μ item 2 | p≤0.05
Dependent sample test Repeated measures
Comparison of the means over time.
Assumptions of the t-test
To correctly decide quantitative hypotheses with the t-test:
1. The known population is normally distributed.
2. The sample is randomly selected.
3. The standard deviation of the unknown population is the same as that of the known population.
4. The variance in the (two) samples needs to be equal.
5. The quantitative material has to be of at least interval scale.
Kolmogorov-Smirnov test: to check normal distribution.
Levene's test: to check homogeneity of variance (assumption met when p is above 0.05).
For both tests an insignificant test statistic indicates conformity of the data to the pertinent assumption.
What do we do if any of the so-called assumptions of the t-test is obviously violated?
We use a test instead that does not make these assumptions: Non-parametric or Distribution free test.
Non parametric test
-Statistical tests that do not use, or make assumptions about, the characteristics (parameters) of populations.
-are more conservative.
-are less efficient than parametric tests if all mathematical and measurement assumptions are met.
-no assumptions made.
Parametric test
-Statistical tests that use characteristics (parameters) of populations or estimates of them (when assumptions are also made about the populations under study).
-are more robust.
-may remain more efficient even with mild violations of the assumptions.
-acquire a progressive bias in favour of the alternative hypothesis when two of their assumptions are violated.
Ranking
...
Non parametric two sample test
Mann-Whitney U test (Independent)
Wilcoxon Signed-Ranks Test (dependent)
Mann-Whitney U-test
Calculates the difference between the sample's actual ranks and the maximum ranks they could have got, with a small value of U indicating a group is close to the top.
Uses ordinal scale data: 2 independent groups; equivalent independent sample t-test.
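A hand computation of U on two invented tie-free samples:

```python
# Hand-rolled Mann-Whitney U for two small independent samples
# (invented data with no ties, so simple 1..n ranking suffices)
a = [3, 4, 2, 6]
b = [9, 7, 5, 10]

combined = sorted(a + b)
rank = {v: i + 1 for i, v in enumerate(combined)}  # ranks 1..n

r1 = sum(rank[v] for v in a)       # rank sum of group a
n1, n2 = len(a), len(b)
u1 = r1 - n1 * (n1 + 1) / 2        # actual minus minimum possible ranks
u2 = n1 * n2 - u1
u = min(u1, u2)  # a small U means one group clusters near one end
print(u)
```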
Wilcoxon Signed-Ranks Test
Used when comparing two related samples, matched samples, or repeated measurements on a single sample to assess whether their population mean ranks differ.
Analysis of variance or ANOVA
Compares more than two groups to evaluate whether there are differences between any of the groups.
Test that could simultaneously check whether the overall null hypothesis of all means being equal could be rejected for a given empirical result.
One-Way ANOVA
Tests whether the means of one outcome or dependent variable are significantly different across groups.
Test the null Hypothesis that k populations have the same means by comparing the variability between groups to the variability within groups
Relates the so called treatment variance (variance between groups) and error variance (variance within groups).
ANOVA design where we tested the impact of one categorical 'treatment' (independent variable) on one continuous outcome (dependent variable).
Null hypothesis of one-way ANOVA
No difference between any of the means of k samples.
H0: μ1 = μ2 = ..... μk
Alternative hypothesis of one-way ANOVA
At least one of the k means differs from the others.
H1: μj ≠ μj'
Degrees of Freedom Error
d.f. error= k(n - 1)
F statistic
Statistic used to indicate average amount of difference between group means relative to the average amount of variance within each group.
The ratio of the between-groups sum of squares divided by its degrees of freedom to the within-groups sum of squares divided by its degrees of freedom.
F = [SSB/(k−1)] / [SSW/(k(n−1))]
If variance between groups is bigger than variance within groups, that tells us that the variation of the data is due mostly to the difference between the means and less to the variation within the means, and vice versa.
Degrees of Freedom Treatment
d.f. treatment = k - 1
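The F ratio computed by hand on an invented balanced design with k = 3 groups of n = 4:

```python
from statistics import mean

# Invented one-way ANOVA data: k = 3 groups, n = 4 scores each
groups = [[3, 5, 4, 4], [8, 6, 7, 7], [5, 5, 6, 4]]
k = len(groups)
n = len(groups[0])

grand = mean(x for g in groups for x in g)
# Treatment (between-groups) and error (within-groups) sums of squares
ssb = sum(n * (mean(g) - grand) ** 2 for g in groups)
ssw = sum((x - mean(g)) ** 2 for g in groups for x in g)

f = (ssb / (k - 1)) / (ssw / (k * (n - 1)))
print(f)  # compare against the F critical value for (k-1, k(n-1)) df
```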
F-test shows a significant result
When it indicates that the empirical result has a probability at or below a pre-selected level, we only know that there is at least one mean deviating from the others, but we do not yet know which one it is.
To know which mean is the one deviating
Post hoc test. Latin: "after this".
> Multiple comparison tests > Tukey & Scheffé tests
One-Way ANOVA single comparisons
To figure out which sample mean is deviating by using a post hoc test (e.g. Scheffé).
One-Way ANOVA trend tests or polynomial contrasts.
Whether the means across several samples follow a trend.
in principle, checks whether means go down (or up) linearly.
Linear trend
Up or down
Quadratic trend
U-shaped or inverted U-shaped
Cubic trend
That of repetitive but regular ups and downs.
Two-way ANOVA
Designs where the influence of more than one IV on one DV can be tested.
Hypothesis:
The three alternative hypotheses assume that at least one 'cell' mean differs from the other 'cell' means.
-All means for the k levels of IV A are expected not to differ. H0A:μ1 =μ2 =.....μk
-All means for the l levels of IV B are expected not to differ. H0B: μ1 = μ2 = ..... μl
-All means of the k x l combinations of IVs A and B are expected not to differ.
H0AxB: μ1 = μ2 = ..... μk x l
degrees of freedom : k x l (n - 1), where k is the number of levels of IV A, l is the number of levels of IV B.
In SPSS: GLM Procedure. General Linear Model.
SECOND HALF
OF THE SEMESTER
Multivariate ANOVA or MANOVA
Test the impact of one or more IVs on multiple DVs.
Marginal Significance
Reported when the result is significant on the 10% level but not on the 5% level.
Eta Squared - η2
It is often used to estimate whether the size of an effect is small, medium, or large.
The amount of variance explained in the dependent variable on the basis of an independent variable or an interaction of them.
Significance of an interaction is not enough to decide whether an effect is due to the interaction; the effect size of the significant effect must also be considered.
η2 ~ .01 = Small; almost never substantial even if significant.
η2 ~ .06=Medium
η2 ~ .14=Large
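η² computed by hand for the one-way case (invented scores), as SS-between over SS-total:

```python
from statistics import mean

# Invented data: eta squared = proportion of total variance in the DV
# explained by group membership (SS_between / SS_total)
groups = [[3, 5, 4, 4], [8, 6, 7, 7], [5, 5, 6, 4]]
scores = [x for g in groups for x in g]
grand = mean(scores)

ss_total = sum((x - grand) ** 2 for x in scores)
ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)

eta2 = ss_between / ss_total
print(eta2)  # read against the ~.01 / ~.06 / ~.14 benchmarks above
```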
Repeated measures ANOVA
ANOVA comparing within each person/sample
Analysis of Covariance or ANCOVA
Covariance is a measure of linear association between two variables, (i.e. how much a change in one variable is linearly associated with a change in another variable).
ANCOVA is a general linear model which blends ANOVA and regression.
ANCOVA evaluates whether population means of a dependent variable (DV) are equal across levels of a categorical independent variable (IV) often called a treatment, while statistically controlling for the effects of other continuous variables that are not of primary interest, known as covariates (CV).
Nonparametric ANOVA
In cases where data are clearly not interval scaled, or they are clearly not normally distributed with small samples (n per 'cell' < 30), or variances are obviously heterogeneous between 'cells,' parametric ANOVA (as learned up to now) should not be used, because there is danger that it produces false results. In such cases, particularly in the case of repeated measures ANOVA one should rather resort to nonparametric alternatives.
Nonparametric one-way ANOVA
nonparametric one-way ANOVA works exactly like a Wilcoxon Rank Sum test for more than 2 samples.
The Nonparametric One-Way ANOVA task enables you to perform nonparametric tests for location and scale when you have a continuous dependent variable and a single independent classification variable. You can perform a nonparametric one-way ANOVA using Wilcoxon (Kruskal-Wallis), median, Van der Waerden, and Savage scores. In addition, you can test for scale differences across levels of the independent variable using Ansari-Bradley, Siegal-Tukey, Klotz, and Mood scores. The Nonparametric One-Way ANOVA task provides asymptotic and exact p-values for all tests for location and scale.
Nonparametric two-way ANOVA (not covered)
Two-way ANOVA is quite cumbersome to perform as a nonparametric test. It needs a procedure called data alignment before it can be performed. Before running tests for the two main effects and the interaction effect one has to eliminate the impact of the non-tested effects prior to ranking.
Nonparametric repeated measures ANOVA (not covered)
For the so-called Friedman test, we perform another type of rank transformation: we just rank order the scores of every person across time.
Binomial test
is an exact test of the statistical significance of deviations from a theoretically expected distribution of observations into two categories.
The binomial test is more precise in the case of dichotomous variables than the χ²-test.
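An exact one-tailed binomial tail probability, computed from the binomial formula on invented numbers (8 successes in 10 trials under a null p = 0.5):

```python
from math import comb

# Exact one-tailed binomial test: probability of k or more successes
# in n trials if the null success probability is p
def binom_tail(k, n, p):
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# e.g. 8 or more heads in 10 fair coin flips
p_value = binom_tail(8, 10, 0.5)
print(p_value)
```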
χ²-test (Chi-square test)
The appropriate test to check whether the empirical frequencies of any event are compatible with the frequencies expected under the null hypothesis.
Testing frequencies of an event in essence means that we are testing whether a certain nominally-scaled variable is distributed in a certain way.
Compares expected distributions under the null hypothesis to empirically observed distributions.
The χ²-test should not be used with sample sizes that lead to expected cell frequencies of less than 5 under the tested null hypothesis. If there are too many such expected cell frequencies, the determined χ²-score may be invalid.
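The χ² statistic itself is just Σ(O − E)²/E; a sketch with invented frequencies (all expected counts ≥ 5, per the rule above):

```python
# Chi-square goodness-of-fit statistic on invented frequency data
observed = [50, 30, 20]
expected = [40, 40, 20]  # frequencies expected under H0

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi2)  # compare against the critical value for df = k - 1 = 2
```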
Correlation
The correlation assesses the degree of similarity in the scores of two different variables. A correlation coefficient shows both the direction and strength of association between two variables. It can take values from -1 (perfect negative association) through 0 (no association) to +1 (perfect positive association)
Product-moment/zero order/Pearson-Bravais Correlation
can be used to not only describe the similarity of the scores of two variables, but to also test whether the similarity deviates significantly from 0.
Parametric
measure of the linear correlation (dependence) between two variables X and Y, giving a value between +1 and −1 inclusive, where 1 is total positive correlation, 0 is no correlation, and −1 is total negative correlation.
measure of the degree of linear dependence between two variables.
Spearman's rank correlation
nonparametric measure of statistical dependence between two variables.
It assesses how well the relationship between two variables can be described using a monotonic function. If there are no repeated data values, a perfect Spearman correlation of +1 or −1 occurs when each of the variables is a perfect monotone function of the other.
Kendall's τ (tau)
exists in three versions (τa, τb, τc).
to measure the association between two measured quantities.
A tau test is a non-parametric hypothesis test for statistical dependence based on the tau coefficient.
Contingency Coefficient / Cramér's Index
In substantive terms the coefficient expresses, how the frequency distribution of variable A is impacted by variable B and vice versa.
a measure of association between two nominal variables, giving a value between 0 and +1 (inclusive).
"Please note that this coefficient does NOT express how similar the gender distributions are across the colleges. What it does express is the degree to which knowledge about one's college of residence offers at the same time knowledge on the gender of that participant (or vice versa)."
Correlation and determination
Correlation coefficient is a measure of the similarity of scores for two variables.
Sometimes the question, however, is how much information overlap exists between two variables.
This figure is typically sought in percent. In order to obtain such a figure, we simply square the correlation coefficient. The result—often notated R2, but sometimes also r2—is the coefficient of determination. Reading it as a percent value is common.
Similarity and Prediction
a question of prediction of one variable on the grounds of knowledge about the other.
"sat score to predict academic performance."
Regression
-Regression equation does not allow point-predictions, but only best estimates of the 'true scores' on the predicted variable.
-What is important in the equation is not so much the constant a, but much more so the weight b, which is an indicator for the predictive power of x in the prediction of y.
-Statistical process for estimating the relationships among variables.
-Regression analysis helps one understand how the typical value of the dependent variable (or 'criterion variable') changes when any one of the independent variables is varied, while the other independent variables are held fixed.
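A least-squares fit of y = a + b·x on invented points, matching the description above (b carries the predictive power; predictions are best estimates, not point guarantees):

```python
from statistics import mean

# Invented data for a simple regression y = a + b*x
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

mx, my = mean(xs), mean(ys)
# Least-squares slope: covariation of x and y over variation of x
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
    (x - mx) ** 2 for x in xs
)
a = my - b * mx  # the fitted line passes through (mean x, mean y)

print(a, b)
predicted = a + b * 6  # best estimate of y for a new x = 6
```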
Standardized regression coefficients, so-called ß (beta) coefficients
These ß coefficients are b weights standardised with respect to the standard deviations of the predictor and the dependent:
ß = b * σ predictor / σ dependent
The standardized coefficient in our example is 0.28. It tells us that a one standard deviation increase in 'satisfaction with food' results in 0.28 standard deviations increase in 'pleasure with studying.'
Partial Correlation
-Measures the degree of association between two random variables, with the effect of a set of controlling random variables removed.
-Classic example: storks, babies, and urbanization.
-Formally, the partial correlation between X and Y given a set of n controlling variables Z = {Z1, Z2, ..., Zn}, written ρXY·Z, is the correlation between the residuals RX and RY resulting from the linear regression of X with Z and of Y with Z, respectively.
Multiple Regression
A first attempt to make predictions is to calculate four simple regressions.
Doing simple regressions does not take the possibility into consideration that the predictors may themselves be related to each other.
By drawing a 'line', called vector by mathematicians, through a multi-dimensional cloud of bivariate points in space, we adjust our predictions: Beer Intake = a + b1X1 (Money) + b2X2 (Stress) + b3X3 (Country) + b4X4 (College)
The coefficients b for the four predictor variables are then estimates of the pure predictive capacity of the four separate predictor variables.
This means that prediction contribution sizes are given that reflect the strength of the four predictors, given that no relationship existed among the predictors.
Please urgently note that if you want to use a nominally scaled variable in a regression analysis, you have to enter it into your equation as a dichotomous variable, juxtaposing one category with all others.
for predicting the unknown value of a variable from the known value of two or more variables- also called the predictors.