Statistics Final
Terms in this set (124)
Sample
The part of the population about which you actually have information. The symbol for its size is n.
Population
Entire set of things of interest. The symbol for its size is N.
Simple Random Samples
Method of choosing a sample in which each individual in the population has an equal chance of being selected.
Stratified Sampling
Method of choosing a sample in which the population is separated into groups (strata) based on some characteristic, and then a proportional simple random sample is taken from each group.
Systematic Sampling
A systematic sample is obtained by selecting every kth individual from the population. The first individual corresponds to a random number between 1 and k.
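The every-kth rule can be sketched in a few lines of Python (the population of 100 numbered individuals and k = 10 are assumptions for illustration):

```python
# Minimal sketch of systematic sampling: pick a random start in 1..k,
# then take every kth individual after it.
import random

def systematic_sample(population, k):
    """Select every kth individual, starting at a random point between 1 and k."""
    start = random.randint(1, k)       # the random first individual (1-based)
    # Convert to 0-based indexing and step through the population by k.
    return population[start - 1::k]

population = list(range(1, 101))       # individuals numbered 1..100
sample = systematic_sample(population, 10)
# Every selected individual is exactly k = 10 positions apart.
```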
Cluster Sample
Cluster sampling is used when the population is already naturally broken up into groups (clusters), and each cluster represents the population. That way we can just select certain clusters.
Haphazard Selection or Convenience Sampling
Method of selecting a sample of individuals to study by taking whoever is available or happens to be first on a list. This method of selection can result in a sample that is not representative of the population.
Probability
The likelihood that an outcome of interest will occur
Steps for Figuring Probability
1. Determine the number of possible successful outcomes (outcomes of interest).
2. Determine the number of all possible
outcomes.
3. Divide the number of possible successful outcomes by the number of all possible outcomes.
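The three steps above can be sketched in Python; the fair six-sided die and the "even number" outcome of interest are assumptions for illustration:

```python
# Probability of rolling an even number on a fair six-sided die.
outcomes = [1, 2, 3, 4, 5, 6]                    # step 2: all possible outcomes
successes = [o for o in outcomes if o % 2 == 0]  # step 1: outcomes of interest
p = len(successes) / len(outcomes)               # step 3: divide
# p is 0.5, which lies between 0 and 1 as every probability must.
```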
p
p is a symbol for probability
Range of Probabilities
Probability cannot be less than 0 or greater than 1.
Descriptive Statistics
Used to summarize and describe a group of numbers from a research study
Inferential Statistics
Procedures for drawing conclusions based on the scores collected in a research study but going beyond them
Variable
Characteristic or condition that can have different values (e.g., sex, race, birth order, temperature, level of stress)
Value
Number or category that a variable can have (e.g., male/female, 0-10)
Data (Plural)
Measurements or observations of a variable
Datum (Singular)
A single measurement or observation
Nominal / Qualitative Variable
Variable that has values that are names or categories. No natural order.
Numeric / Quantitative Variable
Variable that has values that are numbers
Level of Measurement
1. Nominal
2. Ordinal / Rank-Order
3. Interval / Equal-interval
4. Ratio
Nominal Measurement
Variable in which values are categories with no natural order (e.g. gender, religion, ethnicity)
Ordinal/Rank-Order Measurement
Numeric variable in which values correspond to the relative
position of things measured (e.g., class standing, birth order, position in a race)
Interval/Equal-Interval Measurement
Numeric variable in which differences between values correspond to differences in the underlying thing being measured. 0 does not mean absence of the thing being measured (e.g., temperature (°F or °C), ratings of mood)
Ratio Measurement
Same as interval EXCEPT 0 means absence of thing being
measured (e.g., runners' times in race, weights of 4th graders)
Central Tendency
Uses a single number to describe a group of scores. Defines the central point of the distribution. Central point defined differently for each measure of central tendency. Mean, Median, Mode.
Mean
Average value. Sum of all values divided by the total number of values.
Rounding Rule for Statistical Calculations
State your answers with one more decimal place of precision than is found in the raw data.
Population Mean
The symbol for the mean is µ
Sample Mean
The symbol for the mean is X̄ (X-bar) or M
Median
The middle score when all of the scores are lined up from lowest to highest
Mode
The most common value(s) in a distribution
Comparing Measures of Central Tendency
The median is better than the mean or mode as a representative value when a few extreme scores would strongly affect the mean but not the median.
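The three measures of central tendency, and the mean's sensitivity to an outlier, can be sketched with Python's standard library (the scores are made up for illustration):

```python
# Mean, median, and mode, plus the effect of one extreme score.
import statistics

scores = [2, 3, 3, 4, 5]
mean = statistics.mean(scores)       # sum of values / number of values
median = statistics.median(scores)   # middle score when sorted
mode = statistics.mode(scores)       # most common value

with_outlier = scores + [50]         # add one extreme score
# The mean jumps well above the other scores; the median barely moves.
```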
Outlier
An extreme value that is much higher or much lower than the
others
Frequency
Number of scores with a particular value. (If 5 students reported that their level of happiness with their living arrangements in Doom Hall was a 2 on a 0-10 scale, the frequency for a rating of 2 would be 5)
Frequency Tables
A table displaying the pattern of frequencies over different values
Histogram
Graph of the information on a frequency table. The height of each bar is the frequency of each value in the frequency table
Frequency Distributions
Show the pattern of frequencies over the various values (how the frequencies are spread out)
Unimodal Distribution
A histogram with one very high area
Bimodal Distribution
A distribution with two fairly equal high points
Multimodal Distribution
A distribution with two or more high points (a bimodal distribution is one type of multimodal distribution)
Rectangular Distribution
When all values have approximately the same frequency
The Normal Curve
Bell-shaped, unimodal, and symmetrical
Symmetric Distributions
Have approximately the same number of values on both sides of the distribution.
Skewed Distributions
Distributions where the scores pile up on one side of the middle. On the other side of the middle, the scores are spread out.
Positively Skewed Distribution
Tail is to the right
Negatively Skewed Distribution
Tail is to the left
Floor Effect
Scores pile toward the lower end of the distribution because it is not possible to have a lower score (e.g., number of children)
Ceiling Effect
Scores pile toward the upper end of the distribution because it is not possible to have a higher score (e.g., scores on a very easy statistics test)
Heavy-Tailed Distribution
There are many scores in the tails (the tails are thick).
Light-Tailed Distribution
There are few scores in the tails (the tails are thin)
Variability
How spread out the scores are in a distribution. Amount of spread of the scores around the mean. (Distributions with the same mean can have very different amounts of spread around the mean. Distributions with different means can have the same amount of spread around the mean)
Range
Highest Value - Lowest Value
Variance
Measure of how spread out a set of scores is:
1. Find a central point
2. Find distances or deviations to that point
3. Add these and take the average (mean)
How to find Variance
To calculate the variance of a distribution:
1. Find the deviation score for each score. (Subtract the mean from each score (X-M))
2. Find the Squared Deviation score for each score (Square each of these deviation scores)
3. Find the sum of the Squared Deviations (Add up the squared deviation scores to get the sum of the Squared Deviations)
4. Find the average of the Squared Deviations (Divide the sum of the squared deviations by the number of scores/values to get the average of the Squared Deviations)
Standard Deviation
Return to the original scale. Most widely used way of describing the spread of a group of scores. The average amount the scores differ from the mean. To calculate the standard deviation: Take the square root of the variance
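The four variance steps and the final square root can be written out by hand (population formula, dividing by the number of scores as in step 4; the scores are made up for illustration):

```python
# Variance step by step, then the standard deviation as its square root.
import math

scores = [2, 4, 6, 8]
M = sum(scores) / len(scores)             # the mean
deviations = [x - M for x in scores]      # step 1: deviation scores (X - M)
squared = [d ** 2 for d in deviations]    # step 2: squared deviations
ss = sum(squared)                         # step 3: sum of squared deviations
variance = ss / len(scores)               # step 4: average of squared deviations
sd = math.sqrt(variance)                  # standard deviation: back to the original scale
```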
Z Scores
The number of standard deviations a value is above or below the mean of the scores in a distribution.
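As a sketch, assuming the mean and standard deviation of the distribution are already known (the IQ-style numbers are made up for illustration):

```python
# z = (score - mean) / standard deviation
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

z_score(130, 100, 15)   # two standard deviations above the mean
z_score(85, 100, 15)    # one standard deviation below the mean
```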
Hypothesis Testing
A systematic procedure for deciding whether the results of a research study support a hypothesis
Hypothesis
A prediction about the outcome of a research study
Independent Variable
A variable that is manipulated or chosen for study
Dependent Variable
Is a variable that is measured or observed in a study - outcome variable
Null Hypothesis
H0. No effect; No significant difference; No relationship
Alternative Hypothesis
H1 - There is an effect; There is a significant difference, There is a relationship
The Possibility of Being Wrong
Type I error: we believe that there is a difference when there is NO difference in the population. Occurs with probability = alpha (α). Type II error: we believe that there is NO difference when there IS a difference in the population. Occurs with probability = beta (β).
The Possibility of Being Right
We believe that there is a difference and there is a difference in the population. We believe that there is NO difference and
there is NO difference in the population.
Power
Probability that, when the null hypothesis is not true, we correctly reject the null. In other words, if there is a difference in the population, what is the chance that we will find it in our study?
What contributes to "power?"
1. Sample Size (n) Larger n means more power
2. Effect size (Cohen's d) Larger effect means more power (Difficult to increase effect size)
3. α (.05 or .01) Larger α means more power (α is usually fixed - do not usually increase this)
High Power
Greater chance of getting significant result
Low Power
The study might be a waste of time
Effect Size
Amount that two populations do not overlap
How to change Effect Size
To increase effect size / amount of non-overlap
1. Increase the size of the effect - larger mean differences
2. Decrease the variability (standard deviation)/decrease the differences between people in the same group
Sampling Distribution of Means
The sampling distribution of means is another type of probability distribution. We obtain multiple samples, all with the same sample size (n) and plot the mean for each sample on a probability (frequency) distribution.
Central Limit Theorem
The sampling distribution of means will be normal even if the distribution being sampled is not, as long as the sample size is large enough
Confidence Intervals
A range of values likely to contain the true value of the
population mean
95% Confidence Interval
If we obtain many samples and construct confidence intervals for each of them, 95% of the confidence intervals will contain the population mean. We can be 95% confident that our interval contains the population mean. This translates to Z scores from -1.96 to 1.96 on the distribution of means.
99% Confidence Interval
This translates to Z scores from -2.58 to 2.58 on the
distribution of means.
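A minimal sketch of computing such an interval, assuming the population standard deviation is known so the Z cutoffs above apply (1.96 for 95%, 2.58 for 99%); the sample numbers are made up for illustration:

```python
# Confidence interval for the mean: sample mean ± z * standard error.
import math

def confidence_interval(sample_mean, pop_sd, n, z=1.96):
    """Return (lower, upper) bounds around the sample mean."""
    sem = pop_sd / math.sqrt(n)      # standard error of the mean
    return (sample_mean - z * sem, sample_mean + z * sem)

confidence_interval(100, 15, 25)           # 95% CI
confidence_interval(100, 15, 25, z=2.58)   # 99% CI: same center, wider interval
```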
Z Test
Hypothesis test used to compare a sample mean to a population mean when the population variance is known
T test for a single sample
Hypothesis test used to compare a sample mean to a population mean when the population variance is unknown
Variance Problem
A sample's variance will, on average, be smaller than its population's variance, so the sample variance must be adjusted upward (by dividing by n − 1 instead of n) to better estimate the population variance.
Degrees of Freedom
Number of scores that are "free to vary". If all the deviation scores but one are known, the last score can have only one value
T test Distributions
There is one t distribution for each number of degrees of freedom. The greater the number of degrees of freedom, the closer the t distribution is to the normal curve
Interpreting t Tests
Alpha versus p versus Sig.
1. Alpha is our a priori or predetermined cutoff level (Often 0.05 or 0.01)
2. p (or Sig. in SPSS) is the probability of obtaining a test statistic at least this extreme if the null hypothesis were true.
T Test for Dependent Samples / Means
Two scores for each person. Repeated measures design. Comparing each person to himself/herself. Same procedure as t test for single sample, except use difference scores and assume that the mean of the population difference scores is 0. Sample distribution of the difference scores/values may not be normal. Assume that the population distribution of the difference scores/values is normal
T Tests for Dependent Samples Variance
1. Calculate the DIFFERENCES between each pair of scores and the mean of the DIFFERENCE scores. You no longer need the original data.
2. Calculate the variance for the difference scores.
3. Calculate the standard error of the difference mean (SEM) or the standard deviation of the distribution of the difference means
4. Calculate the t score
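The four steps above can be sketched for paired scores, dividing by n − 1 in step 2 (degrees of freedom) to estimate the population variance of the difference scores; the before/after data are made up for illustration:

```python
# t statistic for dependent (paired) samples, step by step.
import math

before = [10, 12, 9, 11, 13]
after  = [12, 14, 9, 13, 14]

diffs = [a - b for a, b in zip(after, before)]            # step 1: difference scores
n = len(diffs)
mean_d = sum(diffs) / n                                   # mean of the differences
var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)   # step 2: variance (n - 1)
sem_d = math.sqrt(var_d / n)                              # step 3: standard error
t = mean_d / sem_d                                        # step 4: the t score
```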
t Test for Independent Samples
One score for each person. Comparing two groups of people. Each set of scores is from an entirely different group of people (e.g., Treatment versus Control groups, Males versus Females). Same procedure as t test for dependent samples, except: calculate the difference between group means; there are two separate variances, one for each group; and the sample sizes for the two groups may be different.
Assumptions of the t Test for Independent Samples
The population distributions are normal. The two populations have the same variance. The scores/values in both groups are independent of each other
t Test Limitations
Multiple t tests in the same study increase the probability of a Type I error (increase α). We can adjust for this problem by setting a lower α.
Analysis of Variance (ANOVA)
Hypothesis test used to compare the means of more than two groups. Asks where the difference lies: is it between each of the majors? Is it between one major and the other majors?
Types of Variance in ANOVA
Between-groups (Between-treatments) Variance
Within-groups (Within-treatments) Variance
Between-Groups Variance
Variability that results from differences between the sample means for the levels of a factor
Within-Groups Variance
Variability within each sample - within each level of a factor
Significance of ANOVA
Tells us only that there is a difference SOMEWHERE between our groups, not where it lies
Tukey HSD
Tukey's Honestly Significant Difference Test. Gives a critical cutoff for comparing the differences between group means. If the difference between two of the group means is larger than the cutoff for the specified α, that difference is significant
Relationship between F (ANOVA) and t tests
For 2 independent samples, either t or F can be used; they always result in the same decision. F = t squared
Two extensions of ANOVA
Repeated Measures ANOVA
Factorial ANOVA
Repeated Measures ANOVA
Comparable to t-test for dependent means. Used with repeated measures design
Factorial ANOVA
Used when there is more than one independent variable/factor
ANOVA (F) and t Test Interpretation (And F & T Scores)
If the calculated F or t score is more extreme than the tabled value: statistically significant; reject the null hypothesis. If the calculated F or t score is NOT more extreme than the tabled value: NOT statistically significant; DO NOT reject the null hypothesis.
ANOVA (F) and t Test Interpretation (And P-Value)
If the observed p-value (Sig. value) is less than alpha. (Statistically significant. Reject the null hypothesis.). If the observed p-value (Sig. value) is NOT less than alpha (NOT statistically significant. DO NOT reject the null hypothesis.)
Primary use of correlation
Examining two quantitative variables to determine if there is a linear relationship between them
Pearson Correlation Coefficient
What is different about this statistic?
1. Two continuous variables
2. Used for linear relationships - not differences
3. The two variables are not labeled as independent or dependent
Assumptions
1. The individual cases/persons are independent
2. Populations from which sample is selected must be normally distributed for both variables
3. Variables must be linearly related
Correlation does not imply causation
Any time you do NOT manipulate the independent variable, you cannot assume causation. So why make a big deal of it for correlation? With correlation you have no independent variable or "predictor," so you have no idea which variable "caused" the other. Correlation is unique in that it can NEVER be used to determine causation.
Scatterplot
One variable on X axis, one variable on Y axis
Positive Correlation
Both variables tend to increase (or decrease) together.
Negative Correlation
The two variables tend to change in opposite directions, one increases while the other decreases.
Measuring a Correlation
Statisticians measure the direction and strength of a correlation with a statistic called the correlation coefficient, represented by the letter r
Coefficient of Determination
Proportion of variance accounted for by the relationship. r squared. If have perfect relationship then it is 1 or 100%. If have no relationship, then it is 0 or 0%
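A sketch of r and r² using one standard formula for Pearson's r (the sum of products of deviations over the square root of the product of the sums of squares); the data, a perfect positive linear relationship, are made up for illustration:

```python
# Pearson correlation coefficient r and the coefficient of determination r².
import math

def pearson_r(xs, ys):
    """Direction and strength of the linear relationship, from -1 to 1."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) *
                    sum((y - my) ** 2 for y in ys))
    return num / den

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]        # perfect positive linear relationship
r = pearson_r(x, y)
r_squared = r ** 2          # proportion of variance accounted for (here 100%)
```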
Correlation and Regression
Very similar. Still have two continuous variables. Still focused on relationships / associations. Similar assumptions. Correlation gives us the direction and strength of the relationship. Regression allows us to predict one variable from the other
How?
1. Label one variable as Independent (X) and one variable as
Dependent (Y)
2. Find a straight line through the data that provides the best fit for that data -- Best fit line
3. Find the equation for that line
Regression Line
For any independent variable x, we use the line to
determine what our predicted dependent variable y will be
Least Squares Error
1. Mathematically determine the solution where the distance between Y and Ŷ (Y hat - our predicted Y) is the smallest for all the data points.
2. This distance between the data point and the line represents the error
3. Some distances will be positive and some negative.
4. We square all of these values to obtain positive values.
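The least-squares solution has a closed form for the slope and intercept of the best-fit line; a sketch, with data that lie exactly on a line for easy checking:

```python
# Least-squares best-fit line: slope and intercept minimizing the
# sum of squared distances between each Y and Ŷ.
def least_squares(xs, ys):
    """Return (slope, intercept) of the best-fit line Ŷ = slope * x + intercept."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys)) /
             sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx        # line passes through (mean x, mean y)
    return slope, intercept

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]                      # points exactly on the line y = 2x + 1
slope, intercept = least_squares(xs, ys)
```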
Types of Regression
Different types of Regression depending on the research question
Simple Linear Regression
Assumptions
1. The individual cases/persons are independent
2. Populations from which sample is selected must be normally distributed for both variables
3. Variables must be linearly related
4. The errors have certain characteristics (are "nice enough")
Simple Linear Regression Slope
Slope and error are critical pieces for regression. Strongly related to Correlation Coefficient (r). As r goes from 0 to 1 or 0 to -1, slope increases and error decreases
Correlation Coefficient and Simple Linear Regression
If your correlation coefficient is significant, your slope will be significant and your regression will be significant. If your regression is not significant, it should not be used for prediction
Simple Linear Regression Significance
Analysis of Regression. Similar to Analysis of Variance. Uses an F-ratio of two Mean Square values. Each MS is a SS divided by its df
Issues with Simple Linear Regression
Nonlinear Relationships cannot be interpreted. Restricted Range.
Extrapolation
Do not attempt to make predictions beyond the bounds of your data. (e.g., A five-year-old is predicted to be ~42.3 inches tall. A fifteen-year-old is predicted to be ~66.2 inches tall. A seventy-five-year-old is predicted to be ~209.6 inches, or about 17.5 feet, tall.)
Restricted Range
Occurs when the conditions of a study limit the scores collected to a very constrained fraction of their full possible range.
Chi-Square Test
One qualitative variable, and you want to compare which group people are most likely to belong to: Chi-Square Test for Goodness of Fit (e.g., are volunteers for psychology experiments more likely to be male or female?).
Two qualitative variables, and you want to know if they are independent of each other: Chi-Square Test for Independence (e.g., is gender independent of color preference?)
Expected Frequency
The frequency value that is predicted from the Null Hypothesis and the sample size.
Observed Frequencies
In a sample of data, individuals in each category are counted
Chi-Square Test Hypothesis
Do not reject the null if the discrepancy between the Observed and Expected values is small. Reject the null if the discrepancy between the Observed and Expected values is large
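The discrepancy is measured by the chi-square statistic, Σ (O − E)² / E; a sketch with made-up coin-flip counts as the goodness-of-fit example:

```python
# Chi-square statistic: summed squared discrepancy between observed
# and expected frequencies, each scaled by the expected frequency.
def chi_square(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Goodness of fit: 60 heads and 40 tails observed in 100 flips,
# with 50/50 expected under the null hypothesis of a fair coin.
chi_square([60, 40], [50, 50])
```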
Distribution and Degrees of Freedom
All chi-square values ≥ 0. Chi-square distribution is positively skewed. Chi-square has different distributions depending on df
Critical region for a Chi-Square Test
The significance level is determined. The critical value of chi-square is then located in a table of critical values according to: 1. the degrees of freedom (df), and 2. the significance level chosen
Chi-Square Test for Independence
You have different types of participants (e.g., males / females, people with different majors). You want to know whether which type of participant they are is independent of some other variable (e.g., is gender independent of color preference?)