225 Exam 2
Terms in this set (76)
sampling frame
list of individuals from which a sample is actually selected
requires determining the criteria used to decide who counts as a member of the target population - the operationalization of your sample
1. target population
2. sampling frame
3. sample
4. study results
5. make inference about target population
overall flow of sampling: 5 steps
probability sampling
a method used by researchers to select a representative sample in which every individual in the population has some probability of being selected as a respondent, probability of selection can be specified
(simple random, cluster, stratified)
non-probability sampling
a sampling technique in which there is no way to calculate the likelihood that a specific element of the population being studied will be chosen
blocking variables
categorical variables included in the statistical analysis of experimental data as a way of statistically controlling or accounting for variance due to that variable
randomized block assignment
randomize within "blocks" of your sample
e.g. within genders, ethnic groups
sampling a broader population increases external validity
focused samples increase internal validity
validity/sampling tradeoff
simple random sampling
every member of the population has an equal probability of being selected for the sample
cluster sampling
A probability sampling technique in which comparable clusters of participants within the population of interest are selected at random; members within those clusters are then randomly sampled
stratified random sampling
separation of the target population into different groups based on demographic characteristics, called strata, and then taking a random sample from each stratum
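The stratified procedure above can be sketched in plain Python. This is a minimal illustration with a made-up population and stratum labels (the names `population`, "A", and "B" are hypothetical, not from the card):

```python
import random
from collections import defaultdict

# Hypothetical target population, each member tagged with a demographic stratum.
population = [(f"person{i}", "B" if i % 3 == 0 else "A") for i in range(300)]

# Step 1: separate the population into strata.
strata = defaultdict(list)
for member, stratum in population:
    strata[stratum].append(member)

# Step 2: take a random sample from each stratum.
sample = []
for stratum, members in strata.items():
    sample.extend(random.sample(members, k=10))
```

Unlike simple random sampling, this guarantees each stratum is represented in the sample.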
sampling without replacement
A member of the population may be chosen for inclusion in a sample only once. If chosen, the member is not returned to the population before the next selection. Probabilities do not remain constant.
Independent random sampling
Includes the conditions of random sampling and further requires that the probability of being selected remains constant for each selection, sampling with replacement
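The with/without replacement distinction maps directly onto two standard-library functions. A minimal sketch, assuming a hypothetical sampling frame of 1,000 numbered individuals:

```python
import random

# Hypothetical sampling frame: the list from which the sample is actually drawn.
frame = list(range(1000))

# Sampling WITHOUT replacement: a member can be chosen only once, so the
# selection probability changes after each draw.
sample = random.sample(frame, k=50)

# Sampling WITH replacement (independent random sampling): the probability
# of being selected stays constant for every draw.
independent_sample = random.choices(frame, k=50)
```

`random.sample` never repeats an element; `random.choices` can, which is what keeps each draw's probability constant.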
Advantages of non-probability sampling
- you can reach "hidden" populations, outliers
- possible when there is no sampling frame available (drug users, undocumented immigrants, etc.)
self-selected sample
members volunteer to participate, a form of convenience sampling, large sample size but not an accurate representation
convenience sampling
using a sample of people who are readily available to participate, likely to generate a nonrepresentative sample
purposive sampling
a biased sampling technique in which only certain kinds of people are included in a sample in line with study goals
expert sampling
A sample of people with known or demonstrable experience and expertise in some area
snowball sampling
recruitment of participants based on word of mouth or referrals from other participants
quota sampling
sample designed to mirror population characteristics or demographics; convenience sampling is used to fill the sample within each group
consent rate
the percentage of targeted subjects who agree to participate in a study. Should be above 70% to avoid self-selection bias
bivariate correlations
associations that involve exactly two variables
Pearson's r (Pearson product-moment correlation coefficient)
correlation coefficient for variables measured on an interval or ratio scale
Spearman's rho (Spearman rank-order correlation coefficient)
correlation coefficient when one or both variables measured on ordinal scale
covariance
A measure of linear association between two variables, based on how much each score deviates from its mean; if the two variables tend to deviate from their means together, they are likely to be related
correlation coefficient
a standardized version of the covariance
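The covariance-to-correlation relationship can be shown in a few lines of Python. A minimal sketch using the population (divide-by-n) form of covariance and hypothetical data:

```python
import math

def covariance(xs, ys):
    # Average product of deviations from each variable's mean.
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def pearson_r(xs, ys):
    # Standardize the covariance by the two standard deviations.
    sx = math.sqrt(covariance(xs, xs))
    sy = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sx * sy)

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]          # perfectly linear in x
print(pearson_r(x, y))        # 1.0 for a perfect positive linear relationship
print(pearson_r(x, y) ** 2)   # squaring r gives the coefficient of determination
```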
coefficient of determination
the square of the correlation coefficient; indicates the proportion of variance in one variable that can be accounted for by the other variable
contingency table
A data matrix that displays frequency counts for nominal variables; cross tabulation results. A way to examine a relationship but it is NOT a measure of a correlation.
expected frequency
the frequency expected in a category if the sample data represent the population
observed frequency
The number of observations of a data value in an experiment.
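Expected frequencies for a contingency table follow from the row and column totals. A minimal sketch with hypothetical 2x2 counts (under independence, expected = row total x column total / grand total):

```python
# Hypothetical 2x2 contingency table of observed frequency counts.
observed = [[30, 20],
            [10, 40]]

row_totals = [sum(row) for row in observed]            # [50, 50]
col_totals = [sum(col) for col in zip(*observed)]      # [40, 60]
grand_total = sum(row_totals)                          # 100

# Expected frequency in each cell if the two nominal variables are unrelated.
expected = [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]
print(expected)  # [[20.0, 30.0], [20.0, 30.0]]
```

Comparing `observed` against `expected` cell by cell is the basis of the chi-square test of independence.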
simple linear regression
scores on X can be used to predict scores on Y assuming a meaningful relationship (r) has been established between X and Y in past research
predictor
in regression analysis, the independent variable
criterion
in regression analysis, the dependent variable
Ŷ = b0 + b1X (intercept b0 plus slope b1 times the predictor)
equation for describing the regression line
statistical validity
the extent to which statistical conclusions derived from a study are accurate and reasonable (effect size, statistical significance, outliers, restriction of range, curvilinear)
effect size
a measure of the strength of the relationship between two variables or the extent of an experimental effect
statistical significance
a statistical statement of how likely it is that an obtained result occurred by chance
curvilinear association
the relationship between two variables is not a straight line, cannot be described with a correlation coefficient
third variable problem
the concept that a correlation between two variables may stem from both being influenced by some third variable
bidirectionality problem
ambiguity about whether X has caused Y or Y has caused X
partialing out
removing the influence of a variable from the association between other variables
multivariate designs
research designs involving more than two measured variables
cross-sectional correlations
in a longitudinal design, a correlation between two variables that are measured at the same time
autocorrelation
In a longitudinal design, the correlation of one variable with itself, measured at two different times.
cross-lag correlations
in a longitudinal design, a correlation between an earlier measure of one variable and a later measure of another variable. Helps in establishing temporal precedence.
1. Covariance
2. Temporal precedence
3. Internal validity
Longitudinal designs can provide some evidence for causation by fulfilling 3 criteria:
1. extent to which points are scattered around the line
2. Slope of the regression line
3. Y intercept (expected value for Y when X = 0)
3 things that must be known in order to make predictions using regression
Residuals
prediction errors, difference between actual and predicted value for DV
regression coefficients (beta weights, b coefficients)
the equation coefficients that minimize the sum of the squared residuals (slope and Y intercept)
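For simple linear regression, the coefficients that minimize the sum of squared residuals have a closed form. A minimal sketch with hypothetical data where Y = 2X + 1 exactly, so every residual comes out zero:

```python
def least_squares_line(xs, ys):
    # Slope and Y intercept that minimize the sum of squared residuals.
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx       # slope
    b0 = my - b1 * mx    # Y intercept
    return b0, b1

x = [1, 2, 3, 4]
y = [3, 5, 7, 9]         # y = 2x + 1 exactly
b0, b1 = least_squares_line(x, y)

# Residuals: difference between actual and predicted value for the DV.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(b0, b1)  # 1.0 2.0; residuals are all zero for a perfect fit
```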
multiple regression
a statistical technique that simultaneously considers the influence of multiple explanatory variables on a response variable Y. With two predictors, instead of a 2D line the model fits a plane in 3D space.
standard error of the regression
Statistic that measures the standard deviation of the residuals in a regression analysis; smaller values mean the model's predictions are closer to the data.
linear model
A hypothetical model of the relationship between two variables which follows a straight line and provides a means for predicting the value of one variable from another
Mean Square
Average squared error
Hierarchical Entry
Known predictors (based on past research) are entered into the regression model first. New predictors are then entered in a separate step/block. Based on researcher's discretion. It is the best method, but relies on the researcher to know what they're doing.
Forced Entry
All variables are entered into the model simultaneously. The results obtained depend on the variables entered into the model. It is important, therefore, to have good theoretical reasons for including a particular variable
Stepwise Entry
Variables are entered into the model based on mathematical criteria and then the computer selects variables in steps. Should only be used in an exploratory phase.
semi-partial correlation
Measures the relationship between two variables controlling for the effect that a third variable has on only one of the others.
Cook's distance
Measures the influence of a single case on the model as a whole (absolute values greater than one may be cause for concern)
Residual statistics and Influential cases (Cook's Distance)
two ways to assess the accuracy of the model in the sample
Residual statistics
In an average sample, 95% of standardized residuals should lie between ±2, and 99% should lie between ±2.5
Outlier
Any case for which the absolute value of the standardized residual is 3 or more
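The outlier rule can be illustrated by standardizing a set of residuals. A minimal sketch with hypothetical prediction errors (19 perfect predictions and one extreme miss, chosen so the extreme case clears the |z| ≥ 3 cutoff):

```python
import math

# Hypothetical residuals: 19 perfect predictions plus one extreme error.
residuals = [0.0] * 19 + [6.0]

mean = sum(residuals) / len(residuals)
sd = math.sqrt(sum((r - mean) ** 2 for r in residuals) / (len(residuals) - 1))

# Standardized residuals; |z| >= 3 flags a case as a potential outlier.
z = [(r - mean) / sd for r in residuals]
outliers = [r for r, zi in zip(residuals, z) if abs(zi) >= 3]
```

Note that regression software typically standardizes residuals with a slightly more involved leverage-adjusted formula; the simple z-score here just illustrates the cutoff.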
1. variable type (outcome must be continuous, predictors can be continuous or dichotomous)
2. non-zero variance
3. Linearity
4. Independence
Four assumptions for a linear model
b0
y-intercept of a linear model, the predicted y when x=0
homoscedasticity
For each value of the predictors the variance of the error term should be constant
Beta values
The change in the outcome associated with a unit change in the predictor (b1, b2, ..)
independent residuals
for any pair of observations, the error terms should be uncorrelated
method of least squares
process of fitting a mathematical function to a set of measured points by minimizing the sum of the squares of the distances from the points to the curve
multicollinearity
a situation in which predictor variables are highly correlated, causes problems in a regression model
SS total
variability between the scores and the mean
R
The correlation between the observed values of the outcome and the values predicted by the model
SS residual
variability between the model and the actual data (error in the model)
Adjusted R squared
an estimate of the R squared in the population (usually smaller)
SS model
difference in variability between the model and the mean (improvement due to the model)
b-values
the change in the outcome associated with a unit change in the predictor
R squared
SS model/SS total, the proportion of the variance accounted for by the regression model, square of the Pearson correlation coefficient
F statistic
Mean square of the model divided by the mean square of the residual -> improvement due to the model/error in the model
mean square
the sum of squares divided by the appropriate degrees of freedom
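The SS decomposition, R squared, mean squares, and F statistic above can all be computed from one simple regression fit. A minimal sketch with hypothetical, nearly linear data (for one predictor, df for the model is 1 and df for the residual is n − 2):

```python
def regression_fit(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    b0 = my - b1 * mx
    pred = [b0 + b1 * x for x in xs]

    ss_total = sum((y - my) ** 2 for y in ys)              # scores vs. the mean
    ss_residual = sum((y - p) ** 2 for y, p in zip(ys, pred))  # model vs. data
    ss_model = ss_total - ss_residual                      # improvement due to model

    r_squared = ss_model / ss_total
    # Mean square = SS / df; F = MS(model) / MS(residual).
    f = (ss_model / 1) / (ss_residual / (n - 2))
    return r_squared, f

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.0]   # roughly y = 2x, with small errors
r_squared, f = regression_fit(x, y)
```

Because the data deviate only slightly from a straight line, `r_squared` comes out just under 1 and `f` is very large.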