Research Methods: Chapter 5: Correlation and Correlational Research
Terms in this set (59)
Correlation
A statistical association between two variables (X and Y) in which the score values of X are associated with the scores or values of Y in a nonrandom fashion. The term occurs in the context of discussing correlational research and distinguishing such research from experiments.
Experiment vs. Correlational Research Goals
When scientists conduct experiments, their goal is to examine cause-effect relations between variables. When they conduct correlational research, their ultimate goal is to understand potential cause-effect relations as well.
Simple Experiment Steps
In the simplest form of an experiment, the researcher takes the following steps:
1. Manipulates an independent variable, X.
2. Measures a dependent variable, Y.
3. Examines whether the different conditions of X have produced differences in Y.
4. Attempts to eliminate confounding variables by controlling the experimental environment.
Confounding Variables
Extraneous factors that systematically vary with X and influence Y. They are undesirable because they provide potential alternatives to the conclusion we wish to draw, namely, that X influenced Y.
Correlational Research
Involves examining potential associations between naturally occurring variables by measuring those variables and determining whether they are statistically related. It is extremely important to remember that in correlational research, variables are measured, not manipulated.
Correlational Research Steps
In a correlational study of two variables, X and Y, the researcher takes the following steps:
1. Measures variable X.
2. Measures variable Y.
3. Determines statistically whether there is an association between X and Y.
4. Attempts to reduce the influence of confounding variables, typically through statistical control and, where possible, through special research designs.
The 3 Possible Sources of Association in Correlational Research
1. X and Y are characteristics of the same people. Is there a correlation between people's level of self-esteem (X) and level of anxiety (Y), or between how stressed they feel (X) and how much they exercise (Y)?
2. X and Y are characteristics of different, but related, sets of people. Is there a correlation between children's level of self-esteem (X) and their parents' level of self-esteem (Y), or between workers' job satisfaction (X) and their managers' level of experience (Y)?
3. X is a personal characteristic and Y is an environmental characteristic. Are people more likely to act aggressively (X) on hotter days (Y)? Is students' satisfaction with college (X) related to the overall size of their institution (Y)?
Positive Correlation
Higher scores or levels of one variable tend to be associated with higher scores or levels of another variable. As X increases, Y also tends to increase; as X decreases, Y also tends to decrease.
Ex. There is a positive correlation between the height and weight of adults: the taller the adult, the more they tend to weigh in general.
Negative Correlation
Higher scores or levels of one variable tend to be associated with lower scores or levels of another variable. Scores on X and Y tend to move in opposite directions: as X increases, Y tends to decrease; as X decreases, Y tends to increase.
Ex. There is a negative correlation between perceived social support and psychological distress, such that people who report having less social support tend to report more psychological distress.
Even with a simple set of data, mere visual inspection is inadequate to precisely judge how strongly the variables are correlated. Moreover, in real research, the data are usually not so straightforward; there may be hundreds or thousands of participants, and researchers may examine correlations among many variables. Fortunately, statistics and graphs help us identify correlations.
The Pearson Product-Moment Correlation Coefficient (Pearson's r)
A statistic that measures the direction and strength of the linear relation between two variables that have been measured on an interval or ratio scale. It can range from +1.00 to -1.00, with the plus or minus sign indicating a positive or negative correlation. The strength of the correlation is reflected by the absolute value of the coefficient: the closer the absolute value is to 1.00, the stronger the relation. Researchers compute Pearson's r when they assume that both X and Y have been measured on an interval or ratio scale.
In psychology, the typical approach for reporting correlations is to present positive correlations with a plus sign and to use a minus sign to indicate a negative correlation.
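The arithmetic behind Pearson's r can be sketched in a few lines of Python. This is a minimal illustration (the helper name `pearson_r` is our own, not from the chapter): the numerator captures how X and Y co-vary, and the denominator scales by each variable's separate spread.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dev_x = [a - mean_x for a in x]
    dev_y = [b - mean_y for b in y]
    # Co-variation of X and Y, divided by the product of their separate spreads.
    numerator = sum(a * b for a, b in zip(dev_x, dev_y))
    denominator = sqrt(sum(a * a for a in dev_x)) * sqrt(sum(b * b for b in dev_y))
    return numerator / denominator

# Perfect positive and perfect negative linear relations:
print(round(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]), 2))   # 1.0
print(round(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]), 2))   # -1.0
```

Real data rarely produce values anywhere near +/- 1.00; the toy inputs above are chosen only to show the two extremes of the coefficient's range.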
Spearman Rank-Order Correlation Coefficient (Spearman's rho)
A statistic used to measure the relation between two quantitative variables when one or both variables have been measured on an ordinal scale (i.e., the scores represent ranks).
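For scores with no ties, Spearman's coefficient reduces to a simple rank-difference formula. A minimal sketch (the helper name is ours; it assumes untied scores):

```python
def spearman_rho(x, y):
    """Spearman rank-order correlation for untied scores:
    rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)),
    where d is the difference between each pair's ranks on X and Y."""
    n = len(x)
    # Map each score to its rank (1..n); valid only when scores are untied.
    rank_x = {v: i + 1 for i, v in enumerate(sorted(x))}
    rank_y = {v: i + 1 for i, v in enumerate(sorted(y))}
    d2 = sum((rank_x[a] - rank_y[b]) ** 2 for a, b in zip(x, y))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# A monotonic but nonlinear relation: the ranks agree perfectly, so rho = 1.0
print(spearman_rho([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]))  # 1.0
```

Note the contrast with Pearson's r: because Spearman works on ranks, a perfectly monotonic curved relation still earns a rho of 1.0.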
A Computed Correlation
When a correlation is computed, the way in which the measured scales have been coded affects whether the statistical analysis yields a plus or minus sign for that coefficient. To avoid confusion, researchers will code their variables initially, or recode them prior to analyzing them, so that higher values for each variable conceptually reflect a greater amount of the underlying attribute.
The way a researcher conceptualizes a variable affects whether a correlation emerges as positive or negative.
Interpreting the Strength of a Correlation
Although there is no universally agreed-upon nomenclature for labelling correlations, Cohen (1988) provides an often-cited guideline for judging whether associations between variables are small, medium, or large in size. He proposes that absolute values of Pearson's r of 0.10 to 0.29 represent a small association, 0.30 to 0.49 a medium association, and 0.50 to 1.00 a large association. Still, as Cohen notes, these guidelines are not set in stone, and it is arbitrary to say that a 0.01 difference in strength can suddenly transform a small association into a medium one, for example.
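The benchmarks above are easy to encode. The cutoffs in this sketch follow Cohen's values from the text; the label for values below .10 is our own placeholder, since the guideline does not name that range:

```python
def cohen_label(r):
    """Rough size label for a correlation, per Cohen's (1988) benchmarks.
    The boundaries are conventions, not sharp psychological distinctions."""
    size = abs(r)               # the sign marks direction, not strength
    if size >= 0.50:
        return "large"
    if size >= 0.30:
        return "medium"
    if size >= 0.10:
        return "small"
    return "below small"        # our label; Cohen does not name this range

print(cohen_label(-0.42))  # medium  (direction is ignored)
print(cohen_label(0.29))   # small
```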
Psychometrics
A field of psychology that examines the statistical properties of psychological tests. When assessing the reliability of a new personality test by having the same people take the test twice, 2 weeks apart, psychometricians would be unlikely to consider a test-retest correlation coefficient of 0.50 to indicate strong reliability. Rather, it would indicate that the test needs further work to enhance its reliability.
Absolute Value of a Correlation
Although the absolute value of a correlation reflects its strength, it cannot be interpreted directly as representing the "percentage" that two variables are related.
Variance
Recall that there are different ways to measure variability. One measure is called the variance. For a set of scores, the variance takes into account how far the scores are spread apart from their mean.
Pearson's r & Variance
An important aspect of Pearson's r is that when you square the correlation coefficient, the resulting value (r^2) represents the proportion of variance in Y that is accounted for by the variance in X.
The Coefficient of Determination
This value, r^2, is called the coefficient of determination. Another way to think of it is that, for two variables X and Y, the coefficient of determination informs us about the extent to which differences among the X scores predict (statistically account for) differences among the Y scores, based on the linear relation between the two variables. The stronger the correlation between X and Y, the bigger the slice of Y's variance accounted for by the variance of X. The coefficient of determination also helps us appreciate how differences in the numerical values of correlations translate into differences in the ability of one factor to account for variability in another factor.
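A small worked check with made-up scores shows that r^2 and "the proportion of Y's variance accounted for by the best-fitting line" really are the same number:

```python
from math import sqrt

# Made-up scores for five participants.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
n = len(x)
mx, my = sum(x) / n, sum(y) / n

sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))

r = sxy / sqrt(sxx * syy)                            # Pearson's r
slope, intercept = sxy / sxx, my - (sxy / sxx) * mx  # least-squares line

# Variance in Y left over after predicting Y from X:
ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))

print(round(r ** 2, 2))            # 0.64
print(round(1 - ss_res / syy, 2))  # 0.64 -- same proportion of variance
```

Here r = .80, so about 64% of the variability among the Y scores is statistically accounted for by the X scores.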
Scatter Plot
A scatter plot (also called a scattergram) is a graph in which data points portray the intersection of X and Y values. The numerical values of one variable are represented along the horizontal axis (x-axis), and the numerical values of the other variable along the vertical axis (y-axis). Each point on a scatter plot represents the intersection of a pair of values. When a correlation is perfect, the pairs of X and Y scores form data points that fall precisely onto a straight line, sloping upward from left to right (r = +1.00) or downward from left to right (r = -1.00). When there is no correlation (r = 0.00), variations in X scores do not correspond to variations in Y scores in a linear fashion, and the overall pattern of data points has no upward or downward slope. Between r = 0.00 and +/- 1.00, a correlation becomes stronger as the data points increasingly converge onto a sloped straight line. Scatter plots are valuable because they provide a visual feel for the data and can reveal nonlinear relations between variables that the Pearson r statistic cannot detect.
Correlation and Causation
Correlation does not establish causation.
The Three Key Criteria Used in Drawing Causal Inference:
1. Covariation of X and Y. As X changes, Y changes.
2. Temporal order. Changes in X occur before changes in Y.
3. Absence of plausible alternative causes. Other than the changes in X, there are no changes in other factors that might reasonably have produced the changes in Y.
Correlation and Temporal Order
A correlation, by definition, means that X and Y covary. But when we examine the criterion of temporal order, we run into a problem. Remember that in a correlational study, the researcher measures the variables but does not manipulate them. This often creates conceptual ambiguity about the temporal order of variables X and Y.
The Bidirectionality Problem
Also called the two-way causality problem: there is ambiguity about whether X has caused Y or Y has caused X. Moreover, it is also possible that each variable influences the other. Although the bidirectionality problem is common in correlational research, in some correlational studies it can be ruled out on logical grounds.
Seasonal Affective Disorder (SAD)
A form of depression that is associated with the changing seasons of the year. Most typically, people who suffer from SAD develop more symptoms, or more intense symptoms, of depression in the fall and winter. As spring and summer return, their symptoms lessen. Thus, among SAD sufferers, for any 12-month period there is a strong negative correlation between the average number of minutes of daylight per month (X) and the intensity of their depression (Y): the fewer the minutes of monthly daylight, the more intense the depression. Here, we can safely assume that Y cannot possibly cause X: as SAD sufferers' degree of depression changes, this does not cause the amount of monthly daylight to change. So can we now confidently conclude that X (changes in monthly daylight) is the cause of Y (changes in the intensity of depression) among people with SAD? Unfortunately not, as we will see later.
The Third-Variable Problem
A third variable, Z, may be the true cause of why X and Y are related. In a study, if we fail to measure Z, then all we see when analyzing the data is the association between scores on X and Y. Thus, X and Y may be related statistically, but there may be no causal link between them. In this case, the relation between X and Y is spurious (not genuine), and their correlation is often called a spurious correlation. In practice, third variables are often postulated to cause both X and Y. Potential third variables should always be considered when interpreting correlational findings.
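A quick simulation with entirely made-up numbers illustrates the point: a third variable Z that drives both X and Y can manufacture a sizable X-Y correlation even though X and Y share no direct causal path (the variable names and coefficients below are our own invention).

```python
import random

def pearson_r(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    den = (sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b)) ** 0.5
    return num / den

random.seed(1)

# Z causes both X and Y; X and Y never influence each other.
z = [random.gauss(0, 1) for _ in range(2000)]
x = [zi + random.gauss(0, 0.5) for zi in z]
y = [zi + random.gauss(0, 0.5) for zi in z]

# The observed X-Y correlation is large and positive anyway.
print(round(pearson_r(x, y), 2))  # roughly .8 with this simulated structure
```

Seeing only X and Y, a researcher might wrongly infer a causal link; measuring Z (and statistically controlling for it, as discussed next) is what exposes the relation as spurious.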
Can we Gain a Clearer Causal Picture?
Scientists who conduct correlational studies often attempt to reduce the causal ambiguity that accompanies the correlational method through statistical analyses and research designs. These two approaches can also be combined.
Statistical Approaches and Correlational Studies
In correlational studies, the most common way that researchers try to remove the influence of potential third variables is by measuring them when the data are collected and then statistically adjusting for those variables in the data analysis.
Partial Correlation Analysis
An analysis in which a correlation between variable X and variable Y is computed while statistically controlling for their individual correlations with a third variable, Z. In this approach, the correlations between X and Z, and between Y and Z, are statistically filtered out of the analysis of the relation between X and Y. The more strongly Z correlates with X and Y, the more likely it is that the strength of the correlation between X and Y will change considerably after their shared association with Z is statistically controlled. In many cases, after a partial correlation analysis is conducted, what originally was a statistically significant simple correlation between X and Y may be reduced enough in strength that it is no longer statistically significant.
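For a single control variable, the adjustment has a standard closed form (the first-order partial correlation formula); a minimal sketch, with illustrative correlation values that are our own:

```python
from math import sqrt

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation between X and Y controlling for Z:
    r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))"""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# If Z correlates .70 with both X and Y, a simple X-Y correlation of .49
# vanishes entirely once Z is statistically controlled:
print(round(partial_r(0.49, 0.70, 0.70), 2))  # 0.0
```

When the partial correlation drops to near zero like this, the original X-Y association is consistent with being entirely attributable to the third variable Z.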
Drawing Causal Conclusions from Correlational Data
In trying to address the problem of drawing causal conclusions from correlational data, partial correlations are merely the tip of the iceberg. There are more advanced statistical techniques that build on the principle of partial correlation and can examine complex patterns of correlated variables. The key point is that researchers use these statistical techniques, such as
structural equation modelling
, to test hypotheses about possible causal links among sets of correlated variables.
Cross-Sectional Research Design
In a correlational study that employs this research design (which is also called a
one-shot correlational study
), each person participates on one occasion, and all variables are measured at that time. In such research, the bidirectionality problem typically sticks out like a sore thumb because X and Y are measured at the same time.
Longitudinal Research Design
In a correlational study using this research design, data are gathered on the same individuals or groups on two or more occasions over time.
Prospective Research Design
A type of longitudinal design in which variable X is measured at an earlier point in time than Y, thereby reducing the likelihood that variable Y is the cause of variable X. It is used in fields such as clinical psychology to identify personality traits, lifestyle habits, and life experiences that may be risk factors for developing mental disorders or physical illnesses.
Cross-Lagged Panel Studies
A type of longitudinal design that involves three steps:
1. Measure X and Y at time 1.
2. Measure X and Y again at time 2.
3. Examine the patterns of correlations among X1, Y1, X2, and Y2.
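A toy simulation of the logic behind step 3 (all coefficients invented for illustration): when X truly drives Y over time, the cross-lagged correlation r(X1, Y2) should outrun r(Y1, X2).

```python
import random

def pearson_r(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    den = (sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b)) ** 0.5
    return num / den

random.seed(7)
n = 3000
x1 = [random.gauss(0, 1) for _ in range(n)]
y1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [a + random.gauss(0, 0.7) for a in x1]            # X is stable over time
y2 = [0.6 * a + 0.5 * b + random.gauss(0, 0.7)         # earlier X feeds later Y
      for a, b in zip(x1, y1)]

cross_xy = pearson_r(x1, y2)   # earlier X with later Y
cross_yx = pearson_r(y1, x2)   # earlier Y with later X
print(cross_xy > cross_yx)     # True under this simulated X -> Y structure
```

In real cross-lagged studies the comparison is more delicate (both paths may be nonzero, and third variables still lurk), but the asymmetry between the two cross-lagged correlations is the diagnostic pattern researchers examine.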
Combining Statistical and Design Approaches
In correlational research, the strategy of combining longitudinal designs with partial correlation analyses or related statistical techniques has become increasingly common. Even with such designs, however, the bidirectionality problem can remain, and an even more pervasive problem persists regarding potential third variables. This problem applies to all correlational studies, even those that use special research designs and advanced statistical techniques to adjust for third variables.
Correlational Research vs. True Experiments
In correlational research, even the most sophisticated statistical procedures and the best longitudinal designs do not provide the type or degree of control over potential confounding variables that experiments make possible. Moreover, although researchers may conceptually discuss some correlational variables as if they were independent variables, we need to keep in mind that in correlational studies such variables are only measured, not manipulated as in a true experiment.
Media Reports of Correlational Findings
High-quality reporting is not always evident in the news media. At times, the results of correlational studies are presented in ways that give the public a false impression that a causal relation was found. Whether you are reading scientific articles, writing a report, digesting the news, or being exposed to commercial advertisements, always appreciate that a correlation between X and Y may suggest the possibility of a causal relation. At the same time, keep the bidirectionality and third-variable problems in mind and remember the cardinal rule: by itself, correlation does not establish causation.
Correlation and Prediction
One of the most important properties of correlation is that if two variables are correlated, knowing the scores on one variable helps us to predict the scores on the other variable. Correlation enables prediction even when no causal relation between the two variables is assumed. The stronger the correlation between two variables, the more accurately we can predict one from the other. If two variables are not correlated, then there is no predictive advantage.
Regression Analysis (or Simple Linear Regression)
Regression analysis (also called simple linear regression) explores the linear relation between two variables and is often used to predict the scores of one variable based on the scores of another variable. Statistical software that performs a regression analysis provides us with a regression equation. If we use that equation to generate a predicted Y value for each X value, we obtain a straight line representing a perfect correlation between the X values and the predicted Y values (as the X values were used to create the predicted Y values). This straight line is called a regression line.
The Regression Line
In essence, the regression line is a visual representation of the regression equation: it represents the overall best fit between X and Y. "Best fit" means that, according to a certain statistical criterion, this line does the best possible job of most closely fitting the overall pattern of the data points in the scatter plot. The stronger the correlation between X and Y, the better the overall fit. When two variables are perfectly, linearly correlated, the actual data points fall exactly onto the regression line. Be aware that, in practice, the process of developing a regression equation based on one sample, and then applying the equation to other samples, is more complex than portrayed here.
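The usual "best fit" criterion is least squares (minimizing the summed squared vertical distances from the points to the line). A minimal sketch with a made-up helper and made-up data:

```python
def fit_line(x, y):
    """Least-squares regression line Y-hat = a + b*X."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Slope: co-variation of X and Y over the variation in X.
    b = (sum((u - mx) * (v - my) for u, v in zip(x, y))
         / sum((u - mx) ** 2 for u in x))
    a = my - b * mx
    return a, b

# Perfectly correlated data fall exactly on the line y = 1 + 2x.
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)          # 1.0 2.0
print(a + b * 10)    # predicted Y for a new X of 10 -> 21.0
```

With real, imperfectly correlated data the points scatter around the fitted line, and the prediction for a new X is an estimate rather than an exact value.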
Criterion Variable (Y)
The variable that we are trying to estimate or predict. Designated as Y.
Predictor Variable (X)
A variable whose scores are used to estimate the scores of a criterion variable. Designated as X.
Multiple Regression
A type of analysis that explores the linear relation between one variable and a set of two or more other variables. We use computer software to determine the regression equation. As the number of predictors grows, it becomes increasingly important to have a large sample of data.
Key Concept About Multiple Regression
To be retained in the final regression equation, each new predictor must enhance our ability to estimate Y beyond what can be achieved by the other predictor variables already in the equation.
Using Two or More Predictors
It often happens that a predictor will correlate significantly with the criterion variable on its own, but after being entered into a multiple regression analysis, it is found not to add a statistically significant amount of new information to the prediction of Y. In this case, the predictor would be dropped from the final regression equation.
Multiple regression is also used extensively in basic research.
Benefits of Correlational Research
Although correlational research does not permit clear causal conclusions, it provides many benefits. Correlation establishes a basis for prediction and plays an important role in the development and validation of psychological tests.
Prediction in Daily Life
There are many practical applications in which correlational findings are used to predict behaviour. Using standardized tests such as the SAT and ACT to help select students for college admission is one example. Many organizations invest heavily in commercially available tests, or in developing their own personnel selection tests; these are used to select candidates who are most likely to succeed on the job. Psychological tests and other measures are used to diagnose mental disorders, assess impairments following brain injury, and so forth. These data lead to recommendations, essentially predictions, about the most effective method of addressing issues. The usefulness of diagnostic tests in such individual cases rests in part on broader correlational research demonstrating that test scores do in fact help predict people's real-world functioning or other outcomes.
Actuarial Prediction
Making predictions based solely on statistical data.
Criterion Validity
The test should be able to predict relevant criteria regarding the situation in which it is being used.
Correlation establishes a basis for prediction, and correlational research is the primary approach for establishing the criterion validity of tests. For many types of tests, construct validity is an essential quality: there must be evidence that a test truly measures the particular construct it is claimed to measure. For a test to be valid, it must also be reliable: it must yield consistent measurements. Correlational research plays a central role in establishing the construct validity and reliability of a test. One strategy for establishing test validity is to see how a new test correlates with other, already validated tests that measure the same attribute. One strategy for establishing test reliability is to examine how well items or subsets of items within the test correlate with each other or with overall test scores.
Correlational Research and Highly Controlled Experiments
Correlational research can be employed in circumstances where, due to practical or ethical considerations, highly controlled experiments cannot be conducted. To explore such questions, correlational studies offer a great advantage because they involve measuring variables rather than manipulating them. This does not mean that correlational studies are easier to conduct than experiments, or that ethical issues never surface in correlational studies. This simply is not the case. Correlational studies and experiments run the gamut from being extraordinarily difficult and time-consuming to design and execute, to being relatively straightforward to conduct.
Hypothesis and Model Testing
Although clear causal inferences cannot be made from correlational research, correlational findings do indicate whether the obtained associations between variables are consistent with the possibility that a cause-effect relation might exist. Thus, correlational research can provide information about whether a hypothesized causal model is more or less plausible than an alternative causal model. By using longitudinal designs and statistically filtering out the potential influence of several plausible confounding variables, researchers increase the likelihood that their correlational findings reflect an underlying causal relation. The conditions for establishing a correlation between two variables are less rigorous than those needed to establish causality, so consistent failures to find even a correlation between two variables cast doubt on the possibility that a causal relation between those variables exists.
Convergence with Experiments
Because correlational studies reveal potential cause-effect links between naturally occurring variables, they can provide an impetus for researchers to conduct experiments that subsequently examine such possible causal links under more controlled laboratory conditions. When experiments uncover causal relations between independent and dependent variables, correlational research can help establish the external validity of those experimental findings by examining whether the variables in question are at least correlated under naturally occurring conditions. Thus, there is often a synergistic relation between these two research methodologies, and a convergence of correlational and experimental findings increases our confidence that scientists have identified a causal relation that applies to real-life circumstances.
Special Issues that Affect the Measurement of a Relation
Many factors affect the measurement of a relation between two variables. Three factors are:
Nonlinear relations, range restriction, and associations involving qualitative variables
Nonlinear Relations
Pearson's r measures the degree of linear relation between two variables. Some variables, however, are related in a nonlinear fashion. In such cases, Pearson's r will underestimate, or possibly fail to detect, the relation between the two variables. On a scatter plot, a nonlinear relation appears as a curved rather than straight-line pattern.
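A tiny made-up example shows the failure mode at its most extreme: Y is perfectly predictable from X, yet Pearson's r is zero because the relation is U-shaped rather than linear.

```python
x = [-2, -1, 0, 1, 2]
y = [v * v for v in x]   # perfect, but nonlinear (U-shaped), relation

n = len(x)
mx, my = sum(x) / n, sum(y) / n
num = sum((a - mx) * (b - my) for a, b in zip(x, y))
den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5

# The positive and negative co-deviations cancel exactly.
print(num / den)  # 0.0 -- Pearson's r completely misses the relation
```

This is why inspecting a scatter plot alongside the coefficient is good practice: the curve is obvious to the eye even when r says "no relation."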
Range Restriction
In correlational research, range restriction occurs when the range of scores obtained for a variable has been artificially limited in some way. Range restriction can lead to erroneous conclusions about the strength, direction, and linear versus nonlinear nature of the relation between two variables. When a relation is linear, range restriction can distort its strength.
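A simulation with fabricated data illustrates the classic admissions scenario: computing the correlation only among high scorers on X (e.g., only admitted students) shrinks the observed coefficient relative to the full range.

```python
import random

def pearson_r(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    den = (sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b)) ** 0.5
    return num / den

random.seed(3)
x = [random.gauss(0, 1) for _ in range(5000)]           # e.g., test scores
y = [0.7 * v + random.gauss(0, 0.7) for v in x]         # e.g., later performance

full = pearson_r(x, y)

# Keep only the top of the X distribution (scores > 1 SD above the mean).
pairs = [(a, b) for a, b in zip(x, y) if a > 1.0]
restricted = pearson_r([a for a, _ in pairs], [b for _, b in pairs])

print(full > restricted)  # True -- restriction weakens the observed relation
```

The underlying X-Y relation has not changed; only the variability of X available to the analysis has, which is exactly why restricted samples can understate a predictor's real-world usefulness.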
Associations involving Qualitative Variables
When one or both variables are qualitative (categorical), the concept of linear relation (i.e., of positive and negative correlation) does not apply.