119 terms

scatter plot

a graph of data points

line of best fit

approximates the trend in data

model

sometimes a line or an equation used to represent data

Stroop Test

correlates a person's perception of words and colors for a list

matching list

the color of ink matches the color of the word

non-matching list

the color of ink does not match the color of the word

median

the middle value of an ordered set of data

median-median line

a method for calculating the line of best fit using the median
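
One common way to carry out the median-median method can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function name `median_median_line`, the three-group split, and the sample points are assumptions made for this example.

```python
from statistics import median

def median_median_line(points):
    """Return (slope, intercept) of the median-median line for (x, y) pairs."""
    pts = sorted(points)  # order by x (ties broken by y)
    n = len(pts)
    if n < 3:
        raise ValueError("need at least three points")
    base, extra = divmod(n, 3)
    # Outer groups take the extra points when n % 3 == 2;
    # the middle group takes the extra point when n % 3 == 1.
    outer = base + (1 if extra == 2 else 0)
    groups = [pts[:outer], pts[outer:n - outer], pts[n - outer:]]
    # Summary point of each group: (median of its x values, median of its y values).
    summary = [(median([x for x, _ in g]), median([y for _, y in g])) for g in groups]
    (x1, y1), (x2, y2), (x3, y3) = summary
    # Slope comes from the two outer summary points.
    slope = (y3 - y1) / (x3 - x1)
    # Intercept is the average of the intercepts implied by all three summary points.
    intercept = ((y1 + y2 + y3) - slope * (x1 + x2 + x3)) / 3
    return slope, intercept

# Hypothetical data for illustration only.
points = [(1, 2), (2, 3), (3, 5), (4, 4), (5, 6), (6, 8), (7, 7), (8, 9), (9, 11)]
slope, intercept = median_median_line(points)
```

Because it relies on medians rather than means, this line is less affected by a single extreme point than a least squares line would be.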

least squares method

a method of calculating the line of best fit using the distance each point is from the line of best fit
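
As a rough sketch of the least squares idea, the code below finds the line that minimizes the sum of squared vertical distances between the points and the line, then compares its error with a slightly different line. The function names and the data are assumptions made for this example.

```python
from statistics import mean

def least_squares_line(xs, ys):
    """Return (slope b, intercept a) minimizing the sum of squared errors."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    a = my - b * mx
    return b, a

def sum_squared_errors(xs, ys, b, a):
    """Sum of squared vertical distances from each point to the line b*x + a."""
    return sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b, a = least_squares_line(xs, ys)
best = sum_squared_errors(xs, ys, b, a)
# Any other line, such as one with a slightly steeper slope, has a larger error.
worse = sum_squared_errors(xs, ys, b + 0.1, a)
```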

Pearson product-moment correlation coefficient

a measure of the strength and direction of the linear relationship between two variables; it indicates how well a linear regression equation fits the data

r

the correlation coefficient, which ranges from -1 to +1

regression equation

the equation found to represent a set of data

causation

when one event causes a second event

necessary condition

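a correlation is a necessary condition for causation: causation cannot be present without it

sufficient condition

a correlation alone is not a sufficient condition for causation: it does not show that one event causes the other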

quadratic regression

used to model quadratic data

If we use knowledge of a person's SAT scores to predict his or her GPA, what is the predictor and what is the criterion?

SAT score is the predictor and GPA is the criterion

How do we translate S²y'?

The sample variance of the Y scores around Y' (the predicted Y scores).

When r=0.0, the Y-intercept is equal to?

the mean of all the Y scores in the sample
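
A small check of this point, using made-up data chosen so that r is exactly 0: the fitted slope is 0 and the Y-intercept equals the mean of the Y scores. The function name and data are assumptions for illustration.

```python
from statistics import mean

def fit_line(xs, ys):
    """Return (slope b, intercept a) of the least squares regression line."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return b, a

# Hypothetical data with no linear relationship (r = 0).
xs = [1, 2, 3, 4, 5]
ys = [1, 3, 2, 3, 1]
b, a = fit_line(xs, ys)
# With b = 0, every prediction Y' is simply the mean of Y.
```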

If we can claim to account for .65 of the variance in Y scores by knowing a relationship, it means that?

We are, on average, 65% more accurate at predicting Y scores than we would be if we did not know the relationship.

In general, the greater the proportion of variance accounted for...

the more accurately we can predict the behaviour

If heteroscedasticity is present, Sy' will be?

greater than the actual average error in predictions of Y for some X scores and less than the actual average error for other X scores

The regression line can be thought of as a series of points representing?

all the possible Y' values associated with all possible X scores

Standard error of the estimate is defined as?

Average spread of actual Y scores around the predicted Y scores

Linear regression is defined as the procedure for determining?

the best-fitting straight line in a linear relationship

When we square the correlation coefficient to produce r², the result is equal to the?

proportion of variance accounted for

The Y-intercept of a line is the?

value of Y at the point where the regression line crosses the Y axis

Suppose you have several different predictor variables and one criterion variable. All your variables are measured using interval or ratio scales. What is the appropriate statistical test to use?

Multiple regression

The absence of random assignment in any study allows for what?

potential confounding

The absolute value of a correlation coefficient indicates the?

strength of the relationship

We should always draw a scatterplot of the data when we compute a correlation because the scatterplot allows us to?

see the nature of the relationship between the two variables

The best-fitting straight line through a scatterplot is known as the?

regression line

When your scale correlates with other procedures or scales that are valid, it has __________ validity?

convergent

When your scale does not correlate with other unrelated procedures or scales, it has __________ validity?

discriminant

When the relationship between two variables is high (for example, r=.98), the variability in the Ys at each X is ____________ relative to the overall variability of Y scores in the sample.

smaller

In general, a positive linear relationship means that?

as the values of one variable increase, there is a tendency for the values of the other variable to also increase.

Suppose you find a restriction of range in your study of IQ scores and school achievement. Restricting the range is likely to _____ the correlation coefficient.

decrease the size of

The consistency of participants' responses to the same test at two different times determines?

test-retest reliability

The consistency of participants' responses on different halves of the same test determines?

split-half reliability

If we plot a scatterplot and the data points appear to be random dots that are as far as possible from forming a slanted straight line, the correlation for the data is?

0.0: there is no relationship

The defining formula for the Pearson correlation coefficient shows that it is the?

average correspondence of paired X and Y z-scores
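
To make the z-score idea concrete, the sketch below computes r as the mean of the products of paired z-scores. It assumes standard deviations computed with N in the denominator (`statistics.pstdev`); the function name and data are made up for this example.

```python
from statistics import mean, pstdev

def pearson_r(xs, ys):
    """Pearson r as the average product of paired X and Y z-scores."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)  # standard deviations using N, not N - 1
    zx = [(x - mx) / sx for x in xs]
    zy = [(y - my) / sy for y in ys]
    return mean([a * b for a, b in zip(zx, zy)])

# Hypothetical data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
```

When each high z-score on X is paired with a high z-score on Y, the products are large and positive, so r approaches +1.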

Predictive validity

Extent to which a procedure is correlated with future behaviour

Concurrent validity

Extent to which a procedure is correlated with present behaviour

What procedure would be used to find out whether there is a relationship between SAT scores and GPA?

The Pearson correlation coefficient

The best-fitting line through a scatterplot is known as the?

regression line.

In general, a positive relationship means that?

As one variable increases the other variable also increases

We should always draw a scatterplot of the data when we compute a correlation because it allows us to see?

the nature of the relationship between the two variables

r²

coefficient of determination

Linear regression is defined as?

the procedure for determining the best-fitting straight line in a linear relationship

In the formula Y' = (b)(X) + a, what does Y' stand for?

predicted Y score

In the formula Y' = (b)(X) + a, what does "a" stand for?

the Y-intercept: the value of Y' at the point where the regression line crosses the Y axis

Define the Standard error of the estimate

the average spread of Y scores around predicted Y scores

What value of "r" would yield the smallest Sy' (standard error of the estimate)?

the "r" with the largest absolute value (closest to +/- 1)

As the variability (differences) in Y scores at each X becomes larger, the relationship does what?

becomes weaker and results in a smaller correlation coefficient

Zero association means that?

No linear relationship is present

The larger the correlation coefficient (whether positive or negative), the stronger the relationship. Why?

The less the Ys are spread out at each X and the closer the data come to forming a straight line

What is another word for the degree of efficiency in a relationship?

the coefficient, although it does not directly measure units of consistency

Define the purpose of computing a correlation coefficient.

Statistical technique for demonstrating the reliability and the validity of a measurement procedure in any experiment or correlational design.

What are the types of reliability that a correlation coefficient is used to show?

test-retest, inter-rater, split-half

inter-rater reliability

the consistency of ratings by any two raters

test-retest reliability

Test in which participants receive the same score when tested at different times

How high does a coefficient have to be in order to be considered reliable?

+.80 or higher

Face validity

Procedure is valid because it looks valid/Extent to which a measurement procedure appears to measure what it was intended to measure

Convergent Validity

Extent to which scores obtained from one procedure are positively correlated with scores obtained from another procedure that is already accepted

Discriminant validity

Extent to which scores obtained from one procedure are not correlated with scores from another procedure that measures OTHER variables or constructs.

Criterion validity

Extent to which a procedure correlates with a behavior.

Concurrent validity

Extent to which a procedure correlates with an individual's current behavior

Predictive validity

Extent to which a procedure correlates with an individual's future behavior

What is the range of a coefficient?

0 to +/- 1.0

What is the most common correlation coefficient?

Pearson correlation coefficient

Define the Pearson correlation coefficient

Correlation coefficient that describes the strength and type of a linear relationship between interval or ratio variables, symbolized by r.

Define the Spearman Rank order coefficient

The correlation coefficient that describes the linear relationship between pairs of ranked scores (ex: any two ordinal variables or tied-rank variables), symbolized by rs.

Tied rank variables

Occur when two participants receive the same rank in Spearman's rank-order correlation; resolved by averaging the ranks the tied participants would have occupied and assigning that average to both participants before correlating their scores.
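
The tied-rank procedure can be sketched as follows: tied scores share the average of the rank positions they occupy, and the Spearman coefficient is then the Pearson correlation computed on the ranks. The function names and data are assumptions made for this example.

```python
from statistics import mean, pstdev

def average_ranks(scores):
    """Rank scores from 1 (lowest); tied scores share the average of their ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    return mean([((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)])

def spearman_rs(xs, ys):
    """Spearman coefficient: Pearson r applied to the (tie-averaged) ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Hypothetical scores: the two 20s tie for ranks 2 and 3, so each gets 2.5.
tied = average_ranks([10, 20, 20, 30])
```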

Point biserial correlation coefficient

Describes the linear relationship between the scores from one continuous variable (interval or ratio) and one dichotomous variable (ex: correlating male/female with interval scores from a personality test). Symbolized by rpb.
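
One way to see what rpb measures is to code the dichotomous variable as 0 and 1 and compute an ordinary Pearson r. The sketch below does that and compares the result with the usual group-means form of the coefficient. The 0/1 coding, function names, and scores are assumptions made for this example.

```python
from statistics import mean, pstdev

def pearson_r(xs, ys):
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    return mean([((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)])

def point_biserial(groups, scores):
    """rpb from group means: (M1 - M0) / Sy * sqrt(p * q), with groups coded 0/1."""
    ones = [s for g, s in zip(groups, scores) if g == 1]
    zeros = [s for g, s in zip(groups, scores) if g == 0]
    p = len(ones) / len(scores)  # proportion of cases coded 1
    q = 1 - p                    # proportion of cases coded 0
    return (mean(ones) - mean(zeros)) / pstdev(scores) * (p * q) ** 0.5

# Hypothetical data: group membership coded 0/1 and a continuous test score.
groups = [0, 0, 0, 1, 1, 1]
scores = [2, 3, 4, 6, 7, 8]
rpb = point_biserial(groups, scores)
r = pearson_r(groups, scores)  # same value as rpb
```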

How does a restricted range affect a correlation coefficient?

Reduces accuracy: it produces a smaller coefficient than if the range were not restricted, which leads to an underestimate of the degree of association between the two variables. Avoiding a restricted range increases power.
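
The effect of a restricted range can be illustrated with a small constructed data set: the same pairs give a much smaller r when only a narrow band of X scores is kept. The data pattern and the cutoffs (X from 9 to 12) are assumptions chosen for this example.

```python
from statistics import mean, pstdev

def pearson_r(xs, ys):
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    return mean([((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)])

# Hypothetical data: Y follows X with a fixed amount of scatter (+2 or -2).
xs = list(range(1, 21))
ys = [x + (2 if x % 2 else -2) for x in xs]

# Full range of X scores.
r_full = pearson_r(xs, ys)

# Restricted range: keep only the pairs with X between 9 and 12.
kept = [(x, y) for x, y in zip(xs, ys) if 9 <= x <= 12]
r_restricted = pearson_r([x for x, _ in kept], [y for _, y in kept])
```

The scatter around the trend is the same in both cases, but in the restricted sample it is large relative to the small spread of X, so the coefficient shrinks.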

Why is the correlation coefficient important?

It is one number that allows us to envision and summarize the important information in a scatterplot, in terms of its strength and direction.

What does a horizontal scatterplot with a horizontal regression line indicate?

no relationship

The smaller the absolute value of the coefficient, the greater the?

variability of the Ys at each X, the vertical width of the scatterplot, and the less accurately Y scores can be predicted from X

How can the power of a correlational design be increased?

Minimizing error variance and avoiding a restricted range, so that the largest possible coefficient is obtained.

If it passes through the proper inferential procedure, a sample correlation coefficient is used to estimate what?

the corresponding population correlation coefficient: r estimates ρ (rho), rs estimates ρs, and rpb estimates ρpb.

Define linear regression

The statistical procedure for using a relationship to predict scores; it produces the line that summarizes the linear relationship.

How is Y' pronounced?

Y prime

What does the symbol Y' stand for?

a predicted Y score. Our best prediction of the Y score at a corresponding X

Define regression line

The straight line that summarizes the linear relationship in a scatterplot by, on average, passing through the center of the Y scores at each X; it consists of the predicted Y score (the Y') for every possible X.

Why is "r" computed first?

To determine if a relationship exists. If r=0, there is no relationship.

What is the importance of linear regression?

It is used to predict an individual's unknown Y score based on his or her X score from a correlated variable. It usually provides more external validity and a more accurate description of the relationship.

Linear regression equation [(b)(x) + a]

The equation that produces a value of Y' at each X and so defines the straight line that summarizes the relationship. It describes the line's slope and Y-intercept.

Linear regression equation to calculate regression line points for scatterplot

Y'=[(b)(x) + a]

Y intercept equation

a = (mean of Y) - (b)(mean of X)

Slope equation

b
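
Putting the slope, the Y-intercept, and the regression equation together, a minimal sketch might look like this. The slope form b = r(Sy/Sx) is a standard identity but is an assumption here, since the cards do not give a slope formula; the function names and data are also made up for illustration.

```python
from statistics import mean, pstdev

def pearson_r(xs, ys):
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    return mean([((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)])

def regression_line(xs, ys):
    """Return (b, a) for the regression equation Y' = (b)(X) + a."""
    b = pearson_r(xs, ys) * pstdev(ys) / pstdev(xs)  # assumed slope form: b = r * (Sy / Sx)
    a = mean(ys) - b * mean(xs)                      # a = (mean of Y) - (b)(mean of X)
    return b, a

def predict(x, b, a):
    """Predicted Y score (Y') for a given X."""
    return b * x + a

# Hypothetical data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b, a = regression_line(xs, ys)
y_prime = predict(6, b, a)
```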

coefficient of determination

r²

SEE (Sy') is the abbreviation for

Standard error of the estimate, which is a standardized measure of the difference between predicted Y' scores and actual Y scores

How do you calculate proportion of variance accounted for?

Compute r², which is also known as the "coefficient of determination"

When r=0, the standard error of the estimate is at its maximum, and that is equal to?

the standard deviation of all Y scores in the sample (Sy)

Stronger correlations produce what size SEE?

smaller SEE
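
The link between r and the size of the standard error of the estimate can be sketched with the identity Sy' = Sy * sqrt(1 - r²). This formula is not stated in the cards and is an assumption here; it holds when the standard deviations are computed with N in the denominator. The function name is also made up for this example.

```python
def see_from_r(sy, r):
    """Standard error of the estimate from Sy and r: Sy' = Sy * sqrt(1 - r^2)."""
    return sy * (1 - r ** 2) ** 0.5

# Hypothetical Sy of 2.0 at several strengths of relationship.
at_zero = see_from_r(2.0, 0.0)     # no relationship: Sy' is at its maximum, Sy
at_point_six = see_from_r(2.0, 0.6)
at_perfect = see_from_r(2.0, -1.0)  # perfect relationship: no prediction error
```

The sign of r does not matter, because r is squared: only the absolute value of the coefficient determines how small Sy' becomes.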

What does r² (the coefficient of determination, or proportion of variance accounted for) indicate?

How important the relationship is, by comparing the amount of error obtained when using the regression equation to predict Y from X with the error obtained without the regression equation

What does S²y' refer to?

The error variance when using regression to predict Y scores; it measures the error in prediction.

Sy'

Standard error of estimate

Sy' definitional formula/average error

Subtract Y' from Y and square each deviation, sum the squared deviations, divide by N, then find the square root of that to get the standard error of the estimate
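
The definitional formula can be followed step by step in code. This sketch assumes the regression line has already been fitted; the slope 0.6 and intercept 2.2 are the least squares values for the made-up data shown, and the function name is an assumption.

```python
def standard_error_of_estimate(ys, predicted):
    """Sy': square root of the mean squared difference between Y and Y'."""
    squared_deviations = [(y - yp) ** 2 for y, yp in zip(ys, predicted)]
    return (sum(squared_deviations) / len(ys)) ** 0.5  # divide by N, then take the root

# Hypothetical data and an already-fitted line Y' = 0.6X + 2.2.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
predicted = [0.6 * x + 2.2 for x in xs]
see = standard_error_of_estimate(ys, predicted)
```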

proportion of variance

The amount by which we reduce errors in predicting Y scores when we use the relationship, compared to when we do not. Equals r².
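
The idea of reduced error can be checked numerically: compare the variance of Y around its mean (the error without the relationship) with the variance of Y around Y' (the error with it). The function name and data are assumptions made for this example.

```python
from statistics import mean, pvariance

def proportion_of_variance_accounted_for(xs, ys):
    """(S²y - S²y') / S²y, which equals r² for a least squares regression line."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    error_without = pvariance(ys)  # S²y: error when predicting the mean of Y for everyone
    error_with = sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys)) / len(ys)  # S²y'
    return (error_without - error_with) / error_without

# Hypothetical data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
proportion = proportion_of_variance_accounted_for(xs, ys)
```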

a=

y-intercept

Y intercept

value of Y' where the regression line crosses the Y axis

Y' is the predicted Y score for what?

the corresponding X

The differences (and error) between Y and Y' is also summarized by what?

the variance of the Y scores around Y' (S²y')

If r is large, is the relationship weak or strong?

Strong: a larger r means smaller values of Sy' and S²y', because the Y scores are closer to Y' and so the differences between Y and Y' are smaller

When r=0, what do Sy' and S²y' equal?

Sy' equals Sy and S²y' equals S²y (the standard deviation and variance of all Y scores in the sample)

When r = +/- 1, how much error is there in predictions?

Zero error and Sy' equals zero.

another term for r

the correlation coefficient

Proportion of variance accounted for indicates what?

The importance of a relationship

heteroscedasticity

An unequal spread of Y scores around the regression line (that is around the values of Y')

Homoscedasticity

An equal spread of Y scores around the regression line (that is the values of Y')
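
A rough way to see the difference between the two patterns is to compare the spread of Y around Y' in different parts of the X range. The sketch below uses made-up data in which the scatter grows with X; the split into a lower and an upper half is an assumption chosen for illustration, not a formal test.

```python
from statistics import mean

def fit_line(xs, ys):
    """Return (slope b, intercept a) of the least squares regression line."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return b, my - b * mx

def spread_around_line(xs, ys, b, a):
    """Root mean squared difference between Y and Y' for the given pairs."""
    return (sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys)) / len(xs)) ** 0.5

# Hypothetical heteroscedastic data: small scatter at low X, large scatter at high X.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.1, 1.9, 3.1, 3.9, 6.0, 5.0, 8.0, 7.0]
b, a = fit_line(xs, ys)

overall = spread_around_line(xs, ys, b, a)          # the single Sy' for all scores
low = spread_around_line(xs[:4], ys[:4], b, a)      # actual spread at low X scores
high = spread_around_line(xs[4:], ys[4:], b, a)     # actual spread at high X scores
```

Here the single overall value overstates the error at low X scores and understates it at high X scores, which is why Sy' is less informative when the spread is unequal. With homoscedastic data the three values would be similar.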

Symbol for Pearson correlation coefficient

r

Coefficient of alienation

1 - r²

Sy'

standard error of the estimate symbol

Sx

sample standard deviation symbol

S2x

sample variance symbol

σx

population standard deviation symbol

rs

Spearman correlation coefficient symbol

rpb

point-biserial correlation coefficient sign