Psychology Tests and Measurements Quiz 2
In classical test theory, an observed score on an ability test is presumed to represent the test taker's
true score and measurement error
This variety of error has also been referred to as "noise." It is
Coefficient alpha is an expression of
the mean of all possible split-half correlations
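A minimal sketch of how coefficient alpha can be computed from item-level data, using the standard formula alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores). The function name and the sample score matrix are illustrative assumptions, not part of the original set.

```python
from statistics import pvariance


def coefficient_alpha(item_scores):
    """Coefficient alpha for a score matrix.

    item_scores: list of rows, one row per test taker, one column per item.
    """
    k = len(item_scores[0])  # number of items on the test
    # Variance of each item across test takers
    item_vars = [pvariance([row[j] for row in item_scores]) for j in range(k)]
    # Variance of the total (summed) scores
    total_var = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical data: 4 test takers, 3 dichotomously scored items
scores = [
    [1, 0, 1],
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 0],
]
print(round(coefficient_alpha(scores), 3))
```

The more the items rise and fall together across test takers, the larger the total-score variance is relative to the summed item variances, and the closer alpha gets to 1.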
Alternate forms demonstrate ____________ carryover effects than the test-retest method
In the test-retest method of estimating reliability
The time frame between interviews must be relatively short
Which is TRUE of measurement error?
Like error in general, measurement error may be random or systematic
A Wall Street securities firm that is actually located on Wall Street is testing a group of candidates for their aptitude in finance and business. As the testing begins, an unexpected "Occupy Wall Street" sit-in takes place. From a psychometric perspective in the context of this testing, the sit-in is viewed as
The more homogeneous a test is, the
more inter-item consistency it can be expected to have
A confidence interval is a range or band of test scores that
is likely to contain the true score
The standard error of measurement is
used to infer how far an observed score is from the true score, also known as the standard error of a score, and used in the context of classical test theory
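As a rough illustration of the two cards above, the standard error of measurement is conventionally computed as SEM = SD * sqrt(1 - r), and a confidence interval is built as the observed score plus or minus z * SEM. The function names and the sample values (SD of 15, reliability of .91, observed score of 100) are assumptions chosen for the example.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability coefficient)."""
    return sd * math.sqrt(1 - reliability)


def confidence_interval(observed, sd, reliability, z=1.96):
    """Band of scores likely to contain the true score (z=1.96 for about 95%)."""
    sem = standard_error_of_measurement(sd, reliability)
    return observed - z * sem, observed + z * sem


# Hypothetical test: SD = 15, reliability = .91, observed score = 100
sem = standard_error_of_measurement(15, 0.91)
low, high = confidence_interval(100, 15, 0.91)
print(round(sem, 2), round(low, 2), round(high, 2))
```

Two relationships follow directly from the formula: as reliability increases, the SEM shrinks, and as the chosen confidence level (the z value) increases, the band of scores widens.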
Reliability, in a broad statistical sense, is synonymous with
Which of the following is true of systematic error?
It has no effect on the reliability of a measure
As the degree of reliability increases, the proportion of
the total variance attributed to true variance increases
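In classical test theory terms, this card describes reliability as the ratio of true variance to total variance, where total variance is true variance plus error variance. A small sketch, with a hypothetical function name and made-up variance values:

```python
def reliability_coefficient(true_variance, error_variance):
    """Proportion of total variance attributable to true variance."""
    total_variance = true_variance + error_variance
    return true_variance / total_variance


# Hypothetical variances: shrinking the error variance raises reliability
print(reliability_coefficient(80, 20))
print(reliability_coefficient(80, 5))
```

True and error variance cannot be observed directly in practice, so the ratio is estimated through methods such as test-retest, alternate-forms, and internal-consistency coefficients.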
Why might ability test scores among test takers most typically vary?
both because of the true ability of the test taker and because of irrelevant, unwanted influences
A source of error variance may take the form of
item sampling, test takers' reactions to environment-related variables such as room temperature and lighting, as well as test-taker variables such as amount of sleep the night before a test, amount of anxiety, or drug effects
Which type of reliability estimate is obtained by correlating pairs of scores from the same person (or two people) on two different administrations of the same test?
a test-retest estimate
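A test-retest estimate is typically a Pearson correlation between the scores from the two administrations. The sketch below computes it by hand; the function name and the two score lists are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from each mean
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each administration
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical scores for five test takers at two points in time
first_administration = [10, 12, 15, 18, 20]
second_administration = [11, 12, 14, 19, 21]
print(round(pearson_r(first_administration, second_administration), 3))
```

The same function applies to an alternate-forms estimate if the second list holds scores from the other form instead of a second administration of the same test.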
Which type of reliability estimate would be appropriate only when evaluating the reliability of a test that measures a trait that is presumed to be relatively stable over time?
Which of the following might lead to a decrease in test-retest reliability?
the passage of time between the two administrations of the test, coaching designed to increase test scores between the two administrations of the test, and practice with similar test materials between the two administrations of the test
What term refers to the degree of correlation between all the items on a scale?
Which of the following is usually minimized when using split-half estimates of reliability as compared with test-retest or parallel/alternate-form estimates of reliability?
time and expense
Which of the following is NOT an acceptable way to divide a test when using the split-half reliability method?
Assign easy items to one half of the test and difficult items to the other half of the test
Which of the following is, generally speaking, the preferred statistic for obtaining a measure of internal-consistency reliability?
A synonym for inter-scorer reliability
inter-judge reliability, observer reliability, and inter-rater reliability
Which BEST conveys the meaning of an inter-scorer reliability estimate of .90?
90% of the variance in the scores assigned by the scorers was attributed to true differences and 10% to error
Which of the following would result in the LEAST appropriate estimate of reliability for a speed test?
split-half from a single administration of the test
Item response theory (IRT) focuses on
individual items of a test
As the confidence interval increases, the range of scores into which a single test score falls is likely to
As the reliability of a test increases, the standard error of measurement
In general, which of the following is TRUE of the relationship between the magnitude of the test-retest reliability estimate and the length of the interval between test administrations?
The longer the interval, the lower the reliability coefficient
The index that allows a test user to compare two people's scores on a specific test to determine if the true scores are likely to be different is
The standard error of the difference
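A sketch of how the standard error of the difference is commonly computed when both scores are on the same scale: SED = SD * sqrt(2 - r1 - r2), which equals sqrt(SEM1^2 + SEM2^2). The function names, the 1.96 cutoff, and the sample values are assumptions for illustration.

```python
import math


def standard_error_of_difference(sd, r1, r2):
    """SED for two scores on the same scale with reliabilities r1 and r2."""
    return sd * math.sqrt(2 - r1 - r2)


def true_scores_likely_differ(score_a, score_b, sed, z=1.96):
    """True if the observed difference exceeds z standard errors of the difference."""
    return abs(score_a - score_b) > z * sed


# Hypothetical case: two people take the same test (SD = 15, reliability = .91)
sed = standard_error_of_difference(15, 0.91, 0.91)
print(round(sed, 2))
print(true_scores_likely_differ(100, 110, sed))
print(true_scores_likely_differ(100, 115, sed))
```

Because two people taking the same test share one reliability coefficient, r1 and r2 are equal here; the two-parameter form also covers comparing one person's scores on two different tests.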
By definition, estimates of reliability can range from
0 to 1
What type of reliability estimate is appropriate for use in a comparison of "Form A" to "Form B" of a picture vocabulary test?
What index of reliability would BEST be used to compare two evaluators' assessments of a group of job applicants?
The Kappa statistic
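For two raters assigning categorical ratings, Cohen's kappa corrects the observed agreement for agreement expected by chance: kappa = (p_observed - p_expected) / (1 - p_expected). This is a minimal sketch; the function name and the hire/reject ratings are made up for the example.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters' categorical ratings of the same people."""
    n = len(ratings_a)
    # Proportion of cases on which the two raters agree
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    categories = set(count_a) | set(count_b)
    # Agreement expected by chance from each rater's marginal proportions
    expected = sum((count_a[c] / n) * (count_b[c] / n) for c in categories)
    # Note: undefined (division by zero) if both raters use a single category
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of four job applicants by two evaluators
evaluator_1 = ["hire", "hire", "reject", "reject"]
evaluator_2 = ["hire", "reject", "reject", "reject"]
print(cohens_kappa(evaluator_1, evaluator_2))
```

A kappa of 1 indicates perfect agreement, while a value near 0 indicates agreement no better than chance.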
What type of reliability estimate is obtained by correlating pairs of scores from the same person on two different administrations of the same test?
The greater the proportion of the total variance attributed to true variance, the more __________ the test
A score earned by a test taker on a psychological test may BEST be viewed as equal to
The true score plus error
A goal of a test developer is to
maximize true variance
Most reliability coefficients, regardless of the specific type of reliability they are measuring, range in value from
0 to 1
Different types of reliability coefficients
May reflect different sources of error variance
The fact that young children develop rapidly in "growth spurts" is a problem when it comes to the estimation of which type of reliability for an infant development scale?
Because of the unique problems in assessing very young children, which of the following would be BEST practice when attempting to estimate the reliability of tests designed to measure cognitive and motor abilities in infants?
use relatively short test-retest intervals
The directions for scoring a particular motor ability test instruct the examiner to "give credit if the child holds his hands open most of the time." Because what constitutes "most of the time" is not specifically defined, directions such as these could result in lowered reliability estimates for
A school wants to determine if an assessment has reliability over time. They administer the same assessment at the beginning of a school year and at the end. The school is assessing ________ reliability
Which of the following is a method to obtain a reliability coefficient?
test-retest, parallel-form, and internal-consistency
Which of the following is most likely an acceptable reliability coefficient for a standardized assessment?
Three raters compared their individual scores in order to assess ____________ reliability
A teacher includes three test items that assess the same concept. This teacher is attempting to assess _______ reliability of the instrument
Ecological validity refers to a judgement regarding how well a test measures what it purports to measure
at the same time and place that the variable being measured is actually emitted
Each of the three approaches to validity assessment in the trinitarian model should BEST be thought of as
one type of evidence that, with others, contributes to a judgement concerning the validity of a test
"It is a measure of validity that is arrived at by a comprehensive analysis of how scores on the test relate to other test scores." This statement is a reference to
A test is considered valid when the test
measures what it purports to measure
Predictive and concurrent validity can be subsumed under
Relating scores obtained on a test to other test scores or data from other assessment procedures is typically done in an effort to establish the _________ validity of a test
Face validity refers to
the appearance of relevancy of the test items
In an undergraduate measurement course, an instructor announces that the first examination will cover the topics of reliability and validity. One student in the class, Jamarr, publicly predicts that only questions on reliability will be posed. As it turns out, true to Jamarr's prediction, all of the test questions are on the topic of reliability. Given this background, which of the following is the most reasonable conclusion that Jamarr's fellow students could draw?
the first examination lacked content validity
It has to do with the degree to which an additional predictor explains something about the criterion measure that is not explained by predictors already in use. It is
Before constructing a comprehensive final examination that covers everything you have studied since day 1 of your course, your instructor reviews the objectives of the course, the textbook, and all lecture notes. Your instructor is clearly making a diligent effort to maximize the _____________________ validity of the final examination
If a test developer has only a "fuzzy" vision of the construct being measured, then
the content validity of the test is likely to suffer, the construct validity of the test is likely to suffer, and content irrelevant to the targeted construct may be measured
A standard against which a test or test score is evaluated is known as
Which of the following are BEST viewed as varieties of criterion-related validity?
concurrent validity and predictive validity
The form of criterion-related validity that reflects the degree to which a test score is correlated with a criterion measure obtained at the same time that the test score was obtained is known as
The form of criterion-related validity that reflects the degree to which a test score correlates with a criterion measure that was obtained some time subsequent to the test score is known as
A key difference between concurrent and predictive validity has to do with
the time frame during which data on the criterion measure is collected
Which is an example of a criterion?
achievement test scores, success in being able to repair a defective toaster, and student ratings of teaching effectiveness
What type of validity evidence BEST sheds light on whether a college admissions test is valid for selecting students who will complete the program within 4 years?
predictive criterion-related validity
Which magnitude of validity coefficient is typically acceptable to conclude that a test is valid?
the range must be between 0 and 1
A coefficient of correlation is calculated between Henry's score on a test of sociopathy and a clinician's rating of Henry on the variable of sociopathy. This coefficient of correlation might also be referred to as
a validity coefficient
A construct is
unobservable, something that describes behavior, and something that is assumed to exist
Which qualifies as a construct?
depression, intelligence, and mechanical aptitude
If a test is a valid measure of a particular construct, we would expect that
groups of people who differ with respect to the construct will obtain different test scores
A significant, positive relationship exists between scores on a new test of intelligence and scores on the fourth edition of the Stanford-Binet Intelligence Scale. These data may be viewed as supportive of which type of validity evidence for the new test?
convergent evidence of construct validity
A statistically insignificant correlation exists between scores on a new test of depression and a well-established measure of satisfaction with life. These data may be construed as which type of validity evidence with regard to the test of depression?
discriminant evidence of construct validity
Which term is used to refer to the tendency of a rater to evaluate ratees higher than they objectively deserve because of the rater's inability to discriminate between aspects of the ratee's behavior?
A supervisor unintentionally rates his supervisees less favorably than they really deserve. Which type of error has been made?
A rater systematically assigns ratings in the middle range. Which type of error BEST characterizes this rater's ratings?
central tendency error
If a new predictor explains something about a predicted score that was not already explained by existing predictors, the new predictor might be praised for its
A test developer compares a student's performance on a newly developed math achievement test to the same student's performance on a well-established math achievement test for the purpose of exploring the _____________ validity of the new test
Comparing SAT scores earned in high school with the first college GPA of that same student is a process related to establishing the _______________ validity of the SAT
Which is an example of convergent evidence for the construct validity of a test measuring fear of cats?
both a high correlation between the test and an existing validated test measuring fear of cats and a high correlation with an existing validated test measuring more-generalized fear
If a newly developed test designed to measure happiness correlates with other tests of happiness but not with tests of sadness, this is referred to as ____________ and ____________ evidence of validity, respectively
convergent and discriminant
A test reviewer comes to the conclusion that a certain test is a "valid test." This means that the reviewed test has been shown to be valid for
a particular use with a particular population at a particular time