standardization
uniformity of procedure in administering & scoring a test; reduces measurement error
norms
the scores of a representative sample of the population on a particular test
norm-referenced interpretation
comparing an individual's test score to norms
objective tests
administration, scoring, & interpretation of scores are independent of the subjective judgment of the examiner (e.g. MMPI-2, WAIS-IV)
reliability
consistency; provides repeatable, consistent results
validity
the test measures what it says it measures
maximum performance tests
tests that show a test taker's best possible performance (e.g. achievement & aptitude tests)
typical performance tests
tell what a test taker usually does or feels (e.g. personality tests, interest inventories)
ceiling effect
occurs when the measure does not include an adequate range of items at the "top" of the exam (e.g. an intelligence test without enough difficult items, so high-achieving test takers score similarly)
floor effect
occurs when the test does not contain an adequate range of questions at the "bottom"/low end of the exam
classical test theory
an obtained test score is composed of 2 components: truth & error
truth--classical test theory
reflects the test taker's actual status on whatever attribute is being measured by the test
error (measurement error)--classical test theory
refers to factors that are irrelevant to whatever is being measured. it is random--can be due to any # of factors.
reliability coefficient
part of most methods of estimating a test's reliability. it is a correlation coefficient that ranges from 0.0 to 1.0; the closer to 1, the higher the reliability (personality tests ~ .70 & above; selection tests in industrial settings ~ .90)
test-retest reliability (coefficient of stability)
administering the same test to the same group of people & then correlating scores on the 1st & 2nd administrations.
drawbacks: practice effects & changes in administration conditions
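a minimal sketch of computing a coefficient of stability, assuming two invented score lists for the same 10 examinees; scipy's pearsonr supplies the correlation:

```python
# Test-retest reliability: correlate the 1st & 2nd administrations.
# All scores below are invented for illustration.
from scipy.stats import pearsonr

first_admin  = [12, 18, 25, 30, 22, 15, 28, 20, 17, 26]
second_admin = [14, 17, 27, 29, 24, 16, 27, 21, 15, 28]

r, _ = pearsonr(first_admin, second_admin)
print(f"coefficient of stability: r = {r:.2f}")
```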
alternate forms reliability (equivalent forms, parallel forms)
administering 2 equivalent forms of a test to the same group of examinees & then correlating the scores.
the 2 forms are not administered in succession, in order to demonstrate a high coefficient it must be consistent across time & across different content
internal consistency reliability
obtaining correlations among individual items. three methods: split-half, Cronbach's coefficient alpha, & the Kuder-Richardson Formula 20.
split-half method
divide the test into 2 halves (e.g. odds/evens), then score each half & correlate the 2 scores. because each half is only half the test's length, the correlation is usually stepped up with the Spearman-Brown formula, as in the sketch below
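a sketch of the split-half procedure on an invented 10-item, 8-person response matrix: odd & even items are scored separately, correlated, and the half-test correlation is corrected with the Spearman-Brown step-up, 2r / (1 + r):

```python
# Split-half reliability with Spearman-Brown correction.
# responses: rows = examinees, columns = items (invented 0/1 data).
import numpy as np

responses = np.array([
    [1, 1, 0, 1, 1, 0, 1, 1, 0, 1],
    [1, 0, 0, 1, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 0, 1, 0, 0],
    [1, 1, 0, 1, 1, 1, 1, 1, 0, 1],
    [0, 1, 0, 0, 0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 0, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 0, 0, 0, 0, 1],
])

odd_half  = responses[:, 0::2].sum(axis=1)   # items 1, 3, 5, ...
even_half = responses[:, 1::2].sum(axis=1)   # items 2, 4, 6, ...

r_half = np.corrcoef(odd_half, even_half)[0, 1]
r_full = 2 * r_half / (1 + r_half)           # Spearman-Brown step-up
print(f"half-test r = {r_half:.2f}, corrected full-test r = {r_full:.2f}")
```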
Kuder-Richardson Formula 20 (KR-20)
used when test items are dichotomously scored (e.g. MMPI-2)
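a sketch of KR-20 on invented dichotomous (0/1) responses; p & q are the proportions passing & failing each item:

```python
# Kuder-Richardson Formula 20 for dichotomously scored items.
# KR-20 = (k / (k - 1)) * (1 - sum(p*q) / variance of total scores)
import numpy as np

responses = np.array([          # invented 0/1 item responses
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1],
])

k = responses.shape[1]                      # number of items
p = responses.mean(axis=0)                  # proportion passing each item
q = 1 - p                                   # proportion failing each item
total_var = responses.sum(axis=1).var()     # variance of total scores

kr20 = (k / (k - 1)) * (1 - (p * q).sum() / total_var)
print(f"KR-20 = {kr20:.2f}")
```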
Cronbach's coefficient alpha
used when test items have more than 2 possible scores (e.g. a Likert-scale test, the BDI)
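a sketch of coefficient alpha on invented 5-point Likert responses; KR-20 above is the special case of this same formula when every item is scored 0/1:

```python
# Cronbach's coefficient alpha for multi-point (e.g. Likert) items.
# alpha = (k / (k - 1)) * (1 - sum of item variances / total-score variance)
import numpy as np

responses = np.array([          # invented 1-5 Likert responses
    [4, 5, 3, 4, 4],
    [2, 3, 2, 3, 2],
    [5, 5, 4, 5, 5],
    [1, 2, 1, 2, 1],
    [3, 4, 3, 3, 4],
    [2, 2, 3, 2, 2],
])

k = responses.shape[1]
item_vars = responses.var(axis=0)           # variance of each item
total_var = responses.sum(axis=1).var()     # variance of total scores

alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"coefficient alpha = {alpha:.2f}")
```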
interscorer reliability (inter-rater reliability)
concern for measures on which scoring depends upon rater judgment. most common method involves calculating a correlation coefficient b/w the scores of 2 different raters
kappa statistic (Cohen's kappa)
the measure of agreement b/w 2 judges who each rate a set of objects using nominal scales
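a sketch of kappa for two hypothetical raters assigning invented nominal categories; kappa corrects the observed agreement for the agreement expected by chance:

```python
# Cohen's kappa: chance-corrected agreement between 2 raters.
# kappa = (p_observed - p_expected) / (1 - p_expected)
import numpy as np

rater_a = np.array(["anxious", "depressed", "anxious", "other",
                    "depressed", "anxious", "other", "depressed"])
rater_b = np.array(["anxious", "depressed", "depressed", "other",
                    "depressed", "anxious", "anxious", "depressed"])

categories = np.unique(np.concatenate([rater_a, rater_b]))
p_obs = (rater_a == rater_b).mean()                 # observed agreement
p_exp = sum((rater_a == c).mean() * (rater_b == c).mean()
            for c in categories)                    # chance agreement

kappa = (p_obs - p_exp) / (1 - p_exp)
print(f"kappa = {kappa:.2f}")  # sklearn.metrics.cohen_kappa_score gives the same
```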
standard error of measurement
indicates how much error an individual test score can be expected to have. SEM is used to construct a confidence interval. the lower the reliability of a test the more error it contains
confidence interval
indicates the range in which a test taker's true score is likely to fall, based on the obtained score
factors affecting reliability
1) longer tests are more reliable (see the Spearman-Brown sketch after this list)
2) the more homogeneous the group taking the test, the lower the reliability coefficient
3) if test items are too easy or too hard, score variability decreases, which lowers the reliability coefficient
4) the easier it is to guess the correct answer (e.g. on a true/false test), the lower the reliability coefficient
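factor 1 can be made concrete with the Spearman-Brown prophecy formula, the general form of the step-up used in the split-half sketch above; a sketch assuming a made-up starting reliability of .70:

```python
# Spearman-Brown prophecy: predicted reliability when a test is
# lengthened to n times its original length with comparable items.
def spearman_brown(r: float, n: float) -> float:
    return n * r / (1 + (n - 1) * r)

r_original = 0.70                       # assumed starting reliability
for n in (1, 2, 3):
    print(f"{n}x length -> r = {spearman_brown(r_original, n):.2f}")
# 1x -> 0.70, 2x -> 0.82, 3x -> 0.88: longer tests are more reliable
```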
the extent to which the test items adequately & representatively sample the content area to be measured
the test appears to the test takers, the personnel administering the test, & untrained observers to be valid. increases motivation to complete test, cooperation with test process
criterion related validity
useful for predicting a person's behavior in specific situations. scores on the predictor test are correlated with an outside criterion, such as job performance, school achievement, or scores on another test
concurrent & predictive validity
the procedures used to determine how valid a predictor is
concurrent validity
the predictor & criterion data are collected at/about the same time. if a test is useful for predicting a given current behavior, it has concurrent validity
predictive validity
scores on the predictor are collected 1st; the criterion data are collected later (in the future)
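a sketch of a predictive validity coefficient: invented selection-test scores are collected at hiring, job-performance ratings later, and the two are correlated with the same Pearson machinery used for reliability:

```python
# Predictive validity: correlate predictor scores with a criterion
# measured later (all values invented for illustration).
from scipy.stats import pearsonr

test_scores = [55, 72, 61, 80, 45, 68, 74, 59]          # at hiring
job_ratings = [3.1, 4.2, 3.5, 4.6, 2.8, 3.9, 4.0, 3.2]  # 6 months later

validity, _ = pearsonr(test_scores, job_ratings)
print(f"validity coefficient = {validity:.2f}")
```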
formula for standard error of measurement
SEmeas = SD * sqrt(1 - r), where SD is the standard deviation of the test scores & r is the reliability coefficient
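a worked sketch with assumed values (SD = 15, r = .89, obtained score = 110): the SEM comes out to about 5 points, and a 95% confidence interval is the obtained score ± 1.96 SEM:

```python
# Standard error of measurement & a 95% confidence interval
# around an obtained score (SD, r, and the score are assumed values).
import math

sd, r, obtained = 15.0, 0.89, 110.0

sem = sd * math.sqrt(1 - r)               # SEM = SD * sqrt(1 - r)
lo, hi = obtained - 1.96 * sem, obtained + 1.96 * sem
print(f"SEM = {sem:.2f}; 95% CI = [{lo:.1f}, {hi:.1f}]")
# lower reliability -> larger SEM -> wider confidence interval
```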
factors affecting the validity coefficient
1) restricted range of scores--the more diverse the group (the wider the range of scores), the higher the validity coefficient
2) both the predictor & the criterion must be reliable; however, reliability does not guarantee validity
3) criterion-related validity may vary among subgroups within a population
4) cross-validation: after a test is validated once, it is validated again with a different sample
5) criterion contamination: when knowledge of predictor scores influences an individual's criterion status
shrinkage
the reduction in a criterion-related validity coefficient that occurs during cross-validation
construct validity
the degree to which a test measures a theoretical construct or trait
convergent validity
requires that different ways of measuring the same trait yield similar results
discriminant (divergent) validity
when a test has a LOW correlation with another test that measures a different construct. e.g. a test of mechanical ability should not have a high correlation with a test of reading ability
factor analysis
a complex statistical procedure conducted to assess the construct validity of a test or a # of tests, among other reasons
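a minimal sketch using scikit-learn's FactorAnalysis on randomly generated data for six measures; in a construct-validity study one inspects whether measures of the same trait load on the same factor (the "verbal"/"spatial" trait names here are hypothetical):

```python
# Factor analysis sketch: do six measures reduce to 2 factors?
# Data are simulated from a known 2-factor structure for illustration.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n = 200
verbal  = rng.normal(size=n)            # latent "verbal" trait
spatial = rng.normal(size=n)            # latent "spatial" trait

def noise():
    return rng.normal(scale=0.5, size=n)

# three measures load on each latent trait
X = np.column_stack([
    verbal + noise(),  verbal + noise(),  verbal + noise(),
    spatial + noise(), spatial + noise(), spatial + noise(),
])

fa = FactorAnalysis(n_components=2, random_state=0).fit(X)
print(np.round(fa.components_, 2))      # loadings: rows = factors, cols = measures
```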