Combo with Final Exam Review and 1 other
Terms in this set (57)
Reliability, in a broad statistical sense, is synonymous with:
consistency
A source of error variance may take the form of:
item sampling, test takers' reactions to environment-related variables such as room temperature and lighting, and test taker variables such as amount of sleep the night before a test, amount of anxiety, or drug effects (all of these.)
Which type of reliability estimate is obtained by correlating pairs of scores from the same person (or people) on two different administrations of the same test?
a test-retest estimate
A reliability coefficient is:
an index, a ratio of the total variance attributed to true variance, and unaffected by a systematic source of error (All of these.)
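As a quick sketch of the ratio definition (all variance numbers below are made up), note that a constant, systematic bias shifts every observed score equally, so it changes neither the variances nor the coefficient:

```python
from statistics import pvariance

# Hypothetical true scores and random error components
true_scores = [10.0, 12.0, 14.0, 16.0, 18.0]
errors      = [ 0.5, -1.0,  0.0,  1.0, -0.5]

observed = [t + e for t, e in zip(true_scores, errors)]

# Reliability as the ratio of true-score variance to total variance
reliability = pvariance(true_scores) / pvariance(observed)

# A systematic (constant) error shifts every score by the same amount,
# leaving the variances -- and so the reliability coefficient -- unchanged.
biased = [x + 3.0 for x in observed]
assert pvariance(biased) == pvariance(observed)
```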
An estimate of test-retest reliability is often referred to as a coefficient of stability when the time interval between the test and retest is more than:
six months
As the reliability of a test increases, the standard error of measurement:
decreases
Which type of reliability estimate would be appropriate only when evaluating the reliability of a test that measures a trait that is relatively stable over time?
test-retest
Which of the following is true of systematic error?
It has no effect on the reliability of a measure.
Computer-scorable items have tended to eliminate error variance due to:
scorers
Which of the following might lead to a decrease in test-retest reliability?
the passage of time between the two administrations of the test, coaching designed to increase test scores between the two administrations of the test, and practice with similar test materials between the two administrations of the test (All of these.)
If items from a test are measuring the same trait, estimates of reliability yielded from KR-20 will typically be ________ as compared to estimates from split-half methods.
If traditional measures of reliability are applied to criterion-referenced tests, the reliability estimates will likely be:
lower
Test-retest estimates of reliability are referred to as measures of ________, and split-half reliability estimates are referred to as measures of ________.
stability; internal consistency
Which of the following factors may influence a split-half reliability estimate?
fatigue, anxiety, and item difficulty (all of these.)
KR-20 is the statistic of choice for tests with which types of items?
multiple-choice and true-false (all of these.)
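For dichotomously scored (0/1) items, KR-20 can be computed directly; a minimal Python sketch, using a small invented score matrix:

```python
from statistics import pvariance

def kr20(item_scores):
    """KR-20 for dichotomous (0/1) items.

    item_scores: list of rows, one row of item scores per test taker.
    Formula: (k / (k - 1)) * (1 - sum(p*q) / total-score variance).
    """
    k = len(item_scores[0])                      # number of items
    totals = [sum(row) for row in item_scores]   # total score per person
    total_var = pvariance(totals)
    pq = 0.0
    for j in range(k):
        col = [row[j] for row in item_scores]
        p = sum(col) / len(col)                  # proportion passing item j
        pq += p * (1 - p)
    return (k / (k - 1)) * (1 - pq / total_var)

# Invented data: 5 test takers, 4 true-false items
data = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
result = kr20(data)   # ~0.8 for this toy matrix
```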
The Spearman-Brown formula is used for:
correcting for one half of the test by estimating the reliability of the whole test, determining how many additional items are needed to increase reliability up to a certain level, and determining how many items can be eliminated without reducing reliability below a predetermined level (all of these.)
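All three uses rest on the same prophecy formula, r_nn = n·r / (1 + (n − 1)·r), where n is the factor by which test length changes. A sketch with invented coefficients:

```python
def spearman_brown(r, n):
    """Predicted reliability when test length changes by a factor of n."""
    return n * r / (1 + (n - 1) * r)

def length_factor(r_current, r_target):
    """Factor by which the test must be lengthened to reach r_target
    (solve the Spearman-Brown formula for n)."""
    return r_target * (1 - r_current) / (r_current * (1 - r_target))

# Correcting a split-half correlation of .70 up to full length (n = 2):
full_test_r = spearman_brown(0.70, 2)   # ~0.82

# How much longer the test must be to raise reliability from .70 to .90:
factor = length_factor(0.70, 0.90)      # ~3.86 times the current length
```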
Typically, adding items to a test will have what effect on the test's reliability?
Reliability will increase.
Which of the following is NOT an acceptable way to divide a test when using the split-half reliability method?
Assign easy items to one half of the test and difficult items to the other half.
A police officer mistakenly records the blood alcohol level of a suspected drunk driver after administering a breathalyzer test. This mistake is most related to which type of reliability?
inter-scorer reliability
A coefficient alpha over .9 may indicate that:
the items in the test are redundant.
Which best conveys the meaning of an inter-scorer reliability estimate of .90?
Ninety percent of the variance in the scores assigned by the scorers was attributed to true differences and 10% to error.
If a time limit is long enough to allow test takers to attempt all items, and if some items are so difficult that no test taker is able to obtain a perfect score, then the test is referred to as a ________ test.
power
If a test is homogeneous:
it is functionally uniform throughout, it will likely yield a high internal-consistency reliability estimate compared with test-retest, and it would be reasonable to expect a high degree of internal consistency (all of these.)
Which type(s) of reliability estimates would be most appropriate for a measure of heart rate?
Typically, speed tests:
contain items of a uniform difficulty level.
Which type(s) of reliability estimates would be appropriate for a speed test?
test-retest, alternate-form, and split-half from two independent testing sessions (all of these.)
Generalizability theory is most closely related to
Traditional measures of reliability are inappropriate for criterion-referenced tests because variability:
is minimized with criterion-referenced tests.
A test is considered valid when the test:
measures what it purports to measure.
Face validity refers to:
the appearance of relevancy of the test items.
Which is NOT a method of evaluating the validity of a test?
evaluating the percentage of passing and failing grades on the test
Predictive and concurrent validity can be subsumed under:
criterion-related validity
Which of the following is TRUE of face validity?
may influence the way the test-taker approaches the situation, relates more to what the test appears to measure than what the test may actually measure, and has received little attention and is given short shrift as compared to other indices of validity (all of these.)
Which assessment technique has the MOST face validity?
administering a word processing test to a person applying to be a word processor
Relating scores obtained on a test to other test scores or data from other assessment procedures is typically done in an effort to establish the __________ validity of a test.
criterion-related
An instructor announces that an examination will cover the topics of reliability and validity. A student boasts that he will read and study only the material on reliability. In fact, all the test questions are only on reliability. The best conclusion a student of assessment could draw from this is that:
the examination lacked content validity.
Lawshe devised a method for determining agreement among raters or judges who rate items on how essential they are. This method provides a way to quantify what type of validity?
content validity
In calculating the content validity ratio, panelists are asked to determine:
if the skill or knowledge measured by the item is essential.
A standard against which a test or test score is evaluated is known as:
a criterion
The minimum value of a content validity ratio necessary to be statistically significant at the .05 level is dependent on:
the number of panelists judging the items.
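Lawshe's content validity ratio itself is simple arithmetic: CVR = (ne − N/2) / (N/2), where ne is the number of panelists rating the item essential and N is the total panel size. A sketch with invented panel counts:

```python
def content_validity_ratio(n_essential, n_panelists):
    """Lawshe's CVR for a single item: (ne - N/2) / (N/2)."""
    half = n_panelists / 2
    return (n_essential - half) / half

# 9 of 10 panelists rate the item "essential":
high_agreement = content_validity_ratio(9, 10)   # 0.8

# Exactly half do -- the item sits at the chance level:
chance_level = content_validity_ratio(5, 10)     # 0.0
```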
Which may best be viewed as varieties of criterion-related validity?
concurrent validity and predictive validity
The form of criterion-related validity that reflects the degree to which a test score is correlated with a criterion measure obtained at the same time that the test score was obtained is known as:
concurrent validity
The form of criterion-related validity that reflects the degree to which a test score correlates with a criterion measure that was obtained some time subsequent to the test score is known as:
predictive validity
A key difference between concurrent and predictive validity has to do with:
the time frame during which data on the criterion measure is collected.
Which is an example of a criterion?
achievement test scores, success in being able to repair a defective toaster, and student ratings of teaching effectiveness (all of these.)
An index of utility can be distinguished from an index of reliability and an index of validity in that an index of utility can tell us something about:
the practical value of the information derived from what a test measures.
sets a ceiling on test utility.
If targeted test-takers for a particular test consistently fail to follow the directions for taking the test then:
the test could still have great utility and the test could still be valid (b and c.)
Validity is to ____________ as utility is to ____________.
Often used for the purpose of licensing persons in professions, these tests are called:
Which is an example of the selected-response item format?
a multiple-choice item
A well-written true-false item:
has a correct response that is veritably true or false, and not subject to debate.
Which statement is TRUE of the test tryout phase of test construction?
Test conditions should be as similar to the actual administration as possible.
The item-validity index is key in determining:
The higher the item-difficulty index, the ________ the item.
easier
An item-reliability index provides a measure of a test's:
internal consistency
standard error of measurement
indicates how much error an individual test score can be expected to contain. The SEM is used to construct a confidence interval around an observed score; the lower the reliability of a test, the more error it contains.
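Under classical test theory, SEM = SD · sqrt(1 − r). A sketch with hypothetical IQ-style numbers (SD = 15, r = .89, observed score 106 are all invented for illustration):

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: sd * sqrt(1 - r)."""
    return sd * math.sqrt(1 - reliability)

sd, r, observed = 15, 0.89, 106   # hypothetical values
e = sem(sd, r)                    # ~4.97

# 95% confidence interval around the observed score
low, high = observed - 1.96 * e, observed + 1.96 * e

# Higher reliability -> smaller SEM -> tighter interval:
assert sem(15, 0.95) < sem(15, 0.89)
```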