Item Response Theory
Terms in this set (20)
item characteristic curve
The relationship between the probability of a correct response on a true/false item and the underlying dimension; this relationship can be assumed to take the form of a cumulative normal distribution
features of an item characteristic curve
At the 50% point:
- The slope at this point is an estimate of discrimination
- The point on the X-axis is an estimate of difficulty or threshold
1. ICCs differ in difficulty or threshold
Easier = LEFT
Harder = RIGHT
2. ICCs differ in discrimination
Steeper slope = more discriminating
3. ICCs can differ in pseudo-guessing
A third parameter, "pseudo-guessing", can be used to estimate the probability of a correct response for people with very low levels of the underlying dimension
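The three parameters above (discrimination, difficulty, pseudo-guessing) combine in the three-parameter logistic ICC. A minimal sketch, with made-up parameter values in the example call:

```python
import math

def icc_3pl(theta, a=1.0, b=0.0, c=0.0):
    """Probability of a correct response under a 3-parameter logistic ICC.

    a: discrimination (slope steepness)
    b: difficulty / threshold (location on the latent dimension)
    c: pseudo-guessing (lower asymptote for very low-ability examinees)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# An examinee exactly at the item's difficulty (theta = b) answers
# correctly with probability halfway between c and 1.
p = icc_3pl(theta=0.0, a=1.5, b=0.0, c=0.2)  # 0.2 + 0.8 * 0.5 = 0.6
```

Setting c = 0 recovers the two-parameter curve; setting a equal across items gives the Rasch case discussed later.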
explain how IRT differs in its approach to test construction compared to classical test theory
Item Response Theory is a newer theory with a focus on test items that adds more tools for solving measurement problems in psychology
- Test bias
- Adaptive testing
- Item selection
• CTT focuses more on the total score of a scale or sub-scale
• IRT focuses on the relationship between items and the total score or latent dimension underlying the test.
Item Analysis & CTT
• Classical Test Theory is often thought of as the theory of total scores
• Key equation of CTT: test score = true score + error
• We are interested in the true score, but for any individual we can only observe the test score
• It is concerned with the reliability and validity of total scores
• Methods that estimate item difficulty and item discrimination are "tacked on" to CTT
• In IRT the relationship between the item and the overall construct being assessed is central.
Classical Test Theory
• Item analysis is an add-on
• Estimates of test and item parameters are dependent on the sample from which they were calculated
• But scoring in CTT is usually simpler (e.g. addition of item scores) rather than requiring computer time as in IRT
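The CTT key equation (test score = true score + error) can be illustrated with a small simulation; the means, SDs, and sample size here are arbitrary choices for illustration:

```python
import random

random.seed(0)

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Simulate test score = true score + error for 10,000 examinees.
# A true-score SD of 10 and an error SD of 5 are illustrative values.
true_scores = [random.gauss(50, 10) for _ in range(10_000)]
observed = [t + random.gauss(0, 5) for t in true_scores]

# Reliability in CTT is true-score variance over observed-score
# variance; in expectation 100 / (100 + 25) = 0.8 here.
reliability = variance(true_scores) / variance(observed)
```

In practice the true scores are unobservable, which is why CTT estimates reliability indirectly (e.g. via internal consistency such as Cronbach's alpha).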
Item Response Theory
• Is concerned about the relationship between observed responses to items and the underlying dimension or construct
• Assumes that there is a relationship between responses to items and the underlying or latent dimension being assessed by the scale
use item characteristic curve properties to define a good test item
Items with similar discrimination but that differ in difficulty
• Note for the Rasch model: three of these four items discriminate below the population average
ICC Multiple Choice
An ICC for multiple choice items plots a separate curve for each response
e.g. Beck Depression Inventory item 10.
"Episodes of crying"
• Responses are supposed to be ordinal, but option 2 is less frequently chosen than option 3 at all levels of depression: an example of non-parametric IRT
select a good test item based on IRT statistics
Item Information Functions
Test Information Functions
Summary Different IRT Models
• Parametric or Non-parametric
- Parametric more common
- Ramsay's TestGraf for non-parametric IRT
• Number of parameters estimated
1. One-parameter (Rasch) models: assume all items have the same slope or discrimination and differ only in difficulty or threshold. May be theoretically strong but may give a poor fit to the data.
2. Two-parameter models: like those seen above, estimate both discrimination and difficulty.
3. Three-parameter models: add a parameter for pseudo-guessing (more widely used in ability testing)
• Link function
- Logistic (mathematically easier)
- Normal Ogive/probit (theoretically stronger)
Differential Item Functioning
Item Bias
Classical Test Theory Approach
• Cronbach's alpha = 0.761 for these 11 items in the sample of 7,746
• Note that the relationship between an item and the total score is expressed by a single number, the item-total correlation (ITC)
• In this instance, because the item is binary (yes or no), the correlation with the total is a point-biserial correlation
• The closer the ITC gets to 1.00, the stronger the relationship between the item and the total score
• Remember that correlations work on standard scores, so the scale is removed.
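The item-total correlation described above can be computed directly; for binary items the Pearson correlation with the total is the point-biserial correlation. A sketch with a tiny made-up response matrix:

```python
from statistics import mean, pstdev

def item_total_correlations(responses):
    """responses: one row per examinee, one 0/1 entry per item.
    Returns each item's Pearson correlation with the total score,
    which for a binary item is the point-biserial correlation."""
    n_items = len(responses[0])
    totals = [sum(r) for r in responses]
    itcs = []
    for j in range(n_items):
        item = [r[j] for r in responses]
        mi, mt = mean(item), mean(totals)
        cov = mean((x - mi) * (y - mt) for x, y in zip(item, totals))
        itcs.append(cov / (pstdev(item) * pstdev(totals)))
    return itcs

# Illustrative data: 4 examinees answering 3 binary items.
responses = [[1, 1, 0], [1, 0, 0], [0, 0, 0], [1, 1, 1]]
itcs = item_total_correlations(responses)
```

Note that this simple version correlates each item with a total that includes the item itself; corrected versions exclude the item from the total.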
IRT in Test Construction
Steps in test construction
1. Preparation of specifications
2. Preparation of an item pool
3. Field testing of items
4. Selection of test items
5. Compilation of norms if required
6. Specification of cutoff scores if required
7. Reliability studies
8. Validity Studies
9. Final test production
Field testing
requires larger samples, but the randomness of the sample is less important because IRT estimates are (theoretically at least) sample invariant
Selection of test items
can be on the basis of achieving the desired test information function
Item banking and Adaptive testing
• Once IRT parameters (discrimination, difficulty, pseudo-guessing) are known from a large sample, it is possible to choose items that provide the best estimate of a person's level on the latent dimension with known precision.
• Ranges from two stage testing procedures to computerized adaptive testing
• Where to start?
- An item with high discrimination and average difficulty
• The next item
- Is chosen to maximize the information (minimize the error of measurement)
• When to finish?
- a set number of items
- a time limit
- A minimum standard error of estimation
• People of different ability do different items
Pyramidal Testing Model
Heavy line shows route of test taker whose item responses are listed across the top
CAT Item selection
• For example, a 4-item bank for a maths test
- Item A "567 + 235 = ?" difficulty = 0
- Item B "456 / 56 = ?" difficulty = 1
- Item C "24 + 78 = ?" difficulty = -1
- Item D "10 + 15 = ?" difficulty = -1.5
• If item A is correct then the next choice would be item B, and if that is also correct, maths ability would be at or above 1 (in standardised scores: mean = 0, SD = 1)
• If item A is incorrect then the next choice is item C; if C is incorrect then item D; if D is incorrect then ability would be less than -1.5 (standardised)
• More items would give greater allowance for getting a single item wrong (increase reliability and reduce standard error)
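The branching described above can be sketched as a toy routine over the 4-item bank (item names and difficulties are from the example; the function is an illustration of the routing, not a full CAT algorithm):

```python
# Difficulties (standardised units) for the example item bank.
bank = {"A": 0.0, "B": 1.0, "C": -1.0, "D": -1.5}

def adaptive_route(answers):
    """answers maps item name -> True (correct) / False (incorrect).
    Follows the branching in the text: start at A (average difficulty);
    B if A is correct, otherwise C, then D only if C is also incorrect."""
    route = ["A"]
    if answers["A"]:
        route.append("B")
    else:
        route.append("C")
        if not answers["C"]:
            route.append("D")
    return route

# An examinee who fails every item is routed A -> C -> D,
# and their ability is estimated below -1.5.
route = adaptive_route({"A": False, "C": False})
```

A real CAT would instead pick, at each step, the unused item maximizing information at the current ability estimate, and stop on one of the criteria listed above (item count, time limit, or minimum standard error).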
Embretson's New Rules of measurement
1. The Standard Error of Measurement
Old Rule: The standard error of measurement applies to all scores in a particular population
• New Rule: The standard error of estimation differs across scores but generalizes across populations
• The standard error of estimation is inversely related to the test information function (one over the square root of the information)
2. Test Length and Reliability
3. Interchangeable test forms
4. Assessment of item properties
• Old: Unbiased assessment of item properties depends on having representative samples
• New: Unbiased assessment of item properties may be obtained from unrepresentative samples
5. Establishing Meaningful Scale scores
6. Establishing Scale Properties
7. Mixing Item Formats
• Old: Mixed item formats lead to unbalanced impact on test total scores
• New: Mixed item formats can yield optimal test scores
8. The meaning of change scores
9. Factor Analysis of Binary Items
10. Importance of Item Stimulus Features
Key Points
• IRT provides estimates of item discrimination and difficulty that are sample invariant
• Item Characteristic Curves provide information about
1. item discrimination,
2. difficulty
3. (and pseudo-guessing)
• IRT is a newer theory with a focus on test items that adds more tools for solving measurement problems in psychology
- Test bias
- Adaptive testing
- Item selection