Test Theory Test 2

Give the equation for the 1PL item response model, explain what kind of data it can be applied to, and interpret the equation.
$T(u_j = 1 \mid \theta_i) = \dfrac{1}{1 + e^{-a(\theta_i - b_j)}}$

-- Applicable to binary (right/wrong) response data where all test items have the same discrimination parameter
-- Left side = probability of responding correctly to the test item (j)
-- theta = latent factor (xenophobia, test anxiety, idiocy, etc.)
-- b = "difficulty," the point on the theta continuum where there is equal likelihood of answering correctly or not
-- a = discrimination parameter, which sets the slope of the trace line at point b (the slope there is a/4).
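
As a minimal sketch of the trace line described above, the equation can be evaluated directly. The function name `trace_1pl` and the example values are illustrative assumptions, not part of the original notes.

```python
import math

def trace_1pl(theta, b, a=1.0):
    """Probability of a correct response under the 1PL model.

    a is the single discrimination value shared by every item (assumed 1.0 here).
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# At theta == b the probability of a correct response is exactly 0.5.
print(trace_1pl(theta=0.5, b=0.5))          # 0.5
# A subject above the item's difficulty is more likely than not to be correct.
print(trace_1pl(theta=1.5, b=0.5) > 0.5)    # True
```

Moving b to the right makes the item harder for everyone: the same theta gives a lower probability of a correct response.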
$L \propto \prod_{u} \left[ \int \prod_{j=1}^{k} T_j^{u_j} (1 - T_j)^{1 - u_j} \, \phi(\theta) \, d\theta \right]^{r_u}$

-- Tj = The trace line for probability of answering correctly on an item
-- (1-Tj) = opposite of the above, the probability of answering incorrectly
-- Product over items j = 1 to k of ... = items are locally independent (of each other), therefore you can multiply the probs for all of the items to get the probability of the entire response pattern of the test
-- ...as function of theta = theta is assumed to follow a standard normal distribution, therefore it is a continuous random variable and the pattern probability is weighted by the normal density of theta
-- integral... = removes theta by averaging over its whole distribution, since theta is never observed. The model assumes one dimension, so there is one integral. If using a multidimensional model, you integrate over more dimensions (theta2, theta3, etc.)
-- Product over u of (...) ^ru = multiplies together the response pattern probabilities of all subjects to get the total probability for all subjects, BECAUSE subjects are assumed independent. ru is the number of subjects who gave response pattern u.
-- Quadrature points are points along the theta continuum at which the curve is evaluated. Each point marks off a rectangle under the curve, and the rectangle areas are added to closely approximate the true integral of the curve. The more quadrature points, the closer the summation is to the real integral. But it also takes way longer.
-- Quadrature points/area summation is used to estimate the likelihood function in EM MML
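
The rectangle summation can be sketched as follows. This is only an illustration under stated assumptions: it uses a 2PL trace line, equally spaced midpoints between -6 and 6, and made-up item parameters; the helper names are hypothetical.

```python
import math

def trace_2pl(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def pattern_likelihood(theta, pattern, items):
    """Product over items of T^u * (1 - T)^(1 - u) at one value of theta."""
    like = 1.0
    for u, (a, b) in zip(pattern, items):
        t = trace_2pl(theta, a, b)
        like *= t if u == 1 else (1.0 - t)
    return like

def marginal_probability(pattern, items, n_points=41, lo=-6.0, hi=6.0):
    """Rectangle-rule approximation of the integral over theta."""
    width = (hi - lo) / n_points
    total = 0.0
    for q in range(n_points):
        theta = lo + (q + 0.5) * width          # midpoint of rectangle q
        height = pattern_likelihood(theta, pattern, items) * normal_pdf(theta)
        total += height * width                 # area of rectangle q
    return total

items = [(1.0, -1.0), (1.5, 0.0), (0.8, 1.0)]   # hypothetical (a, b) pairs
patterns = [(u1, u2, u3) for u1 in (0, 1) for u2 in (0, 1) for u3 in (0, 1)]
total = sum(marginal_probability(p, items) for p in patterns)
print(round(total, 4))   # the 8 pattern probabilities should sum to about 1
```

Raising `n_points` makes each rectangle narrower, which is the accuracy versus run time trade-off described above.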
Expectation-Step -- Using the current item parameter estimates, find the likelihood function of theta (area under the curve, summed over quadrature points)
-- Use this to calculate the expected frequencies of each response pattern (TFTF, FFTT, etc.), that is, the expected number of subjects who responded in each pattern at each level of theta

Maximization-Step -- Use these expected frequencies to find the max value of the same likelihood function under these conditions:
-Theta is random, M = 0, SD = 1
-Expected frequencies are fixed at their values from the e-step above
-Item parameters (a and b) are free, so these are tweaked to find the highest possible value of the likelihood function

Plug the new item parameters back into the e-step and find a new likelihood function

Repeat until convergence (no change, or <0.000001 change)
--During an m-step, item parameters (a and b) are estimated to give max likelihood. Then they get plugged back into their item response functions (trace line, theta function)

--Multiply all item response functions associated with that response pattern (so, TFTF, you do prob of correct times incorrect times correct...)

--This gives likelihood of that specific response pattern, dependent on only theta because all other parameters are "known" (estimated)

--Assume theta is normal, like it has been

--Use this to estimate the posterior distribution: at each quadrature point, multiply the normal density by the pattern likelihood (the product of all the trace lines), then divide by the sum of that same product across all quadrature points, which makes the posterior sum to 1

-- The posterior distribution gives, for each quadrature point, the proportion of people with that response pattern who are expected to be at that level of theta. If you multiply it by the number of people in the sample who gave that pattern, then sum over all the patterns, you get the total estimated number of people at that level (quadrature point) of the theta distribution.
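
A minimal sketch of the posterior calculation just described, assuming a 2PL trace line, 17 equally spaced quadrature points, and made-up item parameters and pattern counts (all names and numbers here are hypothetical):

```python
import math

def trace_2pl(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def posterior(pattern, items, nodes):
    """Posterior weight of each quadrature point for one response pattern."""
    weights = []
    for theta in nodes:
        like = 1.0
        for u, (a, b) in zip(pattern, items):
            t = trace_2pl(theta, a, b)
            like *= t if u == 1 else (1.0 - t)
        prior = math.exp(-0.5 * theta * theta)   # unnormalized N(0, 1) density
        weights.append(like * prior)
    total = sum(weights)                         # sum across quadrature points
    return [w / total for w in weights]          # now sums to 1

items = [(1.0, -1.0), (1.5, 0.0), (0.8, 1.0)]    # hypothetical current (a, b) estimates
nodes = [-4.0 + 0.5 * q for q in range(17)]      # quadrature points from -4 to 4
counts = {(1, 1, 0): 30, (0, 0, 0): 12, (1, 1, 1): 18}   # made-up pattern frequencies

# Expected number of subjects at each quadrature point.
expected_n = [0.0] * len(nodes)
for pattern, r in counts.items():
    for q, p in enumerate(posterior(pattern, items, nodes)):
        expected_n[q] += r * p

print(round(sum(expected_n), 6))   # 60.0, the total sample size

# The mean of one pattern's posterior is that pattern's EAP score.
eap_all_correct = sum(t * p for t, p in zip(nodes, posterior((1, 1, 1), items, nodes)))
eap_all_wrong = sum(t * p for t, p in zip(nodes, posterior((0, 0, 0), items, nodes)))
print(eap_all_correct > eap_all_wrong)   # True
```

The expected counts always add back up to the sample size, because each pattern's posterior sums to 1 before it is multiplied by that pattern's frequency.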
--"Information" is the reciprocal of the error variance of the parameter estimate (theta). The smaller the variance, the more precise the estimate. Therefore the information value for estimated theta is:

$I(\theta) = \dfrac{1}{SE^2(\hat\theta)}$

--The error variance for a test is:

$SE^2(\hat\theta) = \dfrac{1}{\sum_{j=1}^{k} \dfrac{(\partial T_j / \partial\theta)^2}{T_j (1 - T_j)}}$

or the reciprocal of the sum over all k test items, starting at j = 1, of the squared derivative of the probability of correctness with respect to theta, divided by the correctness times the incorrectness.

--So information is the reciprocal of all that, which removes the "reciprocal of a reciprocal":

$I(\theta) = \sum_{j=1}^{k} \dfrac{(\partial T_j / \partial\theta)^2}{T_j (1 - T_j)}$

-- To find information for one given item, just remove the summation and evaluate the term for item j.

-- In a 2PL model, $T_j = \dfrac{1}{1 + e^{-a_j(\theta_i - b_j)}}$

--You can use the above general equation for I and plug in the equation for T. The key step is that the derivative of the 2PL trace line is $\partial T_j / \partial\theta = a_j T_j (1 - T_j)$, so squaring it and dividing by $T_j (1 - T_j)$ gives:

for any item j in a 2PL model: $I_j = a_j^2 \, T_j (1 - T_j)$

Or the squared discrimination parameter times probability of correctness times probability of incorrectness.

-- For a 1PL model (all items share the same discrimination a): $I_j = a^2 \, T_j (1 - T_j)$, which is just $T_j (1 - T_j)$ when a is fixed at 1.

--For 3PL:

$I_j = a_j^2 \, \dfrac{1 - T_j}{T_j} \cdot \dfrac{(T_j - g_j)^2}{(1 - g_j)^2}$

Where g is the "guessing" parameter, $T_j$ is correctness, and $(1 - T_j)$ is incorrectness.
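
The closed-form expressions above can be checked numerically against the general formula (squared slope of the trace line divided by correctness times incorrectness). This is a sketch with made-up parameter values; the function names are hypothetical and the derivative is approximated by a central difference.

```python
import math

def trace_3pl(theta, a, b, g=0.0):
    """3PL trace line; g = 0 gives the 2PL."""
    return g + (1.0 - g) / (1.0 + math.exp(-a * (theta - b)))

def info_general(theta, a, b, g=0.0, h=1e-5):
    """General formula: (dT/dtheta)^2 / (T * (1 - T)), slope taken numerically."""
    t = trace_3pl(theta, a, b, g)
    slope = (trace_3pl(theta + h, a, b, g) - trace_3pl(theta - h, a, b, g)) / (2.0 * h)
    return slope ** 2 / (t * (1.0 - t))

def info_2pl(theta, a, b):
    t = trace_3pl(theta, a, b)
    return a ** 2 * t * (1.0 - t)

def info_3pl(theta, a, b, g):
    t = trace_3pl(theta, a, b, g)
    return a ** 2 * ((1.0 - t) / t) * ((t - g) ** 2 / (1.0 - g) ** 2)

theta, a, b, g = 0.3, 1.4, -0.2, 0.2   # made-up values
print(abs(info_general(theta, a, b) - info_2pl(theta, a, b)) < 1e-6)         # True
print(abs(info_general(theta, a, b, g) - info_3pl(theta, a, b, g)) < 1e-6)   # True
print(info_3pl(theta, a, b, g) < info_2pl(theta, a, b))                      # True
```

The last line illustrates that, with everything else held equal, allowing guessing lowers the information an item provides.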
--These statistics group subjects using model-based theta estimates, so the observed response proportions end up depending on the model. This is inappropriate for chi-square distributions and majorly inflates type 1 error (false positive).

-- The tests put subjects into equal sized subgroups. Therefore the results are hugely dependent on the sample, and the way in which those subgroups are chosen (how many, how spaced out they are) majorly affects the resulting statistic.
-- Find the observed proportions of correct responses using only the data collected. So for every person who got a summed score of 8, e.g., what proportion answered 1 on the item? Do this for all summed scores.

-- Then, find the expected proportions of correctness: the proportion of each summed-score group that the IRT model predicts will answer the item correctly.

-- $S\text{-}X^2_i = \sum_{k=1}^{n-1} N_k \dfrac{(O_{ik} - E_{ik})^2}{E_{ik} (1 - E_{ik})}$

n = number of items on the test
k = summed-score (number-correct) group, running from 1 to n-1
O_ik = observed proportion correct on item i in group k, already calculated
E_ik = expected proportion correct on item i in group k, already calculated
N_k = number of people in summed-score group k
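
Once the observed and expected proportions are in hand, the summation itself is short. The sketch below uses made-up proportions and group sizes for one item on a 5-item test; computing the expected proportions from the IRT model is not shown, and the function name `s_x2` is hypothetical.

```python
def s_x2(observed, expected, group_sizes):
    """Sum over summed-score groups of N_k * (O_k - E_k)^2 / (E_k * (1 - E_k))."""
    total = 0.0
    for o, e, n in zip(observed, expected, group_sizes):
        total += n * (o - e) ** 2 / (e * (1.0 - e))
    return total

# Made-up values for summed scores 1 through 4 (scores 0 and 5 are excluded).
observed = [0.20, 0.45, 0.70, 0.90]    # proportion correct on the item in each group
expected = [0.25, 0.50, 0.70, 0.85]    # model-predicted proportion for each group
group_sizes = [40, 60, 60, 40]

print(round(s_x2(observed, expected, group_sizes), 3))   # 1.918
```

A group whose observed and expected proportions match (the third one here) contributes nothing to the statistic.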

--Since observed props don't depend on the model, there is no conflict as with other test stats
--Grouping of summed scores does not depend on sample size
--Data is grouped by each possible summed scores (also not sample dependent)
-- Can be used on polytomous items (more than two response categories)
Describe the steps involved in simulating data for a 2PL IRT model.
-- Obtain population parameters (a and b) for each test item (j)
   - You can do this using previously estimated parameters from other experiments
   - Or look at the general distributions of the parameters used previously for the statistical test you are examining
-- Find a "true" value of theta for each subject
   - Draw this from a normal (0,1) distribution, unless you are planning to violate that assumption.
-- Plug the parameters and theta into the trace line function to get the probability of correctness (T).
-- Translate the result above into binary correct or incorrect responses.
   - Generally, draw a uniform random number between 0 and 1; if it is below the calculated T, the response is "correct"
-- Do the stat test. Get results.
-- Repeat this a ton of times (replications) so the results are stable.
Explain Samejima's graded item response model. How do you interpret the parameter estimates? What kind of data is it appropriate for?
-- Equation: T (uj=v|theta) = (item response equation for v or higher) - (item response equation for v+1 or higher)
-- Used for ordinal response data (items with ordered response categories)
-- You are generally finding the probability that a response is that "category" of response (so, choosing "b") OR HIGHER. But you only want that one category, so you must subtract the "or higher" part.
-- a = discrimination = strength of the relationship to the latent variable (theta)
-- b = "difficulty" -- there is one b per threshold between categories. Each is the point on theta where there is equal likelihood of responding in that category or higher versus below it.
-- The trace line of a single item is not one line; it is multiple lines that represent each category of response. So an item with four responses will have four trace lines.
-- The trace lines are different than normal, because the first trace line accounts for only response 1, whereas the traces for 2 and 3 are for those responses AND ALL GREATER ONES, but then SUBTRACTING ALL GREATER RESPONSES. Because they are differences of two curves, they rise and then fall, and therefore appear as humps (like upside-down parabolas).
How do you compute IRT EAP scores?
What does EAP stand for?
-- Expected a posteriori score
-- Calculated using the mean of a subject's posterior distribution (the distribution of theta given that subject's response pattern)
-- Find the posterior distribution for one subject, with one response pattern.
-- The EAP score will be the expected value (mean) of that distribution. It is the theta score you should expect from someone who gave that response pattern.
What is a trace line for a 2PL model item? How do you interpret it?
-- The trace line shows the probability of giving a specific answer, based on the subject's level of the latent variable (theta)
-- b = "difficulty" -- point on theta where the probabilities of responding correctly and incorrectly are both 0.5. On the trace line, this is the point where the response probability (y-axis) is 0.5
-- a = "discrimination" -- how strongly the item's response probability depends on the subject's level of theta. If a is high, there will be a larger difference in probability between lower theta subjects and higher theta subjects. On the trace line, it determines the slope of the line at point b.
Explain what kind of data the 2PL model can be applied to. Explain the interpretation of the model parameters.
-- 2PL model works with binary response data.
-- T = 1/(1 + exp(-aj (theta - bj)))
-- T = Probability of answering correctly. As a function, it is the trace line for correctness probability given a subject's level of the latent variable
-- a = "Discrimination" = degree to which the response probability will vary with different levels of theta. If it's strong, correctness will be much lower at lower theta than at higher.
-- b = "difficulty" = level of theta at which the chance of being correct or incorrect is equal
Give the equation for the 3PL response model, and interpret the parameters. What kind of data can it be applied to?
-- T (uj=1 | theta) = gj + (1-gj) * 1/(1 + exp(-aj (theta - bj)))
-- g = "guessing" parameter. Lower asymptote of the trace line. Also called c.
As theta approaches negative infinity, the probability of being correct approaches g. It's the likelihood you will get a right answer with minimal value of theta.
-- Notice the equation is essentially prob = lucky guessing + (no luck)*(theta-related parameters).
-- Usually used for proficiency testing (definite right and wrong answers)
-- Because guessing is included, all subjects have a non-zero probability of correctness.
-- a = discrimination = degree to which theta affects correctness
-- b = difficulty = NOT the point at which prob of correctness/incorrectness are equal. With 3PL, b is the point on theta where the probability = (g+1)/2. That is, it starts from 0.5 but is pushed upward by the allowance of guessing.
How is interpretation of the b parameter different between 2PL and 3PL response models?
-- For 2PL -- b is the point at which there is an equal likelihood of correctness/incorrectness. (T = 0.5)
-- For 3PL -- b is the point at which the response likelihood = (g+1)/2, halfway between the lower asymptote g and 1. This means that it is moderated by guessing and the likelihood of correctness at b is higher than 0.5 whenever g is above zero.
Which item characteristics can increase item information?
-- a -- As a increases, the peak of the information function becomes larger
Why is using EAPs better at distinguishing specific subjects than comparing summed scores in non-1PL response models?
-- EAPs distinguish subjects based on their response patterns.
Two subjects with the same summed score can have different response patterns, and EAPs give weight to each item based on its estimated parameters (a and b).
Describe the steps required to find a "classic" (Bock, Yen, McKinley & Mills) item fit statistic
-- Calibration -- Estimate item parameters and theta for each subject
-- Sort -- Order subjects based on their EAPs
-- Group -- Keep the subject order the same, try to make 10 subgroups with equal size
-- Calculate Proportions
   - Observed -- Find the proportion of subjects who answered correctly within each subgroup
   - Predicted -- Find the proportion using the trace line function, plugging in the estimated item parameters and the average EAP theta for the subgroup. Do this for all subgroups.
-- Compare -- Compare the observed and predicted proportions using a chi-square-like statistic that sums the discrepancies between observed and predicted proportions across subgroups.
How would you statistically compare the overall model fit of two nested response models with EM MML?
-- A model is nested within a larger model if you can get the smaller model by fixing some of the larger model's parameters (for example, setting them to zero). That is, if you take away some extra parameters in the larger model, you are left with the smaller model.
-- EX -- A 2PL model is nested within a 3PL model, because setting the guessing parameter to 0 for every item turns the 3PL into the 2PL.
-- Fit both models and take the difference in their -2 log-likelihood values. This difference is compared to a chi-square distribution with degrees of freedom equal to the number of extra parameters in the larger model. If the difference is not significant, the extra parameters are extraneous and the smaller model is more appropriate.
If you were looking at a response trace line, how could you tell if it had a valuable "guessing" parameter? That is, how could you tell a 2PL trace line from a 3PL trace line?
-- Look at the lower asymptote (part of the trace line going towards negative infinity). If it is near zero, it is probably 2PL.
If it is clearly above zero, it is probably affected by guessing and is therefore 3PL.
Explain what a trace line for Samejima's graded model looks like, and how you would interpret it.
-- The plot will look like a bunch of trace lines all on the same plot.
-- The trace line for the first response category will look nearly like a normal trace line, but reversed (it falls as theta rises). This is because the trace is starting at the BOTTOM of the threshold continuum, and does not encompass the entire response continuum like a normal single trace line.
-- The "middle" trace lines are all hump-shaped (like upside-down parabolas). This is because they consist of the likelihood of choosing that specific category or higher (like, choosing "b" or higher) MINUS the likelihood of any progressive category after that. So they rise and then fall, and only encompass a limited area in the middle.
-- The final trace line looks "normal" because it is essentially a regular trace line for that one category, as it does not have to account for the probability of any thresholds for categorical responses after itself.
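
The shapes described for the graded model can be seen by computing the category probabilities at a few theta values. This is a sketch under stated assumptions: a four-category item with one shared discrimination and three increasing thresholds, all values made up, and hypothetical function names.

```python
import math

def cumulative(theta, a, b):
    """Probability of responding in a given category or higher."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def graded_probs(theta, a, thresholds):
    """Category probabilities for an item with len(thresholds) + 1 ordered categories."""
    # P(lowest category or higher) is 1; P(above the highest category) is 0.
    cum = [1.0] + [cumulative(theta, a, b) for b in thresholds] + [0.0]
    return [cum[v] - cum[v + 1] for v in range(len(cum) - 1)]

a, thresholds = 1.5, [-1.0, 0.0, 1.2]    # hypothetical item; thresholds must increase
for theta in (-3.0, -0.5, 3.0):
    probs = graded_probs(theta, a, thresholds)
    print(theta, [round(p, 3) for p in probs])
```

Reading down the printed rows, the first category's probability falls as theta rises, the last category's probability rises, and the middle categories peak somewhere in between.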