# D7.2 Test construction (cloze narratives)

An (1) is conducted to determine which items to retain in the final version of a test. An (2) (p) is calculated by dividing the number of examinees who answered the item correctly by the (3). It ranges in value from (4) to (5). In general, an item difficulty level of (6) is preferred because it not only maximizes (7) between examinees of low and high ability but also helps ensure that the test has high (8). However, the optimal difficulty level is affected by the probability that an examinee can (9). For this reason, the optimal p value for true/false items is (10).

An (11) index (D) is calculated by subtracting the percent of examinees in the lower-scoring group who answered the item correctly from the percent of examinees in the upper-scoring group who answered it correctly. It ranges in value from (12) to (13).

Advantages of item response theory (IRT) are that item parameters are (14) and that performance on different sets of items or tests can be easily (15). In summary, using IRT involves deriving an item (16) for each item that provides information on one, two, or three parameters, i.e., (17), (18), and (19).
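As a quick check on the arithmetic of the discrimination index D, here is a minimal Python sketch. It assumes each group's percentage is expressed as a proportion (number answering correctly divided by group size). The function name `discrimination_index`, its parameters, and the example counts are hypothetical and chosen only for illustration.

```python
def discrimination_index(upper_correct: int, upper_n: int,
                         lower_correct: int, lower_n: int) -> float:
    """Return D = (proportion correct in upper group) - (proportion correct in lower group).

    Hypothetical helper for illustration; percentages are expressed as proportions.
    """
    # Both groups must be non-empty, and the number correct cannot exceed group size.
    if upper_n <= 0 or lower_n <= 0:
        raise ValueError("each group must contain at least one examinee")
    if not (0 <= upper_correct <= upper_n and 0 <= lower_correct <= lower_n):
        raise ValueError("number correct must be between 0 and the group size")

    p_upper = upper_correct / upper_n  # proportion of upper-scoring group answering correctly
    p_lower = lower_correct / lower_n  # proportion of lower-scoring group answering correctly
    return p_upper - p_lower


# Hypothetical data: 30 examinees in each of the upper- and lower-scoring groups.
d = discrimination_index(upper_correct=24, upper_n=30, lower_correct=9, lower_n=30)
print(round(d, 2))  # 0.5
```

Here 80% of the upper group and 30% of the lower group answered correctly, so D = .80 - .30 = .50. An item that the lower group answers correctly more often than the upper group would yield a negative D.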