Test Worthiness: Validity, Reliability, Practicality and Cross Cultural Fairness
A test is valid when it measures what it is supposed to measure; there is evidence supporting the use of test scores.
One of the four measures of test worthiness. The degree to which cultural background, class, disability, and gender do not affect test results.
Extent to which an assessment instrument or procedure is inexpensive and easy to use and takes only a small amount of time to administer and score.
A statistical measure of the extent to which two factors vary together, and thus of how well either factor predicts the other; a coefficient that approaches +1 or -1 indicates a strong relationship, while a coefficient near 0 indicates little or no linear relationship.
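As a rough illustration of the definition above, the sketch below computes a Pearson correlation coefficient by hand. The score lists and the function name `pearson_r` are made up for the example and are not part of the original card.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from each mean
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical data: test scores and a second measure for five people
test_scores = [1, 2, 3, 4, 5]
second_measure = [2, 4, 6, 8, 10]
print(round(pearson_r(test_scores, second_measure), 2))
```

Reversing one of the lists would give a coefficient near -1: an equally strong relationship, but in the opposite direction.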
A graph showing the relationship between two variables (represented by the axes), whereby the corresponding data values are plotted as single points. The pattern of points produced may suggest a correlation exists between the variables.
Coefficient of Determination
R squared, used to assess how well the regression model fits the data; represents the percentage of the variation in the dependent variable that is explained by the regression model.
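A small worked example may help here. It assumes the simple case of one predictor, where R squared is just the square of the correlation coefficient; the value r = 0.70 is hypothetical.

```python
def coefficient_of_determination(r):
    """Square a correlation coefficient to get the proportion of shared variance.

    Assumes a simple (one-predictor) linear regression, where R squared = r * r.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    return r * r

# Hypothetical correlation between a test and a criterion
shared_variance = coefficient_of_determination(0.70)
print(f"{shared_variance:.0%} of the variation is explained")
```

Note that a negative correlation of the same size explains the same proportion of variation, since the sign disappears when squaring.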
Evidence that the test items represent the proper domain.
What steps are used to demonstrate validity?
1) Show that the test developer adequately surveyed the domain. 2) Show that the content of the test matches what was found in the survey of the domain. 3) Show that the test items accurately reflect the content. 4) Show that the number of items for each content area matches the relative importance of these items as reflected in the survey of the domain.
Superficial appearance of a test--not true validity
Relationship between test scores and a future standard; a measure of validity based on showing a substantial correlation between test scores and job performance scores.
Relationship between test scores and a currently obtainable benchmark; occurs when a test is shown to be related to an external source that can be measured at around the same time the test is being given.
Relationship between test scores and a future standard; The success with which a test predicts the behavior it is designed to predict; it is assessed by computing the correlation between test scores and the criterion behavior.
Standard Error of the Estimate
Based on a score on one variable, allows us to predict a range of likely scores on a second variable.
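One common way to compute this range, sketched below, uses the formula SEest = SDy * sqrt(1 - r^2), where SDy is the standard deviation of the predicted variable and r is the correlation between the two variables. The formula is a standard textbook form rather than something stated on the card, and the numbers are hypothetical.

```python
import math

def standard_error_of_estimate(sd_y, r):
    """SEest = SDy * sqrt(1 - r^2)."""
    return sd_y * math.sqrt(1 - r ** 2)

def prediction_band(predicted_score, sd_y, r, z=1.0):
    """Range of likely scores around a predicted score.

    With z=1.0 the band is plus or minus one standard error of the estimate,
    which covers roughly 68% of cases if prediction errors are normally distributed.
    """
    se = standard_error_of_estimate(sd_y, r)
    return predicted_score - z * se, predicted_score + z * se

# Hypothetical values: predicted criterion score 50, SD of criterion 10, r = 0.60
low, high = prediction_band(50, sd_y=10, r=0.60)
print(f"likely range: {low:.1f} to {high:.1f}")
```

The stronger the correlation, the narrower the band; with r = 0 the band is as wide as the standard deviation of the second variable itself.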
When an instrument incorrectly predicts a test taker will have an attribute or be successful when s/he will not.
When a test forecasts an individual will not have an attribute or will be unsuccessful when s/he will.
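The two prediction errors described above (a false positive and a false negative) can be counted by comparing what the test predicted with what actually happened. The sketch below uses invented data and a hypothetical helper name, `tally_predictions`.

```python
def tally_predictions(predicted, actual):
    """Count correct and incorrect predictions.

    predicted: list of booleans, True if the test predicted success/attribute present
    actual:    list of booleans, True if the person actually succeeded/had the attribute
    """
    counts = {"true_positive": 0, "false_positive": 0,
              "true_negative": 0, "false_negative": 0}
    for p, a in zip(predicted, actual):
        if p and a:
            counts["true_positive"] += 1
        elif p and not a:
            counts["false_positive"] += 1   # predicted success, did not succeed
        elif not p and a:
            counts["false_negative"] += 1   # predicted failure, did succeed
        else:
            counts["true_negative"] += 1
    return counts

# Hypothetical outcomes for five test takers
predicted = [True, True, False, False, True]
actual = [True, False, True, False, True]
print(tally_predictions(predicted, actual))
```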
Evidence that an idea or concept is being measured by a test; very important when testing more abstract things like intelligence or personality. Can be shown through 1) experimental design 2) factor analysis 3) convergence with other instruments 4) discrimination from other measures.
Experimental Design Validity
Using experimentation to show that a test measures a concept; confirms the hypothesis you developed to scientifically show that your construct exists.
Statistically examining the relationship between sub-scales (items) and the larger construct; ex. breaking each question of a scale down and analysing whether or not they individually relate to the larger topic you are testing and to one another.
Relationship between a test and other similar tests; they should correlate but not be 100% the same.
Showing a lack of relationship between a test and other dissimilar tests; looking to find little or no relationship between your test and other dissimilar tests/measures that are not theoretically related to your test.
Degree of freedom from measurement error--consistency of test scores; an individual taking a test with good reliability will get similar scores each time they take the test; can be measured using test-retest, alternate forms, or internal consistency.
Relationship between scores from the same test given at two or more administrations.
Alternate Forms Reliability
Relationship between scores from different but similar versions of the same test.
reliability measured statistically by going "within" the test; a determination is made as to how scores on individual items relate to each other or to the test as a whole.
Correlating one half of a test against the other half
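A minimal sketch of this idea follows. It splits items into odd and even positions, correlates the two half-test totals, and then applies the Spearman-Brown correction, r_full = 2 * r_half / (1 + r_half), to estimate reliability at full test length. The correction is a standard companion to split-half reliability but is not named on the card, and the response data are invented.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)

def split_half_reliability(responses):
    """responses: one list of item scores per examinee.

    Returns (half-test correlation, Spearman-Brown corrected estimate).
    """
    odd_totals = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, ...
    r_half = pearson_r(odd_totals, even_totals)
    r_full = 2 * r_half / (1 + r_half)
    return r_half, r_full

# Hypothetical right/wrong (1/0) answers: five examinees, four items
responses = [
    [1, 1, 1, 1],
    [1, 0, 1, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 1, 0, 0],
]
r_half, r_full = split_half_reliability(responses)
print(round(r_half, 2), round(r_full, 2))
```

The correction is needed because each half contains only half the items, and shorter tests tend to be less reliable than longer ones.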
Coefficient Alpha or Kuder-Richardson
Reliability based on a mathematical comparison of individual items with one another and total score.
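As an illustration, coefficient alpha is commonly computed as alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores), where k is the number of items. The sketch below applies that formula to invented data; with right/wrong (1/0) items and population variances, as used here, the result matches the Kuder-Richardson 20 formula.

```python
from statistics import pvariance

def coefficient_alpha(responses):
    """responses: one list of item scores per examinee (all the same length)."""
    k = len(responses[0])                      # number of items
    items = list(zip(*responses))              # regroup scores by item
    item_variance_sum = sum(pvariance(item) for item in items)
    total_variance = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)

# Hypothetical right/wrong (1/0) answers: five examinees, four items
responses = [
    [1, 1, 1, 1],
    [1, 0, 1, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 1, 0, 0],
]
print(round(coefficient_alpha(responses), 2))
```

Alpha rises when items vary together (so the total-score variance is large relative to the individual item variances), which is what "comparison of individual items with one another and total score" amounts to in practice.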
Item Response Theory (IRT)
Examining each item for its ability to discriminate as a function of the construct being measured.
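One way to picture this is an item characteristic curve: the probability of answering an item correctly as a function of the examinee's level on the construct. The sketch below assumes a two-parameter logistic model, which is one common IRT model and is not specified on the card; the item parameters are hypothetical.

```python
import math

def item_characteristic_curve(theta, discrimination, difficulty):
    """Probability of a correct response under a two-parameter logistic model.

    theta:          examinee's level on the construct
    discrimination: how sharply the item separates lower from higher levels
    difficulty:     the level at which the probability of success is 0.5
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))

# Two hypothetical items with the same difficulty but different discrimination
for theta in (-1.0, 0.0, 1.0):
    weak = item_characteristic_curve(theta, discrimination=0.5, difficulty=0.0)
    strong = item_characteristic_curve(theta, discrimination=2.0, difficulty=0.0)
    print(f"theta={theta:+.1f}  weak item={weak:.2f}  strong item={strong:.2f}")
```

The more discriminating item shows a much larger jump in the probability of success between low and high levels of the construct, which is what makes it more informative.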
Feasibility considerations in test selection and administration; time, cost, format.
The ability of the examinee to comprehend what he or she is reading.
Ease of Administration, Scoring and Interpretation
1) The ease of understanding and using test manuals and related information. 2) The number of individuals taking the test and whether or not their numbers affect the ease of administering the instrument. 3) The kind of training and education needed to administer the test, score the test, and interpret the test results. 4) The "turnaround time" in scoring the test and obtaining the results. 5) The amount of time needed to explain test results to examinees. 6) Associated materials that may be helpful in explaining test scores to examinees (e.g. printed sheets provided by the publisher).
What are the steps to selecting and administering a good test?
1) Determine the goals of your client 2) Choose instruments to reach client goals 3) Access information about possible instruments 4) Examine validity, reliability, cross-cultural fairness, and practicality of the possible instruments 5)