self-report data (S-data)

Self-report data is the information a person reveals about themselves, typically through a questionnaire or interview. It can be obtained through a variety of means, including interviews that pose questions to a person, periodic reports in which a person records events as they happen, and questionnaires of various sorts.

structured versus unstructured

Self-report can take a variety of forms, ranging from open-ended questions to forced-choice true or false questions. Sometimes these are referred to as unstructured (open-ended, such as "Tell me about the parties you like the most") and structured ("I like loud and crowded parties"; answer true or false) personality tests.

Likert rating scale

On questionnaires, participants answer each item using the response options provided. These response options take the form of various rating scales. The simplest example is a "true or false" choice. A common rating scale is the Likert scale, which attaches numbers to descriptive phrases, such as 0 = disagree strongly, 1 = disagree slightly, 2 = neither agree nor disagree, 3 = agree slightly, 4 = agree strongly.
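
As a rough illustration of how Likert responses turn into a scale score, the sketch below sums responses on the 0 to 4 scale described above. The item names and answers are made up, and the handling of reverse-keyed items (items worded in the opposite direction of the trait) is an assumption added for the example, not something this entry specifies.

```python
# Hypothetical 0-4 Likert labels, mirroring the scale described above.
LIKERT_LABELS = {
    0: "disagree strongly",
    1: "disagree slightly",
    2: "neither agree nor disagree",
    3: "agree slightly",
    4: "agree strongly",
}
SCALE_MAX = max(LIKERT_LABELS)


def score_scale(responses, reverse_keyed=()):
    """Sum Likert responses, flipping reverse-keyed items first.

    responses: dict of item id -> integer response (0..4).
    reverse_keyed: item ids worded in the opposite direction of the trait
    (an assumption for this sketch).
    """
    total = 0
    for item, value in responses.items():
        if value not in LIKERT_LABELS:
            raise ValueError(f"{item}: response {value!r} is not on the scale")
        total += SCALE_MAX - value if item in reverse_keyed else value
    return total


# Illustrative (made-up) extraversion items and answers.
answers = {"likes_parties": 4, "talkative": 3, "prefers_solitude": 1}
print(score_scale(answers, reverse_keyed={"prefers_solitude"}))  # 4 + 3 + (4 - 1) = 10
```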

experience sampling

In experience sampling, people answer some questions, for example about their mood or physical symptoms, every day for several weeks or longer. People are usually contacted electronically ("beeped") one or more times a day at random intervals to complete the measures. Although experience sampling uses self-report as the data source, it differs from more traditional self-report methods in being able to detect patterns of behavior over time.
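
The random "beeping" described above could be scheduled along the following lines. This is only a sketch: the waking window (9:00 to 21:00), the function name, and the parameters are assumptions chosen for illustration, not part of any particular experience-sampling tool.

```python
import random
from datetime import datetime, timedelta


def beep_schedule(start_day, days, beeps_per_day, wake_hour=9, sleep_hour=21, seed=None):
    """Return random prompt times within waking hours for each day, in order."""
    rng = random.Random(seed)
    window_minutes = (sleep_hour - wake_hour) * 60
    schedule = []
    for day in range(days):
        day_start = start_day + timedelta(days=day, hours=wake_hour)
        # Sample distinct minute offsets so two beeps never coincide.
        offsets = sorted(rng.sample(range(window_minutes), beeps_per_day))
        schedule.extend(day_start + timedelta(minutes=m) for m in offsets)
    return schedule


# start_day is assumed to be midnight of the first day of the study.
for when in beep_schedule(datetime(2024, 1, 1), days=2, beeps_per_day=3, seed=0):
    print(when.strftime("%Y-%m-%d %H:%M"))
```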

observer-report data (O-data)

Observer-report data are the impressions and evaluations others form of a person with whom they come into contact. Observer-report methods capitalize on these sources and provide tools for gathering information about a person's personality. Observers may have access to information not attainable through other sources, and multiple observers can be used to assess each individual. Typically, a more valid and reliable assessment of personality can be achieved when multiple observers are used.

inter-rater reliability

Inter-rater reliability involves the use of multiple observers to gather information about a person's personality, which allows investigators to evaluate the degree of consensus among the observers. When different observers agree with one another, the measure is said to have high inter-rater reliability. When different raters fail to agree, the measure is said to have low inter-rater reliability.
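
One simple way to put a number on the degree of consensus is to average the correlations between every pair of observers. The sketch below does that with made-up ratings. It is one of several possible agreement indices, and the function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mean_pairwise_agreement(ratings_by_observer):
    """Average the correlations between every pair of observers."""
    pairs = list(combinations(ratings_by_observer.values(), 2))
    return sum(pearson_r(a, b) for a, b in pairs) / len(pairs)


# Made-up ratings of five targets' sociability (1-7) by three observers.
ratings = {
    "observer_a": [6, 2, 5, 3, 7],
    "observer_b": [5, 3, 5, 2, 6],
    "observer_c": [6, 1, 4, 3, 7],
}
print(round(mean_pairwise_agreement(ratings), 2))
```

Values near 1 indicate that observers rank the targets similarly. Values near 0 indicate little consensus.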

multiple social personalities

Each of us displays different sides of ourselves to different people—we may be kind to our friends, ruthless to our enemies, loving toward a spouse, and conflicted toward our parents. Our social personalities vary from one setting to another, depending on the nature of relationships we have with other individuals.

naturalistic observation

In naturalistic observation, observers witness and record events that occur in the normal course of the lives of their participants. For example, a child might be followed throughout an entire day, or an observer may record behavior in the home of the participant. Naturalistic observation offers researchers the advantage of being able to secure information in the realistic context of a person's everyday life, but at the cost of not being able to control the events and behavioral samples witnessed.

test data (T-data)

A common source of personality-relevant information is standardized tests (T-data). In these measures, participants are placed in a standardized testing situation to see whether different people react or behave differently in an identical situation. An exam such as the Scholastic Aptitude Test, used to predict success in school, is one example of T-data.

functional magnetic resonance imaging (fMRI)

Functional magnetic resonance imaging (fMRI) is a non-invasive imaging technique used to identify specific areas of brain activity. When part of the brain becomes active, oxygenated blood flows to that area. Because the iron in hemoglobin responds differently to a magnetic field depending on whether it is carrying oxygen, the scanner can detect these changes in blood oxygenation. The results are rendered as color-coded images indicating which parts of the brain are active during particular tasks.

projective techniques

In projective techniques, a person is presented with an ambiguous stimulus and is asked to impose some order on it, for example by saying what they see in an inkblot. What the person sees is interpreted to reveal something about his or her personality. The person presumably "projects" his or her concerns, conflicts, traits, and ways of seeing or dealing with the world onto the ambiguous stimulus. The most famous projective technique is the Rorschach inkblot test.

life-outcome data (L-data)

Life-outcome data refers to information that can be gleaned from the events, activities, and outcomes in a person's life that are available to public scrutiny. For example, marriages and divorces are a matter of public record. Personality psychologists can sometimes secure information about the clubs, if any, a person joins, how many speeding tickets a person has received in the last few years, and whether the person owns a handgun. These can all serve as sources of information about personality.


reliability

Reliability is the degree to which an obtained measure represents the "true" level of the trait being measured. For example, if a person has a "true" IQ of 115, then a perfectly reliable measure of IQ will yield a score of 115 for that person. Personality psychologists prefer reliable measures so that the scores accurately reflect each person's true level of the personality characteristic being measured.

repeated measurement

Repeated measurement is a way to estimate the reliability of a measure. There are different forms of repeated measurement, and hence different versions of reliability. A common procedure is to repeat the same measurement over time, say a month apart, for the same sample of persons. If scores from the first and second testing are highly correlated, yielding similar scores for most people, then the measure is said to have high test-retest reliability.

response sets

The concept of response sets refers to the tendency of some people to respond to questions on some basis that is unrelated to the question content. Sometimes this is referred to as noncontent responding. One example is the response set of acquiescence, or yea saying. This is the tendency to simply agree with the questionnaire items, regardless of the content of those items.

noncontent responding

Noncontent responding, also referred to as the concept of response sets, refers to the tendency of some people to respond to the questions on some basis that is unrelated to the question content. One example is the response set of acquiescence or yea saying. This is the tendency to simply agree with the questionnaire items, regardless of the content of those items.


acquiescence

Acquiescence (also known as yea saying) is a response set that refers to the tendency to agree with questionnaire items, regardless of the content of those items.

extreme responding

Extreme responding is a response set that refers to the tendency to give endpoint responses, such as "strongly agree" or "strongly disagree" and avoid the middle part of response scales, such as "slightly agree," "slightly disagree," or "am indifferent."
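
A crude index of extreme responding is the share of a person's answers that fall on either endpoint of the scale. The sketch below assumes a 0 to 4 scale and uses made-up answers. The function name and the idea of using a simple proportion are illustrative assumptions, not an established scoring rule.

```python
def extreme_response_rate(responses, scale_min=0, scale_max=4):
    """Proportion of answers that sit at either endpoint of the scale."""
    if not responses:
        raise ValueError("need at least one response")
    extremes = sum(1 for r in responses if r in (scale_min, scale_max))
    return extremes / len(responses)


# Made-up answers on a 0-4 scale from two respondents.
print(extreme_response_rate([0, 4, 4, 0, 4, 0]))  # every answer is an endpoint
print(extreme_response_rate([1, 2, 3, 2, 1, 3]))  # no endpoint answers
```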

social desirability

Socially desirable responding refers to the tendency to answer items in such a way as to come across as socially attractive or likable. People responding in this manner want to make a good impression, to appear to be well adjusted, to be a "good citizen."

forced choice questionnaire

In a forced-choice questionnaire format, test-takers are confronted with pairs of statements and are asked to indicate which statement in the pair is more true of them. The statements in each pair are selected to be similar in social desirability but different in content, forcing participants to choose between statements that are equally socially desirable (or undesirable).


validity

Validity refers to the extent to which a test measures what it claims to measure. There are five types of validity: face validity, predictive validity, convergent validity, discriminant validity, and construct validity.

face validity

Face validity refers to whether a test, on the surface, appears to measure what it is supposed to measure. Face validity is probably the least important aspect of validity. In fact, some psychologists might argue that face validity refers to the assumption of validity, not to evidence for real validity.

predictive validity

Predictive validity refers to whether a test predicts some criteria external to the test. Scales that successfully predict what they should predict have high predictive validity.

criterion validity

Criterion validity or predictive validity refers to whether the test predicts criteria external to the test. Scales that successfully predict what they should predict have high criterion validity or predictive validity.

convergent validity

Convergent validity refers to whether a test correlates with other measures that it should correlate with. Convergent validity is high to the degree that alternative measures of the same construct correlate or converge with the target measure.

discriminant validity

Discriminant validity is often evaluated simultaneously with convergent validity. Whereas convergent validity refers to what a measure should correlate with, discriminant validity refers to what a measure should not correlate with. The idea behind discriminant validity is that part of knowing what a measure actually measures consists of knowing what it does not measure.

construct validity

Construct validity generally refers to whether a test measures what it claims to measure. It is often assessed by determining whether a test correlates with what it is supposed to correlate with, and does not correlate with what it is not supposed to correlate with. Construct validity is the broadest type of validity, subsuming face, predictive, convergent, and discriminant validity.

theoretical construct

Most personality traits refer to constructs, or what Allport called convenient fictions. For example, if someone asks you to show them your level of extraversion, there is nothing you could produce. Extraversion is a convenient fiction, a theoretical construct useful for explaining aspects of personality. Constructs are represented by observable measures, such as self-reports or observations of behavior. So, to explain how extraverted you are, you could produce scores on an extraversion scale. The construct, however, is always more than the observations.


generalizability

Generalizability refers to the degree to which a measure retains its validity across different contexts, situations, and conditions. Greater generalizability is not always better; rather, what is important is to identify empirically the contexts in which the particular measure is and is not applicable.

experimental method

Experimental methods are typically used to determine causality—to find out whether one variable influences another variable. Experiments involve the manipulation of one variable (the independent variable) and random assignment of subjects to conditions defined by the independent variable.


manipulation

Researchers conducting experiments use manipulation in order to evaluate the influence of one variable (the manipulated or independent variable) on another (the dependent variable).

random assignment

Random assignment means that each participant has an equal chance of being placed in any condition of the experiment. When the manipulation is between groups, random assignment of participants to experimental groups helps ensure that the groups are equivalent before the manipulation is applied.
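
A minimal sketch of random assignment: shuffle the participant list, then deal participants out to the conditions in turn so that group sizes stay balanced. The participant labels and condition names are made up, and dealing in turn after a shuffle is just one reasonable way to do it.

```python
import random


def randomly_assign(participants, conditions, seed=None):
    """Shuffle participants, then deal them out to conditions in turn."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for index, person in enumerate(shuffled):
        groups[conditions[index % len(conditions)]].append(person)
    return groups


groups = randomly_assign([f"P{i}" for i in range(1, 9)], ["drug", "placebo"], seed=42)
for condition, members in groups.items():
    print(condition, members)
```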


counterbalancing

In some experiments, manipulation is within a single group. For example, participants might get a drug and have their memory tested, then later take a sugar pill and have their memory tested again. In this kind of experiment, equivalence is obtained by counterbalancing the order of the conditions, with half the participants getting the drug first and sugar pill second, and the other half getting the sugar pill first and the drug second.
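
The drug and sugar pill example can be sketched as follows: list every possible order of the conditions, then spread shuffled participants evenly across those orders. The names here are hypothetical, and enumerating every order is only practical when there are few conditions.

```python
import random
from itertools import permutations


def counterbalance(participants, conditions, seed=None):
    """Assign each participant one ordering of the conditions, evenly spread."""
    orders = list(permutations(conditions))
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    return {person: orders[i % len(orders)] for i, person in enumerate(shuffled)}


plan = counterbalance(["P1", "P2", "P3", "P4"], ["drug", "sugar pill"], seed=1)
for person, order in sorted(plan.items()):
    print(person, "->", " then ".join(order))
```

With four participants and two conditions, two people receive each order.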

statistically significant

Statistical significance refers to the probability of obtaining the results of a research study by chance alone. The generally accepted threshold is 5%, meaning that, if there were no real effect and the study were repeated 100 times, a result like the one reported would be expected to occur by chance only about 5 times.
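
One way to make "by chance alone" concrete is a permutation test, sketched below with made-up memory scores. Group labels are shuffled many times to see how often chance produces a difference in means at least as large as the one observed. This is a swapped-in illustration rather than the only or standard test, and the function name and data are assumptions.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate how often chance alone yields a mean difference this large."""
    rng = random.Random(seed)
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_large = 0
    for _ in range(n_permutations):
        # Shuffling breaks any real link between scores and group labels.
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / (len(pooled) - n_a))
        if diff >= observed:
            at_least_as_large += 1
    return at_least_as_large / n_permutations


# Made-up memory scores for a drug group and a placebo group.
drug = [14, 15, 13, 16, 15, 17, 14, 16]
placebo = [11, 12, 10, 13, 12, 11, 12, 10]
p = permutation_p_value(drug, placebo)
print("significant at the 5% level" if p < 0.05 else "not significant", p)
```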

correlational method

A correlation is a statistical procedure for determining whether there is a relationship between two variables. In correlational research designs, the researcher is attempting to directly identify the relationships between two or more variables, without imposing the sorts of manipulations seen in experimental designs.

correlation coefficient (its direction and magnitude)

Researchers are interested in the direction (positive or negative) and the magnitude (size) of the correlation coefficient. Correlations around .10 are considered small; those around .30 are considered medium; and those around .50 or greater are considered large (Cohen & Cohen, 1975).
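
The direction and magnitude labels above can be written out as a small helper. The text only says "around" .10, .30, and .50, so the exact cut-offs used below, and the "negligible" label for values under .10, are assumptions made for the sketch.

```python
def describe_correlation(r):
    """Label a correlation's direction and rough magnitude."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    direction = "positive" if r > 0 else "negative" if r < 0 else "none"
    size = abs(r)
    # Cut-offs at the .10 / .30 / .50 benchmarks are an assumption:
    # the text says "around", so the exact boundaries here are arbitrary.
    if size >= 0.50:
        magnitude = "large"
    elif size >= 0.30:
        magnitude = "medium"
    elif size >= 0.10:
        magnitude = "small"
    else:
        magnitude = "negligible"
    return direction, magnitude


print(describe_correlation(-0.32))  # ('negative', 'medium')
print(describe_correlation(0.55))   # ('positive', 'large')
```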

directionality problem

One reason why correlations can never prove causality is known as the directionality problem. If A and B are correlated, we do not know whether A is the cause of B or B is the cause of A.

third variable problem

One reason why correlations can never prove causality is the third variable problem. It could be that two variables are correlated because some third, unknown variable is causing both.

case study methods

In case studies, researchers examine the life of one person in particular depth. Case studies can give researchers insights into personality that can then be used to formulate a more general theory that is tested in a larger population. They can also provide in-depth knowledge of a particularly outstanding individual. Case studies can also be useful in studying rare phenomena, such as a person with a photographic memory — cases for which large samples would be difficult or impossible to obtain.

