Week 10: Replication, Transparency, and Real-World Importance

Summarise what this chapter covers?
Terms in this set (71)
Replicable - describing a study whose results have been reproduced when the study was repeated, or replicated.

Replication is part of interrogating statistical validity: we ask about the size of the estimate (effect size), its precision (95% CI), and whether the result has been replicated.

Replication gives a study credibility and is a crucial part of the scientific process.
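
The effect size and 95% CI mentioned in this card can be made concrete with a small calculation. The following is only a sketch under stated assumptions: the scores are invented for illustration, the function names are hypothetical, and the interval uses a normal approximation (z = 1.96) rather than the t-based interval a statistics package would report.

```python
import math
import statistics

def cohens_d(group_a, group_b):
    """Standardised mean difference using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (divides by n - 1)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

def mean_diff_ci95(group_a, group_b):
    """Approximate 95% CI for the difference in means (normal approximation)."""
    diff = statistics.mean(group_a) - statistics.mean(group_b)
    se = math.sqrt(statistics.variance(group_a) / len(group_a)
                   + statistics.variance(group_b) / len(group_b))
    return diff - 1.96 * se, diff + 1.96 * se

# Hypothetical well-being scores, invented purely for illustration.
deep_talk = [7, 8, 6, 9, 7, 8]
small_talk = [6, 5, 7, 6, 5, 6]

d = cohens_d(deep_talk, small_talk)
low, high = mean_diff_ci95(deep_talk, small_talk)
print(f"effect size d = {d:.2f}, 95% CI for the mean difference = [{low:.2f}, {high:.2f}]")
```

A narrower interval means a more precise estimate; with the same spread of scores, a larger sample shrinks the standard error and therefore the interval.
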
Direct replication - A replication study in which researchers repeat the original study as closely as possible to see whether the original effect shows up in the newly collected data.

Study on deep talk & higher well-being = replicated the basic effect in five separate studies.

A direct replication cannot reproduce the original study in every detail (e.g., different participants, different timing).

Despite small variations, in a direct replication researchers try to reproduce the original experiment as closely as possible.
- If there were any threats to internal validity or flaws in construct validity in the original study, such threats would be repeated in the direct replication too.

- When successful, a direct replication confirms what we already learned, but it usually does not test the theory in a new context.

Due to these drawbacks, researchers value other types of replication as a supplement to direct replication.
Conceptual replication - A replication study in which researchers examine the same research question (the same conceptual variables) but use different procedures for operationalising the variables.

Alcohol & Aggression.

Study 1 = showed participants pictures of alcohol and tested their reaction times to aggression-related words.
Study 2 = exposed people to advertisements for alcohol vs. control products.
Replication-plus-extension - A replication study in which researchers replicate their original study but add variables or conditions that test additional questions.

Shorthand & notetaking study.

- The study was replicated using the same factual and conceptual questions.
- But the researchers extended the study by adding two levels of the independent variable (no notes at all & eWriters).

The "replication" portion of this replication-plus-extension study did not replicate the original result. In addition, the two "extension" conditions differed only slightly from the rest.
Describe what is meant by 'one study, many labs'? Provide an example. What can happen as a result?
Sometimes multiple groups of scientists work together to conduct replication studies. For example, labs from around the world might agree to conduct a direct replication of one study, all following the same strict research protocol. As a result, a study can be replicated by several labs at one time.

Describe what is meant by 'many labs, many studies'? Provide an example. What did this project use?
Other replication projects coordinate many labs around the world to replicate a variety of studies. One example is the Open Science Collaboration, which selected 100 studies from three major psychology journals and recruited 100 labs around the world, assigning one lab to each study. This project used different metrics to judge whether a study was a successful replication.

How does a replication project look in terms of combined results, effect sizes, and the effect estimated by one lab? (many labs, many studies)
In this replication project, 36 labs conducted direct replications of 16 studies and their results were combined. Each X represents an original study's effect size. Each small dot represents the effect estimated by one lab. The black square represents the average effect size combined from all 36 labs.

When a study fails to replicate, what could the issue be? Even in direct replications, why might there be differences? What else might cause a failure to replicate? Expand more on where the issue might lie in original studies?
It could be an issue with the replication study itself. Even in direct replications, there are differences in sample, materials, or geography. Problems with the original studies can also cause a failure to replicate: the original studies might have engaged in research practices that, by today's standards, were likely to lead to findings that are not real.

Regarding literature, how does progress in psychological science/ replication occur? What are the most credible conclusions based on?
Define scientific literature?
Progress occurs as researchers successfully conduct systematic sets of direct replications, conceptual replications, and replication-plus-extension studies. The most credible conclusions are based on a body of evidence. Scientific literature = a series of related studies, conducted by various researchers, that have tested similar variables.

What leads to a literature review? Outline the literature review approach? What is another approach which incorporates a quantitative technique?
Researchers collecting all the studies on a topic and considering them together. One approach is simply to summarise the literature in a narrative way, describing what the studies typically show and explaining how the body of evidence supports a theory. Researchers can also use the quantitative technique of meta-analysis to create a mathematical summary of a scientific literature.

What is a meta-analysis?
Meta-analysis = a way of mathematically averaging the results of all the studies (both published and unpublished) that have tested the same variables to see what conclusion that whole body of evidence supports.

In a meta-analysis, what do researchers do? Using meta-analysis, what can researchers also sort? From these follow-up analyses, what can researchers detect?
- Researchers collect all possible examples of a particular kind of study.
- Then they average all the effect sizes to find an overall effect size.
Researchers can also sort the studies into categories (i.e., they can test moderators), computing separate effect size averages for each category. From these follow-up analyses, they can detect new patterns in the literature as well as test new questions.

Regarding the data in meta-analysis, what can you be certain of and why? However, what might be an issue?
Because meta-analyses usually contain data that have been published in empirical journals, you can be more certain that the data have been peer-reviewed, providing one check on their quality.
Publication bias: Stronger relationships were more likely to be published than negligible effects.

Publication bias could lead to the file drawer problem, what is this?
A problem relating to literature reviews and meta-analyses based only on published literature, which might overestimate the support for a theory because studies finding null effects are less likely to be published than studies finding significant results, and are thus less likely to be included in such reviews.

What should researchers do to combat the file drawer problem? Give an example of this being problematic?
Researchers who are conducting a meta-analysis should follow the practice of contacting their colleagues (via social media groups and subscription lists), requesting both published and unpublished data for their project. Anti-depressants - more studies were published showing positive effects, rather than the 50% which showed negative and questionable side effects.

Are literature reviews and meta-analysis considered valuable by many psychologists? In summary, what do they provide? What is the main drawback of meta-analysis?
Yes, because they assess the weight of the evidence in scientific literature. They tell you whether, across a number of studies, there is a relationship between two variables and, if so, how strong it is. However, a meta-analysis is only as powerful as the data that go into it.

When will the meta-analysis be biased in conclusion? Therefore, what has been introduced?
If researchers don't work hard to include unpublished results or if studies they do include have followed questionable research practices, the meta-analysis will reach a biased conclusion. New methods have been introduced that try to ensure the studies included in a meta-analysis were well conducted.

Can you count on the media to tell you which scientific studies are replicable? What will journalists sometimes do instead of valuing replicability?
What do responsible journalists do?
Not always; journalists do not always consider replicability when they report on science stories. Sometimes journalists will report on a single, hot-off-the-press study because it makes a splashy headline. Responsible journalists:
- Report the latest study.
- Give readers a sense of what the entire literature says on a particular topic.
- Recognise that a study's importance is best judged in the context of the body of evidence.

Which of Merton's norms of science encourage self-correcting? Unfortunately what might happen?
Scientists should report their own data objectively and make their data public (communality), even when the results do not support their hypotheses (disinterestedness). Unfortunately, scientists can, even unintentionally, engage in questionable research practices that violate Merton's norms and make scientific progress less likely.

Describe one questionable practice in scientific reporting? How might the number of variables included differ from the variables showing a strong effect size? When does this become an issue? How does underreporting mislead people?
The underreporting of null findings. Researchers normally include multiple dependent variables in an experiment and sometimes only one out of a dozen variables will show a strong effect. It becomes an issue when the researcher reports only the strong effects, not the weak ones. Underreporting misleads people to think the evidence for a theory is stronger than it really is.

Define HARKing? What might such findings be due to? And in response, what do careful scientists do? What practice aims to prevent HARKing?
Hypothesising after the results are known - A questionable research practice in which researchers create an after-the-fact hypothesis about an unexpected research result, making it appear as if they predicted it all along.
- Such findings may be due to chance and cannot be replicated.
- Careful scientists replicate surprising findings in a new, independent study.
The practice of preregistration aims to prevent HARKing, because researchers publish their target (the hypothesis) before they start to collect data.

Why does HARKing mislead people?
Because the researcher writes about an unexpected result as if it had been predicted all along. The practice of preregistration aims to prevent HARKing, because researchers publish their target (the hypothesis) before they start to collect data.

What is p-hacking and give examples?
A family of questionable data analysis techniques:
- Adding participants after the results are initially analysed.
- Looking for outliers.
- Trying new analyses in order to obtain a p value of just under .05, which can lead to nonreplicable results.
- Computing scores several different ways.
- Running a few different types of statistics.

When is the practice of p-hacking misleading? What does research transparency help counter?
The practice of p-hacking is misleading when others are not told about all the different ways the data were analysed and only the strongest version is reported. Research transparency helps counter unintentional biases and helps scientists be more accountable to both themselves and the scientific community.

Why is underreporting null effects misleading? How do we counter this? Why does this counter this?
Researchers mislead about the strength of the evidence by not reporting conditions or measures that did not support the hypothesis. Open materials, in which all study materials are reported publicly. Others can see the full study design and fairly evaluate the strength and consistency of the evidence.

Why is p-hacking misleading? How do we counter this? Why does this counter this?
Researchers try many ways of analysing their data, so the result is more likely to be a fluke rather than a true, replicable pattern. Open data, in which full data sets are provided. Others can re-run and confirm the statistical analyses. They can also use the data to test new questions.

Why is HARKing misleading? How do we counter this?
Why does this counter this?
The study reveals an unexpected result, but the researcher writes about the study as if the result had been predicted all along. Preregistration, in which researchers publish the hypothesis and study design before data collection and analysis begin. Others have more confidence in the strength of the evidence.

Why is using small samples misleading? How do we counter this? Why does this counter this?
In a small sample, a few chance values can influence the data set, so the study's estimate is imprecise and less replicable. Although it's not part of research transparency, larger samples are now required and encouraged. Studies with large sample sizes produce estimates that are more precise and replicable.

What is open science? What is open data?
Open science = the practice of sharing one's data, hypotheses, and materials freely so others can collaborate, use, and verify the results. Open data = when psychologists provide their full data set on the Internet so other researchers can reproduce the statistical results or even conduct new analyses on it.

What is open materials? Define preregistered?
Open materials = when psychologists provide their study's full set of measures and manipulations on the Internet so others can see the full design or conduct replication studies. Preregistered = a term referring to a study in which, before collecting any data, the researcher has stated publicly what the study's outcome is expected to be.

Certain journals treat preregistered studies more favourably, what do some even do? What does preregistration give researchers credit for? What do transparent research practices make easier? (3 answers)
Some even peer-review the proposal and promise to publish the results regardless of the outcome (these are called registered reports). Preregistration gives researchers credit for the importance of the research question and the quality of the study design, not just for the results.
- Makes our job easier as a consumer of research.
- Transparent research practices allow anyone to check out a study and evaluate its quality.
- Helps to formalise Merton's norms - helps science self-correct and progress.

What does replicability allow us to judge? Why is replicability an essential step? Which of the 4 big validities can replicability help us address?
Asking about replicability is one way to judge a study's quality. Reproducing a study's results is an essential step that allows researchers to be more confident in the accuracy of their results. External validity.

Which of the replication terms may not always support external validity? Which of the replication categories can? Why is this?
Direct replication studies may not support external validity if they are conducted on the same population as the original. Conceptual replications and replication-plus-extension studies can. When researchers test their questions using slightly different methods, different kinds of participants, or different situations, or when they extend their research to study new variables, they are demonstrating how their results generalise to other populations and settings.

What do more settings and populations in which a study is conducted lead to? To assess a study's generalisability, what would you ask? If a study is intended to generalise to some population, what must the researchers do? What about if they use a convenience sample?
The more settings and populations in which a study is conducted, the better you can assess the generalisability of the findings. You would ask how the participants were obtained. If a study is intended to generalise to some population, the researchers must draw a probability sample from that population. If a study uses a convenience sample, you can't be sure of the study's generalisability to the population the researcher intends.

What should you bear in mind regarding the distinction between the population and the population of interest?
What matters more, 'how' or 'how many'?
Bear in mind that the population to which researchers want to generalise usually is not the population of every living person. Instead, when researchers are generalising from a sample to a population, they will specify what the population of interest is. When you are assessing the generalisability of a sample to a population, "how" matters more than "how many."

What do you need in order to generalise to any population? What would a convenience sample contain?
A probability sample. If you had only a convenience sample, it might primarily contain those whom the researcher could contact easily and who were willing to participate in the study.

What is the other aspect of external validity? Which of the subcategories of replications illustrates this aspect of external validity? What is ecological validity? (Also called mundane realism)
A study's generalisability to different settings. Conceptual replication - shows how the results can generalise from one setting to another. Ecological validity = the extent to which the tasks and manipulations of a study are similar to real-world contexts; an aspect of external validity.

When the sample for a study has been selected from a population at random (using a probability sample), what can we do? However, what does evaluating the importance of external validity require? Whether a researcher strives for external validity in a study depends on?
The results from that sample can be generalised to the population it was drawn from. A nuanced approach that considers the researcher's priorities. What research mode they are operating in: theory-testing mode or generalisation mode.

When researchers work in theory-testing mode, what are they usually designing? Define theory-testing mode? Briefly describe the theory-data cycle? In theory-testing mode, what matters more, external or internal validity?
They are usually designing correlational or experimental research to investigate support for a theory.
Theory-testing mode = a researcher's intent for a study, testing association claims or causal claims to investigate support for a theory. The theory-data cycle is the process of designing studies to test a theory and using the data from the studies to reject, refine, or support the theory. In theory-testing mode, external validity often matters less than internal validity.

Why was Harlow's contact theory study clearly in theory-testing mode?
- He had a clear hypothesis - which theory was correct.
- The results of his study were really clear.
- The theory was overwhelmingly supported.
- He prioritised internal validity - the results cannot be extended to all monkeys.

Did Harlow's experiment achieve external validity? Why didn't Harlow care?
- No, the monkeys were hardly representative of monkeys in the wild.
- He didn't use a random sample of monkeys from his laboratory.
Harlow didn't care because he was in theory-testing mode: he created the experiment to test his theory, not to test the truth in some population of monkeys.

What two things should be kept in mind in the grammar experiment? (Another theory-testing mode example) What would they do if there were inconsistencies in the population? In either case, what must happen with this theory specifically?
1. The researchers chose a strong test of their theory.
2. If the researchers had found some cultural group of parents who did not correct their children's grammar, they would have to modify the reinforcement theory to fit these new data. They'd have to change the theory to explain why reinforcement applies only in this population but not others.
In either case, the data from Brown and Hanlon's study mean that the "parent-as-grammar-coach" theory must be rejected or at least modified.

Overall, what are most studies in psychology? Summarise this process? Summarise what researchers are not concerned with?
Majority = theory-testing type. Most researchers design studies that enable them to test competing explanations and confirm or disconfirm their hypotheses.
When researchers are in theory-testing mode, they are not very concerned (at least not yet) with the external validity of their samples or procedures.

Define theory-testing mode? Define generalisation mode? Regarding generalisation mode, what are researchers concerned with?
Theory-testing mode = a researcher's intent for a study, testing association claims or causal claims to investigate support for a theory. Generalisation mode = the intent of researchers to generalise the findings from the samples and procedures in their study to other populations or contexts. In generalisation mode, researchers are concerned with external validity.

What is an example of something done in generalisation mode? What is essential for supporting frequency claims? In turn, what does this mean?
Survey research that is intended to support frequency claims is done in generalisation mode. Representative samples are essential for supporting frequency claims. Therefore, when researchers are testing frequency claims, they are always in generalisation mode.

Which modes are association and causal claims conducted in? Are they ever conducted in generalisation mode? Provide an example of research being shifted to generalisation mode?
Most of the time, association and causal claims are conducted in theory-testing mode. But researchers sometimes conduct them in generalisation mode, too. If a therapeutic technique is effective in a sample, the researcher would then want to learn whether it will also be effective more generally, in other samples.

What is cultural psychology? Which mode do cultural psychologists work in? Who have they challenged and how?
A subdiscipline of psychology concerned with how cultural settings shape a person's thoughts, feelings, and behaviour, and how these in turn shape cultural settings. In conducting their studies, cultural psychologists work in generalisation mode.
They have challenged researchers who work exclusively in theory-testing mode by identifying several theories that were supported by data in one cultural context but not in any other.

Provide an example of cross-cultural results?
The Müller-Lyer illusion - are the two vertical lines the same length? Almost all North Americans and Europeans fall for this illusion, but not all people do.

Describe how the Müller-Lyer illusion data would become an issue in theory-testing mode? In summary, what do cultural studies remind us?
In theory-testing mode, researchers would use the data from a single North American sample to test a theory about human visual perception, not accounting for the fact that culture can affect such a basic cognitive process. Cultural studies like these remind other researchers they cannot take generalisation for granted. Even psychological processes that seem basic and fundamental can be affected by cultural environments.

To understand the importance of cultural psychologists' work, what should we consider? Regarding most participants being North American college students in psychology, what should we take into account? Researchers often refer to these participants as WEIRD, what does this stand for?
That most research in psychological science has been conducted on North American college students. Participants from these countries are from a unique subset of the world's population = not representative of all the world's people. WEIRD: Western, educated, industrialised, rich, and democratic.

Summarise who theory-testing mode operates on and what they prioritise? In response, what do cultural psychologists do?
When researchers in psychology operate in theory-testing mode, they do not prioritise external validity and they may even test their theories only on WEIRD participants. Cultural psychologists raise the alarm, reminding researchers that theories that have been tested only on WEIRD subjects may not apply to everyone.

What is a field setting?
What advantage does it have? What is ecological validity? Why might ecological validity be only one factor of generalisability?
A real-world setting for a research study = it has a built-in advantage for external validity because it clearly applies to real-world contexts. Ecological validity is one aspect of external validity, referring to the extent to which a study's tasks and manipulations are similar to the kinds of situations participants might encounter in their everyday lives. Ecological validity is just one factor in generalisability because a study's setting may not represent all possible environments.

Define experimental realism? Provide examples.
The extent to which a laboratory experiment is designed so that participants experience authentic emotions, motivations, and behaviours. People in lab studies have interacted with actual people, drunk real alcoholic beverages, and played games with other participants = many of these situations are truly engaging and emotionally evocative.

To what extent does the real-world similarity (the ecological validity) of a study affect your ideas about a study's importance? Because external validity is of primary importance when researchers are in generalisation mode, what might they strive for? What might they also try to enhance? When a researcher is working in theory-testing mode, what may be lower priority?
A nuanced answer will consider the mode in which it is conducted: generalisation mode or theory-testing mode. They might strive for a representative sample of a population. But they might also try to enhance the ecological validity of a study in order to ensure its generalisability to non-laboratory settings. Lower priority = external validity and real-world applicability.

When a researcher is working in theory-testing mode, external validity and real-world applicability may be lower priorities. Provide an example.
Johansson (1973) study, in which the model had lights attached to her head and her joints.
Of course, no one is going to encounter these circumstances in the real world, but the ecological validity of the situation didn't matter to the researcher. He was testing a research question = he created an extremely artificial situation.

Was Johansson's light study important? What about generalisability and real-world applicability? Summarise the contribution made by this study?
In terms of the theory he was testing, it was invaluable because he was able to show that both joint location and movement are necessary and sufficient to recognise a human form. On the one hand, both the model and the study's participants were drawn haphazardly, not randomly, from a small group of North American college students. The situation created in the lab would never occur in the real world. On the other hand, the ability to recognise human gestures, postures, and moods is undoubtedly essential for many aspects of social life. The theoretical understanding gained from this seemingly artificial study contributes to our understanding of a perceptual process involved in basic social cognition.

What does the researcher go to great lengths to ensure in a randomised, double-blind, placebo-controlled study? Why might this be seen as artificial?
In such a study, an experimenter goes to extreme artificial lengths to assign people at random to carefully constructed experimental conditions that control for placebo effects, experimenter bias, and experimental demand. Such studies have virtually no equivalent in everyday, real-world settings, yet their results can be among the most valuable in psychological science.

In short, what does theory-testing mode demand? What does theory-testing mode prioritise, and at the expense of what? However, what do such studies make?
Theory-testing mode often demands that experimenters create artificial situations that allow them to minimise distractions, eliminate alternative explanations, and isolate individual features of some situation.
Theory-testing mode prioritises internal validity at the expense of all other considerations, including ecological validity. Nonetheless, such studies make valuable contributions to the field of psychology.

A nuanced approach can be used to evaluate what? What must a study be? What does it not necessarily need to be?
A study's credibility and importance. A study must be replicated to be deemed credible, but it might not need to be generalisable or have immediate real-world applicability to be important.

Replication studies determine whether the findings of an original study are reproducible. Summarise the 3 types of replication?
A direct replication repeats the original study as closely as possible. A conceptual replication has the same conceptual variables as the original study but operationalises the variables differently. A replication-plus-extension study repeats the original study and introduces new participant variables, situations, or independent variable levels.

What do replication projects coordinate? Define a meta-analysis? What does meta-analysis help to quantify?
Replication projects coordinate labs around the world to conduct direct replication studies of between one and several psychological studies at a time. A meta-analysis collects and mathematically averages the effect sizes from all studies that have tested the same variables. It helps quantify whether an effect exists in the literature and, if so, its size and what moderates it.

Psychologists have identified questionable research practices that include? What may these produce? What do many psychological scientists now promote to strengthen the verifiability and replicability of studies?
Underreporting null results, p-hacking, and hypothesising after the results are known (HARKing). These may produce findings that cannot be replicated. They promote open data, open materials, and preregistration to strengthen the verifiability and replicability of studies.

What does the importance of external validity depend on?
What happens in a theory-testing mode? What happens in a generalisation mode?
Whether researchers are operating in generalisation mode or theory-testing mode. In theory-testing mode, researchers design studies that test a theory, leaving the generalisation step for later studies, which will test whether the theory holds in a sample that is representative of another population. In generalisation mode, researchers focus on whether their samples are representative, whether the data from their sample apply to the population of interest, and even whether the data might apply to a new population of interest.

Will high quality research always have good external validity? Who is always in generalisation mode?
High-quality research might not always have good external validity. In theory-testing mode, researchers do not (yet) consider whether their samples are representative of some population, so external validity is less important than internal validity. Researchers who make frequency claims are always in generalisation mode.

Does research have to be conducted in a field setting to have strong ecological validity? How might data conducted from artificial settings be the best of both worlds?
No, laboratory studies conducted in theory-testing mode might have strong experimental realism even if they do not resemble real-world situations outside the lab. Yet the data from such artificial settings help researchers test theories in the most internally valid way possible, and the results may still be important and apply to real-world circumstances.
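
The meta-analysis steps described in this set (collect the studies, average the effect sizes, then compute separate averages per moderator category) and the file drawer problem can be sketched in a few lines of Python. This is an illustration only: every effect size below is invented, the function names are hypothetical, and the average is a simple unweighted mean, whereas real meta-analyses typically weight each study by its precision.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical studies, invented for illustration:
# (effect size, moderator category, was it published?)
studies = [
    (0.45, "lab", True),
    (0.38, "lab", True),
    (0.05, "lab", False),
    (0.30, "field", True),
    (0.02, "field", False),
    (-0.04, "field", False),
]

def overall_effect(rows):
    """Unweighted average of the effect sizes in rows."""
    return mean(es for es, _, _ in rows)

def effect_by_moderator(rows):
    """Separate average effect size for each moderator category."""
    groups = defaultdict(list)
    for es, category, _ in rows:
        groups[category].append(es)
    return {category: mean(values) for category, values in groups.items()}

# File drawer problem: a review that sees only the published studies.
published_only = [row for row in studies if row[2]]

all_avg = overall_effect(studies)
published_avg = overall_effect(published_only)
by_setting = effect_by_moderator(studies)

print(f"all studies: {all_avg:.2f}, published only: {published_avg:.2f}")
print({category: round(value, 2) for category, value in by_setting.items()})
```

In this made-up data set the published-only average is larger than the average over all studies, which is the overestimate the file drawer problem describes.
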