Stats Comp G1
Terms in this set (91)
A characteristic of a sample, described by a measure such as a mean or a standard deviation. Usually symbolized by Roman letters.
A limit or boundary of a characteristic or an element. A parameter is a characteristic of a population, or of a distribution of scores, described by a measure such as a mean or a standard deviation. Usually symbolized by Greek letters.
The presumed cause in a study; a variable that can be used to predict or explain the values of another variable. A variable manipulated by an experimenter who predicts that the manipulation will have an effect on another variable (the dependent variable)
Another name for independent variable or cause. Often used when discussing non-experimental research designs such as correlational studies.
The presumed effect in a study; so called because it "depends" on another variable. The variable whose values are predicted by the independent variable, whether or not caused by it.
Another term for dependent variable, or the presumed effect in a study. The term is usually used for non-experimental studies. In such usage, the independent variable is called the predictor variable.
A variable that distinguishes among subjects by putting them into a limited number of categories, indicating type or kind, as "class" does by categorizing people into the lower, middle, and upper classes. Also called "discrete" or "nominal" variable.
A variable that can be expressed by a large (often infinite) number of measures. Loosely, a variable that can be measured on an interval or a ratio scale. Although all continuous variables are interval or ratio, not all interval or ratio scales are continuous in the strict sense of the term.
True Experimental Variable
Variables are manipulated
Variables are not manipulated
An extraneous variable that you do not wish to examine in your study; hence you control for it.
Another term for intervening variable; that is, a variable that "transmits" the effects of another variable.
A variable that influences ("moderates") the relation between two other variables and thus produces an interaction effect.
Measures of Central Tendency
Any of several statistical summaries that, in a single number, represent the typical number in a group of several numbers. Examples include the mean, mode, and median.
Average. To get the mean, you add up the values for each case and divide the total by the number of cases. Often symbolized as M or as X-bar
The middle score or measurement in a set of ranked scores or measurements; the point that divides a distribution into two equal halves. When the number of scores is even, there is no single middle score; in that case, the median is found by taking an average of the two middle scores.
The most common (most frequent) score in a set of scores.
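The three measures just defined can be sketched in a few lines of Python; the scores are made up for illustration:

```python
import statistics

scores = [2, 3, 3, 5, 7, 8, 8, 8, 10]  # hypothetical set of scores

mean = sum(scores) / len(scores)    # add the values, divide by the number of cases
median = statistics.median(scores)  # middle score of the ranked set
mode = statistics.mode(scores)      # most frequent score
```

Here the mean is 6.0, the median 7, and the mode 8, a reminder that the three measures of central tendency need not agree.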
Measures of Dispersion
Measures of variability; the range, the standard deviation, and the variance.
A statistic that shows the spread, variability, or dispersion of scores in a distribution of scores. It is a measure of the average amount the scores in a distribution deviate from the mean. The more widely the scores are spread out, the larger the standard deviation. The standard deviation is calculated by taking the square root of the variance. It is symbolized as SD, Sd, sd, s, or a lowercase sigma.
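A minimal sketch of the calculation just described, using made-up scores and the population form of the variance:

```python
import math

scores = [4, 6, 8, 10, 12]  # hypothetical scores
mean = sum(scores) / len(scores)
# Variance: the average squared deviation of each score from the mean.
variance = sum((x - mean) ** 2 for x in scores) / len(scores)
sd = math.sqrt(variance)  # the standard deviation is the square root of the variance
```

The more spread out the scores, the larger both the variance and the SD.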
The extent to which the results of a study (usually an experiment) can be attributed to the treatments rather than to flaws in the research design. In other words, internal validity is the degree to which one can draw valid conclusions about the causal effects of one variable on another. It depends on the extent to which extraneous variables have been controlled by the researcher.
Threats to Internal Validity
1) History
2) Maturation
3) Testing
4) Instrumentation
5) Statistical Regression
6) Differential Selection
7) Experimental Mortality
8) Selection-Maturation Interaction
9) Experimental Treatment Diffusion
10) Compensatory Rivalry by Control Group (John Henry Effect)
11) Compensatory Equalization Diffusion
12) Resentful Demoralization of the Control Group
An event that intervenes in the course of one's research and makes it difficult, if not impossible, to interpret the relations among independent and dependent variables.
A threat to validity that occurs because of change in subjects over time.
The effects of taking a test on a subsequent performance
Effects of changing measuring instruments or procedures
A tendency for those who score high on any measure to get somewhat lower scores on a subsequent measure of the same thing - or, conversely, for someone who has scored very low on some measure to get a somewhat higher score the next time the same thing is measured. Also called regressing toward the mean because the second score is likely to move toward or be closer to the mean or average score.
bias in the assignment of subjects to experimental and control groups; threat to internal validity.
Losing subjects over the course of the research project. Also called "mortality." Attrition may be a source of bias if the subjects who are lost make the sample less representative of the population.
Diffusion of Treatment
A threat to the validity of a study arising from communication among the subjects, in particular, when the communication results in the experimental treatment being spread ("diffused") among control group subjects.
Reaction of Controls
Awareness of being in a study may affect behavior
Are responses relevant to those who are not in a study?
The extent to which the findings of a study are relevant to subjects and settings beyond those in the study. Another term for generalizability.
Threats to External Validity
Pretest-treatment interaction, multiple-treatment interference, selection-treatment interaction, specificity of variables, experimenter effects, reactive arrangements.
A group of subjects from a larger group in the hope that studying this smaller group (the sample) will reveal important things about the larger group (the population).
Stimulus Characteristics and Settings
e.g., computer vs. real life
Reactivity of Experimental Arrangements
People behave differently when studied
The effects of one treatment on a subject's behavior being confounded by the influence of another treatment administered in the same study
Participants may respond unusually well to a novel innovation or unusually poorly to one that disrupts their routine, a response that must then be included as part of the treatment construct description.
Reactivity of Assessment
Similar to experimental arrangements, but focuses on awareness of what the measures are tapping
heightened awareness level in participants induced by measurement procedures
Timing of Measurement
the shorter the time between the attitude measurement and observed behavior, the stronger the link
Statistical Conclusion Validity
The accuracy of conclusions about covariation made on the basis of statistical evidence. More specifically, inferences about whether it is reasonable to conclude that covariation exists given a particular alpha level and given the variances obtained in the study.
Threats to Statistical Conclusion Validity
Problems that can lead to false conclusions. The term was introduced by D.T. Campbell and J.C. Stanley to refer to the characteristics of various research methods and designs that can lead to spurious or misleading conclusions. Discussions of threats to validity often lead researchers to recommend using more than one method. Because different kinds of research designs are open to different kinds of threats, you can reduce the risk of error by using two or more methods.
Low Statistical Power
Power refers to the probability of detecting a true relationship when one exists: high power means a high chance of detecting a true difference; low power, a poor chance. Low power is usually related to having too small a sample, and it usually leads to a false conclusion of no relationship. (There is too much static to hear the program.)
Variability in the Procedures
The spread or dispersion of scores in a group of scores; the tendency of each score to be unlike the others. More formally, the extent to which scores in a distribution deviate from a central tendency of the distribution, such as the mean. The standard deviation and the variance are two of the most commonly used measures of variability.
Unreliability of Measures
Inconsistency in scales obscures results. "Are we detecting a statistical difference or inconsistent measures?"
Multiple Comparisons and Error Rates
Usually called post hoc comparisons. Looking among the possible comparisons, in a factorial or ANOVA design, trying to find some significant difference. Considered poor practice in some circumstances.
The extent to which variables accurately measure the constructs of interest. In other words, how well are the variables operationalized? Do the operations really get at the things we are trying to measure? How well can one generalize from operations to constructs? In practice, construct validity is used to describe a scale, index, or other measure of a variable that correlates with measures of other variables in ways that are predicted by, or make sense according to, a theory of how the variables are related.
Threats to Construct Validity
loose connection between the theory and method, ambiguous effect of independent variable
Attention and Contact with Clients
Threat to construct validity; difficult to tell whether the change is due to the technique or contact w/ the therapist
Single Operations and Narrow Stimulus Sampling
Threat to construct validity: is the effect due to the selected IV?
Do other aspects of the intervention have an effect beyond the aspect identified by the experimenter?
A type of confounding effect that occurs when different experimenters working on the same experiment administer different treatments or conditions. Also, any bias introduced by experimenters' expectations.
Cues of the Experimental Situation
Any of the numerous potential cues available to subjects in experimental research, regarding the nature and purpose of the study, that might influence the subjects' reactions to the experimental treatment.
Logical or conceptual validity; so called because it is a form of validity determined by whether, on the face of it, a measure seems to make sense. In determining face validity, one often asks expert judges whether the measure seems to them to be valid.
Parametric Statistical Tests
Z test, One sample/independent/before-after t, One-way/factorial/within subjects ANOVA, MANOVA, pearson/semi-partial/multiple correlation, linear regression
Assumptions of Parametric Tests
-Data randomly sampled and independent*
-Data measured on ratio/interval scale
-Data normally distributed for each group
(residuals in ANOVA/regression)
-For questions regarding means, variances among groups (residuals in ANOVA/regression) are homogeneous
Non-parametric Statistical Tests
Mann-Whitney U, Wilcoxon T, Kruskal-Wallis, Friedman's ANOVA
Assumptions of Non-Parametric Tests
Data randomly sampled and independent
A research procedure that compares different subjects; each score in the study comes from a different subject.
A before-and-after study or a study of the same subjects given different treatments. A research design that pretests and posttests within the same group of subjects; that is, one which uses no control group.
A research design in which subjects are measured two or more times on the dependent variable. Rather than using different subjects for each level of treatment, the subjects are given more than one treatment and are measured after each. This means that each subject will be its own control.
Factorial designs in which the number of levels of the factors is not the same for all factors. Also, analyses (factorial or multiple regression) that combine repeated measures and one-time measures.
In a within-subjects factorial experiment, presenting conditions in all possible orders to avoid order effects.
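Full counterbalancing enumerates every ordering of the conditions; a sketch with three hypothetical conditions:

```python
import itertools

conditions = ["A", "B", "C"]  # hypothetical treatment conditions
# Every possible presentation order; with 3 conditions there are 3! = 6.
orders = list(itertools.permutations(conditions))
```

Subjects are then distributed across the orders so that no single sequence dominates, which washes out order effects.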
Broadly, any of several measures of association or of the strength of a relation, such as Pearson's r or eta. Often is thought of as a measure of practical significance.
Usually called Cronbach's alpha to distinguish it from the alpha in alpha level. It is a measure of internal reliability of the items in an index. Cronbach's alpha ranges from 0 to 1.0 and indicates the extent to which the items in an index measure the same thing.
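One common computational form of Cronbach's alpha is k/(k-1) x (1 - sum of item variances / variance of the total scores). A sketch with made-up item responses (rows are respondents, columns are the k items of the index):

```python
from statistics import pvariance

# Hypothetical responses: 4 respondents x 3 items (made-up data).
rows = [
    [3, 4, 3],
    [5, 5, 4],
    [2, 2, 3],
    [4, 5, 5],
]
k = len(rows[0])
items = list(zip(*rows))          # one column (tuple) per item
totals = [sum(r) for r in rows]   # each respondent's total score

item_var = sum(pvariance(col) for col in items)
alpha = k / (k - 1) * (1 - item_var / pvariance(totals))
```

Values near 1.0 indicate the items are measuring the same thing; values near 0 indicate they are not.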
The ability of a technique, such as a statistical test, to detect relationships. Specifically, the probability of rejecting a null hypothesis when it is false and therefore should be rejected. The power of a test is calculated by subtracting the probability of a Type II error from 1.0. The maximum total power a test can have is 1.0, and the minimum is 0; 0.8 is often considered an acceptable level for a particular test in a particular study.
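The arithmetic in the definition, with a made-up Type II error rate:

```python
beta = 0.2           # hypothetical probability of a Type II error
power = 1.0 - beta   # power = 1 - P(Type II error); here the common 0.8 benchmark
```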
AKA Cohen's kappa. A measure of interrater reliability for categorical data. This percentage-of-agreement measure corrects for chance or random agreement. Kappa is 1.0 when agreement is perfect; it is 0.0 when agreement is no better than would be expected by chance.
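The chance correction can be sketched for a made-up 2x2 agreement table (rows = rater A, columns = rater B):

```python
# Hypothetical agreement counts between two raters.
table = [[20, 5],
         [10, 15]]
n = sum(sum(row) for row in table)            # total ratings
po = (table[0][0] + table[1][1]) / n          # observed proportion of agreement
# Expected chance agreement, from the marginal proportions of each rater.
row = [sum(r) / n for r in table]
col = [sum(c) / n for c in zip(*table)]
pe = row[0] * col[0] + row[1] * col[1]
kappa = (po - pe) / (1 - pe)                  # agreement beyond chance
```

With these numbers the raters agree 70% of the time, but 50% agreement is expected by chance, so kappa is 0.4.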
A statistic, usually symbolized as r, showing the degree of linear relationship between two variables that have been measured on interval or ratio scales, such as the relationship between height in inches and weight in pounds.
A statistic that shows the degree of monotonic relationship between two variables that are arranged in rank order (measured on an ordinal scale).
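Pearson's r can be computed directly from deviations about the two means. A sketch with made-up interval-scale pairs, echoing the height/weight example in the definition:

```python
import math

x = [60, 62, 65, 70, 72]       # hypothetical heights in inches
y = [120, 130, 150, 165, 180]  # hypothetical weights in pounds

n = len(x)
mx, my = sum(x) / n, sum(y) / n
# Sum of cross-products of deviations, scaled by the two sums of squares.
cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
r = cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
```

These made-up pairs rise together almost perfectly, so r comes out close to +1.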
Biserial Correlation Coefficient
A correlation coefficient computed between a dichotomous and a continuous variable. The dichotomous variable actually is an interval-level variable, but one that has been collapsed to only two levels (such as high and low). The biserial correlation provides an estimate of what the correlation would have been if the collapsed dichotomous variable had been left as a continuous variable. The estimate usually is high.
Point Biserial Correlation Coefficient
A type of correlation to measure the association between two variables, one of which is dichotomous and the other continuous.
A type of correlation or measure of association between two variables used when both are categorical and one or both are dichotomous. Phi is a symmetric measure. It is based on the chi-square statistic. To calculate, divide the chi-square by the sample size and take the square root of the result.
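The phi calculation described above is just two arithmetic steps; a sketch with made-up numbers:

```python
import math

chi_square = 8.0  # hypothetical chi-square statistic from a 2x2 table
n = 200           # sample size
phi = math.sqrt(chi_square / n)  # divide chi-square by n, take the square root
```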
Analysis of Variance (ANOVA)
A test of statistical significance of the differences among the mean scores of two or more groups on one or more variables or factors. It is an extension of the t test, which can handle only 2 groups, to a larger number of groups. More specifically, it is used for assessing the statistical significance of the relationship between categorical independent variables and a continuous dependent variable. The procedure in ANOVA involves computing a ratio (F ratio) of the variance between the groups to the variance within the groups.
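The F ratio can be computed by hand; a minimal sketch with three made-up groups:

```python
import statistics

groups = [[2, 3, 4], [5, 6, 7], [8, 9, 10]]  # hypothetical scores for 3 groups
k = len(groups)
n = sum(len(g) for g in groups)
grand = sum(sum(g) for g in groups) / n      # grand mean of all scores

# Mean square between: spread of the group means around the grand mean.
ms_between = sum(len(g) * (statistics.mean(g) - grand) ** 2 for g in groups) / (k - 1)
# Mean square within: spread of scores around their own group mean.
ms_within = sum(sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups) / (n - k)
f_ratio = ms_between / ms_within
```

A large F means the groups differ far more between themselves than the scores vary within any one group.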
A test of statistical significance, often of the difference between two group means, such as the average score on a manual dexterity test of those who have and have not been given caffeine. Also used as a test statistic for correlation and regression coefficients.
Analysis of Covariance (ANCOVA)
An extension of ANOVA that provides a way of statistically controlling the linear effects of variables one does not want to examine in a study. These extraneous variables are called covariates, or control variables. Covariates should be measured on an interval or ratio scale. ANCOVA allows you to remove covariates from the list of possible explanations of variance in the dependent variable.
Shifting multiple factors to evaluate system performance
Multiple Analysis of Variance
compares the effects of several nonmetric independent variables on the mean or means of one or more metric dependent variables
Any of several statistical techniques concerned with predicting some variables by knowing others. Regression is used to answer such questions as "How well can I predict the values of one variable, by knowing the values of another variable?"
Stepwise Multiple Regression
A technique for calculating a regression equation that instructs a computer to find the best equation by entering independent variables in various combinations and orders. Stepwise regression combines the methods of backward elimination and forward selection. The variables are in turn subject first to the inclusion criteria of forward selection and then to the exclusion procedures of backward elimination. Variables are selected and eliminated until there are none left that meet the criteria for removal.
A kind of regression analysis often used when the dependent variable is dichotomous and scored 0 or 1. It is usually used for predicting whether something will happen or not, such as graduation, business failure, or heart attack: anything that can be expressed as event/non-event. Independent variables may be categorical or continuous in logistic regression analysis.
Type I Error
An error made by wrongly rejecting a true null hypothesis. This might involve incorrectly concluding that two variables are related when they are not, or wrongly deciding that a sample statistic exceeds the value that would be expected by chance. Also called alpha error.
Type II Error
An error made by wrongly accepting (or retaining or failing to reject) a false null hypothesis. Also called beta error.
A statistical analysis that is chosen after the experimental data have been collected, not as a part of the original design of the experiment.
A Priori Analysis
Used to describe pre-existing (prior) conditions among groups of subjects, especially potential confounding variables. Said of conclusions reached on the basis of reasoning from self-evident propositions, without or before examining facts, or of research that proceeds in a deductive way. Loosely, theoretical.
Inaccuracy resulting from flaws in a measuring instrument, as contrasted with other sorts of error or unexplained variance.
A theoretical continuous probability distribution in which the horizontal axis represents all possible values of a variable and the vertical axis represents the probability of those values occurring. The scores on the variable are clustered around the mean in a symmetrical, unimodal pattern known as the bell-shaped curve or normal curve. Mean, median, and mode are all the same. There are many different normal distributions, one for every possible combination of mean and standard deviation.
Any of the 99 numbered points that divide an ordered set of scores into 100 parts, each of which contains one-hundredth of the total.
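Python's standard library can produce those 99 cut points directly; the scores here are made up:

```python
import statistics

scores = list(range(1, 101))                # hypothetical ordered scores 1..100
cuts = statistics.quantiles(scores, n=100)  # the 99 percentile cut points
```

The 50th cut point is the median; for these scores it falls at 50.5.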
A test statistic for categorical data. Chi-squared as a test statistic is used to test independence, but the chi-square test also is used as a goodness-of-fit test. The chi-square test statistic can be converted into one of several measures of association, including the phi coefficient, the contingency coefficient, and Cramer's V.
Homogeneity of Variance
An assumption that the populations from which two or more samples have been drawn have equal variances. If this assumption is not true, test statistics may be inaccurate. Also called equality of variance assumption.
Central Limit Theorem
A statistical proposition to the effect that the larger a sample size, the more closely the sampling distribution of the mean will approach a normal distribution. This is true even if the population from which the sample is drawn is not normally distributed. A sample size of 30 or more usually will result in a sampling distribution of the mean that is very close to a normal distribution.
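A quick simulation of the claim, drawing samples of size 30 from a decidedly non-normal population (a fair die; all data are simulated):

```python
import random
import statistics

random.seed(0)  # for reproducibility

def sample_mean(n):
    # Mean of one sample of n rolls from a uniform population on 1..6.
    return statistics.mean(random.randint(1, 6) for _ in range(n))

# The sampling distribution of the mean for samples of size 30:
means = [sample_mean(30) for _ in range(2000)]
```

Although each roll is uniform, the 2,000 sample means cluster symmetrically around the population mean of 3.5, approximating a normal curve.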
A method for testing the statistical significance of multiple comparisons. It involves adjusting the significance level needed to reject the null hypothesis by dividing the alpha level you want to use by the number of comparisons you are making. Helps the researcher avoid the increased risk of Type I error that comes with multiple comparisons.
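The adjustment is a single division; a sketch with hypothetical numbers:

```python
alpha = 0.05                           # desired overall significance level
comparisons = 10                       # hypothetical number of post hoc comparisons
adjusted_alpha = alpha / comparisons   # each test must now reach p < 0.005
```

Requiring the stricter per-comparison alpha keeps the family-wide risk of a Type I error near the original 0.05.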