Adventures in Statistics
Terms in this set (277)
The probability of making a Type I error (usually this value is 0.05)
The horizontal axis of a graph. Also known as the x-axis.
The probability of the union of mutually exclusive outcomes can be ascertained by adding the probability of the individual outcomes. If two outcomes A and B are mutually exclusive, then the probability of A or B occurring is the sum of their individual probabilities: p(A∪B) = p(A) + p(B).
The prediction that there will be an effect (i.e., that your experimental manipulation will have some effect or that certain variables will relate to each other)
An alternative name for the 'mean'
An alternative name for the 'mean'
The probability of making a Type II error (Cohen⁵⁰ suggests a maximum value of 0.2)
A graph in which a summary statistic (usually the mean) is plotted on the y-axis against a categorical variable on the x-axis (this categorical variable could represent, for example, groups of people, different times or different experimental conditions). The value of the mean for each category is shown by a bar. Different-coloured bars may be used to represent levels of a second categorical variable.
The ratio of the probability of the observed data given the alternative hypothesis to the probability of the observed data given the null hypothesis. It is the likelihood of the alternative hypothesis relative to the null. Values between 1 and 3 are considered evidence for the alternative hypothesis that is 'barely worth mentioning', values between 3 and 10 are considered 'substantial evidence', and values greater than 10 are strong evidence.
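As a minimal sketch of how the bands above might be applied, the snippet below divides two likelihoods to get a Bayes factor and attaches a verbal label. The function name, the example likelihoods, and the handling of the exact boundary values (1, 3 and 10) are assumptions made for illustration; the definition does not say which band a boundary value belongs to.

```python
def describe_bayes_factor(bf):
    """Label a Bayes factor (alternative relative to null) with the bands in the definition."""
    if bf <= 0:
        raise ValueError("A Bayes factor must be positive")
    if bf < 1:
        # Data are more probable under the null than under the alternative
        return "favours the null hypothesis"
    if bf <= 3:
        return "barely worth mentioning"
    if bf <= 10:
        return "substantial evidence"
    return "strong evidence"

# Hypothetical probabilities of the observed data under each hypothesis
p_data_given_alternative = 0.08
p_data_given_null = 0.02

bayes_factor = p_data_given_alternative / p_data_given_null
label = describe_bayes_factor(bayes_factor)
print(bayes_factor, label)
```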
A mathematical description of the relationship between the conditional probability of events A and B, p(A|B), their reverse conditional probability, p(B|A), and individual probabilities of the events, p(A) and p(B). The theorem states that p(A|B) = p(B|A) × p(A) / p(B).
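A small numerical illustration may help. The sketch below computes p(A|B) from p(B|A), p(A) and p(B); the function name and the three probabilities are hypothetical values chosen only for the example.

```python
def bayes_posterior(p_b_given_a, p_a, p_b):
    """Return p(A|B) = p(B|A) * p(A) / p(B)."""
    if not 0 < p_b <= 1:
        raise ValueError("p(B) must be greater than 0 and at most 1")
    return p_b_given_a * p_a / p_b

# Hypothetical inputs: p(B|A) = 0.9, p(A) = 0.1, p(B) = 0.3
posterior = bayes_posterior(0.9, 0.1, 0.3)
print(round(posterior, 3))
```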
In probability, a Bernoulli trial is a random experiment that has two possible outcomes. Tossing a (fair) coin is a Bernoulli trial: it will land either heads up or tails up.
Another name for 'independent design'
Another name for 'independent design'
Bias corrected and accelerated (BCa) confidence interval
A variant of the percentile bootstrap confidence interval that is adjusted for skewness and to be median unbiased. In general, it has better coverage (i.e., more often contains the true value being estimated) than the percentile confidence interval.
A statistic taken from a random sample that does not equal its corresponding population parameter; for example, a sample mean that is not equal to the population mean. See also 'unbiased estimator'.
A description of a distribution of observations that has two modes
A categorical variable that has exactly two mutually exclusive categories (e.g., being dead or alive) - see also 'dichotomous variable'
A standardized measure of the strength of relationship between two variables when one of the two variables is dichotomous. The biserial correlation coefficient is used when one variable is a continuous dichotomy (e.g., has an underlying continuum between the categories).
A correlation between two variables
A correction applied to the α-level to control the overall Type I error rate when multiple significance tests are carried out. Each test conducted should use a criterion of significance of the α-level (normally 0.05) divided by the number of tests conducted. This correction is simple and effective, but it tends to be too strict when lots of tests are performed.
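The arithmetic can be sketched as follows. The function name and the list of p-values are hypothetical; only the rule of dividing the α-level by the number of tests comes from the definition.

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test criterion: the alpha-level divided by the number of tests."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests

# Hypothetical p-values from four significance tests
p_values = [0.001, 0.02, 0.04, 0.30]

criterion = bonferroni_alpha(0.05, len(p_values))
significant = [p < criterion for p in p_values]
print(criterion, significant)
```

With four tests the criterion becomes 0.0125, so p-values of 0.02 and 0.04 no longer count as significant, which illustrates why the correction can be strict.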
A technique for estimating the sampling distribution of a statistic of interest (e.g., the mean or the b coefficient) by taking repeated samples (with replacement) from the data set. The standard error of the statistic is estimated as the standard deviation of the sampling distribution created from the bootstrap samples. From this, confidence intervals and significance tests can be computed.
A bootstrap sample is constructed by randomly selecting scores from an observed sample of scores and replacing them (so that the score is available for selection again) until the bootstrap sample contains as many scores as the original sample.
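The resampling procedure can be sketched in a few lines. This is an illustrative implementation under stated assumptions: the function names, the scores, the number of bootstrap samples (2000) and the seed are all hypothetical, and the interval shown is the simple percentile interval, not the BCa variant.

```python
import random
import statistics

def bootstrap_se(scores, n_boot=2000, stat=statistics.mean, seed=42):
    """Estimate the standard error of `stat` from bootstrap samples."""
    rng = random.Random(seed)
    n = len(scores)
    estimates = []
    for _ in range(n_boot):
        # Sample with replacement until the bootstrap sample has n scores
        sample = [rng.choice(scores) for _ in range(n)]
        estimates.append(stat(sample))
    # SE = standard deviation of the bootstrap sampling distribution
    return statistics.stdev(estimates), estimates

def percentile_ci(estimates, level=0.95):
    """Percentile bootstrap confidence interval (not BCa)."""
    ordered = sorted(estimates)
    lower_idx = int((1 - level) / 2 * len(ordered))
    upper_idx = int((1 + level) / 2 * len(ordered)) - 1
    return ordered[lower_idx], ordered[upper_idx]

scores = [2, 4, 4, 5, 7, 9, 10, 12, 13, 15]  # hypothetical observed sample
se, boot_stats = bootstrap_se(scores)
ci_low, ci_high = percentile_ci(boot_stats)
print(se, ci_low, ci_high)
```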
Refers to the possibility that performance in tasks may be influenced (the assumption is a negative influence) by boredom/lack of concentration if there are many tasks, or the task goes on for a long period of time.
Boxplot (or 'box whisker diagram')
A graphical representation of some important characteristics of a set of observations. At the centre is the median, which is surrounded by a box the top and bottom of which are the limits within which the middle 50% of observations fall (the interquartile range). Sticking out of the top and bottom of the box are two whiskers, which extend to the smallest and largest extreme scores, excluding scores considered to be outliers.
Box whisker diagram
Any variable made up of categories of objects/entities. The degree you are studying is a categorical variable: you can major in psychology, biology, medicine, or maths, but you can't major in more than one of them. We can compare people who major in maths with those who major in psychology, for example.
Central limit theorem
States that when samples are large the sampling distribution will take the shape of a normal distribution regardless of the shape of the population from which the sample was drawn. For small samples the t-distribution better approximates the shape of the sampling distribution. The standard deviation of the sampling distribution will be equal to the standard deviation of the sample (s) divided by the square root of the sample size (N).
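A quick simulation can make the standard error claim concrete. The sketch below draws many samples from a skewed (exponential) population and compares the standard deviation of the sample means with σ/√N. The population, sample size, number of samples and seed are assumptions chosen for illustration.

```python
import math
import random
import statistics

def standard_error(sample):
    """Estimated standard error of the mean: s / sqrt(N)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

rng = random.Random(1)
N = 50            # size of each sample
n_samples = 5000  # number of samples drawn

# Exponential population with rate 1: skewed, with mean 1 and sd 1
samples = [[rng.expovariate(1.0) for _ in range(N)] for _ in range(n_samples)]
sample_means = [statistics.mean(s) for s in samples]

empirical_se = statistics.stdev(sample_means)  # sd of the sampling distribution
theoretical_se = 1.0 / math.sqrt(N)            # population sd / sqrt(N)
one_sample_se = standard_error(samples[0])     # estimate from a single sample
print(empirical_se, theoretical_se, one_sample_se)
```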
A generic term describing a parameter or statistic that represents the centre of a frequency distribution of observations as measured by the mean, mode, and median
Information displayed as a diagram, graph, or table. The word is often used synonymously with graph, although the term chart encompasses a wider range of information displays than just graphs.
Superfluous material that distracts from the data being displayed on a graph
A probability distribution of the sum of squares of several normally distributed variables. It tends to be used to test hypotheses about categorical data, and the fit of models to the observed data.
Although this term can apply to any test statistic having a chi-square distribution, it generally refers to Pearson's chi-square test of the independence of two categorical variables. Essentially, it tests whether two categorical variables forming a contingency table are associated.
When data along an interval or ratio scale of measurement are grouped by dividing the scale into equal portions, these portions are known as classes. For example, a scale ranging from 0 to 10 might be divided into five equal classes of 0-2, 2-4, 4-6, 6-8, and 8-10, or two equal class intervals of 0-5 and 5-10.
Class interval width
When data along an interval or ratio scale of measurement are grouped into equal class intervals, the class interval width is the distance from the smallest to the largest value within the interval.
Theoretical probability of an event. For a given trial or set of trials, the classical probability of an event, assuming that all outcomes are equally likely, is the frequency of an event divided by the sample space, or total number of possible outcomes. Compare with 'Empirical probability'.
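As a sketch, classical probability for a fair six-sided die can be computed by counting outcomes. The function name and the die example are assumptions for illustration; exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def classical_probability(event, sample_space):
    """Number of outcomes in the event divided by the size of the sample space."""
    space = set(sample_space)
    favourable = set(event) & space  # ignore outcomes outside the sample space
    return Fraction(len(favourable), len(space))

die = {1, 2, 3, 4, 5, 6}  # equally likely outcomes of one roll

p_even = classical_probability({2, 4, 6}, die)
# Rolling a 1 and rolling a 2 are mutually exclusive, so their probabilities add
p_one_or_two = classical_probability({1}, die) + classical_probability({2}, die)
print(p_even, p_one_or_two)
```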
Converting a variable measured on a nominal scale (e.g., a categorical variable) to a numerical variable. This conversion is achieved by assigning numbers to each category; for example, instead of using Male and Female as response outcomes for the sex of a person, you could use 0 and 1 (e.g., 0 = Male, 1 = Female).
Coefficient of determination
The proportion of variance in one variable explained by a second variable. It is the Pearson correlation coefficient squared.
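To illustrate, the sketch below computes Pearson's r by hand and squares it. The function name and the two small data sets are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

x = [1, 2, 3, 4, 5]  # hypothetical scores on one variable
y = [2, 4, 5, 4, 5]  # hypothetical scores on a second variable

r = pearson_r(x, y)
r_squared = r ** 2   # coefficient of determination
print(round(r, 4), round(r_squared, 4))
```

Here about 60% of the variance in one variable is shared with the other.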
In probability theory the probabilities for outcomes are complementary if they sum to 1. For example, the probability that you are bored and the probability that you are not bored are complementary, because if you are bored then, by definition, you can't also be not bored, so the probabilities of these outcomes will sum to 1.
A form of criterion validity where there is evidence that scores from an instrument correspond to concurrently recorded external measures conceptually related to the measured construct.
The probability of an outcome given that some other outcome has already happened. For example, the probability that you are bored given that you have read this flashcard is a conditional probability, p(boredom|read flashcard).
For a given statistic calculated for a sample of observations, the confidence interval is a range of values around that statistic that are believed to contain, in a certain proportion of samples, the true value of that statistic. What that also means is that for the other proportion of samples, the confidence interval won't contain that true value. The trouble is, you don't know which category your particular sample falls into.
A variable (that we may or may not have measured) other than the predictor variables in which we're interested that potentially affects an outcome variable
Something that cannot be measured directly but is indicated by things that can be measured directly. For example, intelligence is a construct; although we cannot measure it directly, intelligent people tend to do well on IQ tests and don't generally drink their own urine or fall into holes in the road.
Contaminated normal distribution
See 'mixed normal distribution'
Evidence that the content of a test corresponds to the content of the construct it was designed to cover.
A table representing the cross-classification of two or more categorical variables. The levels of each variable are arranged in a grid, and the number of observations falling into each category is noted in the cells of the table.
A variable that can be measured to any level of precision. (Time is a continuous variable, because there is in principle no limit on how finely it could be measured.)
A measure of the strength of association or relationship between two variables. See 'Pearson's correlation coefficient', 'Spearman's rho', 'Kendall's tau'.
A form of research in which you observe what naturally goes on in the world without directly interfering with it. This term implies that data will be analysed so as to look at relationships between naturally occurring variables rather than making statements about cause and effect.
A process of systematically varying the order in which experimental conditions are conducted. In the simplest case of there being two conditions (A and B), counterbalancing simply implies that half of the participants complete condition A followed by condition B, while the remainder do condition B followed by condition A. The aim is to remove systematic bias caused by practice effects or boredom effects. See 'Latin square design'.
A measure of the 'average' relationship between two variables. It is the average cross-product deviation (i.e., the cross-product divided by one less than the number of observations)
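The calculation can be sketched directly from the definition: sum the cross-product deviations and divide by one less than the number of observations. The function name and the data are hypothetical.

```python
def covariance(x, y):
    """Sum of cross-product deviations divided by N - 1."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Each cross-product: deviation of x from its mean times deviation of y from its mean
    cross_products = [(a - mean_x) * (b - mean_y) for a, b in zip(x, y)]
    return sum(cross_products) / (n - 1)

x = [5, 4, 4, 6, 8]      # hypothetical scores on the first variable
y = [8, 9, 10, 13, 15]   # hypothetical scores on the second variable

cov_xy = covariance(x, y)
print(cov_xy)
```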
In Bayesian statistics a credible interval is an interval within which a certain percentage of the posterior distribution falls (usually 95%). It can be used to express the limits within which a parameter falls with a fixed probability.
Evidence that scores from an instrument correspond with (concurrent validity) or predict (predictive validity) external measures conceptually related to the measured construct
A measure of the total relationship between two variables. It is the deviation of one variable from its mean multiplied by the other variable's deviation from its mean.
A form of research in which you observe what naturally goes on in the world without directly interfering with it by measuring several variables at a single time point. In psychology, this term usually implies that data come from people at different age points with different people representing each age point. See also 'correlational research', 'longitudinal research'.
The frequency of a category in a series of ordered categories along a scale of measurement expressed as the total of the current category and all those preceding it. The categories can be in either descending or ascending order.
The frequency of a category in a series of ordered categories along a scale of measurement in the current category and all those preceding it expressed as the percentage of the total number of scores. The categories can be in either descending or ascending order.
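Both quantities can be computed with a running total. In this sketch the function name and the category frequencies are hypothetical, and the categories are assumed to be listed in ascending order.

```python
from itertools import accumulate

def cumulative_table(frequencies):
    """Return cumulative frequencies and cumulative percentages."""
    cumulative = list(accumulate(frequencies))  # running total across categories
    total = cumulative[-1]
    cumulative_pct = [100 * c / total for c in cumulative]
    return cumulative, cumulative_pct

# Hypothetical frequencies for five ordered categories
frequencies = [2, 5, 8, 4, 1]

cumulative, cumulative_pct = cumulative_table(frequencies)
print(cumulative)
print(cumulative_pct)
```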
Degrees of freedom
The number of scores that are free to vary when estimating some kind of statistical parameter. It has a bearing on significance tests for many commonly used test statistics (such as the F-ratio, t-test, chi-square statistic) and determines the exact form of the probability distribution for these test statistics.
The relative likelihood (or probability) of a given score on a scale of measurement occurring
Another name for the paired-samples t-test (and also the matched-pairs t-test)
Another name for outcome variable. This name is usually associated with experimental methodology (which is the only time it really makes sense), and it is so called because it is the variable that is not manipulated by the experimenter and so its value depends on the variables that have been manipulated.
Procedures for summarizing or organizing data collected from a sample
The difference between the observed value of a variable and the value of that variable predicted by a statistical model
Description of a variable that consists of only two categories (e.g., gender is dichotomous because it consists of only two categories: male and female) - see 'binary variable'
A qualitative method that operates on the assumption that by studying what we say (and how we interact) we can gain access to real-life processes. The starting point for a discourse analysis could be a transcribed individual interview (which has the advantage of control) or a group discussion (which has the advantage that you can look at natural interactions).
A variable that can only take on certain values (usually whole numbers) on the scale. For example, the number of boyfriends you have had ought to be discrete (unless you're a psychopath, you should not have had 3.7 boyfriends).
See 'mutually exclusive'
A way of recoding a categorical variable with two or more categories into one or more variables that are dichotomous with values of only 0 or 1. There are seven steps in creating such variables. (1) Count the number of groups you want to recode and subtract 1. (2) Create as many new variables as the value you calculated in step 1 (these are called dummy variables). (3) Choose one of your groups as a baseline (i.e., a group against which all other groups should be compared, such as a control group). (4) Assign that baseline group values of 0 for every dummy variable. (5) For the first dummy variable, assign the value 1 to the first group that you want to compare against the baseline group (if relevant, assign all other groups 0 for this variable). (6) If you have a second dummy variable assign the value 1 to the second group that you want to compare against the baseline group (assign all other groups 0 for this variable). (7) Repeat step 6 until you run out of dummy variables.
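The seven steps above can be sketched as a short function. The function name, group labels and data are hypothetical; the order of the dummy variables follows the order in which groups first appear, which is an arbitrary choice made for this example.

```python
def dummy_code(groups, baseline):
    """Recode group labels into 0/1 dummy variables relative to a baseline group."""
    levels = list(dict.fromkeys(groups))  # unique groups in first-seen order
    if baseline not in levels:
        raise ValueError("baseline must be one of the groups")
    # Steps 1-2: number of groups minus 1 dummy variables, one per non-baseline group
    compared = [g for g in levels if g != baseline]
    # Steps 3-7: the baseline gets 0 on every dummy; each other group gets 1 on its own dummy
    rows = [[1 if g == c else 0 for c in compared] for g in groups]
    return compared, rows

groups = ["control", "drug_a", "drug_b", "control", "drug_b"]
dummy_names, dummy_rows = dummy_code(groups, baseline="control")

print(dummy_names)
for group, row in zip(groups, dummy_rows):
    print(group, row)
```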
Tests for serial correlations between errors in regression models. It tests whether adjacent residuals are correlated, which is useful in assessing the assumption of independent errors. The test statistic can vary between 0 and 4, with a value of 2 meaning that the residuals are uncorrelated. A value greater than 2 indicates a negative correlation between adjacent residuals, whereas a value below 2 indicates a positive correlation.
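The definition does not give the formula, so this sketch assumes the standard Durbin-Watson statistic: the sum of squared differences between adjacent residuals divided by the sum of squared residuals. The function name and the residuals are hypothetical, chosen to show values above and below 2.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    if len(residuals) < 2:
        raise ValueError("At least two residuals are needed")
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

alternating = [1.0, -1.0, 1.0, -1.0]  # adjacent residuals have opposite signs
clustered = [1.0, 1.0, -1.0, -1.0]    # adjacent residuals tend to be similar

dw_alternating = durbin_watson(alternating)
dw_clustered = durbin_watson(clustered)
print(dw_alternating, dw_clustered)
```

The alternating residuals give a value above 2 (negative correlation between adjacent residuals), and the clustered residuals give a value below 2 (positive correlation).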
Another term for 'relative frequency'. The empirical probability is the probability of an event based on the observation of many trials. Like classical probability, it is the frequency of an event divided by the sample space, but the frequency and sample space are determined by actual observations.
A question that can be answered through collecting data such as by observation and/or experimentation
A vertical line protruding from the end of a bar (or line) on a bar graph (or line graph) that represents the precision of the value being displayed by the bar (or line). Typically they are used when the bar (or line) represents the mean, and the error bar will represent the precision of the mean.
Eta squared (η²)
An effect size measure that is the ratio of the model sum of squares to the total sum of squares. In essence, the coefficient of determination by another name.
In probability theory, an event is a subset of the sample space to which you can assign a probability. It consists of one or more outcomes of a trial.
In Bayes' theorem, evidence is another name for the marginal likelihood
In probability theory, an experiment is a procedure that has a well-defined set of outcomes and can be repeated over several trials.
A procedure to test a hypothesis. Ideally, an experiment should provide a comparison of conditions in which a proposed cause is present with a comparable condition in which the cause is absent, while controlling for all other variables that might influence the effect of interest.
Synonym for alternative hypothesis
A research method in which one or more variables is systematically manipulated to see their effect (alone or in combination) on an outcome variable. This term implies that data will be able to be used to make statements about cause and effect. Compare with cross-sectional research and correlational research.
Experimentwise error rate
The probability of making a Type I error in an experiment involving one or more statistical comparisons when the null hypothesis is true in each case
A test statistic with a known probability distribution (the F-distribution). It is the ratio of the average variability in the data that a given model can explain to the average variability unexplained by that same model. It is used to test the overall fit of a linear model.
An experimental design incorporating two or more categorical predictors (or independent variables)
Familywise error rate
The probability of making a Type I error in any family of tests when the null hypothesis is true in each case. The family of tests can be loosely defined as a set of tests conducted on the same data set and addressing the same empirical question.
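To see how quickly this error rate grows, the sketch below assumes the tests are independent, in which case the probability of at least one Type I error across n tests is 1 − (1 − α)ⁿ. The independence assumption and the function name are not part of the definition above.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one Type I error across independent tests."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return 1 - (1 - alpha) ** n_tests

# Error rate for 1, 3 and 10 tests, each conducted at alpha = 0.05
rates = {n: familywise_error_rate(0.05, n) for n in (1, 3, 10)}
for n, rate in rates.items():
    print(n, round(rate, 4))
```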
Fisher's exact test
Fisher's exact test⁹³ is not so much a test as a way of computing the exact probability of a statistic. It was designed originally to overcome the problem that with small samples the sampling distribution of the chi-square statistic deviates substantially from a chi-square distribution. It should be used with small samples.
The degree to which a statistical model is an accurate representation of some observed data
Forced entry regression
A method of multiple regression in which predictor variables are entered into the model, and their regression coefficients estimated, simultaneously
An experimental design incorporating four categorical predictors (or independent variables)
The number of times that a score, range of scores or category occurs
A graph or table showing the categories along a scale of measurement alongside how many scores within a data set fall into each category. See also 'histogram'.
A graph with ascending values of observations on the horizontal axis, and points displaying the frequency (on the vertical axis) with which each value occurs in the data set. The dots are connected with straight lines to form a polygon. See also 'histogram'.
See 'Likelihood ratio test'
The ability of a statistical model to say something beyond the set of observations that spawned it. If a model generalises it is assumed that predictions from that model can be applied not just to the sample on which it is based, but also to a wider population from which the sample came.
A visual display of data depicting the values or a summary of one or more variables. Typically, two axes at right angles to each other are used to quantify values of the variables.
An estimate of the departure from sphericity. The maximum value is 1 (the data completely meet the assumption of sphericity). Values below 1 indicate departures from sphericity and are used to correct the degrees of freedom associated with the corresponding F-ratios by multiplying them by the value of the estimate.
A qualitative method in which the analysis of qualitative data informs the development of a theory rather than vice versa
Grouped frequency distribution
A frequency distribution in which categories on the scale of measurement are grouped using class intervals
Heterogeneity of variance
The opposite of homogeneity of variance. This term means that the variance of one variable varies (i.e., is different) across levels of another variable.
The opposite of homoscedasticity. This occurs when the residuals at each level of the predictor variable(s) have unequal variances. Put another way, at each point along any predictor variable, the spread of residuals is different.
A method of multiple regression where the order in which predictors are entered into the regression model is determined by the researcher based on previous research: variables already known to be predictors are entered first, new variables are entered subsequently.
A graph with ascending values of observations on the horizontal axis, and vertical bars rising up to the frequency (on the vertical axis) with which each value occurs in the data set. See also 'frequency polygon'.
Homogeneity of variance
The assumption that the variance of one variable is stable (i.e., relatively similar) at all levels of another variable
An assumption in regression analysis that the residuals at each level of the predictor variable(s) have similar variances. Put another way, at each point along any predictor variable, the spread of residuals should be fairly constant.
An estimate of the departure from sphericity. The maximum value is 1 (the data completely meet the assumption of sphericity). Values below this indicate departures from sphericity and are used to correct the degrees of freedom associated with the corresponding F-ratios by multiplying them by the value of the estimate.
A proposed explanation for a fairly narrow phenomenon or set of observations. It is not a guess, but an informed, theory-driven attempt to explain what has been observed. A hypothesis cannot be tested directly but must first be operationalized as predictions about variables that can be measured.
An experimental design in which different treatment conditions utilize different entities (in psychology, this would mean using different people in different treatment conditions) and so the resulting data are independent. Also known as between-groups or between-subjects design.
An assumption in ordinary least squares regression that says that for any two observations the residuals should be uncorrelated (or independent).
A test using the t-statistic that establishes whether two means collected from independent samples differ significantly
Another name for a 'predictor variable'. This name is usually associated with experimental methodology, and it is so called because it is the variable that is manipulated by the experimenter and so its value does not depend on any other variables (just on the experimenter).
Statistical procedures for generalizing findings based on data from a sample to the population from which that sample came
The combined effect of two or more predictor variables on an outcome variable
A graph showing the means of two or more independent variables in which means of one variable are shown at different levels of the other variable. Usually the means are connected with lines, or are displayed as bars. These graphs are used to help understand interaction effects.
The limits within which the middle 50% of an ordered set of observations falls. It is the difference between the values of the upper quartile and lower quartile.
The probability of two or more outcomes occurring simultaneously is the probability of the intersection. The probability of the intersection of two outcomes A and B would be denoted as p(A∩B), meaning the probability of A and B occurring. Compare with 'union'.
Using a range of values to estimate the likely value of an unknown quantity. See 'confidence interval'.
A scale of ordered categories along the whole of which intervals are equal.
A correlation coefficient similar to Spearman's rho, but preferred for small data sets with a large number of tied ranks
A test of whether a distribution of scores is significantly different from a normal distribution. A significant value indicates a deviation from normality, but this test is notoriously affected by large samples in which small deviations from normality yield significant results.
This measures the degree to which scores cluster in the tails of a frequency distribution.
Latin square design
Method for counterbalancing tasks using an n by n grid of symbols where every symbol appears exactly once in every row and column.
A distribution with positive kurtosis (leptokurtic, kurtosis > 0) has too many scores in the tails and is too peaked
Levels of measurement
The relationship between what is being measured and the numbers obtained on a scale
Tests the hypothesis that the variances in different groups are equal. It basically does a one-way ANOVA on the deviations. A significant result indicates that the variances are significantly different - therefore, the assumption of homogeneity of variances has been violated.
When using Bayes' theorem to test a hypothesis, the likelihood is the probability that the observed data could be produced given the hypothesis or model being considered, p(data|model). It is the inverse conditional probability of the posterior probability. See also 'marginal likelihood'.
A model that is based upon a straight line
A graph in which a summary statistic (usually the mean) is plotted on the y-axis against a categorical variable on the x-axis. The value of the mean for each category is shown by a symbol, and means across categories are connected by a line.
A form of research in which you observe what naturally goes on in the world without directly interfering with it, by measuring several variables at multiple time points. See also 'correlational research', 'cross-sectional research'.
The value that cuts off the lowest 25% of the data. If the data are ordered and then divided into two halves at the median, then the lower quartile is the median of the lower half of the scores. It is also known as the first quartile.
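The median-of-halves approach can be sketched as below. The upper quartile is computed the same way on the upper half, and their difference is the interquartile range. The function name and data are hypothetical, and excluding the median from both halves when the number of scores is odd is one convention among several.

```python
import statistics

def quartiles_by_halves(scores):
    """Lower and upper quartiles as the medians of the lower and upper halves."""
    if len(scores) < 2:
        raise ValueError("At least two scores are needed")
    ordered = sorted(scores)
    half = len(ordered) // 2
    lower_half = ordered[:half]    # excludes the median when the count is odd
    upper_half = ordered[-half:]
    return statistics.median(lower_half), statistics.median(upper_half)

scores = [7, 1, 12, 3, 8, 4, 10, 6]  # hypothetical observations

lower_q, upper_q = quartiles_by_halves(scores)
iqr = upper_q - lower_q
print(lower_q, upper_q, iqr)
```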
A robust measure of location. One example is the median. In some cases it is a measure of location computed after outliers have been removed; unlike a trimmed mean, the amount of trimming used to remove outliers is determined empirically.
Marginal likelihood (evidence)
When using Bayes' theorem to test a hypothesis, the marginal likelihood (sometimes called evidence) is the probability of the observed data, p(data). See also 'likelihood'.
Another name for the paired-samples t-test (and also the dependent t-test)
A test of the assumption of sphericity. If this test is significant then the assumption of sphericity has not been met and an appropriate correction should be applied to the degrees of freedom of the F-ratio in repeated-measures ANOVA. The test works by comparing the variance-covariance matrix of the data to an identity matrix; if the variance-covariance matrix is a scalar multiple of an identity matrix then sphericity is met.
A simple statistical model of the centre of a distribution of scores. A hypothetical estimate of the typical score. It is the sum of observed scores divided by the number of observed scores.
A measure of average variability. For every sum of squares (which measure the total variability) it is possible to create mean squares by dividing by the number of things used to calculate the sum of squares (or some function of it).
The discrepancy between the numbers used to represent the thing that we're measuring and the actual value of the thing we're measuring (i.e., the value we would get if we could measure it directly).
The middle score of a set of ordered observations. When there is an even number of observations the median is the average of the two scores that fall either side of what would be the middle value.
Statistical procedure for assimilating research findings based on the idea that we can take effect sizes from individual studies that research the same question, quantify the observed effect in a standard way and then combine these effects to get a more accurate idea of the true effect in the population.
Method of least squares
A method of estimating parameters (such as the mean, or a regression coefficient) that is based on minimizing the sum of squared errors. The parameter estimate will be the value, out of all of those possible, which has the smallest sum of squared errors.
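As an illustration of the idea, the sketch below tries a grid of candidate estimates for a small set of scores and picks the one with the smallest sum of squared errors, which coincides with the mean. The scores and the grid of candidates are hypothetical; a grid search is used only to make the principle visible, not because it is how estimates are normally obtained.

```python
import statistics

def sum_squared_errors(scores, estimate):
    """Sum of squared differences between each score and the estimate."""
    return sum((x - estimate) ** 2 for x in scores)

scores = [1, 3, 4, 3, 2]  # hypothetical observed scores

# Candidate estimates from 0.0 to 5.0 in steps of 0.1
candidates = [i / 10 for i in range(51)]
best = min(candidates, key=lambda c: sum_squared_errors(scores, c))

print(best, statistics.mean(scores))
print(sum_squared_errors(scores, best))
```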
Mixed normal distribution
A normal-looking distribution that is contaminated by a small proportion of scores from a different distribution. These distributions are not normal and have too many scores in the tails. The effect is to inflate the estimate of the population variance. This, in turn, makes significance tests lack power.
The most frequently occurring score in a set of observations
Moderation occurs when the relationship between two variables changes as a function of a third variable. For example, the relationship between watching horror films (predictor) and feeling scared at bedtime (outcome) might increase as a function of how vivid an imagination a person has (moderator).
A situation in which two or more variables are very closely linearly related
Multilevel linear model
A linear model in which the hierarchical structure of the data is explicitly considered. In this analysis regression parameters can be fixed but also random. This means that for each regression parameter there is a fixed component but also an estimate of how much the parameter varies across contexts.
Description of a distribution of observations that has more than two modes
A linear model in which an outcome is predicted by a linear combination of two or more predictor variables. The outcome is denoted as Y, and each predictor is denoted as X. Each predictor has a regression coefficient, or parameter, b, associated with it, and b₀ is the value of the outcome when all predictors are zero.
In probability, outcomes are mutually exclusive or disjoint if they cannot co-occur. In other words, if the probability of their intersection is zero: p(A∩B) = 0.
A distribution with positive kurtosis (leptokurtic, kurtosis > 0) has too many scores in the tails and is too peaked, whereas a distribution with negative kurtosis (platykurtic, kurtosis < 0) has too few scores in the tails and is quite flat.
Common abbreviation for 'null hypothesis significance testing'
A scale on which numbers represent names. For example, the numbers on sports players shirts: a player with the number 1 on her back is not necessarily worse than a player with a 2 on her back. The numbers have no meaning other than denoting the type of player (full back, centre forward, etc.).
A type of quantile; they are values that split the data into nine equal parts. They are commonly used in educational research.
A probability distribution of a random variable that is known to have certain properties. It is perfectly symmetrical and has a kurtosis of 3. The distribution is described for any variable, v, with a mean of µ and a standard deviation of σ by the probability density function f(v) = (1 / (σ√(2π))) × e^(−(v − µ)² / (2σ²)).
The reverse of the experimental hypothesis that your prediction is wrong and the predicted effect doesn't exist (i.e., it is zero). Essentially, this hypothesis is never true, but that doesn't stop lots of people from pretending that it might be.
Null hypothesis significance testing (NHST)
A framework for establishing whether a hypothesis is true by working out the probability of observing a statistic at least as large as the one observed if the null hypothesis were true. If this probability is below 0.05, then the null hypothesis is rejected and the alternative hypothesis is accepted.
The probability of an event occurring divided by the probability of that event not occurring
The ratio of the odds of an event occurring in one group compared to another. An odds ratio of 1 would indicate that the odds of a particular outcome are equal in both groups.
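As a rough illustration of the two definitions above, the odds and the odds ratio can be computed from probabilities as follows. The function names and the example probabilities are made up for this sketch.

```python
def odds(p):
    """Odds of an event: probability it occurs / probability it does not."""
    if not 0 <= p < 1:
        raise ValueError("p must be in [0, 1) for the odds to be defined")
    return p / (1 - p)


def odds_ratio(p_group1, p_group2):
    """Ratio of the odds in group 1 to the odds in group 2."""
    return odds(p_group1) / odds(p_group2)


# Hypothetical probabilities of the same outcome in two groups.
print(odds(0.8))             # odds of the outcome in group 1
print(odds_ratio(0.8, 0.5))  # how many times larger the odds are in group 1
print(odds_ratio(0.5, 0.5))  # equal odds in both groups give a ratio of 1
```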
An effect size measure associated with comparing several means. It is a function of the model sum of squares and the residual sum of squares and isn't a lot of use because it quantifies the overall difference between lots of means and so can't be interpreted in a meaningful way.
A test of a directional hypothesis. For example, the hypothesis 'taking a stats test while pretending to be Florence Nightingale will lead to higher marks than taking the test as yourself' requires a one-tailed test because I've stated the direction of the relationship (see also 'two-tailed test').
A procedure (or set of procedures) for determining and quantifying the existence of something. In research, we usually want to quantify the existence of a construct.
A scale that tells us not only that things have occurred, but also the order in which they occurred. These data tell us nothing about the differences between values.
The vertical axis of a graph. Also known as the y-axis.
Means perpendicular (at right angles) to something. It tends to be equated to independence in statistics because of the connotation that perpendicular linear models in geometric space are completely independent (one is not influenced by the other).
A measurable result of some process. For example, if you toss a coin the outcome will be that it lands face up or face down, and if you take an exam the outcome might be your mark.
A variable whose values we are trying to predict from one or more predictor variables
An observation or observations very different from most others. Outliers bias statistics (e.g., the mean) and their standard errors and confidence intervals.
The name often used for the probability of observing a test statistic at least as big as the one observed if the null hypothesis were true. It is used in null hypothesis significance testing.
A test using the t-statistic that establishes whether two means collected from the same sample (or related observations) differ significantly. Also known as the dependent t-test or matched-pairs t-test.
Comparisons of pairs of means
A parameter is something that summarizes a population. Statistical models have variables and parameters: parameters are estimated from the data and are (usually) constants believed to represent some fundamental truth about the relations between variables.
A measure of the relationship between two variables while adjusting for the effect of one or more additional variables on both
Pearson's correlation coefficient (or Pearson's product-moment correlation coefficient)
A standardized measure of the strength of the linear relationship between two variables. It can take any value from -1 (as one variable changes, the other changes in the opposite direction), through 0 (as one variable changes, the other doesn't change), to +1 (as one variable changes, the other changes in the same direction).
Percentage bend correlation
A robust statistic to measure the linear relationship between two variables (in place of the Pearson correlation coefficient)
Percentile bootstrap confidence interval
A confidence interval constructed empirically by taking several (e.g., 1000) bootstrap samples, estimating the parameter of interest in each one, and noting the limits within which a certain percentage (usually 95%) of estimates fall. See also 'Bias corrected and accelerated (BCa) confidence interval'.
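A minimal sketch of the percentile bootstrap described above, using only the Python standard library. The function name, the default of 1000 resamples, and the simple index-based choice of limits are assumptions of this sketch; statistical packages may pick the percentile positions slightly differently.

```python
import random
import statistics


def percentile_bootstrap_ci(scores, stat=statistics.mean, n_boot=1000,
                            level=0.95, seed=None):
    """Return (lower, upper) limits containing roughly `level` of the
    bootstrap estimates of `stat`."""
    rng = random.Random(seed)
    n = len(scores)
    estimates = []
    for _ in range(n_boot):
        # Sample n scores with replacement and estimate the statistic.
        resample = [rng.choice(scores) for _ in range(n)]
        estimates.append(stat(resample))
    estimates.sort()
    tail = (1 - level) / 2
    lower_index = int(tail * n_boot)
    upper_index = min(n_boot - 1, int((1 - tail) * n_boot))
    return estimates[lower_index], estimates[upper_index]


# Hypothetical scores, used only to show the call.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(percentile_bootstrap_ci(scores, seed=1))
```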
A type of quantile; they are values that split the data into 100 equal parts
Exists when at least one predictor in a regression model is a perfect linear combination of the others (the simplest example being two predictors that are perfectly correlated: they have a correlation coefficient of 1)
Another name for 'planned contrasts'
A set of comparisons between group means that are constructed before any data are collected. These are theory-led comparisons and are based on the idea of partitioning the variance created by the overall effect of group differences into gradually smaller portions of variance.
A distribution with negative kurtosis (platykurtic, kurtosis < 0) has too few scores in the tails and is quite flat.
The ratio of posterior probability for one hypothesis to another. In Bayesian hypothesis testing the posterior odds are the ratio of the probability of the alternative hypothesis given the data, p(alternative|data), divided by the probability of the null hypothesis given the data, p(null|data).
When using Bayes' theorem to test a hypothesis, the posterior probability is our belief in a hypothesis or model after we have considered the data, p(model|data). This is the value that we are usually interested in knowing. It is the inverse conditional probability of the likelihood.
Post hoc tests
A set of comparisons between group means that were not thought of before data were collected. These tests involve comparing the means of all combinations of pairs of groups. Each test uses a strict criterion for significance, so they tend to have less power than planned contrasts. They are usually used for exploratory work for which no firm hypotheses were available on which to base planned contrasts.
The ability of a test to detect an effect of a particular size (a value of 0.8 is a good level to aim for)
Refers to the possibility that participants' performance in a task may be influenced (positively or negatively) if they repeat the task because of familiarity with the experimental situation and/or the measures being used
In statistics, precision has a definition of 1/variance, which quantifies the error around a statistic. The term is used generally to refer to the extent to which a statistic represents what it is supposed to represent.
The value of an outcome variable based on specific values of the predictor variable or variables being placed into a statistical model
A form of criterion validity where there is evidence that scores from an instrument predict external measures (recorded at a different point in time) conceptually related to the measured construct
A variable that is used to try to predict values of another variable known as an outcome variable
The ratio of the probability of one hypothesis/model compared to a second. In Bayesian hypothesis testing, the prior odds are the probability of the alternative hypothesis, p(alternative), divided by the probability of the null hypothesis, p(null).
When using Bayes' theorem to test a hypothesis, the prior probability is our belief in a hypothesis or model before, or prior to, considering the data, p(model). See also 'posterior probability', 'likelihood', 'marginal likelihood'.
Probability (p)
The probability of an event, p(event), is the number of times the event occurs divided by the total number of possible events (i.e., the sample space)
Probability density function
The function that describes the probability of a random variable taking a certain value. It is the mathematical function that describes the probability distribution.
A curve describing an idealized frequency distribution of a particular variable from which it is possible to ascertain the probability with which specific values of that variable will occur. For categorical variables it is simply a formula yielding the probability with which each category occurs.
Matter that can change from one physical form to another
A proportion in statistics usually quantifies the portion of all measured data in a particular category in a scale of measurement. It is the frequency of a particular score/category relative to the total number of scores.
Extrapolating evidence for a theory from what people say or write. Contrast with quantitative methods.
Values that split a data set into equal portions. Quartiles, for example, are a special case of quantiles that split the data into four equal parts. Similarly, percentiles are points that split the data into 100 equal parts and noniles are points that split the data into 9 equal parts (you get the general idea).
Inferring evidence for a theory through measurement of variables that produce numeric outcomes. Contrast with qualitative methods.
A generic term for the three values that cut an ordered data set into four equal parts. The three quartiles are known as the lower (or first) quartile, the second quartile (or median) and the upper (or third) quartile.
A research design in which the experimenter has no control over either the allocation of participants to conditions, or the timing of the experimental manipulations
If the entities in a population or sample space have an equal chance of being selected then a sample resulting from this selection process is a random sample
The range of scores is the value of the smallest score subtracted from the highest score. It is a measure of the dispersion of a set of scores. See also 'variance', 'standard deviation', 'interquartile range'.
The process of transforming raw scores into numbers that represent their position in an ordered list of those scores. The raw scores are ordered from lowest to highest and the lowest score is assigned a rank of 1, the next highest score is assigned a rank of 2, and so on.
An interval scale, but with the additional property that 0 is a meaningful value and, therefore, ratios on the scale are also meaningful
Another name for a score, used to indicate that the score is expressed in its original units of measurement (i.e., has not undergone a transformation)
A line on a scatterplot representing the regression model of the relationship between the two variables plotted
See 'multiple regression', 'simple regression'
Another name for a repeated-measures design
The frequency of a score, range of scores or category expressed relative to the total number of observations
The ability of a measure to produce consistent results when the same entities are measured under different conditions
An experimental design in which different treatment conditions utilize the same entities (in psychology, this would mean the same people take part in all experimental conditions) and so the resulting data are related. Also known as related design or within-subject design.
The difference between the value a model predicts and the value observed in the data on which the model is based. Basically, an error. When the residual is calculated for each observation in a data set the resulting collection is referred to as the residuals. See also 'deviance'.
A general term for participants giving responses that do not necessarily reflect their true beliefs
Normally questions on a survey are phrased so that a large response represents more of the construct being measured
When reverse phrasing is used, responses need to be flipped so that high scores on the same questionnaire items consistently represent more of the construct being measured. This is done by subtracting the score on a reverse-phrased item from the maximum score for that item plus the minimum score.
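The subtraction described above can be sketched as below. The function name and the 1 to 5 response scale are assumptions chosen for illustration.

```python
def reverse_score(score, minimum, maximum):
    """Flip a reverse-phrased item: (maximum + minimum) - score."""
    return (maximum + minimum) - score


# Hypothetical responses on a 1-5 scale.
responses = [1, 2, 3, 4, 5]
print([reverse_score(r, 1, 5) for r in responses])  # [5, 4, 3, 2, 1]
```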
A term applied to procedures to estimate statistics or the standard errors and confidence intervals of statistics that are not unduly biased by the shape of the probability distribution or outliers and extreme scores
A smaller (but hopefully representative) collection of units from a population used to determine truths about that population (e.g., how a given population behaves in certain conditions)
The mean of a sample of scores
In classical probability theory, the sample space is the set of possible outcomes that can occur for a given trial of an experiment. In empirical probability theory, the sample space is the set of observed outcomes of several trials of an experiment.
The probability distribution of a statistic. If we take a sample and calculate some statistic, the value will depend somewhat on the sample we took. The sampling distribution represents the distribution of possible values of a given statistic that we could expect to get from a given population.
The difference between the value of a population parameter, and the value estimated from the sample
The extent to which a statistic (the mean, median, t, F, etc.) varies in samples taken from the same population
Sampling with replacement
When a sample is constructed from a population or sample space such that after an entity is selected to be in the sample, it is put back into the population or sample space so that it can be selected to be in the sample on a subsequent occasion
Sampling without replacement
When a sample is constructed from a population or sample space such that after an entity is selected to be in the sample, it is taken out of the population or sample space so that it cannot be selected to be in the sample on a subsequent occasion
Scales of measurement
See 'levels of measurement'
A graph that plots values of one variable against the corresponding value of another variable
A score is a measurement or observation of a single instance of a variable. Scores are a collection of measurements or observations of many instances of a variable. See also 'raw score'.
A test of whether a distribution of scores is significantly different from a normal distribution. A significant value indicates a deviation from normality, but this test is notoriously affected by large samples in which small deviations from normality yield significant results.
Simple effects analysis
This analysis looks at the effect of one independent variable (categorical predictor variable) at individual levels of another independent variable
A linear model in which one variable or outcome is predicted from a single predictor variable. The model takes the form Y = b₀ + b₁X + error, in which Y is the outcome variable, X is the predictor, b₁ is the regression coefficient associated with the predictor and b₀ is the value of the outcome when the predictor is zero.
Simultaneous entry regression
See 'forced entry regression'
A measure of the symmetry of a frequency distribution. Symmetrical distributions have a skew of 0. Frequent scores at the lower end of the distribution (tail points towards higher scores) have a positive skew. Frequent scores at the higher end of the distribution (tail points towards lower scores) have a negative skew.
A standardized measure of the strength of relationship between two variables that does not rely on the assumptions of the linear model. It is Pearson's correlation coefficient calculated on data that have been converted into ranked scores.
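The description above (Pearson's coefficient computed on ranked scores) can be sketched as follows. Giving tied scores the average of their ranks is an assumption of this sketch, as are the function names; the glossary's definition of ranking does not say how ties are handled.

```python
import math


def average_ranks(scores):
    """Rank scores from lowest (rank 1) upward; ties share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of scores tied with position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Pearson's r calculated on the ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


# A monotonic but non-linear relationship (hypothetical data).
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(pearson_r(x, y), spearman_rho(x, y))
```

In this example the ranks of x and y are identical, so the rank-based coefficient is 1 even though the raw scores do not fall on a straight line.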
The assumption that the variances of the differences between scores from the same entity across different treatments are equal. This assumption is most commonly found in repeated-measures ANOVA, but applies only where there are more than two points of data from the same participant.
An estimate of the average variability (spread) of a set of scores around the mean expressed in the same units of measurement as the raw scores. It is the square root of the variance.
The standard deviation of the sampling distribution of a statistic. It tells us how much variability there is in a statistic across samples from the same population. Large values, therefore, indicate that a statistic from a given sample may not be an accurate reflection of the population from which the sample came.
Standard error of differences
If we plotted these differences between sample means as a frequency distribution, we would have the sampling distribution of differences. The standard deviation of this sampling distribution is the standard error of differences. As such it is a measure of the variability of differences between sample means.
Standard error of the mean
The standard error associated with the mean
Standard normal distribution
A normal distribution of a standardized variable (i.e., z-scores); it has a mean (µ) of 0, and a standard deviation (σ) of 1
The process of converting a variable into a standard unit of measurement. The unit of measurement typically used is standard deviation units (see also z-scores). Standardization allows us to compare data when different units of measurement have been used.
A standardized distribution is a version of a distribution of raw scores where the scores have been transformed to yield specific values of the mean (µ) and standard deviation (σ). Standardized distributions are usually used to make incomparable distributions comparable.
A statistic is something that summarizes a sample of scores. It is usually computed from the sample data and is sometimes used to estimate the corresponding parameter in the population.
A method of multiple regression in which predictor variables are entered into the model based on a statistical criterion (the semi-partial correlation with the outcome variable). Once a new predictor is entered into the model, all predictors in the model are assessed to see whether they should be removed.
Sum of squared errors
Another name for the 'sum of squares'
Sum of squares (SS)
An estimate of total variability (spread) of a set of observations around a parameter (such as the mean). First the deviance for each score is calculated, and then this value is squared. The SS is the sum of these squared deviances.
Variation due to some genuine effect (be that the effect of an experimenter doing something to all of the participants in one sample but not in other samples, or natural variation between sets of variables). We can think of this as variation that can be explained by the model that we have fitted to the data.
A family of probability distributions describing samples taken from a population that is normally distributed. The shapes of the distributions are symmetrical and bell-shaped, and determined by the degrees of freedom. As these increase, the distribution gets closer to a normal distribution until it approximates it at about 30 degrees of freedom.
The possibility that an apparent relationship between two variables is actually caused by the effect of a third variable on them both (often called the third-variable problem)
The ability of a measure to produce consistent results when the same entities are tested at two different points in time
A statistic for which we know how frequently different values occur. The observed value of such a statistic is typically used to test hypotheses
Although it can be defined more formally, a theory is a hypothesized general principle or set of principles that explain known findings about a topic and from which new hypotheses can be generated. Theories have typically been well-substantiated by repeated testing.
An experimental design incorporating three categorical predictors (or independent variables)
Tolerance statistics measure multicollinearity and are simply the reciprocal of the variance inflation factor (1/VIF). Values below 0.1 indicate serious problems, although Menard⁹¹ suggests that values below 0.2 are worthy of concern.
Total sum of squares, SS_T
A measure of the total variability within a set of observations. It is the total squared deviance between each observation and the overall mean of all observations.
The process of applying a mathematical function to all observations in a data set; for example, to correct some distributional abnormality such as skew or kurtosis, or converting to a z-score.
In probability theory, a trial is a repetition of an experiment
Data after a certain percentage of the distribution has been removed at the extremes
A statistic used in many robust tests. It is a mean calculated using trimmed data. The mean depends on a symmetrical distribution to be accurate, but a trimmed mean produces accurate results even when the distribution is not symmetrical.
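A small sketch of a trimmed mean, assuming the amount of trimming is given as a number of scores to drop from each end of the ordered data (the function and parameter names are made up here; trimming is often specified as a percentage instead).

```python
def trimmed_mean(scores, n_each_tail):
    """Mean of the scores after dropping n_each_tail from each extreme."""
    if n_each_tail < 0 or 2 * n_each_tail >= len(scores):
        raise ValueError("trimming must leave at least one score")
    ordered = sorted(scores)
    kept = ordered[n_each_tail:len(ordered) - n_each_tail]
    return sum(kept) / len(kept)


# Hypothetical scores with one extreme value.
scores = [1, 2, 3, 4, 100]
print(trimmed_mean(scores, 0))  # ordinary mean, pulled up by 100
print(trimmed_mean(scores, 1))  # mean of 2, 3, 4
```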
A test of a non-directional hypothesis. Two-tailed tests are required when the hypothesis does not suggest the direction of the relationship.
An experimental design incorporating two categorical predictors (or independent variables)
Type I error
Occurs when we believe that there is a genuine effect in our population, when in fact there isn't
Type II error
Occurs when we believe that there is no effect in the population when, in reality, there is
A statistic from a sample that equals the corresponding population parameter. For example, a sample mean that is equal to the population mean
The probability of one outcome or another occurring is the probability of the union. The probability of the union of two outcomes A and B would be denoted as P(A∪B), meaning the probability of A or B occurring (but not both, when the outcomes are mutually exclusive). Compare with intersection.
This is variation that is not due to the effect in which we are interested (so could be due to natural differences between people in different samples, such as differences in intelligence or motivation). We can think of this as variation that cannot be explained by whatever model we have fitted to the data.
The value that cuts off the highest 25% of ordered scores. If the scores are ordered and then divided into two halves at the median, then the upper quartile is the median of the top half of the scores. It is also known as the third quartile.
Evidence that a study allows correct inferences about the question it was aimed to answer or that a test measures what it set out to measure conceptually. See also 'concurrent validity', 'content validity', 'criterion validity', 'face validity', 'predictive validity'.
Anything that can be measured and can differ across entities or across time
An estimate of average variability (spread) of a set of scores. In a sample, it is the sum of squares divided by the number of values on which the sum of squares is based, or, if estimating the value in the population, the sum of squares is divided by the number of values on which the sum of squares is based minus 1.
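The two divisors in the definition above can be illustrated with a short sketch. The function names and the example scores are assumptions; the sum of squares is taken around the sample mean.

```python
import math


def sum_of_squares(scores):
    """Sum of squared deviations of each score from the mean."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores)


def variance(scores, estimate_population=True):
    """SS / (N - 1) when estimating the population value, else SS / N."""
    n = len(scores)
    divisor = n - 1 if estimate_population else n
    return sum_of_squares(scores) / divisor


def standard_deviation(scores, estimate_population=True):
    """Square root of the variance, in the units of the raw scores."""
    return math.sqrt(variance(scores, estimate_population))


# Hypothetical scores.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(sum_of_squares(scores))                              # 32.0
print(variance(scores, estimate_population=False))         # 4.0
print(standard_deviation(scores, estimate_population=False))  # 2.0
print(variance(scores))                                    # 32 / 7
```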
Variance inflation factor (VIF)
A measure of multicollinearity. The VIF indicates whether a predictor has a strong linear relationship with the other predictor(s).
A number by which something (usually a variable in statistics) is multiplied. The weight assigned to a variable determines the influence that variable has within a mathematical equation: large weights give the variable a lot of influence.
A version of the F-ratio designed to be accurate when the assumption of homogeneity of variance has been violated. See also 'Brown-Forsythe F'.
A method for reducing the impact of extreme scores and outliers. The basic idea is to replace the most extreme scores with the next highest score in the data set. The most common implementations are to replace a fixed percentage of scores at the extreme or to identify scores that are extreme and replace only those.
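One possible reading of the fixed-number variant described above is sketched below: the k lowest scores are replaced by the next score up, and the k highest by the next score down. The function name and the choice to specify k as a count per tail (not a percentage) are assumptions of this sketch.

```python
def winsorize(scores, n_each_tail):
    """Replace the n_each_tail most extreme scores at each end with the
    nearest score that is not being replaced."""
    if n_each_tail < 0 or 2 * n_each_tail >= len(scores):
        raise ValueError("replacement must leave at least one original score")
    if n_each_tail == 0:
        return list(scores)
    ordered = sorted(scores)
    low = ordered[n_each_tail]                      # smallest retained score
    high = ordered[len(ordered) - n_each_tail - 1]  # largest retained score
    # Clamp every score into [low, high], keeping the original order.
    return [min(max(x, low), high) for x in scores]


# Hypothetical scores with one outlier.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(winsorize(scores, 1))  # [2, 2, 3, 4, 5, 6, 7, 8, 9, 9]
```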
Another name for a 'repeated-measures design'
The horizontal axis of a graph. Also known as the 'abscissa'.
The vertical axis of a graph. Also known as the ordinate.
Yates's continuity correction
An adjustment made to the chi-square test when the contingency table is 2 rows by 2 columns (i.e., there are two categorical variables both of which consist of only two categories). In large samples the adjustment makes little difference and is slightly dubious anyway.
The value of an observation expressed in standard deviation units. The sign of the score indicates whether it is above (positive) or below (negative) the mean, and the value quantifies how many standard deviations the score is from the mean. A z-score is calculated by taking the observation, subtracting from it the mean of all observations, and dividing the result by the standard deviation of all observations. By converting a distribution of observations into z-scores a new distribution is created that has a mean of 0 and a standard deviation of 1.
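The calculation described above can be sketched as follows. Using the N - 1 version of the standard deviation is an assumption of this sketch, and the function name is made up for illustration.

```python
import math


def z_scores(observations):
    """Convert each observation to (observation - mean) / standard deviation."""
    n = len(observations)
    mean = sum(observations) / n
    # Standard deviation with N - 1 in the denominator (an assumption here).
    sd = math.sqrt(sum((x - mean) ** 2 for x in observations) / (n - 1))
    return [(x - mean) / sd for x in observations]


# Hypothetical observations.
z = z_scores([2, 4, 4, 4, 5, 5, 7, 9])
print(z)  # negative values lie below the mean, positive values above it
```

The converted scores have a mean of 0 and, computed with the same N - 1 formula, a standard deviation of 1.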