raw data

numbers and categories that have been collected but have not yet been processed in any way

variable

a characteristic that can differ from one individual to the next

observational unit

a single individual who participates in a study

sample data

measurements that are taken from a subset of a population

sample size

the total number of observational units

dataset

the complete set of raw data, for all observational units and variables in a survey or experiment

population data

measurements that are taken from all individuals of a population

statistic

a summary measure computed from sample data

parameter

a summary measure for an entire population
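
To make the statistic/parameter distinction concrete, here is a small Python sketch that is not part of the original cards. The population of values 1 to 1000 and the sample size of 50 are hypothetical choices. The population mean is a parameter. The mean of a random sample is a statistic, and it varies from sample to sample.

```python
import random
import statistics

# Hypothetical population: the values 1, 2, ..., 1000
population = list(range(1, 1001))

# Parameter: a summary measure computed from the entire population
population_mean = statistics.mean(population)

# Statistic: the same summary measure computed from a random sample
random.seed(1)  # fixed seed so the sample is reproducible
sample = random.sample(population, 50)
sample_mean = statistics.mean(sample)

print("parameter (population mean):", population_mean)  # 500.5
print("statistic (sample mean):", sample_mean)
```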

descriptive statistics

summary numbers (such as a mean or a range) for either a population or a sample

categorical variable

consists of group or category names that don't necessarily have any logical ordering; each individual falls into only one category

ordinal variable

a categorical variable whose categories have a natural ordering (for example, small, medium, large)

quantitative variable

a variable whose raw data are recorded as numerical values (either measurements or counts)

continuous variable

a type of quantitative variable for which every value within some interval is a possible result

explanatory variable

a variable whose value for an individual is thought to partially explain the value of the response variable for that same individual

response variable

a variable that is an effect of another variable

frequency

the count of how many observations fall into a category

relative frequency

the proportion or percentage in a category relative to the total count over all categories

frequency distribution

the listing of all categories along with their frequencies

relative frequency distribution

a listing of all categories along with their relative frequencies (given as proportions or percentages)
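
As an illustration that is not from the original card set, here is a minimal Python sketch that builds a frequency distribution and a relative frequency distribution for one categorical variable. The function name `frequency_tables` and the eye-color data are hypothetical.

```python
from collections import Counter

def frequency_tables(observations):
    """Return (frequency, relative frequency) dicts for a categorical variable."""
    freq = Counter(observations)           # count of observations per category
    total = sum(freq.values())             # total count over all categories
    rel = {cat: count / total for cat, count in freq.items()}  # proportions
    return dict(freq), rel

eye_colors = ["brown", "blue", "brown", "green", "brown", "blue"]
freq, rel = frequency_tables(eye_colors)
print(freq)  # {'brown': 3, 'blue': 2, 'green': 1}
print(rel)
```

The relative frequencies always sum to 1 (or 100% when written as percentages).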

pie charts

visual representations that are useful for summarizing a single categorical variable if there aren't too many categories

bar graphs

visual representations that are useful for summarizing one or two categorical variables; especially useful for comparing two categorical variables

distribution

the overall pattern of how often the possible values occur

location

on a distribution, the center or average value

median

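the middle value of the data when sorted from lowest to highest (the average of the two middle values when the number of observations is even)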

mean

the arithmetic average of data
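
A short Python sketch, added for illustration, computes the median and the mean by hand and checks them against the standard-library `statistics` module. The helper names `median_of` and `mean_of` and the data values are hypothetical.

```python
import statistics

def median_of(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    xs = sorted(values)
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]
    return (xs[mid - 1] + xs[mid]) / 2

def mean_of(values):
    """Arithmetic average: the sum of the values divided by how many there are."""
    return sum(values) / len(values)

data = [2, 7, 4, 9, 5, 3]
print(mean_of(data), median_of(data))  # 5.0 4.5
```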

variability

on a distribution, the spread among individual measurements

shape

on a distribution, the overall form of the graph, such as symmetric, skewed, or clumped

outliers

data points that are not consistent with the bulk of the data

histogram

a graph similar to a bar graph, used for quantitative data grouped into intervals; not very informative when the sample size is small
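
A histogram groups quantitative values into adjacent intervals (bins) and draws one bar per interval. The Python sketch below is an assumed, text-only illustration. The function name `histogram_counts`, the data, and the bin settings are hypothetical. Each bin includes its lower edge and excludes its upper edge.

```python
def histogram_counts(values, start, width, n_bins):
    """Count values in each bin [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        i = int((v - start) // width)  # index of the bin containing v
        if 0 <= i < n_bins:            # values outside all bins are ignored
            counts[i] += 1
    return counts

data = [12, 15, 21, 22, 25, 28, 31, 33, 47]
counts = histogram_counts(data, start=10, width=10, n_bins=4)
for i, c in enumerate(counts):
    lo = 10 + i * 10
    print(f"[{lo}, {lo + 10})  {'*' * c}")  # one '*' per observation in the bin
```

With only nine values the bars are short and uneven, which shows why a histogram says little when the sample size is small.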

stem-and-leaf plots

present all individual values; can be overwhelming for large datasets

boxplot

displays the five-number summary (minimum, lower quartile, median, upper quartile, maximum); useful for comparing multiple groups and identifying outliers
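
The Python sketch below computes the five-number summary that a boxplot displays. It is a sketch under stated assumptions. Quartiles are taken as the medians of the lower and upper halves of the sorted data, which is one common textbook convention; software packages may use others. The 1.5 × IQR outlier fences are a widely used convention that is not stated on these cards. The function name and data are hypothetical.

```python
import statistics

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max).

    Assumed convention: Q1 and Q3 are the medians of the lower and upper
    halves of the sorted data, leaving out the overall median when n is odd.
    Requires at least 2 values.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]           # values below the median position
    upper = xs[half + n % 2:]   # values above the median position
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])

data = [1, 3, 4, 5, 6, 7, 8, 9, 30]
minimum, q1, med, q3, maximum = five_number_summary(data)
print(minimum, q1, med, q3, maximum)  # 1 3.5 6 8.5 30

# Assumed outlier rule: flag values more than 1.5 * IQR beyond the quartiles
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in data if x < low_fence or x > high_fence]
print(outliers)  # [30]
```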

right

a graph is skewed to the _______ if higher values are more spread out than lower values

left

a graph is skewed to the _______ if lower values are more spread out than higher values

mode

the most frequent value
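
For illustration, Python's standard-library `statistics` module can find the mode directly. The data values here are hypothetical.

```python
import statistics

data = [3, 1, 3, 2, 5, 3, 2]
print(statistics.mode(data))  # 3

# A dataset can have more than one mode; multimode (Python 3.8+) returns all of them
print(statistics.multimode([1, 1, 2, 2, 3]))  # [1, 2]
```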

unimodal

describes a distribution with a single prominent peak in its histogram, stemplot, or dotplot

range

the highest value minus the lowest value

interquartile range

the upper quartile minus the lower quartile
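
A minimal Python sketch of the range and the interquartile range, using hypothetical data. It relies on `statistics.quantiles`, which by default uses the "exclusive" method. Other quartile conventions can give slightly different Q1 and Q3 values, so the IQR may differ a little from a hand calculation or another package.

```python
import statistics

data = [1, 3, 4, 5, 6, 7, 8, 9, 30]

data_range = max(data) - min(data)              # highest value minus lowest value
q1, q2, q3 = statistics.quantiles(data, n=4)    # the three quartile cut points
iqr = q3 - q1                                   # upper quartile minus lower quartile

print(data_range)  # 29
print(iqr)         # 5.0
```

The single large value 30 determines the range, while the IQR depends only on the middle half of the data.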

resistant statistic

a numerical summary of the data that is "resistant" to the influence of outliers, meaning outliers won't have a major influence on a statistic's numerical value
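
To see resistance in action, this Python sketch (an illustration with hypothetical values) adds one extreme value to a small dataset and compares how the mean and the median respond.

```python
import statistics

values = [42, 45, 47, 50, 52]     # hypothetical data
with_outlier = values + [950]     # the same data plus one extreme value

print(statistics.mean(values), statistics.median(values))  # 47.2 47
print(statistics.mean(with_outlier), statistics.median(with_outlier))
```

The mean jumps from 47.2 to about 197.7, while the median moves only from 47 to 48.5. The median is resistant and the mean is not.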

first summary number

the mean of a bell-shaped distribution

second summary number

the standard deviation of a bell-shaped distribution

standard deviation

a measure of the spread of values, represented by s for a sample; roughly, the average distance that values fall from the mean

variance

the square of the standard deviation
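
The Python sketch below, with hypothetical data, computes the sample variance and the standard deviation s by hand. It uses the n - 1 divisor that is standard for a sample; a population version divides by N instead (`statistics.pvariance` and `statistics.pstdev`). The result is checked against `statistics.variance` and `statistics.stdev`. The helper name `sample_variance` is hypothetical.

```python
import math
import statistics

def sample_variance(xs):
    """Sample variance s^2: sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]
var = sample_variance(data)   # squared deviations sum to 32, so s^2 = 32 / 7
sd = math.sqrt(var)           # s is the square root of the variance

print(round(var, 4), round(sd, 4))  # 4.5714 2.1381
```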

empirical rule

for bell-shaped data, approximately 68% of values fall within 1 standard deviation of the mean in either direction; approximately 95% fall within 2 standard deviations of the mean in either direction; approximately 99.7% fall within 3 standard deviations of the mean in either direction
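
As a rough check of the empirical rule, this Python sketch simulates bell-shaped data and counts the fraction of values within 1, 2, and 3 standard deviations of the mean. The normal distribution with mean 100 and standard deviation 15, the sample size, and the seed are all hypothetical choices. The simulated fractions only approximate 68%, 95%, and 99.7%.

```python
import math
import random

def within_k_sd(values, k):
    """Fraction of values within k sample standard deviations of the sample mean."""
    n = len(values)
    mean = math.fsum(values) / n
    sd = math.sqrt(math.fsum((v - mean) ** 2 for v in values) / (n - 1))
    return sum(abs(v - mean) <= k * sd for v in values) / n

random.seed(42)
# Hypothetical bell-shaped data: 50,000 draws from a normal distribution
values = [random.gauss(100, 15) for _ in range(50_000)]

for k in (1, 2, 3):
    print(k, round(within_k_sd(values, k), 3))  # close to 0.68, 0.95, 0.997
```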