statistics

the science of gaining information from numerical data

population

the entire group of individuals we want information about

sample

subgroup of the population meant to represent the population

individuals

the objects described by a data set (people, animals, or things)

variable

characteristic of an individual

categorical/qualitative

values are labels and categories

numerical/quantitative

values are numbers for which arithmetic, such as taking an average, makes sense

distribution of a variable

tells what values the variable assumes and how often it takes these values
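A minimal Python sketch of a distribution for a categorical variable, assuming a hypothetical list of survey answers: it counts how often each value occurs and converts the counts to percents. The function name `frequency_table` is made up for illustration.

```python
from collections import Counter

def frequency_table(values):
    """Return (value, count, percent) tuples, most frequent value first."""
    counts = Counter(values)
    n = len(values)
    return [(value, count, 100 * count / n) for value, count in counts.most_common()]

# hypothetical data: favorite color of 10 students
colors = ["blue", "red", "blue", "green", "blue", "red", "blue", "green", "red", "blue"]
for value, count, pct in frequency_table(colors):
    print(f"{value:<6} {count:>2} {pct:5.1f}%")
```

The table lists every value the variable takes and how often, which is exactly what the distribution describes.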

dotplot

simple plot that shows each value as a dot above a number line, useful for visualizing a relatively small data set. not convenient if the data set is large.

histogram

common graph of the distribution of one quantitative variable. the height of each bar shows the count or percent of values in that interval (bin); with equal bin widths, bar areas are proportional to those percents. the choice of bin width changes how the histogram looks.
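To see how bin width changes a histogram, here is a small Python sketch that counts hypothetical quiz scores in equal-width bins and prints a text histogram for two widths. The helper `histogram_counts` and its `start` parameter are assumptions for illustration, and it assumes every value is at least `start`.

```python
import math
from collections import Counter

def histogram_counts(values, bin_width, start):
    """Count values in equal-width bins [start, start+w), [start+w, start+2w), ...

    Assumes every value is >= start.
    """
    counts = Counter(math.floor((x - start) / bin_width) for x in values)
    n_bins = max(counts) + 1
    # Counter returns 0 for bins that received no values
    return [(start + i * bin_width, counts[i]) for i in range(n_bins)]

# hypothetical quiz scores
scores = [62, 67, 71, 74, 75, 78, 81, 83, 84, 88, 91, 95]
for width in (10, 5):
    print(f"bin width {width}:")
    for left, count in histogram_counts(scores, width, start=60):
        print(f"  {left:>3} to <{left + width:>3} | {'*' * count}")
```

With width 10 the bars are 2, 4, 4, 2 and look smooth and symmetric; with width 5 the same data shows more detail and more irregular bumps.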

bar chart

displays the count or percent of individuals in each category of a categorical variable, so the sizes of the categories can be compared

outlier

observation that falls outside the overall pattern of a data set

distribution

described by its center, spread, shape, and outliers

center

described by mean, median, mode

spread

how much the values vary, from the smallest to the largest. described by range, quartiles, interquartile range, standard deviation

shape

symmetric, skewed to the right, skewed to the left

outliers

values below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR (the 1.5 × IQR rule)
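A Python sketch of the 1.5 × IQR rule on a hypothetical data set. It assumes the common textbook quartile convention (Q1 and Q3 are the medians of the lower and upper halves, leaving out the overall median when n is odd); calculators and software packages may use other quartile rules and get slightly different fences. The function names are illustrative.

```python
def median(sorted_vals):
    """Median of an already-sorted, non-empty list."""
    n = len(sorted_vals)
    mid = n // 2
    if n % 2:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2

def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves.

    The overall median is left out of both halves when n is odd.
    Assumes at least 2 values.
    """
    s = sorted(values)
    half = len(s) // 2
    lower = s[:half]
    upper = s[half + len(s) % 2:]
    return median(lower), median(upper)

def find_outliers(values):
    """Return (outliers, (low_fence, high_fence)) using the 1.5 x IQR rule."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    flagged = [x for x in values if x < low_fence or x > high_fence]
    return flagged, (low_fence, high_fence)

data = [3, 5, 6, 7, 8, 9, 10, 11, 30]  # hypothetical
print(find_outliers(data))
```

Here Q1 = 5.5 and Q3 = 10.5, so IQR = 5 and the fences are -2 and 18; only 30 is flagged.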

mean

sum of the values divided by the number of values. pulled toward the tail in a skewed distribution because it is influenced by outliers and extreme values.
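A short Python sketch, using hypothetical income data and the standard library `statistics` module, showing how one extreme value pulls the mean toward the tail while the median barely moves.

```python
from statistics import mean, median

# hypothetical incomes in thousands of dollars
incomes = [32, 35, 38, 40, 41, 45]
with_outlier = incomes + [400]  # one very large income creates a long right tail

for label, data in (("original", incomes), ("with outlier", with_outlier)):
    # mean = sum of values / number of values
    print(f"{label:<13} mean={mean(data):7.2f} median={median(data):6.1f}")
```

Adding the single value 400 moves the mean from 38.5 to about 90.14, while the median only moves from 39 to 40.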

stemplots

effective for small data sets; always include a key. split stems divide each stem into two (or more) rows. back-to-back stemplot = two stemplots sharing one set of stems, used to compare two distributions.
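A minimal Python sketch of a basic stemplot, assuming non-negative integer data where the stem is every digit except the last and the leaf is the last digit. The function name `stemplot` and the data are hypothetical; empty stems are kept so gaps in the data stay visible.

```python
from collections import defaultdict

def stemplot(values):
    """Rows of a stemplot for non-negative integers.

    Stem = all digits but the last (x // 10); leaf = last digit (x % 10).
    Stems with no leaves are kept so gaps in the data stay visible.
    """
    leaves = defaultdict(list)
    for x in sorted(values):
        leaves[x // 10].append(x % 10)
    stems = range(min(leaves), max(leaves) + 1)
    return [f"{stem:>2} | {''.join(str(d) for d in leaves[stem])}" for stem in stems]

data = [23, 12, 38, 27, 15, 21, 52, 34, 23]  # hypothetical, unsorted
for row in stemplot(data):
    print(row)
print("Key: 2 | 3 means 23")
```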

5 number summary

minimum, Q1, median, Q3, maximum
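A Python sketch that computes the five-number summary of a hypothetical data set. It assumes the textbook convention that Q1 and Q3 are the medians of the lower and upper halves (overall median excluded when n is odd); other software may use a different quartile rule. It needs at least 2 values.

```python
def _median(s):
    """Median of an already-sorted, non-empty list."""
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2

def five_number_summary(values):
    """(minimum, Q1, median, Q3, maximum); assumes at least 2 values."""
    s = sorted(values)
    half = len(s) // 2
    lower, upper = s[:half], s[half + len(s) % 2:]
    return (s[0], _median(lower), _median(s), _median(upper), s[-1])

data = [12, 2, 9, 4, 15, 7, 10, 4, 8, 5]  # hypothetical
print(five_number_summary(data))
```

The sorted data is 2, 4, 4, 5, 7, 8, 9, 10, 12, 15, so the summary is (2, 4, 7.5, 10, 15); these five numbers are what a boxplot draws.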

modified boxplot

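boxplot that plots outliers (by the 1.5 × IQR rule) as separate points; the whiskers extend only to the smallest and largest values that are not outliers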

median

middle value of the ordered data set (average of the two middle values when n is even). not influenced by outliers.

range

largest value - smallest value. influenced by outliers.

IQR

Q3-Q1. not influenced by outliers.
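A Python sketch contrasting the range and the IQR on hypothetical data: replacing the largest value with an outlier changes the range a lot but leaves the IQR unchanged. Quartiles here follow the median-of-halves convention, which is an assumption; other quartile rules exist.

```python
def _median(s):
    """Median of an already-sorted, non-empty list."""
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2

def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)

def iqr(values):
    """Q3 - Q1, with quartiles taken as medians of the lower and upper halves."""
    s = sorted(values)
    half = len(s) // 2
    return _median(s[half + len(s) % 2:]) - _median(s[:half])

data = [4, 6, 7, 8, 9, 10, 12, 13]   # hypothetical
changed = data[:-1] + [60]           # replace the largest value with an outlier

for label, d in (("original", data), ("with outlier", changed)):
    print(f"{label:<13} range={value_range(d):>3} IQR={iqr(d)}")
```

The range jumps from 9 to 56, while the IQR stays at 4.5 because Q1 and Q3 do not depend on the extreme values.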

statistic

number computed from a sample

parameter

number that describes a population; usually unknown in practice and estimated by a statistic
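A small Python simulation of the statistic/parameter distinction, assuming a hypothetical population of the whole numbers 1 to 100. The population mean is a fixed parameter; each random sample gives its own sample mean, a statistic that changes from sample to sample. The seed is fixed only so the sketch is reproducible.

```python
import random
from statistics import mean

population = list(range(1, 101))   # hypothetical population: the whole numbers 1 to 100
mu = mean(population)              # parameter: describes the whole population (50.5)

rng = random.Random(0)             # fixed seed only so the sketch is reproducible
for trial in range(3):
    sample = rng.sample(population, 10)   # simple random sample of size 10
    x_bar = mean(sample)                  # statistic: computed from this sample only
    print(f"sample {trial + 1}: x-bar = {x_bar}, mu = {mu}")
```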

standard deviation

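measure of spread around the mean. variance = (standard deviation)^2. two versions: the sample standard deviation s divides the sum of squared deviations by n - 1, and the population standard deviation σ divides by n. if s = 0, there is no spread, so all observations are the same. use for symmetric data; for skewed data, the five-number summary describes spread better.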
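A Python sketch of both standard deviation formulas on a hypothetical data set, checked against the standard library's `statistics.stdev` (sample, n - 1) and `statistics.pstdev` (population, n). The helper name `std_dev` is illustrative; the sample version needs at least 2 values.

```python
import math
from statistics import pstdev, stdev

def std_dev(values, sample=True):
    """Standard deviation: divide by n - 1 for a sample (s), by n for a population (sigma)."""
    n = len(values)
    m = sum(values) / n
    squared_deviations = sum((x - m) ** 2 for x in values)
    variance = squared_deviations / (n - 1 if sample else n)
    return math.sqrt(variance)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical; mean is 5
print(std_dev(data, sample=False))  # population version: 2.0
print(std_dev(data))                # sample version, slightly larger
print(std_dev([5, 5, 5]))           # identical observations, so no spread
```

The squared deviations from the mean 5 add up to 32, so the population variance is 32 / 8 = 4 (σ = 2) and the sample variance is 32 / 7 (s ≈ 2.14).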