# AP Statistics Module 1: Exploring Univariate Data

## 27 terms · Module 1

### statistics

the science of gaining information from numerical data

### population

group of interest

### sample

subgroup of the population meant to represent the population

### individuals

objects being described by data set

### variable

characteristic of an individual

### categorical/qualitative

values are labels and categories

### numerical/quantitative

values are numerical

### distribution of a variable

tells what values the variable assumes and how often it takes these values
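As a rough illustration, a distribution can be tabulated by counting how often each value occurs. The sketch below is only one way to do this; the `distribution` helper and the `siblings` data are made up for the example.

```python
from collections import Counter

def distribution(values):
    """Return {value: (count, proportion)} for each value that occurs."""
    counts = Counter(values)
    n = len(values)
    return {v: (c, c / n) for v, c in sorted(counts.items())}

# Hypothetical data: number of siblings reported by 10 students.
siblings = [0, 1, 1, 2, 1, 0, 3, 2, 1, 1]

for value, (count, proportion) in distribution(siblings).items():
    print(f"{value}: count={count}, proportion={proportion:.1f}")
```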

### dotplot

simple plot that allows one to visualize a relatively small data set. not convenient if the data set is large.

### histogram

common graph for the distribution of one quantitative variable. each bar covers a bin (interval) of values, and the area of the bar represents the percent of observations in that bin. the choice of bin width changes how the histogram looks
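To see how bin width changes the picture, the sketch below counts the same data with three different widths and prints a text histogram. The `bin_counts` helper and the `scores` data are hypothetical, and bins with no observations are simply skipped in this simplified version.

```python
from collections import Counter

def bin_counts(values, width, start=0):
    """Count values per bin [left, left + width), keyed by each bin's left edge."""
    counts = Counter((v - start) // width for v in values)
    return {start + k * width: counts[k] for k in sorted(counts)}

# Hypothetical test scores.
scores = [52, 55, 61, 64, 67, 68, 70, 71, 73, 75, 78, 79, 82, 85, 91]

for width in (5, 10, 20):
    print(f"bin width {width}")
    for left, count in bin_counts(scores, width, start=50).items():
        print(f"  [{left}, {left + width}): {'*' * count}")
```

Narrow bins show more detail but look jagged; wide bins smooth the shape and can hide gaps or clusters.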

### bar chart

compares sizes of different items, used to display frequencies related to categorical variables

### outlier

observation that falls outside the overall pattern of a data set

### distribution

described by center, spread, shape, outliers

### center

described by mean, median, mode

### spread

how far the values extend, from smallest number to largest number. described by quartiles, range, interquartile range

### shape

symmetric, skewed to the right, skewed to the left

### outliers

any value more than 1.5 × IQR above Q3 or more than 1.5 × IQR below Q1
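The 1.5 × IQR rule can be sketched in a few lines. This example assumes quartiles are found as the medians of the lower and upper halves of the sorted data (leaving out the overall median when the count is odd); other quartile conventions exist and can give slightly different fences. The function names and data are made up for illustration.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as medians of the lower and upper halves.

    Assumption: when n is odd, the overall median is left out of both halves.
    Needs at least two values.
    """
    data = sorted(values)
    half = len(data) // 2
    return median(data[:half]), median(data[len(data) - half:])

def find_outliers(values):
    """Return the values that fall outside the 1.5 x IQR fences."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]

# Hypothetical data set with one unusually large value.
data = [10, 12, 13, 14, 15, 16, 18, 40]
print(quartiles(data))      # Q1 = 12.5, Q3 = 17, so IQR = 4.5
print(find_outliers(data))  # fences are 5.75 and 23.75, so only 40 is flagged
```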

### mean

arithmetic average of the values. pulled toward the tail in a skewed distribution. influenced by outliers.
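A small comparison shows the mean chasing the tail while the median barely moves. The salary figures below are invented for the example.

```python
from statistics import mean, median

# Hypothetical salaries in thousands of dollars.
salaries = [30, 32, 35, 36, 38, 40, 42]
with_outlier = salaries + [400]  # one very large value creates a long right tail

for label, data in (("original", salaries), ("with outlier", with_outlier)):
    print(f"{label}: mean={mean(data):.1f}, median={median(data)}")
```

Adding the single large value moves the mean from about 36.1 to about 81.6, while the median only shifts from 36 to 37.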

### stemplots

effective for small data sets. always include a key. split stems divide each stem into two or more rows to spread out crowded data. back-to-back = two stemplots sharing one set of stems, used to compare two groups
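A basic stemplot with a key can be built as sketched below. This version assumes non-negative whole numbers with the tens digit as the stem and the ones digit as the leaf; it does not handle split stems or back-to-back plots. The `stemplot` helper and the `ages` data are hypothetical.

```python
from collections import defaultdict

def stemplot(values):
    """Return the lines of a stemplot (stem = tens, leaf = ones) plus a key."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)

    lines = []
    # Walk every stem from lowest to highest so empty stems still show as gaps.
    for stem in range(min(leaves), max(leaves) + 1):
        row = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(f"{stem} | {row}".rstrip())

    smallest = min(values)
    lines.append(f"Key: {smallest // 10} | {smallest % 10} means {smallest}")
    return lines

# Hypothetical ages of 9 people.
ages = [42, 47, 51, 53, 53, 58, 60, 75, 79]
print("\n".join(stemplot(ages)))
```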

### 5 number summary

min, q1, median, q3, max

shows outliers
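The five-number summary can be computed as sketched here. Quartiles are assumed to be the medians of the lower and upper halves of the sorted data, with the overall median left out when the count is odd; software using a different quartile convention may report slightly different Q1 and Q3. The function name and data are made up for the example.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Assumption: Q1 and Q3 are medians of the lower and upper halves,
    excluding the overall median when n is odd. Needs at least two values.
    """
    data = sorted(values)
    half = len(data) // 2
    q1 = median(data[:half])
    q3 = median(data[len(data) - half:])
    return data[0], q1, median(data), q3, data[-1]

# Hypothetical data set of 9 values.
values = [3, 7, 8, 5, 12, 14, 21, 13, 18]
print(five_number_summary(values))
```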

### median

middle value of data set. not influenced by outliers

### range

largest value - smallest value. influenced by outliers.

### IQR

Q3-Q1. not influenced by outliers.
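The difference in resistance between the range and the IQR shows up when one extreme value is swapped in. This sketch assumes quartiles are the medians of the lower and upper halves of the sorted data; the helper names and both data sets are invented for illustration.

```python
from statistics import median

def data_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)

def iqr(values):
    """Q3 - Q1, with quartiles taken as medians of the lower and upper halves."""
    data = sorted(values)
    half = len(data) // 2
    q1 = median(data[:half])
    q3 = median(data[len(data) - half:])
    return q3 - q1

# Two hypothetical data sets that differ only in the largest value.
clean = [10, 12, 13, 14, 15, 16, 18, 20]
extreme = [10, 12, 13, 14, 15, 16, 18, 90]

for label, data in (("clean", clean), ("extreme", extreme)):
    print(f"{label}: range={data_range(data)}, IQR={iqr(data)}")
```

The range jumps from 10 to 80, while the IQR stays at 4.5 for both data sets.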

### statistic

number computed from a sample

### parameter

number that describes a population (usually unknown, and estimated by a statistic)

### standard deviation

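measure of spread around the mean. variance = (standard deviation)^2. two versions: population standard deviation (σ, divide by n) and sample standard deviation (s, divide by n - 1). if s = 0, there is no spread, so all observations are the same. use for roughly symmetric data; for skewed data or data with outliers, the IQR is a better measure of spread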
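Python's standard `statistics` module provides both versions: `pstdev` and `pvariance` divide by n (population), while `stdev` and `variance` divide by n - 1 (sample). The data below are made up for the example.

```python
from statistics import pstdev, pvariance, stdev, variance

# Hypothetical data set with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Population versions divide the sum of squared deviations by n.
print("population:", pstdev(data), pvariance(data))

# Sample versions divide by n - 1, so they come out slightly larger.
print("sample:", stdev(data), variance(data))

# When every observation is the same, there is no spread at all.
constant = [5, 5, 5, 5]
print("constant data:", stdev(constant))
```

Here the squared deviations add up to 32, so the population variance is 32 / 8 = 4 and the sample variance is 32 / 7.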