Terms in this set (125)
Standardizing
We ________ to eliminate units
Standardized Value
Value found by subtracting the mean and dividing by the standard deviation.
Shifting
Adding a constant to each data value adds the same constant to the mean, the median, and the quartiles, but does not change the standard deviation or IQR.
Rescaling
Multiplying each data value by a constant multiplies both the measures of position (mean, median, quartiles) and the measures of spread (standard deviation, IQR) by that constant.
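A minimal sketch of shifting and rescaling, assuming a small made-up data set and a positive multiplier. The helper name `summarize` is hypothetical and exists only for this illustration.

```python
import statistics

def summarize(values):
    """Return (mean, median, standard deviation) for a list of numbers."""
    return (
        statistics.mean(values),
        statistics.median(values),
        statistics.stdev(values),  # sample SD (divides by n - 1)
    )

data = [2.0, 4.0, 4.0, 6.0, 9.0]   # made-up values for illustration
shifted = [y + 10 for y in data]   # shifting: add a constant to every value
rescaled = [y * 3 for y in data]   # rescaling: multiply every value by a constant

mean0, median0, sd0 = summarize(data)
mean_s, median_s, sd_s = summarize(shifted)
mean_r, median_r, sd_r = summarize(rescaled)

print(mean0, median0, sd0)
print(mean_s, median_s, sd_s)   # center moves by 10, SD unchanged
print(mean_r, median_r, sd_r)   # center and SD are both multiplied by 3
```

Comparing the three printed rows shows that shifting moves the center but leaves the spread alone, while rescaling changes both.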
Normal Model
A useful family of models for unimodal, symmetric distributions.
Parameter
A numerically valued attribute of a model.
Statistic
A value calculated from data to summarize aspects of the data.
Z-score
Tells how many standard deviations a value is from the mean.
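As a sketch of standardizing, the function below subtracts the mean and divides by the standard deviation. The data values and the name `z_scores` are assumptions made for this example, and the sample standard deviation (dividing by n - 1) is used.

```python
import statistics

def z_scores(values):
    """Standardize each value: subtract the mean, divide by the sample SD."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(y - mean) / sd for y in values]

heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]  # made-up heights
z = z_scores(heights_cm)
print(z)
```

The standardized values have mean 0 and standard deviation 1, and they carry no units, so the same list would result if the heights were recorded in inches.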
Boxplot
Displays the 5-number summary as a central box with whiskers that extend to the non-outlying data values.
Far Outlier
If a point is more than 3.0 IQR from either end of the box in a boxplot.
Comparing Distributions
Consider: shape, center, spread
Comparing Boxplots
Compare shapes; compare medians; compare IQRs; check for outliers.
Timeplot
Displays data that change over time.
Standard Deviation
The square root of the variance.
Variance
The sum of squared deviations from the mean, divided by the count minus 1.
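The two definitions above can be sketched directly in code. The function names and the data are made up for illustration.

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by (count - 1)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(y - mean) ** 2 for y in values]
    return sum(squared_deviations) / (n - 1)

def sample_sd(values):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(sample_variance(values))

data = [4.0, 8.0, 6.0, 5.0, 7.0]  # made-up values; the mean is 6
print(sample_variance(data))  # 2.5
print(sample_sd(data))
```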
Resistant
A calculated summary is said to be ________ if outliers have only a small effect on it.
Mean
Found by summing all the data values and dividing by the count.
5 Number Summary
Reports the min., Q1, the median, Q3 and the max.
Percentile
The ith percentile is the number that falls above i% of the data.
Interquartile Range (IQR)
The difference between the 1st and 3rd quartiles. IQR = Q3 - Q1
Range
The difference between the lowest and highest values in a data set. Range = Max - Min
Median
The middle value of the ordered data. If the count is even, take the average of the 2 middle values.
Outliers
Extreme values that don't appear to belong with the rest of the data. Any point more than 1.5 IQR from either end of the box in a Boxplot.
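A rough sketch of the 5-number summary and the 1.5 IQR outlier rule. It assumes one common quartile convention (Q1 and Q3 are the medians of the lower and upper halves, leaving out the middle value when the count is odd); other conventions give slightly different quartiles. The function names and data are hypothetical.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max); needs at least 2 values."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]              # values below the median position
    upper = ordered[half + (n % 2):]    # values above the median position
    return (
        ordered[0],
        statistics.median(lower),
        statistics.median(ordered),
        statistics.median(upper),
        ordered[-1],
    )

def outliers(values):
    """Points more than 1.5 IQRs beyond either end of the box."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [y for y in values if y < low_fence or y > high_fence]

data = [1, 3, 4, 5, 5, 6, 7, 8, 30]   # made-up values
print(five_number_summary(data))      # (1, 3.5, 5, 7.5, 30)
print(outliers(data))                 # [30]
```

Replacing 1.5 with 3.0 in `outliers` would flag only the far outliers.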
Skewed
Distribution is _________ if it's not symmetric and 1 tail stretches out farther than the other.
Tails
The parts that typically trail off on either side.
Symmetric
2 Halves on either side of the center look approximately like mirror images of each other.
Uniform
A distribution that's roughly flat.
Unimodal
1 mode
Bimodal
2 modes
Multimodal
More than 2 modes
Mode
A hump or local high point in the shape of the distribution of a var.
Spread
A numerical summary of how tightly the values are clustered around the center. Measures: IQR, Standard Dev.
Center
The place in the distribution of a variable that you'd point to if you wanted to attempt the impossible by summarizing the entire distribution with a single #. Measures: Mean, Median
Shape
To describe the _____ of a distribution, look for: single vs. mult. modes; symmetry vs skewness; outliers and gaps.
Dotplot
Graphs a dot for each case against a single axis.
Stem and Leaf Display
Shows quantitative data values in a way that sketches the distribution of the data.
Gap
A region of the distribution where there are no values.
Histogram
Uses adjacent bars to show the distribution of a quantitative var.
Frequency Table (Relative Frequency Table)
Lists the categories in a categorical variable and gives the count or percentage of observations for each category.
Distribution
The _____________ of a variable gives: the possible values of the variable; the relative frequency of each value.
Area Principle
In a statistical display, each data value should be represented by the same amount of area.
Bar Chart
Shows a bar whose area represents the count (or percentage) of observations for each category of a categorical variable.
Pie Chart
Show how a "whole" divides into categories by showing a wedge of a circle whose area corresponds to the proportion in each category.
Contingency Table
Displays counts and, sometimes, percentages of individuals falling into named categories on 2 or more var.
Marginal Distribution
In a contingency table, the distribution of either var. alone.
Conditional Distribution
The distribution of a var. restricting the who to consider only a smaller group of individuals.
Independence
Variables are ________ if the conditional distribution of one variable is the same for each category of the other.
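One way to check this definition on a contingency table is to compute the conditional distribution within each row and compare them. The table below is hypothetical, as is the function name `conditional_distributions`.

```python
def conditional_distributions(table):
    """For each row category, return the proportions across column categories."""
    result = {}
    for row_label, counts in table.items():
        total = sum(counts.values())
        result[row_label] = {col: count / total for col, count in counts.items()}
    return result

# Hypothetical counts: rows are class year, columns are preferred snack.
table = {
    "junior": {"fruit": 30, "chips": 20},
    "senior": {"fruit": 60, "chips": 40},
}
dists = conditional_distributions(table)
print(dists["junior"])
print(dists["senior"])
```

Here both rows split 60% to 40%, so the snack distribution does not depend on class year. With real data the conditional distributions rarely match exactly, so small differences need to be judged in context.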
Segmented Bar Chart
Displays the conditional distribution of a categorical var. within each category of another var.
Simpson's paradox
When averages are taken across different groups, they can appear to contradict the overall averages.
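A small made-up example can make the paradox concrete. In the sketch below, option B has the higher success rate within each group, yet option A has the higher rate overall, because the two options were tried in very different mixes of groups. All names and counts are invented for illustration.

```python
# (successes, attempts) for two hypothetical options in two groups
results = {
    "A": {"day": (90, 100), "night": (10, 20)},
    "B": {"day": (19, 20), "night": (75, 100)},
}

def group_rate(option, group):
    """Success rate for one option within one group."""
    successes, attempts = results[option][group]
    return successes / attempts

def overall_rate(option):
    """Success rate for one option with the groups pooled together."""
    total_successes = sum(s for s, _ in results[option].values())
    total_attempts = sum(n for _, n in results[option].values())
    return total_successes / total_attempts

for group in ("day", "night"):
    print(group, group_rate("A", group), group_rate("B", group))
print("overall", overall_rate("A"), overall_rate("B"))
```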
Context
Tells who was measured, what was measured, how the data were collected, where the data were collected, and when and why the study was performed.
Data
Systematically recorded info., whether #'s or labels, together with its context.
Data Table
An arrangement of data in which each row represents a case and each column represents a variable.
Case
Individual about whom or which we have data.
Population
All the cases we wish we knew about.
Sample
The cases we actually examine in seeking to understand the much larger population.
Variable
Holds info about the same characteristic for many cases.
Units
A quantity or amount adopted as a standard of measurement, such as dollars, hours, or grams.
Categorical Variable
A variable that names categories (words/numbers)
Quantitative Variable
A variable in which the numbers act as numerical values; it always has units.
Random Phenomenon
If we know what outcomes could happen, but not which particular values will happen.
Trial
A single attempt or realization of a random phenomenon.
Outcome
The value measured, observed, or reported for an individual instance of that trial.
Event
A collection of outcomes.
Sample Space
The collection of all possible outcome values.
Law of Large Numbers
States that the long-run relative frequency of repeated independent events gets closer and closer to the true relative frequency as the number of trials increases.
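A simulation sketch of this idea, assuming independent trials that each succeed with probability 0.5 (like a fair coin). The function name, the seed, and the number of trials are arbitrary choices for the example.

```python
import random

def running_relative_frequency(num_trials, probability=0.5, seed=0):
    """Relative frequency of successes after each of num_trials independent trials."""
    rng = random.Random(seed)   # seeded so the run is repeatable
    successes = 0
    frequencies = []
    for trial in range(1, num_trials + 1):
        if rng.random() < probability:
            successes += 1
        frequencies.append(successes / trial)
    return frequencies

freqs = running_relative_frequency(100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, freqs[n - 1])
```

Early relative frequencies can wander well away from 0.5; the later ones should settle close to it. The law describes long-run behavior only and says nothing about short runs "evening out".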
Independence
If one event occurs, it does not change the probability that the other event occurs.
Empirical Probability
When the probability comes from the long-run relative frequency of the event's occurrence.
Theoretical Probability
When the probability comes from a model.
Personal Probability
When the probability is subjective and represents your personal degree of belief.
Observational Study
A study based on data in which no manipulation of factors has been employed.
Retrospective Study
An observational study in which subjects are selected and then their previous conditions or behaviors are determined.
Prospective Study
An observational study in which subjects are followed to observe future outcomes.
Experiment
Manipulates factor levels to create treatments. Randomly assigns subjects to these treatment levels. Compares the responses of the subject groups across treatment levels.
Factor
A variable whose levels are manipulated by the experimenter.
Response
A variable whose values are compared across different treatments.
Experimental Units
Individuals on whom an experiment is performed.
Level
The specific values that the experimenter chooses for a factor.
Treatment
The process, intervention, or other controlled circumstance applied to randomly assigned experimental units.
Principles of Experimental Design
Control; Randomize; Replicate; Block
Control Group
The experimental units assigned to a baseline treatment level.
Placebo Effect
The tendency of many human subjects to show a response even when administered a placebo.
Blinding
Any individual associated with an experiment who is not aware of how subjects have been allocated to treatment groups is said to be blinded.
Placebo
A treatment known to have no effect.
Confounding
Levels of one factor are associated with the levels of another factor in such a way that their effects cannot be separated.
Sample Survey
A study that asks questions of a sample drawn from some population in the hope of learning something about the entire population.
Bias
Any systematic failure of a sampling method.
Randomization
The best defense against bias; each individual is given a fair, random chance of selection.
Sample Size
The number of individuals in a sample. The sample size, not the fraction of the population sampled, determines how well the sample represents the population.
Census
Sample that consists of the entire population.
Population Parameter
Numerically valued attribute of a model for a population.
Representative
A sample is said to be ___________ if the stats computed from it accurately reflect the corresponding population parameters.
Simple Random Sample (SRS)
A sample in which each set of "n" elements in the population has an equal chance of selection.
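A minimal sketch of drawing an SRS from a sampling frame with Python's standard `random.sample`, which selects without replacement so that every set of n elements is equally likely. The frame of student labels and the wrapper name `simple_random_sample` are made up for illustration.

```python
import random

def simple_random_sample(sampling_frame, n, seed=None):
    """Draw n distinct individuals; every set of n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(sampling_frame, n)

# Hypothetical sampling frame of 200 students
frame = [f"student_{i:03d}" for i in range(1, 201)]
sample = simple_random_sample(frame, 10, seed=1)
print(sample)
```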
SRS
Simple Random Sample
Sampling Frame
List of individuals from whom the sample is drawn.
Sampling Variability
The natural tendency of randomly drawn samples to differ, one from another.
Stratified Random Sample
A sampling design in which the population is divided into several subpopulations, or strata, and random samples are then drawn from each stratum.
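A sketch of stratified sampling under the assumption that the strata are already known and that the same number is drawn from each. The strata, their sizes, and the function name are hypothetical.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a separate simple random sample from every stratum."""
    rng = random.Random(seed)
    return {
        name: rng.sample(members, n_per_stratum)
        for name, members in strata.items()
    }

# Hypothetical strata: students grouped by class year
strata = {
    "freshman": [f"fr_{i}" for i in range(50)],
    "sophomore": [f"so_{i}" for i in range(40)],
    "junior": [f"jr_{i}" for i in range(30)],
}
sample = stratified_sample(strata, 5, seed=2)
print(sample)
```

In practice the number drawn per stratum is often made proportional to the stratum's size; the fixed count here just keeps the example short.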
Cluster Sample
A sampling design in which entire groups are chosen at random.
Multistage Sample
Sampling schemes that combine several sampling methods.
Systematic Sample
A sample drawn by selecting individuals systematically from a sampling frame.
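A common version of this design picks a random starting point and then takes every k-th individual from the frame. The sketch below assumes that version; the function name and the frame are invented for the example.

```python
import random

def systematic_sample(sampling_frame, step, seed=None):
    """Pick a random start among the first `step` entries, then take every step-th one."""
    rng = random.Random(seed)
    start = rng.randrange(step)   # random start keeps the selection fair
    return sampling_frame[start::step]

frame = list(range(1, 101))       # hypothetical frame: individuals numbered 1 to 100
chosen = systematic_sample(frame, 10, seed=3)
print(chosen)
```

This behaves like a random sample only if the order of the frame is unrelated to what is being measured.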
Pilot
A small trial run of a survey to check whether questions are clear.
Voluntary Response Bias
Bias introduced to a sample when individuals can choose on their own whether to participate in the sample.
Convenience Sample
Consists of the individuals who are conveniently available to sample.
Undercoverage
A sampling scheme that biases the sample in a way that gives a part of the population less representation than it has in the population.
Nonresponse Bias
Bias introduced when a large fraction of those sampled fails to respond.
Response Bias
Anything in a survey design that influences response.
Random
If we know the possible values it can have, but not which particular value it takes.
Simulation
Models a real-world situation by using random-digit outcomes to mimic the uncertainty of a response variable of interest.
Simulation Component
A component uses equally likely random digits to model simple random occurrences whose outcomes may not be equally likely.
Trial (Chapter 11)
The sequence of several components representing events that we are pretending will take place.
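The simulation vocabulary above (component, trial, response variable) can be sketched as follows. The scenario is made up: each component succeeds 30% of the time, modeled by treating the equally likely digits 0, 1, and 2 as a success, and a trial is a sequence of 5 components. All names and numbers are assumptions for illustration.

```python
import random

def run_component(rng):
    """One component: draw a random digit 0-9; digits 0-2 count as a success (30%)."""
    digit = rng.randrange(10)
    return digit <= 2

def run_trial(rng, components_per_trial=5):
    """One trial: a sequence of components; the response is the number of successes."""
    return sum(run_component(rng) for _ in range(components_per_trial))

def simulate(num_trials=10_000, seed=0):
    """Run many trials and return the average response."""
    rng = random.Random(seed)
    responses = [run_trial(rng) for _ in range(num_trials)]
    return sum(responses) / num_trials

average_successes = simulate()
print(average_successes)   # should land near 5 * 0.3 = 1.5
```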
Re-expression
We _______ data by taking the logarithm, the square root, the reciprocal, or applying some other mathematical operation to all values of a variable.
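As a rough illustration, the sketch below takes the base-10 logarithm of some made-up, strongly right-skewed values and compares the gap between mean and median before and after. A large gap suggests skewness; a gap near zero is consistent with symmetry. The variable names and data are hypothetical.

```python
import math
import statistics

incomes = [1.0, 10.0, 100.0, 1000.0, 10000.0]   # made-up, strongly right-skewed
log_incomes = [math.log10(y) for y in incomes]  # re-express every value

gap_before = statistics.mean(incomes) - statistics.median(incomes)
gap_after = statistics.mean(log_incomes) - statistics.median(log_incomes)

print(gap_before)   # mean far above the median: skewed to the right
print(gap_after)    # close to zero: roughly symmetric after re-expression
```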
Ladder of Powers
Places in order the effects that many re-expressions have on the data.
Correlation Coefficient
A numerical measure of the direction and strength of a linear association.
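One standard way to compute the correlation is to standardize both variables, multiply the paired z-scores, add the products, and divide by n - 1. The sketch below follows that recipe; the function name and the data are made up.

```python
import statistics

def correlation(xs, ys):
    """Sum of products of paired z-scores, divided by (n - 1)."""
    n = len(xs)
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sd_x, sd_y = statistics.stdev(xs), statistics.stdev(ys)
    z_products = [
        ((x - mean_x) / sd_x) * ((y - mean_y) / sd_y)
        for x, y in zip(xs, ys)
    ]
    return sum(z_products) / (n - 1)

xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # made-up paired data
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
r = correlation(xs, ys)
print(r)
```

Because it is built from z-scores, the result has no units and does not change when either variable is shifted or rescaled by a positive constant.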
Scatterplot
Shows relationship between two quantitative variables measured on the same cases.
Lurking Variable
A variable other than x and y that simultaneously affects both variables, accounting for the correlation between the two.
Model
An equation or formula that simplifies and represents reality.
Linear Model
An equation of a line. To interpret a linear model, we need to know the variables and their units.
Predicted Value
The value of y^ found for a given x-value in the data. It is found by substituting the x-value into the regression equation.
Residuals
The differences between data values and the corresponding values predicted by the regression model: observed value minus predicted value (e = y - y^).
Least Squares
Specifies the unique line that minimizes the variance of the residuals or, equivalently, the sum of the squared residuals.
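A sketch of fitting the least squares line with the usual closed-form slope and intercept, then computing residuals as observed minus predicted. The symbols b0 and b1 (intercept and slope), the function names, and the data are assumptions made for this example.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1) for the line y^ = b0 + b1 * x that minimizes squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx                # slope
    b0 = mean_y - b1 * mean_x     # intercept: the line passes through (mean_x, mean_y)
    return b0, b1

def residuals(xs, ys, b0, b1):
    """Observed value minus predicted value for each case."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

def sum_squared_residuals(xs, ys, b0, b1):
    return sum(e ** 2 for e in residuals(xs, ys, b0, b1))

xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # made-up paired data
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
b0, b1 = least_squares_line(xs, ys)
print(b0, b1)                     # approximately 2.2 and 0.6
print(residuals(xs, ys, b0, b1))
```

Nudging `b0` or `b1` away from the fitted values and recomputing `sum_squared_residuals` gives a larger total, which is what "least squares" means.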
Regression to the mean
Because the correlation is less than 1.0 in magnitude unless the association is perfectly linear, each predicted y^ tends to be fewer standard deviations from its mean than its corresponding x was from its mean.
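In standardized units the least squares prediction takes a compact form, which makes the pull toward the mean visible. The notation below (z_x for the standardized x-value, \hat{z}_y for the predicted standardized y-value, r for the correlation) is introduced here for illustration.

```latex
\hat{z}_y = r \, z_x,
\qquad |r| \le 1 \;\Longrightarrow\; |\hat{z}_y| \le |z_x|
```

For example, with r = 0.5, a case 2 standard deviations above the mean in x is predicted to be only 1 standard deviation above the mean in y.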
Intercept
The intercept, b0 (b with a subscript zero), gives a starting value in y-units. It's the y^-value when x = 0.
Extrapolation
Although linear models provide an easy way to predict values of y for a given value of x, it is unsafe to predict for values of x far from the ones used to find the linear model equation.
Leverage
Data points whose x-values are far from the mean of x are said to exert _____________ on a linear model.
Influential Point
If omitting a point from the data results in a very different regression model.
Disjoint (Mutually Exclusive)
2 events share no outcomes in common.
