Terms in this set (125)
We ________ to eliminate units
Value found by subtracting the mean and dividing by the standard deviation.
Adding a constant to each data value adds the same constant to the mean, the median, and the quartiles, but does not change the standard deviation or IQR.
Multiplying each data value by a constant multiplies both the measures of position and the measures of spread by that constant.
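A minimal Python sketch of shifting and rescaling, assuming a small made-up data set; the helper names `shift` and `rescale` are illustrative and not part of the set.

```python
import statistics

def shift(values, c):
    """Add the constant c to every data value."""
    return [v + c for v in values]

def rescale(values, k):
    """Multiply every data value by the constant k."""
    return [v * k for v in values]

data = [2.0, 4.0, 4.0, 5.0, 10.0]  # made-up values for illustration
shifted = shift(data, 3.0)
scaled = rescale(data, 2.0)

# Shifting moves the mean but leaves the standard deviation alone.
print(statistics.mean(data), statistics.mean(shifted))
print(statistics.stdev(data), statistics.stdev(shifted))
# Rescaling multiplies both the mean and the standard deviation.
print(statistics.mean(scaled), statistics.stdev(scaled))
```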
A useful family of models for unimodal, symmetric distributions.
A numerically valued attribute of a model.
A value calculated from data to summarize aspects of the data.
Tells how many standard deviations a value is from the mean.
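As a rough illustration, a z-score can be computed by subtracting the mean and dividing by the standard deviation. The sketch below assumes the sample standard deviation (n - 1 divisor) and uses made-up scores; `z_scores` is a hypothetical helper name.

```python
import statistics

def z_scores(values):
    """Return how many standard deviations each value is from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, n - 1 divisor (an assumption)
    return [(v - mean) / sd for v in values]

scores = [60.0, 70.0, 80.0, 90.0, 100.0]  # made-up values
z = z_scores(scores)
print(z)
```

Standardized values carry no units, and they always have mean 0 and standard deviation 1.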
Displays the 5-number summary as a central box with whiskers that extend to the non-outlying data values.
If a point is more than 3.0 IQR from either end of the box in a boxplot.
Consider: shape, center, spread
Compare Shapes; Compare Medians; Compare IQRs; Check for outliers
Displays data that change over time.
Square root of the variance.
The sum of squared deviations from the mean, divided by the count minus 1.
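The two definitions above can be sketched directly in Python. The function names and the data are made up for illustration.

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by (count - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def sample_sd(values):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(sample_variance(values))

data = [2.0, 4.0, 4.0, 5.0, 10.0]  # made-up values; the mean is 5
print(sample_variance(data), sample_sd(data))
```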
A calculated summary is said to be ________ if outliers have only a small effect on it.
Found by summing all the data values and dividing by the count.
5 Number Summary
Reports the min., Q1, the median, Q3 and the max.
The # that falls above i% of the data.
Interquartile Range (IQR)
The difference between the 1st and 3rd Quartiles.
The difference between the lowest and highest values in a data set. Range = Max - Min
Middle value; if the count is even, take the average of the 2 middle #'s.
Extreme values that don't appear to belong with the rest of the data. Any point more than 1.5 IQR from either end of the box in a Boxplot.
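A hedged sketch of the 5-number summary and the 1.5 IQR rule. Textbooks and software compute quartiles in slightly different ways; this version assumes the "inclusive" method of Python's `statistics.quantiles`, and the data are made up.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max)."""
    data = sorted(values)
    q1, med, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, med, q3, data[-1]

def outliers(values):
    """Values more than 1.5 IQR beyond either end of the box."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]

data = [1, 2, 3, 4, 5, 6, 7, 8, 50]  # made-up values
print(five_number_summary(data))
print(outliers(data))
```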
Distribution is _________ if it's not symmetric and 1 tail stretches out farther than the other.
The parts that typically trail off on either side.
2 Halves on either side of the center look approximately like mirror images of each other.
A distribution that's roughly flat.
More than 2 modes
A hump or local high point in the shape of the distribution of a var.
A numerical summary of how tightly the values are clustered around the center. Measures: IQR, Standard Dev.
The place in the distribution of a variable that you'd point to if you wanted to attempt the impossible by summarizing the entire distribution with a single #. Measures: Mean, Median
To describe the _____ of a distribution, look for: single vs. mult. modes; symmetry vs skewness; outliers and gaps.
Graphs a dot for each case against a single axis.
Stem and Leaf Display
Shows quantitative data values in a way that sketches the distribution of the data.
A region of the distribution where there are no values.
Uses adjacent bars to show the distribution of a quantitative var.
Frequency Table (Relative Frequency Table)
Lists the categories in a categorical var. and gives the count or percentage of observations for each category.
The _____________ of a var. gives: possible values of the variable; the relative frequency of each value.
In a statistical display, each data value should be represented by the same amount of area.
Shows a bar whose area represents the count (or percentage) of observations for each category of a categorical variable.
Show how a "whole" divides into categories by showing a wedge of a circle whose area corresponds to the proportion in each category.
Displays counts and, sometimes, percentages of individuals falling into named categories on 2 or more var.
In a contingency table, the distribution of either var. alone.
The distribution of a var. restricting the who to consider only a smaller group of individuals.
Variables are ________ if the conditional distribution of one variable is the same for each category of the other.
Segmented Bar Chart
Displays the conditional distribution of a categorical var. within each category of another var.
When averages are taken across different groups, they can appear to contradict the overall averages.
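To see how group averages can contradict the overall average, here is a small sketch with illustrative counts (treated as hypothetical, not taken from the set). Treatment A has the higher success rate within each group, yet B has the higher rate overall because the group sizes are so unbalanced.

```python
# (successes, total) for each treatment within each group; illustrative counts
groups = {
    "small": {"A": (81, 87), "B": (234, 270)},
    "large": {"A": (192, 263), "B": (55, 80)},
}

def group_rate(group, treatment):
    """Success rate for one treatment within one group."""
    successes, total = groups[group][treatment]
    return successes / total

def overall_rate(treatment):
    """Success rate for one treatment, pooling all groups."""
    successes = sum(groups[g][treatment][0] for g in groups)
    total = sum(groups[g][treatment][1] for g in groups)
    return successes / total

for g in groups:
    print(g, round(group_rate(g, "A"), 3), round(group_rate(g, "B"), 3))
print("overall", round(overall_rate("A"), 3), round(overall_rate("B"), 3))
```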
Tells who was measured, what was measured, how the data were collected, where the data was collected, and when and why the study was performed.
Systematically recorded info., whether #'s or labels, together with its context.
An arrangement of data in which each row represents a case and each column represents a variable.
Individual about whom or which we have data.
All the cases we wish we knew about.
The cases we actually examine in seeking to understand the much larger population.
Holds info about the same characteristic for many cases.
A quantity or amount adopted as a standard of measurement, such as dollars, hours, or grams.
A variable that names categories (words/numbers)
A variable in which the numbers act as numerical values - always have units.
If we know what outcomes could happen, but not which particular values will happen.
A single attempt or realization of a random phenomenon.
The value measured, observed, or reported for an individual instance of that trial.
A collection of outcomes.
The collection of all possible outcome values.
Law of Large Numbers
States that the long-run relative frequency of repeated independent events gets closer and closer to the true relative frequency as the number of trials increases.
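A quick simulation sketch of this idea, assuming a fair coin (p = 0.5) and a fixed seed for reproducibility; `running_proportion` is a hypothetical helper name.

```python
import random

def running_proportion(n_trials, p=0.5, seed=0):
    """Relative frequency of successes after each of n_trials independent trials."""
    rng = random.Random(seed)
    successes = 0
    proportions = []
    for i in range(1, n_trials + 1):
        if rng.random() < p:
            successes += 1
        proportions.append(successes / i)
    return proportions

props = running_proportion(100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, props[n - 1])
```

The printed proportions should wander early on and settle near 0.5 as the number of trials grows.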
If one event occurs, it does not change the probability that the other event occurs.
The probability comes from the long-run relative frequency of the event's occurrence.
When the probability comes from a model.
When the probability is subjective and represents your personal degree of belief.
A study based on data in which no manipulation of factors has been employed.
An observational study in which subjects are selected and then their previous conditions or behaviors are determined.
An observational study in which subjects are followed to observe future outcomes.
Manipulates factor levels to create treatments. Randomly assigns subjects to these treatment levels. Compares the responses of the subject groups across treatment levels.
A variable whose levels are manipulated by the experimenter.
A variable whose values are compared across different treatments.
Individuals on whom an experiment is performed.
The specific values that the experimenter chooses for a factor.
The process, intervention, or other controlled circumstance applied to randomly assigned experimental units.
Principles of Experimental Design
Control; Randomize; Replicate; Block
The experimental units assigned to a baseline treatment level.
The tendency of many human subjects to show a response even when administered a placebo.
Any individual associated with an experiment who is not aware of how subjects have been allocated to treatment groups.
A treatment known to have no effect.
Levels of one factor are associated with the levels of another factor in such a way that their effects cannot be separated.
A study that asks questions of a sample drawn from some population in the hope of learning something about the entire population.
Any systematic failure of a sampling method.
The best defense against bias; each individual is given a fair, random chance of selection.
The number of individuals in a sample; it determines how well the sample represents the population.
Sample that consists of the entire population.
Numerically valued attribute of a model for a population.
A sample is said to be ___________ if the stats computed from it accurately reflect the corresponding population parameters.
Simple Random Sample (SRS)
A sample in which each set of "n" elements in the population has an equal chance of selection.
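One way to draw an SRS in Python is `random.sample`, which selects without replacement so that every set of n elements is equally likely. The list of IDs below is a hypothetical stand-in for a real list of individuals.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct individuals; every set of n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

frame = list(range(1, 101))  # hypothetical list of individuals: IDs 1..100
srs = simple_random_sample(frame, 10, seed=1)
print(srs)
```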
Simple Random Sample
List of individuals from whom the sample is drawn.
The natural tendency of randomly drawn samples to differ, one from another.
Stratified Random Sample
A sampling design in which the population is divided into several subpopulations, or strata, and random samples are then drawn from each stratum.
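A minimal sketch of stratified sampling, assuming the same number is drawn from every stratum (real designs often sample in proportion to stratum size). The strata names and members are made up.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a separate random sample from each stratum."""
    rng = random.Random(seed)
    return {
        name: rng.sample(members, n_per_stratum)
        for name, members in strata.items()
    }

# Hypothetical strata: two class years with made-up member IDs
strata = {
    "freshmen": [f"F{i}" for i in range(40)],
    "seniors": [f"S{i}" for i in range(25)],
}
picked = stratified_sample(strata, 5, seed=2)
print(picked)
```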
A sampling design in which entire groups are chosen at random.
Sampling schemes that combine several sampling methods.
A sample drawn by selecting individuals systematically from a sampling frame.
A small trial run of a survey to check whether questions are clear.
Voluntary Response Bias
Bias introduced to a sample when individuals can choose on their own whether to participate in the sample.
Consists of the individuals who are conveniently available to sample.
A sampling scheme that biases the sample in a way that gives a part of the population less representation.
Bias introduced when a large fraction of those sampled fails to respond.
Anything in a survey design that influences response.
If we know the possible values it can have, but not which particular value it takes.
Models a real-world situation by using random-digit outcomes to mimic the uncertainty of a response variable of interest.
A component uses equally likely random digits to model simple random occurrences whose outcomes may not be equally likely.
Trial (Chapter 11)
The sequence of several components representing events that we are pretending will take place.
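A sketch of how components and trials fit together in a simulation. The scenario is an assumption made up for illustration: each component is one random digit, digits 0-2 count as a success (probability 0.3), and a trial repeats components until the first success.

```python
import random

def component(rng):
    """One component: draw a digit 0-9; digits 0, 1, 2 count as a success."""
    return rng.randrange(10) <= 2

def trial(rng):
    """One trial: repeat components until the first success.

    The response is the number of components that were needed.
    """
    count = 1
    while not component(rng):
        count += 1
    return count

def simulate(n_trials, seed=0):
    """Run many trials and return the average response."""
    rng = random.Random(seed)
    responses = [trial(rng) for _ in range(n_trials)]
    return sum(responses) / n_trials

print(simulate(20_000))
```

With a success probability of 0.3, the average response should land near 1 / 0.3, about 3.3.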
We _______ data by taking the logarithm, the square root, the reciprocal, or some other mathematical operation on all values of a variable.
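A small sketch of applying one mathematical operation to every value of a variable. The function name, the option names, and the data are assumptions for illustration, and the values must be positive for the logarithm and reciprocal.

```python
import math

def re_express(values, how="log"):
    """Apply the same operation to every value of a variable."""
    if how == "log":
        return [math.log10(v) for v in values]
    if how == "sqrt":
        return [math.sqrt(v) for v in values]
    if how == "reciprocal":
        return [1.0 / v for v in values]
    raise ValueError(f"unknown re-expression: {how}")

skewed = [1.0, 10.0, 100.0, 1000.0]  # made-up, strongly right-skewed values
print(re_express(skewed, "log"))
print(re_express(skewed, "sqrt"))
```

The logarithm pulls in the long right tail, so the re-expressed values are evenly spaced.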
Ladder of Powers
Places in order the effects that many re-expressions have on the data.
Numerical measure of the direction and strength of a linear association.
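One common way to compute this measure is to average the products of z-scores, dividing by n - 1. The sketch below assumes that formula and uses made-up data; `correlation` is an illustrative name.

```python
import statistics

def correlation(xs, ys):
    """Sum of products of z-scores, divided by (n - 1)."""
    n = len(xs)
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sd_x, sd_y = statistics.stdev(xs), statistics.stdev(ys)
    total = sum(
        ((x - mean_x) / sd_x) * ((y - mean_y) / sd_y)
        for x, y in zip(xs, ys)
    )
    return total / (n - 1)

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
print(correlation(xs, [2.0, 4.0, 6.0, 8.0, 10.0]))  # perfectly linear, increasing
print(correlation(xs, [10.0, 8.0, 6.0, 4.0, 2.0]))  # perfectly linear, decreasing
```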
Shows relationship between two quantitative variables measured on the same cases.
A variable other than x and y that simultaneously affects both variables, accounting for the correlation between the two.
An equation or formula that simplifies and represents reality.
An equation of a line. To interpret a linear model, we need to know the variables and their units.
The value of y^ found for a given x-value in the data. This is found by substituting the x-value into the regression equation.
Difference between data values and the corresponding values predicted by the regression model. Observed Value minus predicted value (e= y-y^)
Specifies the unique line that minimizes the variance of the residuals or, equivalently, the sum of the squared residuals.
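A minimal sketch of fitting that line and computing residuals, assuming the standard formulas for the slope and intercept of a line y^ = b0 + b1 x. The data are made up and the function names are illustrative.

```python
def least_squares(xs, ys):
    """Return (b0, b1) for the line y^ = b0 + b1 * x that minimizes squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept: the line passes through (x_bar, y_bar)
    return b0, b1

def residuals(xs, ys, b0, b1):
    """Observed value minus predicted value: e = y - y^."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

xs = [1.0, 2.0, 3.0, 4.0]  # made-up values
ys = [2.0, 4.0, 5.0, 8.0]
b0, b1 = least_squares(xs, ys)
print(b0, b1)
print(residuals(xs, ys, b0, b1))
```

The residuals from a least squares line always sum to zero, up to rounding.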
Regression to the mean
Because the correlation is always less than 1.0 in magnitude, each predicted y^ tends to be fewer standard deviations from its mean than its corresponding x was from its mean.
The intercept, b0, gives a starting value in y-units. It's the y^-value when x = 0.
Although linear models provide an easy way to predict values of y for a given value of x, it is unsafe to predict for values of x far from the ones used to find the linear model equation.
Data points whose x-values are far from the mean of x are said to exert _____________ on a linear model.
If omitting a point from the data results in a very different regression model.
2 events share no outcomes in common.