52 terms

Symmetric

data in which both sides of the distribution have roughly the same shape and size, as in a "bell curve".

Parameter

value of a population (typically unknown)

Statistic

a value calculated from a sample, used to estimate a population parameter.

Median

the middle point of the data (50th percentile) when the data is in numerical order.

Variability

the spread of the data; knowing it allows statisticians to distinguish between usual and unusual occurrences.

Standard Deviation

measures the typical or average deviation of observations from the mean
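
As a rough illustration of the definition above, the sample standard deviation can be computed step by step. The data values are hypothetical, and dividing by n - 1 (the sample version) is an assumption; a population standard deviation would divide by n instead.

```python
from math import sqrt
from statistics import stdev

# Hypothetical observations, made up for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
mean = sum(data) / n

# Squared deviation of each observation from the mean.
squared_devs = [(x - mean) ** 2 for x in data]

# Sample standard deviation: divide by n - 1, then take the square root.
sample_sd = sqrt(sum(squared_devs) / (n - 1))

# The standard library gives the same value.
library_sd = stdev(data)
print(mean, sample_sd, library_sd)
```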

Skewed Right

the tail stretches to the right, so the mean is usually a larger value than the median.

Z-score/T-score

is a standardized score. This tells you how many standard deviations from the mean an observation is.
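
A minimal sketch of the standardized score, assuming the mean and standard deviation are already known. The function name `z_score` and the numbers are hypothetical.

```python
def z_score(x, mu, sigma):
    """Return how many standard deviations x lies from the mean mu."""
    return (x - mu) / sigma

# Hypothetical test with mean 100 and standard deviation 15.
z_high = z_score(130, mu=100, sigma=15)  # two standard deviations above the mean
z_low = z_score(85, mu=100, sigma=15)    # one standard deviation below the mean
print(z_high, z_low)
```

A positive score means the observation is above the mean, a negative score means it is below.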

Normal Model

is a bell shaped and symmetrical curve.

As σ increases the curve flattens.

As σ decreases the curve narrows and gets taller.

Mutually Exclusive

A and B have no intersection. They cannot happen at the same time.

Independent

two events are independent if knowing that one occurred does not change the probability of the other.

Law of Large Numbers

as an experiment is repeated the experimental probability gets closer and closer to the true (theoretical) probability.
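
The idea can be sketched with a simulated fair coin. The function name, the checkpoints, and the fixed seed are all assumptions chosen only to make the example repeatable.

```python
import random

def running_proportion(n_flips, p=0.5, seed=0):
    """Flip a simulated coin n_flips times and record the proportion
    of heads at a few checkpoints along the way."""
    rng = random.Random(seed)
    checkpoints = (10, 100, 1_000, 10_000, 100_000)
    heads = 0
    proportions = {}
    for i in range(1, n_flips + 1):
        if rng.random() < p:
            heads += 1
        if i in checkpoints:
            proportions[i] = heads / i
    return proportions

props = running_proportion(100_000)
for n, prop in props.items():
    print(n, prop)
```

The early proportions can wander noticeably; the later ones should sit close to the theoretical value of 0.5.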

Correlation Coefficient (r)

is a quantitative assessment of the strength and direction of a linear relationship. It always lies between -1 and 1.
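
One way to compute r by hand is sketched below. The function name `correlation` and the study-hours data are hypothetical.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: hours studied and exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]
r = correlation(hours, scores)
print(r)
```

Values near 1 or -1 indicate a strong linear relationship; values near 0 indicate a weak one.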

Least Squares Regression Line (LSRL)

is the line of mathematical best fit. It minimizes the sum of the squared deviations (residuals) from the line. Used with bivariate data.

Residual (error)

is the vertical distance of a point from the LSRL: the difference between the observed and predicted value (observed minus predicted). For the LSRL the residuals add to zero.

Coefficient of Determination (r-squared)

gives the proportion of variation in y (response) that is explained by the linear relationship between x and y.
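
The LSRL, residual, and r-squared entries can be tied together in one small sketch. The helper name `fit_lsrl` and the data are hypothetical, and the slope and intercept use the standard least-squares formulas.

```python
def fit_lsrl(xs, ys):
    """Return (intercept, slope) of the least squares regression line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data: hours studied (x) and exam score (y).
xs = [1, 2, 3, 4, 5]
ys = [52, 60, 61, 70, 77]

a, b = fit_lsrl(xs, ys)
predicted = [a + b * x for x in xs]

# Residual = observed minus predicted.
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]

# r-squared = 1 - (unexplained variation / total variation in y).
y_bar = sum(ys) / len(ys)
ss_res = sum(e ** 2 for e in residuals)
ss_tot = sum((y - y_bar) ** 2 for y in ys)
r_squared = 1 - ss_res / ss_tot
print(a, b, sum(residuals), r_squared)
```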

Extrapolation

using the LSRL to predict values outside the range of the original data. Such predictions are unreliable, so the LSRL should not be used this way.

Influential Points

are points that if removed significantly change the LSRL.

Census

a complete count of the population. Disadvantages: can be inaccurate, expensive, and often impossible to do.

Simple Random Sample

chosen so that each unit has an equal chance of being selected and every set of n units has an equal chance of being the sample.

Stratified Sampling

divide the population into homogeneous groups (strata), then take an SRS from every group. [Observational studies]

Cluster Sampling

divide the population into heterogeneous groups (clusters), often based on location. Take an SRS of the groups, then sample ALL members/things in each selected group.
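
The three sampling methods above can be contrasted in a short sketch. The population of 200 students, the field names `school` and `grade`, the function names, and the fixed seed are all hypothetical.

```python
import random

rng = random.Random(1)  # fixed seed so the example is repeatable

# Hypothetical population: 200 students in 5 schools, grades 9 to 12.
population = [
    {"id": i, "school": i % 5, "grade": 9 + (i // 5) % 4}
    for i in range(200)
]

def simple_random_sample(units, n):
    """Every set of n units is equally likely to be chosen."""
    return rng.sample(units, n)

def stratified_sample(units, key, n_per_stratum):
    """Group units by key (strata), then take an SRS from EVERY group."""
    strata = {}
    for u in units:
        strata.setdefault(u[key], []).append(u)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(units, key, n_clusters):
    """Randomly pick some groups (clusters), then take ALL of their members."""
    clusters = sorted({u[key] for u in units})
    chosen = set(rng.sample(clusters, n_clusters))
    return [u for u in units if u[key] in chosen]

srs = simple_random_sample(population, 20)
strat = stratified_sample(population, "grade", 5)
clus = cluster_sample(population, "school", 2)
print(len(srs), len(strat), len(clus))
```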

Bias

systematically favors a certain outcome. It has to do with the center of the sampling distribution: if the distribution is centered over the true parameter, the statistic is considered unbiased.

Voluntary Response Bias

people choose themselves to participate.

Convenience Sampling

ask people who are easy to reach, friendly, or comfortable to ask.

Undercoverage

some group(s) are left out of the selection process.

Nonresponse Bias

someone cannot be contacted or does not want to participate.

Control Group

a group used as a baseline for comparison when judging the effectiveness of the factor. It does NOT have to receive a placebo.

Single Blind

a method used so that the subjects are unaware of the treatment (who gets a placebo or the real treatment).

Double Blind

neither the subjects nor the evaluators know which treatment is being given.

Replication

A MUST for EVERY experimental design. Uses many subjects to quantify the natural variation in the response.

Completely Randomized Design

all units are randomly allocated among the treatments. [Experiment]

Randomized Block

units are separated into blocks based on a KNOWN factor, then treatments are randomly assigned within each block. Reduces variation.
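
The difference between the two designs above can be sketched as follows. The subjects, the age blocks, the treatment names, and the function names are hypothetical.

```python
import random

rng = random.Random(7)  # fixed seed so the example is repeatable

def completely_randomized(units, treatments):
    """Shuffle all units, then deal them out to the treatments in turn."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    return {u: treatments[i % len(treatments)] for i, u in enumerate(shuffled)}

def randomized_block(units, block_of, treatments):
    """Split units into blocks on a known factor, then randomize
    the treatments separately inside each block."""
    blocks = {}
    for u in units:
        blocks.setdefault(block_of(u), []).append(u)
    assignment = {}
    for members in blocks.values():
        assignment.update(completely_randomized(members, treatments))
    return assignment

# Hypothetical subjects, blocked by a known factor (age group).
subjects = [f"S{i}" for i in range(12)]
age_group = {s: ("under40" if i < 6 else "40plus") for i, s in enumerate(subjects)}

crd = completely_randomized(subjects, ["drug", "placebo"])
rbd = randomized_block(subjects, age_group.get, ["drug", "placebo"])
print(crd)
print(rbd)
```

In the blocked design each age group is guaranteed an even split between treatments, which the completely randomized design does not guarantee.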

Matched-Pair Design

Once one member of a pair receives a certain treatment, the other member automatically receives the second treatment.

OR individuals do both treatments in random order (before/after or pretest/post-test)

Assignment is dependent

Confounding Variables

are variables whose effect on the response cannot be separated from the effect of the factor being tested. This happens in observational studies; random assignment to treatments guards against it by balancing such variables across groups on average.

Randomization

reduces bias by spreading extraneous variables across all groups in the experiment. MUST have in EVERY experiment.

Binomial Probability

Each trial has two outcomes; trials are independent; the probability of success (p) is the same for every trial; and most importantly, the number of trials is fixed!

Geometric Probability

two mutually exclusive outcomes, each trial is independent, probability (p) of success is the same for all trials. Counts the trials until the first success (NOT a fixed number of trials).
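
The two settings above lead to two different probability formulas, sketched here. The function names are hypothetical; the formulas are the standard binomial and geometric probability formulas.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in a FIXED number n of independent trials)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def geometric_pmf(k, p):
    """P(the FIRST success happens on trial k); k is not fixed in advance."""
    return (1 - p) ** (k - 1) * p

# Exactly 3 heads in 5 flips of a fair coin.
p_three_heads = binomial_pmf(3, 5, 0.5)

# First six appears on the third roll of a fair die.
p_first_six_on_third = geometric_pmf(3, 1 / 6)
print(p_three_heads, p_first_six_on_third)
```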

Sampling Distribution

is the distribution of all possible values of a statistic taken from all possible samples of the same size. When it is approximately normal, use normalcdf to calculate probabilities.

Central Limit Theorem

when n is sufficiently large (rule of thumb: n > 30) the sampling distribution of the sample mean is approximately normal even if the population distribution is not normal.
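
A simulation can make this concrete. The sketch below assumes a strongly right-skewed population (exponential with mean 1 and standard deviation 1), a sample size of 40, and 2000 repeated samples; all of these choices, and the fixed seed, are hypothetical.

```python
import random
from statistics import mean, stdev

rng = random.Random(42)  # fixed seed so the example is repeatable

n = 40              # size of each sample
num_samples = 2000  # number of repeated samples

# Draw many samples from a skewed population and keep each sample mean.
sample_means = [
    mean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

center = mean(sample_means)   # should be near the population mean, 1
spread = stdev(sample_means)  # should be near sigma / sqrt(n)
expected_spread = 1 / n ** 0.5
print(center, spread, expected_spread)
```

A histogram of `sample_means` would look roughly bell shaped even though the population itself is skewed.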

Lurking Variable

is a variable that is not included as an explanatory or response variable in the analysis but can affect the interpretation of relationships between variables. It can falsely identify a strong relationship between variables or it can hide the true relationship.

Simulation

is a way to model random events, such that simulated outcomes closely match real-world outcomes

Placebo effect

A remarkable phenomenon in which a fake treatment can sometimes improve a patient's condition simply because the person expects it to be helpful.

Histogram

A graphical display that represents a frequency distribution by means of rectangles whose widths represent class intervals or "bins" and whose heights represent the frequencies.

Interquartile Range

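the difference between the third and first quartiles (Q3 - Q1), covering the middle 50% of the data. It is a measure of spread; a numerical description of a distribution requires both a measure of center and a measure of spread.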
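
As a sketch of the interquartile range computed as Q3 - Q1, the example below uses the standard library's `statistics.quantiles` with its default method. Quartile conventions differ between textbooks, calculators, and software, so other tools may give slightly different quartiles for the same data. The data values are hypothetical.

```python
from statistics import median, quantiles

# Hypothetical data, already in numerical order.
data = [1, 2, 3, 4, 5, 6, 7]

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = quantiles(data, n=4)

iqr = q3 - q1  # spread of the middle 50% of the data
print(q1, q2, q3, iqr)
```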

"X bar"

Sample mean

68-95-99.7 rule

percentages of data (about 68%, 95%, and 99.7%) within 1, 2, and 3 standard deviations of the mean in a normally distributed dataset.
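
The three percentages can be checked against the standard normal curve using `statistics.NormalDist` from the Python standard library. The helper name `within` is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(within(k), 4))
```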

Explanatory variable

Helps explain or influence change in a response variable

Response variable

Measures an outcome of a study

Treatment

A specific condition applied to the individuals in an experiment.

"p hat"

Sample proportion, used to estimate the unknown population proportion (p).

unbiased estimator

A statistic used to estimate a parameter is unbiased if the mean of its sampling distribution is equal to the value of the parameter being estimated.
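
For a tiny population the whole sampling distribution can be listed, which makes the definition easy to check. The population values below are hypothetical, and samples are taken without replacement. The sample mean comes out unbiased for the population mean, while the sample maximum is a biased estimator of the population maximum.

```python
from itertools import combinations
from statistics import mean

# Hypothetical population of four values.
population = [2, 4, 6, 8]

# Every possible sample of size 2, drawn without replacement.
samples = list(combinations(population, 2))

# Sampling distributions of two statistics.
sample_means = [mean(s) for s in samples]
sample_maxes = [max(s) for s in samples]

# Unbiased: the mean of the sample means equals the population mean.
mean_of_means = mean(sample_means)

# Biased: on average the sample maximum falls below the population maximum.
mean_of_maxes = mean(sample_maxes)
print(mean_of_means, mean(population), mean_of_maxes, max(population))
```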