## 44 terms · test 2

### Bivariate data

two measurements on a single individual during a study (response variable, explanatory variable)

### Direction of relationship

positive if Y tends to increase as X increases

negative if Y tends to decrease as X increases

### Correlation coefficient

denoted by r, a number that gives the direction and strength of a linear relationship between two quantitative variables

### properties for r

both variables must be quantitative

sign of r denotes direction

r is between -1 and +1

no unit of measure

is affected by outliers
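The properties above can be checked with a small sketch. The data values and the `correlation` helper are made up for illustration; r is computed as the average product of z-scores with an n - 1 divisor, which is one common textbook form.

```python
import math

def correlation(xs, ys):
    """Sample correlation r: sum of z-score products divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data, chosen only to illustrate the properties
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation(xs, ys)                              # positive: y tends to rise with x
r_rescaled = correlation([100 * x for x in xs], ys)  # changing units of x leaves r unchanged
r_outlier = correlation(xs + [20], ys + [0])         # one outlier flips the sign here

print(f"r = {r:.3f}, rescaled = {r_rescaled:.3f}, with outlier = {r_outlier:.3f}")
```

Changing the units of x leaves r the same (no unit of measure), while a single outlying point is enough to change its sign in this example.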

### Statistical model

an equation that fits the pattern between a response variable and explanatory variable, accounting for deviations in the model

### residuals

prediction errors

residual = y - (y-hat) = observed y - predicted y

vertical distance from observed y to the line
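As a quick illustration, residuals can be computed point by point. The data and the line y-hat = 2.2 + 0.6x below are hypothetical, picked only so the arithmetic is easy to follow.

```python
# Hypothetical data and a hypothetical fitted line y-hat = 2.2 + 0.6x
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

def predict(x):
    """Predicted value y-hat from the assumed line."""
    return 2.2 + 0.6 * x

# residual = observed y minus predicted y (signed vertical distance to the line)
residuals = [y - predict(x) for x, y in zip(xs, ys)]

for x, y, e in zip(xs, ys, residuals):
    print(f"x={x}  observed y={y}  y-hat={predict(x):.1f}  residual={e:+.1f}")
```

A positive residual means the point sits above the line; a negative one means it sits below.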

### facts about regression line

a change of one standard deviation in x corresponds to a change of r standard deviations in the predicted y...

regression line passes through the point (x-bar, y-bar)
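Both facts can be verified numerically. This sketch assumes the usual least-squares formulas slope = r * s_y / s_x and intercept = y-bar - slope * x-bar; the data are hypothetical.

```python
import math

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n
s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)

# Least-squares line written in terms of r and the standard deviations
slope = r * s_y / s_x
intercept = y_bar - slope * x_bar

def predict(x):
    return intercept + slope * x

# Moving x up by one standard deviation moves y-hat by r standard deviations of y
step = predict(x_bar + s_x) - predict(x_bar)

print(f"y-hat = {intercept:.2f} + {slope:.2f}x")
print(f"change in y-hat for one s_x step: {step:.3f}  (r * s_y = {r * s_y:.3f})")
print(f"y-hat at x-bar: {predict(x_bar):.2f}  (y-bar = {y_bar:.2f})")
```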

### r^2

it tells us the percentage of variation in Y that is explained by the least-squares regression line...

or.... it is a measure of how successfully the regression explains the response y
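One way to see "variation explained" is to compare the variation left over after fitting the line with the total variation in y. The sketch below uses hypothetical data and the standard sums of squares; the variable names are assumptions for illustration.

```python
import math

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)

# Least-squares line
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

sst = s_yy                                                        # total variation in y
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
r_squared = 1 - sse / sst                                         # fraction explained by the line

r = s_xy / math.sqrt(s_xx * s_yy)
print(f"r^2 from sums of squares = {r_squared:.2f}, r squared directly = {r * r:.2f}")
```

Here about 60% of the variation in y is explained by the line, and the value matches the square of r.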

### residual plot diagnostics

smile or frown shape: means there is a non-linear relationship

megaphone: indicates non-constant variation (variation in y depends on x)

shoe-box: a point outside indicates an outlier in the x or y direction

### influential observation

an observation that if removed would change the regression line slope and y-intercept noticeably

-outliers in the x direction are often influential

-influential observations may have small residuals

-not all outliers are influential observations
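A rough way to check influence is to fit the line with and without the suspect point. The data and the `fit_line` helper here are hypothetical; the extra point (20, 0) is an outlier in the x direction.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar

# Hypothetical data; the last point is far from the others in the x direction
xs = [1, 2, 3, 4, 5, 20]
ys = [2, 4, 5, 4, 5, 0]

slope_with, intercept_with = fit_line(xs, ys)
slope_without, intercept_without = fit_line(xs[:-1], ys[:-1])

# Residuals from the line fitted to ALL points
residuals = [y - (intercept_with + slope_with * x) for x, y in zip(xs, ys)]

print(f"slope with the point:    {slope_with:+.3f}")
print(f"slope without the point: {slope_without:+.3f}")
print(f"residual of the x-outlier: {residuals[-1]:+.3f}")
print(f"largest other residual:    {max(abs(e) for e in residuals[:-1]):.3f}")
```

Removing the point changes the slope from negative to positive, yet its own residual is small because it pulls the line toward itself.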

### drawbacks of observational studies

-cannot systematically change x to observe y

-cannot randomize

-cannot establish causation; only correlation or association

### Simpson's paradox

demonstrates that a great deal of care has to be taken when combining small data sets into a large one. Sometimes conclusions from the large data set are exactly the opposite of the conclusions from the smaller sets. Unfortunately, the conclusions from the large set are also usually wrong.
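A small made-up example shows how the reversal can happen. The counts below are invented for illustration (they are not from any study): option A has the higher success rate within each group, but B looks better once the groups are pooled.

```python
# Invented counts: (successes, trials) for options A and B in two groups
groups = {
    "group 1": {"A": (9, 10),   "B": (80, 100)},
    "group 2": {"A": (30, 100), "B": (2, 10)},
}

def rate(successes, trials):
    return successes / trials

# Compare A and B inside each small data set
a_wins_each_group = all(rate(*g["A"]) > rate(*g["B"]) for g in groups.values())

# Pool the small data sets into one large one
pooled = {
    option: (sum(g[option][0] for g in groups.values()),
             sum(g[option][1] for g in groups.values()))
    for option in ("A", "B")
}
a_wins_pooled = rate(*pooled["A"]) > rate(*pooled["B"])

for name, g in groups.items():
    print(f"{name}: A = {rate(*g['A']):.2f}, B = {rate(*g['B']):.2f}")
print(f"pooled:  A = {rate(*pooled['A']):.2f}, B = {rate(*pooled['B']):.2f}")
```

The reversal comes from the unequal group sizes: most of A's trials fall in the group where both options do poorly.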

### rules of data analysis

-always plot your data

-always describe shape, center, spread of distributions

### random phenomenon

the outcome of one play is unpredictable, but the outcome of many plays forms a distribution and then we can make a prediction

### probability of an outcome

the proportion of times an outcome occurs over a very large number of repetitions (plays)

### probability rules

-every probability is between 0 and 1

-the sum of the probabilities of all possible outcomes must equal 1

-the probability that an event will NOT occur = 1 - the probability that it will occur
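The three rules can be written as simple checks. The helper names `is_valid_model` and `complement` are hypothetical, and a fair six-sided die is assumed as the example model; `Fraction` is used so the sums are exact.

```python
from fractions import Fraction

def is_valid_model(probabilities):
    """Each probability is between 0 and 1, and together they sum to 1."""
    values = list(probabilities.values())
    return all(0 <= p <= 1 for p in values) and sum(values) == 1

def complement(p):
    """Probability that an event does NOT occur."""
    return 1 - p

# Assumed example: a fair six-sided die
die = {face: Fraction(1, 6) for face in range(1, 7)}

print(is_valid_model(die))          # the die model satisfies both rules
print(complement(die[6]))           # probability of NOT rolling a 6
print(is_valid_model({"a": Fraction(1, 2), "b": Fraction(3, 4)}))  # sums to more than 1
```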

### disjoint events

two events that have no outcomes in common and, thus, cannot both occur simultaneously.

### Law of large numbers

If the population has a finite mean mu and x-bar is used to estimate mu,

Then....

as sample size increases, x-bar gets closer to mu
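A simulation gives a feel for this. The sketch assumes rolls of a fair six-sided die (mu = 3.5) and a fixed random seed so the run is repeatable; the exact values printed depend on the seed.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable
mu = 3.5        # mean of a fair six-sided die

rolls = [random.randint(1, 6) for _ in range(100_000)]

# x-bar computed from the first n rolls, for growing n
running_means = {n: sum(rolls[:n]) / n for n in (10, 100, 1_000, 10_000, 100_000)}

for n, x_bar in running_means.items():
    print(f"n = {n:>7}: x-bar = {x_bar:.4f}, distance from mu = {abs(x_bar - mu):.4f}")
```

The distance from mu does not have to shrink at every step, but it tends toward zero as n grows.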

### sampling distribution of x-bar

the distribution of all x-bar values from all possible samples of the same size from a population

-mean of x-bar = mu

-standard deviation of x-bar = pop standard deviation / square root of n

-standard deviation of x-bar is always less than pop standard deviation when n>1
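For a tiny population every possible sample can be listed, so these facts can be checked exactly. This sketch assumes a population of the six faces of a die and samples of size 2 drawn independently (with replacement); both choices are illustrative.

```python
import itertools
import math
import statistics

population = [1, 2, 3, 4, 5, 6]   # assumed population: faces of a die
n = 2                             # sample size

mu = statistics.mean(population)
sigma = statistics.pstdev(population)   # population standard deviation

# x-bar for every possible sample of size n (independent draws)
sample_means = [statistics.mean(sample)
                for sample in itertools.product(population, repeat=n)]

mean_of_x_bar = statistics.mean(sample_means)
sd_of_x_bar = statistics.pstdev(sample_means)

print(f"mean of x-bar = {mean_of_x_bar:.4f}   (mu = {mu:.4f})")
print(f"sd of x-bar   = {sd_of_x_bar:.4f}   (sigma / sqrt(n) = {sigma / math.sqrt(n):.4f})")
```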

### Central limit theorem

if you take a large SRS of size n from any population, the shape of the sampling distribution of x-bar gets more normal as n increases
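A simulation can make the idea concrete. This sketch assumes a strongly right-skewed population (exponential with mean 1) and uses skewness as a rough stand-in for "how far from normal"; a normal distribution has skewness 0. The helper names and the seed are arbitrary choices.

```python
import random
import statistics

def skewness(values):
    """Average cubed z-score; 0 for a symmetric distribution."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def sample_mean_skewness(n, reps=5000, seed=1):
    """Skewness of x-bar over many simulated samples of size n."""
    rng = random.Random(seed)
    means = [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]
    return skewness(means)

skew_by_n = {n: sample_mean_skewness(n) for n in (1, 5, 30)}

for n, skew in skew_by_n.items():
    print(f"n = {n:>2}: skewness of x-bar is about {skew:.2f}")
```

The skewness of x-bar shrinks toward 0 as n grows, even though the population itself stays skewed.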

### out-of-control signals

-one point above the upper control limit or below the lower control limit

-9 points in a row on the same side of the centerline
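The two signals can be checked mechanically. In this sketch the function name, the signal labels, and the sample values are all hypothetical, and the centerline and control limits are taken as given inputs rather than computed.

```python
def out_of_control_signals(points, center, lower, upper, run_length=9):
    """Return (index, signal) pairs for the two out-of-control rules."""
    signals = []
    run_side = 0    # +1 above the centerline, -1 below, 0 exactly on it
    run_count = 0
    for i, value in enumerate(points):
        # Rule 1: a point outside the control limits
        if value > upper or value < lower:
            signals.append((i, "outside limits"))
        # Rule 2: run_length points in a row on the same side of the centerline
        side = (value > center) - (value < center)
        if side != 0 and side == run_side:
            run_count += 1
        else:
            run_side = side
            run_count = 1 if side != 0 else 0
        if run_count >= run_length:
            signals.append((i, "run"))
    return signals

# Hypothetical sample means with centerline 10 and control limits 9 and 11
points = [10.2, 9.8, 10.1, 10.3, 10.1, 10.2, 10.4, 10.1, 10.3, 10.2, 10.1, 11.5]
signals = out_of_control_signals(points, center=10, lower=9, upper=11)
print(signals)
```

In this example the run of nine points above the centerline is flagged at index 10, and the last point is flagged for falling above the upper limit.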