Bivariate data

two measurements on a single individual during a study (response variable, explanatory variable)

Direction of relationship

positive if Y tends to increase as X increases

negative if Y tends to decrease as X increases

Explanatory variable

denoted as X; it is used to predict or explain Y

Response variable

measures outcome on each individual, denoted as Y

Form of relationship

strength

direction

form (linear or non-linear)

Correlation coefficient

denoted by r, a number that gives the direction and strength of the linear relationship between two quantitative variables

properties for r

both variables must be quantitative

sign of r denotes direction

r is between -1 and +1

no unit of measure

is affected by outliers
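
The properties above can be checked with a small Python sketch. It assumes the usual textbook formula for r (the sum of the products of standardized x and y values, divided by n - 1); the function name `correlation` and the data values are made up for illustration.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation r: products of standardized x and y, summed, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data, invented for this example.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

r = correlation(xs, ys)

# No unit of measure: rescaling x (for example, multiplying by 100) leaves r unchanged.
r_rescaled = correlation([100 * x for x in xs], ys)

# Affected by outliers: one unusual point can change r a lot.
r_outlier = correlation(xs + [6], ys + [0.0])

print(round(r, 3), round(r_rescaled, 3), round(r_outlier, 3))
```

The sign of each printed value gives the direction, and the value always stays between -1 and +1.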

Statistical model

an equation that fits the pattern between a response variable and explanatory variable, accounting for deviations in the model

statistical equation

y-hat = a + bx

a=intercept

b=slope

residuals

prediction errors

y-(y-hat)=prediction error

vertical distance from observed y to the line

least-squares regression line

the line for which the sum of squared errors (SSE) is as small as possible

SSE = sum of (y - y-hat)^2

a=

(y-bar) - b(x-bar)

b=

r(s_y/s_x), where s_y and s_x are the standard deviations of y and x
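
As a rough illustration, the formulas for a and b can be computed directly in Python. This is a minimal sketch: the data values are invented, and r is computed from standardized values with divisor n - 1 (an assumption about which formula for r is intended).

```python
from statistics import mean, stdev

# Hypothetical data, invented for this example.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar, y_bar = mean(xs), mean(ys)
s_x, s_y = stdev(xs), stdev(ys)

# Correlation r from standardized values (divisor n - 1).
r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
        for x, y in zip(xs, ys)) / (n - 1)

b = r * (s_y / s_x)      # slope
a = y_bar - b * x_bar    # intercept

def predict(x):
    """y-hat = a + bx"""
    return a + b * x

residuals = [y - predict(x) for x, y in zip(xs, ys)]   # y - (y-hat)
sse = sum(e ** 2 for e in residuals)                   # sum of squared errors

print(f"y-hat = {a:.2f} + {b:.2f}x, SSE = {sse:.2f}")
```

Any other choice of a and b would give a larger SSE for the same data.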

correlation

measures direction and strength of the linear association between X and Y

regression line

models the linear relationship and can be used to make predictions for y values

facts about regression line

a change of one standard deviation in x corresponds to a change of r standard deviations in the predicted y

regression line passes through point (Xbar, Ybar)

r^2

it tells us the percentage of the variation in y that is explained by the least-squares regression line

or, put another way, it is a measure of how successfully the regression explains the response y
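
One way to see r^2 as "variation explained" is to compare the variation left over after fitting the line with the total variation in y. The sketch below uses invented data and computes the least-squares slope as the sum of (x - x-bar)(y - y-bar) divided by the sum of (x - x-bar)^2, which is an algebraically equivalent form of r(s_y/s_x).

```python
from statistics import mean

# Hypothetical data, invented for this example.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(xs), mean(ys)

# Least-squares slope and intercept.
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
     / sum((x - x_bar) ** 2 for x in xs))
a = y_bar - b * x_bar

sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))   # variation left unexplained
sst = sum((y - y_bar) ** 2 for y in ys)                     # total variation in y

r_squared = 1 - sse / sst
print(f"r^2 = {r_squared:.2f}")
```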

residual plot

a scatterplot of the residuals against the explanatory variable x

residual plot diagnostics

smile or frown shape: means there is a non-linear relationship

megaphone: indicates non-constant variation (the spread of the residuals depends on x)

shoe-box: a point outside the box indicates an outlier in the x or y direction

influential observation

an observation that if removed would change the regression line slope and y-intercept noticeably

-outliers in x direction are often influential

-influential observations may have small residuals

-not all outliers are influential observations

extrapolation

predicting y for an x value that is outside the range of observed x values

drawbacks of observational studies

-cannot systematically change x to observe y

-cannot randomize

-cannot establish causation; only correlation or association

to display categorical data

use a two-way table

-explanatory variable is the row

-response variable is the column

margins of a table

show the totals for each column and each row

Simpson's paradox

demonstrates that a great deal of care has to be taken when combining small data sets into a large one. Sometimes the conclusions from the large data set are exactly the opposite of the conclusions from the smaller sets. Unfortunately, the conclusions from the large set are also usually wrong, because combining the sets hides a lurking variable.
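
A small Python sketch shows how the reversal can happen. All counts are hypothetical and chosen only to produce the effect; "A" and "B" stand for any two treatments or groups being compared.

```python
# Hypothetical counts (successes, trials), invented for this example.
data = {
    "easy cases": {"A": (9, 10), "B": (80, 100)},
    "hard cases": {"A": (30, 100), "B": (2, 10)},
}

def rate(successes, trials):
    """Success rate as a proportion."""
    return successes / trials

# Within each small data set, A has the higher success rate.
for group, counts in data.items():
    print(group, {name: round(rate(*c), 2) for name, c in counts.items()})

# Combine the small data sets into one large one.
combined = {}
for name in ("A", "B"):
    successes = sum(data[g][name][0] for g in data)
    trials = sum(data[g][name][1] for g in data)
    combined[name] = rate(successes, trials)

# In the combined data, B now appears to have the higher success rate.
print("combined", {name: round(p, 2) for name, p in combined.items()})
```

The reversal occurs because A handled mostly hard cases and B mostly easy ones, so case difficulty is the lurking variable.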

rules of data analysis

-always plot your data

-always describe shape, center, and spread of distributions

measures affected by outliers

-mean

-standard deviation

-correlation

-r^2

-slope and y-intercept

random phenomenon

the outcome of one play is unpredictable, but the outcomes of many plays form a regular distribution, which lets us make predictions

probability of an outcome

the proportion of times the outcome occurs in a very long series of repeated plays

Random doesn't mean haphazard

fact!

probability =

(number of outcomes in the event of interest)/(number of outcomes in the sample space), when all outcomes are equally likely

probability rules

-every probability is between 0 and 1

-sum of all probabilities must equal 1

-the probability that an event will NOT occur = 1 - the probability that it will occur
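
These rules can be checked on a simple example. The sketch below assumes one roll of a fair six-sided die, so that all outcomes are equally likely; the function name `probability` is made up for illustration.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die (all outcomes equally likely).
sample_space = {1, 2, 3, 4, 5, 6}

def probability(event):
    """(number of outcomes in the event) / (number of outcomes in the sample space)."""
    return Fraction(len(event & sample_space), len(sample_space))

even = {2, 4, 6}
not_even = sample_space - even

p_even = probability(even)            # between 0 and 1
p_not_even = probability(not_even)    # equals 1 - p_even

# The probabilities of all individual outcomes sum to 1.
total = sum(probability({outcome}) for outcome in sample_space)

print(p_even, p_not_even, total)
```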

disjoint events

two events that have no outcomes in common and, thus, cannot both occur simultaneously.

when stating a parameter in statistics, you must say whether it is a mean or a proportion

fact 2

parameter

a number describing a characteristic of a population

statistic

a number computed from sample data, estimating an unknown parameter

Law of large numbers

If a population has a finite mean mu and x-bar is used to estimate mu,

Then....

as sample size increases, x-bar gets closer to mu
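
A quick simulation can illustrate this. The sketch below assumes rolls of a fair six-sided die (population mean mu = 3.5) and a fixed random seed so the run is repeatable; both choices are made up for this example.

```python
import random

rng = random.Random(0)   # fixed seed so the run is repeatable
mu = 3.5                 # population mean for one roll of a fair die

# One long sequence of rolls; x-bar is recomputed as the sample grows.
rolls = [rng.randint(1, 6) for _ in range(100_000)]

for n in (10, 100, 1_000, 10_000, 100_000):
    x_bar = sum(rolls[:n]) / n
    print(f"n = {n:>6}: x-bar = {x_bar:.4f}, distance from mu = {abs(x_bar - mu):.4f}")

final_error = abs(sum(rolls) / len(rolls) - mu)
```

The distance from mu tends to shrink as n grows, although it need not shrink at every single step.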

sampling distribution of x-bar

the distribution of all x-bar values from all possible samples of the same size from a population (the mean of x-bar equals mu)

-standard deviation of x-bar = (standard deviation of population)/(square root of n)

-standard deviation of x-bar is always less than the population standard deviation when n > 1
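
For a very small population, every possible sample can be listed, so these facts can be checked exactly. The sketch below assumes a hypothetical population made of the six faces of a die and samples of size 2 drawn with replacement (so the draws are independent).

```python
from itertools import product
from math import sqrt
from statistics import mean, pstdev

population = [1, 2, 3, 4, 5, 6]   # hypothetical population: faces of a fair die
n = 2                             # sample size

mu = mean(population)
sigma = pstdev(population)        # population standard deviation

# x-bar for every possible sample of size n (sampling with replacement).
x_bars = [mean(sample) for sample in product(population, repeat=n)]

mean_of_x_bar = mean(x_bars)
sd_of_x_bar = pstdev(x_bars)

print(mean_of_x_bar, mu)               # the two means agree
print(sd_of_x_bar, sigma / sqrt(n))    # the two standard deviations agree
```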

Central limit theorem

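if you take a large SRS of size n from any population with a finite standard deviation, the sampling distribution of x-bar is approximately normal, and its shape gets closer to normal as n increases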

for x-bar... z=

(x-bar - mu)/(standard deviation of x-bar)
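
A worked example of this z-score in Python, as a sketch only. All numbers (mu, sigma, n, x-bar) are invented, and the probability at the end relies on the normal approximation from the central limit theorem.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical values, invented for this example.
mu = 100        # population mean
sigma = 15      # population standard deviation
n = 36          # sample size
x_bar = 104     # observed sample mean

sd_x_bar = sigma / sqrt(n)       # standard deviation of x-bar
z = (x_bar - mu) / sd_x_bar      # subtract first, then divide

# Approximate probability of a sample mean this large or larger.
p_at_least = 1 - NormalDist().cdf(z)

print(f"z = {z:.2f}, P(x-bar >= {x_bar}) is about {p_at_least:.4f}")
```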

control limits =

mu plus or minus 3 × (standard deviation of x-bar)

center line =

mu

out-of-control signals

-one point above the upper control limit or below the lower control limit

-9 points in a row on the same side of the centerline
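
The two signals above can be sketched as a simple check over a sequence of sample means. The function names, the process values (mu = 50, sigma = 4, n = 4), and the list of x-bar values are all hypothetical.

```python
from math import sqrt

def control_limits(mu, sigma, n):
    """Lower limit, center line, upper limit: mu +/- 3 * (sigma / sqrt(n))."""
    sd_x_bar = sigma / sqrt(n)
    return mu - 3 * sd_x_bar, mu, mu + 3 * sd_x_bar

def out_of_control_signals(x_bars, mu, sigma, n, run_length=9):
    """Return (index, reason) pairs for the two out-of-control signals."""
    lower, center, upper = control_limits(mu, sigma, n)
    signals = []
    run_side, run_count = 0, 0
    for i, x_bar in enumerate(x_bars):
        if x_bar > upper or x_bar < lower:
            signals.append((i, "outside control limits"))
        side = (x_bar > center) - (x_bar < center)   # +1 above, -1 below, 0 on the line
        if side != 0 and side == run_side:
            run_count += 1
        else:
            run_side, run_count = side, (1 if side != 0 else 0)
        if run_count == run_length:
            signals.append((i, f"{run_length} in a row on one side of the center line"))
    return signals

# Hypothetical process: mu = 50, sigma = 4, samples of size n = 4.
x_bars = [51, 49, 57, 50.5, 51, 52, 50.2, 51.5, 50.8, 51.1, 50.3, 52.2, 49]
limits = control_limits(50, 4, 4)
signals = out_of_control_signals(x_bars, 50, 4, 4)
print(limits)
print(signals)
```

With these values, the third sample mean (index 2) falls above the upper limit, and the run of nine points above the center line is completed at index 10.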

random variable

a variable whose value is a numerical outcome of a random phenomenon