stats 121

test 2
Bivariate data
two measurements on a single individual during a study (response variable, explanatory variable)
Direction of relationship
positive if Y increases as X increases
negative if Y decreases as X increases
Explanatory variable
predicts the response Y, denoted as X
Response variable
measures outcome on each individual, denoted as Y
Form of relationship
linear or non-linear
Correlation coefficient
denoted by r, a number that gives the direction and strength of a linear relationship between two quantitative variables
properties for r
both variables must be quantitative
sign of r denotes direction
r is between -1 and +1
no unit of measure
is affected by outliers
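A minimal sketch of how r can be computed by hand, to check the properties above. The data values are made up for illustration, and the helper name `correlation` is hypothetical (not from the course).

```python
import math

def correlation(xs, ys):
    """Pearson correlation r for two equal-length lists of numbers."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = correlation(x, y)

# No unit of measure: rescaling x (say, meters to centimeters) leaves r unchanged.
r_rescaled = correlation([100 * v for v in x], y)

# Affected by outliers: one extra point far from the pattern changes r a lot.
r_outlier = correlation(x + [20], y + [0])

print(round(r, 3), round(r_rescaled, 3), round(r_outlier, 3))
```

With this made-up data the sign of r is positive until the single outlier is added, which is enough to flip it negative.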
Statistical model
an equation that fits the pattern between a response variable and explanatory variable, accounting for deviations in the model
statistical equation
prediction errors
y-(y-hat)=prediction error
vertical distance from observed y to the line
least-squares regression line
the line for which the sum of squared errors (SSE) is the smallest
slope = r(standard deviation of y / standard deviation of x)
r measures direction and strength of linear association between X and Y
regression line
models the linear relationship and can be used to make predictions for y values
facts about regression line
a change of one standard deviation in x corresponds to a change of r standard deviations in the predicted y
regression line passes through point (Xbar, Ybar)
r-squared tells us the percentage of variation in Y that is explained by the least-squares regression line
or, put another way, it is a measure of how successfully the regression explains the response y
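The regression facts above can be checked with a short sketch. The data is made up, and the names `predict`, `sse` and `sst` are illustrative choices, not course notation.

```python
import statistics

# Made-up data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar, y_bar = statistics.mean(x), statistics.mean(y)
s_x, s_y = statistics.stdev(x), statistics.stdev(y)  # sample standard deviations

# Correlation coefficient r.
r = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / ((len(x) - 1) * s_x * s_y)

# Least-squares line: slope = r * (s_y / s_x); the line passes through (x-bar, y-bar).
slope = r * s_y / s_x
intercept = y_bar - slope * x_bar

def predict(x_value):
    """Predicted y (y-hat) from the regression line."""
    return intercept + slope * x_value

# Prediction errors (residuals): observed y minus y-hat.
residuals = [b - predict(a) for a, b in zip(x, y)]

# r-squared as the share of variation in y explained by the line.
sse = sum(e ** 2 for e in residuals)
sst = sum((b - y_bar) ** 2 for b in y)
r_squared = 1 - sse / sst

print(slope, intercept, r_squared)
```

For a least-squares line, `r_squared` computed this way matches `r ** 2`, and the residuals sum to zero (up to rounding).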
residual plot
a scatterplot of the residuals
residual plot diagnostics
smile or frown shape: means there is a non-linear relationship
megaphone: indicates non-constant variation (variation in y depends on x)
shoe-box: a point outside indicates an outlier in the x or y direction
influential observation
an observation that if removed would change the regression line slope and y-intercept noticeably
-outliers in x direction are often influential
-influential observations may have small residuals
-not all outliers are influential observations
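A rough sketch of these three points: refit the line after adding one extra point and compare. The data and the helper `fit_line` are hypothetical, for illustration only.

```python
def fit_line(points):
    """Least-squares slope and intercept for a list of (x, y) pairs."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Made-up data for illustration only.
base = [(1, 2), (2, 4), (3, 5), (4, 4), (5, 5)]

slope_base, intercept_base = fit_line(base)

# Outlier in the x direction: it pulls the line toward itself.
slope_x_out, intercept_x_out = fit_line(base + [(20, 0)])

# Outlier in the y direction sitting at x-bar: the slope does not move.
slope_y_out, intercept_y_out = fit_line(base + [(3, 9)])

# Residual of the x-outlier under the refitted line.
residual_x_out = 0 - (intercept_x_out + slope_x_out * 20)

print(slope_base, slope_x_out, slope_y_out, residual_x_out)
```

In this example the x-outlier flips the slope from positive to negative yet keeps a fairly small residual, while the y-outlier at x-bar leaves the slope alone.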
extrapolation
predicting y for an x value that is outside the range of observed x values
drawbacks of observational studies
-cannot systematically change x to observe y
-cannot randomize
-cannot establish causation; only correlation or association
to display categorical data
use a table
-explanatory variable is the row
-response is the column
margins of a table
show the totals for each column and each row
Simpson's paradox
demonstrates that a great deal of care has to be taken when combining small data sets into a large one. Sometimes conclusions from the large data set are exactly the opposite of the conclusions from the smaller sets. Unfortunately, the conclusions from the large set are also usually wrong.
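A small sketch of how the reversal can happen. The counts below are made up for illustration (successes, trials) for two hypothetical options A and B in two groups; `combined_rate` is an illustrative helper.

```python
# Made-up counts for illustration: (successes, trials) per option within each group.
counts = {
    "group 1": {"A": (81, 87), "B": (234, 270)},
    "group 2": {"A": (192, 263), "B": (55, 80)},
}

def rate(successes, trials):
    return successes / trials

def combined_rate(option):
    """Success rate for one option after pooling both groups."""
    successes = sum(counts[g][option][0] for g in counts)
    trials = sum(counts[g][option][1] for g in counts)
    return successes / trials

for group, by_option in counts.items():
    print(group, {opt: round(rate(*c), 3) for opt, c in by_option.items()})

print("combined", {opt: round(combined_rate(opt), 3) for opt in ("A", "B")})
```

Here A has the higher success rate inside each group, but B looks better once the groups are combined, because the two options were used in very different proportions across the groups.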
rules of data analysis
-always plot your data
-always describe shape, center, spread of distributions
measures affected by outliers
-standard deviation
-slope and y-intercept
random phenomenon
the outcome of one play is unpredictable, but the outcome of many plays forms a distribution and then we can make a prediction
probability of an outcome
proportion of times an outcome occurs over many repetitions of plays (the total)
Random doesn't mean haphazard
probability =
(# of outcomes in the event of interest)/(count of outcomes in sample space)
probability rules
-each probability is between 0 and 1
-sum of all probabilities must equal 1
-the probability that an event will NOT occur = 1 - the probability that it will occur
disjoint events
two events that have no outcomes in common and, thus, cannot both occur simultaneously.
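A quick sketch of the counting formula and the rules above, assuming one roll of a fair six-sided die so that all outcomes are equally likely. The die and the helper `probability` are illustrative assumptions.

```python
from fractions import Fraction

# Assumption: a fair six-sided die, so every outcome is equally likely.
sample_space = {1, 2, 3, 4, 5, 6}

def probability(event):
    """(# of outcomes in the event) / (count of outcomes in the sample space)."""
    return Fraction(len(event & sample_space), len(sample_space))

even = {2, 4, 6}
odd = {1, 3, 5}

p_even = probability(even)

# Probability that the event does NOT occur = 1 - probability that it does.
p_not_even = 1 - p_even

# The probabilities of all single outcomes add up to 1.
total = sum(probability({outcome}) for outcome in sample_space)

# Disjoint events share no outcomes.
are_disjoint = even.isdisjoint(odd)

print(p_even, p_not_even, total, are_disjoint)
```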
you cannot state a parameter in statistics without saying mean or proportion
fact 2
parameter
a number describing a characteristic of a population
statistic
a number computed from sample data, estimating an unknown parameter
Law of large numbers
If the population has a finite mean mu and x-bar is used to estimate mu,
as sample size increases, x-bar gets closer to mu
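A simulation sketch of the idea, assuming rolls of a fair die (mu = 3.5). The seed and the sample sizes are arbitrary choices for illustration.

```python
import random

random.seed(121)  # fixed seed so the run is repeatable

def mean_of_rolls(n):
    """x-bar for n simulated rolls of a fair six-sided die."""
    rolls = [random.randint(1, 6) for _ in range(n)]
    return sum(rolls) / n

mu = 3.5  # population mean for a fair die

for n in (10, 1000, 100000):
    x_bar = mean_of_rolls(n)
    print(n, x_bar, abs(x_bar - mu))
```

Any single run can wobble, but the distance between x-bar and mu tends to shrink as n grows.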
sampling distribution of x-bar
the distribution of all x-bar values from all possible samples of the same size from a population
-mean of x-bar = mu
-standard deviation of x-bar = standard deviation of pop/square root of n
-standard deviation of x-bar is always less than pop standard deviation when n>1
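These facts can be checked by listing every possible sample from a tiny population. This sketch assumes independent draws (sampling with replacement, like rolling a die n times); the population and n = 2 are made up for illustration.

```python
import itertools
import math
import statistics

# Made-up population: the faces of a die, each equally likely.
population = [1, 2, 3, 4, 5, 6]
n = 2  # sample size

mu = statistics.mean(population)
sigma = statistics.pstdev(population)  # population standard deviation

# Every possible ordered sample of size n, drawn with replacement.
sample_means = [sum(s) / n for s in itertools.product(population, repeat=n)]

mean_of_x_bar = statistics.mean(sample_means)
sd_of_x_bar = statistics.pstdev(sample_means)

print(mean_of_x_bar, mu)
print(sd_of_x_bar, sigma / math.sqrt(n))
```

The two printed pairs agree: the x-bar values center on mu, and their spread is the population standard deviation divided by the square root of n.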
Central limit theorem
if you take a large SRS of size n from any population, the shape of the sampling distribution of x-bar gets more normal as n increases
for x-bar... Z=
(x-bar - mu)/(standard deviation of x-bar)
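A worked sketch of the Z formula. All numbers (mu, sigma, n, x-bar) are hypothetical, and the last step assumes x-bar is approximately normal, as the central limit theorem suggests for large n.

```python
import math
from statistics import NormalDist

# Hypothetical values for illustration only.
mu = 100      # population mean
sigma = 15    # population standard deviation
n = 36        # sample size
x_bar = 105   # observed sample mean

sd_x_bar = sigma / math.sqrt(n)   # standard deviation of x-bar
z = (x_bar - mu) / sd_x_bar       # standardized value of x-bar

# Probability of a sample mean at least this large, assuming x-bar is about normal.
p_above = 1 - NormalDist().cdf(z)

print(sd_x_bar, z, round(p_above, 4))
```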
control limits =
mu plus or minus 3(standard deviation of x-bar)
center line =
mu
out-of-control signals
-one point above the upper control limit or below the lower control limit
-9 points in a row on the same side of the centerline
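A rough sketch of checking a series of sample means against the two signals above. The function names and the example numbers are hypothetical; points exactly on the center line are treated as breaking a run, which is an assumption of this sketch.

```python
import math

def control_limits(mu, sigma, n):
    """Lower and upper control limits: mu plus or minus 3(sd of x-bar)."""
    sd_x_bar = sigma / math.sqrt(n)
    return mu - 3 * sd_x_bar, mu + 3 * sd_x_bar

def out_of_control_signals(sample_means, mu, sigma, n):
    """Return (index, reason) pairs for each out-of-control signal."""
    lower, upper = control_limits(mu, sigma, n)
    signals = []
    run_side = 0    # +1 above the center line, -1 below, 0 on it
    run_length = 0
    for i, m in enumerate(sample_means):
        if m < lower or m > upper:
            signals.append((i, "outside control limits"))
        side = (m > mu) - (m < mu)
        if side != 0 and side == run_side:
            run_length += 1
        else:
            run_side = side
            run_length = 1 if side != 0 else 0
        # Flags the 9th point of a run and every later point in the same run.
        if run_length >= 9:
            signals.append((i, "9 in a row on one side"))
    return signals

# Hypothetical process: mu = 10, sigma = 2, samples of size 4, so limits are 7 and 13.
print(out_of_control_signals([10.2, 9.8, 13.5], mu=10, sigma=2, n=4))
print(out_of_control_signals([10.5] * 9, mu=10, sigma=2, n=4))
```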
random variable
a variable whose value is a numerical outcome of a random phenomenon