EXAM 1
Terms in this set (47)
What we collect data from. Can be individuals, companies, animals, plants, or any object of interest
Cases
A special variable used in some data sets to distinguish the different cases
Label (Subj. ID)
Any "specific" characteristic. Varies among cases. Examples: age, height, ethnicity
Variable
Tells us what values the variable takes and how often it takes these values
Distribution
Something that takes numerical values for which arithmetic operations, such as adding and averaging, make sense (ex: age, height)
Quantitative
Something that falls into one of several categories. What can be summarized is the count or proportion of cases in each category (ex: hair color, blood type)
categorical (qualitative)
A chart that breaks the range of values of a variable into classes and displays only the count or percent of the observations that fall into each class
Histograms
A chart that plots each observation against the time at which it was measured
line graphs
A distribution is ___ if the right and left sides of the histogram are approximately mirror images of each other
Symmetric
Observations that lie outside the overall pattern of a distribution
outlier
A rise or fall that persists over time, despite irregularities
trend
A pattern that repeats itself at regular intervals of time
seasonal variation
adding all values and dividing by the number of cases. "the center of mass"
Mean
the midpoint of a distribution: the number such that half the observations are smaller and half are larger
Median
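A minimal sketch of the two measures of center defined above, using Python's standard `statistics` module. The sample of ages is made up for illustration; it includes one unusually large value to show that the mean is pulled toward it while the median is not.

```python
from statistics import mean, median

# Hypothetical sample of five ages (invented for illustration).
ages = [21, 22, 23, 24, 60]

# Mean: add all values and divide by the number of cases.
sample_mean = mean(ages)

# Median: the middle value once the observations are sorted.
sample_median = median(ages)

print("mean:", sample_mean)
print("median:", sample_median)
```

Here the single large age raises the mean well above the median, which stays at the middle observation.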
The value in the sample that has 25% of the data at or below it
first quartile, Q1
The value in the sample that has 75% of the data at or below it
Third quartile, Q3
We call an observation a ___ if it falls more than 1.5 times the size of the interquartile range (IQR) above the third quartile or below the first quartile. This is called the "____"
suspected outlier
1.5 * IQR rule for outliers
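The rule can be sketched in a few lines of Python. The function name `suspected_outliers` and the data are hypothetical. Note that software and textbooks use slightly different conventions for computing quartiles; this sketch assumes the "inclusive" method of `statistics.quantiles`, so quartiles computed by hand may differ a little.

```python
from statistics import quantiles

def suspected_outliers(data):
    """Return the values flagged by the 1.5 * IQR rule."""
    # quantiles(..., n=4) returns the three cut points Q1, median, Q3.
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr   # below Q1 by more than 1.5 * IQR
    high_fence = q3 + 1.5 * iqr  # above Q3 by more than 1.5 * IQR
    return [x for x in data if x < low_fence or x > high_fence]

# Hypothetical data with one value far from the rest.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
print(suspected_outliers(values))
```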
A mathematical model of a distribution
density curve
Because all normal distributions share the same properties, we can ___ our data to transform any normal curve into the standard normal curve
Standardize
Measures the number of standard deviations that a data value x is from the mean
z-score
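As a rough illustration, standardizing means computing z = (x - mean) / standard deviation for each value. The sketch below assumes the sample standard deviation (`statistics.stdev`); the helper name `z_scores` and the data are made up.

```python
from statistics import mean, stdev

def z_scores(data):
    """Standardize each value: how many standard deviations it is from the mean."""
    m = mean(data)
    s = stdev(data)  # sample standard deviation (divides by n - 1)
    return [(x - m) / s for x in data]

# Hypothetical heights in inches.
heights = [60, 62, 65, 68, 70]
standardized = z_scores(heights)
print(standardized)
```

After standardizing, the values have mean 0 and standard deviation 1, whatever the original units were.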
One way to assess if a distribution is indeed approximately normal is to plot the data on a ___
Normal quantile plot
When one axis is used to represent each of the variables, and the data are plotted as points on the graph
scatterplot
After plotting 2 variables on a scatterplot, we describe the relationship by examining the ___, ___, and ___ of the association
form
direction
strength
high values of one variable tend to occur together with high values of the other variable
Positive association
High values of one variable tend to occur together with low values of the other variable
Negative association
The ___ of the relationship between 2 variables can be seen by how much variation, or ___, there is around the main form
strength
scatter
When an association is more complex than linear, we can still describe the overall pattern by ___ the scatterplot
smoothing
The correlation coefficient is a measure of the ___ and ___ of a linear relationship
direction
strength
calculated using the mean and the standard deviation of both the x and y variables
correlation coefficient
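One common way to write the calculation is r = (1 / (n - 1)) * sum of (z-score of x) * (z-score of y). The sketch below follows that form; the function name `correlation` and the height and weight values are invented for illustration.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation r: average product of the standardized x and y values."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    products = (((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys))
    return sum(products) / (n - 1)

# Hypothetical data: heights in inches and weights in pounds.
heights_in = [60, 62, 65, 68, 70]
weights_lb = [110, 120, 140, 155, 170]

# Same heights converted to centimeters.
heights_cm = [h * 2.54 for h in heights_in]

r_inches = correlation(heights_in, weights_lb)
r_cm = correlation(heights_cm, weights_lb)
print(r_inches, r_cm)
```

Because both variables are standardized first, changing the units of height does not change r.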
Correlation can only be used to describe ___ variables
quantitative
Allows us to compare correlation between data sets where variables are measured in different units or when variables are different
standardization
A ___ measures or records an outcome of a study. A ___ explains changes in the response variable
response variable
explanatory variable
A ___ is a straight line that describes how a response variable y changes as an explanatory variable x changes
regression line
The use of a regression line for predictions outside the range of x values used to obtain the line
Extrapolation
The square of the correlation coefficient. Represents the percentage of the variance in y that can be explained by changes in x
r², coefficient of determination
The vertical distances from each point to the least-squares regression line that give us potentially useful information about the contribution of individual data points to the overall pattern of scatter
residuals
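The regression cards above can be tied together in a short sketch: fit a least-squares line, compute the residuals (observed y minus predicted y), and get r² as the share of the variation in y that the line accounts for. The helper name `least_squares` and the data are hypothetical.

```python
from statistics import mean

def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx  # the line passes through (mean x, mean y)
    return slope, intercept

# Hypothetical explanatory (x) and response (y) values.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = least_squares(xs, ys)

# Residual = observed y minus the y predicted by the line.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# r^2 = 1 - (unexplained variation) / (total variation in y).
sse = sum(e ** 2 for e in residuals)
sst = sum((y - mean(ys)) ** 2 for y in ys)
r_squared = 1 - sse / sst

print(slope, intercept, r_squared)
```

The residuals from a least-squares line always sum to zero (up to rounding), which is why a residual plot is centered on the horizontal line at 0.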
we plot residuals in a ___
residual plot
observations that markedly change the regression if removed
influential individuals
A variable not included in the study design that does have an effect on the variables studied
lurking variable
2 variables are ___ when their effects on a response variable cannot be distinguished from each other
Confounded
An experiment has a ___, or block, design if 2 categorical factors are studied with several levels of each factor
two-way
We can look at each categorical variable separately in a 2-way table by studying the row totals and column totals. They represent the ___, expressed in counts or %
Marginal distribution
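A small sketch of reading marginal distributions off a two-way table. The table of counts (hair color by blood type) is invented for illustration, and the dictionary layout is just one convenient assumption.

```python
# Hypothetical two-way table of counts: rows are hair color, columns are blood type.
table = {
    "brown hair": {"type O": 30, "type A": 20},
    "blond hair": {"type O": 15, "type A": 35},
}

# Row totals give the marginal distribution of the row variable.
row_totals = {row: sum(cols.values()) for row, cols in table.items()}

# Column totals give the marginal distribution of the column variable.
col_totals = {}
for cols in table.values():
    for col, count in cols.items():
        col_totals[col] = col_totals.get(col, 0) + count

grand_total = sum(row_totals.values())

# The same marginal distributions expressed in percent.
row_percents = {row: 100 * t / grand_total for row, t in row_totals.items()}
col_percents = {col: 100 * t / grand_total for col, t in col_totals.items()}

print(row_totals, col_totals)
print(row_percents, col_percents)
```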
An association or comparison that holds for all of several groups can reverse direction when the data are combined to form a single group
Simpson's paradox
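A numeric sketch can make the reversal concrete. The counts below are invented purely for illustration (two hypothetical hospitals, with patients split by condition on arrival); they do not come from any real study.

```python
# Hypothetical (survived, total) counts by hospital and patient condition.
counts = {
    "Hospital A": {"good condition": (9, 10), "poor condition": (30, 100)},
    "Hospital B": {"good condition": (80, 100), "poor condition": (2, 10)},
}

def rate(survived, total):
    return survived / total

# Survival rate within each condition group.
group_rates = {
    hospital: {cond: rate(*pair) for cond, pair in groups.items()}
    for hospital, groups in counts.items()
}

# Survival rate after combining both groups into one.
combined_rates = {
    hospital: rate(
        sum(s for s, _ in groups.values()),
        sum(t for _, t in groups.values()),
    )
    for hospital, groups in counts.items()
}

print(group_rates)
print(combined_rates)
```

In this made-up table, Hospital A has the higher survival rate within each condition group, yet Hospital B has the higher rate once the groups are combined, because Hospital A treats mostly patients in poor condition. Patient condition acts as a lurking variable.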
% rule for normal distributions
68-95-99.7%
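The three percentages are the shares of a normal distribution that lie within 1, 2, and 3 standard deviations of the mean. A quick check, using `statistics.NormalDist` from the Python standard library (the helper name `share_within` is made up):

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist()

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(100 * share_within(k), 1))
```

Because any normal distribution can be standardized, the same shares apply to every normal curve, not just the standard one.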
The form of a scatterplot graph can be...
linear, curved, clusters, or no pattern
The direction of a scatterplot graph can be...
positive, negative, or no direction
The strength of a scatterplot graph depends on...
how closely the points fit the "form"
