Chapter 1: The What and the Why of Statistics
Chapter 2: Organization of Information
Chapter 3: Graphic Presentation
Chapter 4: Measures of Central Tendency
Chapter 5: Measures of Variability
a set of procedures used by social scientists to organize, summarize, and communicate information
only information represented by ____ can be the subject of statistical analysis
research based on information that can be verified by using our direct experience
a property of people or objects that takes on two or more values
exhaustive & mutually exclusive
each variable must include categories that are both ______
means that there should be enough categories composing the variables to classify every observation
means that there is only one category suitable for each observation (e.g., no one should be classified as both Protestant and Methodist)
unit of analysis
the level of social life on which social scientists focus (e.g., organizations such as hospitals or universities)
the variable the researcher wants to explain (always the object of the research); "the effect"
the variable that is expected to cause or account for the dependent variable "cause" (usually occurs earlier in time)
1. cause precedes the effect in time
2. empirical relationship between cause and effect
3. relationship cannot be explained by other factors
What are the three conditions that must be met to establish that two variables are causally related?
these are never considered dependent variables because they cannot be explained (i.e. race, age, ethnicity)
1. nominal
2. ordinal
3. interval ratio
what are the three levels of measurement
numbers or other symbols are assigned to a set of categories for the purpose of naming, labeling, or classifying the observations; qualitative (i.e. males vs females)
numbers are assigned to rank-ordered categories ranging from low to high; the distance between categories is unknown (e.g., strongly agree vs. agree: the difference in degree of agreement cannot be measured)
measurements for all cases are expressed in the same units; can be used to compare values not only in terms of which is larger or smaller but also in terms of how much larger or smaller one is compared with the other
natural zero point
where zero means the absence of the property (i.e. weight and length but not temperature, which has an arbitrary zero point)
higher ; lower
properties that can be measured at a _______ level can be measured at _______ levels, but not vice versa
variable that has only two values (e.g., gender, employment status, marital status); can be treated as ordinal and/or interval ratio
gives off more power than other nominal-level variables (white vs nonwhite)
researchers often dichotomize some of their variables because dichotomy _____
have a minimum-sized unit of measurement, which cannot be subdivided (i.e. number of children per family and/or currency)
do not have minimum-sized unit of measurement; their range of values can be subdivided into increasingly smaller fractional values (i.e. length)
reliability and validity
the two characteristics of measurement are _______ & _______
refers to the extent to which measures indicate what they are intended to measure
1. individual error
2. method error
what are the two types of error?
1. individuals may want to provide socially desirable responses
2. questions may be unclear or poorly written
1. descriptive statistics
2. inferential statistics
what are the two major categories that statistical procedures can be divided into?
the population is the total set of individuals, objects, groups, or events of interest, while a sample is a relatively small subset selected from that population
what is the difference between a sample and a population?
includes procedures that help us organize and describe data collected from either a sample or a population (e.g., divorce rate, median age at marriage)
concerned with making predictions or inferences about a population from observations and analyses of a sample (i.e. age at marriage---> (-) divorce)
D: theory--> data
I: data --> theory
deductive process starts with ____ ->
inductive process starts with ___ ->
to highlight a key piece of information
if a table reports frequencies (the actual numbers) instead of percentages out of 100, what is the author trying to do?
table that reports the number of observations that fall into each category of the variable we are analyzing (constructing this is usually the first step in the statistical analysis of data)
relative frequency obtained by dividing the frequency in each category by the total number of cases
relative frequency obtained by dividing the frequency in each category by the total number of cases and multiplying by 100
table showing the percentage of observations falling into each category of the variable
cumulative frequency distribution
shows the frequencies at or below each category (class interval or score) of the variable; only for variables measured at an ordinal level or higher; allows us to locate the relative position of a given score in a distribution
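The distributions defined above (frequency, proportion, percentage, cumulative frequency) can be sketched in a few lines of Python. This is only an illustration: the responses are made up, and the helper name `frequency_table` is hypothetical.

```python
from collections import Counter

def frequency_table(observations, ordered_categories):
    """Return rows of (category, f, proportion, percentage, cumulative f)."""
    counts = Counter(observations)
    n = len(observations)
    rows = []
    cumulative = 0
    for category in ordered_categories:
        f = counts[category]          # frequency in this category
        cumulative += f               # frequency at or below this category
        rows.append((category, f, f / n, 100 * f / n, cumulative))
    return rows

# Hypothetical ordinal responses; categories are listed from low to high.
responses = ["agree", "disagree", "agree", "neutral", "agree",
             "disagree", "neutral", "agree", "agree", "neutral"]
table = frequency_table(responses, ["disagree", "neutral", "agree"])
for row in table:
    print(row)
```

The cumulative column is only meaningful because the categories are passed in rank order, which is why it requires an ordinal level of measurement or higher.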
number obtained by dividing the number of actual occurrences in a given time period by the number of possible occurrences
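A rate as defined here is a single division. The sketch below uses made-up figures; the `per` scaling factor (for example "per 1,000") is an added convention, not part of the definition, and `per=1` returns the raw ratio.

```python
def rate(actual_occurrences, possible_occurrences, per=1000):
    """Actual occurrences divided by possible occurrences, scaled by `per`."""
    if possible_occurrences <= 0:
        raise ValueError("possible_occurrences must be positive")
    return actual_occurrences * per / possible_occurrences

# Hypothetical figures: 36 divorces among 12,000 married couples in one year.
divorce_rate = rate(36, 12_000)
print(divorce_rate)  # divorces per 1,000 couples
```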
constructing a frequency distribution
what is usually the first step in the statistical analysis of data ?
pie chart & bar graph
the _______ & ______ are appropriate for nominal and ordinal variables
histograms & line graphs
_____ & ______ are used with interval-ratio variables
used to show the differences in frequencies or percentages among categories of an interval-ratio variable.
these are better suited for comparing how a variable is distributed across two or more groups or across two or more time periods
displays changes in a variable at different points in time
the difference is that line graphs display frequency distributions of a single variable, whereas time-series charts display two variables. Additionally, time is always one of the variables displayed in a time-series chart.
how does the time-series chart differ from a line graph ?
frequencies don't control or adjust for the total number of people in each group, so percentages must be used to make the bars comparable
why shouldn't you construct a grouped bar chart showing the frequencies rather than the percentages ?
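A small illustration of why percentages make groups comparable, using invented counts for two groups of very different sizes. The function name `percentage_distribution` is hypothetical.

```python
def percentage_distribution(frequencies):
    """Convert a {category: frequency} dict to {category: percentage}."""
    total = sum(frequencies.values())
    return {category: 100 * f / total for category, f in frequencies.items()}

# Hypothetical counts; group A is ten times larger than group B.
group_a = {"employed": 800, "not employed": 200}   # N = 1000
group_b = {"employed": 90, "not employed": 10}     # N = 100

pct_a = percentage_distribution(group_a)
pct_b = percentage_distribution(group_b)
print(pct_a)
print(pct_b)
```

By raw frequency, group A has far more employed members (800 vs. 90), yet the percentages show that employment is relatively more common in group B.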
measures of central tendency
numbers that describe what is average or typical of the distribution
what are the three types of measures of central tendency ?
category or score, frequency
mode is always a ______, not a _____
the only measure of a central tendency that can be used with nominal-level variables
nominal, ordinal, interval ratio
what level of measurement can you use for mode ?
ordinal, interval ratio
what level of measurement can you use for median ?
what level of measurement can you use for mean ?
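As a rough sketch, the three measures can be computed with Python's standard `statistics` module. The data below are invented; the extreme score of 12 is included to show the mean's sensitivity to extremes.

```python
import statistics

# Hypothetical interval-ratio scores (e.g., number of children per family).
scores = [0, 1, 1, 2, 2, 2, 3, 4, 12]

mode = statistics.mode(scores)      # most frequent score
median = statistics.median(scores)  # middle score of the ordered list
mean = statistics.mean(scores)      # sum of the scores divided by N

# The mode is the only one of the three that works for a nominal variable.
religions = ["Catholic", "Protestant", "Protestant", "Jewish"]
nominal_mode = statistics.mode(religions)

print(mode, median, mean, nominal_mode)
```

Here the single extreme score pulls the mean above the median and the mode.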
the 75th percentile is a score that divides the distribution so that 75% of the cases are below it
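One way to check a percentile claim against this definition is to count the share of cases below a given score. The scores are invented and `percent_below` is a hypothetical helper; statistical packages use several slightly different percentile conventions.

```python
def percent_below(scores, value):
    """Percentage of cases with a score strictly below `value`."""
    below = sum(1 for s in scores if s < value)
    return 100 * below / len(scores)

# Hypothetical exam scores.
exam_scores = [55, 60, 62, 70, 71, 75, 80, 88]
share = percent_below(exam_scores, 80)
print(share)
```

Under this definition, a score of 80 sits at the 75th percentile of these data because six of the eight cases fall below it.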
1. the way the variables are measured (their level of measurement)
2. the shape of the distribution
3. the purpose of the research
what are the three factors that determines which measure of central tendency to be used to represent a distribution ?
1. interval-ratio measurement
2. center of gravity
3. sensitivity to extremes
what are the three mathematical properties that make the mean the most important measure of central tendency ?
center of gravity
the mean is the point that perfectly balances all the scores in the distribution; if we subtract the mean from each score and add up all the differences, the sum will always be zero
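The balancing property can be verified numerically. The scores below are arbitrary illustrative values.

```python
# Hypothetical scores.
scores = [2, 4, 6, 8, 10]

mean = sum(scores) / len(scores)
deviations = [y - mean for y in scores]   # each score minus the mean
total = sum(deviations)

print(mean, deviations, total)
```

With means that are not exactly representable in floating point, the computed sum may be a tiny nonzero number; that is rounding error, not a failure of the property.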
measures of variability
numbers that describe diversity or variability in the distribution of a variable
1. index of qualitative variation
2. range
3. interquartile range
4. standard deviation
5. variance
what are the five measures of variability
measure of central tendency /// measures of variation
whereas the similarities and commonalities in the experiences of Asian American women are depicted by ______, the diversity of their experiences can be described only by using _____
index of qualitative variation (IQV)
measure of variability for nominal variables such as race and ethnicity (vary from 0.00 to 1.00)
IQV = K(100^2 - sigma Pct^2) / [100^2 (K-1)], where K is the number of categories and sigma Pct^2 is the sum of the squared percentages
what is the formula for IQV
1. construct a percentage distribution
2. square the percentages for each category
3. sum the squared percentages
4. calculate the IQV using the formula
what are the four steps to calculating the IQV ?
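The four steps above can be sketched in Python as follows. The function name `iqv` and the sample frequencies are illustrative only.

```python
def iqv(frequencies):
    """Index of qualitative variation from a list of category frequencies."""
    k = len(frequencies)                      # K = number of categories
    if k < 2:
        raise ValueError("the IQV needs at least two categories")
    n = sum(frequencies)
    percentages = [100 * f / n for f in frequencies]   # step 1
    squared = [p ** 2 for p in percentages]            # step 2
    sum_squared = sum(squared)                         # step 3
    # step 4: K(100^2 - sum of Pct^2) / [100^2 (K - 1)]
    return k * (100 ** 2 - sum_squared) / (100 ** 2 * (k - 1))

# Hypothetical distributions across four categories.
print(iqv([25, 25, 25, 25]))   # cases spread evenly: maximum variation
print(iqv([100, 0, 0, 0]))     # all cases in one category: no variation
print(iqv([80, 20]))           # two categories, unevenly split
```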
because the two scores might be extreme
why is range a rather crude measure ?
interquartile range (IQR)
measure of variation for interval-ratio variables; the width of the middle 50% of the distribution (the difference between the lower and upper quartiles)
graphic device (the box plot) that visually presents the range, the IQR, the median, and the lowest and highest scores
1. draw a box between the lower and upper quartile (IQR)
2. draw a solid line within the box to mark the median
3. draw vertical lines (called whiskers) outside the box, extending to the lowest and the highest values
what are the three steps to creating a box plot ?
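The values a box plot displays can be computed before any drawing is done. This sketch relies on `statistics.quantiles` from the Python standard library with `method="inclusive"`; that choice is an assumption, since textbooks and software use several quartile conventions that can give slightly different results. The data are made up.

```python
import statistics

def box_plot_summary(scores):
    """Values needed to draw a box plot: box edges, median line, whiskers."""
    ordered = sorted(scores)
    # Assumed quartile convention: the "inclusive" method.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return {
        "lowest": ordered[0],            # end of the lower whisker
        "q1": q1,                        # lower edge of the box
        "median": median,                # line inside the box
        "q3": q3,                        # upper edge of the box
        "highest": ordered[-1],          # end of the upper whisker
        "range": ordered[-1] - ordered[0],
        "iqr": q3 - q1,
    }

# Hypothetical scores.
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(summary)
```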
based on all the scores in the distribution
we use the mean as the reference point rather than other kinds of averages (the mode or the median) because the mean is ______
the average of the squared deviations from the center (mean) of the distribution (symbolized as S^2)
square root of the variance (symbolized as S)
divide the sum of the squared deviations by the number of scores in the distribution
how do you determine the average of the squared deviations from the mean ?
(Y - Y_), where Y_ is the mean (Y with a bar over it)
what is the deviation from the mean ? symbol-wise
1. calculate the mean, Y_ = sigma(Y)/N
2. subtract the mean from each score to find the deviation, Y-Y_
3. square each deviation, (Y-Y_)^2
4. sum the squared deviations, sigma (Y-Y_)^2
5. Divide the sum by N-1, sigma (Y-Y_)^2 / (N-1)
what are the five steps to calculating the variance ?
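The five steps above translate almost line for line into Python. This sketch follows the list in dividing by N-1; the function name `variance_and_sd` and the scores are illustrative. The standard deviation is then the square root of the variance.

```python
import math

def variance_and_sd(scores):
    """Variance (dividing by N - 1) and standard deviation of a list of scores."""
    n = len(scores)
    if n < 2:
        raise ValueError("at least two scores are needed")
    mean = sum(scores) / n                          # step 1: Y_ = sigma(Y)/N
    deviations = [y - mean for y in scores]         # step 2: Y - Y_
    squared = [d ** 2 for d in deviations]          # step 3: (Y - Y_)^2
    sum_of_squares = sum(squared)                   # step 4: sigma (Y - Y_)^2
    variance = sum_of_squares / (n - 1)             # step 5: divide by N - 1
    return variance, math.sqrt(variance)

# Hypothetical scores.
variance, sd = variance_and_sd([2, 4, 6, 8, 10])
print(variance, sd)
```

For comparison, `statistics.variance` and `statistics.stdev` in the standard library also divide by N-1, while `statistics.pvariance` and `statistics.pstdev` divide by N.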
For ____ level, your choice is restricted to the IQV as a measure of variability
For the ____ level, the IQV can be used to reflect variability in distributions, but because it is not sensitive to the rank ordering of values implied in ____ variables, it loses some information. The IQR can also be used.
For _______ level, you can choose the variance or standard deviation, the range, or the IQR. The variance and/or standard deviation is usually preferred.