Statistics

a set of procedures used by social scientists to organize, summarize and communicate information

Data

information represented by numbers, which can be the subject of statistical analysis

Research Process or Social Research

a set of activities in which social scientists engage to answer questions, examine ideas, or test theories.

Ex: Has drug abuse increased during the last decade? What factors influence the economic mobility of female workers?

Empirical Research

Research based on evidence that can be verified by using our direct experience.

Theory

an elaborate explanation of the relationship between two or more observable attributes of individuals or groups.

Hypothesis

a tentative answer to a research problem.

Variable

A characteristic that differs or varies from one individual to another. Examples include social class, monthly income, religion, gender, age, education, race, and ethnicity.

Unit of analysis

The level of social life on which social scientists focus. Examples of different levels are individuals (How old are you?) and groups (How many children are in the family?).

Dependent variable (Y)

The variable to be explained (the "effect"). It is the property the research is trying to explain: the output, or what is measured in response to changes in the independent variable. Think about the "Why?"

Independent Variable (X)

The variable expected to account for (be "the cause of") the dependent variable. It is the input: what is manipulated or changed. Common examples in social statistics: race, sex or gender, political affiliation, religion, income, education, poverty, location.

Level of measurement

Nominal, Ordinal, Interval, Ratio: four levels of measurement, each containing a different amount of information. Nominal = category; Ordinal = rank; Interval = equal distance; Ratio = all of the above plus a true zero point.

Nominal Measurement

Involves naming, labeling, or classifying the observations. Ex. Gender, Political Party, Race, Religion.

Ordinal Measurement

Involves ranking ordered categories from low to high. Ex. Social class (upper, middle, lower); agreement scales (Strongly agree ... Strongly disagree).

Interval measurement

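Involves ordering and exact, equal distances between values. Ex. SAT scores, temperature in degrees. (Income, dollars, and pounds are often grouped here as interval-ratio variables, but because they have a true zero point they are strictly ratio-level.)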

Ratio measurement

same as interval but includes an absolute or true zero point (age, hours worked, re-arrests)

Population

The total set of individuals, objects, groups, or events in which research is interested.

Sample

a relatively small subset selected from a population.

Descriptive Statistics

Procedures that help organize and describe data collected from either a sample or a population. (Census, survey, administrative data)

Inferential Statistics

Used to make predictions or inferences about a population from observations and analyses of a sample. Often used to test hypothesized cause-and-effect relationships, although a statistical result alone does not prove causation.

Frequency Distribution

A table listing all categories (nominal/ordinal) or observed scores (interval/ratio) and the frequency (f) of each category or observed score. It tells you how many cases fall in each category of the variable you are interested in.

Proportion

A relative frequency obtained by dividing the frequency in each category by the total number of cases. P=f/N

Note: when working with proportions, you cannot get a number higher than 1. If that happens (for example, after rounding), adjust the category with the highest number of cases. Work with only 1 decimal place.

Percentage distribution

A table showing the percentage of observations falling into each category of the variable. (Shows the relative size of each category.)

Percentage

A relative frequency obtained by dividing the frequency in each category by the total number of cases and multiplying by 100. % = (f / N) x 100
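
The frequency, proportion, and percentage definitions above can be sketched together in a few lines of Python. This is only an illustration: the party-affiliation responses are made-up data, and `frequency_table` is a hypothetical helper name, not something from the course notes.

```python
from collections import Counter

# Hypothetical responses for a nominal variable (made-up data).
responses = ["Democrat", "Republican", "Democrat", "Independent",
             "Democrat", "Republican", "Independent", "Democrat",
             "Republican", "Democrat"]

def frequency_table(values):
    """Return {category: (f, proportion, percentage)} for a list of values."""
    n = len(values)              # N, the total number of cases
    counts = Counter(values)     # f, the frequency of each category
    # Proportion: P = f / N.  Percentage: % = (f / N) x 100.
    return {cat: (f, f / n, 100 * f / n) for cat, f in counts.items()}

table = frequency_table(responses)
for cat, (f, p, pct) in table.items():
    print(f"{cat:<12} f={f}  P={p:.1f}  %={pct:.1f}")
```

With this data the proportions are 0.5, 0.3, and 0.2, which add up to 1, and the percentages add up to 100.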

Cumulative frequency distribution

a distribution showing the frequency at or below each category (class interval or score) of the variable.

Cumulative Percentage Distribution

A distribution showing the percentage at or below each category (class interval or score) of the variable. c% = (cf / N) x 100

Note: cf = the sum of the frequencies in that category plus the frequencies of all lower categories.
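
As a rough sketch of how cf and c% build up from the lowest category to the highest, assuming a made-up ordinal variable with three social-class categories (the counts are invented for illustration):

```python
from itertools import accumulate

# Hypothetical ordinal variable; categories are listed from lowest to highest.
categories = ["Lower", "Middle", "Upper"]
frequencies = [20, 50, 30]               # f for each category (made-up counts)

n = sum(frequencies)                     # N, the total number of cases
cf = list(accumulate(frequencies))       # cf: f of this category plus all lower ones
c_pct = [100 * c / n for c in cf]        # c% = (cf / N) x 100

for cat, f, c, cp in zip(categories, frequencies, cf, c_pct):
    print(f"{cat:<7} f={f:<3} cf={c:<4} c%={cp:.1f}")
```

The last category always has cf = N and c% = 100, which is a quick check that the table was built correctly.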

Rates and Ratios

a number obtained by dividing the number of actual occurrences in a given time period by the number of possible occurrences.

Ex: birthrate, unemployment rate, poverty rate, and marriage rate

Rate = (f actual cases / f potential cases) x k, where k is a base such as 100 or 1,000

Ratio = f1 / f2 (a comparison of one category to another)

Ex: College Graduation Rate = (# of students who graduated / total # of students) x 100
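
The two formulas can be tried out with a short sketch. The function names `rate` and `ratio` and all of the numbers below are assumptions chosen for illustration only.

```python
def rate(actual, potential, k=100):
    """Rate = (actual cases / potential cases) x k."""
    return k * actual / potential

def ratio(f1, f2):
    """Ratio = f1 / f2, comparing one category to another."""
    return f1 / f2

# Hypothetical university: 450 of 1,000 students graduated.
graduation_rate = rate(450, 1000, k=100)
print(graduation_rate)      # 45.0 graduates per 100 students

# Hypothetical class: 300 students in one category, 150 in another.
category_ratio = ratio(300, 150)
print(category_ratio)       # 2.0, i.e. two of the first for every one of the second
```

Changing k only changes the base the rate is expressed in; for example, k=1000 would report the same graduation figure as 450 per 1,000.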

pie chart

shows the differences in frequencies or percentages among the categories of a NOMINAL or an ORDINAL variable.

(Ex: Feelings about computer and technology)

bar graph

a graph showing the differences in frequencies or percentages among the categories of a NOMINAL or an ORDINAL variable. The categories are displayed as rectangles of equal width with their height proportional to the frequency or percentage of the category.

(Ex: Graduation rates at a 4-year university)

Histogram

A graph showing the differences in frequencies or percentages among categories of an INTERVAL-RATIO variable. The categories are displayed as contiguous bars, with width (x-axis) proportional to the width of the category and height (y-axis) proportional to the frequency or percentage of that category.

(Ex: Public Assistance income in the past 12 months)
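
Before a histogram can be drawn, the interval-ratio scores have to be counted into contiguous class intervals. The sketch below shows one way to do that and prints a crude text histogram; the income values, the interval width of 250, and the helper name `class_interval_counts` are all assumptions made for illustration.

```python
def class_interval_counts(scores, start, width, n_intervals):
    """Count how many scores fall in each equal-width class interval.

    Interval i covers start + i*width up to (but not including)
    start + (i+1)*width. Scores outside the covered range are ignored.
    """
    counts = [0] * n_intervals
    for s in scores:
        index = int((s - start) // width)
        if 0 <= index < n_intervals:
            counts[index] += 1
    return counts

# Hypothetical public assistance incomes in dollars (made-up data).
incomes = [120, 250, 310, 480, 90, 730, 260, 540]
counts = class_interval_counts(incomes, start=0, width=250, n_intervals=3)

# One row per interval; the bar length stands in for the bar height.
for i, f in enumerate(counts):
    low, high = i * 250, (i + 1) * 250 - 1
    print(f"{low:>3}-{high:<3} {'#' * f} ({f})")
```

Because the intervals share boundaries and leave no gaps, the bars of the resulting histogram touch, unlike the separated bars of a bar graph.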

Frequency Polygons

A graphic display of a frequency distribution in which the frequency of each score is plotted on the vertical axis, with the plotted points connected by straight lines. Most useful when the data represent INTERVAL-RATIO variables. (Ex: student examination grades)

Note: best suited to show continuity rather than differences.

Line Graph

It shows the differences in frequencies or percentages among categories of an INTERVAL-RATIO variable.

(Ex: Graduation Rates by gender for a 4-year university)

Statistical maps

Displays geographic variations in variables

Shading represents different frequencies or percentages

Ex: % of people 5 years and over who speak Spanish at Home, 2007

Measures of central tendency

Categories or scores that describe what is average or typical of the distribution: the mean, median, and mode.

Mode

The category or score with the highest frequency (number of cases) or percentage in the distribution. Can be used with nominal, ordinal, or interval-ratio variables, but it is the only measure of central tendency appropriate for NOMINAL variables.

bimodal

Describes a distribution in which two data values occur with the same greatest frequency, so there are two modes.
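
A minimal sketch of finding the mode, including the bimodal case, assuming a hypothetical helper named `modes` and made-up data. It returns a list so that a tie for the highest frequency shows up as two values.

```python
from collections import Counter

def modes(values):
    """Return every value that ties for the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())   # raises ValueError if values is empty
    return [v for v, f in counts.items() if f == highest]

# Works for a nominal variable, where the mean and median do not apply.
print(modes(["Catholic", "Protestant", "Catholic", "Jewish"]))   # ['Catholic']

# Two scores share the highest frequency, so the distribution is bimodal.
print(modes([1, 2, 2, 3, 3]))                                    # [2, 3]
```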

Median

The middle score of a distribution. The score that divides the distribution into two equal parts (50%) so that half the cases are above it and half below it. Can be used with ORDINAL or INTERVAL-RATIO variables. (Extremely useful when the distribution of scores is skewed.)

To calculate the Median

-order the scores from lowest to highest

-to find the middle score:

Position of the Median = (N + 1) / 2

-if N is odd, the median will be an actual score (the middle score) in the distribution

-if N is even, the median is located between the two middle scores and is calculated by taking the average of those two scores
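
The steps for calculating the median can be sketched as a small function. The name `median` and the example scores are illustrative assumptions; Python's standard library also provides `statistics.median`, which gives the same results for these lists.

```python
def median(scores):
    """Median of a list of interval-ratio scores, following the steps above."""
    if not scores:
        raise ValueError("median requires at least one score")
    ordered = sorted(scores)             # order the scores from lowest to highest
    n = len(ordered)
    if n % 2 == 1:
        # Odd N: position (N + 1) / 2 is a whole number and points at an
        # actual score. Subtract 1 because Python lists start at index 0.
        position = (n + 1) // 2
        return ordered[position - 1]
    # Even N: (N + 1) / 2 falls between the two middle scores,
    # so take the average of those two scores.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2

print(median([7, 3, 9, 1, 5]))    # 5   (N = 5, position 3 in 1, 3, 5, 7, 9)
print(median([4, 8, 1, 10]))      # 6.0 (N = 4, average of 4 and 8)
```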

Mean

A measure of central tendency obtained by adding up all the scores and dividing by the total number of scores. It is the arithmetic average of the distribution. Typically used with INTERVAL-RATIO variables.

Symmetrical Distribution

The frequencies at the right and left of the distribution are identical; each half of the distribution is the mirror image of the other.

Skewed Distribution

a distribution with a few extreme values on one side of the distribution.

Negatively skewed distribution

a distribution with a few extremely low values (the tail points left, so the peak sits toward the right)

Positively skewed distribution

a distribution with a few extremely high values (the tail points right, so the peak sits toward the left)
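
A small sketch of why skew matters when choosing a measure of central tendency. It uses Python's standard `statistics` module and made-up incomes (in thousands of dollars). The pattern it shows, with the mean pulled toward the extreme value while the median barely moves, is a general tendency rather than a strict rule.

```python
import statistics

# Hypothetical incomes in thousands of dollars (made-up data).
symmetric = [20, 22, 25, 28, 30]     # mirror image around 25
with_extreme = symmetric + [200]     # add one extremely high value

# Symmetrical distribution: the mean and the median agree.
print(statistics.mean(symmetric), statistics.median(symmetric))

# Positively skewed: the extreme value drags the mean upward,
# while the median shifts only slightly.
print(statistics.mean(with_extreme), statistics.median(with_extreme))
```

Here the mean jumps from 25 to about 54.2, while the median only moves from 25 to 26.5, so the median stays closer to the typical income.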

Mean Formula

Mean (Y bar)= ΣY/N
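
The formula can be checked with a short sketch. The function name `mean` and the exam scores are made up for illustration; `statistics.mean` in the standard library does the same job.

```python
def mean(scores):
    """Mean = (sum of all scores Y) / N."""
    if not scores:
        raise ValueError("mean requires at least one score")
    return sum(scores) / len(scores)

# Hypothetical exam scores: the sum is 400 and N is 5.
exam_scores = [70, 85, 90, 75, 80]
print(mean(exam_scores))    # 80.0
```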

(See notes for examples.)

Factors in choosing a measure of central tendency

...