Define individuals

objects described by a set of data such as people, animals, or things

define variable

any characteristic of an individual. A variable can take different values for different individuals.

define categorical variables

places an individual into one of several groups or categories

define quantitative variable

takes numerical values for which arithmetic operations such as adding and averaging make sense. The values of a quantitative variable are usually recorded in a unit of measurement such as seconds or kilograms.

How is the data usually formatted?

each row usually records data on one individual, while each column contains the values of one variable

Explain exploratory data analysis and describe the 2 principles that help us organize the set of data

exploratory data analysis - statistical tools and ideas that help us examine data in order to describe their main features

2 principles:

1) examine each variable by itself, then study the relationships among the variables

2) begin with a graph, then add numerical summaries of specific aspects of the data

define distribution

the distribution of a variable tells us what values it takes and how often it takes these values

define the distribution of a categorical variable

the distribution of a categorical variable lists the categories and gives either the count or the percent of individuals who fall in each category

define roundoff error

when percents are rounded, the rounded values may not add to exactly 100%; this is roundoff error, an effect of rounding rather than a mistake in the data
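
A minimal Python sketch of a categorical distribution, using hypothetical data (12 individuals split evenly among three categories). The function name `categorical_distribution` is illustrative, not a standard API. Rounding each percent to a whole number shows roundoff error: the rounded percents add to 99, not 100.

```python
from collections import Counter

def categorical_distribution(values, decimals=0):
    """Map each category to (count, percent), rounding percents to `decimals` places."""
    counts = Counter(values)
    n = len(values)
    return {cat: (c, round(100 * c / n, decimals)) for cat, c in counts.items()}

# Hypothetical data: 12 individuals, 4 in each of three categories
responses = ["red"] * 4 + ["green"] * 4 + ["blue"] * 4
dist = categorical_distribution(responses)

for category, (count, percent) in dist.items():
    print(category, count, percent)       # each category: 4 individuals, 33.0 percent

total = sum(percent for _, percent in dist.values())
print("total percent:", total)            # 99.0 because of roundoff, not a data error
```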

Which 2 tools are used to examine categorical variables?

Pie charts and bar graphs.

which tools are used to examine quantitative data?

histograms, stemplots, and time plots

What should we look at when examining a histogram?

1) look for the overall pattern and for any striking deviations from the pattern

2) describe the overall pattern of a histogram by its shape, center, and spread

3) look for outliers (individual values that fall outside the overall pattern)

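define the spread

how widely the values vary; the simplest description of spread gives the smallest and largest values

define the center

the midpoint of the distribution: the value with roughly half the observations smaller and half larger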

What does it mean for a graph to be symmetric?

the right and left sides of the histogram are approximately mirror images of each other

What does it mean for a graph to be skewed to the right?

if the right side of the histogram extends much farther out than the left side

What does it mean for a graph to be skewed to the left?

the left side of the histogram extends much farther out than the right side

when is it appropriate to use stemplots

for small data sets
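
A small stemplot sketch in Python, assuming the data are nonnegative integers: each stem is every digit except the last, and each leaf is the final digit. The function `stemplot` and the sample values are hypothetical. Stems with no leaves are still listed so that gaps in the data stay visible.

```python
def stemplot(data):
    """Return stemplot lines for nonnegative integers (stem = all but the last digit, leaf = last digit)."""
    stems = {}
    for x in sorted(data):
        stems.setdefault(x // 10, []).append(x % 10)
    lines = []
    for stem in range(min(stems), max(stems) + 1):   # include empty stems to show gaps
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>3} | {leaves}".rstrip())
    return lines

for line in stemplot([12, 15, 21, 23, 23, 41]):      # hypothetical small data set
    print(line)
```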

Explain a time plot.

A time plot of a variable plots each observation against the time at which it was measured. Always put time on the horizontal scale of your plot and the variable you are measuring on the vertical scale. Connecting the data points by lines helps emphasize any change over time. A time plot complements a histogram or other display of the distribution: those displays ignore time order and can be misleading when there is systematic change over time.

How is the center measured?

by the mean, the ordinary arithmetic average of the observations; the median is another common measure of center

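define resistant measure

a measure that is relatively unaffected by extreme observations or outliers; the median is resistant, the mean is not

what will happen if the measure of center is not resistant?

a nonresistant measure such as the mean can be pulled strongly toward outliers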

define median

is the midpoint of a distribution, the number such that half the observations are smaller and the other half are larger.

How is the median determined?

1) arrange all observations in order of size, from smallest to largest

2) if the number of observations n is odd, the median M is the center observation in the ordered list. If n is even, the median M is midway between the 2 center observations in the ordered list

3) you can always locate the median in the ordered list of observations by counting up (n+1)/2 observations from the start of the list
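
The three steps above translate directly into a short Python function. The name `median` here is an illustrative helper; the standard library's `statistics.median` follows the same odd/even rule.

```python
def median(observations):
    """Median M: sort, then take the center value (n odd) or the midpoint of the two center values (n even)."""
    ordered = sorted(observations)        # step 1: arrange in order of size
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2                          # 0-based index of position (n+1)/2, rounded down
    if n % 2 == 1:
        return ordered[mid]               # n odd: the center observation
    return (ordered[mid - 1] + ordered[mid]) / 2   # n even: midway between the two centers

print(median([5, 1, 3]))      # 3
print(median([4, 1, 3, 2]))   # 2.5
```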

Is mean resistant or not resistant to outliers?

it is not a resistant measure

is the median resistant or not resistant to outliers

resistant
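
A quick check of resistance with Python's `statistics` module, on hypothetical data: replacing one value with an outlier moves the mean a lot but leaves the median unchanged.

```python
from statistics import mean, median

data = [2, 3, 4, 5, 6]            # hypothetical observations
with_outlier = [2, 3, 4, 5, 60]   # same data, largest value replaced by an outlier

print(mean(data), median(data))                  # mean 4, median 4
print(mean(with_outlier), median(with_outlier))  # mean 14.8, median 4
```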

How are the quartiles calculated?

Q1 - median of the observations whose position in the ordered list is to the left of the location of the overall median

Q3 - median of the observations whose position in the ordered list is to the right of the location of the overall median

What is the five-number summary?

Minimum

Quartile 1

Median

Quartile 3

Maximum

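A sketch of the five-number summary in Python, computing the quartiles as described above: Q1 and Q3 are the medians of the observations on either side of the median's position, which leaves out the center value when n is odd. The helper names are hypothetical, and other software may use slightly different quartile rules, so its results can differ a little. The last lines also compute the interquartile range, IQR = Q3 - Q1.

```python
def _median(ordered):
    """Median of an already sorted, nonempty list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def five_number_summary(observations):
    """Return (minimum, Q1, median, Q3, maximum)."""
    x = sorted(observations)
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = x[:half]              # observations left of the median's position
    upper = x[half + n % 2:]      # observations right of it (skips the center value when n is odd)
    return x[0], _median(lower), _median(x), _median(upper), x[-1]

summary = five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9])   # hypothetical data
print(summary)                    # (1, 2.5, 5, 7.5, 9)
q1, q3 = summary[1], summary[3]
print("IQR =", q3 - q1)           # IQR = 5.0
```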

What type of graph utilizes the 5 number summary?

boxplot

Explain a boxplot.

1) central box spans the quartiles Q1 and Q3

2) line in the box marks the median M

3) lines extend from the box out to the smallest and largest observations

define interquartile range (IQR)

the distance between the first and third quartiles: IQR = Q3 - Q1

What is standard deviation?

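a measure of how far the observations are from their mean; roughly, the typical distance of an observation from the mean. Precisely, s is the square root of the variance, which is the sum of the squared deviations from the mean divided by n - 1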
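
A minimal Python sketch of the sample standard deviation, assuming the usual definition with n - 1 in the denominator (the same convention as `statistics.stdev`). The function name `sample_sd` is illustrative.

```python
import math

def sample_sd(observations):
    """s = sqrt(sum of squared deviations from the mean / (n - 1))."""
    n = len(observations)
    if n < 2:
        raise ValueError("need at least two observations")
    xbar = sum(observations) / n                                  # the mean x-bar
    variance = sum((x - xbar) ** 2 for x in observations) / (n - 1)
    return math.sqrt(variance)

print(sample_sd([2, 4, 6]))   # 2.0
print(sample_sd([5, 5, 5]))   # 0.0: no spread at all
```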

List the properties that determine the usefulness of standard deviation?

1) s measures spread about the mean and should be used only when the mean is chosen as the measure of center

2) s is always zero or greater than zero

3) s has the same units of measurement as the original observations

4) like the mean x̄, s is not resistant. A few outliers can make s very large.

How do we choose measures of center and spread?

The five-number summary is usually better than the mean and standard deviation for describing a skewed distribution or a distribution with strong outliers. Use x̄ and s only for reasonably symmetric distributions that are free of outliers.

Do numerical measures of center and spread describe the shape of the graph?

No. They do not describe the shape, but they do report specific facts about a distribution.

What is the 4 step process for organizing a statistical problem?

1) STATE: what is the practical question, in the context of the real-world setting?

2) PLAN: what specific statistical operations does this problem call for?

3) SOLVE: make the graphs and carry out the calculations needed for this problem

4) CONCLUDE: Give your practical conclusion in the setting of the real-world problem

What steps do we take to explore a distribution?

1) always plot your data: make a graph, usually a histogram or stemplot

2) look for the overall pattern (shape, center, spread) and for striking deviations such as outliers

3) calculate a numerical summary to briefly describe center and spread

4) sometimes the overall pattern of a large number of observations is so regular that we can describe it by a smooth curve

Define a density curve.

A density curve is a curve that

1) is always on or above the horizontal axis

2) has area exactly 1 underneath it

It describes the overall pattern of a distribution. The area under the curve and above any range of values is the proportion of all observations that fall in that range.

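To make "area equals proportion" concrete, here is a sketch with a hypothetical density curve f(x) = 2x on the interval from 0 to 1 (zero elsewhere). It is never below the horizontal axis and its total area is 1, so the area from 0 to 0.5, which is 0.25, is the proportion of observations in that range. The trapezoid-rule helper `area` is an assumption for illustration, not a standard function.

```python
def density(x):
    """Hypothetical density curve: f(x) = 2x for 0 <= x <= 1, and 0 elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0

def area(f, a, b, steps=1000):
    """Approximate the area under f from a to b with the trapezoid rule."""
    h = (b - a) / steps
    total = (f(a) + f(b)) / 2 + sum(f(a + i * h) for i in range(1, steps))
    return total * h

print(round(area(density, 0, 1), 6))    # 1.0: total area under a density curve
print(round(area(density, 0, 0.5), 6))  # 0.25: proportion of observations between 0 and 0.5
```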

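In a density curve, how are the median and mean defined?

The median is the equal-areas point, the point that divides the area under the curve in half. The mean is the balance point, at which the curve would balance if it were made of solid material. The median and mean are the same for a symmetric density curve; they both lie at the center of the curve. The mean of a skewed curve is pulled away from the median in the direction of the long tail.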

notation for the mean of a density curve

the Greek letter μ (mu)

notation for the standard deviation of a density curve

the Greek letter σ (sigma)

what is a normal curve?

A Normal curve is a particular kind of density curve. The distributions that Normal curves describe are called Normal distributions.

List the characteristics of a normal curve or normal distribution

1) all Normal curves have the same overall shape: symmetric, single-peaked, bell-shaped

2) any specific Normal curve is completely described by giving its mean μ and its standard deviation σ

3) the mean is located at the center of the symmetric curve and is the same as the median. Changing μ without changing σ moves the Normal curve along the horizontal axis without changing its spread

4) the standard deviation σ controls the spread of a Normal curve. Curves with larger standard deviations are more spread out.

describe a normal distribution

A Normal distribution is described by a Normal density curve. Any particular Normal distribution is completely specified by 2 numbers, its mean μ and standard deviation σ. The mean of a Normal distribution is at the center of the symmetric Normal curve. The standard deviation is the distance from the center to the change-of-curvature points on either side.

what is the 68-95-99.7 rule?

The rule describes distributions that are exactly Normal. In the Normal distribution with mean μ and standard deviation σ:

1) approximately 68% of the observations fall within σ of the mean μ

2) approximately 95% of the observations fall within 2σ of μ

3) approximately 99.7% of the observations fall within 3σ of μ

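The percentages in the rule can be checked with Python's `statistics.NormalDist` (Python 3.8+). The helper `proportion_within` is hypothetical; it shows that the proportions depend only on k, not on μ or σ.

```python
from statistics import NormalDist

def proportion_within(k, mu=0.0, sigma=1.0):
    """Proportion of a Normal(mu, sigma) distribution within k standard deviations of mu."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    print(k, round(proportion_within(k), 4))   # 1 0.6827, then 2 0.9545, then 3 0.9973
```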

What is a z score?

If x is an observation from a distribution that has mean μ and standard deviation σ, the standardized value of x is

$$z = \frac{x - \mu}{\sigma}$$

The standardized value is called the z-score.

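The z-score formula as a one-line Python function (the name `z_score` and the example numbers are hypothetical). Note the parentheses: writing `x - mu / sigma` would divide only μ by σ.

```python
def z_score(x, mu, sigma):
    """Standardized value z = (x - mu) / sigma."""
    return (x - mu) / sigma   # parentheses required: subtract first, then divide

print(z_score(68, 64.5, 2.5))   # 1.4 standard deviations above the mean
print(z_score(62, 64.5, 2.5))   # -1.0: one standard deviation below the mean
```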

What does the z score tell us?

The z-score tells us how many standard deviations the original observation falls away from the mean, and in which direction. Observations larger than the mean have positive z-scores, and observations smaller than the mean have negative z-scores.

What is the standard normal distribution?

The standard Normal distribution is the Normal distribution N(0, 1) with mean 0 and standard deviation 1. If a variable x has any Normal distribution N(μ, σ) with mean μ and standard deviation σ, then the standardized variable

$$z = \frac{x - \mu}{\sigma}$$

has the standard Normal distribution.

How are Normal proportions found?

Areas under a Normal curve represent proportions of observations. There is no simple formula for areas under a Normal curve, so calculations use either software or a table of areas computed with software, such as Table A.

How do we use Table A to find Normal Proportions?

1) state the problem in terms of the observed variable x. Draw a picture that shows the proportion you want in terms of cumulative proportions.

2) Standardize x to restate the problem in terms of a standard Normal variable z.

3) Use Table A and the fact that the total area under the curve is 1 to find the required area under the standard Normal curve.

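A sketch of the same steps with software in place of Table A, using `statistics.NormalDist` for the standard Normal cumulative proportions. The distribution N(64.5, 2.5), the value x = 68, and the helper name `proportion_below` are hypothetical; the left-area result matches the Table A entry for z = 1.40 to four decimal places.

```python
from statistics import NormalDist

def proportion_below(x, mu, sigma):
    """Cumulative proportion of observations less than x in N(mu, sigma)."""
    z = (x - mu) / sigma              # step 2: standardize
    return NormalDist().cdf(z)        # step 3: area to the left of z under N(0, 1)

# step 1: state the problem as proportions of x below and above 68
below = proportion_below(68, 64.5, 2.5)
above = 1 - below                     # total area under the curve is 1
print(round(below, 4), round(above, 4))   # 0.9192 0.0808
```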