Chapter 1: Describing Data: Graphical
Population
The complete set of all items that interest an investigator
N
Population size
Sample
observed subset of the population; its size is the sample size n
n
Sample size
Simple Random Sample
Procedure used to select a sample of n objects from a population by chance, such that the selection of one object doesn't influence the selection of another, each member is equally likely to be chosen, and every possible sample of size n has an equal chance of being chosen. Ex.: Write each name on a slip of paper, put the slips in a hat, shuffle, and draw names at random
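A minimal sketch of the "names in a hat" idea in Python. The function name `simple_random_sample`, the `seed` parameter, and the list of names are made up for illustration; `random.sample` draws without replacement, so every subset of size n is equally likely.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct items by chance (hypothetical helper for illustration)."""
    rng = random.Random(seed)  # seed only makes the draw repeatable
    return rng.sample(list(population), n)

# Illustrative population: names on slips of paper in a hat.
names = ["Ana", "Ben", "Cara", "Dev", "Eli", "Fay"]
sample = simple_random_sample(names, 3, seed=1)
print(sample)
```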
Systematic Sampling
The selection of every jth item in a population, where j is the ratio of the population size N to the desired sample size n: j = N/n
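A small Python sketch of the rule j = N/n. Two assumptions are not in the card: j is rounded down when N/n is not a whole number, and the starting position is picked at random among the first j items. The helper name `systematic_sample` is illustrative.

```python
import random

def systematic_sample(population, n, seed=None):
    """Select every jth item, where j = N // n (hypothetical helper)."""
    items = list(population)
    N = len(items)
    if not 0 < n <= N:
        raise ValueError("sample size n must be between 1 and N")
    j = N // n                                 # sampling interval, rounded down
    start = random.Random(seed).randrange(j)   # assumed: random start in the first j items
    return items[start::j][:n]

population = list(range(1, 101))               # N = 100 illustrative items
sample = systematic_sample(population, 10, seed=0)  # n = 10, so j = 10
print(sample)
```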
Parameter
A numerical measure that describes a specific characteristic of a population
Statistic
A numerical measure that describes a specific characteristic of a sample
Sampling Error
an error that occurs because information is available on only a subset of the population members
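To tie parameter, statistic, and sampling error together, here is a hedged Python sketch on synthetic data. The population is randomly generated purely for illustration, and treating "sampling error" as the difference between the sample mean and the population mean is one common way to quantify it, not something the card states.

```python
import random
from statistics import mean

rng = random.Random(42)

# Synthetic population (illustrative values only).
population = [rng.gauss(50, 10) for _ in range(10_000)]
parameter = mean(population)          # describes the whole population

sample = rng.sample(population, 100)  # simple random sample, n = 100
statistic = mean(sample)              # describes only the sample

sampling_error = statistic - parameter
print(f"parameter={parameter:.2f} statistic={statistic:.2f} error={sampling_error:.2f}")
```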
Nonsampling Error
an error that is not related to the kind of sampling procedure used
Descriptive Statistics
a branch of statistics that focuses on graphical and numerical procedures that are used to summarize and process data
Inferential Statistics
a branch of statistics that focuses on using the data to make predictions, forecasts, and estimates to make better decisions.
Variable
a specific characteristic of an individual or object.
Categorical Variables
a variable which produces responses that belong to groups or categories
Numerical Variables
a variable whose data are inherently numerical
Discrete Numerical Variables
Data take on only discrete integer values (there is overlap with categorical variables).
Continuous Numerical Variables
data can take on any real number value within specific bounds
Qualitative Data
data which has no measurable meaning to the "difference" in numbers; Example: one b-ball player is number 20; another is number 10. One cannot conclude that #20 plays 2x as much as #10
Quantitative Data
data which has measurable meaning to the difference in numbers. Example: One b-ball player scores 90 points; another scores 45 points. I can conclude that one scored 2x as much as the other.
Nominal Data
data whose numeric codes are chosen strictly for convenience and do not imply any ranking of responses; weakest or lowest type of data; Example: 1=Male, 2=Female; 1=Yes, 2=No
Ordinal Data
data which indicate the rank ordering of items; as with nominal data, the values are words that describe responses; Example: 1=poor, 2=average, 3=good
Interval and ratio levels of measurement
data obtained from numerical variables, where the difference between measurements has meaning; interval data indicate rank and distance from an arbitrary zero, measured in unit intervals, so the data are relative to an arbitrarily determined benchmark; Example: temperature measured in Fahrenheit or Celsius, where zero is an arbitrarily determined benchmark (Kelvin has a natural zero, so it is ratio data)
Ratio data
indicates both rank and distance from a natural zero, with ratios of two measures having meaning; example: A person weighs 200 lbs. Another weighs 100 lbs. The first weighs 2x as much as the second
Frequency Distribution
A table used to organize data; Left Column (classes or groups): all possible responses on a variable being studied; Right Column: a list of the frequencies, or number of observations, for each class
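A quick Python sketch of building such a table with `collections.Counter`. The survey responses are invented for illustration.

```python
from collections import Counter

# Illustrative responses on a single categorical variable.
responses = ["yes", "no", "yes", "yes", "no", "undecided"]

freq = Counter(responses)  # class -> number of observations

# Left column: class; right column: frequency.
for cls, count in freq.most_common():
    print(f"{cls:<10} {count}")
```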
Bar Chart
A chart in which the height of each rectangle represents the frequency of one category
Cross Table
Table that lists the number of observations for every combination of values for two categorical or ordinal variables.
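A hedged sketch of a cross table in plain Python. The two variables (`region`, `answer`) and their values are hypothetical; each cell counts the observations with that combination of values.

```python
from collections import Counter

# Hypothetical observations: (region, answer) pairs.
observations = [
    ("north", "yes"), ("north", "no"), ("south", "yes"),
    ("south", "yes"), ("north", "yes"), ("south", "no"),
]

counts = Counter(observations)
rows = sorted({region for region, _ in observations})
cols = sorted({answer for _, answer in observations})

# cross_table[row][col] = number of observations with that combination.
cross_table = {r: {c: counts[(r, c)] for c in cols} for r in rows}

print("region".ljust(8) + "".join(c.rjust(5) for c in cols))
for r in rows:
    print(r.ljust(8) + "".join(str(cross_table[r][c]).rjust(5) for c in cols))
```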
Pie Chart
a chart used to depict the division of a whole into its constituent parts
Pareto Diagram
a bar chart that displays the frequency of defect causes; the bar at the left indicates the most frequent cause and the bars to the right indicate causes with decreasing frequencies
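The ordering behind a Pareto diagram can be sketched in Python as below. The defect causes are invented, and the cumulative-percent column is an extra that often accompanies Pareto diagrams but is not part of the definition on this card.

```python
from collections import Counter
from itertools import accumulate

# Illustrative defect log.
defects = ["scratch", "dent", "scratch", "misalignment", "scratch", "dent"]

counts = Counter(defects).most_common()  # most frequent cause first
total = sum(count for _, count in counts)
running = accumulate(count for _, count in counts)

# (cause, frequency, cumulative percent) in left-to-right bar order.
pareto = [
    (cause, count, round(100 * cum / total, 1))
    for (cause, count), cum in zip(counts, running)
]

for cause, count, cum_pct in pareto:
    print(f"{cause:<14}{'#' * count:<5}{cum_pct:>6}%")
```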
Time Series Data
data measured at successive points in time
Time Series
a set of measurements, ordered over time, on a particular quantity of interest
Line Chart (Time-Series Plot)
A series of data plotted at various time intervals; Measuring time along the horizontal axis and the numerical quantity of interest along the vertical axis yields a point on the graph for each observation; Example: Annual University Enrollment and GDP over a period of years
Frequency Distribution
table that summarizes data by listing the classes in the left column and the number of observations in each class in the right column
Cumulative Frequency Distribution
a distribution that contains the total number of observations whose values are less than the upper limit for each class
Relative Cumulative Frequency Distribution
a distribution where the cumulative frequencies can be expressed as cumulative proportions or percents
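A short Python sketch of both cumulative distributions. The class boundaries and frequencies are made-up numbers for illustration.

```python
from itertools import accumulate

# Illustrative frequency distribution for a numerical variable.
classes = ["0 to <10", "10 to <20", "20 to <30", "30 to <40"]
frequencies = [4, 9, 5, 2]

# Cumulative frequency: observations below each class's upper limit.
cumulative = list(accumulate(frequencies))

# Relative cumulative frequency: the same counts as proportions of the total.
total = cumulative[-1]
relative_cumulative = [c / total for c in cumulative]

for cls, f, c, rc in zip(classes, frequencies, cumulative, relative_cumulative):
    print(f"{cls:<10} {f:>3} {c:>3} {rc:>6.0%}")
```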
Histogram
a graph that consists of vertical bars constructed on a horizontal line that is marked off with intervals for the variable being displayed; each interval corresponds to a class of the frequency distribution table; the height of each bar is the number of observations in that interval
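A rough Python sketch of how histogram bar heights come from class intervals. It assumes equal-width intervals that include the lower limit and exclude the upper limit; the helper name `histogram_counts` and the data are illustrative.

```python
def histogram_counts(data, lower, width, k):
    """Count observations in k equal-width intervals starting at `lower`.

    Assumed convention: each interval includes its lower limit and
    excludes its upper limit. Values outside the range are ignored.
    """
    counts = [0] * k
    for x in data:
        i = int((x - lower) // width)
        if 0 <= i < k:
            counts[i] += 1
    return counts

data = [3, 7, 12, 15, 18, 21, 22, 29, 35, 38]  # illustrative observations
counts = histogram_counts(data, lower=0, width=10, k=4)

# Text version of the histogram: one bar per interval.
for i, c in enumerate(counts):
    print(f"{i * 10:>2} to <{(i + 1) * 10:<3}{'#' * c}")
```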
Ogive
a line that connects points that are the cumulative percent of observations below the upper limit of each interval in a cumulative frequency distribution
Symmetry
a quality of the graph that tells us if the observations are balanced, or approximately evenly distributed, about its center
Skewed
quality of the graph that tells us if the observations are not symmetrically distributed on either side of the center; Skewed-right distribution (positively skewed) = the tail extends to the right; Skewed-left distribution (negatively skewed) = the tail extends to the left
EDA
Exploratory Data Analysis, which is a procedure that describes data in simple arithmetic terms
Stem-and-Leaf Display
an EDA graph that is an alternative to the histogram.
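A minimal Python sketch of a stem-and-leaf display, assuming non-negative whole numbers with the tens digits as stems and the units digit as the leaf. The helper name `stem_and_leaf` and the scores are illustrative.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group values by stem (tens) and list leaves (units) in order."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return dict(stems)

scores = [61, 65, 72, 74, 78, 83, 85, 85, 91]  # illustrative data
display = stem_and_leaf(scores)

for stem, leaves in display.items():
    print(f"{stem} | {' '.join(map(str, leaves))}")
```

Unlike a histogram, the display keeps the individual values: each leaf can be read back as an observation.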
Scatter Plots
a graph used to investigate possible relationships between two numerical variables; one variable may depend to a certain extent on the other; Independent Variable = X (horizontal axis); Dependent Variable = Y (vertical axis); To prepare a scatter plot, locate one point for each pair of values of the two variables that represents an observation in the data set.
