Chapter 1: Data and Statistics
Terms in this set (69)
Statistics
numerical facts
that help us understand different
business and economic
situations
Statistics
(4)
(broader sense)
art and science
of
 collecting
 analyzing
 presenting
 interpreting data
Why understanding stats is
important in
business and economics
?
 get
better understanding
of environment THUS make
better decisions
6 functions
to use statistics
1. accounting
2. finance
3. marketing
4. production
5. economics
6. Information Systems
#1 FUNCTIONS
accounting
 who uses it
 what do they use
 when is it used
 example AR balance
WHO
public accounting firms
WHAT
sampling procedure
WHEN
auditing clients
ex) ACCT firm wants to know whether
AR on balance sheet
is the same as
actual amount.
Usually there are so many AR accounts that checking every one is very time-consuming and expensive. Instead they audit a
subset or sample
of the accounts and then draw a conclusion about the whole.
#2 FUNCTIONS
finance
 who use it
 what do they use
 example stocks (3)
WHO
financial analysts
WHAT
investment recommendations
EX
 P/E ratio and dividend yields.
 compare individual stock with market average
 can see whether the stock is over or underpriced
(buy, sell, or hold recommendations)
#3 FUNCTIONS
marketing
 what is it used for
 how do they collect data
 example grocery store (2)
 why do they do this ?
WHAT
market research
HOW
electronic scanners at retail
EX
 point-of-sale scanner data are sold to manufacturers
 stat summary for special pricing and in-store display
WHY
understand relationship between promotional activities and sales
(better strategies)
#4 FUNCTIONS
production
 what is it used for
 example
WHAT
quality control chart to monitor
output of production process
HOW
x-bar chart (plots the sample average)
ex) to see if they are
 overfilling
 underfilling
 (when adjustments are necessary to correct production process)
ex. machine fills containers with 12 oz of a drink. production worker selects a sample of containers and computes the average number of oz in the sample. if the average plots above the chart's upper control limit = overfilling. below the lower control limit = underfilling. the process is in control as long as the average stays between the limits.
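A minimal sketch of the control-chart check described in this card. The control limits (11.9 and 12.1 oz) and the sample values are made-up numbers for illustration only; the notes do not give actual limits.

```python
from statistics import mean

# Hypothetical control limits for a 12 oz filling process (assumed values).
LOWER_CONTROL_LIMIT = 11.9
UPPER_CONTROL_LIMIT = 12.1

def check_sample(ounces):
    """Classify one sample of containers by its average fill (x-bar)."""
    x_bar = mean(ounces)
    if x_bar > UPPER_CONTROL_LIMIT:
        return "overfilling"
    if x_bar < LOWER_CONTROL_LIMIT:
        return "underfilling"
    return "in control"

print(check_sample([12.02, 11.98, 12.05, 11.97]))  # in control
print(check_sample([12.20, 12.15, 12.30, 12.18]))  # overfilling
```

Only the sample average is compared with the limits, so a single odd container does not by itself trigger an adjustment.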
#5 FUNCTIONS
economics
 how do they use it
 example indicators used to forecast inflation rates (3)
HOW
forecast about future of economy
ex)
 Producer price index
 unemployment rate
 manufacturing capacity utilization
#6 FUNCTIONS
Information System
 what is it used for
assess performance of
computer networks
Data
(3)
facts and figures
 collected
 analysed
 summarized
for presentation and interpretation
information
 aka
 ex
AKA
meaningful data or processed data
EX) a data value alone has no meaning; only when values are put together in context do they carry meaning
data set
 example
ALL data collected in a
particular study
ex. complete table of 25 mutual funds
elements
what
 basically
 example
WHAT
entities on which data are collected
BASICALLY
rows
ex. each individual mutual fund
ex) 25 mutual funds = data set containing 25 elements
variable
 what
 basically
 example
WHAT
characteristic of interest for the elements
BASICALLY
the COLUMNS
ex.
 fund type
 net asset value
 5 year average return
expense ratio
 morningstar rank
observation
 what
 correlated with
 example
WHAT
set of measurements for a particular element
CORRELATED
data set with 25 elements = 25 observations
ex. American Century Intl. Disc (MF) is IE, 14.36, 30.53, 1.41, and 3 star
ex. American Century Tax-Free Bond is FI, 10.73, 3.34, 0.49, and 4 star
OTHER
1. data set with n elements =
2.
total number of data values
in complete data set =
EQUAL
n observations
TOTAL
# elements x # of variables
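A small sketch of how elements, variables, observations, and data values relate, using the two mutual funds from the observation card. The dictionary keys are names chosen here for illustration.

```python
# Two of the 25 mutual funds, one dict per element (row).
funds = [
    {"fund": "American Century Intl. Disc", "type": "IE", "nav": 14.36,
     "return_5yr": 30.53, "expense_ratio": 1.41, "rank": "3-Star"},
    {"fund": "American Century Tax-Free Bond", "type": "FI", "nav": 10.73,
     "return_5yr": 3.34, "expense_ratio": 0.49, "rank": "4-Star"},
]

# The five variables (columns) listed in the variable card.
variables = ["type", "nav", "return_5yr", "expense_ratio", "rank"]

n_elements = len(funds)        # rows
n_observations = n_elements    # one observation per element
n_data_values = n_elements * len(variables)

print(n_elements, n_observations, n_data_values)  # 2 2 10
```

With the full table of 25 funds the same arithmetic gives 25 observations and 25 x 5 = 125 data values.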
scale of measurement
 determine
 indicates (2)
DETERMINE
AMOUNT of info in data
INDICATES
most appropriate ....
 data summarization
 statistical analysis
Scales of Measurement
(4)
1. nominal
2. ordinal
3. interval
4. ratio
#1 SCALE OF MEASUREMENT
nominal scale
 what
 numeric vs. nonnumeric
 example
WHAT
data with
labels or names
to identify an attribute of the element
NUM
both
ex. Fund type variable is nominal because DE, IE, and FI are labels
ex. nonnumeric = business, humanities, education school
How can a nominal scale use
numeric code
?
 denote ... (3)
 example
1 denote Domestic Equity
2 denote International Equity
3 denote Fixed Income
ex) 1,2,3 identify the
category of fund.
ex. 1 = business, 2 = humanities, 3 = education FOR SCHOOL VARIABLE
#2 SCALE OF MEASUREMENT
ordinal scale
 requirements (2)
 numeric/nonnumeric
 examples (3)
WHEN
data exhibit the properties of
1. nominal data
2. the
order of rank is meaningful
CAN
both
ex) quality of repair service. rating can be
Excellent, good, or poor
THUS
1.
nominal
 because excellent, good, or poor are labels
2.
ranked
 with service quality (excellent = best service)
ex) Morningstar Rank: it has a rank from 1 to 5 stars based on their assessment of the fund's risk-adjusted return
ex)
with numeric code
class rank in school.
nominal: freshman, sophomore, junior, senior
order: 1 = freshman, 2 = sophomore
#3 SCALE OF MEASUREMENT
interval scale
 requirement (2)
 numeric or nonnumeric
 example SAT scores
WHEN
if data have all the properties of
1. ordinal data
2. interval is expressed in terms of a
FIXED UNIT of measure
ALWAYS
numeric
EX
 SAT scores: 620, 550, 470
ordinal
ranked or ordered in terms of best performance to poorest.
interval
difference between the score are meaningful
(620 - 550 = 70, so student 1 scored 70 points more than student 2, while student 2 scored 550 - 470 = 80 points more than student 3)
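The interval property can be checked with simple subtraction. This sketch uses the three SAT scores from the card; the student labels are just names for illustration.

```python
sat_scores = {"student 1": 620, "student 2": 550, "student 3": 470}

# Differences between scores are meaningful on an interval scale.
diff_1_2 = sat_scores["student 1"] - sat_scores["student 2"]
diff_2_3 = sat_scores["student 2"] - sat_scores["student 3"]

print(diff_1_2, diff_2_3)  # 70 80
```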
#4 SCALE OF MEASUREMENT
ratio scale
 requirements (2)
 difference
 why
 examples of measurements (4)
 example of car
WHEN
data have
1. interval data
2.
ratio of two values is meaningful
DIFF
need a zero value
WHY
a zero value shows that nothing exists (none of the quantity is present)
EX
 height
 weight
 time
 distance
ex) cost of car. zero value would mean it is free. but if you compare cost of 30,000 for 1 car to cost of 15,000 for second car the ratio property =
2 times as much
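A quick sketch of the ratio property using the two car costs from the card. Variable names are chosen here for illustration.

```python
cost_car_1 = 30_000  # dollars
cost_car_2 = 15_000  # dollars

# On a ratio scale, dividing two values gives a meaningful comparison.
ratio = cost_car_1 / cost_car_2

print(ratio)  # 2.0
```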
Two ways in which
data can be
classified
1. categorical
2. quantitative
#1 CLASS
categorical data
(QUAL)
 what (2)
 numeric vs. nonnumeric
 2 scale measurements
WHAT
data grouped by
1.
specific categories
2. label and names
NUMERIC
both
MEASUREMENT
 nominal
 ordinal
#2 CLASS
quantitative data
 what (2)
 2 types
 numeric vs. nonnumeric
 2 scale measurements
WHAT
value to indicate
how much or how many
TYPE
1. discrete
2. continuous
NUMERIC
always numeric
MEASUREMENT
 interval
 ratio
#1 QUAN
discrete
how many
#2 QUAN
continuous
how much
How do we know which
Statistical analysis
is appropriate for a particular variable ?
 depends on
depend if a variable is
categorical or quantitative
Categorical variable
 what
 how does it affect statistical analysis
 how to summarize observations (2)
 what does not provide meaningful results?
WHAT
variable with categorical data
STAT
it is limited
SUMMARIZE
 count #
 compute proportion
NOT
arithmetic operations
(CANT GET THE AVERAGE VALUE)
(can't add, subtract, multiply, or divide, even if a numeric code is used)
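A sketch of the two summaries that do make sense for a categorical variable: counts and proportions. The list of coded fund types is made up for illustration; the code-to-label mapping follows the nominal-scale card.

```python
from collections import Counter

# 1 = Domestic Equity, 2 = International Equity, 3 = Fixed Income
labels = {1: "Domestic Equity", 2: "International Equity", 3: "Fixed Income"}

# Hypothetical coded observations for eight funds.
fund_types = [1, 1, 2, 3, 1, 2, 1, 3]

counts = Counter(fund_types)
proportions = {labels[code]: counts[code] / len(fund_types) for code in labels}

print(proportions)
# {'Domestic Equity': 0.5, 'International Equity': 0.25, 'Fixed Income': 0.25}
```

Averaging the codes would run without error but the result would describe nothing, because the numbers are only labels.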
Quantitative variable
 what
 how does it affect statistical analysis?
 why is there more?
WHAT
variable with quantitative data
STAT
more alternatives possible
WHY
arithmetic operations to get meaningful results
ex. can add or divide by the number of observations to compute an
average value
Cross-sectional data
data COLLECTED at the
same
or approximately the same point in time
ex) data for November 2012
Time series data
 what
 3 graph
WHAT
data collected
over several time
periods
GRAPH
 line
 bar
 histogram
ex. data for the last 36 months
which graph is found in
business and economic publications
?
 which
 why (3)
WHICH
time series
WHY
 what happened in the past
 trend
 project future levels
Where can data be obtained (2)
1.
existing sources
2.
statistical study
5 Types of
Existing Sources
1. firms (internal)
2. business database
3. government agency
4. industry association/special interest
5. internet
#1 SOURCE
internal company records
 what type of information (3)
 how do they get it (2)
 examples (read)
WHO
employee
 customer
 operations
GET IT
 lease it
 purchasing
ex)
 salaries
 ages
 years of experience
 sales
 advertising/ distribution cost
 inventory level
 production quantities
#2 EXIST SOURCE
3 firms that provide extensive
business database services
(+1 advertising/ product manufacturers)
1. Dun & Bradstreet
2. Bloomberg
3. Dow Jones & Company
 ACNielsen and Information Resources, Inc.
#3 SOURCE
government agencies
 examples
ex)
US Department of Labor
(employment rates, wage rates, labor force, union membership)
OTHERS:

Census Bureau
(population, household)

Federal Reserve Board
(money supply, exchange rate, discount rate)

Office of Management and Budget
(revenue, expenditure, debt)

Dept. of Commerce
(business activity, shipments, level of profits by industry)

Bureau of Labor Statistics
(consumer spending, hourly earnings, unemployment rate)
#4 SOURCE
industry association/ special interest organizations
 cost
 examples of
industry association
 example of
special interest
COST
modest
ex) Travel Industry Association: number of tourists and travel spending by state
ex) Graduate Management Admission Council: test scores, demographics, and programs
#5 SOURCE
How has the
internet
grown as an
important source of data? (2)
HOW
 all companies have websites that provide general info
( sales, employees #, product #, prices, and product specifications )
 companies specialize in making info available over internet
(ex. stock quotes, restaurant prices, salary data)
#2 DATA SOURCE
Statistical study
 when used
 2 type
WHEN
can't get data from existing sources
TYPE
 experimental
 observational
#1 STAT STUDY
experimental
 2 steps
 largest example
1. identify
variable of interest
2. identify other variables and control it to see how
they influence the variable of interest
LARGEST EX
1954 Salk polio vaccine experiment, nearly 2 million kids
ex) pharma company wants to run an experiment on a new drug's effect on blood pressure. so
blood pressure = variable of interest. dosage level = other variable that they hope has a causal effect on it. researchers get a sample of people and control the dosage level of the new drug by giving different groups different dosage levels.
#2 STAT STUDY
nonexperimental/ observational
 different from experimental
 most common
 example (look)
DIFF
do NOT try to control variable of interest
MOST COMMON
survey
ex) smoker vs. nonsmoker, because researchers do not determine who will smoke and who will not.
ex) restaurant gives a survey on quality of food/ service
OVERALL SOURCES
 downfall/ need to be aware of for observational
 when is
existing source data
desirable
 rule of law
AWARE
of time and cost
WHEN
need data in short period of time.
RULE
cost of data acquisition and stat analysis SHOULD NOT EXCEED saving (benefit) generated by using info
3 Data Acquisition Considerations
1. time requirement
2. cost
3. data error
#1 CONSIDERATION
time requirement
 2 downfalls
 takes forever
 may not be useful by time it is available
(took too long)
#2 CONSIDERATION
cost of acquisition
 company will charge for info even if providing it is not their primary business
#3 CONSIDERATION
Data Acquisition Errors
 why is it bad
 when does it occur
 usually most errors occur
 reasons (2)
 examples (look)
BAD
can be worse than not using any data at all
OCCUR
data value you get is NOT EQUAL to actual value that you would have got using
correct
procedure
USUALLY
data acquisition
REASON
 blindly using data that is available
 use data that were acquired with little care
ex)
 recording error like writing the age of 24 yo as 42
 person misinterpret question
ex) SOLUTION: procedures like looking at outliers
 someone with 22 years of age should not have 20 years of experience
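A sketch of the kind of consistency check mentioned in this card. The record layout, field names, and the minimum working age of 16 are all assumptions made for illustration.

```python
# Hypothetical employee records; field names are assumptions.
records = [
    {"name": "A", "age": 22, "years_experience": 20},
    {"name": "B", "age": 45, "years_experience": 20},
]

MIN_WORKING_AGE = 16  # assumed threshold, not from the notes

def is_suspicious(record):
    """Flag records where experience implies starting work implausibly young."""
    return record["age"] - record["years_experience"] < MIN_WORKING_AGE

flagged = [r["name"] for r in records if is_suspicious(r)]
print(flagged)  # ['A']
```

A flagged record is not automatically wrong; it is a candidate to review and correct before analysis.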
Descriptive statistics
 what
 types (3)
WHAT
summary of data that is presented in a
form that is easy to understand
ex)
1.
tabular
(table)
2.
graphical
(bar chart, histogram (bars touching))
3.
numerical
(average)
Average/ Mean
 aka
 calculations
 purpose
AKA
most common numerical descriptive statistic
CALC
add all the values and divide by the number of values
PURPOSE
measure
central tendency/location
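The mean calculation written out as a short sketch. The function name is chosen here for illustration, and the input reuses the three SAT scores from the interval-scale card.

```python
def average(values):
    """Add all the values, then divide by how many values there are."""
    return sum(values) / len(values)

sat_scores = [620, 550, 470]
print(average(sat_scores))  # about 546.67
```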
Population
set of
all elements
of interest in a particular study (LARGE)
Sample
 what
 why is it used
subset of population (SMALL)
WHY
save time, cost
Census
 used for
 what
USED
population
WHAT
process of
conducting a survey
to collect data for the ENTIRE population
Sample survey
process of conducting a survey to collect data for a sample
statistical inference
WHAT
use data from SAMPLE to
make ESTIMATE and test hypotheses
about the characteristics of a population
ex) made new lightbulb.
population = all lightbulbs that can be produced with this new filament
to measure the advantage of the new filament, 200 bulbs with the new filament were made and tested. Data collected showed the number of hours each lightbulb operated before burnout.
Now he wants to use the sample to make an inference about the average hours of useful life for the POP of all lightbulbs. He would
add 200 values in table, and divide by total 200
to get
SAMPLE average lifetime for the lightbulbs = 76 hours. so the ESTIMATE of the POP average lifetime for lightbulbs is also 76 hours.
When you use a
sample to estimate population
what do you need to provide a statement on (3)
1. quality
2. precision
(can also state confidence)
ex) POP average life for new lightbulbs is 76 hours with
margin of error
of +/- 4 hours. thus interval estimate of average life for ALL lightbulbs is 72 hr to 80 hr
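The interval estimate in this card is just the sample mean plus and minus the margin of error. A minimal sketch with the card's numbers (variable names are chosen here for illustration):

```python
sample_mean = 76      # hours, from the 200-bulb sample
margin_of_error = 4   # hours

interval = (sample_mean - margin_of_error, sample_mean + margin_of_error)
print(interval)  # (72, 80)
```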
Computers and Statistical Analysis
 when do you use computers to perform stat computation
WHEN
large amount of data because it is tedious
ex) average lifetime for 200 lightbulbs would be tedious
Examples of Data collection
 daily basis (3)
 small restaurants
 issues (3)
DAILY
1. magnetic card reader
2. bar code scanner
3. point of sale terminal
SMALL
 touch screen monitors to enter order/ billing
ISSUE
 hard to conceptualize sheer volume of info
 figure out how to effectively use it to improve profitability
 storing and managing transaction data
ex) Walmart (mass retailer) gets 20 to 30 million transactions every day
ex) France Telecom = 300 million call records
Visa = 6800 payments per second, or about 600 million transactions per day
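A quick arithmetic check that 6800 payments per second is in line with the roughly 600 million transactions quoted, assuming that figure is a daily total and the rate holds around the clock.

```python
payments_per_second = 6800
seconds_per_day = 60 * 60 * 24  # 86,400

payments_per_day = payments_per_second * seconds_per_day
print(f"{payments_per_day:,}")  # 587,520,000
```

About 588 million per day, which rounds to the 600 million in the card.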
Data Warehousing
 what (3)
 how can we retrieve/ store large quantities in seconds (2)
 benefits of data warehousing analysis (2)
WHAT
process of
 capturing
 storing
 maintaining
data
HOW
 computer
 data collection tools
BENEFIT
better decisions for
 new strategies
 higher profits
Data mining
 what
 combines (3)
 keywords (2)
 most effective data mining system
 benefit
 applies well with these type of companies
 example of companies (3)
 stats method it relies on (3)
 effective (read)
WHAT
analyst mine the data in warehouse(large database) to convert it into useful info
COMBO
 stat
 math
 computer science
key words
 automated
 predictive
MOST EFFECTIVE
use AUTOMATED procedures to extract info from the data using only the
most general or even vague queries
by the user
BENEFIT
automate process of
uncovering hidden predictive info
that before needed to be done by hand
APPLY
strong consumer focus
EX
 retail
 financial
 communication company
RELIES
stat methods like
 multiple regression
 logistic regression
 correlation
EFFECTIVE
creative integration of these methods, computer science, AI, machine learning
ex) Amazon / Barnes & Noble: helps determine one or more related products that customers who have already purchased a product are also likely to purchase
ex) can identify people who are likely to spend more than $20 on a shopping trip; they get special e-mail or regular discount offers
Thearling definition of
data mining
automated extraction
of PREDICTIVE information from LARGE databases
Issues with data mining
 commercial data mining software package
 because it helps develop predictive model. .. an issue is
PACKAGE
need a lot of
time and money
B/C
model reliability
: a model that works well with one sample doesn't necessarily apply well to others
Statistic approach for
evaluating model reliability
 approach (2)
 rule for
reliability
divide the sample data into two parts
1. training data set
2. test data set
RULE
if the model built from the training data is able to ACCURATELY predict values in the test data, then it is reliable.
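A minimal sketch of the training/test idea, assuming a made-up sample where y is roughly 2x plus noise and a simple least-squares line as the model. The 70/30 split, the data, and the helper name `fit_line` are all illustrative choices, not something the notes specify.

```python
import random

random.seed(0)  # fixed seed so the split is repeatable

# Hypothetical sample: y is roughly 2x plus noise.
data = [(x, 2 * x + random.uniform(-1, 1)) for x in range(100)]
random.shuffle(data)

# Divide the sample into a training data set and a test data set.
split = int(0.7 * len(data))
training, test = data[:split], data[split:]

def fit_line(points):
    """Least-squares slope and intercept for (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Build the model on the training data only.
slope, intercept = fit_line(training)

# Mean absolute prediction error on data the model never saw.
test_error = sum(abs(y - (slope * x + intercept)) for x, y in test) / len(test)
print(round(test_error, 2))
```

A small test error relative to the scale of y suggests the model carries over to new data; a test error much larger than the training error would be a warning sign of overfitting.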
Advantage vs. Disadvantage
of data mining over classical statistics
ADV
enormous amount of data available > model developed for the
training data
may be tested for reliability on other data.
(data mining can develop model/relationships and then quickly observe if they are repeatable and valid with new and different data )
DIS
so much data available , there is danger of
overfitting
model to the point of
misleading association and cause/effect
Ethical Guidelines
 why should we care
 examples of
unethical behavior
(5)
WHY
stat important for collecting, analyzing, presenting, and interpreting data
EX
 improper sampling
 inappropriate analysis of data
 development of misleading graphs
 inappropriate summary stat
 biased interpretation of results
Ethics
1.
what should you be aware of in others' data (3)
2. national leading org for statistics
3. report to help make ethical decision
 areas covered in report (8)
4.
professionalism issue
5.
publication/testimony issue with handling data
(2)
7.
shared value issue
8. typically occur when
HOW
 sources
 purpose
 objectivity
of other data
ASSOCIATION
american statistical association
REPORT
ethical guideline for stats practice (67 guidelines)
eight topics
 professionalism
 responsibilities to funders, clients, and employers
 responsibilities in publications and testimony
 responsibilities to research subjects
 responsibilities to research team colleagues
 responsibilities to other statisticians or statistical practitioners
 responsibilities regarding allegations of misconduct
 responsibilities of employers, including organizations, individuals, attorneys, or other clients employing statistical practitioners
PROFESSIONALISM
issue of running multiple test until desired result is obtained
PUBLICATION/TESTIMONY
 must account for all data considered and explain how sample is actually used
ex. Norris discards all lightbulbs with 70 or fewer hours due to "imperfections" to bump the average to 82
SHARE VALUES
 don't slant stat work toward a predetermined outcome
TYPICALLY
unrepresentative samples are used to make claims
ex. smoking is not permitted in restaurants, but a lobbyist interviews people in restaurants where smoking is permitted about whether they favor allowing more smoking in restaurants, then claims 90% of people are in favor.
EXAMPLE:
1. This trip to Hawaii is my: 1st, 2nd, 3rd, 4th, etc
2. The primary reason for this trip is: (vacation, convention, honeymoon)
3. Where I plan to stay: (hotel, apartment, relatives, camping)
4. Total days in Hawaii
5. If you are asked about your age
6. Ask about social security number
7. Male vs. female
8. Type of vehicle you drive
9. Size of soft drink
10. Annual sales of department
1. quantitative
2.Categorical (qual)
3. Categorical (qual)
4. quantitative
5. QUAN
6. QUAL (an identifying label, not a measured quantity)
7. QUAL
8. QUAL
9. QUAL
10. QUAN