STA 125 Chapter 1
Terms in this set (37)
Variable
Any characteristic observed in a study
Data
The values of a variable for one or more people or things
Observation
Each individual piece of data
Data set
The collection of all observations for a particular variable
An individual (one row of a data table) can also be called what?
A case
Categorical Variable?
A non-numerical variable with different categories
Quantitative Variable?
A numerical variable whose values are counted or measured quantities
Types of quantitative variables?
discrete
continuous
Discrete quantitative variables
possible values form a set of separate numbers
ex: shoe sizes
something you can count
Continuous quantitative variables
form a continuum of values over the real number line
ex: height, time
something you can measure
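A minimal Python sketch of the discrete/continuous distinction. It assumes made-up values and a hypothetical helper `is_half_size`, where US shoe sizes come in steps of 0.5. A discrete variable takes values from a separate, countable set. A continuous variable can take any value in a range, so another valid value always lies between two measured ones.

```python
# Hypothetical example data (not from any real study).
shoe_sizes = [7, 7.5, 8, 8, 9.5, 10]         # discrete: separate values, step 0.5
heights_cm = [162.3, 170.05, 158.9, 181.27]  # continuous: measured on a continuum

def is_half_size(x):
    """Hypothetical helper: True if x is on the discrete grid of shoe sizes (multiples of 0.5)."""
    return x * 2 == int(x * 2)

def midpoint(a, b):
    """Value halfway between a and b."""
    return (a + b) / 2

# Every shoe size falls in the separate set of allowed values...
print(all(is_half_size(s) for s in shoe_sizes))  # True
# ...but the midpoint of two adjacent sizes (7.25) is not a shoe size.
print(is_half_size(midpoint(7, 7.5)))             # False
# For a continuous variable like height, the midpoint is still a valid height.
print(162.3 < midpoint(162.3, 170.05) < 170.05)   # True
```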
Why is it important to identify different data types?
1. to choose appropriate graphical displays
2. to choose correct method/statistical procedure
Five W's and an H
Who
What
Where
When
Why
How
Who:
the individual cases about whom (or which) data are collected, such as respondents
What:
the variables recorded about each case
Where:
where the data were collected or found
When:
when the data were collected or when the study was published
Why:
What we hope to learn from analyzing the variables
How:
How the data were collected
What is statistics?
a way of reasoning, along with a collection of tools and methods, designed to help us understand the world
What is Data?
values along with their context
What is Big Data?
data sets so large that traditional methods of storage and analysis are inadequate
What is Data mining?
process of using data to make decisions and predictions
Rows correspond to ____
individual cases
Columns correspond to ____
variables (what has been recorded about each case)
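A small Python sketch of a data table, assuming hypothetical column names (`respondent_id`, `age`, `favorite_color`) and made-up values. Each row is one case and each column is one variable. Pulling out a column gives the data set for that variable, meaning all of its observations.

```python
# A tiny hypothetical data table: each row is one case (a respondent),
# each column is one variable recorded about that case.
columns = ["respondent_id", "age", "favorite_color"]
rows = [
    ["R001", 19, "blue"],
    ["R002", 22, "green"],
    ["R003", 20, "blue"],
]

def column(name):
    """Return the data set for one variable: every observation in that column."""
    j = columns.index(name)
    return [row[j] for row in rows]

print(len(rows))                 # 3 cases
print(column("age"))             # [19, 22, 20]
print(column("favorite_color"))  # ['blue', 'green', 'blue']
```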
Respondents
individuals who answer a survey
Subjects
people who we experiment on
Participants
People on whom we experiment
Experimental units
inanimate objects experimented on
metadata
typically contains information about how, when, and where the data were collected
categorical variables can also be called ___
qualitative variables
ordinal data
when the values of a categorical variable have an intrinsic order
nominal data
a categorical variable with unordered categories
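A Python sketch contrasting ordinal and nominal data, assuming made-up survey responses and a hypothetical ranking dictionary `SATISFACTION_ORDER`. With ordinal data the order is part of the meaning, so sorting and comparing categories makes sense. With nominal data, any sort order (such as alphabetical) is arbitrary, and counting categories is the meaningful summary.

```python
from collections import Counter

# Hypothetical survey responses.
satisfaction = ["high", "low", "medium", "high", "low"]  # ordinal
majors = ["biology", "math", "art", "math"]              # nominal

# Ordinal: encode the intrinsic order explicitly so responses can be ranked.
SATISFACTION_ORDER = {"low": 0, "medium": 1, "high": 2}
ranked = sorted(satisfaction, key=SATISFACTION_ORDER.get)
print(ranked)  # ['low', 'low', 'medium', 'high', 'high']

# "medium" really does rank above "low".
print(SATISFACTION_ORDER["medium"] > SATISFACTION_ORDER["low"])  # True

# Nominal: no natural order exists, so we count how often each category occurs.
print(Counter(majors))  # Counter({'math': 2, 'biology': 1, 'art': 1})
```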
Time series
an ordered sequence of values of a single quantitative variable measured at regular intervals over time
cross-sectional data
data in which several variables are measured at the same point in time
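A Python sketch contrasting the two layouts, using entirely made-up sales figures and hypothetical store names. A time series records one variable repeatedly over time, so the order matters and month-to-month change is meaningful. Cross-sectional data records several variables on different cases at one moment, so row order is arbitrary.

```python
# Hypothetical data for illustration only.

# Time series: ONE quantitative variable (monthly sales, in dollars)
# recorded at regular intervals, so the order of the values matters.
monthly_sales = [("2023-01", 1200), ("2023-02", 1350), ("2023-03", 1280)]

# Month-to-month change only makes sense because the values are ordered in time.
changes = [b[1] - a[1] for a, b in zip(monthly_sales, monthly_sales[1:])]
print(changes)  # [150, -70]

# Cross-sectional: SEVERAL variables measured on different cases
# at the same point in time (here, three stores in March 2023).
stores_march_2023 = [
    {"store": "A", "sales": 1280, "employees": 12},
    {"store": "B", "sales": 940, "employees": 8},
    {"store": "C", "sales": 1510, "employees": 15},
]
# Row order is arbitrary here; reordering the stores changes no summary.
print(sum(s["sales"] for s in stores_march_2023))  # 3730
```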
In performing data analysis you NEED to know ____
why you are looking at the data, what you want to know, and whom (or what) each row of data corresponds to
Ethical dilemmas in data analysis:
1. multiple perspectives
2. sharing some information while hiding other information to manipulate the conclusions is unethical
3. misrepresentation of data
identifier variables:
categorical variables whose only purpose is to assign a unique identifier code to each individual in the data set
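A Python sketch of how an identifier variable behaves, assuming hypothetical student records with a made-up `student_id` column. The identifier must be unique to each individual, and its only job is to look up or link a specific case. Numerical summaries such as an average are meaningful for a real variable like `exam_score`, but not for the IDs.

```python
# Hypothetical student records; "student_id" is an identifier variable.
records = [
    {"student_id": "S-1001", "exam_score": 88},
    {"student_id": "S-1002", "exam_score": 74},
    {"student_id": "S-1003", "exam_score": 91},
]

ids = [r["student_id"] for r in records]

# An identifier must be unique: one code per individual.
assert len(ids) == len(set(ids)), "duplicate identifier found"

# Its only use is to look up (or link) a specific case...
by_id = {r["student_id"]: r for r in records}
print(by_id["S-1002"]["exam_score"])  # 74

# ...whereas a summary such as the mean makes sense for exam_score, not for IDs.
mean_score = sum(r["exam_score"] for r in records) / len(records)
print(round(mean_score, 2))  # 84.33
```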
