Module Lesson 1
Terms in this set (19)
The degree to which the data conform to defined business rules or constraints
Values in a particular column must be of a particular data type, e.g., Boolean, numeric, or date.
Certain columns cannot be empty.
Typically, numbers or dates should fall within a certain range.
A field, or a combination of fields, must be unique across a dataset.
Values of a column must come from a set of discrete values, e.g., a person's gender may be male or female.
As in relational databases, a foreign key column can't have a value that does not exist in the referenced primary key.
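The constraint types above (data type, mandatory, range, unique, set membership, foreign key) can be sketched as simple row checks. This is a minimal illustration, not a definitive implementation: the records, the column names, the 0-120 age range, and the `check_constraints` helper are all assumptions made for the example.

```python
# Hypothetical records; every column name and value here is illustrative only.
patients = [
    {"id": 1, "age": 34, "gender": "female", "ward_id": 10},
    {"id": 2, "age": None, "gender": "male", "ward_id": 11},
    {"id": 2, "age": 210, "gender": "unknown", "ward_id": 99},
]
wards = {10, 11}  # primary keys of the (assumed) referenced table


def check_constraints(rows, valid_ward_ids):
    """Return a list of (row_index, problem) tuples for violated constraints."""
    problems = []
    seen_ids = set()
    for i, row in enumerate(rows):
        age = row["age"]
        if age is None:
            # Mandatory constraint: the column cannot be empty.
            problems.append((i, "mandatory: age is empty"))
        elif not isinstance(age, int):
            # Data-type constraint: the column must hold integers.
            problems.append((i, "data type: age is not an integer"))
        elif not 0 <= age <= 120:
            # Range constraint: assumed plausible bounds for an age.
            problems.append((i, "range: age outside 0-120"))
        if row["id"] in seen_ids:
            # Unique constraint: ids must not repeat across the dataset.
            problems.append((i, "unique: duplicate id"))
        seen_ids.add(row["id"])
        if row["gender"] not in {"male", "female"}:
            # Set-membership constraint, using the value set from the card above.
            problems.append((i, "set membership: unexpected gender"))
        if row["ward_id"] not in valid_ward_ids:
            # Foreign-key constraint: the value must exist in the referenced keys.
            problems.append((i, "foreign key: unknown ward_id"))
    return problems


problems = check_constraints(patients, wards)
for index, problem in problems:
    print(index, problem)
```

Collecting every violation, rather than stopping at the first one, keeps the check usable as an inspection step before any fixing is attempted.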
Regular Expression Patterns
Text fields that have to be in a certain pattern.
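A pattern check of this kind might look like the sketch below. The phone-number format and the `matches_pattern` helper are assumptions chosen for illustration; the card itself does not name a specific pattern.

```python
import re

# Assumed pattern: a phone number written as 555-123-4567.
PHONE_PATTERN = re.compile(r"\d{3}-\d{3}-\d{4}")


def matches_pattern(value):
    """Return True only if the whole string fits the pattern."""
    return PHONE_PATTERN.fullmatch(value) is not None


phones = ["555-123-4567", "5551234567", "555-123-456a"]
invalid = [p for p in phones if not matches_pattern(p)]
print(invalid)
```

`fullmatch` is used instead of `search` so that a valid fragment inside a longer, malformed string does not pass.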
Certain conditions that span across multiple fields must hold. For example, a patient's date of discharge from the hospital cannot be earlier than the date of admission
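The hospital example can be written as a check that compares two fields of the same record. This is a small sketch; the field names `admitted` and `discharged` and the sample stays are hypothetical.

```python
from datetime import date


def discharge_before_admission(record):
    """Cross-field rule: flag a stay whose discharge precedes its admission."""
    return record["discharged"] < record["admitted"]


# Hypothetical hospital stays.
stays = [
    {"patient": "A", "admitted": date(2023, 3, 1), "discharged": date(2023, 3, 5)},
    {"patient": "B", "admitted": date(2023, 3, 10), "discharged": date(2023, 3, 8)},
]
violations = [s["patient"] for s in stays if discharge_before_admission(s)]
print(violations)
```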
The degree to which the data is close to the true values
The degree to which all required data is known
The degree to which the data is consistent, within the same data set or across multiple data sets
The degree to which the data is specified using the same unit of measure.
What are the steps in a data cleaning workflow?
Detect unexpected, incorrect, and inconsistent data.
Fix or remove the anomalies discovered.
After cleaning, the results are inspected to verify correctness.
A report about the changes made and the quality of the currently stored data is recorded.
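The four steps above can be chained into one small pipeline. The sketch below is only one way to arrange them: the function names, the temperature readings, the 30-45 bounds, and the choice to remove anomalies rather than repair them are all assumptions made for the example.

```python
def inspect(values, low, high):
    """Step 1: return the indices of missing or out-of-range values."""
    return [i for i, v in enumerate(values) if v is None or not low <= v <= high]


def clean(values, bad_indices):
    """Step 2: here anomalies are simply removed (fixing them is the alternative)."""
    bad = set(bad_indices)
    return [v for i, v in enumerate(values) if i not in bad]


def run_workflow(values, low, high):
    anomalies = inspect(values, low, high)
    cleaned = clean(values, anomalies)
    # Step 3: re-inspect the cleaned data to verify that nothing was missed.
    remaining = inspect(cleaned, low, high)
    # Step 4: record what changed and the state of the resulting data.
    report = {
        "rows_in": len(values),
        "rows_removed": len(anomalies),
        "rows_out": len(cleaned),
        "verified": not remaining,
    }
    return cleaned, report


# Hypothetical body temperatures in degrees Celsius; 98.6 looks like Fahrenheit.
cleaned, report = run_workflow([36.6, None, 37.2, 98.6, 36.9], low=30.0, high=45.0)
print(cleaned)
print(report)
```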
Scaling / Transformation
Scaling means to transform your data so that it fits within a specific scale, such as 0-100 or 0-1
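One common way to do this is min-max scaling, which maps the smallest value to the bottom of the target scale and the largest to the top. The card does not name a specific method, so treat the `min_max_scale` helper below as an assumed illustration.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so they span [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical; map them to the bottom of the scale.
        return [new_min for _ in values]
    span = new_max - new_min
    return [new_min + (v - lo) / (hi - lo) * span for v in values]


scores = [50, 75, 100]
unit = min_max_scale(scores)             # rescaled to the 0-1 scale
percent = min_max_scale(scores, 0, 100)  # rescaled to the 0-100 scale
print(unit, percent)
```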