Terms in this set (147)
Facts gathered together for analysis
adding machine similar to odometer
1805-Joseph Marie Jacquard
punch card for textile weaving patterns (Loom)
loom records patterns in punch cards
tabulator/sorter punch cards Amercian census-ABM
automatic feeders Powers tabulators machine comp>UNIVAC>UnisysCorp
Erasable magnetic tape for storage
magnetic drum's begin prototype
A file having no internal hierarchy
A file that has been encrypted for security purposes
An unsorted set of records
The transformation of raw data into useful facts
an index built over a field
A card that is perforated and can hold commands or data
Information with a high degree of organization, organized in rows and columns
Information that does not have structure (such as text) cannot be stored in rows and text such as email, sticky notes, business cards IM's...
structured and unstructured data, large amount of data to handle
stored as strips of binary numbers
It is a relationship between two entity types. (simplest relationship)
(Means Count) It is the maximum number of entities that can be involved in a particular relationship ie: one-one, one-many, many-many (Closest to entity cannot be zero, crows foot)
E-R Model (entity-relationship)
It is well named, as it diagrams entities (together with their attributes) and the relationships among them
It is a data that describes a many-to-many relationship.
It is a minimum number of entity occurrences that can be involved in a relationship. (farthest from entity can be zero)
One-to-One Binary Relationship
It means that a single occurrence of one entity type can be associated with a single occurrence of the other entity type and vice versa.
Simple Linear Index
very close in structure to the index in the back of a book.
A relationship that involves three different entity types.
It is an associate occurrence of an entity type with other occurrences of the same entity type.
It is used to uniquely identify each record in a database table.
trying to refer data in one relation in the DB, data staying the same across tables when you insert/update/delete
a property of, a characteristic of, or a fact that we know about an entity
B+ tree index
family tree organizational method better to insert/delete records.
Each index record, at every level of the tree, contains space for the same number of key value/pointer pairs. This index record capacity is arbitrary, but once it is set, it must be the same for every index record at every level of the index.
Candidate key (alternate key)
If a relation has more than one attribute or minimum group of attributes that represents a way of uniquely identifying the entities, then they are each called a candidate key. unique, non-volatile, minimal. A candidate key is a candidate to become a primary key.
2 or more attributes are combined 2 columns (ie: flight # and flight date)
It includes protecting the physical hardware environment, defending against hacker attacks, encrypting data transmitted over networks, educating employees on the importance of protecting the company's data, and many more.
If two or more users are trying to update a particular record simultaneously, they run the risk of generating what is known as a "concurrency problem."
Improves DB query performance. Data structure that contains a copy of some of the data from one or more existing DB tables.
It refers to the ability to tie together pieces of related data within an information system.
It refers to the same fact about the business environment being stored more than once within an information system.
It involves the fetching of desired data from a database.
It is a retrieval of a single record of a file or a subset of the records of a file based on one or more values of a field or a combination of fields in the file.
A "thing" or "object" in our environment that we want to keep track of. Something of importance (theme) needs to be represented in a DB. You want to track using attributes IE Employee: #, name, dep.
A collection of entities of the same type (e.g., all the company's employees) is called an entity set.
combines two or more tables based on a column that is common to the tables.
The columns representing the facts are called fields.
The entire structure is called a file.
when in a collection of relations that make up a relational database, an attribute or group of attributes serves as the primary key of one relation and also appears in another relation
value of one attribute depends on the value of another attribute (relationships of attributes) Defining associations.
one or more columns of a relation whose values are used to identify a row
It specifies how to relate tables in the query
Logical sequential access
the records are retrieved in order based on the values of one or a combination of the fields.
matches each row in a table against each row in another table based on common values found in columns sharing a common name and data type.
data value may be shared among several rows ie: Dep same for multiple employees
no data exists in the cell, can be an anomaly/ambigious not "zero"
adds an attribute that accounts for data integration issue
Physical sequential access
the records are retrieved one after the other, just as they are stored on the disk device.
It is an attribute or group of attributes whose values are unique throughout all rows of the relation. Chosen to be the main key.
a collection of related data items
structural definition of every record in file
specific record of file
data and information systems are aggressively used to redesign business processes for maximum efficiency
every value of a foreign key must match a value of an existing primary key
2D table that has specific characteristics.
-rows must contain instances of an entity.
-Columns is an attribute
-Cells hold a single value
-all values in a column use same data type
-each column has a unique name
-order of row or column is unimportant
-no two rows can be the same
are tables not all tables are relations
It is a formal system for manipulating relations.
Retrieve or Read
It refers to the data without changing it.
It means the retrieval of all or a portion of the records of a file one after another, in some sequence, starting from the beginning, until all the required records have been retrieved
Simple linear file
It is a collection of records listed one after the other on a long line
used when no primary key can be identified no value when numbers are used ie employee id #
non-key attribute is functionally dependent on another.
data value is unique for each row ie: employee ID
add all necessary records underneath the related row
4 Major DBMS approaches
4- Object oriented
* (SQL) in the SELECT clause
It indicates that all attributes of the selected row are to be retrieved
AND (SQL) operator
It displays a record if more than one condition is true.
AVG ( ) (SQL) function
It returns the average value of a numeric column.
BETWEEN (SQL) operator
It allows you to specify a range of numeric values in a search
It is operationalized with a data definition language (DDL), involves instructing the DBMS software on what tables will be in the database, what attributes will be in the tables, which attributes will be indexed, and so forth.
DDL data definition langauge
instructing the DBMS software on what tables will be in the database, what attributes will be in the tables, which attributes will be indexed, and so forth.
There are two aspects: data definition and data manipulation
DML data management language
users communicate data manipulation command to the DBMS; select, insert, update, delete, merge etc.
DISTINCT (SQL) operator
It is used to eliminate duplicate rows in a query result.
Inserts higher level language program then executes as program in DB
IN (SQL) operator
It allows you to specify a list of character strings to be included in a search.
JOIN (SQL) clause
It is used to combine rows from more than one table, based on a common field between them.
LIKE (SQL) operator
It allows you to specify partial character strings in a "wildcard" sense.
where empName like DA%; -match any# of chracters
where empName like '657-287-____"; - match single character
OR (SQL) operator
It displays a record if either the first condition OR the second condition is true
ORDER BY (SQL) clause
It simply takes the results of a SQL query and orders them by one or more specified attributes, sorts the rows by one or more columns.
goes directly to DB
From : DB (name) Table
Where: Row Table; (; always include semicolon at end) and, or, between, like, join, in, distinct, order by, avg sum, +, <,>
Select commperct, yearhire From Salesperson where SPNUM+ 186;
Sub query command
From : DB (name) Table
Where: Row Table = (Select: From : Where: );
Relational select operation
retrieves one or more row of a tale whether the search is unique or a non-unique attribute
Relational project operation
retrieves all rows and some attributes of a table without using a where clause to limit rows
SELECT (SQL) command
Data retrieval in SQL is accomplished with the SELECT command.
Structured Query Language
It is a standard language for data management in relational databases, known as Structured Query Language or SQL.
When one SELECT statement is "nested" within another in a format, it is known as subquery.
CREATE TABLE command
The command that creates base tables and tells the system what attributes will be in them
CREATE VIEW command
Within this command, you specify the base table(s) on which the view is to be based and the attributes and rows of the table(s) that are to be included in the view.
you have to specify which row(s) of a table are to be deleted based on data values within those rows
DROP TABLE command
It is used to discard a table from a database.
DROP VIEW command
Views are discarded
First normal form
exists when all the columns in a table are atomic, i.e., only a single value is allowed in each column. All the attributes are listed in one table and a primary key is selected.
It is used to add a new record in a table. (data)
It is the process of organizing the fields and tables of a relational database to minimize redundancy (duplication) and dependency. Organizes attributes into tables so that redundancy among non-key attributes is eliminated. (Decomposition process). Produces E-R diagrams.
Second normal form
Second normal form (2NF) is the second step in normalizing a database. 2NF builds on the first normal form (1NF). In second normal form, all non-key attributes must be fully functionally dependent on the entire key of that table. will decompose table one into smaller table and reduce data redundancy.
Third normal form
In third normal form, non-key attributes are not allowed to define other non-key attributes.
you have to identify which row(s) of a table are to be updated based on data values within those rows. Then you have to specify which columns are to be updated and what the new data values of those columns in those rows will be.
evaluating relationships or associations between data elements. used as a way to create product packages that have appeal to large market segments.
The transformation of raw data into useful information. Turns data into information into knowledge into plans.
-increased profitability -financial value -productivity -trust value -risk value -increased throughput with decreased workloads
Predictive - maps data into predefined groups or classes
Descriptive - groups similar data together into clusters. Dividing large collection of entities into smaller groups of similar entities.
*centralized repository of information
* organized around relevant data to organization
* platform for different consumers to submit queries
* used for analysis
* target location internal/external enterprise
*primary source of information that feeds analytical processing
Data Cube (OLAP)
uses a nested hierarchical structure to provide an aggregate view of data variables
Description (data mining)
trying to explain the results of the data mining process
A maintained table of facts, each of which is related to a set of dimensions,
assigning some continuously valued numeric value to an object
the process of extracting raw data and then transforming and loading into a target to be used with Business intelligence.
1. Get the data from the source location.
2.Map the data from its original form into a data model that is suitable for manipulation at the staging area.
3. Validate and clean the data.
4. Apply any transformations to the data that are required before the data sets are loaded into the repository.
5. Map the data from its staging area model to its loading model.
6. Move the data set to the repository.
7.Load the data into the warehouse.
raw data in -> transform
What data should be extracted?
How should that data be extracted?
-Data type conversion.
-Referential integrity checking.
-Denormalization and renormalization.
-Refresh volume and frequency
-Target dependencies, such as where and on how many machines the repository lives, and the specifics of loading data into that platform.
OLAP (online analytical data processing)
"multidimensional" "slice and dice" drill down method presents data from a data warehouse so consumers can compare data metrics
classifying expected future behavior in DB
Descriptive - maps data into subsets with associated simple descriptions or generalizations.
The use of a network of remote hosted servers on the internet to store and manage data.
decreases IT cost, business agility, flexible scaling, High availability, security risk.
Intelligent Storage System
the use of 4 components to retrieve data
front-end<->cache<-> back-end<-> physical disk.
It is intelligent because data can be retrieved from the cache if it has already been requested.
infrastructure as a service. offers computer power and storage capacity. Organization is responsible for entire IT infrastructure, host and Applications.
platform as a service. Organization is responsible for development maintenance and management of applications.
software as a service. Ready to use.
dimensional model:relationships between the fact table and the dimensions resemble a star
the snowflake relationship is one-to-many, "inward" towards the center of the star
Redundant Array of Independent/Inexpensive Disk - a term used to describe the use of multiple disk drives to store data for performance and redundancy
used to boost servers performance (disk striping). no data saved if drive failure.
Disk mirroring no internal hierarchy, no disk performance increase.
combination of raid levels ie: raid 1+0
striped set with parallel access and a dedicated parity disk
striped set with independent disk access and a dedicated parity disk
striped with parity better performance uses 3 or more disks
is a technique to spread data across multiple drives to use the drives in parallel
is a technique whereby the same data is stored on two different disk drives, yielding two copies of the data.
is a method to protect striped data from disk drive failure without the cost of mirroring. An additional disk drive is added to hold parity, a mathematical construct that allows re-creation of the missing data.
RAID controllers (Array)
▪Management and control of disk aggregations
▪Translation of I/O requests between logical disks and physical disks
▪Data regeneration in the event of disk failures
2 Drawbacks of software raid
1. decrease system performance
2. all raid levels are not supported
Create,read, update, delete
4 Components to a DB system
User <-> DB application <-> DBMS <-> DB
Major problems with simple lists
-more opportunity for wrong information
-more than one theme
-must update multiple cells instead of one (anomaly)
-delete cells can lose all information (anomaly)
-insertion "missing information" empty cells (anomaly)
-stores information in 2d tables that relate to each other
-One big list broke apart (must be able to be joined back together)
-PK primary key and Foreign key
-can create user interfaces forms and reports