C175

STUDY
PLAY

Terms in this set (...)

ad hoc query
A "spur-of-the-moment" question.
analytical database
A database focused primarily on storing historical data and business metrics used for tactical or strategic decision making.
availability
In the context of data security, it refers to the accessibility of data whenever required by authorized users and for authorized purposes.
Centralized
A database located at a single site.
cloud database
A database that is created and maintained using cloud services, such as Microsoft Azure or Amazon AWS.
data
Raw facts, or facts that have not yet been processed to reveal their meaning to the end user.
data anomaly
A data abnormality in which inconsistent changes have been made to a database. For example, an employee moves, but the address change is not corrected in all files in the database.
data dependence
A data condition in which data representation and manipulation are dependent on the physical data storage characteristics.
data dictionary
A DBMS component that stores metadata—data about data. Thus, the data dictionary contains the data definition as well as their characteristics and relationships. A data dictionary may also include data that are external to the DBMS. Also known as an information resource dictionary. See also active data dictionary, metadata, and passive data dictionary.
data inconsistency
A condition in which different versions of the same data yield different (inconsistent) results.
data independance
A condition in which data access is unaffected by changes in the physical data storage characteristics.
data integrity
In a relational database, a condition in which the data in the database complies with all entity and referential integrity constraints.
data management
A process that focuses on data collection, storage, and retrieval. Common data management functions include addition, deletion, modification, and listing.
data processing (DP) specialist
The person responsible for developing and managing a computerized file processing system.
data redundancy
Exists when the same data is stored unnecessarily at different places.
database
A shared, integrated computer structure that houses a collection of related data. A database contains two types of data: end-user data (raw facts) and metadata.
database design
The process that yields the description of the database structure and determines the database components. The second phase of the Database Life Cycle.
database management system (DBMS)
The collection of programs that manages the database structure and controls access to the data stored in the database.
database system
An organization of components that defines and regulates the collection, storage, management, and use of data in a database environment.
desktop database
A single-user database that runs on a personal computer.
discipline-specific databases
A database that contains data focused on specific subject areas.
Enterprise Database
The overall company data representation, which provides support for present and expected future needs.
field
An alphabetic or numeric character or group of characters that defines a characteristic of a person, place, or thing. For example, a person's Social Security number, address, phone number, and bank balance all constitute fields.
file
A named collection of related records.
general-purpose databases
A database that contains a wide variety of data used in multiple disciplines.
hub
A warehouse of data packets housed in a central location on a local area network. It contains multiple ports that copy the data in the data packets to make it accessible to selected or all segments of the network.
Information
The result of processing raw data to reveal its meaning. Information consists of transformed data and facilitates decision making.
islands of information
In the old file system environment, pools of independent, often duplicated, and inconsistent data created and managed by different departments.
knowledge
The body of information and facts about a specific subject. Knowledge implies familiarity, awareness, and understanding of information as it applies to an environment. A key characteristic is that new knowledge can be derived from old knowledge.
logical data format
The way a person views data within the context of a problem domain.
metadata
Data about data; that is, data about data characteristics and relationships. See also data dictionary.
multiuser database
A database that supports multiple concurrent users.
noSQL
A new generation of database management systems that is not based on the traditional relational database model.
online analytical processing (OLAP)
Decision support system (DSS) tools that use multidimensional data analysis techniques. OLAP creates an advanced data analysis environment that supports decision making, business modeling, and operations research.
online transaction processing (OLTP) database
See operational database.
Operational Database
A database designed primarily to support a company's day-to-day operations. Also known as a transactional database, OLTP database, or production database.
performance tuning
Activities that make a database perform more efficiently in terms of storage and access speed.
physical data format
The way a computer "sees" (stores) data.
production database
See operational database.
query
A question or task asked by an end user of a database in the form of SQL code. A specific request for data manipulation issued by the end user or the application to the DBMS.
query language
A nonprocedural language that is used by a DBMS to manipulate its data. An example of a query language is SQL.
query result set
The collection of data rows returned by a query.
record
A collection of related (logically connected) fields.
role
n Oracle, a named collection of database access privileges that authorize a user to connect to a database and use its system resources.
router
(1) An intelligent device used to connect dissimilar networks.
(2) Hardware/software equipment that connects multiple and diverse networks.
semistructured data
Data that has already been processed to some extent.
single-user database
A database that supports only one user at a time.
social media
Web and mobile technologies that enable "anywhere, anytime, always on" human interactions.
structural dependence
A data characteristic in which a change in the database schema affects data access, thus requiring changes in all access programs.
structured data
Data that has been formatted to facilitate storage, use, and information generation.
Structured Query Language (SQL)
A powerful and flexible relational database language composed of commands that enable users to create database and table structures, perform various types of data manipulation and data administration, and query the database to extract useful information.
transactional database
See operational database.
unstructured data
Data that exists in its original, raw state; that is, in the format in which it was collected.
website
Refers to the Web server and the collection of Web pages stored on the local hard disk of the server computer.
workgroup database
A multiuser database that usually supports fewer than 50 users or is used for a specific department in an organization.
World Wide Web (WWW or the web)
Worldwide network collection of specially formatted and interconnected documents known as Web pages. Also called the Web.
XML database
A database system that stores and manages semistructured XML data.
3 Vs
Three basic characteristics of Big Data databases: volume, velocity, and variety.
abstract data type (ADT)
Data type that describes a set of similar objects with shared and encapsulated data representation and methods. An abstract data type is generally used to describe complex objects. See also class.
American National Standards Institute (ANSI)
The group that accepted the DBTG recommendations and augmented database standards in 1975 through its SPARC committee.
application programming interface (API)
Software through which programmers interact with middleware. An API allows the use of generic SQL code, thereby allowing client processes to be database server-independent.
attribute
A characteristic of an entity or object. An attribute has a name and a data type.
balancing
Ensuring that the processing load is distributed evenly among multiple servers.
Big Data
A movement to find new and better ways to manage large amounts of web-generated data and derive business insight from it, while simultaneously providing high performance and scalability at a reasonable cost.
Chen notation
See entity relationship (ER) model.
class
A collection of similar objects with shared structure (attributes) and behavior (methods). A class encapsulates an object's data representation and a method's implementation. Classes are organized in a class hierarchy.
class diagram notation
The set of symbols used in the creation of class diagrams in UML object modeling.
class diagrams
A diagram used to represent data and their relationships in UML object notation.
class hierarchy
The organization of classes in a hierarchical tree in which each parent class is a superclass and each child class is a subclass. See also inheritance.
client
Any process that requests specific services from server processes in a client/server environment.
client node
One of three types of nodes used in the Hadoop Distributed File System (HDFS). The client node acts as the interface between the user application and the HDFS. See also name node and data node.
complex object
An object formed by several different objects in complex relationships. See also abstract data types.
conceptual model
The output of the conceptual design process. The conceptual model provides a global view of an entire database and describes the main data objects, avoiding details.
conceptual schema
A representation of the conceptual model, usually expressed graphically. See also conceptual model.
connectivity
The type of relationship between entities. Classifications include 1:1, 1:M, and M:N.
constraint
A restriction placed on data, usually expressed in the form of rules. For example, "A student's GPA must be between 0.00 and 4.00." Constraints are important because they help to ensure data integrity.
Crow's Foot notation
A representation of the entity relationship diagram that uses a three-pronged symbol to represent the "many" sides of the relationship.
data definition language (DDL)
The language that allows a database administrator to define the database structure, schema, and subschema.
data manipulation language
The set of commands that allows an end user to manipulate the data in the database, such as SELECT, INSERT, UPDATE, DELETE, COMMIT, and ROLLBACK.
data model
A representation, usually graphic, of a complex "real-world" data structure. Data models are used in the database design phase of the Database Life Cycle.
data modeling
The process of creating a specific data model for a determined problem domain.
data node
One of three types of nodes used in the Hadoop Distributed File System (HDFS). The data node stores fixed-size data blocks (that could be replicated to other data nodes). See also client node and name node.
entity
A person, place, thing, concept, or event for which data can be stored. See also attribute.
entity instance
A row in a relational table. Also known as entity occurrence.
entity occurrence
A row in a relational table. Also known as entity instance.
entity relationship (ER) model
A data model that describes relationships (1:1, 1:M, and M:N) among entities at the conceptual level with the help of ER diagrams. The model was developed by Peter Chen.
entity relationship diagram (ERD)
A diagram that depicts an entity relationship model's entities, attributes, and relations.
entity set
A collection of like entities.
ERM
A data model that describes relationships (1:1, 1:M, and M:N) among entities at the conceptual level with the help of ER diagrams. The model was developed by Peter Chen.
eventual consistency
A model for database consistency in which updates to the database will propagate through the system so that all data copies will be consistent eventually.
extended relational data model
A model that includes the object-oriented model's best features in an inherently simpler relational database structural environment. See extended entity relationship model (EERM).
Extensible Markup Language (XML)
A meta-language used to represent and manipulate data elements. Unlike other markup languages, XML permits the manipulation of a document's data elements. XML facilitates the exchange of structured documents such as orders and invoices over the Internet.
external model
The application programmer's view of the data environment. Given its business focus, an external model works with a data subset of the global database schema.
external schema
The specific representation of an external view; the end user's view of the data environment.
Hadoop
A Java based, open source, high speed, fault-tolerant distributed storage and computational framework. Hadoop uses low-cost hardware to create clusters of thousands of computer nodes to store and process data.
Hadoop Distributed File System (HDFS)
A highly distributed, fault-tolerant file storage system designed to manage large amounts of data at high speeds.
hardware independence
A condition in which a model does not depend on the hardware used in the model's implementation. Therefore, changes in the hardware will have no effect on the database design at the conceptual level.
hierarchical model
An early database model whose basic concepts and characteristics formed the basis for subsequent database development. This model is based on an upside-down tree structure in which each record is called a segment. The top record is the root segment. Each segment has a 1:M relationship to the segment directly below it.
inheritance
In the object-oriented data model, the ability of an object to inherit the data structure and methods of the classes above it in the class hierarchy. See also class hierarchy.
internal model
In database modeling, a level of data abstraction that adapts the conceptual model to a specific DBMS model for implementation. The internal model is the representation of a database as "seen" by the DBMS. In other words, the internal model requires a designer to match the conceptual model's characteristics and constraints to those of the selected implementation model.
internal schema
A representation of an internal model using the database constructs supported by the chosen database.
Internet
A global network of computers connected together through a standard network protocol known as Transmission Control Protocol/Internet Protocol (TCP/IP). You can think of the Internet as the "highway" on which the data travel. The terms Internet and World Wide Web are often used interchangeably, but they are not synonyms.
key-value
A data model based on a structure composed of two data elements: a key and a value, in which every key has a corresponding value or set of values. The key-value data model is also called the associative or attribute-value data model.
many-to-many (M:N or ..) relationship
Association among two or more entities in which one occurrence of an entity is associated with many occurrences of a related entity and one occurrence of the related entity is associated with many occurrences of the first entity.
MapReduce
An open-source application programming interface (API) that provides fast data analytics services; one of the main Big Data technologies that allows organizations to process massive data stores.
method
In the object-oriented data model, a named set of instructions to perform an action. Methods represent real-world actions, and are invoked through messages.
name node
One of three types of nodes used in the Hadoop Distributed File System (HDFS). The name node stores all the metadata about the file system. See also client node and data node.
network model
An early data model that represented data as a collection of record types in 1:M relationships.
object
An abstract representation of a real world entity that has a unique identity, embedded properties, and the ability to interact with other objects and itself.
object-oriented data model (OODM)
A data model whose basic modeling structure is an object.
object-oriented database management system (OODBMS)
Data management software used to manage data in an object-oriented database model.
object/relational database management system (O/R DBMS)
A DBMS based on the extended relational model (ERDM). The ERDM, championed by many relational database researchers, constitutes the relational model's response to the OODM. This model includes many of the object-oriented model's best features within an inherently simpler relational database structure.
one-to-many (1:M or 1..*) relationship
Associations among two or more entities that are used by data models. In a 1:M relationship, one entity instance is associated with many instances of the related entity.
one-to-one (1:1 or 1..1) relationship
Associations among two or more entities that are used by data models. In a 1:1 relationship, one entity instance is associated with only one instance of the related entity.
relation
A logical construct perceived to be a two dimensional structure composed of intersecting rows (entities) and columns (attributes) that represents an entity set in the relational model.
relational database management system (RDBMS)
A collection of programs that manages a relational database. The RDBMS software translates a user's logical requests (queries) into commands that physically locate and retrieve the requested data.
relational diagram
A graphical representation of a relational database's entities, the attributes within those entities, and the relationships among the entities.
relational model
Developed by E. F. Codd of IBM in 1970, the relational model is based on mathematical set theory and represents data as independent relations. Each relation (table) is conceptually represented as a two dimensional structure of intersecting rows and columns. The relations are related to each other through the sharing of common entity characteristics (values in columns).
relationship
An association between entities.
schema
A logical grouping of database objects, such as tables, indexes, views, and queries, that are related to each other. Usually, a schema belongs to a single user or application.
segment (SEGM)
In the hierarchical data model, the equivalent of a file system's record type.
subclasses
See class hierarchy.
subschema
The portion of the database that interacts with application programs.
superclass
In a class hierarchy, the superclass is the more general classification from which the subclasses inherit data structures and behaviors.
table
A logical construct perceived to be a two dimensional structure composed of intersecting rows (entities) and columns (attributes) that represents an entity set in the relational model.
tuple
In the relational model, a table row.
Unified Modeling Language (UML)
A language based on object-oriented concepts that provides tools such as diagrams and symbols to graphically model a system.
versioning
A property of an OODBMS that allows the database to keep track of the different transformations performed on an object.
associative entity
See composite entity.
bridge entity
See composite entity.
business rule
A description of a policy, procedure, or principle within an organization. For example, a pilot cannot be on duty for more than 10 hours during a 24-hour period, or a professor may teach up to four classes during a semester.
candidate key
A minimal superkey; that is, a key that does not contain a subset of attributes that is itself a superkey. See key.
composite entity
An entity designed to transform an M:N relationship into two 1:M relationships. The composite entity's primary key comprises at least the primary keys of the entities that it connects. Also known as a bridge entity or associative entity. See also linking table.
composite key
A multiple-attribute key.
dependent
An attribute whose value is determined by another attribute.
determination
The role of a key. In the context of a database table, the statement "A determines B" indicates that knowing the value of attribute A means that the value of attribute B can be looked up.
domain
In data modeling, the construct used to organize and describe an attribute's set of possible values.
entity integrity
The property of a relational table that guarantees each entity has a unique value in a primary key and that the key has no null values.
flags
Special codes implemented by designers to trigger a required response, alert end users to specified conditions, or encode values. Flags may be used to prevent nulls by bringing attention to the absence of a value in a table.
foreign key (FK)
An attribute or attributes in one table whose values must match the primary key in another table or whose values must be null. See key.
full functional dependence
A condition in which an attribute is functionally dependent on a composite key but not on any subset of the key.
functional dependence
Within a relation R, an attribute B is functionally dependent on an attribute A if and only if a given value of attribute A determines exactly one value of attribute B. The relationship "B is dependent on A" is equivalent to "A determines B," and is written as A B.
index
An ordered array of index key values and row ID values (pointers). Indexes are generally used to speed up and facilitate data retrieval. Also known as an index key.
index key
See index.
key
One or more attributes that determine other attributes. See also superkey, candidate key, primary key (PK), secondary key, and foreign key.
key attribute
The attributes that form a primary key. See also prime attribute.
linking table
In the relational model, a table that implements an M:M relationship. See also composite entity.
null
The absence of an attribute value. Note that a null is not a blank.
predicate logic
Used extensively in mathematics to provide a framework in which an assertion (statement of fact) can be verified as either true or false.
primary key (PK)
In the relational model, an identifier composed of one or more attributes that uniquely identifies a row. Also, a candidate key selected as a unique entity identifier. See also key.
prime attribute
A key attribute; that is, an attribute that is part of a key or is the whole key. See also key attributes.
referential integrity
A condition by which a dependent table's foreign key must have either a null entry or a matching entry in the related table.
secondary key
A key used strictly for data retrieval purposes. For example, customers are not likely to know their customer number (primary key), but the combination of last name, first name, middle initial, and telephone number will probably match the appropriate table row. See also key.
set theory
A part of mathematical science that deals with sets, or groups of things, and is used as the basis for data manipulation in the relational model.
superkey
An attribute or attributes that uniquely identify each entity in a table. See key.
unique index
An index in which the index key can have only one associated pointer value (row).
binary relationship
An ER term for an association (relationship) between two entities. For example, PROFESSOR teaches CLASS.
cardinality
A property that assigns a specific value to connectivity and expresses the range of allowed entity occurrences associated with a single occurrence of the related entity.
composite attribute
An attribute that can be further subdivided to yield additional attributes. For example, a phone number such as 615-898-2368 may be divided into an area code (615), an exchange number (898), and a four-digit code (2368). Compare to simple attribute.
composite identifier
In ER modeling, a key composed of more than one attribute.
derived attribute
An attribute that does not physically exist within the entity and is derived via an algorithm. For example, the Age attribute might be derived by subtracting the birth date from the current date.
existence-dependent
A property of an entity whose existence depends on one or more other entities. In such an environment, the existence-independent table must be created and loaded first because the existence-dependent key cannot reference a table that does not yet exist.
existence-independent
A property of an entity that can exist apart from one or more related entities. Such a table must be created first when referencing an existence-dependent table.
identifiers
One or more attributes that uniquely identify each entity instance.
iterative process
A process based on repetition of steps and procedures.
mandatory participation
A relationship in which one entity occurrence must have a corresponding occurrence in another entity. For example, an EMPLOYEE works in a DIVISION. (A person cannot be an employee without being assigned to a company's division.)
multivalued attributes
An attribute that can have many values for a single entity occurrence. For example, an EMP_DEGREE attribute might store the string "BBA, MBA, PHD" to indicate three different degrees held.
non-identifying relationship
A relationship in which the primary key of the related entity does not contain a primary key component of the parent entity.
optional attribute
In ER modeling, an attribute that does not require a value; therefore, it can be left empty.
optional participation
In ER modeling, a condition in which one entity occurrence does not require a corresponding entity occurrence in a particular relationship.
participants
An ER term for entities that participate in a relationship. For example, in the relationship "PROFESSOR teaches CLASS," the teaches relationship is based on the participants PROFESSOR and CLASS.
recursive relationship
A relationship found within a single entity type. For example, an EMPLOYEE is married to an EMPLOYEE or a PART is a component of another PART.
regular entity
See strong entity.
relational schema
The organization of a relational database as described by the database administrator.
relationship degree
The number of entities or participants associated with a relationship. A relationship degree can be unary, binary, ternary, or higher.
required attribute
In ER modeling, an attribute that must have a value. In other words, it cannot be left empty.
simple attribute
An attribute that cannot be subdivided into meaningful components. Compare to composite attribute.
single-valued attribute
An attribute that can have only one value.
strong (identifying) relationship
A relationship that occurs when two entities are existence-dependent; from a database design perspective, this relationship exists whenever the primary key of the related entity contains the primary key of the parent entity.
strong entity
An entity that is existence-independent, that is, it can exist apart from all of its related entities. Also called a regular entity.
ternary relationship
An ER term used to describe an association (relationship) between three entities. For example, a DOCTOR prescribes a DRUG for a PATIENT.
unary relationship
An ER term used to describe an association within an entity. For example, an EMPLOYEE might manage another EMPLOYEE.
weak entity
An entity that displays existence dependence and inherits the primary key of its parent entity. For example, a DEPENDENT requires the existence of an EMPLOYEE.
weak relationship
A relationship in which the primary key of the related entity does not contain a primary key component of the parent entity.
atomic attribute
An attribute that cannot be further subdivided to produce meaningful components. For example, a person's last name attribute cannot be meaningfully subdivided.
atomicity
The transaction property that requires all parts of a transaction to be treated as a single, indivisible, logical unit of work. All parts of a transaction must be completed or the entire transaction is aborted.
Boyce-Codd normal form (BCNF)
A special type of third normal form (3NF) in which every determinant is a candidate key. A table in BCNF must be in 3NF. See also determinant.
denormalization
A process by which a table is changed from a higher-level normal form to a lower-level normal form, usually to increase processing speed. Denormalization potentially yields data anomalies.
dependency diagram
A representation of all data dependencies (primary key, partial, or transitive) within a table.
determinant
Any attribute in a specific row whose value directly determines other values in that row. See also Boyce-Codd normal form (BCNF).
first normal form (1NF)
The first stage in the normalization process. It describes a relation depicted in tabular format, with no repeating groups and a primary key identified. All nonkey attributes in the relation are dependent on the primary key.
fourth normal form (4NF)
A table is in 4NF if it is in 3NF and contains no multiple independent sets of multivalued dependencies.
granularity
The level of detail represented by the values stored in a table's row. Data stored at its lowest level of granularity is said to be atomic data.
nonkey attribute
See nonprime attribute.
nonprime attribute
An attribute that is not part of a key.
normalization
A process that assigns attributes to entities so that data redundancies are reduced or eliminated.
partial dependency
A condition in which an attribute is dependent on only a portion (subset) of the primary key.
repeating group
In a relation, a characteristic describing a group of multiple entries of the same type for a single key attribute occurrence. For example, a car can have multiple colors for its top, interior, bottom, trim, and so on.
second normal form (2NF)
The second stage in the normalization process, in which a relation is in 1NF and there are no partial dependencies (dependencies in only part of the primary key).
third normal form (3NF)
A table is in 3NF when it is in 2NF and no nonkey attribute is functionally dependent on another nonkey attribute; that is, it cannot include transitive dependencies.
transitive dependency
A condition in which an attribute is dependent on another attribute that is not part of the primary key.
alias
An alternative name for a column or table in a SQL statement.
ALTER TABLE
The SQL command used to make changes to table structure. When the command is followed by a keyword (ADD or MODIFY), it adds a column or changes column characteristics.
AND
The SQL logical operator used to link multiple conditional expressions in a WHERE or HAVING clause. It requires that all conditional expressions evaluate to true.
AVG
A SQL aggregate function that outputs the mean average for a specified column or expression.
BETWEEN
In SQL, a special comparison operator used to check whether a value is within a range of specified values.
Boolean algebra
A branch of mathematics that uses the logical operators OR, AND, and NOT.
cascading order sequence
A nested ordering sequence for a set of rows, such as a list in which all last names are alphabetically ordered and, within the last names, all first names are ordered.
COMMIT
The SQL command that permanently writes data changes to a database.
COUNT
A SQL aggregate function that outputs the number of rows containing not null values for a given column or expression, sometimes used in conjunction with the DISTINCT clause.
CREATE INDEX
A SQL command that creates indexes on the basis of a selected attribute or attributes.
CREATE TABLE
A SQL command that creates a table's structures using the characteristics and attributes given.
DELETE
A SQL command that allows data rows to be deleted from a table.
DISTINCT
A SQL clause that produces only a list of values that are different from one another.
DROP INDEX
A SQL command used to delete database objects such as tables, views, indexes, and users.
DROP TABLE
A SQL command used to delete database objects such as tables, views, indexes, and users.
EXISTS
In SQL, a comparison operator that checks whether a subquery returns any rows.
FROM
A SQL clause that specifies the table or tables from which data is to be retrieved.
GROUP BY
A SQL clause used to create frequency distributions when combined with any of the aggregate functions in a SELECT statement.
HAVING
A clause applied to the output of a GROUP BY operation to restrict selected rows.
IN
In SQL, a comparison operator used to check whether a value is among a list of specified values.
inner query
A query that is embedded or nested inside another query. Also known as a nested query or a subquery.
INSERT
A SQL command that allows the insertion of one or more data rows into a table.
International Organization for Standardization (ISO)
An organization formed to develop standards for diverse network systems.
IS NULL
n SQL, a comparison operator used to check whether an attribute has a value.
LIKE
In SQL, a comparison operator used to check whether an attribute's text value matches a specified string pattern.
MAX
A SQL aggregate function that yields the maximum attribute value in a given column.
MIN
A SQL aggregate function that yields the minimum attribute value in a given column.
nested query
In SQL, a query that is embedded in another query. See subquery.
NOT
A SQL logical operator that negates a given predicate.
OR
The SQL logical operator used to link multiple conditional expressions in a WHERE or HAVING clause. It requires only one of the conditional expressions to be true.
ORDER BY
A SQL clause that is useful for ordering the output of a SELECT query (for example, in ascending or descending order).
ROLLBACK
A SQL command that restores the database table contents to the condition that existed after the last COMMIT statement.
rules of precedence
Basic algebraic rules that specify the order in which operations are performed. For example, operations within parentheses are executed first, so in the equation 2 + (3 5), the multiplication portion is calculated first, making the correct answer 17.
subquery
A query that is embedded (or nested) inside another query. Also known as a nested query or an inner query.
SUM
A SQL aggregate function that yields the sum of all values for a given column or expression.
UPDATE
A SQL command that allows attribute values to be changed in one or more rows of a table.
WHERE
A SQL clause that adds conditional restrictions to a SELECT statement that limit the rows returned by the query.
wildcard character
A symbol that can be used as a general substitute for: (1) all columns in a table (*) when used in an attribute list of a SELECT statement or, (2) zero or more characters in a SQL LIKE clause condition ( % and _ ).
Business intellegence (BI)
A comprehensive, cohesive, and integrated set of tools and processes used to capture, collect, integrate, store, and analyze data with the purpose of generating and presenting information to support business decision making.
cube cache
In multidimensional OLAP, the shared, reserved memory area where data cubes are held. Using the cube cache assists in speeding up data access.
dashboards
In business intelligence, a web-based system that presents key business performance indicators or information in a single, integrated view with clear and concise graphics.
data cube
The multidimensional data structure used to store and manipulate data in a multidimensional DBMS. The location of each data value in the data cube is based on its x-, y-, and z-axes. Data cubes are static, meaning they must be created before they are used, so they cannot be created by an ad hoc query.
data mart
A small, single-subject data warehouse subset that provides decision support to a small group of people.
data visualization
Abstracting data to provide information in a visual format that enhances the user's ability to effectively comprehend the meaning of the data.
data warehouse
An integrated, subject-oriented, time-variant, nonvolatile collection of data in a specialized database that stores historical and aggregated data in a format that provides support for decision making.
decision support system (DSS)
An arrangement of computerized tools used to assist managerial decision making within a business.
dimension tables
In a data warehouse, tables used to search, filter, or classify facts within a star schema.
drill down
To decompose data into more atomic components—that is, data at lower levels of aggregation. This approach is used primarily in a decision support system to focus on specific geographic areas, business types, and so on.
extraction, transformation, and loading (ETL)
In a data warehousing environment, the integrated processes of getting data from original sources into the data warehouse. ETL includes retrieving data from original data sources (extraction), manipulating the data into an appropriate form (transformation), and storing the data in the data warehouse (loading).
governance
In business intelligence, the methods for controlling and monitoring business health and promoting consistent decision making.
key performance indicators (KPIs)
In business intelligence, quantifiable numeric or scale-based measurements that assess a company's effectiveness or success in reaching strategic and operational goals. Examples of KPIs are product turnovers, sales by promotion, sales by employee, and earnings per share.
master data management (MDM)
In business intelligence, a collection of concepts, techniques, and processes for the proper identification, definition, and management of data elements within an organization.
multidimensional database management systems (MDBMSs)
A database management system that uses proprietary techniques to store data in matrixlike arrays of n dimensions known as cubes.
multidimensional online analytical processing (MOLAP)
An extension of online analytical processing to multidimensional database management systems.
portals
In terms of business intelligence, a unified, single point of entry for information distribution.
relational online analytical processing (ROLAP)
Analytical processing functions that use relational databases and familiar relational query tools to store and analyze multidimensional data.
roll up
(1) To aggregate data into summarized components, that is, higher levels of aggregation. (2) In SQL, an OLAP extension used with the GROUP BY clause to aggregate data by different dimensions. Rolling up the data is the exact opposite of drilling down the data.
slice and dice
The ability to focus on slices of a data cube (drill down or roll up) to perform a more detailed analysis.
sparsity
In multidimensional data analysis, a measurement of the data density held in the data cube.
very large databases (VLDBs)
Database that contains huge amounts of data—gigabyte, terabyte, and petabyte ranges are not unusual.
Clustering
It is the task of taking a large collection of entities and dividing that collection into smaller groups of entities that exhibit some similarity.
Estimation
It is a process of assigning some continuously valued numeric value to an object.
Affinity grouping
is a process of evaluating relationships or associations between data elements that demonstrate some kind of affinity between objects.
Description
It is the process of trying to characterize what has been discovered or trying to explain the results of the data mining process.
Prediction
The subtle difference between prediction and the previous two tasks is that prediction is the attempt to classify objects according to some expected future behavior. Classification and estimation can be used for the purposes of prediction by using historical data, where the classification is already known, to build a model (this is called training). That model can then be applied to new data to predict future behavior.

You must be careful when using training sets for prediction. There may be a risk of an inherent bias in the data that may lead you to draw inferences or conclusions that are relevant in relation to the bias. Use different data sets for training and test, test, test!
The difference between clustering and classification
is that during the clustering task, the classes are not defined beforehand.
ETL Stages - Extraction
What data should be extracted?
How should that data be extracted?
ETL Stages - Transformation
Data type conversion
Data cleansing.
Integration.
Referential integrity checking.
Derivations
Denormalization and renormalization.
Aggregation
Audit information.
Null conversion.
ETL Stages - Loading
Target dependencies
Refresh volume and frequency
Fundamental aspects of a data warehouse
A data warehouse is a centralized repository of information.
A data warehouse is organized around the relevant subject areas important to the organization.
A data warehouse provides a platform for different consumers (both human and automated) to submit queries about enterprise information.
A data warehouse is used for analysis and not for transaction processing.
The data in a data warehouse is nonvolatile.
A data warehouse is the target location for integrating data from multiple sources, both internal and external to an enterprise.
data warehouse
is the primary source of information that feeds the analytical processing within an organization.
Using the Dimensional Model for Business Intelligence
Simplicity
Lack of bias.
Extensibility
dimensional model
standard for representing and managing data in a data warehouse
OLAP cube
environment provides an aggregate view of data variables across the dimensions across each dimension's hierarchy.
Why a Business Intelligence Program?
Financial value associated with increased profitability, whether derived from lowered costs or increased revenues;

Productivity value associated with increased throughput with decreased workloads, diminished time for executing end-to-end processes (such as manufacturing or
operational workflows), and increasing the percentage of high quality products or outcomes;

Trust value, such as greater customer, employee, or supplier satisfaction, as well as increasing confidence in forecasting, maintaining consistent operational and
management reports, reductions in time spent in "analysis paralysis," and better results from decisions; and

Risk value associated with improved visibility into credit exposure, confidence in
capital and asset investments, and auditable compliance with jurisdictional and industry standards and regulations.
Data
Facts gathered together for analysis
Flat Files
A file having no internal hierarchy
Hashed Files
A file that has been encrypted for security purposes
Heap Files
An unsorted set of records
Information
The transformation of raw data into useful facts
Structured Data
Information with a high degree of organization
Punch Card
A card that is perforated and can hold commands or data
Unstructured Data
Information that does not have structure (such as text)
data retrieved before database management systems
Sequentially from simple files
unique identifier
primary key
intersection data
describes the relationship between the two entities(Quantity)
entity
is an object or event in our environment that we want to keep track of. A person is an entity. So is a building, a piece of inventory sitting on a shelf, a finished product ready for sale, and a sales meeting (an event).
attribute
is a property or characteristic of an entity. Examples of attributes include an employee's employee number, the weight of an automobile, a company's address, or the date of a sales meeting.
modalities
minimums
cardinalities
maximums
referential integrity
problem because it revolves around the circumstance of trying to refer to data in one relation in the database, based on values in another relation.
referential integrity problems can surface in any of the three operations
that result in changes to the database—insert, delete, and update records
factors that lead to redundant data across multiple files:
Data was stored in different formats in different files.

Data was often not shared among different programs that needed it, necessitating the duplication of data in redundant files.

Little was understood about file design, resulting in redundant data within individual files.

Files often could not be rebuilt after damage by a software error or a hardware failure.

Programs were usually written in such a manner that if the way that the data was stored changed, the program had to be modified to continue working.

Changes in everything from access methods to tax tables required programming changes.
index records in the tree
The index records contain salesperson number key values copied from the salesperson records.
Each key value in the tree is associated with a pointer that is the address of either a lower-level index record or a cylinder containing the salesperson records.
Each index record, at every level of the tree, contains space for the same number of key value/pointer pairs (four in this example). This index record capacity is arbitrary, but once it is set, it must be the same for every index record at every level of the index.
Each index record is at least half full (in this example each record actually contains at least two key value/pointer pairs).
Column also called
Field or Attribute
Row also called
Record or Tuple
Table also called
File or Relation
Candidate key
If a relation has more than one attribute or minimum group of attributes that represents a way of uniquely identifying the entities, then they are each called a candidate key.
Foreign key
(Not Unique identifer)If, in a collection of relations that make up a relational database, an attribute or group of attributes serves as the primary key of one relation and also appears in another relation, then it is called a foreign key in that other relation.
Primary key
It is an attribute or group of attributes whose values are unique throughout all **rows** of the re
Determinants
The attribute (or attributes)that we use as the starting point (the variable on the left side of the question) is called a determinant


Example
(cookiePrice, Qty) -------> BoxPrice
Determinant= cookiePrice
Functional Dependency
A relationship between attributes in which one attribute (or group of attributes) determines the value of another attribute in the same table

Example

The price of one delicious Girl Scout cookie can determine the price of a box of 12 cookies

(CookiePrice, Qty) -------> BoxPrice

Functional Dependency = BoxPrice
Associative Entity
An associative entity is an element of the entity-relationship model. All relationships for the associative entity should be many.
Attribute
An attribute is a property or characteristic of an entity.
Binary relationship
A binary relationship is a relationship between two entity types.
Candidate Key
Is any column or a combination of columns that can qualify as unique key in database.
Cardinality
Cardinality represents the maximum number of entities that can be involved in a particular relationship.
Cartesian product
Usually the result of a missing join condition or a method of expanding the data of 1 table by the number of rows in the second table.
Cascade Delete
Will delete all records that reference the primary key
column subquery
Returns a single column of one or more values.
data encryption
When data is encrypted, it is changed, bit by bit or character by character, into a form that looks totally garbled. It can and must be reconverted, or decrypted, back to its original form to be of use.
Data normalization
Is a methodology for organizing attributes into tables so that redundancy among the non-key attributes is eliminated.
Data volatility
Describes how often stored data is updated.
Data Volume Assessment
Understanding of how much data will be in a database or a table within a database
Database
A database is a collection of information that is organized so that it can easily be accessed, managed, and updated.
DCL
Data control language is used to control access to data stored in a database.
DDL
Data definition language - involves instructing the DBMS software on what tables will be in the database, what attributes will be in the tables, which attributes will be indexed, and so forth.
Definer
Definer is a MySQL term where AuthID is the same for another DBMS
Denormalization
The act of duplicating data in a database for performance or the preservation of historical information
disaster recovery
Involves rebuilding an entire information system or significant parts of one after a catastrophic natural disaster such as a hurricane, tornado, earthquake, building collapse, or even a major fire.
DML
Data manipulation language refers to the four basic operations that can and must be performed on data stored in any DBMS: data retrieval, data update, insertion of new records, and deletion of existing records.
E-R model
An E-R model is a data model for describing a database in an abstract way.
embedded mode
The SELECT command is embedded within the lines of a higher-level language program and functions as an input or "read" statement for the program.
Entity
An entity is an object or event in our environment that we want to keep track of.
Foreign Key
Is a field (or collection of fields) in one table that uniquely identifies a row of another table.
Index
Used to help a DBMS find data quickly
Inner Join
Shows row that have matches in both tables
Intersection Data
Intersection Data associated with the concatenation of two segments.
Join
Joins 2 tables together
logical view
Is a mapping onto a physical table or tables that allows an end user to access only a specified portion of data.
Modality
Modality represents the minimum number of entity occurrences that can be involved in a relationship.
outer join
Shows rows in one table that have no match in the other table. Two kinds of outer joins are left and right joins.
Primary Key
Uniquely identifies each record in the table.
query mode
The command goes directly to the relational DBMS, which evaluates the query and processes it against the database.
Referential Integrity
Referential integrity is a database concept that ensures that relationships between tables remain consistent.
referential integrity
Enforces rules to guarantee that the foreign key relationship stays intact with no mismatches.
Response time
Is the delay from the time that the Enter Key is pressed to execute a query until the result appears on screen.
Restrict Delete
Will not allow deletes if the primary key is referenced
row subquery
Returns a single row of one or more values.
scalar subquery
is the most restrictive subquery because it produces only a single value
Set-to-Null on Delete
Will set values to null when primary key is deleted
SQL
Is a comprehensive database management language which incorporates DML and DDL
subquery
One SELECT statement is "nested" within another.
table subquery
Returns a table of one or more rows of one or more columns.
Ternary Relationship
A ternary relationship is when three entities participate in the relationship.
Unary Relationships
Unary relationships associate occurrences of an entity type with other occurrences of the same entity type.
One-to-One Binary Relationship
It means that a single occurrence of one entity type can be associated with a single occurrence of the other entity type and vice versa.
Throughput
Is the measure of how many queries from simultaneous users must be satisfied in a given period of time by the application set and the database that it supports.
UNION
To create a result set that combines the results from several queries
Unique Identifier
A unique identifier is any identifier which is guaranteed to be unique among all identifiers used for those objects and for a specific purpose.
Association Rules
Descriptive - discovers links or associations amongst data
Business Intelligence
The transformation of raw data into useful information.
Classification
Predictive - maps data into predefined groups or classes
Clustering
Descriptive - groups similar data together into clusters
Extraction/Transformation/Loading
ETL is the process of extracting raw data and then transforming and loading into a target to be used with Business intelligence.
Regression
Predictive - used to map a data item to a real valued prediction variable
Sequence Discovery
Descriptive - discovers sequential patterns
Summarization Rules
Descriptive - maps data into subsets with associated simple descriptions or generalizations.
Time Series Analysis
Predictive - analysis information over time to predict future data
* in the SELECT clause
It indicates that all attributes of the selected row are to be retrieved.
AND operator
It displays a record if more than one condition is true.
AVG() function
It returns the average value of a numeric column.
BETWEEN operator
It allows you to specify a range of numeric values in a search.
Data definition
It is operationalized with a data definition language (DDL), involves instructing the DBMS software on what tables will be in the database, what attributes will be in the tables, which attributes will be indexed, and so forth.
Data management
There are two aspects of data management: data definition and data manipulation.
DISTINCT operator
It is used to eliminate duplicate rows in a query result.
IN operator
It allows you to specify a list of character strings to be included in a search.
JOIN clause
It is used to combine rows from more than one table, based on a common field between them.
LIKE operator
It allows you to specify partial character strings in a "wildcard" sense.
OR operator
It displays a record if either the first condition OR the second condition is true.
ORDER BY clause
It simply takes the results of a SQL query and orders them by one or more specified attributes.
SELECT command
Data retrieval in SQL is accomplished with the SELECT command.
Structured Query Language
It is a standard language for data management in relational databases, known as Structured Query Language or SQL.
Subquery
When one SELECT statement is "nested" within another in a format, it is known as subquery.
Data
Facts gathered together for analysis
Flat Files
A file having no internal hierarchy
Hashed Files
A file that has been encrypted for security purposes
Heap Files
An unsorted set of records
Information
The transformation of raw data into useful facts
Punch Card
A card that is perforated and can hold commands or data
Structured Data
Information with a high degree of organization
Unstructured Data
Information that does not have structure (such as text)
ETL process
Get the data from the source location.
Map the data from its original form into a data model that is suitable for manipulation at the staging area.
Validate and clean the data.
Apply any transformations to the data that are required before the data sets are loaded into the repository.
Map the data from its staging area model to its loading model.
Move the data set to the repository.
Load the data into the warehouse.
Estimation
It is a process of assigning some continuously valued numeric value to an object.
Description
It is the process of trying to characterize what has been discovered or trying to explain the results of the data mining process.
Affinity grouping
It is a process of evaluating relationships or associations between data elements.
Flat Files
a file having no internal hierarchy
Hashed Files
A file that has been encrypted for security purposes.
Heap File
An unsorted set of records.
Information
The transformation of raw data into useful facts.
Punch Card
A card that is perforated and can hold commands or data.
Structured Data
Information with a high degree of organization.
Unstructured Data
Information that does not have structure (such as text)
Binary Relationship
It is a relationship between two entity types.
Unary Relationship
It is an associate occurrence of an entity type with other occurrences of the same entity type.
Cardinality
It is the maximum number of entities that can be involved in a particular relationship.
E-R Model
It is well named, as it diagrams entities (together with their attributes) and the relationship among them.
Intersection Data
It is data that describes a many-to-many relationship.
Modality
It is a minimum number of entity occurrences that can be involved in a relationship.
One-to-one Binary Relationship
It means that a single occurrence of one entity type can be associated with a single occurrence of the other entity type and vice versa.
Ternary Relationship
Involves three different entity types.
Unique identifier
It is used to uniquely identify each record in a database table.
Attribute
- Property, characteristic, or fact that we know about an entity.
Describe the ER model for "A salesperson works in one office."
- What is the name of this relationship?
- One-to-one binary relationship
Describe the ER model for "A salesperson sells to many customers."
- What is the name of this relationship?
- One-to-many binary relationship
Describe the ER model for "(L to R) A salesperson is authorized to sell many products; (R to L) a product can be sold by many salespersons."
- What is the name of this relationship?
- Many-to-many binary relationship
What is the positioning and meaning for Cardinality and Modality on an ER model?
- Cardinality is the outer symbol; represents the maximum.
- Modality is the inner symbol; represents the minimum.
Describe the ER model for "(L to R) A salesperson works in a minimum of one and a maximum of one office, or exactly one office." "(R to L) An office may be occupied by or assigned to a minimum of no salespersons or a maximum of one salesperson."
Describe the ER model for "A salesperson may have no customers or many customers."
Describe the ER model for "(L to R) Each salesperson is authorized to sell to at least one or many of our products."and "(R to L) Each product can be sold by at least one or many of our salespersons."
Describe the ER model for "One salesperson backs-up another salesperson."
- What is the name of this model?
- One-to-one unary relationship
Describe the ER model for "(L to R) A salesperson manages zero to many other salespersons." and "(R to L) A salesperson is managed by exactly one other salesperson."
- What is the name of this model?
- One-to-many unary relationship
Describe the ER model for "(L to R) A product can be part of no other products or part of several other products." and "(R to L) A product can be composed of no other products or be composed of several other products."
- What is the name of this model?
- Many-to-many unary relationship
What does 'refer' in Referential Integrity imply?
- Referential integrity revolves around the circumstance of trying to refer to data in one relation in the database, based on values in another relation.
Define the delete rule RESTRICT...
- If the delete rule between two relations is RESTRICT and an attempt is made to delete a record on the "one side" of the one-to-many relationship, the system will forbid the delete to take place if there are any matching foreign key values in the relation on the "many side".
Define the delete rule CASCADE...
- If the delete rule between two relations is CASCADE and an attempt is made to delete a record on the "one side" of the relationship, not only will the record be deleted but all of the records on the "many side" of the relationship that have a matching foreign key value will also be deleted.
- In other words, the delete will "cascade" from one relation to the other.
Define the delete rule SET-TO-NULL...
- If the delete rule between the two relations is SET-TO-NULL and an attempt is made to delete a record on the "one side" of the one-to-many relationship, that record will be deleted and the matching foreign key values in the records on the "many side" of the relationship will be set to null.
Which entity is uniquely identified by concatenating the primary keys of the two entities it connects?
Associative entity
Which type of entity is also called a dependent entity?
Weak entity
Candidate Key
- If a relation has more than one attribute or minimum group of attributes that represents a way of uniquely identifying the entities, then they are each called a candidate key.
Concurrency Problem
- When two or more users are trying to update the same record simultaneously.
Equijoin
Combines two or more tables based on a column that is common to the tables.
Foreign Key
- When an attribute or group of attributes serves as the primary key of one relation and also appears in another relation.
Natural Join
- Matches each row in a table against each row in another table based on common values found in columns sharing a common name and data type.
Tuple
- Rows are referred to as tuples.
What are the five basic principles of The Database Concept?
1. The creation of a datacentric environment that is a significant company resource, which can be shared inside and outside the company.
2. The ability to achieve data integration while storing data in a non-redundant fashion.
3. The ability to store data representing entities involved in multiple relationships w/o introducing data redundancy.
4. Managing data control issues such as data security, backup and recovery, and concurrency control.
5. High degree of data independence.
What are the four major DBMS approaches?
- Hierarchical
- Network
- Relational
- Object-oriented
What are four key differences between a RELATION and a FILE?
- The columns of a relation can be arranged in any order w/o affecting the meaning of the data. That is not true of a file.
- Similarly, the rows of a relation can be arranged in any order, which is not true of a file.
- Every row/column position, sometimes referred to as a "cell", can have only a single value, which is not necessarily true in a file.
- No two rows of a relation are identical, which is not necessarily true in a file.
* in the SELECT clause
- It indicates that all attributes of the selected row are to be retrieved
AND operator
- It displays a record if more than one condition is true
AVG() function
- It returns the average value of a numeric column.
BETWEEN operator
- It allows you to specify a range of numeric values in a search.
DISTINCT operator
- It is used to eliminate duplicate rows in a query result.
IN operator
- It allows you to specify a list of character strings to be included in a search
JOIN clause
- It is used to combine rows from more than one table, based on a common field between them.
LIKE operator
- It allows you to specify partial character strings in a "wildcard" sense.
OR operator
- It displays a record it either the first condition OR the second condition is true.
ORDER BY clause
- It simply takes the result of a SQL query and orders them by one or more specified attributes.
SELECT command
- Data retrieval in SQL is accomplished with the SELECT command.
Subquery
- When on SELECT statement is "nested" within another in a format, it is known as subquery.
Common DDL commands:
- CREATE
- ALTER
- DROP
- TRUNCATE
- RENAME
Common DML commands:
- SELECT
- INSERT
- UPDATE
- DELETE
- MERGE
Write the basic SQL query command:
SELECT<columns>
FROM<table>
WHERE<predicates identifying rows to be included>
Write the SQL query to "Find the commission percentage and year of hire of salesperson 186":
SELECT COMMPERCT, YEARHIRE
FROM SALESPERSON
WHERE SPNUM=186;
Write the SQL query to "Retrieve the entire record for salesperson 186":
SELECT *
FROM SALESPERSON
WHERE SPNUM=186;
Write the SQL query to "List the salesperson numbers and salesperson names of those salespersons who have a commission percentage of 10.":
SELECT SPNUM, SPNAME
FROM SALESPERSON
WHERE COMMPERCT=10;
Write the SQL query to "List the salesperson numbers, salesperson names, and commission percentages of the salespersons whose commission percentage is less than 12.":
SELECT SPNUM, SPNAME, COMMPERCT
FROM SALESPERSON
WHERE COMMPERCT<12;
Write the SQL query to "List the customer numbers and headquarters cities of all customers that have a customer number of at least 1700":
SELECT CUSTNUM, HQCITY
FROM CUSTOMER
WHERE CUSTNUM>=1700;
Write the SQL query to "List the customer numbers, customer names, and headquarters cities of the customers that are headquartered in New York and that have a customer number higher than 1500":
SELECT CUSTNUM, CUSTNAME, HQCITY
FROM CUSTOMER
WHERE HQCITY='New York'
AND CUSTNUM>1500;
Write the SQL query to "List the customer numbers, customer names, and headquarters cities of the customers that are headquartered in New York OR that have customer numbers higher than 1500":
SELECT CUSTNUM, CUSTNAME, HQCITY
FROM CUSTOMER
WHERE HQCITY='New York'
OR CUSTNUM>1500;
Write the SQL query to "List the customers, customer names, and headquarters cities of the customers that are headquartered in New York or that satisfy the two conditions of having a customer number higher than 1500 and being headquartered in Atlanta":
SELECT CUSTNUM, CUSTNAME, HQCITY
FROM CUSTOMER
WHERE HQCITY='New York'
OR (CUSTNUM>1500
AND HQCITY='Atlanta');
Write the SQL query to "List the customer records for those customers whose names begin with the letter 'A' ":
SELECT *
FROM CUSTOMER
WHERE CUSTNAME LIKE 'A%';
Write the SQL query to "Find the customer numbers, customer names, and headquarters cities of those customers with the customer numbers greater than 1000. List the results in alphabetic order by headquarters cities (and have the customer names within the same city alphabetized)":
SELECT CUSTNUM, CUSTNAME, HQCITY
FROM CUSTOMER
WHERE CUSTNUM>1000
ORDER BY HQCITY, CUSTNAME;
Write the SQL query to "Find the average number of units of the different products that Salesperson 137 has sold":
SELECT AVG(QUANTITY)
FROM SALES
WHERE SPNUM=137;
Write the SQL query to "Find the total number of units of all products that Salesperson 137 has sold":
SELECT SUM(QUANTITY)
FROM SALES
WHERE SPNUM=137;
Write the SQL query to "Find the name of the salesperson responsible for Customer Number 1525":
SELECT SPNAME
FROM SALESPERSON, CUSTOMER
WHERE SALESPERSON.SPNUM=CUSTOMER.SPNUM
AND CUSTNUM=1525;
Write the SQL query to "List the NAMES of the products of which salesperson Adams has sold more than 2000 units":
SELECT PRODNAME
FROM SALESPERSON, PRODUCT, SALES
WHERE SALESPERSON.SPNUM=SALES.SPNUM
AND SALES.PRODNUM=PRODUCT.PRODNUM
AND SPNAME='Adams'
AND QUANTITY>2000;
CREATE TABLE command
- The command that creates base tables and tells the system what attributes will be in them.
CREATE VIEW command
- Specifies the base tables on which the view is to be based and the attributes and rows of the table that are to be included in the view.
DELETE command
- Specify which row(s) of a table are to be deleted based on data values within those rows.
DROP TABLE command
- Discards a table from a database.
DROP VIEW command
- Discards views.
Normalization
- The process of organizing the fields and tables of a relational database to minimize redundancy (duplication) and dependency.
Second Normal Form
- The second step in normalizing a database.
- Builds on the first normal form.
- All non-key attributes must be fully functionally dependent on the entire key of that table.
Third Normal Form
- Non-key attributes are not allowed to define other non-key attributes.
What are three important point about Third Normal Form?
1. It is completely free of redundancy
2. All foreign keys appear where needed to logically tie together related tables.
3. It is the same structure that would have been derived from a properly drawn entity-relationship diagram of the same business environment.
Write the SQL query to "Add a new salesperson into the SALESPERSON table whose salesperson number is 489, name is Quinlan, commission percentage is 15, year of hire is 2011, and department number is 59.":
INSERT INTO SALESPERSON
VALUES
('489','Quinlan',15,'2011','59');
Write the SQL query to "Delete the row for salesperson 186 from the SALESPERSON table.":
DELETE FROM SALESPERSON
WHERE SPNUM = '186';
What is the correct syntax of the INSERT command?
INSERT INTO table_name VALUES (value1,value2,value3,...):
What is the correct syntax of the CREATE VIEW command?
CREATE VIEW view_name AS
SELECT column_name(s)
FROM table_name
WHERE condition
What is called a decomposition process?
Data normalization
In which of the normal forms should every non-key attribute be fully functionally dependent on the entire key of a table?
Second form
What is the correct syntax of the CREATE TABLE command?
CREATE TABLE table_name (column_name1 data_type(size), column_name2 data_type(size),...);
What is the correct syntax of the UPDATE command?
UPDATE table_name
SET column1=value1,column2=value2,...
WHERE some_column=some_value;
Association Rules
Descriptive; discovers links or associations amongst data.
Business Intelligence
The transformation of raw data into useful information.
Classification
Predictive; maps data into predefined groups or classes.
Clustering
Descriptive; groups similar data together into clusters.
Regression
Predictive; used to map a data item to a real valued prediction variable.
Sequence Discovery
Descriptive; discovers sequential patterns.
Summarization Rules
Descriptive; maps data into subsets with associated simple descriptions or generalizations.
What are the values of Business Intelligence?
- Financial value associated w/ increased profitability.
- Productivity value associated with increased throughput.
- Trust value (customer, employee, supplier satisfaction) as well as increased confidence in forecasting.
- Risk value
What are the reasons for using the Dimensional Model for BI?
- Simplicity.
- Lack of bias.
- Extensibility.
What are the fundamental aspects of a Data Warehouse?
- Centralized repository of information.
- Organized around relevant subject areas.
- Provides platform for queries.
- Used for analysis and not transactional processing.
- Data is nonvolatile.
- Target location for integrating data from multiple sources.
What is the general theme of the ETL process?
1. Get the data
2. Map the data to staging area
3. Validate and clean the data
4. Apply necessary transformations
5. Map data to loading model
6. Move data to repository
7. Load data to warehouse
What is the key factor based on the need for linear scalability?
Performance
What is used for populating summaries or any cube dimensions that can be performed at the staging area (ETL)?
Aggregation
What data mining activity is a process of assigning some continuously valued numeric value to an object?
Estimation
What includes exploiting the discovery of table and foreign keys for representing linkage between different tables?
Integration
What data mining activity is the process of organizing data into predefined classes?
Classification
What activity groups data members that are similar?
Clustering
What are the limitations of Software RAID?
- Performance: affects overall system performance.
- Supported features: does not support all RAID levels.
- O/S compatibility: inflexibility in the data-processing environment.
What is Controller Card RAID? Efficient or inefficient?
- Host-based RAID implementation in which a specialized RAID controller is installed in the host, and disk drives are connected to it.
- Not an efficient solution in a data center environment with a large number of hosts.
What are the three RAID techniques?
- striping: spreading of data across multiple drives to use the drives in parallel.
- mirroring: the same data is stored on two different disk drives, yielding two copies of the data.
- parity: redundancy technique that ensures protection of data w/o maintaining a full set of duplicate data.
RAID 0
- configuration uses data striping
- data is striped across all disks within the RAID set
- uses the full storage capacity of a RAID set
- good option for applications that need high i/o throughput
RAID 1
- based on the mirroring technique
- consists of two drives and every write is written to both drives
- good option for applications that require high availability and cost is no constraint
Nested RAID
- RAID 1+0 or RAID 0+1
- combine the performance benefits of RAID 0 with the redundancy benefits of RAID 1
- requires even number of disks with the minimum being 4 disks
RAID 3
- stripes data for performance and uses parity for fault tolerance
- good option for applications that involve large sequential data access, such as data backup or video streaming
RAID 4
- similar to RAID 3, but data is not striped on the parity disk
- good read throughput and reasonable write throughput
RAID 5
- versatile
- similar to RAID 4, but the parity is distributed across all disks to overcome the write bottleneck in RAID 4
- good for read-intensive i/o applications, preferred for messaging, data mining, and relational database management systems.
What are the four key components of an intelligent storage system?
- front end
- cache
- back end
- physical disks
Front-end controller
- routes data to and from cache via the internal data bus
Cache
- semiconductor memory where data is placed temporarily to reduce the time required to service i/o requests from the host.
active-active array
- high-end storage system
- large number of controllers and cache memory
- large storage capacity
- large amounts of cache
- fault tolerance
- ability to handle large amounts of concurrent i/os
active-passive arrays
- aka midrange storage system
- provide optimal storage solutions at a lower cost
- less storage capacity and cache than high-end storage
- fewer front-end ports
- ensure high redundancy and high performance for applications with predictable workloads
What are characteristics of cloud computing?
- on-demand self-service
- broad network access
- resource pooling
- rapid elasticity
- measured service
Benefits of cloud computing?
- reduced it cost
- business agility
- flexible scaling
- high availability
What are the layers of the cloud computing infrastructure?
- physical
- virtual
- applications and platform software
- cloud management and service creation tools
YOU MIGHT ALSO LIKE...
STUDY GUIDE