
What is the SQL standard?
An agreed-upon specification of the SQL language (published by ANSI/ISO) that vendors implement
SQL Standard purposes
-Specify syntax/semantics for data definition and manipulation
-Define data structures and basic operations
-Enable portability of database definition and application modules
-Specify minimal (level 1) and complete (level 2) standards
-Allow for later growth/enhancement to standards
Advantages of SQL Standard
-Reduced training costs
-Productivity
-Application portability
-Application longevity
-Reduced dependence on a single vendor
-Cross-system communication
Disadvantages of SQL Standard
-May stifle creativity and innovation
-One standard is never enough to meet all needs
-Fixing deficiencies may take considerable effort
-Loss of advantages such as application portability when vendor extensions are used
RDBMS
A database management system that manages data as a collection of tables in which all data relationships are represented by common values in related tables.
SQL environment
Catalog, Schema, DDL, DML, DCL

An SQL environment includes an instance of an SQL database management system along with the databases accessible by that DBMS and the users and programs that may use that DBMS to access the database.
Schema
Structure that contains descriptions of objects created by a user (base tables, views, constraints)
Catalog
A set of schemas that, when put together, constitute a description of a database
Comparison Operators
=  | Equal to
>  | Greater than
<  | Less than
>= | Greater than or equal to
<= | Less than or equal to
<> | Not equal to
!= | Not equal to
!< | Not less than
!> | Not greater than
How to use Comparison Operators
Used in the WHERE clause; character values in comparisons are enclosed in single quotes (e.g., WHERE finish = 'Oak')
Boolean /logical Operators
WHERE.... (AND, OR, NOT)
Evaluation order for Boolean operators
NOT -> AND -> OR
To evaluate conditions in a desired order rather than the prescribed order,
Use parentheses
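The precedence rule above can be seen in a runnable sketch; the product table and its rows are hypothetical, and Python's sqlite3 stands in for an SQL session:

```python
import sqlite3

# Hypothetical product table to illustrate NOT -> AND -> OR precedence.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (name TEXT, finish TEXT, price REAL)")
conn.executemany("INSERT INTO product VALUES (?, ?, ?)", [
    ("Desk", "Oak", 200), ("Desk", "Pine", 120),
    ("Table", "Oak", 300), ("Table", "Pine", 100),
])

# Without parentheses, AND binds first, so this reads:
# (finish = 'Oak' AND price < 250) OR name = 'Table'
no_parens = conn.execute(
    "SELECT COUNT(*) FROM product "
    "WHERE finish = 'Oak' AND price < 250 OR name = 'Table'"
).fetchone()[0]

# Parentheses force the OR to be evaluated first.
with_parens = conn.execute(
    "SELECT COUNT(*) FROM product "
    "WHERE finish = 'Oak' AND (price < 250 OR name = 'Table')"
).fetchone()[0]
```

Here no_parens counts 3 rows (the Pine Table slips in via the OR), while with_parens counts only the 2 Oak rows.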
Wildcard
Symbols used with LIKE in the WHERE clause when an exact match is not possible: % matches any string of zero or more characters, _ matches exactly one character
LIKE '%Desk' - all different types of desks
LIKE '_-drawer' - exactly one character before drawer
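Both wildcard patterns can be tried against a hypothetical product table (sqlite3 used for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (name TEXT)")
conn.executemany("INSERT INTO product VALUES (?)", [
    ("Oak Desk",), ("Steel Desk",), ("2-drawer",), ("12-drawer",),
])

# % matches any string: every name ending in 'Desk'.
desks = [r[0] for r in conn.execute(
    "SELECT name FROM product WHERE name LIKE '%Desk'")]

# _ matches exactly one character: '2-drawer' but not '12-drawer'.
one_char = [r[0] for r in conn.execute(
    "SELECT name FROM product WHERE name LIKE '_-drawer'")]
```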
DDL (4 major statements)
Data definition language
Commands used to define a database, including those for creating, altering, and dropping tables and establishing constraints

CREATE SCHEMA
CREATE TABLE
CREATE VIEW
other CREATE
ALTER
DML (3 major statements)
Data manipulation language
Commands used to maintain and query a database, including those for updating, inserting, modifying, and querying data.

INSERT
UPDATE
SELECT
DCL (3 major statements)
Data control language
Commands used to control a database, including those for administering privileges and committing (saving) data

GRANT
COMMIT
REVOKE
Referential Integrity
Constraint that ensures that foreign key values of a table must match primary key values of related table in 1:M relationships
How is referential integrity established
Using REFERENCES clause
ON UPDATE RESTRICT
Any update that would change a primary key value will be rejected unless no foreign key in a child table references that value
ON UPDATE CASCADE
A change to the primary key value is passed through (cascaded) to the matching foreign key values in the child tables
ON UPDATE SET NULL
Allow the update of the primary key value, but set the matching foreign key values in the child table to NULL
ON DELETE CASCADE
Deleting a row with a given primary key value also removes all associated rows (matching foreign key values) from the child table
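The cascade behavior can be demonstrated with a hypothetical customer/order pair of tables; note that SQLite enforces foreign keys only when the pragma is enabled:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable FK enforcement

conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY)")
conn.execute("""
    CREATE TABLE order_t (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER
            REFERENCES customer (customer_id) ON DELETE CASCADE
    )""")

conn.execute("INSERT INTO customer VALUES (1)")
conn.executemany("INSERT INTO order_t VALUES (?, 1)", [(10,), (11,)])

# Deleting the parent row cascades to its two child orders.
conn.execute("DELETE FROM customer WHERE customer_id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM order_t").fetchone()[0]
```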
How might a view be used in SQL
CREATE VIEW xxxx_V AS
Views provide users controlled access to tables
Purpose of a view: simplify query commands, improve data security and enhance programming consistency and productivity for a database
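A minimal sketch of the security point, with a hypothetical employee table: the view exposes only the columns its users should see.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER, name TEXT, salary REAL)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(1, "Ann", 50000), (2, "Bob", 60000)])

# The view omits salary; users granted access to employee_v never see it.
conn.execute("CREATE VIEW employee_v AS SELECT emp_id, name FROM employee")

cur = conn.execute("SELECT * FROM employee_v")
cols = [d[0] for d in cur.description]
```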
How a view can be used to reinforce data security
Tables and columns that are not included will not be obvious to the user of the view
Why is it necessary to limit the kinds of updates that can be performed on data when referencing data through a view
In general, update operations to data in a view are permitted as long as the update is unambiguous in terms of data modification in the base table. However, when the CREATE VIEW statement contains any of the following situations, that view may not be updated directly.

a. The SELECT clause includes the keyword DISTINCT.
b. The SELECT clause contains expressions, including derived columns, aggregates, statistical functions, and so forth.
c. The FROM clause, a subquery, or a UNION clause references more than one table.
d. The FROM clause or a subquery references another view, which is not updateable.
e. The CREATE VIEW command contains an ORDER BY, GROUP BY, or HAVING clause.
How is the order in which attributes appear in a result table changed
Using ORDER BY
Reverse order use DESC after column name
How are the column heading labels in a result table changed
Using alias, AS xxx
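Both cards can be shown in one query over a hypothetical product table: aliases rename the result columns, and ORDER BY ... DESC reverses the sort.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (name TEXT, price REAL)")
conn.executemany("INSERT INTO product VALUES (?, ?)",
                 [("Desk", 200), ("Chair", 80), ("Table", 300)])

# AS renames the column headings; ORDER BY ... DESC sorts high-to-low.
cur = conn.execute(
    "SELECT name AS product_name, price AS unit_price "
    "FROM product ORDER BY unit_price DESC")
headers = [d[0] for d in cur.description]
rows = cur.fetchall()
```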
COUNT, COUNT DISTINCT, COUNT (*)
COUNT(column) tallies only rows in which that column contains a value; it ignores all null values;

COUNT(*) counts all rows selected by a query, regardless of whether any of the rows contain null values;

COUNT(DISTINCT column) counts only unique (non-duplicate, non-null) values
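The three behaviors side by side, on a hypothetical orders table with a NULL and a duplicate:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT)")
conn.executemany("INSERT INTO orders VALUES (?)",
                 [("Ann",), ("Ann",), ("Bob",), (None,)])

count_col = conn.execute(
    "SELECT COUNT(customer) FROM orders").fetchone()[0]        # skips the NULL
count_star = conn.execute(
    "SELECT COUNT(*) FROM orders").fetchone()[0]               # counts every row
count_distinct = conn.execute(
    "SELECT COUNT(DISTINCT customer) FROM orders").fetchone()[0]  # unique non-nulls
```

With four rows (Ann, Ann, Bob, NULL): COUNT(customer) is 3, COUNT(*) is 4, COUNT(DISTINCT customer) is 2.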
Scalar aggregate
The single value returned from an SQL query that includes an aggregate function
Vector aggregate
Multiple values returned from an SQL query that includes an aggregate function
GROUP BY limitation
Only those columns that have a single value for each group can be included.
Why SQL is called a set-oriented language
Most operations performed using SQL statements do not operate on a single row in a table, but instead operate on multiple rows in a table
When would the use of the LIKE keyword with the CREATE TABLE command be useful
Allows the existing structure of a table to be copied into a new table
In what order are the clauses of an SQL statement processed
FROM
ON
JOIN
WHERE
GROUP BY
WITH CUBE or WITH ROLLUP
HAVING
SELECT
DISTINCT
ORDER BY
TOP
In ORDER BY clause, 2 ways to refer to columns to be used for sorting the results of the query
Type column names
Use their column positions in the select list
CHECK within CREATE TABLE
Using CHECK as a column constraint, to establish validation rules for values to be inserted into the database
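A sketch of a CHECK validation rule on a hypothetical product table; the out-of-range insert is rejected by the DBMS:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE product (
        name TEXT,
        finish TEXT CHECK (finish IN ('Oak', 'Pine', 'Cherry'))
    )""")

conn.execute("INSERT INTO product VALUES ('Desk', 'Oak')")  # satisfies the rule

try:
    conn.execute("INSERT INTO product VALUES ('Desk', 'Plastic')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # CHECK constraint violation
```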
WITH CHECK OPTION in CREATE VIEW
Attempts to insert or update data through the view will be rejected when the data values do not satisfy the view's defining specification
What can and cannot be changed using ALTER
Base tables can be changed: columns and constraints can be added, dropped, or modified. A view cannot be changed with ALTER; it must be dropped and re-created.
Is it possible to use both WHERE and HAVING? Differences between two?
Yes it's possible.

-WHERE doesn't allow aggregate
-WHERE qualifies a set of rows, HAVING qualifies a set of groups
-HAVING follows GROUP BY
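A runnable sketch using a hypothetical sale table: WHERE filters individual rows before grouping, HAVING filters whole groups afterward.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sale VALUES (?, ?)", [
    ("East", 100), ("East", 300), ("West", 50), ("West", 60), ("North", 500),
])

rows = conn.execute("""
    SELECT region, SUM(amount) AS total
    FROM sale
    WHERE amount > 55            -- row-level: drops the 50 sale before grouping
    GROUP BY region
    HAVING SUM(amount) > 150     -- group-level: keeps only large regions
    ORDER BY region""").fetchall()
```

West survives the WHERE with one row (60) but its group total fails the HAVING, so only East (400) and North (500) remain.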
Base table
A table in the relational data model containing the inserted raw data. Base tables correspond to the relations that are identified in the database's conceptual schema
Virtual table
A table constructed automatically as needed by a DBMS. Virtual tables are not maintained as real data
Dynamic view (slides)
-A "virtual table" created dynamically upon request by a user
-No data actually stored; instead data from base table made available to user
-Based on SQL SELECT statement on base tables or other views
Materialized view (slides)
-Copy or replication of data
-Data actually stored
-Must be refreshed periodically to match corresponding base tables
Advantages of view
-Simplify query commands
-Assist with data security (but don't rely on views for security, there are more important security measures)
-Enhance programming productivity
-Contain most current base table data
-Use little storage space
-Provide customized view for user
-Establish physical data independence
Disadvantages of view
-Use processing time each time view is referenced
-May or may not be directly updateable
Dynamic SQL
Ability for an application program to generate SQL code on the fly, as the application is running
Embedded SQL
Including hard-coded SQL statements in a program written in another language such as C or Java
Equi-Join
a join in which the joining condition is based on equality between values in the common columns; common columns appear redundantly in the result table
Functions
Routines that return values and take input parameters
Procedures
Routines that do not return values and can take input or output parameters
Triggers
Routines that execute in response to a database event (INSERT, UPDATE, or DELETE)
Join
a relational operation that causes two or more tables with a common domain to be combined into a single table or view
Union Join
includes all columns from each table in the join, and an instance for each row of each table
Natural Join
an equi-join in which one of the duplicate columns is eliminated in the result table
Outer Join
a join in which rows that do not have matching values in common columns are nonetheless included in the result table
inner join
in which rows must have matching values in order to appear in the result table
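The inner/outer distinction in one runnable sketch, on hypothetical customer and order tables (order_t, since ORDER is a reserved word):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cust_id INTEGER, name TEXT)")
conn.execute("CREATE TABLE order_t (order_id INTEGER, cust_id INTEGER)")
conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
conn.execute("INSERT INTO order_t VALUES (100, 1)")  # Bob has no orders

# Inner join: only customers with a matching order appear.
inner = conn.execute("""
    SELECT c.name, o.order_id
    FROM customer c JOIN order_t o ON c.cust_id = o.cust_id""").fetchall()

# Left outer join: Bob is kept, with NULL for the missing order.
outer = conn.execute("""
    SELECT c.name, o.order_id
    FROM customer c LEFT OUTER JOIN order_t o ON c.cust_id = o.cust_id
    ORDER BY c.name""").fetchall()
```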
Self-join
Self-joins are usually used on tables with unary relationships
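A classic unary relationship is an employee table whose manager_id refers back to the same table; a sketch with hypothetical data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER, name TEXT, manager_id INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", [
    (1, "Ann", None), (2, "Bob", 1), (3, "Cy", 1),
])

# The table joins to itself: alias e plays the employee role, m the manager.
pairs = conn.execute("""
    SELECT e.name, m.name
    FROM employee e JOIN employee m ON e.manager_id = m.emp_id
    ORDER BY e.name""").fetchall()
```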
IN/NOT IN
To match a list of values
Correlated Subqueries
This type of subquery is processed outside in, rather than inside out. That is, the inner query is executed for each row in the outer query, and the inner query depends in some way on values from the current row in the outer query.
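A sketch of the outside-in processing on a hypothetical employee table: the inner query's WHERE clause references the current outer row, so it re-executes per row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, dept TEXT, salary REAL)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", [
    ("Ann", "IT", 90), ("Bob", "IT", 60), ("Cy", "HR", 50), ("Di", "HR", 40),
])

# Who earns more than their own department's average?
above_avg = [r[0] for r in conn.execute("""
    SELECT name
    FROM employee outer_e
    WHERE salary > (SELECT AVG(salary)
                    FROM employee inner_e
                    WHERE inner_e.dept = outer_e.dept)  -- correlated reference
    ORDER BY name""")]
```

The IT average is 75 and the HR average is 45, so only Ann and Cy qualify.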
Tips for Developing Queries
-Be familiar with the data model (entities and relationships)
-Understand the desired results
-Know the attributes desired in results
-Identify the entities that contain desired attributes
-Review the ERD
-Construct a WHERE equality for each link
-Fine tune with GROUP BY and HAVING clauses if needed
-Consider the effect on unusual data
Guidelines for Better Query Design
-Understand how indexes are used in query processing
-Keep optimizer statistics up-to-date
-Use compatible data types for fields and literals
-Write simple queries
-Break complex queries into multiple simple parts
-Don't nest one query inside another query
-Don't combine a query with itself (if possible avoid self-joins)
-Create temporary tables for groups of queries
-Combine update operations
-Retrieve only the data you need
-Don't have the DBMS sort without an index
-Learn!
-Consider the total query processing time for ad hoc queries
Can an outer join be easily implemented when joining more than two tables?
OUTER JOIN syntax does not extend easily to a join of more than two tables. The results may vary across systems, so be sure to test any outer join syntax.
Several reasons to consider embedding SQL
-It is possible to create a more flexible, accessible interface for the user
-It may be possible to improve performance by using embedded SQL
-Database security may be improved
Any query that can be written using the subquery approach can also be written using the joining approach, but not vice versa. Why?
While SQL*PLUS allows a subquery to return more than one column, most systems allow pairwise joining of one and only one column in an inner query with one column in an outer query. (An exception to this is when a subquery is used with the EXISTS keyword.) You can display data only from the table(s) referenced in the outer query. If you want to include data from the subquery in the result, then you would have to use the join technique because data from the subquery cannot be included in the results. The joining technique is useful when data from several relations are to be retrieved and displayed, and the relationships are not necessarily nested.
Purpose of the COMMIT command; its relation to the notion of a business transaction.
a. SQL transactions terminate by executing either a COMMIT or ROLLBACK operation. COMMIT [WORK] takes the contents of the log file and applies them to the database and then empties the log file. There is also an AUTOCOMMIT (ON/OFF) command in many RDBMSs that specifies whether changes are made permanent after each data modification command (ON) or only when work is explicitly made permanent (OFF) by the COMMIT WORK command. These commands are necessary to maintain a valid database and are transparent to the user in most interactive SQL situations.
b. SQL transactions are logical units of work. Either all of the operations performed in the SQL transaction will be committed, or none of the operations will be committed to the database. An SQL transaction may be more involved than an accounting transaction. For example, the entry of a customer order may also trigger inventory adjustment. Executing the COMMIT command will either make permanent changes to all relations involved in the logical unit of work, or it will make changes to none of them.
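The all-or-nothing point can be sketched with a hypothetical funds transfer; sqlite3's manual transaction mode stands in for explicit COMMIT/ROLLBACK control:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # take manual control of transactions (AUTOCOMMIT off)

conn.execute("CREATE TABLE account (id INTEGER, balance REAL)")
conn.execute("INSERT INTO account VALUES (1, 100), (2, 100)")

# A transfer is one logical unit of work: both updates commit, or neither does.
conn.execute("BEGIN")
conn.execute("UPDATE account SET balance = balance - 30 WHERE id = 1")
conn.execute("UPDATE account SET balance = balance + 30 WHERE id = 2")
conn.execute("ROLLBACK")  # something went wrong: undo both updates together

balances = [r[0] for r in
            conn.execute("SELECT balance FROM account ORDER BY id")]
```

After ROLLBACK both accounts are back at 100; replacing ROLLBACK with COMMIT would make both updates permanent.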
Explain the processing order of a correlated subquery.
Correlated subqueries use the result of the outer query to determine the processing of the inner query. Thus, the inner query may be somewhat different for each row referenced in the outer query.
What is a derived table? How are derived tables used?
Derived tables are used to create a temporary table that is treated as if it were an actual table. This table is not persistent in that it goes away after the query in which it was created is run.
Example of derived table
One example of the use of a derived table would be to find all ships that were loaded beyond capacity. In this example, a shipment's weight is calculated by computing the sum of the quantity ordered times the item weight. The query follows:
SELECT ship.ship_no
FROM ship, shipment,
     (SELECT shipment_line.shipment_id,
             SUM(item.weight * shipment_line.quantity) AS tweight
      FROM shipment_line, item
      WHERE shipment_line.item_no = item.item_no
      GROUP BY shipment_id) AS ship_wt
WHERE ship.ship_no = shipment.ship_no
  AND shipment.shipment_id = ship_wt.shipment_id
  AND ship.capacity < ship_wt.tweight;
Difference between a trigger and stored procedures?
A trigger is a routine that executes automatically before or after a database event (INSERT, UPDATE, or DELETE) occurs.
A stored procedure is a set of statements that executes only when it is explicitly invoked.
If two queries involved in a UNION operation contained columns that were incompatible with one another, how would you recommend fixing this? Explain
One possibility would be to convert one of the data types. For example, if one data type is a character and the other numeric, you could use a function such as Oracle's TO_CHAR to convert the numeric to a character. Another option is to decide which tables might be involved in UNION operations and make sure that the data types are compatible.
Big data
An ill-defined term applied to databases whose size strains the ability of commonly used relational DBMSs to capture, manage, and process the data within a tolerable elapsed time.
Conformed dimension
One or more dimension tables associated with two or more fact tables, for which the dimension tables have the same business meaning and primary key in each fact table
Surrogate Keys and why?
Dimension table keys should be surrogate (non-intelligent and non-business related) because:
-Business keys may change over time
-Helps keep track of non-key attribute values for a given production key
-Surrogate keys are simpler and shorter
-Surrogate keys can be the same length and format for all keys
Data mart
A data warehouse that is limited in scope, whose data are obtained by selecting and summarizing data from a data warehouse or from separate extract, transform, and load processes from source data systems
Data warehouse
A subject-oriented (organized around key subjects), integrated, time variant (contain a time dimension to study trends and changes), nonupdateable collection of data used in support of management decision-making processes.
Data mining
Knowledge discovery, using a sophisticated blend of techniques from traditional statistics, artificial intelligence, and computer graphics
Data visualization
The representation of data in graphical and multimedia formats for human analysis
Independent data mart (graph)
A data mart filled with data extracted from the operational environment, without the benefit of a data warehouse.
Dependent data mart (graph)
A data mart filled exclusively from an enterprise data warehouse and its reconciled data
Derived data
Data that have been selected, formatted, and aggregated for end-user decision support applications
Reconciled data
Detailed, current data intended to be the single, authoritative source for all decision support applications
Enterprise data warehouse (EDW) (graph)
A centralized, integrated data warehouse that is the control point and single source of all data made available to end users for decision support applications
Grain
The level of detail in a fact table, determined by the intersection of all the components of the primary key, including all foreign keys and any other primary key elements
Operational system
A system that is used to run a business in real time, based on current data. Also called a system of record
Informational system
A system designed to support decision-making based on historical point-in-time and prediction data for complex queries or data mining applications
Logical data mart
A data mart created by a relational view of a data warehouse
Multidimensional OLAP
OLAP tools that load data into an intermediate structure, usually a three or higher dimensional array
Online analytical processing
The use of a set of graphical tools that provides users with multidimensional views of their data and allows them to analyze the data using simple windowing techniques
NoSQL
Short for "not only SQL", NoSQL is a class of database technology used to store and access textual and other unstructured data, using more flexible structures than the rows-and-columns format of relational databases. It is used when the user is unsure of what structure to use for the data.
Operational data store
An integrated, subject-oriented, continuously updateable, current-valued (with recent history), enterprise-wide, detailed database designed to serve operational users as they do decision support processing
Periodic data
Data that are never physically altered or deleted once they have been added to the store
Transient data
Data in which changes to existing records are written over previous records, thus destroying the previous data content
Real time data warehouse (graph)
An enterprise data warehouse that accepts near-real-time feeds of transactional data from the systems of record, analyzes warehouse data, and in near-real-time relays business rules to the data warehouse and systems of record so that immediate action can be taken in response to business events
Star schema
A simple database design in which dimensional data are separated from fact or event data. A dimensional model is another name for a star schema.

It's not suited to online transaction processing, and therefore, it is not generally used in operational systems, operational data stores or an EDW.

Contains one fact table and one or more dimension tables.

Example: Sales - Product, Period, Store
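The Sales example above can be sketched as a tiny star schema (table and column names hypothetical): one fact table keyed by dimension surrogate keys, queried by joining facts back to a dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Dimension tables hold descriptive data; the fact table holds measures.
conn.execute("CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE period_dim (period_key INTEGER PRIMARY KEY, year INTEGER)")
conn.execute("""
    CREATE TABLE sales_fact (
        product_key INTEGER REFERENCES product_dim,
        period_key INTEGER REFERENCES period_dim,
        units_sold INTEGER)""")

conn.executemany("INSERT INTO product_dim VALUES (?, ?)", [(1, "Desk"), (2, "Chair")])
conn.executemany("INSERT INTO period_dim VALUES (?, ?)", [(1, 2023), (2, 2024)])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)",
                 [(1, 1, 10), (1, 2, 15), (2, 2, 7)])

# Typical star-schema query: aggregate facts, sliced by a dimension attribute.
desk_total = conn.execute("""
    SELECT SUM(f.units_sold)
    FROM sales_fact f
    JOIN product_dim p ON f.product_key = p.product_key
    WHERE p.name = 'Desk'""").fetchone()[0]
```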
Snowflake Schema
An expanded version of a star schema in which dimension tables are normalized into several related tables
Major trends that have led to the growth in data warehousing
a. No single system of record
b. Multiple systems are not synchronized
c. Organizations want to analyze the activities in a balanced way
d. Customer relationship management
e. Supplier relationship management
Major components of a data warehouse architecture
a. Operational data. Stored in the various operational systems throughout the organization (and sometimes in external systems)
b. Reconciled data. The type of data stored in the enterprise data warehouse
c. Derived data. The type of data stored in each of the data marts
List three types of metadata; briefly describe the purpose of each type:
a. Operational metadata. These are metadata that describe the data in the various operational systems (as well as external data) that feed the enterprise data warehouse. Operational metadata typically exist in a number of different formats, and they are, unfortunately, often of poor quality.
b. Enterprise data warehouse (EDW) metadata. These metadata are derived from (or at least are consistent with) the enterprise data model. They describe the reconciled data layer. EDW metadata also describe the rules that are used to transform operational data to reconciled data.
c. Data mart metadata. These metadata describe the derived data layer. They also describe the rules that are used to transform reconciled data to derived data.
Four characteristics of a data warehouse:
a. Subject-oriented. A data warehouse is organized around the key subjects (or high-level entities) of the enterprise. Major subjects may include customers, patients, students, products, time, etc., depending on the real-world domain within which the organization operates.
b. Integrated. The data housed in the data warehouse are defined using consistent naming conventions, formats, encoding structures, and related characteristics gathered from several internal systems of record and also often from sources external to the organization. This means that the data warehouse holds the one version of "the truth."
c. Time-variant. Data in the data warehouse contain a time dimension so that they may be used to study trends and changes.
d. Non-updatable. Data in the data warehouse are loaded and refreshed from operational systems, but cannot be updated by end users.
Five claimed limitations of independent data marts:
a. A separate ETL process has to be developed for each data mart. This can yield costly redundant data and efforts.
b. A clear, enterprise-wide view of data may not be provided because data marts may not be consistent with one another.
c. Analysis is limited because there is no capability to drill down into greater detail or into related facts in other data marts.
d. Scaling costs are excessive as each new application creates a separate data mart, which repeats all the extract and load steps.
e. Attempting to make the separate data marts consistent generates a high cost to the organization.
Two claimed benefits of independent data marts:
a. Allow for the concept of a data warehouse to be proved by working on a series of small, fairly independent projects
b. A reduction in the amount of time until a benefit from data warehousing is perceived by the organization, so that there is not a delay until all data are centralized
Three types of operations that can be easily performed with OLAP tools:
a. Slicing a cube
b. Drill-down
c. Data mining
Is a star schema a relational model?
The star schema is a denormalized implementation of the relational data model. The fact table plays the role of a normalized n-ary associative entity that links together the instances of the various dimensions. The dimension tables are usually denormalized, in second normal form or possibly (but rarely) in third normal form. Because they are neither updated nor joined with one another, the dimension tables provide an optimized user view for specific information needs but could not be used for operational purposes.
Volatility of a data warehouse vs. volatility of an operational database:
A major difference between a data warehouse and an operational system is the type of data stored. An operational system most often stores transient data, which are overwritten when changes to the data occur. Thus, the data in an operational system are very volatile. On the other hand, a data warehouse usually contains periodic data, which are never overwritten once they have been added to the store. A data warehouse contains a history of the varying values for important (dimensional) data.
Pros and cons of logical data marts:
Pros:
a. New data marts can be created quickly because no physical database or database technology needs to be acquired or created. Also, loading routines do not need to be written.
b. Data marts are always up-to-date because data in a view are created when the view is referenced. Views can be materialized.

Con:
Logical data marts are only practical for moderate-sized data warehouses or when high performance data warehousing technology is used.
Characteristics of a surrogate key for a data warehouse or data mart:
All keys used to join the fact table to the dimension tables should be system assigned. The key should be simple as compared to the production or composite key. It is best to maintain the same length and format for all surrogate keys across the entire data warehouse, regardless of the business dimensions involved.
Why is time almost always a dimension in a data warehouse or data mart
Time is almost always a dimension in a data warehouse or data mart because data marts and data warehouses record facts about dimensions over time. Date and time are almost always included as a dimension table, and a date surrogate key is usually one of the components of the primary key of the fact table. The time dimension is critical to most of the reporting and analysis needs that end users of the data warehouse have. Often, users will want to view how facts (such as sales) have changed over time or may want to compare one time period against another.
In what way are dimension tables often not normalized?
Most dimension tables are not normalized so that for a given user group the dimension data are only one join away from associated facts. One example might be multivalued data, in which one could store multiple values by using several different fields. Another example would be the incorporation of data from other tables that are not part of the star schema but might be needed for analysis.
What is the meaning of the phrase "slowly changing dimension"?
Although data warehouses track data over time, the business does not remain static. We need to keep track of the history of values in order to record the history of facts with correct dimensional descriptions when the facts occurred. Dimension data change more slowly than transactional data, so we consider dimensions to be slowly changing dimensions.
The most common approach used to handle slowly changing dimensions.
Create a new dimension table row (with a new key) each time the dimension object changes and this new row will contain all the dimension characteristics. A fact row is associated with the key whose attributes apply at the time of the fact. This approach allows us to create as many dimensional object changes as necessary. It can become unwieldy if rows change frequently. We may also want to store the surrogate key value for the original object in the dimension row so that we can relate changes back to the original object.
Data warehouses nonupdatable?
Non-updatable means that data, once included in the data warehouse, are never changed (except to correct errors). Instead, new versions of the same data may be stored.
Data administration
A high-level function that is responsible for the overall management of data resources in an organization, including maintaining corporate-wide definitions and standards.
Database administration
A technical function that is responsible for physical database design and for dealing with technical issues such as security enforcement, database performance, and backup and recovery.
Two-phase locking protocol
A procedure for acquiring the necessary locks for a transaction where all necessary locks are acquired before any locks are released, resulting in a growing phase, when locks are acquired, and a shrinking phase, when they are released.
Information repository
Stores metadata that describe an organization's data and data processing resources. Manages the total information-processing environment. Combines information about an organization's business information and its application portfolio.
Versioning
Concurrency control mechanism that doesn't use record locking. Each transaction is restricted to a view of the database as of the time that transaction started, and when a transaction modifies a record, the DBMS creates a new record version instead of overwriting the old record.
Deadlock
An impasse that results when two or more transactions have locked a common resource, and each waits for the other to unlock that resource.
Transaction
A discrete unit of work that must be completely processed or not processed at all within a computer system.
Encryption
The coding (or scrambling) of data so that humans cannot read them.
Data archiving
The process of moving inactive data to another location where it can be accessed when needed.
Heartbeat query
A query submitted by the DBA to test the current performance of the database. A heartbeat query is also called a canary query.
Deadlock prevention; deadlock resolution
When deadlock prevention is employed, user programs must lock all records they will require at the beginning of a transaction, rather than one at a time. Deadlock resolution allows deadlocks to occur, but builds mechanisms into the DBMS for detecting and breaking the deadlocks.
Backward recovery; forward recovery
With backward recovery (also called rollback), the DBMS backs out of or undoes unwanted changes to the database. Before-images of the records that have been changed are applied to the database. As a result, the database is returned to an earlier state; the unwanted changes are eliminated. With forward recovery (also called rollforward), the DBMS starts with an earlier copy of the database. By applying after-images (the results of good transactions), the database is quickly moved forward to a later state (Figure 12-7).
Before-image; after-image
A before-image is simply a copy of a record before it has been modified, and an after-image is a copy of the same record after it has been modified.
Shared lock (S Lock, or Read Lock);
Exclusive lock (X Lock, or Write Lock)
Placing a shared lock on a record prevents another user from placing an exclusive lock on that record. Placing an exclusive lock on a record prevents another user from placing any type of lock on that record.
Aborted transaction
A transaction in progress that terminates abnormally
Authorization rules
Controls incorporated in a data management system that restrict access to data and also restrict the actions that people may take when they access data
Backup facility
A DBMS COPY utility that produces a backup copy (or save) of an entire database or a subset of a database
Checkpoint facility
A facility by which a DBMS periodically refuses to accept any new transactions. The system is in a quiet state, and the database and transaction logs are synchronized
Concurrency control
The process of managing simultaneous operations against a database so that data integrity is maintained and the operations do not interfere with each other in a multiuser environment
Data dictionary
A repository of information about a database that documents data elements of a database
System catalog
A system-created database that describes all database objects, including data dictionary information, and also includes user access information
Database change log
A log that contains before and after images of records that have been modified by transactions
Transaction log
A record of the essential data for each transaction that is processed against the database
Database destruction
The database itself is lost, destroyed, or cannot be read
Database recovery
Mechanisms for restoring a database quickly and accurately after loss or damage
Database security
Protection of database against accidental or intentional loss, destruction or misuse
Inconsistent read problem
An unrepeatable read, one that occurs when one user reads data that have been partially updated by another user
Information Resource Dictionary System (IRDS)
A computer software tool that is used to manage and control access to the information repository
Journalizing facility
An audit trail of transactions and database changes
Open Source DBMS
Free DBMS source code software that provides the core functionality of an SQL-compliant DBMS
Restore/Rerun
A technique that involves reprocessing the day's transactions (up to the point of failure) against the backup copy of the database
Smart card
A credit card-sized plastic card with an embedded microprocessor chip that can store, process, and output electronic data in a secure manner
User-defined procedures
User exits (or interfaces) that allow system designers to define their own security procedures in addition to the authorization rules
The changing roles of the data administrator and database administrator:
Data administration is a high-level function that is responsible for the overall management of data resources in an organization, including maintaining corporate-wide data definitions and standards. Typically, the role of database administration is taken to be a more hands-on, physical involvement with the management of a database or databases. Database administration is a technical function that is responsible for physical database design and for dealing with technical issues such as security enforcement, database performance, and backup and recovery. As business practices change, the roles are also changing within organizations. There is, however, a core set of database administration functions that must be met in every organization, regardless of how database administration is organized. At one extreme, these functions may be spread across data administrators and database administrators; at the other extreme, all of these functions may be handled by a single DBA.
Managing the data repository
data administrator
Installing and upgrading the DBMS
database administrator
Conceptual data modeling
data administrator
Managing data security and privacy
database administrator
Database planning
data administrator
Tuning database performance
database administrator
Database backup and recovery
database administrator
Running heartbeat queries
database administrator
Changes in data administration procedures that will decrease development and implementation time:
a. Database planning: Improve technology selection through selective evaluation of possible products. Consider each technology's fit with the enterprise data model, reducing time required in later stages by effective selection of technology at the database planning stage.

b. Database analysis: Work on physical design in parallel with development of the conceptual and logical models. Prototyping the application now may well lead to changes in the logical and physical data models earlier in the development process.

c. Database design: Prioritize application transactions by volume, importance, and complexity. These transactions are going to be most critical to the application, and specifications for them should be reviewed as quickly as the transactions are developed. Logical data modeling, physical database modeling, and prototyping may occur in parallel. DBAs should strive to provide adequate control of the database environment while allowing the developers space and opportunity to experiment.

d. Database implementation: Institute database change control procedures so that development and implementation are supported rather than slowed. Wherever possible, segment the model into modules that can be analyzed and implemented more quickly. Find ways to test the system more quickly without compromising quality. Testing may be moved earlier in the development; use testing and change control tools to build and manage the test and production environments.

e. Operation and maintenance: Review all timesaving measures that have been taken to ensure that database quality has not been compromised. Consider using third-party tools and utilities wherever possible to save work; other tools, such as Lotus Notes, may reduce the need for meetings, thus saving time.
Five areas where threats to data security may occur:
a. Accidental losses, including human error, software, and hardware-caused breaches: Establishing operating procedures, such as user authorization, uniform software installation procedures, and hardware maintenance schedules, are examples of actions that may be taken to address threats from accidental losses. As in any effort that involves human beings, some losses are inevitable, but well thought-out policies and procedures should reduce the amount and severity of losses.

b. Theft and fraud: These activities are going to be perpetrated by people, quite possibly through electronic means, and may or may not alter data. Attention here should focus on each possible location shown in Figure 12-3. For example, control of physical security should be established so that unauthorized personnel are not able to gain access to the machine room. Data access policies that restrict altering data immediately prior to a payroll run will help to secure the data. Establishment of a firewall to prevent unauthorized access to inappropriate parts of the database through outside communication links is another example of a security procedure that will hamper people who are intent on theft or fraud.

c. Loss of privacy or confidentiality: Loss of privacy is usually taken to mean loss of protection of data about individuals, while loss of confidentiality is usually taken to mean loss of protection of critical organizational data that may have strategic value to the organization. Failure to control privacy of information may lead to blackmail, bribery, public embarrassment, or theft of user passwords. Failure to control confidentiality may lead to loss of competitiveness.

d. Loss of data integrity: When data integrity is compromised, data will be invalid or corrupted. Unless data integrity can be restored through established backup and recovery procedures, an organization may suffer serious losses or make incorrect and expensive decisions based on the invalid data.

e. Loss of availability: Sabotage of hardware, networks, or applications may cause the data to become unavailable to users, which again may lead to severe operational difficulties.
How creating a view may increase data security, and why not to rely completely on using views to enforce data security:
The advantage of a view is that it can be built to present only the data to which the user requires access, thus effectively preventing the user from viewing other data (not produced by the SQL for the view) that may be private or confidential. The user may be granted the right to access the view, but not to access the base tables upon which the view is based. For most views, data may not be updated, thus eliminating the possibility of unauthorized changes to the base data. However, views are not adequate security measures because unauthorized persons may gain knowledge of or access to a particular view. In addition, with high-level query languages, an unauthorized person may gain access to data through simple experimentation. Views also do not protect data from access outside the DBMS by hackers. Thus, views provide more convenience than security.
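A minimal sketch of this idea, using Python's built-in sqlite3 module with a hypothetical employee schema: the view's SELECT list simply omits the confidential salary column, so a user restricted to the view never sees it.

```python
import sqlite3

# Hypothetical schema for illustration; only the view is exposed to the user.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER, name TEXT, dept TEXT, salary REAL)")
conn.execute("INSERT INTO employee VALUES (1, 'Adams', 'Sales', 52000), (2, 'Baker', 'IT', 61000)")

# The view presents only non-confidential columns of the base table.
conn.execute("CREATE VIEW emp_public AS SELECT emp_id, name, dept FROM employee")

cols = [d[0] for d in conn.execute("SELECT * FROM emp_public").description]
print(cols)  # ['emp_id', 'name', 'dept'] -- salary is not visible through the view
```

Note that SQLite has no GRANT statement; in a multiuser DBMS the user would additionally be granted SELECT on the view but no privileges on the base table.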
Integrity controls in database security:
Integrity controls protect data from unauthorized use and update. Integrity controls limit the values a field may hold, limit the actions that can be performed, or trigger the execution of some procedure (such as placing an entry into a log to record which users have performed which actions on which data). A domain is an example of an integrity control that can be used to define a user-defined data type; if the domain ever needs to be changed, it can be changed in one place, and all fields using it are updated automatically. Assertions are constraints that enforce desirable database conditions and are checked automatically whenever transactions are run. Triggers define events, conditions, and actions that need to be tracked against a database; unlike some other controls, triggers cannot be circumvented by users.
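Two of these controls can be sketched with sqlite3 (table and column names here are illustrative, not from the text): a CHECK constraint limits the values a field may hold, and a trigger writes a log entry recording each change to a monitored column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Value-limiting control: balance may never go negative.
CREATE TABLE account (acct_id INTEGER PRIMARY KEY,
                      balance REAL CHECK (balance >= 0));

-- Trigger: record the before and after image of every balance change.
CREATE TABLE audit_log (acct_id INTEGER, old_balance REAL, new_balance REAL);
CREATE TRIGGER trg_balance AFTER UPDATE OF balance ON account
BEGIN
    INSERT INTO audit_log VALUES (OLD.acct_id, OLD.balance, NEW.balance);
END;
""")
conn.execute("INSERT INTO account VALUES (1, 100.0)")
conn.execute("UPDATE account SET balance = 150.0 WHERE acct_id = 1")

log_rows = conn.execute("SELECT * FROM audit_log").fetchall()
print(log_rows)  # [(1, 100.0, 150.0)] -- the update was logged automatically
```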
What are the key areas of IT that are examined during a Sarbanes-Oxley audit?
IT change management
Logical access to data
IT operations
What is the advantage of optimistic concurrency control over pessimistic concurrency control? Explain.
The main advantage of versioning over locking is performance improvement. Read-only transactions can run concurrently with updating transactions without loss of database consistency.
What is the difference between deadlock prevention and deadlock resolution?
Deadlock prevention requires all users to lock all records they will require at the beginning of a transaction, as opposed to one at a time. Unfortunately, it is often difficult to predict in advance all records that will be needed at the beginning of a transaction. Deadlock resolution requires DBMS mechanisms that detect and break deadlocks. A matrix of resource usage is maintained to enable deadlock detection.
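Detection can be sketched as a cycle search over a waits-for graph derived from that resource-usage matrix (the graph representation here is an assumption for illustration): a cycle means each transaction in it is waiting on another, and the DBMS must abort one to break the deadlock.

```python
# Deadlock detection sketch: waits_for maps each transaction to the
# transactions whose locks it is waiting on; a cycle is a deadlock.

def has_deadlock(waits_for):
    def dfs(t, stack):
        if t in stack:          # revisited a transaction on this path -> cycle
            return True
        stack.add(t)
        found = any(dfs(n, stack) for n in waits_for.get(t, []))
        stack.discard(t)
        return found
    return any(dfs(t, set()) for t in waits_for)

deadlocked = has_deadlock({"T1": ["T2"], "T2": ["T1"]})  # T1 and T2 wait on each other
clear = has_deadlock({"T1": ["T2"], "T2": []})           # T2 can finish, then T1
print(deadlocked, clear)  # True False
```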
What are the different types of DBMS facilities that are required for database backup and recovery?
a. Backup facilities provide periodic backup copies of the entire database.

b. Journalizing facilities maintain an audit trail of transactions and database changes.

c. Checkpoint facility allows periodic suspension of all processing and synchronization of a database's files and journals.

d. Recovery manager allows the DBMS to restore the database to a correct condition and restart processing transactions.
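How the backup and journalizing facilities combine during recovery can be shown with a toy roll-forward sketch (the data and journal format are invented for illustration): restore the backup copy, then reapply the after-images recorded in the journal since the backup was taken.

```python
# Roll-forward recovery sketch: backup copy + journal of after-images.
backup = {"acct1": 100, "acct2": 200}        # periodic backup of the database
journal = [("acct1", 150), ("acct2", 175)]   # after-images logged since the backup

def recover(backup, journal):
    db = dict(backup)                # 1. restore the database from the backup
    for key, after_image in journal: # 2. reapply each journaled change in order
        db[key] = after_image
    return db

recovered = recover(backup, journal)
print(recovered)  # {'acct1': 150, 'acct2': 175}
```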
Why is transaction integrity important?
A transaction is a well-defined sequence of steps that constitutes a business activity. Transaction integrity includes those actions taken to enforce the commitment of all of the steps that constitute a transaction. In processing a transaction, we want any changes to the database to be made only if the entire transaction is processed successfully. If the transaction aborts at any point, no changes can be allowed, or else the database will not reflect the data needed to manage the firm.
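This all-or-nothing behavior can be demonstrated with sqlite3 (the two-account transfer scenario is hypothetical): when the second step of a transaction fails, the DBMS rolls back the first step as well, leaving the database unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (acct_id TEXT PRIMARY KEY, "
             "balance REAL CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100.0), ("B", 50.0)])
conn.commit()

try:
    with conn:  # one transaction: commits on success, rolls back on exception
        conn.execute("UPDATE account SET balance = balance - 60 WHERE acct_id = 'A'")   # succeeds
        conn.execute("UPDATE account SET balance = balance - 100 WHERE acct_id = 'B'")  # CHECK fails
except sqlite3.IntegrityError:
    pass  # transaction aborted; the successful first update is undone too

balances = conn.execute("SELECT balance FROM account ORDER BY acct_id").fetchall()
print(balances)  # [(100.0,), (50.0,)] -- database unchanged
```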
What are the common types of database failure?
a. Aborted transaction: Aborting a transaction in progress due to some abnormal condition, such as the loss of transmission in a communication link while a transaction is in progress.

b. Incorrect data: The database is updated with incorrect but valid data; for example, an incorrect grade recorded for a student.

c. System failure: Some component of the system fails, but the database is not damaged. Causes include power loss, operator failure, loss of communications transmission, or system software failure.

d. Database loss or destruction: The database itself is lost or destroyed or cannot be read.
What are the main components of a disaster recovery plan?
a. Develop a detailed disaster recovery plan.

b. Choose and train a multidisciplinary team to carry out the plan.

c. Establish a backup data center at an off-site location.

d. Send backup copies of the databases to the backup data center on a scheduled basis.
What is the purpose of the GRANT and REVOKE SQL commands? Explain.
The purpose of the GRANT and REVOKE SQL commands is to control access to data, and the actions that can be taken on the data, within a database management system. The GRANT and REVOKE commands are generally applied to user accounts within a database management system. DBMS vendors vary in their ability to control the authorization rules within a particular DBMS. In Oracle, authorization rules are handled with privileges on user accounts at the database or the table level, as well as INSERT and UPDATE privileges at the column level. Oracle privileges that can be GRANTed or REVOKEd from a user include SELECT, INSERT, UPDATE, DELETE, ALTER, INDEX, REFERENCES, and EXECUTE.
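A brief illustration in Oracle-style SQL (the user name `chris` and the `employee` table are hypothetical): the first statement allows querying the whole table but updating only the salary column; the second withdraws the update privilege.

```sql
-- Grant table-level SELECT and column-level UPDATE to one user.
GRANT SELECT, UPDATE (salary) ON employee TO chris;

-- Later, withdraw the update privilege from that user.
REVOKE UPDATE ON employee FROM chris;
```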
Costs of downtime
...