Chapter 5

STUDY
PLAY

Terms in this set (...)

normalized relations
A requirement to begin designing physical files and databases is:
A) normalized relations.
B) physical tables created.
C) implementation complete.
D) all datatypes determined.
selecting structures
A key decision in the physical design process is:
A) ignoring the size of the database.
B) selecting structures.
C) deciding on the monitor.
D) deciding the e-r diagrams
descriptions
Designing physical files requires ________ of where and when data are used in various ways.
A) maps
B) descriptions
C) keys
D) hints
transaction volumes
Database access frequencies are estimated from:
A) transaction volumes.
B) user logins.
C) security violations.
D) random number generation
data type.
A detailed coding scheme recognized by system software for representing organizational data
is called a(n):
A) DBMS code.
B) data type.
C) SQL.
D) DB layout.
use a lot of storage space
All of the following are objectives when selecting a data type EXCEPT:
A) represent all possible values.
B) improve data integrity.
C) support all data manipulations.
D) use a lot of storage space
boolean.
All of the following are valid datatypes in Oracle 11g EXCEPT:
A) varchar2.
B) boolean.
C) blob.
D) number
field.
The smallest unit of application data recognized by system software is a:
A) field.
B) row.
C) data type.
D) column.
Char
An appropriate datatype for one wanting a fixed-length type for last name would include:
A) VarChar.
B) Char.
C) Blob.
D) Date.
Blob
An appropriate datatype for adding a sound clip would be:
A) VarChar.
B) Char.
C) Blob.
D) Date.
Improve data integrity
Which of the following is an objective of selecting a data type?
A) Represent a small number of possible values
B) Maximize storage space
C) Limit security
D) Improve data integrity
Physical
In which data model would a code table appear?
A) Conceptual
B) Logical
C) Physical
D) Data layout
range control
An integrity control supported by a DBMS is:
A) substitute estimates.
B) security.
C) range control.
D) GUI guards.
Default Value
The value a field will assume unless the user enters an explicit value for an instance of that
field is called a:
A) c.
B) null value.
C) range control.
D) gurand.
track missing data with special reports
A method for handling missing data is to:
A) substitute with random numbers for the missing data.
B) track missing data with special reports.
C) perform insensitivity testing.
D) delete the associated row
checking to see if missing data will greatly impact results
Sensitivity testing involves:
A) checking to see if your teeth hurt when you brush.
B) seeing how accurate data are.
C) checking to see if missing data will greatly impact results.
D) deleting the associated row.
data integrity
The storage format for each attribute from the logical data model is chosen to maximize
________ and minimize storage space.
A) query design
B) programmer productivity
C) data integrity
D) data integration
tablespace.
Within Oracle, the named set of storage elements in which physical files for database tables
may be stored is called a(n):
A) extent.
B) table.
C) tablespace.
D) partition
operating system
While Oracle has responsibility for managing data inside a tablespace, the tablespace as a
whole is managed by the:
A) user.
B) CEO.
C) XML.
D) operating system.
extent.
A contiguous section of disk storage space is called a(n):
A) track.
B) sector.
C) extent.
D) tablespace.
clustering
A method to allow adjacent secondary memory space to contain rows from several tables is
called:
A) cluttering.
B) clustering.
C) concatenating.
D) compiling.
Pointer
A(n) ________ is a field of data used to locate a related field or record.
A) key
B) index
C) lock
D) pointer
file organization
A(n) ________ is a technique for physically arranging the records of a file on secondary
storage devices.
A) physical pointer
B) retrieval program
C) file organization
D) update program
Sequential
Which type of file is most efficient with storage space?
A) Sequential
B) Hashed
C) Indexed
D) Clustered
Hashed
Which type of file is easiest to update?
A) Sequential
B) Hashed
C) Indexed
D) Clustered
efficient storage.
A factor to consider when choosing a file organization is:
A) inefficient data retrieval.
B) DDL.
C) efficient storage.
D) DML.
secondary key
One field or combination of fields for which more than one record may have the same
combination of values is called a(n):
A) secondary key.
B) index.
C) composite key.
D) linked key
Hashed
In which type of file is multiple key retrieval not possible?
A) Sequential
B) Hashed
C) Indexed
D) Clustered
hash index table.
A file organization that uses hashing to map a key into a location in an index where there is a
pointer to the actual data record matching the hash key is called a:
A) hashed file organization.
B) hash key.
C) multi-indexed file organization.
D) hash index table
heap file organization.
A file organization where files are not stored in any particular order is considered a:
A) hashed file organization.
B) hash key.
C) multi-indexed file organization.
D) heap file organization
join index.
An index on columns from two or more tables that come from the same domain of values is
called a:
A) bitmap index.
B) multivalued index.
C) join index.
D) transaction index
hashing algorithm
A(n) ________ is a routine that converts a primary key value into a relative record number.
A) record index calculator
B) index pointer program
C) hashing algorithm
D) pointing algorithm
parallel query processing.
A method that speeds query processing by running a query at the same time against several
partitions of a table using multiprocessors is called:
A) multiple partition queries.
B) perpendicular query processing.
C) parallel query processing.
D) query optimization.
explain plan
A command used in Oracle to display how the query optimizer intends to access indexes, use
parallel servers and join tables to prepare query results is the:
A) explain plan.
B) show optimization.
C) explain query.
D) analyze query
a one-to-many relationship
All of the following are common denormalization opportunities EXCEPT:
A) two entities with a one-to-one relationship.
B) a one-to-many relationship.
C) a many-to-many relationship with nonkey attributes.
D) reference data.
efficient data processing
In most cases, the goal of ________ dominates the design process.
A) efficient data processing
B) security
C) quick pointer updates
D) shorter design times
horizontal partitioning.
Distributing the rows of data into separate files is called:
A) normalization.
B) horizontal partitioning.
C) vertical partitioning.
D) file allocation.
when different categories of a table's rows are processed separately.
Horizontal partitioning makes sense:
A) when different categories of a table's rows are processed separately.
B) when less security is needed.
C) when partitions must be organized the same.
D) when only one category is allowed
efficiency.
An advantage of partitioning is:
A) efficiency.
B) remote optimization.
C) extra space and update time.
D) increase redundancy.
extra space and update time
A disadvantage of partitioning is:
A) simplicity.
B) remote optimization.
C) extra space and update time.
D) shorter technology spans.
multivalued partitioning
All of the following are horizontal partitioning methods in Oracle EXCEPT:
A) key range partitioning.
B) hash partitioning.
C) multivalued partitioning.
D) composite partitioning
Vertical
________ partitioning distributes the columns of a table into several separate physical
records.
A) Horizontal
B) Crossways
C) Vertical
D) Final
data replication.
A form of denormalization where the same data are purposely stored in multiple places in the
database is called:
A) data duplication.
B) data replication.
C) advanced placement.
D) horizontal partitioning
First, one must choose the storage format (or data types) of each attribute. These should
be chosen to maximize data integrity and minimize storage space. Next, one must decide how to
group the attributes from the logical model into tables. One must also decide upon the file
organization to be used to store records, and also think about protecting data and how to recover
if errors are found. Finally, one has to think about indexing and query optimization. What types
of indexes will be required for the most efficient retrieval? How can queries be optimized?
Discuss the critical decisions that must be made during physical database design
Answer: Careful physical design enables an organization to demonstrate that data are accurate
and well protected. Laws such as Sarbanes-Oxley and Basel II have been enacted to protect
investors by requiring an internal control report within each financial report. Without a good
physical design, an organization would be hard pressed to prove that its data are accurate
Discuss why physical database design is a foundation for compliance with regulations on
financial reporting.
Data volume and frequency of use statistics are important inputs to the database design
process. One has to maintain a good understanding of the size and usage patterns throughout the
lifecycle of the database in order to plan the design.
Discuss the rationale behind data volume and usage analysis.
Selecting a datatype involves four objectives that have different levels of importance
for different applications. One must represent all possible values, improve data integrity, support
all data manipulations and minimize storage space.
Explain how one goes about choosing data types.
Coding techniques are used in cases when attributes have a sparse set of values or have
large values which would negatively impact storage space. In this case, one can translate the
values into codes. For example, one may have a list of colors for a car, which would have a
limited number of values. In this case, one could create a lookup table for color and then replace
the value for color with a code.
Discuss coding techniques and how they could be used.
In many database management systems, data integrity controls can be built into the
field. The data type is one integrity control. Another integrity control is default values, where one
specifies the value that the field will take if no data has been entered. Range control is another
data integrity constraint which indicates the upper and lower bounds for the field. Null value
control allows one to specify whether null values are allowed or not in the field. Finally,
referential integrity ensures that any value entered into a foreign key must have a corresponding
value in the primary key of the related table. This ensures that there are no misplaced values in
the foreign key.
Discuss how data integrity can be controlled
When dealing with missing data, there are many options. Two integrity controls
(default value and null control) will handle this problem. One also could substitute an estimate
for the missing value which would then be stored in place of the missing value (and marked as an
estimate). Another possible method is to use triggers to track missing values and create a log
which can be viewed later. One other method is to perform sensitivity testing, which will tell us
whether having missing data greatly impacts results. If results are not impacted, then the missing
values are ignored
How are missing values generally handled in database management systems?
The seven factors are:
1. Fast data retrieval
2. High throughput for processing data input and maintenance transactions
3. Efficient use of storage space
4. Protection from failures or data loss
5. Minimizing need for reorganization
6. Accommodating growth
7. Security from unauthorized use
List the seven factors to consider when choosing a file organization for a particular file in a
database.
A tablespace is a named logical storage unit in which data from one or more tables,
views, and indexes can be stored. A tablespace is mapped to one or more physical files in the OS
and can be set to increase in size automatically. A tablespace can, in many ways, be thought of as
its own file system available to the RDBMS.
Explain what a tablespace is and how it is used.
A hashing algorithm is a routine which converts a primary key value into a record
address. Hashing algorithms are typically mathematical routines which perform computations
such as dividing a primary key by a prime number and then using the remainder for a storage
location. The hashing algorithm is used in a file organization to determine where to store the
record. One limitation of this is that only one key value can be used for an index.
What is a hashing algorithm and how is it used in hashed file organizations?
Clustering is a technique used in some database management systems (such as Oracle)
to allow adjacent secondary memory space to contain rows from several tables. A cluster is
defined by the tables used as well as the column or columns used for clustering. To create a
cluster in Oracle, for example, we would create a cluster first with a create cluster command and
then reference the cluster when we create each individual table.
What is clustering?
In sequential file organization, the records are stored sequentially according to a
primary key value. In index file organizations, the records are stored either sequentially or nonsequentially.
An index is created that allows the application software to locate individual records.
The index can be thought of as a card catalog.
Explain the differences between sequential and indexed file organizations.
A primary key index is automatically generated when a table is created (provided that a
primary key is specified). The primary key index identifies uniquely each row of the table. Thus,
there is one entry in the primary key index for each row. A unique key index is similar, except
that we have to define this with a separate create index command. A secondary key is nonunique.
So, for example, we might create a secondary index on state. Since many records would have the
same state, this is non-unique
Explain what primary, secondary and unique key indexes are.
One should always specify a primary key index for each table. Also, indexes are most
useful on larger tables. Columns that frequently appear in WHERE clauses of SQL statements
are good candidates for indexes, as well as those fields referenced by WHERE clauses and
GROUP BY clauses. Any field where there is a wide variety of values is also a good candidate.
One should use caution when creating indexes, however, since indexes do require overhead and
must be updated for each insert, delete and update. This can cause significant overhead if there
are a large number of transactions taking place against the database
Discuss when it is best to use indexes.
There are several possible situations when denormalization makes sense. When we
have two entities with a one-to-one relationship and high access frequency between them, then it
makes sense to combine them into one entity. If one entity is optional, then there will be fields
with null values in the combined entity.
Another scenario is when you have an associative entity with nonkey attributes. If there is a high
frequency of join operations between the entities, it might be more efficient to combine the
associative entity with one of the other entities.
The third case is reference data. If we have a 1:M relationship and the entity on the one side does
not participate in any other relationships, then it might be wise to merge the entities
Discuss three possible opportunities for denormalization
Partitioning data has several advantages. First, queries are more efficient since data
queried together are stored close to one another. Also, each partition of data can be stored in such
a way as to optimize its own use. Partitioned data is more secure, since it is much easier to
isolate data and make it available only to a group of users. Since smaller files take less time to
backup and recover, it is easier to keep a partitioned database up and running. Also, the effects of
damage are isolated, since other files (partitions) may not be affected by one damaged file. It is
also easier to load balance, since the smaller files can be spread across multiple disks.
Some disadvantages include inconsistent access speed. Different partitions may have different
access times, depending upon physical storage of the file. Also, partitioning adds complexity for
the programmer. Although to the user, partitioning appears transparent, this is often not the case
for a programmer. Finally, data may be replicated across multiple partitions, which will require
more storage space and possibly more time for updates
Discuss the advantages and disadvantages of partitioning
Horizontal partitioning is best used when different categories of rows are processed
differently. For example, if you would like to have a report for each region, then it might make
sense to partition the data by region. This type of partitioning will increase query performance.
Horizontal partitioning is most often done by using a single column value or a date.
Vertical partitioning splits up the columns of a table, repeating the primary key. This would best
be used for only allowing access by a group of users to a portion of the data. Of course, in both
cases, tables could be joined to retrieve all data.
When would you use horizontal partitioning, and when would you use vertical partitioning?
Since many of today's computers have either multiple processors or multi-core
processors, many database management systems have introduced parallel query processing. With
parallel query processing, a query can be broken apart and partitions can be run separately on
each processor. The partitions have to be defined in advance by the database designer. The
advantage to parallel query processing is speed, since the query can be done in much less time in
parallel. The query has to be tuned to achieve the best degree of parallel
What is parallel query processing and how is it useful?
Sometimes, the query designer may know information that can be overlooked by the
query optimizer. Often, in the course of testing queries, one may find that it is actually faster not
to use a certain index or to use a different index. When this is the case, database management
systems such as Oracle include a facility to override the query optimizer, called query hints.
Why would you want to override automatic query optimization?