database alll

STUDY
PLAY

Terms in this set (...)

What is a database?
The first - and most obvious - question to ask when you take up this subject
is the simplest - 'What is a database?' Certainly, you will have dealt with them,
indirectly, almost daily. Whether you are in a shop in person or whether you are exploring its catalogue on the internet, when you check whether a
product is in stock, it is likely that a database will be used somewhere within the system. Amazon and Facebook, YouTube and iTunes all use databases to
deliver products and services to their users.
The database and its structure may be quite obvious to the user for a library catalogue or an online retailer, but it may also be serving a less direct purpose, allowing the company to keep track of its employees and suppliers, or helping
an advertiser track visitors to web pages across different sites, tailoring their adverts to match a browser's activity
A database system is a system that stores data. To qualify as a database system,
there are some features that it would have to offer:
•• find (retrieve) data
•• add (insert) new data
•• delete unwanted data
•• change (update) data.
This definition will be refined and formalised in the sections to come, but first,
we can illustrate these features with an example.
Consider a shoe shop, specialising in trainers. The shop keeps information
about the products it sells. This information could be organised in the form of
a table, as shown in Figure 2.1 (prices are in U.K. pounds sterling, shown as £),
and could be part of the shop's database.
Model Brand Size (EU) Location Price Stock level
Air Max 90 Nike 35 K1S4 £40.00 3
Air Max 90 Nike 37 K1S4 £40.00 12
Mesh Kaplan 48 MFash1 £40.00 18
Gel Lyte VI Asics 46 MFash3 £100.00 14
... ... ... ... ... ...
A database has a structure and content. The structure is represented in this
example by the table headings; the content by the body of the table. The
content changes in time - it is dynamic in nature. The structure can change,
but it is far less changeable than the content. For instance, you could add
a new column to this table - the type of trainer or the activity it might be
associated with - but you would not expect to make such changes that often.
The structure of the database is called its intension and the content is called
its extension (we will return to this in more detail later in this chapter).
Although it may not be obvious from this example, a database is capable of
storing a large amount of data.
So far, a database system is, for us, nothing more than a system that manages
data. But is any system that manages data a database system? Is there
anything that all database systems have in common, that distinguishes
them from other software systems? The answer is obviously yes. In order
to understand the 'database approach', we shall first have a brief look at
file-based systems. In appearance (behaviour) they are similar to database
systems, but they are conceptually (qualitatively) different. We shall identify
the drawbacks of the file-based approach to data management and then
introduce the database approach as a solution to most of these drawbacks.
File-based systems
We shall start with a definition of file-based systems.
Definition: A file-based system is a collection of application programs, each
managing its own data.
In a file-based system, permanent data is stored in various files of ad-hoc
structures. Each application program defines and handles its own data files
independently of the others. This approach is called the de-centralised
approach. Each application program works with its data at the physical level,
manipulating records as they are organised in persistent memory. Sharing of
data between applications is likely to b
The concept of a physical level for data is one to which we will return later.
The structure we describe is not purely physical, but we use the term to
indicate that it is to some extent platform dependent, because access to
files is made through the primitives (built-in functionality) of the operating
system.
Take, for example, an estate agent's office, for which we shall consider the
Sales and the Contracts department. Each department maintains its own data
in its own data files, as depicted in Figure 2.3.
The Sales Department needs:
•• detailed information about the properties for rent, so staff can give good
advice to customers (such as Type and No Of Rooms from the Property for
rent file);
•• detailed information about customers, so that their needs can be
appropriately matched to what is available (such as Preferred Type and
Max Rent in the Renter file);
•• 'identification' information - such as name, address and telephone number
- about customers, the properties on offer and their owners.
The Contracts Department needs:
•• detailed information about the renting contracts (in the Lease file);'identification' information - such as name, address, telephone number -
about customers, the contracted properties and their owners.
Some drawbacks of this solution are obvious. These are the limitations of the
file-based
approach in general. The most important are enumerated below.
Duplication. Different applications might have to make use of the same
information. Because each application has its own files, data is duplicated
(e.g. the 'identification' information in our example). This aspect has at
least two negative consequences. Firstly, duplication is wasteful.1 Secondly,
data can become inconsistent - it can have different values in different
files (belonging to different applications), even though it is supposed to
give the same piece of information. For example, the address of an owner,
Mr. J. Morris, might be updated in the Owner file belonging to the Sales
Department, while the Contracts Department might still have Mr. Morris's
old address
Separation and isolation. Data is scattered among different files, each file
belonging to a certain department. A department has access to its own
files, but no access to the files of the other departments. Files belonging
to different departments cannot be used together in order to create more
complex data or analysis. Often, because they are based on different
infrastructures (platforms, development software, etc.) files belonging to
different departments cannot be transferred (copied) across.
Program-data dependence. Each file belongs to a certain application
program. The (physical) structure of data is defined inside the application
program. This could easily - and usually does - lead to incompatible file
formats between applications, meaning that it becomes impossible to share
data between them. Another aspect is that data definition is embedded in
the application program. That means that if the physical structure of data
is to be changed - for instance, if instead of representing a year with two
digits, it is to be represented with four2 - then the application program
itself must be changed. Not only that, but the methods of access and
manipulation of data are also embedded in the application program (for
instance, in previously-defined queries); to change them, the application
program must be modified
In the file-based approach, the emphasis is placed on functionality - provided
by the application program. Data modelling takes a lower priority. This
approach leads to the drawbacks we have listed. If the approach is inverted
and we consider data as central, then these problems can be removed.
Informally, this represents the database approach
Databases and database management systems
We shall start with the definition provided by Connolly and Begg:
Database. 'A database is a (shared) collection of logically-related persistent
data (including its description) as part of the information system of an
organisation.'
Let us now explain this definition. A database is a large repository of data,
in which data is defined once and stored once. Data that was scattered in
different files - with different formats and owners - in the file-based approach,
is now integrated with minimum redundancy (duplication), as a single
resource. Different application programs will share this common resource,
usually concurrently (at the same time).
There are times when redundant data is necessary, some of which will be
explored in this guide. You might choose to store intermediate results that
are called for often (using snapshots) or to ensure the atomicity of a set of
operations (transactions). In these cases, the redundant data is intentionally
included to achieve something extra, and even so requires special treatment.
The diagrams in Figure 2.4 and Figure 2.5 illustrate the differences between
the database approach and the file-based approach
Group 1 of users
Group 2 of users
Group n of users
Files of Application 1
Files of Application 2
Files of Application n
Application 1
Application 2
Application n
Figure 2.4. The file-based approach.
In the database approach (see Figure 2.5), the raw data is integrated in a
common database for all applications. The data is managed by a database
management system (DBMS), which provides shared access to it, for all the
applications in the system.
A database management system is a software system that provides
a set of primitives (built-in functionality) for defining, accessing and
maintaining a database.
A database stores both the raw data and its description. We say that the
information3 stored in a database is self-describing. The description of the raw
data is known as the system dictionary, data dictionary or metadata
The consequence of this approach is program-data independence. This
means that the structure of data may change without affecting the application
programs that use it. This basic definition is going to be refined and better
explained later in this chapter.
This approach - separating the data definition from application programs - is
similar to data abstraction in programming, where the internal definition of
an object is kept separate from its external definition. An outside system can
only see the exterior of the object. As far as the external definition remains
unchanged, any changes in the object's internal definition do not affect the
outside system
A database management system automatically performs a lot of the
housekeeping tasks that would otherwise be the responsibility
application programmer. As a result, the user - i.e. the person who defines
and uses the database - is presented with a clean and powerful set of tools
for database development and exploitation. A more detailed description of
both database systems and database management systems is provided in the
following sections. At this stage, it is important that you broadly understand
why database systems are needed and what their main benefits are
This definition, with its distinction between file-based and database
approaches is quite high-level and functional. From the discussion above,
many organisations that use database software would still be defined as
having a file-based system, if different departments use different database
implementations for storing similar data.
The three-level ANSI/SPARC architecture of a
database environment
Program-data independence is one of the most important advantages
offered by the database approach. This independence can be achieved if the
system is abstracted into two or more levels. A low-level abstraction deals
with how data is organised on the physical support.4 Meanwhile, a high-level
abstraction describes the logical structure of data, irrespective of its physical
representation. This separation allows the separation of the design of a
database from the details of its implementation.
This idea can be further refined.
This idea can be further refined.
•• Users and application programs should be freed from considering the
aspects of the system related to the physical representation of data, such
as storage and accessing details. Instead, they should be able to take into
consideration only the logical structure of data. Rather than having to deal
with such aspects in each application program it would be much better if
these problems were to become the responsibility of the system (DBMS)
that manages data.
•• It should be possible to change the physical representation of data without
affecting users, as long as its logical structure is preserved.
•• As we have seen, the database integrates all the information required
within an organisation. Individual users will often only need (or be allowed)
access to certain parts of this 'pool of information'. Each user, then, needs
to have a customised view of the database and it should be possible to
change that view without affecting other users
These aims were formalised in the early 1970s and codified and adopted
as a standard in 1975 as the ANSI/SPARC three-level architecture. The
architecture forms a basis for most modern DBMS.
The ANSI/SPARC architecture consists of three levels of abstraction (see Figure
2.6). The external level represents the way data is viewed by individual
users. The conceptual level represents the way the organisational data (i.e.
all data that is relevant for the organisation) is structured.represents the way data is physically stored, although the very lowest-level
aspects of that are likely to be handled by the operating system itself
The external level. This incorporates each user's external view of the
database. A user's view consists only of the data needed by that user. Other
data may exist in the database, but the user does not need to be aware of it.
For instance, suppose that the database of a software company includes
information about its employees. The Personnel Department's view of the
employees - the data that is relevant to them - might consist of: name,
address, sex, date of birth, qualifications, department for which the employee
works, current salary, job contract details, and details about previous jobs.
The Personnel Department needs to be able to access this data about any
employee in the company
On the other hand, the Development Department's view of the employees
might consist of: departmental ID, name, telephone number, timetable, the
projects in which the employee is involved including the employee's role in
each project, the objectives and their deadlines. Only the data about its own
employees is relevant for the Development Department.
The whole information about the company's employees is stored in the
database. However, the two departments need access to different projections
of data. The data relevant to each department represents the department's
external view of the database (Figure 2.7).
Different users may want to see the same data in different formats, for
instance, name might be stored in multiple fields (for first and last name), but
will be required as a single name. The external level might also include derived
data - calculations based on stored data - for instance, a user might need an
employee's age, which could easily be calculated from the date of birth stored
in the database
The conceptual level. This represents the logical structure of the database
(of all the data required by the organisation). It can be seen as the union
of the whole set of views at the external level. Conversely, any view should
be derivable from the conceptual level. The conceptual level represents
the information stored in the database about the real life system's entities
(objects) and the relationships between them. The representation of data at
this level is still independent from any physical considerations - it specifies
what is stored, rather than how it is stored
The internal level. This describes the physical representation of data.
The internal level specifies how data is stored. It is at this level where
the physical data structure and file organisation are defined. The internal
level is situated at the interface between the DBMS and the Operating
System (OS). It is quite common that the internal level of the DBMS uses
the file management primitives of the OS. However, there is no clearly
defined boundary between the OS and the internal level. Because of this,
there often exists another level below the internal level, namely the actual
physical support (cylinders, blocks, clusters, etc.) (this extra layer is shown in
Figure 2.6
Name Date of birth Address Job Pay scale
String Date String String Integer
Name Date of birth Address Job Pay scale
A. Johnson 02-02-1995 London Programmer 10
S. Lee 09-07-1996 Leeds Analyst 14
J. Singh 05-05-1989 Edinburgh Programmer 12
The database schema consists of three types of schemas, one for each level of
abstraction:
•• External schemas describe the external level. There is one schema for
each view.
•• The conceptual schema (one only) describes the conceptual level. All
the definitions should only take the logical structure of the data into
consideration. Implementation aspects, and user views, should be
disregarded.
•• The internal schema describes the internal level. It defines the physical
records, methods of representation, index implementation, etc.
As a result of this separation of concerns, mappings are needed to allow
navigation between the schemas. Since there are three types of schema, there
are two types of mapping:
•• External/conceptual defines the correspondence between an external
schema and the conceptual schema.
•• Conceptual/internal defines the correspondence between the conceptual
and the internal level.
Figure 2.9 illustrates these ideas with a simple database maintaining
information about a company's employees. Two external views of the
database are considered in this example; one for the Finance Department and
one for the switchboard. The information needed by the Finance Department
uses the full name, age and salary of each employee and is defined by its
corresponding external schema. Each employee must be uniquely identified.
Since two employees might have the same name, a unique identifier, ID, is
used.
The information needed by the switchboard, defined by the respective
schema, only refers to the name, job and telephone number of each
employee. The switchboard needs employees' first and last names to
be separate, so they can be sorted easily. For the switchboard, unique
identification of each employee is less critical, and the job title combined with
the employee's name is used. The assumption that two employees having
the same name will not have the same job title as well is considered safe
enough.
Other external views might exist too, and other data might be stored in the
database but we are dealing with a heavily simplified example for the sake of
clarity.
The conceptual schema, that defines the conceptual level, unites the data
required to support the two views. It specifies the identifier, first and last
name, date of birth, job title, employment date, salary scale and the telephone
number of each employee.
The link between the external schemas and the conceptual schema is made
by the external/conceptual mappings. The mappings corresponding to the
Finance Department's view is defined as:
Finance Department schema Conceptual schema
ID=Id Id
Name=concatenate(Fname, ' ', Sname) Fname, Sname
Age=current_year() - yearOf(BirthDate) BirthDate
Pay Scale = Scale Scale
The other mapping - between the Switchboard's view and the conceptual
schema - is self-explanatory.
The internal schema consists of the data structures that are used to represent
(implement) the conceptual schema. For the above example, this is struct
STAFF, in a C-like hypothetical language. It also can include other structures
(i.e. not derived from the logical level), used for pragmatic reasons (e.g.
efficiency). In the above example, an index was defined, INDEXES, in order to
make the retrieval operations (from Table_Employees) faster
The conceptual/internal mapping links the definition of data at the
conceptual level with the way it is actually represented - it links what
data is represented with how it is represented. The table Employees is
implemented as an array of records of type struct STAFF. Note that the index,
Index_Employees, is used purely at the internal level (i.e. it is not mapped
to the conceptual level). This is because the index does not describe the data
as such - it is a way of making access to the data faster or more efficient
As a final point, the issue of data independence can now be reconsidered. One
of the main advantages provided by the three-level architecture of a database
system is the provision of data independence. There are two types of data
independencePhysical data independence is the immunity of application programs to
changes at the internal/physical level.
Logical data independence is the immunity of application programs to
changes at the conceptual level.
For instance, in the example above, the Employees table can be
implemented using a linked list. If the conceptual/internal mapping is
modified appropriately, and as long as the conceptual schema stays the same,
the application programs (situated above) remain unaffected.
Logical data independence may be more difficult to achieve since application
programs typically rely heavily on the logical structure of data. However,
suppose another view is needed, for the Personnel department, which requires
information about the address and the family status - such as marital status,
dependants and next of kin - for each employee. The conceptual schema can be
extended as necessary, without affecting the other two views
At the highest, most general level, a database environment consists of:
•• data, representing the information needed for an organisation;
•• software, serving two purposes:
the management of the stored data, and
further processing of the data to the users' needs;
•• hardware, supporting both the stored data and the software components;
•• users, broadly divided into two categories:
developers of the database system, and
users of the system
Data
Data can be classified into two categories, namely:
1. Primary data - the fundamental information necessary to provide the
database service, stored on permanent support, such as hard disks.
2. Derived data - information that can be inferred or calculated from primary
data (and may be recalculated at any time).
Derived data may be the output of the application programs - the result of
processing the primary data - in a form suitable for the users' needs, but it can
also be the input from users that will then be processed by the application to
be stored as primary data
The focus of a database system is on primary data. This has to be appropriately
identified, described and implemented. The primary data has three important
characteristics. It is:
•• integrated, rather than existing in separate systems - it has been gathered
together into a single system5
•• shared, with all the applications belonging to the information system
having common access to (at least parts of ) it
•• extensive, in that database systems are usually developed for data
intensive applications, where their benefits are more clearly felt
Software
The software component can be seen as consisting of three layers (Figure
2.11):
•• the operating system (OS), positioned at the base, provides the necessary
routines for accessing the hardware resources (such as file handling or
memory management routines);
•• the database management system (DBMS), placed above the OS - and
using the routines that the OS makes available - provides all the necessary
primitives for data management, including languages for defining
schemas, manipulating and reading data and so on;
•• application programs, above the DBMS - and using the routines made
available by the DBMS - provide data formats and computations beyond
the capabilities of the DBMS.
The hardware and the OS are often grouped together and called the platform.
There is considerable variation between platforms, which is one reason
for having the DBMS software handle this variation and present a more
abstracted interface to higher-level components. This provides a platform
independence that shields the application programs from unnecessary
physical details, and means that we need not concern ourselves with details
of hardware or OS for the remainder of these subject guides. Instead, we focus
on the features provided for the application programs by the DBMS.
The features of the DBMS will be considered in detail over the course of
this chapter. Briefly, the DBMS provides support for schema definition, data
manipulation, data security and data integrity.
The application programs can be of two kinds:
1. user developed;
2. provided together with the DBMS by its developer.
The former class of applications will generally be written in a high-level
programming language, such as C, Java or Python. Support for database
access in such languages is provided by means of a data sub-language,
embedded within the host language. Statements written in the embedded
sub-language are processed and passed on to the DBMS using the
appropriate routines.
Programs provided by the DBMS developer allow the rapid development
of user applications, without the user writing any conventional code.
Programming tools abstract away or remove so much functionality in order
to allow often application-specific software to be constructed quickly; these
are known generically as fourth-generation tools. Home or small business
database systems - such as Microsoft Access or OpenOffice Base - provide
graphical fourth-generation tools for this purpose.
The DBMS can also be referred to as server or backend (server), whereas the
application programs are referred to as clients, or front-ends. Clients use the
services provided by a server for data management. The division between
client and server makes it possible for the server and client to run on different
machines, giving rise to the idea of distributed processing, an issue discussed
in the 'Database architectures' section and elsewhere in these subject
guides
Hardware
As we have seen, the DBMS allows both the developer of a database and
the database users to operate without knowing the details of the hardware
being used. This does not remove from the system administrator the need
to select hardware and operating systems that, firstly, are capable of running
the chosen software; and secondly that can cope with the demands that
will be placed upon it by the database and associated systems. The system
administrator should be satisfied that:
1. There is enough permanent storage space, for instance disk space, to store
the data and any indexes and cached derived data.
2. There is enough temporary storage space, for instance RAM, to hold
intermediate results and computations.
3. There is enough computational power to manipulate the data at the rate
that will be required.
4. There is fast enough communication between components of the system
for moving the data between them. This is only usually an issue for
particularly data-heavy applications or systems with a very large user base.
Users
Users, as a component of a database environment, can be classified in four
categories, according to the role they play.6
Data administrator. The data administrator (DA) is a user who properly
understands the data requirements of the organisation and is in charge of
administering the organisation's data. This user:
• decides which data is relevant and which is not;
• is in charge of applying the organisation's policy and standards;
• decides on the security policy, and so on.
The DA does not need to be a technical expert or a manager. Rather, the DA
is somewhere in between, liaising with the management on one hand, and
with the technical team, on the other
Database administrator. The database administrator (DBA) is the technical
user in charge of the database system. More specifically the DBA is
responsible for the database's design, implementation and maintenance,
and deals with both the correctness of the implementation and the
efficiency of the database system. The DBA must have good technical
knowledge and is in charge of the definition of the DB schemas, integrity
and security rules, access procedures, backup and recovery procedures,
performance of the system, etc
Application programmer. The application programmer writes programs
that perform more complex processing of data (either computations
or formatting). For this, they use either a third-generation language,
embedded with a database language, or a fourth-generation tool. The
resulting programs are for use by end users
End user. The end users are the 'beneficiaries' of the database system.
They may range from technically naïve to extremely sophisticated. A
technically naïve user, for example a bank employee, may interact with the
system using application programs developed for specific tasks. A naïve
user does not have to be aware of the functionality of the DBMS. All they
need is reliable and easy to use programs that they can use with minimal
fuss. A sophisticated user, on the other hand, will know how to access the
database directly, through the database language supported by the DBMS.
Sometimes a sophisticated user might even develop applications, and so
become an application programmer
DBMSs and database languages
The database management system is the software through which all access
to the database is made. This is a concise but limited definition. In reality
the DBMS is responsible for much more. Some of its important features are
presented below.
Data definition. The DBMS must provide support for defining or modifying
the database schema. Schema definition includes specifying data types,
structures, constraints and security restrictions. This is achieved by means
of a data definition language (DDL). The statements made in DDL for
a specific database system represent the system's catalogue (or data
dictionary7). In theory, there should be a DDL at each level of abstraction
(i.e. external, conceptual and internal), but in practice there usually exists a
single DDL that allows definitions at any level.
Data manipulation. The DBMS must provide support for data catalogue (Connolly).
manipulation. In particular it has to support:
• retrieval of existing data, contained in the database
• deletion of old data from the database
• insertion of new data
• modification of out-of
This is achieved by means of a data manipulation language (DML). There can
be a DML at each level of abstraction. At the external and conceptual level, the
DML is concise, comprehensive and easy to use; in other words, the emphasis
is on its expressive power - on these levels, efficiency is a secondary goal.
On the other hand, at the internal level, the emphasis is placed on the DML's
efficiency. This means that its statements are complex - and probably not that
straightforwardly expressible - but quite efficient.
These languages (DDLs and DMLs) are called data sub-languages
because they do not include constructs for the control of flow - they are
computationally incomplete (meaning they cannot be used as general
purpose programming languages
Users can use them directly in order to define and access the database.
However, for applications that require more complex data processing (and
formatting) they are usually embedded into a full high-level programming
language.
Some authors prefer to further divide DMLs into two categories; namely,
procedural and non-procedural (declarative). Within a procedural language
one must specify how the result to be obtained is computed; whereas using
a declarative language one only has to specify what result must be obtained
- what it looks like - the system being responsible for its computation. Since
there are neither pure declarative or pure procedural DMLs - they range
between the two - any classifications of this kind are rather ad-hoc in nature.
For example in certain situations SQL can be considered declarative while in
others it can be considered procedural
An important requirement for DMLs is to allow unplanned or ad-hoc queries;
namely, requests that were not foreseen at the time of design. A problem that
may result from this is how to gain reasonable efficiency for such unpredicted
use.
Other features that the DBMS must provide include:
•• support for data integrity - the system must ensure that there are no
'contradictions' between the data values in the database; this is achieved
based on a set of integrity constraints (part of the data dictionary)
•• support for security control - the system must ensure that data is not
accessed by unauthorised users or applications; this is achieved based on a
set of security rules (part of the data dictionary
recovery services - the ability to restore the database to a previously
correct state in the case of a crash or error
•• concurrency facilities - allowing the database to be accessed by more
than one user at a time
•• support for data communication
•• user-accessible data dictionary
Advantages
Reduced redundancy. In a file-based system each application has its
own private files. This often leads to data being duplicated in different
files, wasting storage space. In a database approach, all data is integrated,
reducing or removing unwanted redundancy. There are various reasons
why eliminating redundancy completely is often not possible or desirable
in a DBMS - and we shall return to these in later chapters. However, where
the file-based system forces redundancy in an ad-hoc way, a DBMS should
provide mechanisms for specifying redundant data and for controlling it (to
maintain the consistency of the database
Avoiding inconsistency. This is largely as a result of the reduced
redundancy. A database is in an inconsistent state if the same item of
information is stored in at least two places in the database, but with
different values. The database approach dramatically reduces that sort
of repetition, making the risk of inconsistent data smaller. Even where
redundant information is stored, the repetition can be made known to
the DBMS, so that the system automatically enforces consistency, so that
whenever some changes are made to one set of data, the same changes are
propagated to the same version that is duplicated elsewhere. The support
provided by most current DBMSs for preventing inconsistencies is limited to
a relatively small number of categories, but the mechanism is present.
Improved data sharing. Since all data is centralised, the restrictions on
which applications and users can see it are ones of security constraints
rather than those of system and network architecture. In contrast to having
a set of separate file-based systems, here all data is integrated, meaning
that more information can be derived from the same amount of data. Both
aspects considerably improve the accessibility of data.
Data independence. As we have seen in earlier sections, a database
approach provides protection for applications from changes in both the
physical and - at least to some extent - the logical structure of the data
(physical and logical data independence).
Some other benefits of the database approach are:
•• the maintenance of the overall information system can be improved due
to data independence
•• integrity can be maintained - any DBMS should allow the specification of
integrity constraints on data
Disadvantages
Complexity. In the database approach the information needed by an
organisation is modelled and implemented as a whole. Where the file-based
approach can often be achieved piece by piece as individual departments
develop a need and budget, the process of developing a database system
is by its nature a single, unifying and more complex process, which will
include:
• data acquisition
• data modelling and design
• database implementation
• database maintenance.
The greater complexity of this process may mean that errors in
implementation, design and data acquisition may occur, and be harder, within
the organisation, to get fixed.
Depending on the organisation and its data need, the database approach may
require extra hardware and IT infrastructure, along with new maintenance
contracts. Depending on the system being replaced, this can make a database
approach more expensive, in terms of either initial or ongoing costs. The
DBMS software itself may cost no money, since there are many free and open
source options, but the system built around it will require developer time and
may also incorporate other, paid-for software. In some cases, the integration of
several systems may represent a reduction in costs, as separate contracts and
IT structures are rationalised and unified
Higher impact of failure. The database system is at the core of the
information system of an organisation. All data is stored centrally, in the
database. As a result, most applications rely on this data. If the DBMS fails,
the whole organisation is paralysed, unlike a decentralised system, where a
failure in one system will only directly affect the department that uses it.
Performance. DBMS software is heavily optimised for its core functionality,
but it is still a generic piece of software. A database application may be
slower for an individual user than a bespoke, perhaps local, file-based
solution.
We have seen that when the data of an organisation is integrated in a single
database, it can be shared between many applications. Accordingly, a 'natural'
organisation of a database system is the client-server architecture (Figure
2.12). The DBMS is the server and the application programs are clients. A
server can also be referred to as back-end
In the client-server architecture, the DBMS (including the database) runs on
a dedicated machine - the server machine. The server machine is tailored to
support the DBMS, both in terms of storage space and computational power. In
high-demand situations, it has to provide:
•• extensive and fast external (persistent) memory
•• powerful processing capabilities (fast processors), combined with sufficient
internal memory.
The main requirement for the server machine is to provide the resources that
the DBMS needs to respond efficiently to the requests received from the clients
(i.e. to provide what they need at an appropriate speed).
Although we speak of the server as a separate machine, it is becoming
increasingly common for virtual servers to be used by organisations on a
subscription basis. These run on remote servers and may offer advantages,
such as easier upgrades and less in-house maintenance, and disadvantages,
including questions about data security and privacy.
The applications would normally run on different client machines, with each
client machine specified to meet the needs of its application or applications.
For instance, if an application program performs complex graphical processing,
then a more powerful graphical workstation might be required, whereas if
an application only performs simple data entry, then a cheaper, less powerful
machine might be enough. If these needs change, it is only the client machine
that has to be 'modified', making thus the client-server architecture quite
flexib
this architecture, each machine supports a part of the organisation's
database and can become a client of the other servers in the network. The
organisation's database is thus constituted from the union of the individual
databases. For this approach to confer its main advantages, though, the
individual databases should be exploitable independently.
Data models
Physical data independence is one of the main advantages of database
systems. This is achieved based on the conceptual level. Users work with
data - both defining and manipulating it at the conceptual level, while the
DBMS takes care of the physical details. We have also mentioned that at the
conceptual level, data is described purely in terms of its intrinsic characteristics
- its logical structure.
The definition and manipulation of data happens at the conceptual level by
reference to a modelling theory. The theory we will primarily be referring to in
this course is called relational theory or the relational model. This remains
the most common data modelling theory for database systems, although
there is increasing competition from other models. The relational theory
consists of:
•• concepts, relational data objects by means of which data is modelled
•• operators, which support the manipulation of the objects in the model
•• rules, specify
Relational theory provides us with the components that we need to model
information and the relationships between parts of the information. It allows
us to define the types of information - such as numbers and text - that will
be stored and to define constraints on it - for instance to indicate that a date
in a particular context must be a past rather than a future date. These are all
concepts that will be considered in more detail in the next chapter
Once a suitable data model has been defined, it must be implemented before
it can be used. A DBMS that implements relational theory to the extent that
it supports the implementation of models defined using relational theory is
called a relational DBMS. The result of implementing an abstract data model
using a DBMS is a database system (see Figure 2.14). In practice, DBMSs
do not fully implement a formal theory, and a restricted subset only will be
available. This means that some data models need to be adjusted before they
can be implemented as database systems
...