INF 282 - Chapter 2

Terms in this set (93)

A brief, precise, and unambiguous description of a policy, procedure, or principle within a specific organization.

For example, a pilot cannot be on duty for more than 10 hours during a 24-hour period, or a professor may teach up to four classes during a semester.

In order to be effective, these must be easy to understand and widely disseminated, so that everyone within the organization shares the same interpretation of the rules.

These apply to any organization, no matter how large or small, that stores and uses data to generate information. In simple terms, they describe the main and distinguishing characteristics of the data as viewed by the company.

When beginning to build a data model, database designers consider what types of data exist in the organization, how the data are used, and in what time frame they are used. However, this information does not, by itself, yield a full understanding of the business - that understanding only comes from these, when they are properly defined.

When derived from a detailed description of an organization's operations, they help to create and enforce actions within that organization's environment. They must be put into writing and updated to reflect any changes in the organization's operational environment.

Properly written, these make it easy to define entities, attributes, relationships, and constraints; in some cases, the rules define them directly.

The main sources of these are company managers, policy makers, department managers, and written documentation, such as the organization's procedures, standards, and operations manual.

These can also come from direct interviews with end users - however, such interviews often lead to unreliable or inconsistent results, because different people have different perceptions of the policies. In the end, it pays to always verify the company's policies.

Verifying, identifying, and documenting an organization's rules pays off in several ways. It helps to standardize the company's view of data and serves as a communication tool between end users and designers. It allows designers to understand the nature, role, and scope of the data as well as the underlying business processes. Finally, it allows the designer to develop appropriate relationship participation rules and constraints and to create an accurate data model.

As a general rule, nouns in these will equate to entities in the data model, and verbs will equate to relationships between those entities.
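
To make the nouns-to-entities and verbs-to-relationships guideline concrete, here is a minimal, illustrative Python sketch. The entity, attribute, and constraint names are invented for the example and are not drawn from any particular organization:

    # Business rule: "A professor may teach up to four classes during a semester."

    # Nouns become entities.
    entities = ["PROFESSOR", "CLASS"]

    # Each entity is described by attributes (the names here are hypothetical).
    attributes = {
        "PROFESSOR": ["prof_id", "name", "department"],
        "CLASS": ["class_code", "title", "semester"],
    }

    # The verb becomes a relationship between the entities.
    relationship = ("PROFESSOR", "teaches", "CLASS")

    # The numeric limit in the rule becomes a constraint on that relationship.
    constraint = "A PROFESSOR teaches at most 4 CLASSes per semester"
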
A relatively simple representation, usually graphical, of more complex real-world data structures. This represents data structures and their characteristics, relations, constraints, transformations, and other constructs with the purpose of supporting a specific problem domain.

A representation, usually graphic, of a complex "real-world" data structure. It is used in the design phase of the Database Life Cycle.

An important thing to note is that this is merely an abstraction - you cannot draw the needed data from this. But, ideally, this should be a "blueprint" for how to build a database that meets all of the end user's needs.

Just as you can't build a good house without a good blueprint, you can't build a good database without a good data model.

Its evolution, from the early hierarchical and network models in the 1960s to the object/relational database management systems and NoSQL databases of today, has always been driven by the search to model and manage increasingly complex real-world data.

Throughout this evolution, several standards have come to be accepted. First, it must be conceptually simple without compromising the completeness of the database. It doesn't make sense to have a data model that's harder to understand than the real world itself. At the same time, it has to be unambiguous and applicable to the problem domain.

Next, it must represent the real world as closely as possible. Throughout the different data model generations, this has been accomplished by adding more semantics - more descriptive terms and meaning - to the data model's data representations.

Each new generation of data models addressed the shortcomings of the previous generation.

This serves as a bridge between real-world objects and the computer database. It also serves as a crucial communication piece between the database designer, the application programmer, and the end user. With this communication, everyone can better understand the organization's data, what it's used for, and how it should be implemented.

There are four main building blocks - entities, attributes, relationships, and constraints. When considering these four building blocks, database designers generally start with gaining a thorough understanding of what types of data exist in an organization, how the data are used, and in what time frames they are used.

When naming objects within this, the designer should use unique and easily distinguishable names for entities, attributes, relationships, and constraints. These names should also be descriptive and use terminology familiar to the end users.

Proper naming conventions can help to make these self-documenting, as well as improve communication between the designer, application programmer, and end user.
In database modeling, a level of data abstraction that adapts the conceptual model to a specific DBMS model for implementation.

It is the representation of a database as "seen" by the DBMS. In other words, the internal model requires a designer to match the conceptual model's characteristics and constraints to those of the selected implementation model.

It can also be seen as the designer translating the conceptual model into the requirements of the chosen database management system. Ideally, the designer would use logical design and create a conceptual model that could be implemented in any database management system.

A detailed internal model is especially important to database designers who work with hierarchical or network data models, because those require precise specification of data storage locations and data access paths.

The relational model, though, requires less detail in this phase because most RDBMSs handle data access paths transparently - the designer isn't aware of how the data is accessed, so that level of specificity isn't as necessary, although it can still help.

This is said to be software-dependent, because a change in the DBMS software would affect the model - it would need to be changed to fit the characteristics and requirements of the newly chosen implementation model.

However, a change in the hardware won't affect it, so it is still considered hardware-independent. A change in the storage devices or even a change in the operating system won't affect the model.

A specific representation of this is referred to as the internal schema.
A data model based on a structure composed of two data elements, a key and a value, in which every key has a corresponding value or set of values.

This data model is also called the associative or attribute-value data model.

Early leaders in the NoSQL database model, such as Amazon's SimpleDB, Google's BigTable, and Apache's Cassandra, point to this structure, along with column stores, as one of the leading candidates for the next generation of data models.

This data model stores data in secondary storage, just like any other database. What separates this model from other data models is how the data is stored.

In this type of model, the table or database has two columns - the key, defining the attribute, and the value, defining the data attached to that attribute.

To accommodate the variety of data types that could appear in the value column, every piece of data in the value column is stored as a long string type. Because of this, indexing and searching the data becomes difficult.

This is said to be schema-less, because it's so easy to add values - you merely add a new row and associate it with the correct object.
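
As a rough illustration of the key-value idea, here is a minimal Python sketch, assuming an in-memory dictionary stands in for the actual storage engine; the function and object names are made up for the example:

    # Every "row" is a (key, value) pair, and every value is kept as a string,
    # which is what makes the structure schema-less but hard to index and search.
    store: dict[str, list[tuple[str, str]]] = {}

    def put(object_id: str, attribute: str, value: str) -> None:
        # Adding a new attribute is just appending another (key, value) row.
        store.setdefault(object_id, []).append((attribute, value))

    def get(object_id: str, attribute: str) -> list[str]:
        # Searching means scanning the stored rows for matching keys.
        return [v for k, v in store.get(object_id, []) if k == attribute]

    put("customer:1001", "name", "Ada Lovelace")
    put("customer:1001", "loyalty_points", "1250")   # numbers are stored as strings too
    print(get("customer:1001", "loyalty_points"))    # ['1250']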

This is a good structure for data where there are many attributes but relatively few actual values for each attribute (sparse data). But because every value is stored as the same data type, maintaining data integrity and relationships is difficult and search queries become more complex, so this type of structure is a poor fit for everyday business processing.
A new generation of database management systems that is not based on the traditional relational database model. It supports distributed database architectures; provides high scalability, high availability, and fault tolerance; supports large amounts of sparse data; and is geared toward performance rather than transaction consistency.

Despite these broad rules, there is no standard model for this type of database. However, examples of early success stories include Amazon's SimpleDB, Google's BigTable, and Apache's Cassandra, and they point to the use of key-value store and column store models.

This generation of databases was created out of organizations' need to handle Big Data. In recent years, companies have been flooded with web data, including browsing patterns, purchasing histories, customer preferences, behavior patterns, and social media data from sites such as Facebook and Twitter.

The relational model doesn't always handle these Big Data needs efficiently, for several reasons.

However, despite the issues with the relational model, it is still the preferred model for handling day-to-day transactions and structured data analytic needs. Object/relational database management systems meet about 98 percent of market needs; the remaining two percent are served by databases built on this model.

These types of databases do not enforce relationships among entities. Instead, that is left for the application programmer to manage in the program code. In addition, it's up to the application programmers to manage data integrity and validation.
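
A hedged sketch of what managing relationships in application code can look like, assuming simple in-memory dictionaries stand in for the database; the collection and field names are hypothetical:

    customers = {"customer:1001": {"name": "Ada Lovelace"}}
    orders = {}

    def save_order(order_id: str, order: dict) -> None:
        # The database itself will not enforce this reference;
        # the application code has to check it before saving.
        if order["customer_id"] not in customers:
            raise ValueError("referenced customer does not exist")
        orders[order_id] = order

    save_order("order:1", {"customer_id": "customer:1001", "total": "59.90"})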

These databases also use their own application programming interfaces (APIs), so they don't use SQL. Because of that, it's up to the programmers to code the correct way to retrieve and manipulate the data.

One of the big benefits of this type of structure is that it can be implemented using cheap commodity servers and easily deployed as a distributed database across multiple nodes. However, this leads to issues with enforcing data consistency.

To ensure high availability and fault tolerance, these distributed databases automatically make copies of the data at multiple nodes. If the node holding the data goes down, that data can be pulled from one of the nodes that has a copy.

However, if the network goes down during an update, a node won't always have the most up-to-date information. This is one of the trade-offs of this type of database - data consistency is sacrificed for performance. But some of these databases provide eventual consistency, which means that updates trickle through the nodes until every node has the most up-to-date data.
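
The following toy Python sketch illustrates the eventual-consistency idea under a big simplifying assumption (a single writer and a naive merge step); the node names and the propagation function are invented for the example:

    # Three replicas of the same data, each starting empty.
    replicas = {"node_a": {}, "node_b": {}, "node_c": {}}

    def write(node: str, key: str, value: str) -> None:
        # A write initially lands only on the node that received it.
        replicas[node][key] = value

    def propagate() -> None:
        # Later, updates "trickle through" until every replica agrees.
        merged = {}
        for data in replicas.values():
            merged.update(data)
        for data in replicas.values():
            data.update(merged)

    write("node_a", "customer:1001", "Ada Lovelace")
    print(replicas["node_b"].get("customer:1001"))   # None: not yet consistent
    propagate()
    print(replicas["node_b"].get("customer:1001"))   # 'Ada Lovelace': eventually consistent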

Another benefit is that it can handle large amounts of data. It's actually built for sparse data, where there is a large number of attributes but not much actual data for each of those attributes.

These types of databases also support adding nodes to the distributed database transparently and without any downtime, and they are designed to keep working if one of the nodes fails.

Despite this being one of the most popular models in database design, it is only one of multiple emerging trends in data management.
A data model whose basic modeling structure is an object.

This was demanded by complex real-world problems that demonstrated the need for a data model that more closely represented the real world. The object, the basis for this model, more closely resembles real-world items. Objects store both their data and their relationships.
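
To illustrate the idea that an object bundles its data and its relationships, here is a small Python sketch; the class and attribute names are invented for the example and are not tied to any particular object DBMS:

    class Department:
        def __init__(self, name: str):
            self.name = name
            self.professors = []            # relationship held inside the object

    class Professor:
        def __init__(self, name: str, department: Department):
            self.name = name
            self.department = department    # relationship to the Department object
            department.professors.append(self)

    cs = Department("Computer Science")
    smith = Professor("Dr. Smith", cs)
    print(cs.professors[0].name)            # the relationship can be navigated directly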

This was developed to meet a specific need, as was the relational data model. This model was created to address specific engineering needs for more complex objects, not the wide-ranging needs of general data management. Because of that narrow focus, this data model has not been developed as extensively as the relational model since its inception.

However, the use of this data model increased with the rise of the internet, as businesses realized they could use it to access, distribute, and exchange critical business information - a demand that also led to the development of XML.

It introduced support for complex data within a rich semantic framework, and was combined with the relational data model in the extended relational data model.

These are typically depicted using UML class diagrams and UML notation.

The advances due to this model influenced many areas, from system development and design to programming. Many advanced programming languages, including Java, Ruby, Perl, and C#, have all adopted object-oriented concepts.

The added semantics created by this model allow for richer representation of complex objects, and more meaning gleaned from them. In turn, this allowed applications to support increasingly complex objects in innovative ways.
Developed by E.F. Codd of IBM in 1970, it represented a major breakthrough for users and designers because of its conceptual simplicity.

This replaced the network and hierarchical models due to its simpler data representation, superior data independence, and easy-to-use query language. These features made it the preferred data model for business applications.

This has not yet been replaced as the preferred data model, but it does now have several competitors in the object-oriented data model and the NoSQL database market.

This and the object-oriented data model were developed to address different issues. This model was developed for better general data management; it is based on mathematical set theory and represents data as independent relations.

Each relation (table) is conceptually represented as a matrix of intersecting rows and columns. Separate relations are related to each other through the sharing of common entity characteristics, or attributes (values in columns).

Although the relations (tables) are separate from each other, it is easy for a user to associate different tables with each other. Because of this, there is limited, controlled data redundancy, which helps the understanding of the data rather than hurting it (as was the case in the file system).
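
As a rough illustration of how separate relations are associated through a shared attribute, here is a small Python sketch; the table and column names are invented for the example, and a real RDBMS would do this declaratively through SQL rather than with explicit loops:

    professor = [
        {"prof_id": 1, "name": "Kim"},
        {"prof_id": 2, "name": "Lopez"},
    ]
    klass = [
        {"class_code": "INF282-01", "prof_id": 1},
        {"class_code": "INF282-02", "prof_id": 2},
    ]

    # Matching rows on the common attribute prof_id relates the two tables.
    joined = [
        {**p, **c}
        for p in professor
        for c in klass
        if p["prof_id"] == c["prof_id"]
    ]
    print(joined)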

A table resembles a file, but the main difference is that a relation is completely independent in terms of its data and structure, as it is a purely logical structure. The table has nothing to do with how the data is actually stored - the DBMS hides all of that from the user.

Another reason for this model's rise to dominance is the use of SQL and other query languages. With these languages, the user specifies what needs to be done but not how - the DBMS handles that.

The conceptual simplicity set the stage for a genuine database revolution. Originally, the model was considered ingenious but impractical - contemporary computers didn't have the power to implement the model.

However, the power of computers and the efficiency of operating systems have improved to the point where even personal computers can run easy-to-use databases based on this model.

This is implemented through a special DBMS called a relational database management system, which hides the complexities behind the scenes so the user doesn't have to worry about them, and presents the data in logical, easy-to-understand tables. However, these tables are merely a logical presentation format - the data is stored separately from its presentation, so changes at the storage level won't force the designer to change everything.

Although this model is a vast improvement over the network and hierarchical data models, it still lacked features that would make it an effective database design tool. Because designers prefer to work with graphical representations of data structures when designing their databases, the entity relationship model was born.

Due to its robust foundation in broadly applicable principles, this model is easily extended to include new classes of capabilities, such as objects or XML.

When it comes to companies that have Big Data concerns, though, this model doesn't always meet their needs.

For example, within this model, it's not always possible to fit unstructured data into the conventional relational structure of rows and columns. Adding millions of new attributes a day, in different formats, leads to the need for more storage space, processing power, and sophisticated data analysis tools. Because of this, using this model for Big Data comes with a hefty price tag.

Data analysis using online analytical processing (OLAP) tools typically works well with this model. However, when it comes to Big Data, the amount of unstructured data makes this difficult to accomplish if this model is being used.

Despite these issues, databases based on this model remain the preferred database model to support most day-to-day transactions and structured data analytic needs.