CIS 2050 Chapter 5: Data and Knowledge Management
Terms in this set (70)
Big Data
refers to the vast and constantly increasing amounts of data that modern organizations need to capture, store, process, and analyze. Big Data impacts the following:
- Human Resources (Health Benefits and Hiring with online assessments)
- Product Development
Data Warehouse
is a repository of historical data that are organized by subject to support decision makers in the organization.
Database
is a group of logically related files that store data and the associations among them. A database consists of attributes, entities, tables, and relationships.
Database decisions, in contrast, are much harder to undo. Database design constrains what the organization can do with its data for a long time
5.1 Rollins Automotive
Dennis Rollins is the owner of a small car lot in Bowdon, Georgia. Dennis needed an effective way to manage the data pertaining to his car lot. Achieving a solid online presence can be difficult for small used car dealers because there are so many makes and models of cars to sell and so many online outlets through which to advertise.
That solution came in the form of Dealer Car Search, a company that specializes in creating Web sites for car dealers. Dealer Car Search provides products for small businesses, dealers, and dealer chains. What ultimately makes the company so successful, however, is its database.
Entity-relationship (ER) Modeling
When database developers in the firm's MIS group build a database, they use this tool to create a model of how users view a business activity.
The Difficulties of Managing Data
First, the amount of data increases exponentially with time. In addition, data are also scattered throughout organizations, and they are collected by many individuals using various methods and devices.
Another problem is that data are generated from multiple sources: internal sources, personal sources, and external sources
Another problem arises from the fact that, over time, organizations have developed information systems for specific business processes, such as transaction processing, supply chain management, and customer relationship management. Information systems that specifically support these processes impose unique requirements on data, which results in repetition and conflicts across the organization.
5.2 New York City Opens Its Data to All
New York City passed Local Law 11, which mandated that city agencies systematically categorize data and make them available to the public. To accommodate this initiative, the city had to redefine its data practices. So, in September 2012, it created an "Open Data Policy and Technical Standards Manual," which outlines how city agencies can gather, structure, and automate data flows to meet the requirements of Local Law 11.
The goal is to enable developers, entrepreneurs, and academics to put data to work in new and innovative ways. Literally anyone can employ his or her skills and creativity to utilize these data to improve the city's quality of life.
Two other factors complicate data management: federal regulations and unstructured data overflow.
Clickstream Data
are those data that visitors and customers produce when they visit a Web site and click on hyperlinks.
Data Rot
refers primarily to problems with the media on which the data are stored. Over time, temperature, humidity, and exposure to light can cause physical problems with storage media and thus make it difficult to access the data. The second aspect of data rot is that finding the machines needed to access the data can be difficult.
Data Governance
is an approach to managing information across an entire organization.
Master Data Management
is a process that spans all organizational business processes and applications. It provides companies with the ability to store, maintain, exchange, and synchronize a consistent, accurate, and timely "single version of the truth" for the company's master data.
Master Data
are a set of core data, such as customer, product, employee, vendor, geographic location, and so on, that span the enterprise information systems.
Transaction Data
are generated and captured by operational systems and describe the business's activities, or transactions.
Gartner's Big Data
defines Big Data as diverse, high-volume, high-velocity information assets that require new forms of processing to enable enhanced decision making, insight discovery, and process optimization.
Big Data Institute (TBDI)'s Big Data
defines Big Data as vast data sets that:
• Exhibit variety;
• Include structured, unstructured, and semi-structured data;
• Are generated at high velocity with an uncertain pattern;
• Do not fit neatly into traditional, structured, relational databases (discussed later in this chapter); and
• Can be captured, processed, transformed, and analyzed in a reasonable amount of time only by sophisticated information systems.
Big Data generally consists of the following:
• Traditional enterprise data—examples are customer information from customer relationship management systems, transactional enterprise resource planning data, Web store transactions, operations data, and general ledger data.
• Machine-generated/sensor data—examples are smart meters; manufacturing sensors; sensors integrated into smartphones, automobiles, airplane engines, and industrial machines; equipment logs; and trading systems data.
• Social data—examples are customer feedback comments; microblogging sites such as Twitter; and social media sites such as Facebook, YouTube, and LinkedIn.
• Images captured by billions of devices located throughout the world, from digital cameras and camera phones to medical scanners and security cameras.
Big Data has Three Distinct Characteristics:
Volume: We have noted the incredible volume of Big Data in this chapter. Although the sheer volume of Big Data presents data management problems, this volume also makes Big Data incredibly valuable.
Velocity: The rate at which data flow into an organization is rapidly increasing. Velocity is critical because it increases the speed of the feedback loop between a company and its customers.
Variety: Traditional data formats tend to be structured, relatively well described, and they change slowly. Traditional data include financial market data, point-of-sale transactions, and much more. In contrast, Big Data formats change rapidly. They include satellite imagery, broadcast audio streams, digital music files, Web page content, scans of government documents, and comments posted on social networks.
Databases that can manipulate structured as well as unstructured data, along with inconsistent or missing data, are useful when working with Big Data.
Leveraging Big Data
- Creating Transparency
- Enabling Experimentation
- Segmenting Population to Customize Actions
- Replacing/Supporting Human Decision Making with Automated Algorithms
- Innovating New Business Models, Products, and Services
- Organizations Can Analyze Far More Data
File Management Environment
From the time that businesses first adopted computer applications (mid-1950s) until the early 1970s, organizations managed their data in a file management environment, in which each application had its own data files.
File
is a collection of logically related records.
Databases minimize the following problems:
• Data redundancy: The same data are stored in multiple locations.
• Data isolation: Applications cannot access data associated with other applications.
• Data inconsistency: Various copies of the data do not agree.
• Data security: Because data are "put in one place" in databases, there is a risk of losing a lot of data at once. Therefore, databases have extremely high security measures in place to minimize mistakes and deter attacks.
• Data integrity: Data meet certain constraints; for example, there are no alphabetic characters in a Social Security number field.
• Data independence: Applications and data are independent of one another; that is, applications and data are not linked to each other, so all applications are able to access the same data.
Bit (Binary Digit)
represents the smallest unit of data a computer can process. The term binary means that a bit can consist only of a 0 or a 1.
Byte
A group of eight bits represents a single character. A byte can be a letter, a number, or a symbol.
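The bit-to-byte relationship above can be checked directly in Python (the character chosen is arbitrary):

```python
# The character 'A' is stored as one byte whose eight bits encode the
# number 65 (its ASCII code).
char = 'A'
code = ord(char)            # the numeric value stored in the byte: 65
bits = format(code, '08b')  # the eight bits that make up that byte
print(code, bits)           # 65 01000001
```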
Field
A logical grouping of characters into a word, a small group of words, or an identification number.
Fields can also contain data other than text and numbers. They can contain an image, or any other type of multimedia.
Record
is a grouping of logically related fields; describes an entity.
Data File (or a Table)
is a logical grouping of related records
Data Model
is a diagram that represents entities in the database and their relationships.
Entity
is a person, place, thing, or event—such as a customer, an employee, or a product—about which information is maintained.
Entities can typically be identified in the user's work environment. A record generally describes an entity.
Instance of an Entity
is a specific, unique representation of the entity. For example, an instance of the entity STUDENT would be a particular student.
Attribute
is each characteristic or quality of a particular entity.
Primary Key
is the identifier field or attribute that uniquely identifies a record.
Secondary Key
is another field that has some identifying information but typically does not identify the record with complete accuracy. For example, the student's major might be a secondary key if a user wanted to identify all of the students majoring in a particular field of study.
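A minimal sketch of the student example (table, column names, and data are invented): the student ID serves as the primary key, uniquely identifying one record, while the major acts as a secondary key that identifies a group of records.

```python
import sqlite3

# In-memory database for illustration only.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT, major TEXT)")
con.executemany("INSERT INTO student VALUES (?, ?, ?)",
                [(1, "Ana", "MIS"), (2, "Ben", "MIS"), (3, "Cam", "Finance")])

# A secondary-key lookup returns every student in a given major,
# not a single unique record.
mis_students = con.execute(
    "SELECT name FROM student WHERE major = 'MIS'").fetchall()
print(mis_students)  # [('Ana',), ('Ben',)]
```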
Entity-Relationship (ER) Diagram
Document that shows data entities and attributes and relationships among them.
Entity-Relationship (ER) Modeling
The process of designing a database by organizing data entities to be used and identifying the relationships among them.
Cardinality
refers to the maximum number of times an instance of one entity can be associated with an instance in the related entity.
Modality
refers to the minimum number of times an instance of one entity can be associated with an instance in the related entity.
Identifiers
are attributes that are unique to that entity instance.
One-To-One (1:1) Relationship
a single-entity instance of one type is related to a single-entity instance of another type.
One-To-Many (1:M) Relationship
a single-entity instance of one type is related to many entity instances of another type. For example, a professor can have one or more courses, but each course can have only one professor.
Many-To-Many (M:M) Relationship
indicates that entity instances of one type are related to many instances of another type, and vice versa. For example, a student can have one or more courses, and a course can have one or more students.
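These relationships can be sketched as tables (table and column names are illustrative): a 1:M link is implemented as a foreign key on the "many" side, while an M:M link between STUDENT and COURSE is resolved with a junction table, ENROLLMENT, which breaks it into two 1:M links.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE professor (prof_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE course (
    course_id INTEGER PRIMARY KEY,
    title     TEXT,
    prof_id   INTEGER REFERENCES professor(prof_id)  -- 1:M: one professor, many courses
);
CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE enrollment (                            -- M:M resolved as two 1:M links
    student_id INTEGER REFERENCES student(student_id),
    course_id  INTEGER REFERENCES course(course_id),
    PRIMARY KEY (student_id, course_id)
);
""")

# One professor teaching two courses demonstrates the 1:M side.
con.execute("INSERT INTO professor VALUES (1, 'Dr. Ng')")
con.executemany("INSERT INTO course VALUES (?, ?, 1)",
                [(10, "Intro to MIS"), (11, "Databases")])
courses = con.execute(
    "SELECT COUNT(*) FROM course WHERE prof_id = 1").fetchone()[0]
print(courses)  # 2
```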
Database Management System (DBMS)
is a set of programs that provide users with tools to add, delete, access, modify, and analyze data stored in a single location.
DBMSs also provide the mechanisms for maintaining the integrity of stored data, managing security and user access, and recovering information if the system fails.
Relational Database Model
is based on the concept of two-dimensional tables.
A relational database generally is not one big table—usually called a flat file—that contains all of the records and attributes.
Structured Query Language (SQL)
is the most popular query language used for requesting information from a database. SQL allows people to perform complicated searches by using relatively simple statements or key words. Typical key words are SELECT (to specify a desired attribute), FROM (to specify the table to be used), and WHERE (to specify conditions to apply in the query).
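The three key words can be seen together in a short query (the table and data here are made up for illustration): SELECT names the attribute, FROM names the table, and WHERE filters the records.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customer (name TEXT, city TEXT, balance REAL)")
con.executemany("INSERT INTO customer VALUES (?, ?, ?)",
                [("Lee", "Atlanta", 120.0),
                 ("Park", "Boone", 45.5),
                 ("Kim", "Atlanta", 80.0)])

# SELECT the name attribute, FROM the customer table,
# WHERE the city condition holds.
rows = con.execute(
    "SELECT name FROM customer WHERE city = 'Atlanta'").fetchall()
print(rows)  # [('Lee',), ('Kim',)]
```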
Query By Example (QBE)
In QBE, the user fills out a grid or template—also known as a form—to construct a sample or a description of the data desired. Users can construct a query quickly and easily by using drag-and-drop features in a DBMS such as Microsoft Access. Conducting queries in this manner is simpler than keying in SQL commands.
Data Dictionary
defines the required format for entering the data into the database.
The data dictionary provides information on each attribute, such as its name, whether it is a key or part of a key, the type of data expected (alphanumeric, numeric, dates, and so on), and valid values. Data dictionaries can also provide information on why the attribute is needed in the database; which business functions, applications, forms, and reports use the attribute; and how often the attribute should be updated.
Normalization
is a method for analyzing and reducing a relational database to its most streamlined form to ensure minimum redundancy, maximum data integrity, and optimal processing performance.
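A toy example of what normalization removes (the data are invented): a flat order list repeats each customer's city on every order, so splitting it into a customer table and an order table stores each fact exactly once.

```python
# Flat, unnormalized data: the city is repeated for every order.
flat_orders = [
    ("Lee",  "Atlanta", "O-1"),
    ("Lee",  "Atlanta", "O-2"),   # "Atlanta" stored redundantly
    ("Park", "Boone",   "O-3"),
]

# Normalized form: one row per customer, and orders reference the
# customer by name instead of repeating the city.
customers = {name: city for name, city, _ in flat_orders}
orders = [(order_id, name) for name, _, order_id in flat_orders]

print(customers)  # {'Lee': 'Atlanta', 'Park': 'Boone'}
print(orders)     # [('O-1', 'Lee'), ('O-2', 'Lee'), ('O-3', 'Park')]
```

Updating a customer's city now requires changing one entry rather than every matching order, which is the data-integrity benefit normalization aims for.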
5.3 Database Solution for the German Aerospace Center
There are hundreds of operational satellites in orbit around the earth. Each one completes an orbit of the Earth approximately every 100 minutes. The cameras and sensors that many of these satellites carry have made satellite imagery pervasive in today's society.
Consider the German Aerospace Center, or Deutsches Zentrum für Luft- und Raumfahrt (DLR), with 7,000-plus employees at 16 locations in Germany. As the country's aerospace agency, the DLR maintains a research operation, known as the German Remote Sensing Data Center (DFD), that focuses on the Earth and on atmospheric observation for global monitoring, environmental studies, and security.
The DLR was convinced that the DFD needed a single information management system designed to meet the DFD's various needs, the needs of its commercial clients, and the needs of the German nation. As a result, the DFD developed a Data and Information System (DIMS) to solve the challenge of data storage and archiving.
Data Mart
is a low-cost, scaled-down version of a data warehouse that is designed for the end-user needs in a strategic business unit (SBU) or an individual department.
Data marts can be implemented more quickly than data warehouses, often in less than 90 days
The basic characteristics of data warehouses and data marts include the following:
• Organized by business dimension or subject. Data are organized by subject—for example, by customer, vendor, product, price level, and region. This arrangement differs from transactional systems, where data are organized by business process, such as order entry, inventory control, and accounts receivable.
• Use online analytical processing. Typically, organizational databases are oriented toward handling transactions. That is, databases use online transaction processing (OLTP), where business transactions are processed online as soon as they occur. The objectives are speed and efficiency, which are critical to a successful Internet-based business operation. Data warehouses and data marts, which are designed to support decision makers but not OLTP, use online analytical processing. Online analytical processing (OLAP) involves the analysis of accumulated data by end users. We consider OLAP in greater detail in Chapter 12.
• Integrated. Data are collected from multiple systems and then integrated around subjects. For example, customer data may be extracted from internal (and external) systems and then integrated around a customer identifier, thereby creating a comprehensive view of the customer.
• Time variant. Data warehouses and data marts maintain historical data (i.e., data that include time as a variable). Unlike transactional systems, which maintain only recent data (such as for the last day, week, or month), a warehouse or mart may store years of data. Organizations utilize historical data to detect deviations, trends, and long-term relationships.
• Nonvolatile. Data warehouses and data marts are nonvolatile—that is, users cannot change or update the data. Therefore the warehouse or mart reflects history, which, as we just saw, is critical for identifying and analyzing trends. Warehouses and marts are updated, but through IT-controlled load processes rather than by users.
• Multidimensional. Typically the data warehouse or mart uses a multidimensional data structure. Recall that relational databases store data in two-dimensional tables. In contrast, data warehouses and marts store data in more than two dimensions. For this reason, the data are said to be stored in a multidimensional structure.
Multidimensional Structure
Storage of data in more than two dimensions; a common representation is the data cube.
Data Cube
A common representation for this multidimensional structure.
Dimensions
are subjects such as product, geographic area, and time period that represent the edges of the data cube.
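A toy data cube with invented figures shows the idea: each cell is addressed by three dimensions (product, region, time period), and a "slice" fixes one dimension while aggregating over the others.

```python
# Each key is a (product, region, quarter) coordinate; each value is a
# sales figure for that cell of the cube.
cube = {
    ("Nuts",  "East", "Q1"): 50, ("Nuts",  "West", "Q1"): 30,
    ("Bolts", "East", "Q1"): 20, ("Bolts", "East", "Q2"): 25,
}

# Slice along the time dimension: total Q1 sales across all products
# and regions.
q1_total = sum(value for (product, region, quarter), value in cube.items()
               if quarter == "Q1")
print(q1_total)  # 100
```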
The environment for data warehouses and marts includes the following:
• Source systems that provide data to the warehouse or mart
• Data-integration technology and processes that prepare the data for use
• Different architectures for storing data in an organization's data warehouse or data marts
• Different tools and applications for the variety of users. (You will learn about these tools and applications in Chapter 12.)
• Metadata, data-quality, and governance processes that ensure that the warehouse or mart meets its purposes
ETL or Data Integration
In addition to storing data in their source systems, organizations need to extract the data, transform them, and then load them into a data mart or warehouse. This process is often called ETL, but the term data integration is increasingly being used to reflect the growing number of ways that source system data can be handled.
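The three ETL steps can be sketched in a few lines (the source records and field names here are made up): extract rows from a source system, transform them into the warehouse's standard format, and load them into the target store.

```python
# A pretend operational source system with inconsistent formatting.
source_system = [
    {"cust": "lee",  "amount": "120.00"},
    {"cust": "park", "amount": "45.50"},
]

def extract():
    # Extract: pull raw records from the source system.
    return list(source_system)

def transform(rows):
    # Transform: standardize names and convert amounts to numbers.
    return [{"customer": r["cust"].title(), "amount": float(r["amount"])}
            for r in rows]

warehouse = []
def load(rows):
    # Load: append the cleaned rows to the warehouse store.
    warehouse.extend(rows)

load(transform(extract()))
print(warehouse)
# [{'customer': 'Lee', 'amount': 120.0}, {'customer': 'Park', 'amount': 45.5}]
```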
One Central Enterprise Data Warehouse
Most organizations use this approach, because the data stored in the warehouse are accessed by all users and represent the single version of the truth.
Independent Data Marts
This architecture stores data for a single application or a few applications, such as marketing and finance. Limited thought is given to how the data might be used for other applications or by other functional areas in the organization. This is a very application-centric approach to storing data.
Hub and Spoke
This architecture contains a central data warehouse that stores the data plus multiple dependent data marts that source their data from the central repository. Because the marts obtain their data from the central repository, the data in these marts still comprise the single version of the truth for decision-support purposes.
Metadata
are data about the data.
Information Producers
are users whose primary role is to create information for other users.
- IT developers and analysts
Information Consumers
utilize information created by others.
- managers and executives
The benefits of data warehousing include the following:
• End users can access needed data quickly and easily via Web browsers because these data are located in one place.
• End users can conduct extensive analysis with data in ways that were not previously possible.
• End users can obtain a consolidated view of organizational data.
5.4 Hospital Improves Patient Care with Data Warehouse
Founded in 1972, Soon Chun Hyang University Hospital has evolved into one of the largest healthcare institutions in South Korea. The hospital operates 2,800 beds in four different cities across the country— Seoul, Gumi, Cheonan, and Bucheon.
As the number of patients and the amount of patient data dramatically increased, SCHUH faced a growing challenge in continuing to offer an excellent care experience. To maintain its high standards, the hospital needed to reduce admission times, process patient test results more quickly, and transfer patients for diagnosis or treatment at different locations more efficiently.
SCHUH launched the Integrated Medical Information System (IMIS) project. The purpose of this project was to replace the information silos located at each of the hospital's four sites with a centralized source of patient information; namely, a data warehouse.
Knowledge management (KM)
is a process that helps organizations manipulate important knowledge that comprises part of the organization's memory, usually in an unstructured format.
Intellectual Capital (or Intellectual Assets)
is another term for knowledge.
Knowledge is information that is contextual, relevant, and useful. Simply put, knowledge is information in action.
deals with more objective, rational, and technical knowledge. In an organization, explicit knowledge consists of the policies, procedural guides, reports, products, strategies, goals, core competencies, and IT infrastructure of the enterprise.
is the cumulative store of subjective or experiential learning. In an organization, tacit knowledge consists of an organization's experiences, insights, expertise, know-how, trade secrets, skill sets, understanding, and learning.
Knowledge management systems (KMSs)
refer to the use of modern information technologies—the Internet, intranets, extranets, databases—to systematize, enhance, and expedite intrafirm and interfirm knowledge management. KMSs are intended to help an organization cope with turnover, rapid change, and downsizing by making the expertise of the organization's human capital widely accessible.
Best Practices
are the most effective and efficient ways of doing things.
The KMS Cycle
1. Create knowledge. Knowledge is created as people deter- mine new ways of doing things or develop know-how. Sometimes external knowledge is brought in.
2. Capture knowledge. New knowledge must be identified as valuable and be represented in a reasonable way.
3. Refine knowledge. New knowledge must be placed in context so that it is actionable. This is where tacit qualities (human insights) must be captured along with explicit facts.
4. Store knowledge. Useful knowledge must then be stored in a reasonable format in a knowledge repository so that others in the organization can access it.
5. Manage knowledge. Like a library, the knowledge must be kept current. It must be reviewed regularly to verify that it is relevant and accurate.
6. Disseminate knowledge. Knowledge must be made available in a useful format to anyone in the organization who needs it, anywhere and anytime.