86 terms

Chapter 3 Data, Text, and Document Management

What is data infrastructure?
the fundamental structure of an IS, which determines how it functions and how flexible it is in meeting future data requirements
What is an asset?
resources with recognized value that are under the control of an individual or organization
What is the goal of data management?
to provide the infrastructure and tools to transform raw data into usable corporate information of the highest quality
Why invest in data management?
opportunity to earn revenues, ability to cut expenses
What is a data warehouse?
specialized type of database that aggregates data from transaction databases so it can be analyzed
What is dirty data?
poor-quality data that lacks integrity and cannot be trusted
What is data management?
a structured approach for capturing, storing, processing, integrating, distributing, securing, and archiving data effectively throughout its life cycle.
List the three principles that explain the importance of the data life cycle
Principle of diminishing data value: the value of data diminishes as the data ages. Principle of 90/90 data use: being able to act on real-time or near-real-time operational data matters, because 90 percent of data is not accessed after 90 days. Principle of data in context: the ability to capture, process, format, and distribute data in near real time or faster requires a huge investment in data management.
What are some challenges of data management?
A widespread problem is that people do not get data in the format they need to do their jobs; even data that is accurate, timely, and clean might still not be usable.
What are enterprise portals?
a set of software applications that consolidate, manage, analyze, and transmit data to users through a Web-based interface
What are some of the challenges of managing data in the enterprise?
The volume of data increases exponentially with time. External data that needs to be considered in making decisions is constantly increasing in volume. Data is scattered throughout organizations and is collected and created by many individuals using different methods, devices, and channels. Data security, quality, and integrity are critical yet easily jeopardized. Data is being created and used offline without going through quality control checks; hence the validity of the data is questionable. Data throughout an organization may be redundant and out of date, creating huge maintenance problems for managers.
What are client/server networks?
PCs called clients linked to high-performance computers called servers
What is MDM (master data management)?
A process whereby companies integrate data from various sources or enterprise applications to provide a more unified view of the data. The result is a master reference file, which then feeds data back to the applications.
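As a rough illustration of the MDM idea, the sketch below merges customer records from two sources into one master reference. The record layouts, the `build_master` helper, and the rule of keeping the first non-missing value are all assumptions made for the example, not part of any particular MDM product.

```python
# Hypothetical customer records held by two separate applications.
crm_records = [
    {"customer_id": 101, "name": "Ada Lopez", "email": "ada@example.com"},
    {"customer_id": 102, "name": "Ben Ortiz", "email": None},
]
billing_records = [
    {"customer_id": 102, "name": "Ben Ortiz", "email": "ben@example.com"},
    {"customer_id": 103, "name": "Cy Tanaka", "email": "cy@example.com"},
]


def build_master(*sources):
    """Merge records by customer_id, keeping the first non-missing value per field."""
    master = {}
    for source in sources:
        for record in source:
            merged = master.setdefault(record["customer_id"], {})
            for field, value in record.items():
                if merged.get(field) is None:
                    merged[field] = value
    return master


# One unified view of each customer, usable as a master reference.
master_customers = build_master(crm_records, billing_records)
```

Here customer 102 ends up with the email address that only the billing source knew about, which is the kind of unified view the definition describes.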
What is a data entity?
anything real or abstract about which a company wants to collect and store data, e.g., a customer, a vendor, a product, or an employee
What are master data entities?
The main entities of a company, such as customers, products, suppliers, employees, and assets.
List three benefits of unified view of customers.
Better, more accurate data; better responsiveness, ensuring that all employees who deal with customers have up-to-date, reliable information on customers; better revenue management and more responsive business decisions
What type of tools does MDM consist of?
tools for cleaning and auditing the master data elements as well as tools for integrating and synchronizing data to make the data more accessible.
What is a data mart?
a small data warehouse designed for a strategic business unit (SBU) or a single department.
What is ETL?
extract, transform, and load processes move data from multiple sources, reformat and cleanse the data, and load it into a data warehouse or data mart for analysis, or onto another operational system to support a business process
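The three ETL steps can be sketched in a few lines. This is a minimal example under stated assumptions: the two source lists, the cleansing rules, and the `sales` table are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Extract: rows as they might arrive from two hypothetical source systems.
source_a = [("  alice ", "120.50"), ("BOB", "80")]
source_b = [("Carol", "n/a"), ("bob", "20.25")]


def transform(rows):
    """Reformat and cleanse: tidy names, drop rows whose amount is not numeric."""
    cleaned = []
    for name, amount in rows:
        try:
            cleaned.append((name.strip().title(), float(amount)))
        except ValueError:
            continue  # unparsable amount, so the row is discarded
    return cleaned


# Load: write the cleansed rows into a table standing in for the warehouse.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
warehouse.executemany(
    "INSERT INTO sales VALUES (?, ?)", transform(source_a + source_b)
)
warehouse.commit()

# The loaded data can now be analyzed, for example total sales per customer.
totals = dict(
    warehouse.execute("SELECT customer, SUM(amount) FROM sales GROUP BY customer")
)
```

Note that the two spellings of "bob" only add up to one customer because the transform step normalized them before loading.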
What are data mining tools?
software that allows users to analyze data from various dimensions or angles, categorize it, and find correlations or patterns among fields in the data warehouse.
What is data quality?
a measure of the data's usefulness as well as the quality of the decisions based on the data.
What are the five dimensions of data quality?
accuracy, accessibility, relevance, timeliness and completeness
List some common data problems
data errors, duplicated data, compromised data, missing data
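Several of these problems can be detected mechanically. The sketch below is only an illustration: the sample records, the `find_problems` helper, and the rule that an age outside 0 to 130 counts as an error are assumptions chosen for the example.

```python
# Hypothetical records containing three of the common problems.
records = [
    {"id": 1, "email": "ada@example.com", "age": 34},
    {"id": 2, "email": "ben@example.com", "age": None},  # missing data
    {"id": 3, "email": "ada@example.com", "age": 34},    # duplicated data
    {"id": 4, "email": "cy@example.com", "age": -5},     # data error
]


def find_problems(rows):
    """Return (id, problem) pairs for missing, out-of-range, and duplicate values."""
    seen_emails = set()
    problems = []
    for row in rows:
        if row["age"] is None:
            problems.append((row["id"], "missing"))
        elif not 0 <= row["age"] <= 130:
            problems.append((row["id"], "error"))
        if row["email"] in seen_emails:
            problems.append((row["id"], "duplicate"))
        seen_emails.add(row["email"])
    return problems


problems = find_problems(records)
```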
What causes data ownership issues?
lack of policies defining responsibility and accountability in managing data
Name two types of data mining
subject-based, which retrieves data to follow a lead, and pattern-based, which looks for suspicious behaviors.
What is text mining?
interpreting words and concepts in context
What percentage of an organization's data is freeform or unstructured?
What two business challenges does text mining address?
information organization and the findability of the content in documents. The discovery of trends and patterns to allow foresight from textual information.
What is the process of analyzing text?
exploration, preprocessing, and categorizing and modeling
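A toy version of those three steps might look like the following. The sample documents, the stopword list, and the keyword-based categories are all assumptions for illustration; real text mining tools use far richer models.

```python
import re
from collections import Counter

# Hypothetical customer comments.
documents = [
    "Refund requested: the package arrived damaged.",
    "Great service, fast shipping!",
    "Damaged box and a late refund.",
]
STOPWORDS = {"the", "and", "a"}
CATEGORY_KEYWORDS = {
    "complaint": {"refund", "damaged", "late"},
    "praise": {"great", "fast"},
}


def preprocess(text):
    """Preprocessing: lowercase, keep alphabetic words, drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


# Exploration: which terms occur most often across the collection?
term_counts = Counter(word for doc in documents for word in preprocess(doc))


def categorize(text):
    """Categorizing: pick the category whose keywords overlap the text most."""
    tokens = set(preprocess(text))
    scores = {cat: len(tokens & words) for cat, words in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "uncategorized"


categories = [categorize(doc) for doc in documents]
```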
What is document management?
the automated control of imaged and electronic documents, page images, spreadsheets, voice and email messages and word documents
What is DMS?
a document management system, which consists of hardware and software that manage and archive electronic documents and also convert paper documents into e-documents, then index and store them
What are some of the benefits of a DMS?
Enabling the company to access and use the content contained in the documents, cutting labor costs by automating business processes, reducing the time and effort required to locate information the business needs to support decision making, improving the security of the content, and minimizing the costs associated with printing
What is a VPN?
a virtual private network; it allows a worker to connect to a company's network remotely through the Internet and is less expensive than having workers connect using a modem or dedicated line
What is a bit?
The smallest unit of data a computer can process which is either a 0 or a 1
What is a byte?
A group of eight bits that represents a single character, such as a letter, a number, or a symbol
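The relationship between bits, bytes, and characters can be checked directly. This small sketch assumes a single-byte encoding such as ASCII; in other encodings one character may take more than one byte.

```python
char = "A"

# Each character has a numeric code; in ASCII it fits in one byte.
code = ord(char)

# Show that code as eight bits, each of which is a 0 or a 1.
bits = format(code, "08b")

# Encoding the character as ASCII yields exactly one byte.
encoded = char.encode("ascii")
```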
What is a field?
A group of related characters that form a word, a group of words, or a complete number, e.g., a customer name
What is a record?
Related fields make up a record
What is a file?
Related records make up a file
What is a database?
a logical grouping of files constitutes a database
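The hierarchy from field up to database can be mirrored with ordinary Python containers. The names and sample values below are hypothetical and only show how each level is built from the one beneath it.

```python
# Record: the related fields describing one customer (hypothetical data).
record_1 = {"customer_id": 1, "name": "Ada Lopez", "city": "Austin"}
record_2 = {"customer_id": 2, "name": "Ben Ortiz", "city": "Boise"}

# File: the related records.
customer_file = [record_1, record_2]
order_file = [{"order_id": 10, "customer_id": 1, "total": 42.0}]

# Database: a logical grouping of related files.
database = {"customers": customer_file, "orders": order_file}

# Walking back down: database -> file -> record -> field.
name_field = database["customers"][0]["name"]
```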
What is an attribute?
a characteristic that describes an entity and it corresponds to a field on a record
What is a primary key?
a unique field that identifies the record so that it can be retrieved, updated and sorted.
What are secondary keys?
non-unique fields that have some identifying information
What are foreign keys?
keys whose purpose is to link two or more tables together
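The three kinds of keys can be seen side by side in a small SQLite example. The `customers` and `orders` tables and their columns are assumptions made up for this sketch.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # ask SQLite to enforce foreign keys
db.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,  -- primary key: unique per record
    city TEXT                         -- secondary key: non-unique
);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id)  -- foreign key
);
CREATE INDEX idx_customers_city ON customers(city);
INSERT INTO customers VALUES (1, 'Austin'), (2, 'Austin'), (3, 'Boise');
INSERT INTO orders VALUES (10, 1), (11, 1), (12, 3);
""")

# Primary key: identifies exactly one record.
by_primary = db.execute(
    "SELECT city FROM customers WHERE customer_id = ?", (3,)
).fetchone()

# Secondary key: may match several records.
by_secondary = db.execute(
    "SELECT customer_id FROM customers WHERE city = ? ORDER BY customer_id",
    ("Austin",),
).fetchall()

# Foreign key: links the two tables together.
joined = db.execute(
    "SELECT o.order_id, c.city FROM orders o "
    "JOIN customers c ON o.customer_id = c.customer_id ORDER BY o.order_id"
).fetchall()
```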
What is sequential file organization?
the way files are organized on tape; records must be retrieved in the same physical sequence in which they are stored
What is direct file organization or random file organization?
records can be accessed directly regardless of their location on the storage medium; e.g., magnetic disks use this method
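The difference between the two organizations can be imitated in memory. In this sketch a list scanned from the start stands in for tape and a dictionary stands in for direct access; both stand-ins, and the helper name `sequential_read`, are assumptions for illustration.

```python
# Five hypothetical records, stored in key order.
records = [(key, f"record-{key}") for key in range(1, 6)]


def sequential_read(tape, wanted_key):
    """Scan in stored order; return the record and how many records were read."""
    for reads, (key, value) in enumerate(tape, start=1):
        if key == wanted_key:
            return value, reads
    return None, len(tape)


# Direct organization: lookup by key does not depend on the record's position.
direct_store = dict(records)

found, reads_needed = sequential_read(records, 4)
direct_found = direct_store[4]
```

Reaching the fourth record sequentially means reading the three before it, while the direct lookup goes straight to it.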
What is ISAM?
indexed sequential access method uses an index of key fields to locate individual records
What is an index?
a list of the key field of each record together with where that record is physically located on the storage media. Records are stored on disk in key sequence.
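An index of this kind supports both direct and sequential access. The following is a simplified sketch, not a real ISAM implementation: list positions stand in for physical disk locations, and the record data and function names are hypothetical.

```python
from bisect import bisect_left

# Data file: records stored in key sequence (hypothetical customer records).
data_file = [(1001, "Ada"), (1004, "Ben"), (1009, "Cy"), (1015, "Dee")]

# Index: each record's key field paired with its position, which stands in
# for the physical location on the storage medium.
index = [(key, position) for position, (key, _) in enumerate(data_file)]
index_keys = [key for key, _ in index]


def direct_lookup(key):
    """Use the index to jump straight to one record, or return None."""
    slot = bisect_left(index_keys, key)
    if slot < len(index) and index_keys[slot] == key:
        return data_file[index[slot][1]]
    return None


def sequential_from(start_key, count):
    """Find the starting point through the index, then read in key order."""
    slot = bisect_left(index_keys, start_key)
    return [data_file[position] for _, position in index[slot:slot + count]]
```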
What are limitations of the data file approach?
Data redundancy, because different programmers create different data-manipulating applications; data inconsistency, because the actual data values are not synchronized across the various copies of the data; data isolation, because file organization creates silos of data that make it extremely difficult to access data from different apps; and weak data security, which is difficult to enforce in the file environment because new applications are added to the system on an ad hoc basis.
What led to database management systems?
The data management problems from the file environment approach
What is the optimal way to store and access organizational data?
By using a database, which can provide access to all of the data and alleviate many of the problems associated with data file environments
What are the two types of databases?
Centralized and distributed
What is a centralized database?
stores all related files in one physical location usually on large mainframe computers
List advantages of a centralized database
files are more consistent with one another when physically kept in one location; file changes can be made in a supervised and orderly fashion; files are not accessible except via the centralized host computer, where they can be protected more easily from unauthorized access or modification
What is a disadvantage of the centralized database method?
single point of failure; also, when users are widely dispersed and must perform data manipulations from great distances, they may experience transmission delays
What is a distributed database?
a database that keeps either complete copies of the database or portions of the database in more than one location
What are the two types of distributed databases?
replicated and partitioned
What is a replicated database?
A replicated database stores complete copies of the entire database in multiple locations; the copies can serve as a backup if the centralized database fails.
What are the advantages of a replica database?
improves the response time because the data is local, close to users
What are the disadvantages of a replica database?
more expensive to set up and maintain because each replica must be updated as records are changed within the database
What is a partitioned database?
divided up so that each location has a portion of the entire database, usually the portion that meets the local users' needs
What are some advantages of partitioned database?
the response speed of localized files without the need to replicate all changes in multiple locations; data can be entered more quickly and kept more accurate by the users immediately responsible for it
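The trade-off between the two distributed layouts shows up when a record changes. The sketch below is a simplified model: the sites, the sample records, and the `update_balance` helper are assumptions, and real systems also have to handle networking and failures.

```python
# Hypothetical account records, each belonging to one region.
records = [
    {"id": 1, "region": "east", "balance": 100},
    {"id": 2, "region": "west", "balance": 250},
    {"id": 3, "region": "east", "balance": 75},
]
sites = ["east", "west"]

# Replicated: every site stores a complete copy of the database.
replicated = {site: [dict(r) for r in records] for site in sites}

# Partitioned: each site stores only the portion its local users need.
partitioned = {
    site: [dict(r) for r in records if r["region"] == site] for site in sites
}


def update_balance(layout, record_id, new_balance):
    """Apply an update wherever the record is stored; return copies written."""
    writes = 0
    for site_records in layout.values():
        for record in site_records:
            if record["id"] == record_id:
                record["balance"] = new_balance
                writes += 1
    return writes


replica_writes = update_balance(replicated, 1, 120)     # one write per replica
partition_writes = update_balance(partitioned, 1, 120)  # one write in total
```

The replicated layout answers any query locally but must write every change at every site, which is the maintenance cost noted above; the partitioned layout writes once but holds only part of the data at each site.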
What is a DBMS?
Database management system: a program that provides access to databases and permits an organization to centralize data, manage it efficiently, and provide access to the stored data. It is an interface between the application and the physical data, and it allows multiple users to share data.
List the major functions of a DBMS
Data filtering and profiling, data quality, data synchronization, data enrichment, and data maintenance.
List some advantages of a DBMS
Permanence, querying, concurrency, backup and replication, rule enforcement, security, computation, change and access logging, and automated optimization.
How does DBMS support different requirements for multiple users?
DBMS provides two views of the data, a physical view and a logical view
What is a physical view?
deals with the actual, physical arrangement and location of data in the direct access storage devices; database specialists use it to configure storage and processing resources.
What is a logical view?
the user's view of the data, presented in a form that is meaningful to the user; it allows users to see data from a business-related perspective rather than from a technical viewpoint.
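One common way to give users a business-oriented picture of the data is a SQL view over the underlying table. The sketch below uses that as a loose analogy for the logical view; the `emp` table, its terse column names, and the `employee_directory` view are hypothetical, and how the rows are actually laid out on disk remains the DBMS's concern.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
-- Stored table with terse, technical column names and salary in cents.
CREATE TABLE emp (
    emp_no INTEGER PRIMARY KEY, nm TEXT, dept_cd TEXT, sal_cents INTEGER
);
INSERT INTO emp VALUES
    (1, 'Ada Lopez', 'FIN', 8500000),
    (2, 'Ben Ortiz', 'MKT', 7200000);

-- View with business-friendly names and units for end users.
CREATE VIEW employee_directory AS
    SELECT nm AS name, dept_cd AS department, sal_cents / 100.0 AS salary
    FROM emp;
""")

rows = db.execute(
    "SELECT name, department, salary FROM employee_directory ORDER BY name"
).fetchall()
```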
What is the difference between databases and data warehouses?
databases are designed and optimized to store data, whereas data warehouses are designed and optimized to respond to analysis questions that are critical for the business
What is OLTP?
online transaction processing systems in which every transaction has to be recorded quickly
What is OLAP?
online analytical processing systems, in which data can be queried and analyzed much more efficiently than in OLTP application databases.
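The two workloads differ in shape even on toy data. The sketch below runs both against one in-memory SQLite table purely for illustration; the `sales` table and `record_sale` helper are assumptions, and in practice the analytical queries would run against a separate store optimized for them.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")


def record_sale(region, product, amount):
    """OLTP-style work: record one transaction and commit it immediately."""
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT INTO sales VALUES (?, ?, ?)", (region, product, amount)
        )


for sale in [
    ("East", "widget", 10.0),
    ("East", "gadget", 25.0),
    ("West", "widget", 10.0),
    ("West", "widget", 15.0),
]:
    record_sale(*sale)

# OLAP-style work: summarize many transactions along different dimensions.
by_region = dict(
    db.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
)
by_product = dict(
    db.execute("SELECT product, SUM(amount) FROM sales GROUP BY product")
)
```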
What is analytical processing?
also referred to as business intelligence, includes data mining, decision support systems, enterprise systems, web apps, querying, and other end-user activities
What are some areas that benefit from data warehouses?
marketing and sales, pricing and contracts, forecasting, sales performance and financial
What are the nine characteristics of a data warehouse?
organization by subject; consistency; time variant (data is kept for many years); nonvolatile (once data is entered into the warehouse, it is not updated until the next scheduled extraction); relational; client/server; Web-based; integration; real-time
Why would you create a data warehouse as a separate data store?
a separate data store performs better because it is not competing or waiting for processing time; also, modeling a single database that can be used for both operational and analytical purposes can be difficult
What is an economical and effective method of delivering data?
providing data warehouse content via an intranet
What are some characteristics that would make data warehouses suitable for an organization?
end users need to access large amounts of data, operational data is stored in different systems, organization employs an information-based approach to management, organization serves a large diverse customer base, same data is represented differently in different systems, data is stored in highly technical formats that are difficult to decipher, and extensive end-user computing is performed
What is a data mart?
A scaled-down version of a data warehouse designed for a strategic business unit (SBU) or a single department. Provides a lower-cost alternative to a data warehouse
What are some of the advantages of data marts?
shorter implementation time, allow for local rather than central control, contain less information than a data warehouse and so respond more quickly, easier to understand and navigate, allow a business to build its own decision support system
What is an operational data store?
a database for transaction processing systems that uses data warehouse concepts to provide clean data. It is situated between the operational data in legacy systems and the data warehouse, and is used for short-term decision making involving mission-critical apps
What are some reasons for data warehouse failures related to design?
unrealistic expectations, inappropriate architecture, vendors overselling capabilities, lack of development expertise, lack of effective project sponsorship
What are some reasons for data warehouse failures related to implementation?
Poor user training, failure to align data warehouses and data marts, lack of attention to cultural issues, corporate policies not updated
What are some reasons for data warehouse failures related to operation?
poor upkeep of technology, failure to upgrade modules, lack of integration, and poor data quality
What is a data center?
the name given to facilities containing mission-critical ISs and components that deliver data and IT services to enterprises
What is ECM?
Enterprise content management used in large and medium-sized organizations which includes electronic document management, web-content management, digital asset management and electronic records management.
What is the value of e-record management?
to help prepare a response to an audit, federal investigation, lawsuit, or any other legal action.
What is discovery?
The process of gathering information in preparation for trial, legal or regulatory investigation or administrative action as required by law.