MIS Test 2 (Ch 9, 10)
What are business intelligence systems?
information systems that process data to identify patterns, relationships and trends for use by business professionals and other knowledge workers
The patterns, relationships, trends and predictions are known as what in a business intelligence system?
What are the five components of a business intelligence (BI) system?
What is the software component of a BI system?
What are the 4 common types of source data for a BI system?
1. organization's own operational data
2. social media data
3. data that organizations purchase from data vendors
4. employee knowledge
What are the 4 common ways in which the BI application processes data?
1. reporting applications
2. data mining applications
3. BigData applications
4. Knowledge management applications
Describe the hierarchical nature of deciding, problem solving and project management (what does each require)
Deciding requires informing
Problem solving requires deciding therefore informing
Project Management requires problem solving and therefore deciding and informing
What are the 3 common uses for BI?
1. Identifying changes in purchasing patterns
2. Entertainment
3. Predictive Policing
Explain 2 uses/examples for a BI system: "identifying changes in purchasing patterns".
Important life events change what customers buy
Amazon's "Those who bought this also bought..." recommendations
Explain 2 uses/examples for a BI system: "entertainment".
Netflix uses BI analysis to determine which shows to make/buy to stream
BI analysis allows Spotify to recommend where bands should perform based on the data about where their customers are listening to music
Explain the use for a BI system: "predictive policing".
analyze data on past crimes (such as location, date, time, day of week, and type of crime) to help recommend where to station police
What is Just In Time Medical Reporting?
Software analyzes a patient's records and, if injections are needed, recommends them as the exam progresses
What are the three primary activities in the BI process?
1. Acquire data
2. Perform Analysis
3. Publish Results
What is data acquisition?
the process of obtaining, cleaning, organizing, relating, and cataloging source data
What is BI analysis?
the process of creating business intelligence
What does it mean to "publish results"?
the process of delivering business intelligence to the knowledge workers who need it
What is push publishing?
delivers business intelligence to users without any request from the users (according to a schedule of some kind)
What is pull publishing?
requires the user to request BI results
Business intelligence is only as intelligent as...?
the people creating it
What formats are results published in the BI process?
over the internet or other networks, via a web service, or as PDF or PowerPoint files
What is a data warehouse?
a large store of data accumulated from a wide range of sources within a company and used to guide management decisions.
What are the 4 functions of a data warehouse?
1. obtain data from operational, internal and external databases
2. cleanse data
3. organize and relate data
4. Catalog data using metadata
What are the two major steps in processing data for a data warehouse?
1. Programs read operational/other data and extract, clean and prepare the data for BI processing
2. Prepped data is stored in a data warehouse database using a data warehouse DBMS
What are the 6 main problems with source/operational/raw data? Describe each.
1. Dirty Data: invalid values, such as B given for gender where only M or F is expected
2. Missing Values: data parts missing can cause a bias in the analysis
3. Inconsistent Data: a condition in which different versions of the same data yield different results
4. Data not Integrated: the data resides in different sources and must be combined to give users a unified view of it
5. Wrong Granularity: too fine/not fine enough
6. Too much Data:
Too many attributes: can cause the curse of dimensionality (the more attributes there are, the easier it is to build a model that fits the sample data but that is worthless as a predictor)
Too many data points: can be overwhelming for analysis, so reduce the volume using statistical sampling
What is granularity?
level of detail represented by the data; can be too fine or too coarse
Can granularity go from coarse to fine or fine to coarse?
fine to coarse as you can sum and combine fine data to make it more coarse
What is a data mart?
a data collection, smaller than the data warehouse, that addresses the needs of a particular department or functional area of the business
If the data warehouse is the distributor in a supply chain, then a data mart is like...
a retail store in a supply chain
What is a reporting application?
a BI application that inputs data from one or more sources and applies reporting operations to that data to produce business intelligence
What are the 5 basic reporting operations when using a reporting application?
1. Sorting
2. Filtering
3. Grouping
4. Calculating
5. Formatting
What is used to sort, filter, group, calculate and format data within a reporting application?
SQL and HTML or a simple report writing tool
What are the two important reporting applications?
RFM analysis and OLAP
What is RFM analysis?
a technique readily implemented with basic reporting operations which is used to analyze and rank customers according to their purchasing patterns
What do the R, F, and M stand for in RFM analysis?
R= How RECENTLY a customer ordered
F= How FREQUENTLY the customer ordered
M= How much MONEY the customer spent
*each customer is assigned a score from 1 to 5 in each category, where 1 means the least recent, least frequent, or lowest monetary value and 5 means the most
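The scoring described above can be sketched in a few lines. This is only an illustration: the customer records, the `quintile_scores` helper, and the choice to rank customers into five equal-sized groups are assumptions, and the sketch follows this card's convention that 5 means "most" (some texts number the groups the other way around).

```python
from datetime import date

def quintile_scores(values, larger_scores_higher=True):
    """Rank the values and map each rank to a score from 1 to 5.

    Ties are broken by input order, which is a simplification.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i],
                   reverse=not larger_scores_higher)
    scores = [0] * n
    for rank, i in enumerate(order):
        scores[i] = 1 + (rank * 5) // n
    return scores

def rfm(customers, today):
    """Return {name: (R, F, M)} for a dict of hypothetical customer records."""
    names = list(customers)
    days_since = [(today - customers[n]["last_order"]).days for n in names]
    # Fewer days since the last order means more recent, so smaller scores higher.
    r = quintile_scores(days_since, larger_scores_higher=False)
    f = quintile_scores([customers[n]["orders"] for n in names])
    m = quintile_scores([customers[n]["spent"] for n in names])
    return {n: (r[i], f[i], m[i]) for i, n in enumerate(names)}

# Made-up data for illustration only.
customers = {
    "Ann": {"last_order": date(2024, 6, 1), "orders": 12, "spent": 900.0},
    "Bob": {"last_order": date(2023, 1, 15), "orders": 1, "spent": 40.0},
    "Cy": {"last_order": date(2024, 3, 10), "orders": 5, "spent": 300.0},
    "Di": {"last_order": date(2023, 11, 2), "orders": 3, "spent": 1200.0},
    "Ed": {"last_order": date(2024, 5, 20), "orders": 8, "spent": 150.0},
}
scores = rfm(customers, today=date(2024, 6, 30))
```

Here Di ends up with low R and F scores but the top M score: a customer who spends a lot but has not ordered recently or often.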
What does OLAP stand for? What is it?
Online Analytical Processing: a second type of reporting application that is more general than RFM and provides the ability to sum, count, average, and perform other simple arithmetic operations on groups of data
What is the key defining feature of OLAP?
it is dynamic, meaning the viewer of the report can change the report's format (hence the term "online" in OLAP)
What is the measure within OLAP?
the data of interest (the item that is to be summed or averaged or otherwise processed in the OLAP report) like average sales or total sales
What is a dimension within OLAP?
a characteristic of a measure like customer type or location
What is an OLAP cube or report?
A report. Despite its name however, it may have any number of axes.
What does it mean to drill down into the data?
to further divide the data into more detail (e.g., looking at stores in individual California cities rather than at California as a whole)
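The measure, dimension, and drill-down cards above can be tied together with a small sketch. The sales rows, field names, and the `olap_report` function are hypothetical; the point is only that the same measure is re-summed over whichever dimensions the viewer picks.

```python
from collections import defaultdict

# Hypothetical source rows: "sales" is the measure, the other fields are dimensions.
rows = [
    {"state": "CA", "city": "San Diego", "store_type": "Mall", "sales": 100},
    {"state": "CA", "city": "San Diego", "store_type": "Street", "sales": 50},
    {"state": "CA", "city": "Sacramento", "store_type": "Mall", "sales": 70},
    {"state": "NV", "city": "Reno", "store_type": "Mall", "sales": 30},
]

def olap_report(rows, dimensions, measure="sales"):
    """Sum the measure for every combination of the chosen dimensions."""
    report = defaultdict(int)
    for row in rows:
        report[tuple(row[d] for d in dimensions)] += row[measure]
    return dict(report)

by_state = olap_report(rows, ["state"])               # coarse view
by_state_city = olap_report(rows, ["state", "city"])  # drill down into cities
```

Adding `"city"` to the dimension list is the drill-down: the California total is split into its per-city totals without touching the underlying data.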
What is data mining?
the application of statistical techniques to find patterns and relationships among data for classification and prediction
What is unsupervised data mining?
analysts do not create a model or hypothesis before running the analysis, but instead make a hypothesis afterwards
What is cluster analysis?
an unsupervised data mining technique that identifies groups of entities with similar characteristics; used to find groups of similar customers from customer order and demographic data
What are the three main characteristics of unsupervised data mining?
1. No initial hypothesis/model
2. Findings obtained only by data analysis
3. Hypothesized model created to explain the patterns found
What is supervised data mining?
when data miners develop a model prior to the analysis and apply statistical techniques to estimate parameters of the model
What are the two main characteristics of supervised data mining?
1. Use a planned model to start with
2. Use regression analysis to come to conclusions
What is regression analysis?
a supervised data mining technique that measures the impact of a set of variables on another variable
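As a rough illustration of estimating the parameters of a planned model, the sketch below fits a straight line by ordinary least squares. It is simplified to a single input variable, and the `fit_line` name and the advertising/sales figures are made up for the example.

```python
def fit_line(xs, ys):
    """Least-squares estimates of slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data that lies exactly on sales = 2 * ad_spend + 1.
ad_spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(ad_spend, sales)
```

The slope is the estimated impact of one more unit of the input variable on the predicted variable.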
What are neural networks?
a supervised data mining application used to predict values and make classifications such as "good prospect" or "poor prospect" customers
What is the technical definition for neural networks?
a complicated set of possibly non-linear equations
What is market-basket analysis?
an unsupervised data mining technique for determining sales patterns; shows the products that customers tend to buy together
What are the 4 primary activities involved in market-basket analysis?
1. Identify sales patterns in large volumes of data
2. Identify what products customers tend to buy together
3. Computes probabilities of purchases
4. Identify cross-selling opportunities
What is cross-selling in market basket analysis?
Building on a sale by recommending a product that complements another product the client has already purchased
What is the support measure in market basket analysis?
the probability that two items will be purchased together
What is the confidence measure in market basket analysis?
the conditional probability that a customer buys one item given that they have purchased another
What is the lift measure in market basket analysis?
the ratio of confidence to the base probability of buying an item and shows how much the base probability increases or decreases when other products are purchased
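The three measures can be computed directly from a list of baskets. This is a minimal sketch with made-up baskets and hypothetical function names, taking confidence to be the conditional probability of buying one item given that another was bought.

```python
# Hypothetical transactions: each basket is the set of items in one purchase.
baskets = [
    {"mask", "fins"},
    {"mask", "fins", "tank"},
    {"mask"},
    {"fins"},
    {"tank"},
]

def support(baskets, *items):
    """Fraction of baskets that contain all of the given items."""
    wanted = set(items)
    return sum(wanted <= basket for basket in baskets) / len(baskets)

def confidence(baskets, given, then):
    """Estimated probability of buying `then` given that `given` was bought."""
    return support(baskets, given, then) / support(baskets, given)

def lift(baskets, given, then):
    """Confidence divided by the base probability of buying `then`."""
    return confidence(baskets, given, then) / support(baskets, then)

s = support(baskets, "mask", "fins")
c = confidence(baskets, "mask", "fins")
l = lift(baskets, "mask", "fins")
```

A lift above 1 suggests that buying the first item raises the chance of buying the second; a lift below 1 suggests it lowers it.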
What is a decision tree?
an unsupervised data mining technique that uses a hierarchical arrangement of criteria that predicts a classification or a value
What does a decision tree allow a business to do?
select attributes most useful for classifying "pure groups" and creating decision rules that can assist with decision making in a hierarchical fashion
Rules such as "if the credit score is greater than 572 and CurrentLTV is less than .94, accept or reject the loan" are an example of which type of data mining technique?
Name three characteristics of BigData.
1. data sets are extremely large (a petabyte or larger)
2. generated rapidly (rapid velocity)
3. has structured data, free-form text, log files, possibly graphics, audio and video (great variety)
What is Two Sigma?
Analyzes financial statements, developing news, Twitter activity, weather reports, other sources AND develops and tests investment strategies
What are the steps in Two Sigma's five-step process?
1. Acquire data
2. Create models
3. Evaluate models
4. Analyze risks
5. Place trades
What is MapReduce?
a technique for harnessing the power of thousands of computers working in parallel
What is the basic idea behind MapReduce?
"divide and conquer on computers!" The BigData collection is broken into pieces, and hundreds or thousands of independent processors search these pieces for something of interest. This is known as the Map phase.
As the processes finish in MapReduce, what is the phase in which their results are combined back together?
the Reduce phase
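A toy version of the two phases, counting words in a few log lines, might look like this. It is a single-machine sketch: the pieces are processed one after another rather than on separate computers, and the `split`, `map_phase`, and `reduce_phase` names and the sample log lines are assumptions made for illustration.

```python
from collections import Counter
from functools import reduce

def split(records, n_pieces):
    """Break the collection into n_pieces roughly equal pieces."""
    return [records[i::n_pieces] for i in range(n_pieces)]

def map_phase(piece):
    """Work done independently on one piece: count the words it contains."""
    return Counter(word for line in piece for word in line.lower().split())

def reduce_phase(partials):
    """Combine the per-piece counts into one overall result."""
    return reduce(lambda a, b: a + b, partials, Counter())

# Hypothetical log lines standing in for a much larger data collection.
logs = ["error disk full", "ok", "error timeout", "ok ok"]
pieces = split(logs, 2)
partials = [map_phase(piece) for piece in pieces]  # would run in parallel on a cluster
totals = reduce_phase(partials)
```

Because each piece is counted without reference to the others, the map step can be spread over as many machines as there are pieces; only the final combination needs all the partial results.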
What is Hadoop?
an open-source software framework written in Java that manages thousands of computers and implements MapReduce to process BigData (requires a high level of expertise to use)
What is Pig?
a query language for Hadoop that is easy to master, is extensible, and automatically optimizes queries at the MapReduce level
What is knowledge management?
the process of creating value in a company by utilizing intellectual capital and sharing that knowledge with those who need that capital
What are the two ways in which Knowledge Management (KM) benefits organizations?
1. improve process quality
2. improve team strength
How does Knowledge Management (KM) preserve organizational memory?
by capturing/storing lessons learned and best practices of key employees
The scope of KM corresponds to what online platform in hyper-social organizations?
SM (social media)
What are expert systems?
rule-based systems that encode human knowledge in the form of if/then rules
What are if/then rules?
statements that specify if a particular condition exists, then to take some action
What are the three tasks an expert system shell does?
1. Process IF side of rules
2. Report values of all variables
3. Knowledge is then gathered from human experts
What are expert systems shells?
the programs that process a set of rules
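A very small sketch of the idea: the rules are data, and a separate program evaluates the IF side of each one against the known facts. The rule contents, fact names, and the `run_rules` function are hypothetical, and real shells do considerably more than this.

```python
# Each rule pairs an IF condition (a function of the facts) with a THEN action.
rules = [
    (lambda facts: facts["order_total"] > 100, "offer free shipping"),
    (lambda facts: facts["is_member"], "apply member discount"),
]

def run_rules(rules, facts):
    """Evaluate the IF side of every rule and collect the actions that fire."""
    return [action for condition, action in rules if condition(facts)]

# Hypothetical facts about one customer order.
facts = {"order_total": 120.0, "is_member": False}
actions = run_rules(rules, facts)
```

Keeping the rules separate from the code that processes them is what lets the same program be reused with a different rule set.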
What are the three major disadvantages of expert systems?
1. Difficult and expensive to develop:
Ties up domain experts.
2. Difficult to Maintain
Changes cause unpredictable outcomes.
Constantly need expensive changes.
3. Don't live up to expectations
Can't duplicate diagnostic abilities of humans
What does CMS stand for? What does it support?
Content Management System: information systems that support the management and delivery of documents and other expressions of employee knowledge
What are the 5 challenges of content management? Describe each.
1. Huge databases: (with thousands of docs, pages, graphics)
2. Dynamic content: (constantly update the content of a web page)
3. Documents refer to one another: (so the CMS must link documents so the content dependencies are known and used to maintain document consistency)
4. Perishable contents: documents become obsolete and need to be removed/replaced/updated
5. In many languages: documents must translate changes into all languages
What are three common alternatives for content management applications? Describe each.
1. In-house custom: customer support develops in-house database applications to track customer problems
2. Off-the-shelf: horizontal market products like SharePoint OR vertical market applications
3. Public search engine: some organizations use Google or Bing to manage their content (an easy way to find public documents)
What is hyper-social knowledge management?
Social media, and related applications, for management and delivery of organizational knowledge resources (ie: a person who has a blog discussing issues with a product provides knowledge to many people potentially)
What is the hyper-organization theory?
Framework for understanding KM:
shifts from knowledge and content to fostering authentic relationships among knowledge creators and users
What is a rich directory?
an employee directory that includes not only the standard name, email, phone, and address but also organizational structure and expertise (ie: what manager someone is under or what languages employees speak)
Why do employees resist knowledge sharing?
1. Employees reluctant to exhibit their ignorance
How can an organization remedy issues with knowledge sharing?
1. Provide strong management endorsement
2. Provide positive feedback
3. Nothing wrong with praise or cash...especially cash
What are static reports in BI publishing?
BI documents that are fixed at the time of creation and do not change (e.g., a PDF of a sales analysis)
What are dynamic reports in BI publishing?
BI documents that are updated at the time they are requested (ie: a sales report that is current at the time the user accessed it on a web server)
What are pull options in BI publishing?
requires the user to request BI results
What are push options in BI publishing?
delivers business intelligence to users without any request from the users
What are 4 ways to publish BI results?
1. Print and distribute via email or collaboration tool
2. Publish on Web server or SharePoint
3. Publish on a BI server
4. Automate results via Web service
What is a BI server?
a Web server application that is purpose-built for the publishing of business intelligence
What are user subscriptions? How does a BI server support user subscriptions ?
user subscriptions are user requests for particular BI results on a particular schedule or in response to particular events (e.g., a user can subscribe to a daily sales report)
it extends alert/RSS functionality to support the subscriptions
What are the two functions of a BI Server?
1. Management: maintains metadata about the authorized allocation of BI results to users. The BI server tracks what results are available, what users are authorized to view those results
2. Delivery: the schedule upon which the results are provided to the authorized users. The server adjusts allocations as available results change and users come and go
Does publishing static or dynamic content require more expertise?
What function does metadata have in regard to BI servers?
BI servers use metadata to determine which results to send to which users and possibly on which schedule
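One way to picture this is as a lookup over subscription records. The sketch below is an assumption-heavy illustration: the record layout, the user and report names, and the rule that weekly results go out on Mondays are all invented for the example and do not describe any particular BI server product.

```python
# Hypothetical metadata: which user receives which result, and on what schedule.
subscriptions = [
    {"user": "ana", "report": "daily_sales", "schedule": "daily"},
    {"user": "ben", "report": "daily_sales", "schedule": "weekly"},
    {"user": "ana", "report": "rfm_scores", "schedule": "weekly"},
]

def deliveries_due(subscriptions, day_of_week):
    """Return the (user, report) pairs to push today.

    Assumption for this sketch: weekly subscriptions are delivered on Mondays.
    """
    due = []
    for sub in subscriptions:
        is_daily = sub["schedule"] == "daily"
        is_weekly_today = sub["schedule"] == "weekly" and day_of_week == "Mon"
        if is_daily or is_weekly_today:
            due.append((sub["user"], sub["report"]))
    return due

tuesday = deliveries_due(subscriptions, "Tue")
monday = deliveries_due(subscriptions, "Mon")
```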
BI results can be transferred to what kind of devices?
What is enabling the growth of BI systems?
data storage is essentially free, and CPU processing is becoming nearly so
What is a singularity in terms of BI systems (possibly may happen in the future)?
the point at which computer systems adapt and create their own software without human assistance, so that machines possess and create information for themselves