137 terms

Data Warehouse

Final +Midtern
Dimensional Model
1.A _________________ is made up of a central fact table and its associated dimensions.
users get the data they need to meet ongoing business requirements
1.The main objective of the data track is to make sure ________________________
Business Requirements
1.______________are the bedrock of the successful data warehouse/business Intelligence (DW/BI) systems.
Business Requirements
1. ______________ guide the development team in making the biggest strategic choices such as prioritizing subject areas for implementation.
Measurement event
A record in a fact table is a_____________________
__________________are the nouns of the dimensional model, describing the objects that participate in the business, such as employee, subscriber, publication, customer, physician, vehicle, product, service, author and store.
Partitioned table page
A _______________is essentially a large table split out into smaller tables under the covers.
Conformed dimension
A dimension that is shared across multiple processes is called a _____________________
Relational data warehouse database
The primary place to store and manage the dimensional model is in the _________________
The level of detail contained in the fact table is called the_________________
The Kimball Group
the definitive source for dimensional data warehousing expertise.organized in 2003 to respond to the increased demand for quality dimensional data warehouse consulting and education
Data Warehouse
relational database that is designed for query and analysis rather than for transactions
Data Warehousing and business intelligence
are techniques to provide business people with the information and tools they need to make operational and strategic business decisions
Focus on the business
Build an Information Infrastructure
Deliver in meaningful increments
Deliver the entire solution:
Kimball Lifecycle - The four primary principles are
The Kimball Lifecycle
a collection of steps, tasks, and dependencies based on the collective experience of the Kimball group
Business intelligence (BI)applications
Predefined applications that query, analyze, and present information to support a business need.
Data mining model
A statistical model often used to predict future behavior based on data about past behavior of identify closely related subsets of a population called clusters
ad hoc
are formulated by the user on the spur of the moment
Developing solid business sponsorship
the best place to
start the DW/BI project.
Dimensional Model
is the heart of the DW/BI system.
It is the target for the ETL system, the structure of the database, and the model behind the user query and reporting experience.
1Present the needed information to users as simply as possible 2.Return query results to the users as quickly as possible 3Provide relevant information that accurately tracks the underlying business processes
The three primary design goals of Dimensional Modeling are.
A Fact table
contains the measurements associated with a specific business process, like taking an order, displaying a web page, admitting a patient, or handling a customer support request.
conformed dimensions
When two business processes use exactly the same product dimensions with the same Keys,
surrogate key
a unique value, usually an integer, assigned to each row in the dimension.
Slowly Changing Dimensions
is the technique use to manage attribute changes in a dimension over time.
Type One is the technique used to deal with changes the business doesn't care about
Type Two the type two change tracking technique is a powerful method for capturing the attribute values that were in effect at a point in time and relating them to the business events
SCD's have two types:
degenerate dimension
A dimension without attributes is called
is the practice of connecting lookup tables to fields in the dimension tables.
Step 1: Identify the business process. Step 2: Declare the Grain. Step 3: Choose the dimensions. Step 4: Choose the Facts
Creating the high level dimensional model: The Four-Step Modeling Process.
Lower cost of ownership
High Performance and scale
Microsoft Investment in business intelligence
Advantages to using the Microsoft Toolset
Manageability: As discussed in Chapter 17, it's much easier to handle changes in structure and content in the relational database than in Analysis Services.
Why do you need to store the dimensional data in the relational database?
the main datafile with a suffix of .mdf and the
transaction log file with a suffix of .ldf.
SQL Server creates two default files when you create a
The first is
a true dollar cost: Data compression is an Enterprise
Edition feature.
The second is a CPU cost.
There are two major costs to data compression
entity integrity
All tables have a primary key, which is that column or set of columns that
will identify a single row when constrained to a single
referential integrity
Foreign keys define the relationships between the fact anddimension tables. It says every value for a given foreign key found in the fact table is guaranteed to have an entry inthe associated dimension table
Indexes Aggregates
two major performance tools available in the data warehouse database platforms
date key
For fact tables, the standard starting point is to create a single-column clustered index on the
aggregate table
table summarizes data at a higher level than the atomic data maintained in the detailed fact table...tables are the single most useful way to improve query performance on a dimensional DW/BI system.
Partitioned Tables
essentially a large table split out into
smaller tables under the covers. Each of these smaller
tables, can be accessed, indexed, and
managed independently.
Master data
is reference data that is managed centrally for an
organization. It
describes the business entities that participate in the transaction systems.
is before the data warehouse, in a separate though related management system.
The best place for managing your organization's master reference data
Data Governance
is the process and rules for managing data your organization
Required fields,
-Standard values,
-Data retention policies
Data Governance rules should include the following:
OLAP stands for On Line Analytic Processing, OLAP is an analytic database and not a OLTP Online transaction processing database
What is OLAP
OLAP cube
is a set of data, organized in a way that facilitates non-predetermined queries for aggregated information.
Focus on the business:
Build an Information Infrastructure:
Deliver in meaningful increments
Deliver the entire solution:
The Kimball Lifecycle is a collection of steps, tasks, and dependencies based on the collective experience of the Kimball group.
The four primary principles are:
Data warehouse
The platform for business intelligence. In the Kimball method it includes everything from the original data extracts to the software and applications that users see.
Business process dimensional model
A specific discipline for modeling data that is an alternative to normalized modeling
Normalized models
the Third Normal Form models, were designed to support the high-volume single-row inserts and updates that define transaction systems
Online analytic processing database (OLAP):
A technology for storing, managing, and querying data specifically designed to support business intelligence users.
Extract, Transformation, and Load system (ETL):
a set of processes that clean, transform, combine, de-duplicate, household, archive, conform and structure data for use in the data warehouse.
primary key
a multi-part key made up of a subset of the foreign keys from each dimension table involved in the business event.
Atomic level
Fact tables should be built with the lowest level of detail (Grain)as possible from the original source- generally this is known as
Atomic fact
provide complete flexibility to roll up the data at any level of summary needed across any dimension
Conformed Dimension
is a single dimension that is shared across multiple processes
The Date
fundamental business dimension across all organizations and industries
is Microsoft 's offering in the portal and web-based application platform category
The __________portal is the navigation framework for the organizations structure.
_______________are a special type of list that store files as well as information about files (essentially directories).
A SharePoint _____________is the collection of all servers running in a SharePoint deployment and the services running on those servers.
Data Mining
We generally describe ____________as a process of data exploration with the intent to find patterns or relationships that can be made useful to the organization.
Knowledge Discovery and Data Mining
The field of data mining is known more broadly as______________________.
The programmatic technique used to identify the relationships or patterns in the data is called an.
The definition of the relationship identified by the algorithm, which generally takes the form of a set of rules, a decision tree, a set of equations, or a set of associations is called a________
The collection of attributes and relationships (variables) that are associated with an individual instance of the entity being modeled, usually a customer is called a.
Page or section break field
What is the function of this mock-up user interaction symbol ()
In Excel 2010 what is the tool that is used for analysis?
The BI applications provided data warehouse access for a broad and important user community.
Response Feedback:
What is the definition of Broad access BI Applications?
What is the function of this mock-up user interaction symbol \\ \\
Application- entered variable
What is the function of this mock-up user interaction symbol {}
BI applications built in the development phase of the lifecycle.
What is the definition of Early impact BI Applications?
The report mock-ups are a great way to communicate the content and ________ of the reports.
User-entered variable
What is the function of this mock-up user interaction symbol < >
Report template comments
What is the function of this mock-up user interaction symbol [ ]
Drillable field
What is the function of this mock-up user interaction symbol <<>>
Business Value
The process of identifying and creating BI application base on business requirements almost guarantees that you will provide something of significant value to the business.
All of the above
Business Requirements whose Functional Implication is to Create Reports have the following in common
_______________means that the data visible in a report or analysis is updated many times a day but is not guaranteed to be the absolute current truth.
Report Builder 3.0 is a ______________________report authoring tool.
Report description
Every report should have a __________?
Navigation framework
Business Requirements whose Functional Implication is to create Final reports have the following in commonOnce you know which reports to build, you need to categorize them so the users can find the information they're looking for as quickly as possible.
Report Name
The_________________ communicates the contents of the report to the viewer.
Every report should have a comment or ____________________
Reporting Services
The most common way to use SQL Server technology to access real-time data is to use?
Prototype of the report
Mock up reports represent a _____________
BI portal
is the navigation framework for the organizations structure.
IT is used to collect and categorize all structured and unstructured information throughout the organization and make it available through a rich user interface with intelligent search capabilities and the ability to personalize the experience.
BI portal
the central place where people can find the analytic information they need.
Usable Content-rich Clean Current
The BI portal plays a significant role in the change management process. The BI portal must provide the following features:
Interactive and Value-oriented
Role of the BI portal
Density: density translates to the BI portal in a couple of ways. Primarily, it means we flatten the information hierarchy. Structure: The BI portal should be structure in a way that helps you find the information you need. The BI
The process of creating the BI portal requires the careful combination of two basic design principles:
provide access to standard reports
The main purpose of the BI portal is to....
Web application
a top-level domain URL is known as a web application, and it corresponds to an Internet Information Services (IIS) website. A web application contains one or more site collections, which is a set of one or more websites that have the same owner and administration settings.
is a coherent set of pages with a home page, libraries, lists, a common layout template and theme, all based off the same root URL.
Alternate access mapping
When one sharePoint server has multiple application mapped to different ports on the same server.
Web Parts
reusable code modules that allow you to display various types of content on your pages
are ways to view the contents of a library or list, sort
of like a report definition
are collections of items of a single type.... tasks, calendars, announcements, links, and contacts.
Data Mining
process of data
exploration with the intent to find patterns or relationships
that can be made useful to the organization.
Knowledge Discovery and Data Mining (KDD
The field of data mining is known more broadly as
first is usually a one-time project to help you gain an understanding of who your customers are and how they behave. We call this exploratory or undirected data mining, where the goal is to find something interesting.
The second is most often a project created to work on a specific problem or opportunity. We call this more focused activity directed data mining.
two common approaches to data mining
Case set
A group of cases that share the same attributes
Dependent variable(s) (or predicted attribute or predict column
The variable the algorithm will build a model to predict or classify.
Independent variable(s) (or predictive attribute or input column
The variables which provide the descriptive or behavior information used to build the model.
Discrete or continuous variables:
Numeric columns that contain continuous or discrete values.
A statistical technique that creates a best-fit formula based on a data set. The formula can be used to predict values based on new input variables.
A measure of how well the regression formula fits the actual values in the data set from which it was created
mining structure
A Microsoft data mining term used as a name for the definition of a case set in Analysis Services. is essentially a metadata layer on top of a Data Source View that includes additional data mining-related flags and column properties, such as the field that identifies a column as input, predict, both, or ignore
Mining model
The specific application of an algorithm to a particular mining structure
is the task of assigning each item in a set to one of a predetermined set of discrete choices based on its attributes or behaviors.
Anomaly Detection
the identification of cases that deviate from the norm in a significant way.
The process of building a decision tree starts with the dependent variable to be predicted and runs through the independent variables to see which one most effectively divides the population. The goal is to identify the variable
that splits the cases into groups where the predicted variable (or class) either predominates or is faintly represented.
The process of building a decision tree starts .....
Neural Network
mimic our understanding of the
way neurons work in the brain.
The number of times a combination occurs is
called its
is to create the mining model structure in BIDS
The first step in developing the data mining model is
The first involves comparing models created with different
algorithms, parameters, and inputs to see which is most effective at predicting the target variable. The second is a business review of the proposed model to examine its contents and assess its value
There are two kinds of model validation in data mining
a lift chart, a classification matrix, and cross validation.
Analysis Services Data Mining provides three tools for
comparing the effectiveness of certain types of data mining models
the selection of the
"best" model based on its performance in the model comparison process and its implementation cost.
The data mining phase ends with....
Naïve Bayes
It is a simplified version of Decision Trees, and can be used for
classification and prediction for discrete attributes. this algorithm is fairly simple, based on the relative probabilities of the different values of each attribute, given the value of the predictable attribute.
Real Time
means delivering data to the business users throughout the day.
Daily/ Frequent /Very low latency Instantaneous
Data can be delivered in the following ways for real time systems.
Global company process data in a handful of regions, after local midnight. This isn't a true real-time scenario because each region's data is processed daily. This is the norm for business today.
means that the data visible in a report or analysis
is updated many times a day but is not guaranteed to be the
absolute current truth.
Very low latency
means the data must be delivered to the
business users faster than we can load it into the DW/BI
system, but not truly instantaneously.
means that the data visible on the screen
represents the true state of the transaction system(s) at
every instant.
Reports are executed on a schedule and a copy of the report is saved to the system for a period of time for users to use.
Snapshot report
saves the report's body, including the dataset, in the report catalog database. you can schedule the execution of the report to run on a schedule and store its results.
is a feature of SQL Server that's used primarily to increase the availability of a transaction database.
BizTalk Application
An enterprise application developed to handle real-time data needs.
a separate physical table, usually in a different database that uses transactional database options
Dashboards and scorecards
These applications generally
involve a combination of multiple reports and charts in a
seamless interface that use exception highlighting and
drill-down capabilities to analyze data from multiple
Operational BI and closed loop applications
These include
the use of applications that are more sophisticated than
typical operational reports. These applications leverage the rich historical context across multiple business processes available in the data warehouse to guide operational decision making.
The BI portal
is the business's primary interface
to the BI applications. It provides an organizing frameworkto help people find the information they need
Determining the initial report set
Creating a standard look-and-feel template
Creating a mock-up and documentation for each target report
Designing the navigation framework
Conducting the user review
Report specification typically includes the following tasks
Database prototyping
Data profiling
Report/Dashboard prototyping and delivery
Business user analytics
PowerPivot adds value to the BW/BI system in the following ways
Work from a dimensional model
Create incremental calculations
Check your work
Create a robust date dimension
Use naming conventions
Guidelines for working with PowerPivot for Excel
1. Develop your plan.
2. Set up the design and development environment.
3. Create a Data Source View.
4. Create and fine tune your dimensions.
5. Run the Cube Wizard and edit the resulting cube.
6. Create calculations and other decorations.
7. Iterate, iterate, iterate.
Designing the OLAP Structure
Incomplete attributes
Data Integration
Systems integration
Common master data management problems include