1.A _________________ is made up of a central fact table and its associated dimensions.
users get the data they need to meet ongoing business requirements
1.The main objective of the data track is to make sure ________________________
1.______________are the bedrock of the successful data warehouse/business Intelligence (DW/BI) systems.
1. ______________ guide the development team in making the biggest strategic choices such as prioritizing subject areas for implementation.
__________________are the nouns of the dimensional model, describing the objects that participate in the business, such as employee, subscriber, publication, customer, physician, vehicle, product, service, author and store.
Partitioned table page
A _______________is essentially a large table split out into smaller tables under the covers.
A dimension that is shared across multiple processes is called a _____________________
Relational data warehouse database
The primary place to store and manage the dimensional model is in the _________________
The Kimball Group
the definitive source for dimensional data warehousing expertise.organized in 2003 to respond to the increased demand for quality dimensional data warehouse consulting and education
relational database that is designed for query and analysis rather than for transactions
Data Warehousing and business intelligence
are techniques to provide business people with the information and tools they need to make operational and strategic business decisions
Focus on the business
Build an Information Infrastructure
Deliver in meaningful increments
Deliver the entire solution:
Kimball Lifecycle - The four primary principles are
The Kimball Lifecycle
a collection of steps, tasks, and dependencies based on the collective experience of the Kimball group
Business intelligence (BI)applications
Predefined applications that query, analyze, and present information to support a business need.
Data mining model
A statistical model often used to predict future behavior based on data about past behavior of identify closely related subsets of a population called clusters
is the heart of the DW/BI system.
It is the target for the ETL system, the structure of the database, and the model behind the user query and reporting experience.
1Present the needed information to users as simply as possible 2.Return query results to the users as quickly as possible 3Provide relevant information that accurately tracks the underlying business processes
The three primary design goals of Dimensional Modeling are.
A Fact table
contains the measurements associated with a specific business process, like taking an order, displaying a web page, admitting a patient, or handling a customer support request.
When two business processes use exactly the same product dimensions with the same Keys,
Slowly Changing Dimensions
is the technique use to manage attribute changes in a dimension over time.
Type One is the technique used to deal with changes the business doesn't care about
Type Two the type two change tracking technique is a powerful method for capturing the attribute values that were in effect at a point in time and relating them to the business events
SCD's have two types:
Step 1: Identify the business process. Step 2: Declare the Grain. Step 3: Choose the dimensions. Step 4: Choose the Facts
Creating the high level dimensional model: The Four-Step Modeling Process.
Lower cost of ownership
High Performance and scale
Microsoft Investment in business intelligence
Advantages to using the Microsoft Toolset
Manageability: As discussed in Chapter 17, it's much easier to handle changes in structure and content in the relational database than in Analysis Services.
Why do you need to store the dimensional data in the relational database?
the main datafile with a suffix of .mdf and the
transaction log file with a suffix of .ldf.
SQL Server creates two default files when you create a
The first is
a true dollar cost: Data compression is an Enterprise
The second is a CPU cost.
There are two major costs to data compression
All tables have a primary key, which is that column or set of columns that
will identify a single row when constrained to a single
Foreign keys define the relationships between the fact anddimension tables. It says every value for a given foreign key found in the fact table is guaranteed to have an entry inthe associated dimension table
For fact tables, the standard starting point is to create a single-column clustered index on the
table summarizes data at a higher level than the atomic data maintained in the detailed fact table...tables are the single most useful way to improve query performance on a dimensional DW/BI system.
essentially a large table split out into
smaller tables under the covers. Each of these smaller
tables, can be accessed, indexed, and
is reference data that is managed centrally for an
describes the business entities that participate in the transaction systems.
is before the data warehouse, in a separate though related management system.
The best place for managing your organization's master reference data
-Data retention policies
Data Governance rules should include the following:
OLAP stands for On Line Analytic Processing, OLAP is an analytic database and not a OLTP Online transaction processing database
What is OLAP
is a set of data, organized in a way that facilitates non-predetermined queries for aggregated information.
Focus on the business:
Build an Information Infrastructure:
Deliver in meaningful increments
Deliver the entire solution:
The Kimball Lifecycle is a collection of steps, tasks, and dependencies based on the collective experience of the Kimball group.
The four primary principles are:
The platform for business intelligence. In the Kimball method it includes everything from the original data extracts to the software and applications that users see.
Business process dimensional model
A specific discipline for modeling data that is an alternative to normalized modeling
the Third Normal Form models, were designed to support the high-volume single-row inserts and updates that define transaction systems
Online analytic processing database (OLAP):
A technology for storing, managing, and querying data specifically designed to support business intelligence users.
Extract, Transformation, and Load system (ETL):
a set of processes that clean, transform, combine, de-duplicate, household, archive, conform and structure data for use in the data warehouse.
a multi-part key made up of a subset of the foreign keys from each dimension table involved in the business event.
Fact tables should be built with the lowest level of detail (Grain)as possible from the original source- generally this is known as
provide complete flexibility to roll up the data at any level of summary needed across any dimension
_______________are a special type of list that store files as well as information about files (essentially directories).
A SharePoint _____________is the collection of all servers running in a SharePoint deployment and the services running on those servers.
We generally describe ____________as a process of data exploration with the intent to find patterns or relationships that can be made useful to the organization.
Knowledge Discovery and Data Mining
The field of data mining is known more broadly as______________________.
The programmatic technique used to identify the relationships or patterns in the data is called an.
The definition of the relationship identified by the algorithm, which generally takes the form of a set of rules, a decision tree, a set of equations, or a set of associations is called a________
The collection of attributes and relationships (variables) that are associated with an individual instance of the entity being modeled, usually a customer is called a.
The BI applications provided data warehouse access for a broad and important user community.
What is the definition of Broad access BI Applications?
BI applications built in the development phase of the lifecycle.
What is the definition of Early impact BI Applications?
The process of identifying and creating BI application base on business requirements almost guarantees that you will provide something of significant value to the business.
All of the above
Business Requirements whose Functional Implication is to Create Reports have the following in common
_______________means that the data visible in a report or analysis is updated many times a day but is not guaranteed to be the absolute current truth.
Business Requirements whose Functional Implication is to create Final reports have the following in commonOnce you know which reports to build, you need to categorize them so the users can find the information they're looking for as quickly as possible.
The most common way to use SQL Server technology to access real-time data is to use?
is the navigation framework for the organizations structure.
IT is used to collect and categorize all structured and unstructured information throughout the organization and make it available through a rich user interface with intelligent search capabilities and the ability to personalize the experience.
Usable Content-rich Clean Current
The BI portal plays a significant role in the change management process. The BI portal must provide the following features:
Density: density translates to the BI portal in a couple of ways. Primarily, it means we flatten the information hierarchy. Structure: The BI portal should be structure in a way that helps you find the information you need. The BI
The process of creating the BI portal requires the careful combination of two basic design principles:
a top-level domain URL is known as a web application, and it corresponds to an Internet Information Services (IIS) website. A web application contains one or more site collections, which is a set of one or more websites that have the same owner and administration settings.
is a coherent set of pages with a home page, libraries, lists, a common layout template and theme, all based off the same root URL.
Alternate access mapping
When one sharePoint server has multiple application mapped to different ports on the same server.
are collections of items of a single type.... tasks, calendars, announcements, links, and contacts.
process of data
exploration with the intent to find patterns or relationships
that can be made useful to the organization.
first is usually a one-time project to help you gain an understanding of who your customers are and how they behave. We call this exploratory or undirected data mining, where the goal is to find something interesting.
The second is most often a project created to work on a specific problem or opportunity. We call this more focused activity directed data mining.
two common approaches to data mining
Dependent variable(s) (or predicted attribute or predict column
The variable the algorithm will build a model to predict or classify.
Independent variable(s) (or predictive attribute or input column
The variables which provide the descriptive or behavior information used to build the model.
A statistical technique that creates a best-fit formula based on a data set. The formula can be used to predict values based on new input variables.
A measure of how well the regression formula fits the actual values in the data set from which it was created
A Microsoft data mining term used as a name for the definition of a case set in Analysis Services. is essentially a metadata layer on top of a Data Source View that includes additional data mining-related flags and column properties, such as the field that identifies a column as input, predict, both, or ignore
is the task of assigning each item in a set to one of a predetermined set of discrete choices based on its attributes or behaviors.
The process of building a decision tree starts with the dependent variable to be predicted and runs through the independent variables to see which one most effectively divides the population. The goal is to identify the variable
that splits the cases into groups where the predicted variable (or class) either predominates or is faintly represented.
The process of building a decision tree starts .....
is to create the mining model structure in BIDS
The first step in developing the data mining model is
The first involves comparing models created with different
algorithms, parameters, and inputs to see which is most effective at predicting the target variable. The second is a business review of the proposed model to examine its contents and assess its value
There are two kinds of model validation in data mining
a lift chart, a classification matrix, and cross validation.
Analysis Services Data Mining provides three tools for
comparing the effectiveness of certain types of data mining models
the selection of the
"best" model based on its performance in the model comparison process and its implementation cost.
The data mining phase ends with....
It is a simplified version of Decision Trees, and can be used for
classification and prediction for discrete attributes. this algorithm is fairly simple, based on the relative probabilities of the different values of each attribute, given the value of the predictable attribute.
Daily/ Frequent /Very low latency Instantaneous
Data can be delivered in the following ways for real time systems.
Global company process data in a handful of regions, after local midnight. This isn't a true real-time scenario because each region's data is processed daily. This is the norm for business today.
means that the data visible in a report or analysis
is updated many times a day but is not guaranteed to be the
absolute current truth.
Very low latency
means the data must be delivered to the
business users faster than we can load it into the DW/BI
system, but not truly instantaneously.
means that the data visible on the screen
represents the true state of the transaction system(s) at
Reports are executed on a schedule and a copy of the report is saved to the system for a period of time for users to use.
saves the report's body, including the dataset, in the report catalog database. you can schedule the execution of the report to run on a schedule and store its results.
is a feature of SQL Server that's used primarily to increase the availability of a transaction database.
a separate physical table, usually in a different database that uses transactional database options
Dashboards and scorecards
These applications generally
involve a combination of multiple reports and charts in a
seamless interface that use exception highlighting and
drill-down capabilities to analyze data from multiple
Operational BI and closed loop applications
the use of applications that are more sophisticated than
typical operational reports. These applications leverage the rich historical context across multiple business processes available in the data warehouse to guide operational decision making.
The BI portal
is the business's primary interface
to the BI applications. It provides an organizing frameworkto help people find the information they need
Determining the initial report set
Creating a standard look-and-feel template
Creating a mock-up and documentation for each target report
Designing the navigation framework
Conducting the user review
Report specification typically includes the following tasks
Report/Dashboard prototyping and delivery
Business user analytics
PowerPivot adds value to the BW/BI system in the following ways
Work from a dimensional model
Create incremental calculations
Check your work
Create a robust date dimension
Use naming conventions
Guidelines for working with PowerPivot for Excel
1. Develop your plan.
2. Set up the design and development environment.
3. Create a Data Source View.
4. Create and fine tune your dimensions.
5. Run the Cube Wizard and edit the resulting cube.
6. Create calculations and other decorations.
7. Iterate, iterate, iterate.
Designing the OLAP Structure