Data Warehouse

137 terms by JAS1 

Create a new folder

Advertisement Upgrade to remove ads

Final +Midtern

Dimensional Model

1.A _________________ is made up of a central fact table and its associated dimensions.

users get the data they need to meet ongoing business requirements

1.The main objective of the data track is to make sure ________________________

Business Requirements

1.______________are the bedrock of the successful data warehouse/business Intelligence (DW/BI) systems.

Business Requirements

1. ______________ guide the development team in making the biggest strategic choices such as prioritizing subject areas for implementation.

Measurement event

A record in a fact table is a_____________________

Dimensions

__________________are the nouns of the dimensional model, describing the objects that participate in the business, such as employee, subscriber, publication, customer, physician, vehicle, product, service, author and store.

Partitioned table page

A _______________is essentially a large table split out into smaller tables under the covers.

Conformed dimension

A dimension that is shared across multiple processes is called a _____________________

Relational data warehouse database

The primary place to store and manage the dimensional model is in the _________________

Grain

The level of detail contained in the fact table is called the_________________

The Kimball Group

the definitive source for dimensional data warehousing expertise.organized in 2003 to respond to the increased demand for quality dimensional data warehouse consulting and education

Data Warehouse

relational database that is designed for query and analysis rather than for transactions

Data Warehousing and business intelligence

are techniques to provide business people with the information and tools they need to make operational and strategic business decisions

Focus on the business
Build an Information Infrastructure
Deliver in meaningful increments
Deliver the entire solution:

Kimball Lifecycle - The four primary principles are

The Kimball Lifecycle

a collection of steps, tasks, and dependencies based on the collective experience of the Kimball group

Business intelligence (BI)applications

Predefined applications that query, analyze, and present information to support a business need.

Data mining model

A statistical model often used to predict future behavior based on data about past behavior of identify closely related subsets of a population called clusters

ad hoc

are formulated by the user on the spur of the moment

Developing solid business sponsorship

the best place to
start the DW/BI project.

Dimensional Model

is the heart of the DW/BI system.
It is the target for the ETL system, the structure of the database, and the model behind the user query and reporting experience.

1Present the needed information to users as simply as possible 2.Return query results to the users as quickly as possible 3Provide relevant information that accurately tracks the underlying business processes

The three primary design goals of Dimensional Modeling are.

A Fact table

contains the measurements associated with a specific business process, like taking an order, displaying a web page, admitting a patient, or handling a customer support request.

conformed dimensions

When two business processes use exactly the same product dimensions with the same Keys,

surrogate key

a unique value, usually an integer, assigned to each row in the dimension.

Slowly Changing Dimensions

is the technique use to manage attribute changes in a dimension over time.

Type One is the technique used to deal with changes the business doesn't care about
Type Two the type two change tracking technique is a powerful method for capturing the attribute values that were in effect at a point in time and relating them to the business events

SCD's have two types:

degenerate dimension

A dimension without attributes is called

Snowflaking

is the practice of connecting lookup tables to fields in the dimension tables.

Step 1: Identify the business process. Step 2: Declare the Grain. Step 3: Choose the dimensions. Step 4: Choose the Facts

Creating the high level dimensional model: The Four-Step Modeling Process.

Completeness
Lower cost of ownership
Openness
High Performance and scale
Microsoft Investment in business intelligence

Advantages to using the Microsoft Toolset

Manageability: As discussed in Chapter 17, it's much easier to handle changes in structure and content in the relational database than in Analysis Services.

Why do you need to store the dimensional data in the relational database?

the main datafile with a suffix of .mdf and the
transaction log file with a suffix of .ldf.

SQL Server creates two default files when you create a
database:

The first is
a true dollar cost: Data compression is an Enterprise
Edition feature.
The second is a CPU cost.

There are two major costs to data compression

entity integrity

All tables have a primary key, which is that column or set of columns that
will identify a single row when constrained to a single
value.

referential integrity

Foreign keys define the relationships between the fact anddimension tables. It says every value for a given foreign key found in the fact table is guaranteed to have an entry inthe associated dimension table

Indexes Aggregates

two major performance tools available in the data warehouse database platforms

date key

For fact tables, the standard starting point is to create a single-column clustered index on the

aggregate table

table summarizes data at a higher level than the atomic data maintained in the detailed fact table...tables are the single most useful way to improve query performance on a dimensional DW/BI system.

Partitioned Tables

essentially a large table split out into
smaller tables under the covers. Each of these smaller
tables, can be accessed, indexed, and
managed independently.

Master data

is reference data that is managed centrally for an
organization. It
describes the business entities that participate in the transaction systems.

is before the data warehouse, in a separate though related management system.

The best place for managing your organization's master reference data

Data Governance

is the process and rules for managing data your organization

Required fields,
-Standard values,
-Data retention policies

Data Governance rules should include the following:

OLAP stands for On Line Analytic Processing, OLAP is an analytic database and not a OLTP Online transaction processing database

What is OLAP

OLAP cube

is a set of data, organized in a way that facilitates non-predetermined queries for aggregated information.

Focus on the business:
Build an Information Infrastructure:
Deliver in meaningful increments
Deliver the entire solution:

The Kimball Lifecycle is a collection of steps, tasks, and dependencies based on the collective experience of the Kimball group.
The four primary principles are:

Data warehouse

The platform for business intelligence. In the Kimball method it includes everything from the original data extracts to the software and applications that users see.

Business process dimensional model

A specific discipline for modeling data that is an alternative to normalized modeling

Normalized models

the Third Normal Form models, were designed to support the high-volume single-row inserts and updates that define transaction systems

Online analytic processing database (OLAP):

A technology for storing, managing, and querying data specifically designed to support business intelligence users.

Extract, Transformation, and Load system (ETL):

a set of processes that clean, transform, combine, de-duplicate, household, archive, conform and structure data for use in the data warehouse.

primary key

a multi-part key made up of a subset of the foreign keys from each dimension table involved in the business event.

Atomic level

Fact tables should be built with the lowest level of detail (Grain)as possible from the original source- generally this is known as

Atomic fact

provide complete flexibility to roll up the data at any level of summary needed across any dimension

Conformed Dimension

is a single dimension that is shared across multiple processes

The Date

fundamental business dimension across all organizations and industries

Sharepoint

is Microsoft 's offering in the portal and web-based application platform category

BI

The __________portal is the navigation framework for the organizations structure.

Libraries

_______________are a special type of list that store files as well as information about files (essentially directories).

Farm

A SharePoint _____________is the collection of all servers running in a SharePoint deployment and the services running on those servers.

Data Mining

We generally describe ____________as a process of data exploration with the intent to find patterns or relationships that can be made useful to the organization.

Knowledge Discovery and Data Mining

The field of data mining is known more broadly as______________________.

Algorithm

The programmatic technique used to identify the relationships or patterns in the data is called an.

Model

The definition of the relationship identified by the algorithm, which generally takes the form of a set of rules, a decision tree, a set of equations, or a set of associations is called a________

Case

The collection of attributes and relationships (variables) that are associated with an individual instance of the entity being modeled, usually a customer is called a.

Page or section break field

What is the function of this mock-up user interaction symbol ()

PowerPivot

In Excel 2010 what is the tool that is used for analysis?

The BI applications provided data warehouse access for a broad and important user community.
Response Feedback:

What is the definition of Broad access BI Applications?

Link/URL

What is the function of this mock-up user interaction symbol \\ \\

Application- entered variable

What is the function of this mock-up user interaction symbol {}

BI applications built in the development phase of the lifecycle.

What is the definition of Early impact BI Applications?

Purpose

The report mock-ups are a great way to communicate the content and ________ of the reports.

User-entered variable

What is the function of this mock-up user interaction symbol < >

Report template comments

What is the function of this mock-up user interaction symbol [ ]

Drillable field

What is the function of this mock-up user interaction symbol <<>>

Business Value

The process of identifying and creating BI application base on business requirements almost guarantees that you will provide something of significant value to the business.

All of the above

Business Requirements whose Functional Implication is to Create Reports have the following in common

Frequent

_______________means that the data visible in a report or analysis is updated many times a day but is not guaranteed to be the absolute current truth.

Stand-Alone

Report Builder 3.0 is a ______________________report authoring tool.

Report description

Every report should have a __________?

Navigation framework

Business Requirements whose Functional Implication is to create Final reports have the following in commonOnce you know which reports to build, you need to categorize them so the users can find the information they're looking for as quickly as possible.

Report Name

The_________________ communicates the contents of the report to the viewer.

Description

Every report should have a comment or ____________________

Reporting Services

The most common way to use SQL Server technology to access real-time data is to use?

Prototype of the report

Mock up reports represent a _____________

BI portal

is the navigation framework for the organizations structure.
IT is used to collect and categorize all structured and unstructured information throughout the organization and make it available through a rich user interface with intelligent search capabilities and the ability to personalize the experience.

BI portal

the central place where people can find the analytic information they need.

Usable Content-rich Clean Current

The BI portal plays a significant role in the change management process. The BI portal must provide the following features:

Interactive and Value-oriented

Role of the BI portal

Density: density translates to the BI portal in a couple of ways. Primarily, it means we flatten the information hierarchy. Structure: The BI portal should be structure in a way that helps you find the information you need. The BI

The process of creating the BI portal requires the careful combination of two basic design principles:

provide access to standard reports

The main purpose of the BI portal is to....

Web application

a top-level domain URL is known as a web application, and it corresponds to an Internet Information Services (IIS) website. A web application contains one or more site collections, which is a set of one or more websites that have the same owner and administration settings.

Site

is a coherent set of pages with a home page, libraries, lists, a common layout template and theme, all based off the same root URL.

Alternate access mapping

When one sharePoint server has multiple application mapped to different ports on the same server.

Web Parts

reusable code modules that allow you to display various types of content on your pages

Views

are ways to view the contents of a library or list, sort
of like a report definition

Lists

are collections of items of a single type.... tasks, calendars, announcements, links, and contacts.

Data Mining

process of data
exploration with the intent to find patterns or relationships
that can be made useful to the organization.

Knowledge Discovery and Data Mining (KDD

The field of data mining is known more broadly as

first is usually a one-time project to help you gain an understanding of who your customers are and how they behave. We call this exploratory or undirected data mining, where the goal is to find something interesting.
The second is most often a project created to work on a specific problem or opportunity. We call this more focused activity directed data mining.

two common approaches to data mining

Case set

A group of cases that share the same attributes

Dependent variable(s) (or predicted attribute or predict column

The variable the algorithm will build a model to predict or classify.

Independent variable(s) (or predictive attribute or input column

The variables which provide the descriptive or behavior information used to build the model.

Discrete or continuous variables:

Numeric columns that contain continuous or discrete values.

Regression

A statistical technique that creates a best-fit formula based on a data set. The formula can be used to predict values based on new input variables.

Deviation

A measure of how well the regression formula fits the actual values in the data set from which it was created

mining structure

A Microsoft data mining term used as a name for the definition of a case set in Analysis Services. is essentially a metadata layer on top of a Data Source View that includes additional data mining-related flags and column properties, such as the field that identifies a column as input, predict, both, or ignore

Mining model

The specific application of an algorithm to a particular mining structure

Classification

is the task of assigning each item in a set to one of a predetermined set of discrete choices based on its attributes or behaviors.

Anomaly Detection

the identification of cases that deviate from the norm in a significant way.

The process of building a decision tree starts with the dependent variable to be predicted and runs through the independent variables to see which one most effectively divides the population. The goal is to identify the variable
that splits the cases into groups where the predicted variable (or class) either predominates or is faintly represented.

The process of building a decision tree starts .....

Neural Network

mimic our understanding of the
way neurons work in the brain.

support

The number of times a combination occurs is
called its

is to create the mining model structure in BIDS

The first step in developing the data mining model is

The first involves comparing models created with different
algorithms, parameters, and inputs to see which is most effective at predicting the target variable. The second is a business review of the proposed model to examine its contents and assess its value

There are two kinds of model validation in data mining

a lift chart, a classification matrix, and cross validation.

Analysis Services Data Mining provides three tools for
comparing the effectiveness of certain types of data mining models

the selection of the
"best" model based on its performance in the model comparison process and its implementation cost.

The data mining phase ends with....

Naïve Bayes

It is a simplified version of Decision Trees, and can be used for
classification and prediction for discrete attributes. this algorithm is fairly simple, based on the relative probabilities of the different values of each attribute, given the value of the predictable attribute.

Real Time

means delivering data to the business users throughout the day.

Daily/ Frequent /Very low latency Instantaneous

Data can be delivered in the following ways for real time systems.

Daily

Global company process data in a handful of regions, after local midnight. This isn't a true real-time scenario because each region's data is processed daily. This is the norm for business today.

Frequent

means that the data visible in a report or analysis
is updated many times a day but is not guaranteed to be the
absolute current truth.

Very low latency

means the data must be delivered to the
business users faster than we can load it into the DW/BI
system, but not truly instantaneously.

Instantaneous

means that the data visible on the screen
represents the true state of the transaction system(s) at
every instant.

Cache

Reports are executed on a schedule and a copy of the report is saved to the system for a period of time for users to use.

Snapshot report

saves the report's body, including the dataset, in the report catalog database. you can schedule the execution of the report to run on a schedule and store its results.

Mirroring

is a feature of SQL Server that's used primarily to increase the availability of a transaction database.

BizTalk Application

An enterprise application developed to handle real-time data needs.

Partition

a separate physical table, usually in a different database that uses transactional database options

Dashboards and scorecards

These applications generally
involve a combination of multiple reports and charts in a
seamless interface that use exception highlighting and
drill-down capabilities to analyze data from multiple

Operational BI and closed loop applications

These include
the use of applications that are more sophisticated than
typical operational reports. These applications leverage the rich historical context across multiple business processes available in the data warehouse to guide operational decision making.

The BI portal

is the business's primary interface
to the BI applications. It provides an organizing frameworkto help people find the information they need

Determining the initial report set
Creating a standard look-and-feel template
Creating a mock-up and documentation for each target report
Designing the navigation framework
Conducting the user review

Report specification typically includes the following tasks

Database prototyping
Data profiling
Report/Dashboard prototyping and delivery
Business user analytics

PowerPivot adds value to the BW/BI system in the following ways

Work from a dimensional model
Create incremental calculations
Check your work
Create a robust date dimension
Use naming conventions

Guidelines for working with PowerPivot for Excel

1. Develop your plan.
2. Set up the design and development environment.
3. Create a Data Source View.
4. Create and fine tune your dimensions.
5. Run the Cube Wizard and edit the resulting cube.
6. Create calculations and other decorations.
7. Iterate, iterate, iterate.

Designing the OLAP Structure

Incomplete attributes
Data Integration
Systems integration

Common master data management problems include

Please allow access to your computer’s microphone to use Voice Recording.

Having trouble? Click here for help.

We can’t access your microphone!

Click the icon above to update your browser permissions above and try again

Example:

Reload the page to try again!

Reload

Press Cmd-0 to reset your zoom

Press Ctrl-0 to reset your zoom

It looks like your browser might be zoomed in or out. Your browser needs to be zoomed to a normal size to record audio.

Please upgrade Flash or install Chrome
to use Voice Recording.

For more help, see our troubleshooting page.

Your microphone is muted

For help fixing this issue, see this FAQ.

Star this term

You can study starred terms together

NEW! Voice Recording

Create Set