dimensional model

dimensional model, updated 1/31/24, 10:58 AM

visibility936
  verified

Dimensional data model overview - update in 2024.

About Jack Zheng

Faculty of IT at Kennesaw.edu

Tag Cloud

Dimensional Data Model
Basics
Jack G. Zheng
Spring 2024
http://idi.kennesaw.edu/it4713
http://idi.kennesaw.edu/it7123
https://www.edocr.com/v/bmvojelj/jgzheng/dimensional-model
IT 4713 BI System
IT 7123 BI
Overview
1. Dimensional model basic concepts
2. Dimensional model as relational schemas
– Star schema, snowflake schema
3. Basic dimension modeling techniques and
process
2
Dimensional model is the basis of multidimensional
analysis. It is also the foundation of traditional BI - data
warehouse design and OLAP reporting. The concepts are
still used in many modern BI tools including Power BI. This
lecture notes introduce the basics the dimensional data
model and its design at the conceptual and logical level.
Data Model Review
• A data model conceptualizes data elements and standardizes how the data elements relate to
one another
• Why data model (in BI and analytics)?
– Facilitates the understanding, query, reporting, analysis, and other use of data
– Creating business meaning & context
– Understand source and target data systems
– Optimize data structures to align queries and reports

Three levels: conceptual vs. logical vs. physical data models.

Typical data models
– Entity relationship model (a conceptual model)
– Relational data model (a logical model)
– Dimensional data model
– Graph (network) data model
– Hierarchical data model
3
Image from
https://www.slideshare.net/Dataversity/ldm-webinar-
data-modeling-business-intelligence and also view the
webinar https://www.dataversity.net/ldm-webinar-
data-modeling-business-intelligence/
The complexity
increases from
conceptual to
logical to physical.
For complete info, refer to the IT
3703 data model lecture notes.
Dimensional Data Model
• Dimensional data model is a data model that specifically constructed around the
elements of facts and dimensions

It is directly based on the most common business questions and queries.

It is the platform for (traditional) BI.
• Dimensional model is a conceptual model and can be implemented in a logical
model (like the relational model) or in a logical multidimensional database
model like an OLAP server
• Where is Dimensional Model used?
1.
A major data warehouse design method by Ralph Kimball
(http://en.wikipedia.org/wiki/Ralph_Kimball)
• Dimensional modeling has been broadly accepted as the dominant technique for DW/BI
presentation.

“Based on our experience and the overwhelming feedback from numerous practitioners from
companies like your own, we believe that dimensional modeling is absolutely critical to a successful
DW/BI initiative.” (Kimball)
2.
The basis of OLAP (multidimensional database)
3.
Commonly referenced (but may not be enforced) in many self-service BI tools.
4.
The foundation of multidimensional analysis; most widely used in many business
(descriptive) analysis
4
IT 4713 covers all areas; IT 7123 particularly focuses on #3 and #4.
Multidimensional Analysis
• Multi-dimensional queries
– A dimension is a particular way (or an attribute) of describing and
categorizing data
– Such queries are usually arithmetic aggregation operations (sum, average,
etc.) on records grouped by multiple dimensions (attributes) at different
aggregation levels.
– A pivot table or crosstab is usually used for OLAP result view (aggregated
data)
• Example analysis
– More about multidimensional analysis (we will cover more about
multidimensional analysis in a later module)
https://www.youtube.com/watch?v=IhFkNmVmwn4
5
"What is the total sales amount grouped by
product line (dimension 1), location
(dimension 2), time (dimension 3) and …
(other dimensions)?"
"Which segment of
business provides the
most revenue growth?"
Multidimensional Analysis using the Matrix Table
6
A pivot table (or matrix table,
crosstab) is usually used for
multidimensional analysis result view
(data aggregated by dimensions).
This is the transactional
data report with line by
line data in a flat table.
Dimension (columns)
Dimension
(rows)
Aggregated data by
dimension members
Measure
Dimension Model Key Concepts
• The following are the key concepts and
elements in dimensional model
– Fact/measure
– Dimension
– Attribute
– Hierarchy
– Aggregate
7
Fact/Measure
• A measure (or fact) contains measurement data (often
numeric values) that measures an aspect of the
business.
• A measure has two components
– Numerical value: e.g. sales price
– Aggregation formula: e.g. sum, used for
aggregating/combining a set of values number of measure
values into one measure value determined by dimension
value combination
• Different types of businesses have different facts. For
example:
– Retail: gross sales dollars, total cost, profit, etc.
– University: student enrollment, number of degree programs,
number of graduates, etc.
– Basketball player: score, rebound, assist, etc.
– Web traffic: page views, number of visitors, etc.
8
Three Types of Measures
Additive
• Can be added over all dimensions
• Example: sales amount
Semi-additive
• Cannot be added (but may be averaged) over some
dimensions - typically time
• Example: inventory, enrollment, class size, account
balance, etc.
Non-additive
• Cannot be added over any dimension. Can use
other functions like count or average.
• Example: product unit price, download speed,
network quality, signal strength, ratios, etc.
9
Dimension
• A dimension is a structure by which facts can be calculated/aggregated
– Consists of non-overlapping data elements which are used to categorize or
group facts and provided aggregated calculations
– For example: date, geo region, product, etc.
• Dimension attribute: is a piece of information describing a particular
component of a dimension.
– For example, the date dimension has attributes of month (month number,
full month name, abbreviate month name, etc.), day, year, etc.
– Dimension attributes are usually qualitative type of data (ordinal or nominal,
see https://dwbi.org/pages/31/classifying-data-for-successful-modeling)
• Dimension attribute serves the primary source of query constraints and
grouping levels. In a typical query and report, usually identified by the
“by” keyword.
• Example
– Sales amount by state (location), by year (date), by gender (customer),
product type (product), etc.
10
Example/Exercise

In this following statements
– What is the measure?
– What are the dimensions? Or
what kind of information is used
to group data?

In this following analysis/report
https://twitter.com/ZachLowe_NB
A/status/547036034660573184
– How did the author group the
data?
– Which part is used as
dimension?
11
“Currently, 65 percent of
republican women view him
favorably, up from 55 percent
in March -- and 49 percent in
September.”
“Trilliant Health's analysis found that during the pandemic,
telehealth was primarily used for behavioral health, especially by
commercially insured women between the ages of 20 and 49.”
Banded/Ranged Dimension
• Can numeric values be a dimension? See the following example
about the “income” data

Income as a fact/measure
– “According to the US Census Bureau persons with doctorates in the
United States had an average income of roughly $81,400. The
average for an advanced degree was $72,824 with men averaging
$90,761 and women averaging $50,756 annually.”

Income (range) as a dimension – ranged dimension
– http://en.wikipedia.org/wiki/Household_income_in_the_United_State
s#Household_income
12
This is called a
ranged/banded
dimension; as they
are used to group the
data in this case, not
to measure it.
Hierarchies
• Hierarchy is a structured way to group attributes in multiple levels within
a dimension
– The lowest level is called a leaf level.
• Examples of hierarchy in a dimension
– Location dimension: country, state, city
– Organization dimension: section, division, branch, region
– Date dimension: year, quarter, month, date
• A dimension can have multiple hierarchies
– https://www.informationweek.com/software/information-
management/kimball-university-maintaining-dimension-hierarchies/d/d-
id/1073326
• Dimensions contain hierarchies. The hierarchies contain levels. And the
levels contain members.
• Dimension member
– Members are the instances (values) in a level. For example, month has
members of Jan, Feb, etc.
1
3
Aggregate
• Aggregate is a value formed by combining facts
from a given dimension member of one or more
dimensions to create a single value.
• Aggregation can be based on any formula, with
the common ones to be: summation, average,
count, etc.
• Examples
– Total sales amount in December (the month
member in the date dimension)
– Total sales amount in Georgia in December (the
state member of the location dimension + the month
member in the date dimension)
– Average sales amount in each state in December
1
4
Modeling Dimensions
• The identification of dimensions and dimension members
is the key to multidimensional analysis

It directly reflects modeler’s understanding of the domain
and problems being analyzed
• Exercise
– Identify dimensions with possible hierarchies for the following
piece of data example, based on your understanding of the
KSU class schedule and class information.
15
https://owlexpress.kennesaw.edu/prodban/bwckctlg.p_disp_listcrse?
term_in=201808&subj_in=IT&crse_in=6713&schd_in=A
Implementation of Dimensional Models
• Dimensional model can be implemented as relational
model (as schemas) in
– data warehouse/mart design (strictly following the
relational rules)
– analytics app (Power BI) data modeling (loosely
following the relational logic)
• Schema describes how data are organized in a
(data) system
• Dimensional models can be designed in two types of
schemas under the relational data model (logical
model)
– Star schema
– Snowflake schema
16
Star Schema
17
Image from https://www.slideshare.net/Dataversity/ldm-webinar-data-modeling-business-intelligence
Star is a metaphor here
to visualize how
dimensional tables are
linked to the fact table.
Star Schema in Relational Model
• Dimension tables are linked to the fact table
through primary key and foreign key pairs.
18
Foreign keys in the
fact table.
Star Schema Design Example
• This is an example conceptual dimensional data model
for a typical retail business. It has one fact and four
dimensions.
19
Date
Sales Data
Customer
Product
Store
A fact table holds measure information.
Each measure can be described by
referring to linked dimensional tables.
Data can be aggregated by summing
relevant rows.
Dimension tables hold
dimensions, hierarchies,
attributes. Each dimension
uses at least one table.
Star Schema in the Relational Model
• This is the relational model designed based on the conceptual model in the last
slide. It has one fact table (in the center) and four dimension tables linked to it.
20
Date
DateKey
DayofMonth
DayofWeek
Month
Quarter
Year
Product
ProductKey
Product
Brand
Category
Subcategory
Store
StoreKey
StoreType
City
State
Region
Country
Customer
CustomerKey
AgeGroup
Gender
Sales
DateKey
CustomerKey
ProductKey
StoreKey
SalesAmount
Tax
The fact table
represents facts with
foreign Keys pointing
to each dimension
Facts attributes
(measures)
Each dimension table
has a primary key and
links to the fact table
though foreign keys.
Star Schema in SQL Server Database
21
Image from https://learn.microsoft.com/en-us/power-bi/guidance/star-schema
Fact Table

The fact table stores all measures. A fact table consists of two types of columns. The foreign
keys column allows to join with dimension tables and the measure columns contain the data
that is being analyzed.
• Each row in the fact table represents the atomic grain value for measurement units (an
intersection of all dimensions) – that cannot break down anymore. It is the most basic unit for
further aggregation.
– For example (refer to the prior schema): Customer A purchased product M in Store 01 on Date X, and
spent 99 dollars with 10 dollar tax.

Type of fact tables (based on type of facts)
– Cumulative: This type of fact table describes what has happened over a period of time. For example,
this fact table may describe the total sales by product by store by day. The facts for this type of fact
tables are mostly additive facts. A typical example is the Transaction fact table: each row represents a
single event/transaction at the lowest granularity.
– Snapshot: This type of fact table describes the state of things in a particular instance of time, and
usually includes more semi-additive and non-additive facts, such as account balance.
– http://www.1keydata.com/datawarehousing/fact-table-types.html
– Factless fact tables: no measures contained (with an implicit measure of 1, count of rows)
http://www.1keydata.com/datawarehousing/factless-fact-table.html
22
Dimension Table
• Dimension tables are used to describe dimensions. They contain dimension
keys, attributes and value.
• Each dimension table has a single column PK. It is best to use the surrogate
key as the PK, not natural key
– A surrogate key has a unique value assigned to each row in the dimension. It
becomes primary key of the dimension table and is used to join the dimension to
the associated foreign key field in the fact table
• Benefits of surrogate keys:
– Protect the DW/BI system from changes in the source system
– Allow the DW/BI system to integrate data from multiple source system
– Enable developers to add rows to dimensions that do not exist in the source
system
– Provide the means for tacking changes in dimension
– Are efficient in the relational database and analysis services
– http://www.disoln.org/2013/11/Surrogate-Key-in-Data-Warehouse-What-When-
Why-and-Why-Not.html

The date dimension is exempt from the surrogate key rule; this highly
predictable and stable dimension can use a more meaningful primary key.
https://www.kimballgroup.com/data-warehouse-business-intelligence-
resources/kimball-techniques/dimensional-modeling-techniques/dimension-
surrogate-key/
• Usually de-normalized, with multiple hierarchies in it (in star schema).
23
Grain/Granularity
• Granularity is defined as the (lowest) level of
detail that will be maintained.
• The greater the level of detail, the finer the
level of granularity.
• The level of granularity for each dimension
determines the grain for the atomic level of
the facts.
24
Date Dimension Table
• Date data is often stored as a separate date dimension table (instead of a date
column in a fact table) for two reasons:
• With a date dimension table, the rules to group dates are specifically defined
and can be customized. This is much more flexible than using SQL date
functions.

In addition, there are several date attributes that the SQL date functions do not support,
like fiscal periods, holidays, seasons, weekdays, weekends, and national events.

It is much easier to select columns from a date table instead of using complex
functions to dynamically generate the attribute for reports.
25
Here are two ways to
model the date
dimension. Examine
its attributes.
Snowflake Schema
• The snowflake schema is a variation of the star
schema with
– normalization of dimension tables
– defined hierarchy

In star schema, one dimension is modeled in only
one table, with potential hierarchy embedded.

In snowflake schema, hierarchies are specifically
modelled as a set of one-to-many tables.
– See next slide
26
Product
ProductKey
ProductName
Category
Subcategory
Category  subcategory  ProductName
is an embedded hierarchy
Snowflaking Product Dimension
27
AdventureWorks Data Warehouse Schema
Foreign keys
pointing to
dimensional tables
Role-playing dimensions
(ShipDate, OrderDate,
DueDate)
Defined product
hierarchy in a chain
of normalized 1 to
many tables
(snowflaking). Top
level hierarchy at
the far end.
Star vs. Snowflake
Snowflake Schema
Star Schema
Ease of
maintenance/change
No redundancy and hence easier to
maintain and change
Has redundant data and hence less easy
to maintain/change
Ease of Use
More complex queries and hence less
easy to understand
Less complex queries and easy to
understand
Query Performance More foreign keys-and hence more query
execution time
Less no. of foreign keys and hence
lesser query execution time
Type of Data
Warehouse
Good to use for data warehouse core to
simplify complex relationships
(many:many)
Good for data marts with simple
relationships (1:1 or 1:many)
Joins:
Higher number of Joins
Fewer Joins
Dimension table
It may have more than one dimension
table for each dimension
Contains only single dimension table for
each dimension
When to use:
When dimension table is relatively big in
size, snowflaking is better as it reduces
space.
When dimension table contains less
number of rows, we can go for Star
schema.
Normalization/ De-
Normalization:
Dimension Tables are in Normalized form
but Fact Table is still in De-Normalized
form
Both Dimension and Fact Tables are in
De-Normalized form
Data model
Bottom up approach
Top down approach
28
http://www.diffen.com/difference/Snowflake_Schema_vs_Star_Schema
http://searchbusinessintelligence.techtarget.in/answer/Star-schema-vs-snowflake-schema-Which-is-better
http://oracle-online-help.blogspot.com/2006/11/star-vs-snowflake-schema.html
Star Schema in Various Products
• Star and snowflake schema is not only in databases.
• As a general concept and model, it can be
implemented in self-service analytics tools like Power
BI or Tableau.
29
Table join in
Tableau,
following a star
schema style.
Table join in
Power BI, with
a snowflake
schema style.
Dimensional vs. Relational Modeling
30
Extended reading from Kimball:
https://www.kimballgroup.com/1997/08/a-dimensional-modeling-manifesto/
This comparison table is from: https://learndatamodeling.com/blog/comparison-of-
relational-and-dimensional-data-modeling/
Dimensional Modeling Process
Identifying the
business rules and
processes
• Business
requirements
document should
contain a list of
analysis cases,
with the business
measures and a
detailed list of all
dimensions, down
to the lowest level
of detail for each
dimension.
Creating fact
tables with
measures
• Translate business
measures into fact
tables
• Analyze source
system information
for additional
measures
Identifying the
dimensions and
hierarchies
• Identifying the
granularity levels
• Create dimension
tables (normalized
or not) with all
columns
• Link fact tables to
the dimension
tables
31
An alternative approach: see Kimball’s four step at:
http://www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-
techniques/dimensional-modeling-techniques/four-4-step-design-process/
Star Schema Design Steps
1. First create all dimension tables
– Each dimension table needs to have an identifying
column (primary key)
2. Then create a fact table with all measures
stored in this table
3. Link dimension tables and the fact table
– Starting with one dimension table, make a key
column (foreign key) by adding a new one or
transforming an existing one, that matches the
primary key in a dimension table
– Link the dimension table to the fact table through
the key pairs.
– Repeat this for all dimension tables
32
Modeling and Designing Tools
• https://erdplus.com
– A simple online modeling tool for star Schemas
– This tool meets our needs, so we are using it for some
schema design practice.
• Others
– Visio https://www.youtube.com/watch?v=Z3uTWpuv54U
– Visual Studio Architect
– http://www.symcorp.com/tech_expertise_design.html
– IBM InfoSphere Data Architect http://www-
03.ibm.com/software/products/en/ibminfodataarch
– MySQL Workbench
https://www.youtube.com/watch?v=7Pwj7nV-oRM
33
Why do we need dimensional model or star schema?
• Self-service tools like Excel and Power BI might allow you to get
away with building queries and reports out of a big flat table, or
even more unstructured.
• A dimensional model (at least the concept of facts and
dimensions) seeks to divide your data into two big groups, and
thus two different types of attributes in your model: facts and
dimensions.
– facts contain the quantifiable data we’re interested in,
– dimensions contain the contextual information that we wish to
analyze
• The end result for users is a logical and intuitive grouping of
values
• Having dimensional model in mind, even the model is not
specifically built in a tool, will help greatly for the analysis and
reporting of data.
34
Star Schema Exercise
• Create a star schema for the following
example data, for the potential analysis of
student enrollment in classes (enrollment
pattern, class size, history, trend, etc.)
35
https://owlexpress.kennesaw.edu/prodban/bwckctlg.p_disp_listcrse?
term_in=201808&subj_in=IT&crse_in=6713&schd_in=A
Essential Resources
• The 101 Guide to Dimensional Data Modeling:
https://dwbi.org/pages/3/the-101-guide-to-dimensional-data-modeling
• Star schema and snowflake schema are two common relational
database schema types that implements the dimension model:
https://dwbi.org/pages/19/dimensional-modeling-schema

“What are Dimensions and Measures” a very quick, simple, and high-
level introduction video by Intricity:
http://www.youtube.com/watch?v=qkJOace9FZg
• A more technical introduction of dimensional modeling by LeapFrogBI
(the following three are the basic ones required in this module)
– Facts and dimensions: https://www.youtube.com/watch?v=6k3nwXXpnMY
– Types of Keys (some review of keys):
https://www.youtube.com/watch?v=_aN-8kszIdA
– Snowflake: https://www.youtube.com/watch?v=ZVV56JBPDgA
36
Good Learning Resources

Webinar - Data Modeling & Business Intelligence (and data analytics): Webinar recording https://www.dataversity.net/ldm-webinar-data-
modeling-business-intelligence/
and slides here
https://www.slideshare.net/Dataversity/ldm-webinar-data-modeling-business-intelligence

Dimensional model and schema

https://dwbi.org/categories/17/dimensional-model

Dataversity data modeling: http://www.dataversity.net/category/data-topics/modeling/

DWBI.org dimensional model tutorials http://dwbi.org/data-modelling/dimensional-model

https://learndatamodeling.com/blog/data-modeling-topics-index/

1KeyData: https://www.1keydata.com/datawarehousing/dimensional.html

http://en.wikipedia.org/wiki/Star_schema and http://en.wikipedia.org/wiki/Snowflake_schema

https://senturus.com/resources/data-modeling-comparison-tableau-cognos-and-power-bi/

Compare relational model and dimensional model

https://www.kimballgroup.com/1997/08/a-dimensional-modeling-manifesto/

https://pdfs.semanticscholar.org/f583/bd7fcc5cb027e22c6a94ba37bfda57ba9bb6.pdf

LeapFrog Dimensional Modeling training serials playlist https://www.youtube.com/playlist?list=PLrbIyvYCdg0iAUQoxG5vI_yKqzZ2AcgGe

Kimball DW/BI Resources http://www.kimballgroup.com/data-warehouse-business-intelligence-resources/

A Dimensional Modeling Manifesto by Ralph Kimball: http://www.kimballgroup.com/1997/08/a-dimensional-modeling-manifesto/

Kimball Dimensional Modeling Techniques collection: http://www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-
techniques/dimensional-modeling-techniques/
or
http://www.kimballgroup.com/wp-content/uploads/2013/08/2013.09-Kimball-Dimensional-Modeling-Techniques11.pdf

Kimball Design Tips: http://www.kimballgroup.com/category/design-tips/

Articles: http://www.kimballgroup.com/category/business-intelligence-and-data-warehouse-articles/

https://www.kimballgroup.com/2009/05/the-10-essential-rules-of-dimensional-modeling/

Vendor resources

IBM redbook, Dimensional Modeling: In a BI Environment http://www.redbooks.ibm.com/abstracts/sg247138.html

https://www.ibm.com/support/knowledgecenter/SS9UM9_9.1.2/com.ibm.datatools.dimensional.ui.doc/topics/c_ida_dm_container.html

Oracle Data Warehousing Logical Design https://docs.oracle.com/en/database/oracle/oracle-database/23/dwhsg/data-warehouse-logical-design.html
37