Data Models

Data Models, updated 1/31/24, 10:56 AM

An introduction of data models in IT 3703 at Kennesaw State University - updated in 2024.

About Jack Zheng

Faculty of IT at Kennesaw.edu

Tag Cloud

Data Models
for Analytics and BI
Jack G. Zheng
Spring 2024
http://zheng.kennesaw.edu/teaching/it3703/
IT 3703 Intro to Analytics and Technology
Overview
1. What is data model?
2. Compared similar terms
– data structure, data architecture, schema,
semantic layer
3. Major types of data models
4. Why do we need data model (in analytics)?
2
This lecture notes provide some basic overview of data
model and related concepts. It does not go into details of
each data model or data modeling techniques. Students
will learn and practice more in other courses like CSE 3153
and IT 4713.
Data Model
• A model is a structured way of representing, describing and explaining things
(person, organization, process, system, data).
– Process model
– System model
– Data model
– Knowledge model
– Business model
• Data model is about representation and organization of data, with a set of
business concepts and rules.
– A data model conceptualizes data elements and standardizes how the data elements
relate to one another
• https://en.wikipedia.org/wiki/Data_model
– How is data grouped and associated? What is this data about? How are they related?
• Data models depict and enable an organization to understand its data assets
through core building blocks such as entities, relationships, and
attributes. These represent the core concepts of the business such as
customer, product, employee, and more.
– https://www.dataversity.net/data-modeling-vs-data-architecture/
3
Data model is like a way to organize your house –
where to put things, how to put things, and not just for
yourself but also be able to explain it to other people so
they can find things.
Data Model Examples
• E/R (entity-relationship model) is an example of a conceptual data
model.
• An ER model describes interrelated things of interest in a specific
domain of knowledge. A basic ER model is composed of entity types
(which classify the things of interest) and specifies relationships that
can exist between entities (instances of those entity types).
– https://en.wikipedia.org/wiki/Entity%E2%80%93relationship_model

It defines data as (or groups data using)
– entities
– relationships
4
You should have learned, or are learning,
about E/R in CSE 3153.
Three Levels of Data Model
• Three levels: conceptual vs. logical vs. physical data models
• The complexity increases from conceptual to logical to physical.
• More details
– Core reading: https://www.guru99.com/data-modelling-conceptual-logical.html
– An example in relational database design:
http://www.1keydata.com/datawarehousing/data-modeling-levels.html
5
Image from https://www.slideshare.net/Dataversity/ldm-webinar-data-modeling-business-intelligence
Basic Elements in Data Models
• Different level of data model has various elements, but the following
elements are general to most models. Each kind of data model has
defined its own way (structures and rules) to set up these elements.
• Entity
– Object, thing, activity, event
• Relationship
– How entities are associated
• Attribute
– Data that describes entities or relationships
• Naming
– Names of entity, attribute, and relationship
• Other more specific rules (often defined at lower levels like logical or
physical level), or metadata
– Data type: string, number, etc.
– Domain (value ranges): defined range of the dataset. For example, age
from 0 to 120.
6
Data Modeling
• The DMBOK 2 defines Data Modeling and
Design as “the process of discovering,
analyzing, representing and communicating
data requirements in a precise form called
the data model.”
• Extended readings: challenges in data
modeling
– https://learndatamodeling.com/blog/challenging-
situations-in-data-modeling/
7
Terms Comparison
• Compared to these similar concepts
– data structure
– schema
– data architecture
– semantic layer
– metadata
• Other related
– information architecture
– content structure
8
Data Structure

In computer science, a data structure is a data organization,
management, and storage format that enables efficient access and
modification. More precisely, a data structure is a collection of data
values, the relationships among them, and the functions or operations
that can be applied to the data, i.e., it is an algebraic structure about
data.
– https://en.wikipedia.org/wiki/Data_structure
• A data structure is a more technical and lower-level term. It emphasizes
a structure or a model that targets computer system (vs. human) for
optimal processing.
• For example, an array is a data structure, and “dictionary” is another
data structure. Also, the “classes” that form your data model, are data
structures too, any representation of a specific data object has to be in
form of a data structure.
– https://stackoverflow.com/questions/24228038/difference-between-
datastructure-and-datamodel-with-example
• A data set may be lack of data structure (unstructured data), but it has a
date model inherently.
9
Schema
• Schema is a structure, a representation of a plan or theory in the form
of an outline or model.
• Database schema refers to the organization of data as a
blueprint of how the database (a general sense) is constructed.
– Schema is a formal definition of the database and all of its objects.
– Schema can be thought as the physical implementation of a data
model.
– A database generally stores its schema in a data dictionary. It is
often visualized in a diagram.
• XML Schema is a definition of XML format files.
10
Extending reading: it also explains why using schema in
modern analytics
https://www.youtube.com/watch?v=3BZz8R7mqu0
Data Architecture
• Both data architecture and data modeling are the knowledge areas defined in DMBOK V2.

“Data Architecture defines the blueprint for managing data assets by aligning with organizational strategy
to establish strategic data requirements and designs to meet these requirements.”

“Data modeling is the process of discovering, analyzing, representing and communicating data
requirements.”
• So, data architecture is actually a system architecture focusing on the management of data. It is
not directly about how data is represented internally.
• Data architecture is an integral part in the overall enterprise architecture, defining the structure
and operation of data in the whole organization.
• Data architecture focuses
– Business: processes, organizational structures, and business goals

Infrastructure and platform: on-site or cloud storage?
– Systems and technology: virtualization?
– Apps and data flows: API, download, import/export?
– Security, privacy, compliance: access control, encryption
11
An example of data architecture from
https://www.mckinsey.com/business-
functions/mckinsey-digital/our-
insights/how-to-build-a-data-architecture-
to-drive-innovation-today-and-tomorrow
Extended reading:

https://www.dataversity.net/data-modeling-vs-
data-architecture/

https://www.integrate.io/blog/data-modeling-vs-
data-architecture-5-critical-differences/
Semantic Layer

“Semantic layer” is used by some systems like SQL Server
Analysis Services.
• A semantic layer is a business representation of corporate data
that helps end users access data autonomously using common
business terms. A semantic layer maps complex data into
familiar business terms such as product, customer, or revenue to
offer a unified, consolidated view of data across the organization.
• By using common business terms, rather than data language, to
access, manipulate, and organize information, a semantic layer
simplifies the complexity of business data. Business terms are
stored as objects in a semantic layer, which are accessed
through business views.
12
Extended reading:

https://www.atscale.com/blog/what-is-a-universal-semantic-layer-why-
would-you-want-one/

https://kyligence.io/blog/semantic-layer-the-bi-trend-you-dont-want-to-
miss-in-2020/
Metadata and Data Dictionary
• Metadata is "data that provides information about other
data", but not the content of the data.
– https://en.wikipedia.org/wiki/Metadata
– Technical metadata vs. business metadata
– Metadata itself does not imply any structure or model
– Metadata is a general term used in many domains.
• Metadata in webpages
• Metadata in video/audio files
• Metadata in database systems
• Data dictionary
– A data dictionary, or metadata repository, as defined in the
IBM Dictionary of Computing, is a “centralized repository of
information about data such as meaning, relationships to
other data, origin, usage, and format”
– Data dictionary is a system
– https://en.wikipedia.org/wiki/Data_dictionary
13
Typical Data Models and Their Use in Analytics
• Our curriculum emphasizes the following types of
data models
– Entity relationship model (a conceptual model)
– Relational data model (a logical model)
– Dimensional data model
– Hierarchical data model
• Other types
– Graph (network) data model
• Core reading
– Types of data model: https://www.educba.com/types-of-
data-model/ - a brief intro to the four basic data models.
14
Extended reading:
• More types of data models used in data storage systems
https://docs.microsoft.com/en-us/azure/architecture/guide/technology-
choices/data-store-overview
Entity Relationship Model
• E/R is the most commonly used conceptual
level model in software design and database
design.
• It is a conceptual level model that uses a few
simple elements to describe how things are
defined (entity) and related (relationship).
• It does not have extensive implementation
details
• Good for human understanding but not
computing operations
15
Study Relational Model in CSE 3153 introduction to database systems
course.
Relational Model
• Relational model is based on relation theories
and a set of mathematical rules
• It essentially a 2D data table (rows and
columns). Although in database, relational
model abides to additional stricter rules.
• It is common in relational databases and
spreadsheet data
• Most of the database classes (like CSE 3153,
IT 4153, IT 5433, and IT 6733) focus on
relational models
16
Study Relational Model in CSE 3153 introduction to database systems
course.
Dimensional Data Model
• Dimensional data model is a data model that specifically constructed around the
elements of facts/measures and dimensions

It is directly based on the most common business questions and queries
– The platform for (traditional) BI.
• Dimensional model is a conceptual model and can be implemented in a logical
model like the relational model or in a logical multidimensional database model
like an OLAP server (such as SQL Server Analysis Services dimensional
model)
• Where is Dimensional Model used?
1.
A major data warehouse design method by Ralph Kimball
(http://en.wikipedia.org/wiki/Ralph_Kimball)
• Dimensional modeling has been broadly accepted as the dominant technique for DW/BI
presentation.

“Based on our experience and the overwhelming feedback from numerous practitioners from
companies like your own, we believe that dimensional modeling is absolutely critical to a successful
DW/BI initiative.” (Kimball)
2.
The basis of OLAP (multidimensional database)
3.
The foundation of multidimensional analysis;
most widely used in many business
(descriptive) analysis
17
Dimensional model is specifically covered in
IT 4713 and IT 7123.
Hierarchical Model
• A hierarchical database model is a data
model in which the data are organized into a
tree-like structure.
• The data are stored as records which are
connected to one another through links.
• Hierarchical data models are commonly
implemented in many database systems
(such as IBM IMS) and data files in
XML/JSON format.
18
Hybrid Data Models
• Hybrid data models refer to a data model that accommodates
two or more types of data models in the same data set
• Example
– MariaDB uses JSON (Hierarchy) + Relational
https://mariadb.com/database-topics/semi-structured-data/ (webinar
video recording https://www.youtube.com/watch?v=ES0I_uYJLRw
with slides at https://www.slideshare.net/Dataversity/json-relational-
how-to-use-hybrid-data-models)
– Microsoft SQL Server also support similar features
19
In traditional relational schema
design, interests would be in
another table and linked back
to users (M:N). JSON format
simplifies the table structure.
Graph (network) Data Model
• Data entities are associated in multiple
relationships, in multiple paths
• Typically used in text analysis, literature
analysis
20
Where are data models used?
• Data models are used in many data related
fields and operations including
– Database design and development
– Data management and storage
– Business intelligence and analytics
– Analytical data storage
– Data analysis, query, and report
– Software engineering
– Data governance https://www.dataversity.net/data-
models-create-living-policy-data-management/
– Data architecture
21
Data Model/Schema in Analytics
• Why data model (in BI and analytics)?
– It is the “intelligence behind BI (analytics)” https://www.slideshare.net/Dataversity/ldm-
webinar-data-modeling-business-intelligence
– Facilitates the understanding, query, reporting, analysis, and other use of data
– Creates business meaning & context
– Understands source and target data systems
– Optimizes data structures to align queries with reports
– More: https://www.linkedin.com/pulse/7-reasons-non-analysts-should-understand-data-
modelling-chris-cook/

In traditional BI and data warehouse, data models are predefined and created
before any analysis or reporting. In more recent self-analytics movement, data
models may be automatically created at the time of analysis and reporting.

Is data model less important in modern analytics?
– It’s important to find a balance between managing & modeling “trusted data sets” and
giving users the flexibility to explore.
– Most users will find these trusted data sets a welcome asset, but don’t want to be restricted
from doing data exploration when appropriate.
22
Extended reading: Data Modeling & Business
Intelligence webinar https://www.dataversity.net/ldm-
webinar-data-modeling-business-intelligence/ with slides
https://www.slideshare.net/Dataversity/ldm-webinar-data-
modeling-business-intelligence
Why is a (good) data model important?
• Affect calculation methods: some data models result in more difficult
calculation methods (depending on tool capabilities).
– Create difficulty in calculation and query: calculation and query could use a
lot of hacks to make things complex and difficult to maintain
• Affect analysis/query results

Incorrect identify the year column as int data type, and sum of years
aggregation is performed.

Impacts readability and manageability
– Many tables can be messy and difficult to browse, or unnecessary tables
that could be merged.
– Ends up in many columns which can be rows.
• Auto modeling is not always accurate
– Missing/incorrect relationship
– Filters did not apply to a table
23
Good reading:
https://www.linkedin.com/pulse/7-reasons-non-analysts-should-understand-
data-modelling-chris-cook/
Example
• Consider the following two methods of
modeling student scores in a table
1. Column based
2. Key-value paired
• Impacts
– how is the calculation (such as average scores)
affected?
– if we will have additional information about each
score item (for example, weights), how will that
impact the calculations?
24
A Good Model
vs.
A Mess
(image processed for IP protection)
25
After carefully
modeling and clean-up
Applying a Data Model
• What’s the difference?
– Having a conceptual data model only
vs.
Enforcing a logical data model
– One big flat table
vs.
A set of linked table with proper column designs
• For example
– GA Schools Grades Data
https://public.gosa.ga.gov/noauth/extensions/SchoolGra
des-Georgia/SchoolGrades-Georgia.html
26
Model Design Tools
• ERDplus
– https://erdplus.com/standalone
– A tutorial of an earlier version is at
https://www.youtube.com/watch?v=hv6Id55tRaA
, but the new UI should be simple to follow.
• https://dbdiagram.io/home
27
Essential Learning Resources
• What is data model: https://en.wikipedia.org/wiki/Data_model (focus on
section 1 to 4)
• Video: What is a data model? https://www.youtube.com/watch?v=4qFZ-
5i4GS8
• Data model vs. data architecture: https://www.dataversity.net/data-
modeling-vs-data-architecture/
• More details of the three levels of data models:
https://www.guru99.com/data-modelling-conceptual-logical.html
• Why do we need data models?
https://www.linkedin.com/pulse/7-reasons-non-analysts-should-
understand-data-modelling-chris-cook/
• Types of data model: https://www.educba.com/types-of-data-model/ - a
brief intro to the four basic data models.
– For a more extensive overview, see
https://www.lucidchart.com/pages/database-diagram/database-models
28
Additional Good Materials/Resources

Review of three levels of data models

https://opentextbc.ca/dbdesign01/chapter/chapter-5-data-modelling/

http://www.databaseanswers.org/data_models/types_of_data_models/index.htm

Terms
– What is a database schema https://www.youtube.com/watch?v=3BZz8R7mqu0

https://www.atscale.com/blog/what-is-a-universal-semantic-layer-why-would-you-want-one/

https://kyligence.io/blog/semantic-layer-the-bi-trend-you-dont-want-to-miss-in-2020/
– Data architecture

https://www.integrate.io/blog/the-ultimate-guide-to-data-architecture/

https://www.dataversity.net/data-modeling-vs-data-architecture/

https://www.integrate.io/blog/data-modeling-vs-data-architecture-5-critical-differences/

A more extensive survey of major data models

https://www.lucidchart.com/pages/database-diagram/database-models

https://afteracademy.com/blog/what-is-data-model-in-dbms-and-what-are-its-types

Data modeling

https://panoply.io/analytics-stack-guide/data-modeling-examples-for-analytics/

https://www.dataversity.net/category/data-topics/modeling/

https://www.dataversity.net/data-modeling-trends-in-2022/

https://www.dataversity.net/data-modeling-trends-in-2020-a-year-of-optimization/

https://www.dataversity.net/data-modeling-trends-in-2019/

http://www.agiledata.org/essays/dataModeling101.html

https://learndatamodeling.com

Interview questions: https://www.janbasktraining.com/blog/data-modeling-interview-questions/