type
Post
Created date
Sep 18, 2022 09:01 AM
category
Data Science
tags
Data Warehousing
status
Published
Language
English
From
School
summary
slug
password
Author
Jason Ching Yuen Siu
Priority
Featured
Featured
Cover
Origin
Type
URL
Youtube
Youtube
icon

Hierarchies

What and Why Hierarchies?

Like normalisation in ERD, having a hierarchy means that you drill down the information into a deeper level. We do this for the sake of efficiency.
For example, imagine a million rows in this table, in terms of storage, you do not want to store many duplicate values.
notion image
Therefore, you can have a multiple dims to drill down, forming a hierarchy.
notion image
Such that, the results look as follow:
notion image
 

Summary and Comparison between non-hierarchies and hierarchies

notion image
notion image

Determinant Dimensions

Note: here we note .

What is Determinant Dimensions?

  • A Determinant Dimension (or a Determinant Attribute) must be used in retrieving the fact, in order to make the retrieved data more meaningful.
  • In other words, you MUST always add this dim to perform any meaningful calculation on the fact measure.
  • And we should say that it is critical to use the Determinant Attribute in the query, either as a filtering mechanism (in the WHERE clause) or in the display (in the SELECT clause) in the SQL command.
 

Case study (Lecture Activity): Petrol - Determinant Dimension

Problem: Analysis will be UNREASONABLE if reporting without petrol type.

notion image
notion image
notion image

Solution: Add a MUST-ADD dimension, in which we call this as determinant dim.

1. 笨方法:#category * #measure
1. 笨方法:#category * #measure
2. Determinant dim
2. Determinant dim

Determinant vs. Non-Determinant Dimensions

Easy as it seems, sometimes we can choose not to include a dominant dimension, for the two reasons:
  • Complexity of values needed for fact measure in the table is low (e.g., min and max price in Petrol Case VS number of medal in Olympic).
  • Only has simple calculation like COUNT, instead of Avg.
We illustrate the necessity of including Det. Dim. through the case of Olympic, shown below:

Case Studies

Case study (Lecture): Olympic - Determinant vs. Non Determinant Dimensions

There are two versions of star schema: one with det dim and one without det dim.
V1. has more fact measure
V1. has more fact measure
notion image
notion image

Comparison between Determinant vs. Non-Determinant Dimensions

notion image

Determinant Dimensions vs. Pivoted Fact Table

Case study (Lab): PTE Academic Test - Determinant Dimensions vs. Pivoted Fact Table

notion image
notion image
notion image
notion image
notion image
Result for including a det. dim
notion image
Result for including a Pivoted Fact Table
notion image
Detailed procedure
notion image

Comparison between Determinant Dimensions vs. Pivoted Fact Table

notion image
 
 
FIT3003 - W4 - Multi-fact star schemaFIT3003 - W6 - Temporal data warehousing