type
Post
Created date
Oct 30, 2022 05:39 AM
category
Data Science
tags
Data Warehousing
status
Published
Language
English
From
School
summary
slug
password
Author
Jason Ching Yuen Siu
Priority
Featured
Featured
Cover
Origin
Type
URL
Youtube
Youtube
icon

Chapter Summary

Active Data Warehousing is solely about interdependency between the operational database and the data warehouse itself, as well as among the star schemas of various levels of granularity in the data warehouse.
There are three types of interdependencies:
  1. Incremental updates, which is when new records are inserted into or old records are deleted from the operational database, they are immediately reflected in the star schema;
  1. Changes in the data warehouse, which is when the data warehouse creates a new star schema or deletes an existing star schema, as well as changes to the upper levels of star schemas when the lower-level star schemas change; and
  1. changes in the operational database, which is when the underlying operational database has changed and has an impact on the data warehouse.
Since interdependency can be quite complex in many cases, Active Data Warehousing is used only for applications that really need active data in the data warehouse. This includes the monitoring system which may rely on real-time data and not so much on historical data. In this case, an Active Data Warehousing is appropriate and useful.

Summary of all these changes in 3 tables

notion image
notion image
notion image
 

Passive Data Warehouse (一成不變)

The concepts learnt previously are based on Passive Data Warehouse; it has a few characteristics:

1. Star schemas are subject-oriented, including multi-fact star schema.

  • The purpose of the data warehouse is decided at the design state
  • They are generally designed for one specific purpose. Some examples are:
    • A Product Sales star schema is for the sole purpose of analysing product sales and nothing else.
    • A Computer Lab Activities star schema is for the sole purpose of analysing the use of computer labs by students.
    • A Hospital Admission star schema is for the sole purpose of analysing patient admissions to hospitals.

2. They are built based on historical data with a specific time range.

So, traditional DW is not only subject-oriented but also non-volatile and time-variant.
So, traditional DW is not only subject-oriented but also non-volatile and time-variant.
  • Non-volatile in a sense that the data in the data warehouse will never change
  • Time-variant means that it is for a specific range of time, which is a past time period.

ACTIVE DATA WAREHOUSE (一成可變)

Problem of Passive DW:

Passive DW can cater the general need, but there are applications where there is a need of monitoring sensors or real-time systems using current data.
Examples of continuous monitoring:
  • air traffic control,
  • the stock market, certain medical monitoring systems, etc.

Solution:

We need ACTIVE DATA WAREHOUSE. When the operation database is updated, the data warehouse is immediately updated.
In another words, Active DW is dynamic. And here we look into the how updates are impacted in different levels of DW.
  • Create viewmight be used in various levels of the star schema, which directly reflects any changes in the underlying tables (second part in the graph), as views are virtual tables based on the underlying physical tables. Any changes in the underlying physical tables will automatically be reflected in the virtual tables.
  • Additionally, certain database triggers can be used to automatically insert, update or delete the records in the data warehouse when an event (e.g. insert, update or delete) occurs in the underlying physical tables.
What is CREATE VIEW ?
A view is essentially a saved query, which you can query as if it were a table itself.
The data for a view is typically not stored at all - just the SQL query you used to define the view. When you write a query which selects from the view, the SQL for the view is effectively substituted-in dynamically to your main query.
So this:
create view v as (select a+b as total from some_table); select max(total) from v;
Is effectively the same as:
select max(total) from (select a+b as total from some_table);
Why are they useful? For a few reasons:
  • Convenience - if you have some often used SQL, you can save it as a view to avoid re-typing it each time
  • Clarity - giving a descriptive name to the view helps to document what it means, and what it is used for
  • Encapsulation - if the users of the database are querying via views, it might be possible to change the underlying table structures and then update the views so that they still work, and look the same as before
  • Security - you can grant different permissions for accessing the view, vs the underlying tables. So you could create a view which omits certain columns, or masks the data in some other way, and then grant people access to that

VIEW vs TABLE

A table is something that you can modify (update, delete, insert). A view is read only and usually presents related information (say, from two related tables).
 
notion image

Architecture of Active DW

Unlike Passive DW (top-down), ADW adopts a bottom-up approach, where the star schema is built from Level-0, and the upper levels (e.g. Level-1, Level-2, etc) are built on top of the immediate lower levels.

PK-FK constraints

Why do we need PK-FK in the operational databases?

  • We need this to maintain data integrity in the database because insert, update and delete of records are frequently performed and maintaining entity integrity and referential integrity is of the utmost importance.

Why we DONT ❌ need PK-FK in traditional DW

  • Because once the star schema is created and built, no records will be updated in the star schema (e.g. dimensions nor fact); there will be no insert as the data warehouse is passive and no delete.

Why we DO ✔️ need PK-FK in active DW

  • Active Data Warehousing is actively updated when there are changes in the operational database, the PK-FK constraint between dimensions and the fact becomes important to minimise data anomalies due to updates.
  • Therefore, the PK constraint must be enforced in the dimension and the PK-FK constraint in the fact.

Why do we prefer using CREATE TABLE to CREATE VIEW?

  • DIM: If using CREATE VIEW for creating the dimensions in Level-0, it is very difficult to assume that the primary keys have been implemented correctly and primary keys cannot be built on top of views. Therefore, it is advisable that dimensions in the Level-0 star schema are created using the create table statement, and not through create view.

How do you enforce a PK-FK constraint?

alter table DegreeDim add constraint DegreeDimPK primary key (DegreeCode);
 
notion image

1. INCREMENTAL UPDATES / INSTANT UPDATE

Three ways of incremental updates includes 1) Auto updates, 2) Expiry Date, and 3) Date warehouse rules changed.

Type 1 — Auto updates

Automatic updates refer to updates that occur INSTANTLY, once the operational database has changed.

Level 0

In creating tables for lv 0, one must consider:
  • Dim: can be created using CREATE VIEW (if the data is directly retrieving from op. database), or CREATE TABLE (but it needs a trigger to monitor instant insert)
  • Fact: table must be created using CREATE TABLE

Level 1

These rules applied to both fact and dim.
Rule 1: Since there are some aggregated measures in Level-1, we CANNOT use CREATE VIEW ; we need to use CREATE TABLE instead.
Rule 2: If the records are timeless (e.g., Semester start and end Date), there is no need to create a trigger, otherwise we MUST create a trigger.
A database trigger must be created to ensure that for every record inserted into the TempFact Level-1, it will increase the fact measure Number of Logins by one.
notion image
 
Rule 3: Same as level 0, we need to enforce PK-FK contraints.
 
 
 
alter table TempComputerLabFactLevel1 add (SemesterID varchar2(10))

Level 2

Quote from the textbook:
In Level-2, we basically remove the Student Dimension to lower the granularity of the fact measure Number of Logins. All other dimensions are reused. Fact Level-2 reuses Fact Level-1, where the aggregation of the fact measure is increased. There is no need to have a TempFact in Level-2 because we can create the fact in Level-2 by using create view from the fact in Level-1. As a result of this, there won’t be any need for a database trigger.
notion image

Type 2 — Expiry Date

Sometimes we need to remove the unneeded-obsolete data (because decision makers are interested in recent data), so DW can take consideration of “recentness” of the data. To achieve so, we can set an expiry date and remove the expired records from star schema.
Bear in mind that, the old data should NOT removed from the operational databases because the operational databases should not only support day-to-day operations but also act as an archival repository.

Level 0

As mentioned above, in level 0 can have two ways of creation: 1) CREATE VIEW (only when the schema without fact measure); and 2) CREATE TABLE. Here is how they can be implemented:
Implementation for CREATE VIEW
Implementation for CREATE VIEW
Implementation for CREATE TABLE
Implementation for CREATE TABLE

Level 1

To implemented the level 1 one, we must delete those expired records MANUALLY.
notion image

Level 2

Same as level 1.

Type 3 — Date warehouse rules changed

As time goes by, the data warehousing rules may have changed, so the data warehousing needs to adapt to the new rules.
notion image
notion image
 
 

2. DATA WAREHOUSING SCHEMA EVOLUTION

Type 1 — Changes propagating to next levels

Since Active DW is sensitive to changes of star schema which may propagate to the upper levels of the star schema. For example, if a Level-1 star schema changes its structure due to new requirements, the changes might be propagated to the Level-2 star schema and so on.
notion image
notion image

Level 0

Level 1

 
 
 

Type 2 — Changes not affecting the next levels

notion image
notion image

Level 0

Level 1

 

Type 3 — Inserting new star schema

notion image
notion image
 

Type 4 — Delete star schema

3. OPERATIONAL DATABASE EVOLUTION

As the operational database is changed, the data warehousing requirements may also change.

The operational databases which feed into the data warehouse may also evolve over time. This may then impact the data warehouse, as in Active Data Warehousing, updates or changes in the operational databases are reflected in the data warehouse in real time.
As the operational database is changed, the data warehousing requirements may also change. In this section, we are going to see how the changes in the operational database (at various degrees of severity) will change the data warehousing requirements from small changes to the table structure and changes to the entities and relationships to major changes to the operational database.
Some of the changes in the operational database may affect the Level-0 star schema only, while some will propagate to all star schemas in various levels of granularity. In the worst scenario, the entire data warehouse (e.g. all star schemas) will be overhauled, and the old data warehouse is fully decommissioned and replaced with the new one.
 

Type 1 — Changes in the table structure

notion image
notion image
There are two possible options to deal with this: 1) update the existing star schemas; 2) build a new data warehouse, since the star schema now has new requirements.
 

Type 2 — Changes in the ER diagram

notion image
notion image
 
 

Type 3 — Changes in the Op. database

Like ERP transition the old system is decommissioned and the new system is deployed. This would affect Op. database (e.g., Table names may change, table structures may change; the entire design may change.). Here is the case:
notion image
There are two possible options to deal with this: 1) update the existing star schemas; 2) build a new data warehouse, since the star schema now has new requirements.
 
FIT3003 - Unique SQL functions in Oracle DatabaseFIT3003 - W2 - Star Schema

Jason Siu
A warm welcome! I am a tech enthusiast, passionate about learning and self-discovery.
Statistics
Number of posts:
234