Kimball and Ross laid out a systematic way to approach dimensional modeling. I will review the four key steps below:
Step 1: Select the Business Process
A business process is a low-level activity performed by an organization. This can be taking orders, processing claims or registering students. To identify these, look for the following common characteristics:
Step 1: Select the Business Process
A business process is a low-level activity performed by an organization. This can be taking orders, processing claims or registering students. To identify these, look for the following common characteristics:
- Expressed as action verbs
- Supported by an operational system
- Generate key performance indicators
- Triggered by an input and result in output metrics
Be careful, though, when interviewing the users in an organization, as they often tend to talk in terms of broad strategic business initiatives. These would not serve well for dimensional modeling purposes. Make sure to dig deeper and seek out the specific underlying business processes that support the analytic requirements of the initiatives.
Step 2: Declare the Grain
Declaring the grain means specifying exactly what an individual fact table row represents. This step is crucial in the modeling process, yet many professionals tend to overlook its importance. In the past, failing to declare the grain has been the most frequent mistake that professionals make. Although the grain is ultimately equivalent to the primary key, it should still be described using business terms for clarity. Below are a few examples:
- One row per individual boarding pass scanned at an airport gate
- One row per bank account each month
- One row per line item on a bill from a doctor
Step 3: Identify the Dimensions
Once the grain has been properly declared, the dimensions typically can easily be identified as they represent the “who, what, where, when, why, and how” associated with the event. A robust set of dimensions representing all possible descriptions should be identified. The following are some examples:
- Date
- Customer
- Employee
- Facility
Step 4: Identify the Facts
The facts are the performance metrics that business users are concerned about. These must be appropriately defined in accordance with the declared grain. If not, they should be placed in a different fact table. Usually, facts are numerical data, such as total cost or order quantity. However, in some case, they are not - we would call them "factless" facts.
The four-step process must take into account not only the data sources, but more importantly, the business requirements. Although gathering user input may involve more complexity than relying on the data alone, it is essential for a sound dimensional design.
Reference: "The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling, 3rd Edition" by Ralph Kimball and Margy Ross
No comments:
Post a Comment