Thanks to the pioneering work of Ralph Kimball, slowly changing dimension (SCDs) have become a cornerstore of dimensional modeling. The most basic type of versioning is to simply update (overwrite) the fields that change—the so-called Type 1 SCD. Turns out that Type 1 SCDs do not work very well for many important business decisions. This is illustrated by the query “show me monthly revenue by customer and sales rep”. Sales rep assignments can change throughout the year. So directly updating the sales rep field in the customer dimension table creates a sticky problem. Why? The current sale reps would get credit for the efforts of her predecessors. Not a happy situation at commission time!
“Effective” is what the business says it is—not what the data warehouse thinks it is
The Type 2 SCD attempts to remedy this situation by creating a “current version” row in the customer dimension table when an important attribute changes like sales rep. New sales transactions are stamped with the key of the customer dimension version currently in effect. Now sales facts can be associated with the correct salesperson to produce an accurate picture of who sold what to whom, and when. Everyone is happy with their commission checks, right? Not so fast. Type 2 SCD gets you much closer to reality, but not 100%. Consider what happens if the sales rep is updated in the customer dimension several days after the change actually occurred? You would have the transactions tagged to the wrong sales rep during the period between the “business effective date” and the “data warehouse effective date”.
Type 2 SCD’s are sufficient for many business reporting scenarios. But when dealing with commission checks, almost is not good enough. Backdating (and forward-dating in some cases) is a fact of business life, and we must deal with it. The EDW must somehow align the late arriving dimension or fact changes.
BizEff To The Rescue
One way to address the back-dating problem is through “BizEff” (business effective dating) dimensions. BizEff dimensions are useful for business processes that have inherent delays in processing dimension updates. BizEff reporting depends on properly formed dimensions. Typically, only a small subset of “trigger” attributes need to be effective-dated. If the dimension is wide (many columns) and deep (many rows), then segregating the effective-dated columns into a separate BizEff table may improve ETL and query performance. If the dimension is small, or the effective-dated column count is low relative to the total columns in the dimension, then the BizEff dimension should be modeled as a single table.
To illustrate, let’s assume a customer dimension with 50 columns, of which 2 need to be effective-dated:
- Customer Status (e.g., prospect, new, tenure, inactive)
- Sales Rep (e.g., Sandy Smith)
We will model this dimension with two tables as illustrated in Figure 1. The dim_customer dimension table contains 47 columns–the customer key and all the other columns except the effective-dated columns. The dim_customer_eff table contains 7 columns–the customer key, the effective date, the expiration date, the current row indicator, and the 3 effective-dated customer attributes.
The natural key of the base customer table is the Customer ID. The natural key of the BizEff table is the customer ID and the effective date. In both tables, the customer_id is replaced with an integer surrogate key. The date key could also be replaced with a surrogate integer key. Review Ralph Kimball’s definitive article on the rationale for surrogate keys.
The versioned BizEff rows represent contiguous and non-overlapping effectivity periods defined by the business effective and expiration dates. The effective date is provided the source system as entered by users. The ELT application must derive the expiration dates for distinct sets of the effective-dated columns. The expiration date is 1 less than the effective date of the the next version. The current version has a high value expiration date (e.g. 9999-12-31) as well as a TRUE value for the current row indicator. A change to any effective-dated column will trigger a new row in the BizEff table. Note that if a particular column value does not change, that value is rolled forward to the next effective-dated row.
If the number of effective-dated trigger columns is large relatively to the total number of columns in the dimension, or if the base dimension is relatively small, then there is not much advantage in splitting the base and effective-dated columns into separate tables. Using a single BizEff dimension table will save a join when constructing the BizEff views. There is no fixed rule about when to split a BizEff dimension into two tables. You need to evaluate the capacity and performance implications for your particular situation.
Now that we have a properly formed BizEff dimension, how do we query against it? The trick is creating a BizEff view that conceals complexity from the client tool. In the case of the 2-table BizEff customer dimension, the view joins three tables–the base customer dimension, the BizEff table and the date dimension (Figure 2). The BizEff view returns a row for every date for which a distinct set of customer attributes is effective. If the customer status for Suzanne Jones is Prospect from January 10 to February 12, the BizEff table would contain one row for that effective period and BizEff view would return 33 rows.
The beauty of the BizEff view is that it can support both “AsOf” and “AsWas” queries.
- AsOf queries group data according to the dimensional context in effect “as of” a user-supplied date. The default AsOf date is normally the current date.
- AsWas queries group data according to the dimensional context in effect at the time the fact transaction occured.
BizEff queries are useful for companies that rely on the EDW for operational reporting. This is particularly true for business processes that have inherent delays in processing demension updates.
Best of Both Worlds — BizEff2
Just when you think you have satisfied the business users with an elegant BizEff solution, out comes a new requirement that sends you back to the drawing boards. The accountants want to report how the transactions were originally recorded based on the incorrect dimension values–thank you very much Mssrs. Sarbanes and Oxley! This requirement is met by including the standard Type2 SCD version key in the BizEff dimension table and in the fact table (Figure 3). The originally recorded “as was” situation can be be recreated by joining the dimension table to the fact table using the the Type2 version key.
The performance implication of BizEff must be considered separately for ETL (back-room) and query (front-room) processing. Traditionally, the brute force way to ETL late arriving dimension changes is to update the dimension keys for all affected fact transactions. This approach approach works fine in data marts with relative small fact tables, but is not very scalable for rapidly growing fact tables with hundreds of millions of rows. The BizEff approach is more efficient for ETL because it operates only on the smaller dimension tables. Even very large versioned dimension tables are small in comparison to transaction grain fact tables.
In general, BizEff queries perform somewhat slower than the standard Type2 queries because the the query uses expression joins to identify the correct dimension row. Expression joins using “less than” or “greater than” operands are slower than the equi-joins used for standard Type2 queries. The performance hit for AsOf queries is relatively light because the join of the static AsOf date is performed only once at the dimension level. AsWas BizEff queries will perform somewhat slower than AsOf queries because the expression join must be applied to every fact row.
With the performance of today’s MPP data warehouses, BizEff queries are a viable alternative to the brute force fact update approach. Early performance testing will confirm the viability of BizEff for your particular situation.
So the next time your users cry out for effective dated reporting, just tell them to them to go BizEff!