DATA MARTS

Building the Right Data Mart

By Bryan LaPlante

Building a useful data mart requires knowing how to set up the data so that it answers your specific business questions.

Data marts are growing in popularity, for many reasons: They're faster and less expensive to build than data warehouses and can be used to provide focused information quickly. But building a data mart that will answer the right questions at the right time is more involved than installing a product such as Oracle Data Mart Suites on a Microsoft Windows NT server, dumping in all your data, and hoping you can create a query that brings you the information you need.

How do you construct a mart that works for your enterprise? Start by defining your business requirements--who needs answers to what questions--and build from there.

Defining Business Requirements

Take the fictional business ABC Junior Clothes as an example. ABC is a major retail chain with outlets organized into regions. A data mart for ABC needs to be set up with a region-to-outlet hierarchy dimension.

ABC wants to be able to drill down from a high-level view--sales by region--into increasingly finer levels of detail, such as sales by outlet. With this data, managers can determine if the sales of a given region are down and whether sales are off in general in the East or if a particular outlet is not doing well. Each outlet belongs to a particular region at any given time, the hierarchy for which is reflected in the outlet manager's report to regional managers. Thus, it's natural for ABC to consider outlets and regions as a hierarchy.

But ABC has a typical problem: Sales regions are frequently reorganized. The first step in defining the business requirements, then, is to interview the potential users of the data mart, particularly those interested in the Sales Region dimension, which encompasses the region-to-outlet hierarchy.

At ABC, each outlet manager wants to know his or her sales and contribution to regional sales. In addition, the managers want to see sales by date, product, promotion, and other dimensions. Regional vice presidents want to show their region's sales, compare them with all region sales, and compare the results with the prior regional sales for the same period. The corporate vice president of sales also wants to be able to see what the regional sales numbers would be under a proposed regional organization.

Beginning the Translation Process

The requirements for ABC's data mart boil down to four ways of viewing the region-to-outlet relationship:

The first three views are clear. The of-record dimension captures sales data from outlets and assigns it to the region in which the outlet belonged at the time the data was collected. Although the Kansas City outlet will move from the Mid-West region to the North Central region on February 1, 1998, its prior sales must continue to be credited to the East region.

If of-record mapping is not done correctly, a report of 1997 sales run prior to February 1, 1998, will show different East-region totals than a report run after February 1, 1998, which can cause much confusion and lead to distrust of the data mart's reliability.

Implementation

The four region requirements, or dimensions, are essentially four ways of slicing the sales data. Once you realize this, implementing the first three dimensions is easy.

This leaves the of-record problem. The crux of the problem is that a single outlet can map to more than one region over time. For example, Kansas City moved from the Mid-West region to North Central and the database has data from both before and after the change. The key is to remember the differences between dimensions and facts. A dimension attribute is one that seldom changes, whereas facts--for example, the price of an item or the number of items purchased--change frequently. Because the sales-region reorganizations are frequent (across the time span of years), the region to which the sale is credited is also a fact.

You also need a new outlet table, because the same outlet can occur in multiple regions. As you drill down from North Central, you want to find Kansas City, and from there its sales--after February 1, 1998. If you drill down from Mid-West, you will also find Kansas City and its sales only for dates prior to February 1, 1998. One of the ongoing maintenance chores is to purge old region-to-outlet mappings when the database no longer holds any data for a specific mapping.

So, in 2001, when you decide that 1997 sales records are no longer useful, you can remove the Mid-West/Kansas City record from the Outlet_of_Record table. Although you could do this automatically by using simple SQL to find and delete all outlet/region records from any outlet of record that has no sales, it can be risky. What happens if you run the purge two weeks before a new outlet opens? You've already added the outlet to the database, but it has no sales yet. To avoid deleting it, add a discovery-date field to each dimension-table row. This field is normally used only for database maintenance and records when a new region/outlet record is added. Not only is it handy for tracking database- maintenance bugs but it also solves your purge problem.

Add this restriction to avoid purging new outlets:

where discovery_date <= '31-DEC-96' 

The Business Advantage

This example shows how a simple region-to-outlet-to-sales hierarchy can become several hierarchies as a result of conflicting business requirements. If you tried to force a single hierarchy on the implementation, you couldn't answer all the business questions posed. But by thinking through the requirements, you can build a design you can implement in phases, satisfying all your users in the process.

Bryan LaPlante ([email protected]) is a senior consultant in management systems at PRAGMATEK Consulting Group Ltd., in Minneapolis, Minnesota. A specialist in data-mart and data-warehouse design, LaPlante contributed to the TPC-D decision-support benchmark and has designed, tuned, and implemented data warehouses and data marts for the last five years.


Copyright © 1994, 1995, 1996, 1997 & 1998 Oracle Corporation. All Rights Reserved.




This is a copy of an article published @ http://www.oramag.com/