Are you part of the central data warehouse team in your large organization? Do you also work with remote divisional teams implementing far-flung data marts? Do you wonder what your role should be vs. the divisional teams' role? If these questions are familiar to you, you may be frustrated because you feel you have an important responsibility in your central role, yet you may not actually own anything. Even worse, perhaps the divisional teams are implementing their own data marts and are charging ahead with their own technologies, vendor selections, and logical and physical database designs.
If you are part of the central data warehouse team, you need to play an active role in defining and controlling your enterprise data warehouse. You should not be content to just be a liaison, a recommender, or some kind of ill-defined "glue" that holds projects together. You need to be a leader and a provider, and you must be very visible to the separate divisional teams implementing their own data marts.
Conversely, if you are a part of a divisional data mart team, you should expect the central team to provide you some very valuable services and to impose some noticeable structure and discipline on your effort. You are a part of an overall enterprise, and you need to conform to these structures and disciplines so that the data from your division can be usefully combined with the data from other divisions.
Defining and publishing the corporate dimensions. If your organization has multiple lines of business, and if you have any interest in combining these lines of business into an overall framework, you need to identify four or five dimensions that are common to the multiple lines of business. The most obvious corporate dimensions are Customer, Product, Geography, and Time.
The Customer dimension has one record for each one of your customers. Although this seems obvious, the crucial step is to make sure that each divisional data mart uses the same single Customer dimension wherever there is a reference to "customer." Each division will therefore see the customer the same way. Individual divisions will be able to see the relationship other divisions have with the customer, and the enterprise will, perhaps for the first time, be able to see the whole customer relationship.
The requirement that each division must always use the corporate Customer dimension is a strong requirement that will impact everyone. From the central team's perspective, the corporate Customer dimension must embrace the complete range of possible customers. The key for corporate Customer almost certainly will need to be an artificial customer number defined and created by the central data warehouse team. No individual divisional customer number will be administered and controlled in a way that serves the needs of the entire enterprise. The descriptive attributes for Customer will need to be a broad set that includes all of the desired descriptors for all of the divisional groups. Hopefully the central data warehouse team can meet with the divisional groups and try to compress and standardize the various descriptors into a reasonable working set, but in the long run there is no absolute need to force any individual group to give up its idiosyncratic customer descriptors. Because dimension tables are generally much smaller than fact tables in a dimensional data warehouse design, there is room for multiple sets of customer descriptors in the same Customer dimension record. In banks, I have often seen the master customer record contain 200 descriptive attributes. If you have been reading my column, you know that I advocate making this customer table a single flat, denormalized table in order to improve user understandability and allow the exciting new bit-vector indexing technologies to speed lookup. Oracle, Sybase, and Red Brick all have bit-vector indexes that thrive on fat, wide, denormalized Customer dimension tables.
The need to standardize on a corporate definition of Customer is a very important central team responsibility. This need will arise whenever management wants to look at overall customer relationships, and it will arise whenever your enterprise makes an acquisition. The good news is that you just acquired your biggest competitor. The bad news is that you must now merge the separate Customer dimensions.
Often there will be no individual division willing or able to provide the corporate Customer dimension. Yet frequently these same divisions will be eager to receive such a Customer dimension from you, the central team. Even the production OLTP systems in the division may be willing to upload the central data warehouse's Customer dimension if it contains cleaned and correct customer addresses. In these cases, the central data warehouse team becomes the creator and provider of the master enterprise Customer file.
The failure to enforce the use of a corporate Customer dimension is a very serious lapse and therefore must be addressed by the central data warehouse team. As I have pointed out in my columns, the ability to "drill across" separate data marts can only be accomplished if all of the shared dimensions in the separate data marts are conformed. In other words, if two data marts have the same dimension, such as Customer, these two Customer dimensions must either be exactly the same file, bit for bit, or one must be a perfect subset of the other. In particular, the descriptive attributes within these shared dimensions must be defined and populated in exactly the same way. Allowing two Customer dimensions in different data marts to drift apart means that the two data marts cannot be used together. Ever.
The Product dimension has one record for each one of your products or services. As the central data warehouse team, you must be the judge as to whether your individual divisions have enough in common to warrant building a corporate Product dimension. In a tightly integrated business, such as the multiple geographic divisions of a big retailer, it seems obvious that a single master Product dimension is called for, even if there are significant regional variations in the products sold. In such a case, you probably already have a production group that works constantly to define this product master and download its descriptions and price points to the cash registers of the individual stores. The central data warehouse team can easily work with this production group to create a proper Product dimension for all of the data marts. In a financial services organization, such as a bank or an insurance company, there is a lot of interest in a "core" view of the business that groups all of the products and services into a single hierarchical framework. But there will be lots of special descriptive attributes that only make sense for a single division, such as mortgage loans or credit card accounts. In this case, the best approach is to create a core Product dimension containing only common attributes, and a set of custom Product dimensions that are different for each division. This "heterogeneous product" design is discussed extensively in my book The Data Warehouse Toolkit: How to Design Dimensional Data Warehouses (Wiley, 1996). Finally, if your enterprise is so loosely integrated that your products have almost nothing in common, and if your management has never bothered to define a set of hierarchies that group all of the different divisional products into a common framework, then you may not need a corporate Product dimension. Just remember, if you don't create a single corporate Product dimension, you will not be able to combine the separate divisional data marts together.
The Geography dimension has one record for each one of your districts, regions, or zones. If your individual divisions have incompatible sales geographies, then the separate division data marts can only be used together at the highest common geographical aggregate. If you are lucky, this highest geographical aggregate will be something useful, such as state.
The Time dimension has one record for each calendar time period. Hopefully all of your divisions operate on the same calendar and report on the same fiscal periods. If at all possible, all reporting should be done either at an individual daily time grain or at the fiscal period time grain such as month. In this case, days always roll up to the fiscal periods, and the fiscal periods roll up to years. It becomes a monumental headache if the separate divisions have incompatible reporting calendars. Fortunately, the central data warehouse team has a strong ally in the corporate financial reporting group who hopefully is in the process of standardizing the reporting calendars of the divisions. Separate databases denominated in weeks and months should be avoided at all costs because these time dimensions cannot usefully be conformed, and the week databases will always be isolated from the month databases.
Providing Cross-Divisional Applications. The central data warehouse team is in a unique position of being able to provide cross-divisional applications. A "value chain" of data marts can be assembled from each of the separate divisions. In my March 1996 DBMS column (see "Drilling Down, Up, and Across," page 14), I showed six data marts for a clothing store retailer. These data marts may have been built at different times, but they share a number of common dimensions. From the previous discussion, it should be clear that the central team should define and enforce the use of these dimensions.
With the proper tools, it is fairly easy to build cross-divisional applications in these environments that are capable of generating multipass SQL. For instance, both MicroStrategy's DSS Agent and Hewlett-Packard's Intelligent Warehouse give the user a flat view of all of these separate data marts and let the user choose facts and measures from these data marts without worrying about the details of fitting the data together. I discussed multipass SQL in some detail in my February 1997 DBMS column (see "Features for Query Tools," page 12).
The central team's responsibility is to provide a reporting tool that generates multipass SQL and to enforce the use of common dimensions.
Defining a Consistent Data Warehouse Security Architecture. In my April 1997 DBMS column (see "Hackers, Crackers, and Spooks," page 14), I discussed the components of a security architecture for a modern data warehouse. The point of that column was to alert the data warehouse team that they must play a very proactive role in understanding and defining security. The data warehouse team must include a new member: the security architect.
The security architect for the data warehouse should define and enforce a data warehouse security plan for the enterprise and all of the individual data marts; define consistent user privilege profiles; identify all possible access points in the corporate network, including every modem; implement a single logon protocol so that a user is authenticated once for the whole network; track and analyze all attempts to override security or gain unauthorized access to the warehouse; implement a link-encryption scheme, or its equivalent, for remote users; implement a remote user-authentication scheme, such as a hardware card reader at each PC, that is more reliable than typed user passwords; and educate the divisional teams about security issues and consistent administration of the data marts.