Building Effective Data Marts

By Greg Jones

A practical, step-by-step guide to developing and deploying the data mart that meets your business needs.

Now that data marts have been accepted as viable alternatives to data warehouses, you might decide to build one for your organization. After all, you'd like your company to have what is often considered the most effective new mechanism available for delivering fast, reliable decision-support capability. Data marts provide the quick return on investment demanded by the pace of modern business.

A data mart is very much like a data warehouse but is usually smaller in size and focused on a single purpose, such as sales, finance, or marketing. Because of its reduced size, departments of larger entities can create focused data marts from information in a larger data warehouse. Data marts also earn back their cost in a fraction of the time required by enterprise-scale data warehouses, since they take less time and money to build, which makes the payback period shorter. And since you can build data marts to be scalable, they can work well for organizations of any size and level of complexity.

Data Mart in Action

One company's data-mart project helped increase sales by several million dollars on one account by allowing the company to share key account information with the customer. With more than 40 distribution centers nationwide, Owens & Minor is the largest U.S. wholesale distributor of medical and surgical supplies. The company sells more than 250,000 products to hospitals, integrated health-care systems, alternative-care sites, and group-purchasing organizations. Its ability to share information with customers and suppliers was becoming increasingly important in sales negotiations. Internally, the firm couldn't track the results of sales initiatives effectively, and trend analysis was impossible because the historic data was inaccessible.

At first, Owens & Minor hired outside programmers to query its outsourced mainframe operations and write special reports. But the turnaround was slow and the process expensive, and the firm eventually abandoned that tactic. Next, the company hired Talus, Inc., to design and implement an integrated data mart on top of an existing Oracle database. The goal was to create a sales-specific system with timely information sharing with key customers and suppliers. Talus teamed with the company's end users and IS professionals to create a sales data mart in five months. The project was the first of several subject-area data marts designed for an integrated data warehouse with more than 300 users.

Don Stoller, director of Decision Services for Owens & Minor, explains, "We wanted to bring in the people with the experience and transfer their knowledge. We met those objectives . . . staying within our budget and delivering on schedule."

In recognition of its solid performance, the project team received the Data Warehousing Institute's 1997 Best Practices Award. Owens & Minor's newly acquired ability to share key information with customers led directly to a $44 million increase in sales from one hospital group alone. For the first time, the company was able to give customers and suppliers timely information.

The Rules of the Game

There are two immutable laws of data-mart development: the law of loading complexity and the law of design uncertainty. Their effects are felt on all data-mart projects, regardless of scope, complexity, or cost. The key to success is taking these laws into account when planning and executing a project.

The Law of Loading Complexity The most complex part of building a data mart is loading the data. The process comprises three steps: extraction from operational (legacy) systems; transformation and aggregation; and database loading.

Among the factors that contribute to the complexity of data loading are the following:

When one or more of these factors comes into play, the complicated nature of data loading is compounded. The problems of data loading become worse, because their complexity is hidden. Only after the process is under way does the magnitude of the undertaking reveal itself. Given all this complexity, it's not hard to see why the loading process is the most difficult, time-consuming, and risky part of building data marts. It is a common cause of project failure and is very often overlooked until it's too late. See Break the Law, Pay the Price, I.

You can avoid problems by first planning your project and recognizing that moving data from legacy systems to the data mart will be complex and time-consuming. Next, organize the project so that loading starts as early in the life cycle as possible. This means moving the loading tasks from later stages to near the beginning of the project. As a result, risk identification will occur early and you will have more time to deal with the problems that arise.

The Law of Design Uncertainty It is impossible to effectively evaluate any data-mart design until you load legacy data and show it to users. Designing databases for data marts is fundamentally different from designing them for operational systems. Operational database design is about capturing business requirements, and the design process is considered a success if the design accurately represents these requirements. But for data marts, the design process can be considered successful only if the database design accurately reflects user and procedural requirements and can accommodate data from one or more legacy systems. You can be sure the design meets both criteria only if you load actual source data and have end users evaluate queries and reports based on that data.

Many data architects familiar with entity-relationship modeling find the Law of Design Uncertainty hard to accept. They have come to believe that models alone can ensure an appropriate database design. For operational databases, this assumption is often valid. However, database designs for decision-support systems such as data marts must strike a balance between user requirements and legacy data.

The law of design uncertainty affects all projects (see Break the Law, Pay the Price, II). The only way to be confident of your data-mart design is to create a feedback cycle early in the project. This feedback cycle uses end users to evaluate prototypes built with live data in an iterative process that continues through much of the project. In this way, the project team can monitor user acceptance and discover design flaws early on.

Taking on the Challenge

One way to help ensure success from the beginning is to use a proven methodology to structure your project. Talus calls its methodology PrimeMover. It is designed to address the issues that cause most data-mart projects to fail. PrimeMover should be viewed only as a starting point, however, not as a prescription that guarantees success or demands unthinking conformance. In all likelihood, the individual needs of your organization will dictate how some of the project steps are organized. Here are some of the benefits you can derive from using such a method:

Three-Stage Process

PrimeMover breaks down to three stages: planning, development, and deployment. Planning results in the delivery of a project-definition document and a detailed project plan. Development delivers a prototype and a revised plan for executing the deployment stage. Deployment delivers a production data mart and documentation. Each stage is created with the two laws of building data marts in mind, and each is designed to help keep your project ahead of problems that may arise.

Stage 1: Planning

The planning stage establishes the scope and business objectives for the data mart. The team reaches agreement on the data-mart subject area and the extent of the star schema. The plan clearly establishes a realistic project schedule, budget, staffing resources, assumptions, and constraints. A risk-mitigation plan anticipates potential risks and outlines specific measures to ensure project success.

Two major deliverables--the project overview document and the project plan--are created in the planning stage. Each of these details a clearly defined scope, an overall schedule of the project activities, the project organization, and clearly defined project-control procedures. The project plan includes a detailed schedule of activities, assignment of activities, budget, and planning steps. Here are the steps:

  1. Determine the scope of the data mart, including the business subject area, number of star schemas, and key business objectives. Identify staff requirements, project assumptions, and time and resource constraints. Identify risks and corresponding mitigation measures.

  2. Identify the level of effort necessary to create the data mart, and develop a budget accordingly. Develop a detailed schedule of activities for each development step. Document all schedule and budget assumptions.

  3. In the final planning, your team must reach an agreement on the project scope, schedule, and budget. Assign staff resources specifically by name and responsibility. Establish lines of communication, develop procedures for project changes, and deliver the project plan.

Stage 2: Development

Using an iterative design-and-build approach, the development stage produces a working prototype of the data mart loaded with actual source data. This is the time to define key business objectives, by interviewing managers, analysts, and database administrators to find out their information needs. The project team then builds a prototype and loads it with legacy data. Users query the prototype system to verify that the data mart meets their requirements. Using this approach, the team discovers early on any problems associated with loading complexity and improves the design to balance user requirements with source-data availability and quality. The team develops the prototype to a level sufficient for users and developers to agree on the final data mart. The development stage generally takes 12 to 16 weeks.

The development stage creates two principal deliverables: the data-mart prototype and a revised plan for executing the deployment stage. Data-mart prototype deliverables include a database loaded with live operational data; extract and load scripts/sessions; installation and configuration of the query environment; and design documents, including star schema and metadata. The project plan for deployment-stage prototypes includes a detailed schedule of activities, assignment of activities, budget, and an agreed-upon final scope.

  1. Determine the number of data-mart iterations and the dependencies between iterations. Define prototype exit criteria to avoid runaway prototyping. Assign objectives, standards, and milestones for each iteration. Determine what data will be analyzed and loaded for each iteration and how the results will be presented to users.

  2. Interview managers and business analysts in the assigned subject area to determine their requirements for decision-support data. Record the business questions that the data mart must answer. Define the overall technical architecture and performance requirements for the data mart.

  3. Interview database administrators and other individuals who have knowledge of the source system data and structures. Analyze the data to identify quality problems and areas of loading complexity. Locate areas where data is missing, to determine whether the source data can support the users' business requirements. Record detailed metadata about the sources, data quality, and business rules.

  4. Based on the team's understanding of the business requirements and the source-data availability, design a star schema for the data-mart subject area, including any agreed-upon aggregates. Develop transformation logic for mapping source data into the target data-mart schema. Design the end-user query environment, including any required canned reports. Verify the design in an iterative process with managers, analysts, and database administrators.

  5. Extract data from the source system. Load actual production data into the target tables to check conformance with the design and requirements. Identify any additional source-data deficiencies, by iteratively loading data and ensuring that the correct results are achieved. Develop the queries and reports designed in the previous step. Record detailed metadata about the source and target data elements, source-to-target table mappings, and data-cleansing steps applied to each field.

  6. Present the prototype to business analysts, managers, and database administrators in interactive sessions, using a query tool to present the actual loaded data. Validate that the data mapping and load are correct, by ensuring that the data loaded meets the users' business objectives. Determine how well the loaded data supports the data-mart design. Identify areas where the design does not match user requirements and expectations (within the subject-area scope), and determine how to modify the design for the next prototype iteration. Review the exit criteria for the iteration, and determine whether another iteration is needed. Reconfirm the project scope to ensure that the project is on track.

  7. Assess the status of previously identified or new risks, and evaluate risk-mitigation measures. Assess whether the development stage has met the project objectives, and determine the next-stage activities. Evaluate the level of effort necessary for finalizing the prototype into an operational data mart. Identify any corrective measures and changes to the project plan that may be necessary.

Stage 3: Deployment

During the deployment stage, the team engineers and deploys the data mart by building on the prototype. The team implements data-loading, change-data-capture, slowly-changing-dimension (records that change infrequently over time, such as a customer's marital status), and quality-assurance procedures. It trains users to successfully navigate the data mart, by using a query tool. By the time the data mart is moved into the production environment and responsibility is handed off to the production support staff, the project team will have thoroughly tested the prototype and tuned it for optimal performance. The team establishes procedures for monitoring usage and collecting query profile data.

The deployment stage creates two principal deliverables: a production data mart and documentation.

Deliverables for the production data mart include a fully populated database optimized for performance, production extract and load scripts/sessions, a production end-user query environment, and production reports. The documentation deliverables include system documentation, user documentation, final design documents (including star schema and metadata), and initial training of users and support staff.

  1. Building on the prototype, determine the activities necessary to finalize the prototype into an operational data mart. Develop a strategy for identifying and handling slowly changing dimensions. Specify detailed steps and time lines. Assign resources, staff roles, and responsibilities.

  2. Finalize the prototype design and the extract, transformation, and load scripts. Implement the strategy for identifying and handling slowly changing dimensions. Establish change-capture procedures, and schedule load sessions. Finalize the query-tool environment and reports. Conduct performance optimization and tuning of the data-mart components. Build indexes to improve query performance.

  3. Using the business questions identified during requirements analysis, develop and run test cases against the loaded data to ensure that it answers business questions correctly. Ensure that the data mart is aggregating data correctly, and establish procedures for assuring the quality of each data load.

  4. Using the development metadata repository, prepare system documentation. Write user documentation, including information about the data-mart subject area and objects available for query. Record end-user metadata in the data-mart repository.

  5. Determine procedures for system administration and management. Establish data-mart help-desk and end-user support procedures. Create backup-and-recovery plans. Develop a training plan, and conduct user training.

  6. Perform acceptance testing, evaluate the results, and take corrective action. Perform quality-assurance spot checks on the loaded data and the extract and load programs/scripts. Formally test all predefined reports. Perform load testing on the database.

  7. Load the data mart, and implement the schedule for regular loads. Install a query-tool server, repository, and all clients. Conduct a final check to ensure that all hardware and software is correctly installed and that the system is fully functional. Hand over system responsibilities to the data-mart support staff. Initiate live operations of the data mart.

  8. Evaluate the development process, and identify process improvements based on lessons learned. Choose candidates for the next data-mart subject area, and identify the issues associated with making new data marts conform to existing ones. Estimate the time frame and high-level goals for initiating the next data-mart development cycle.

Within Your Reach

As the PrimeMover method shows, effective data-mart development and deployment are possible. This system deals with complexities that can cause your project team to fail, including complications in loading legacy data and dealing with design uncertainty. Now that you have a step-by-step basis for implementation, you can design a data mart for your company that promises rapid deployment of decision-support capability with the quick return on investment demanded by the pace of modern business.

As the data-warehousing-practice manager for Talus, Inc., Greg Jones ([email protected]) spends his time designing and building data marts for clients throughout the United States.


Copyright © 1994, 1995, 1996, 1997 & 1998 Oracle Corporation. All Rights Reserved.





This is a copy of an article published @ http://www.oramag.com/