If you currently operate or plan to build and deploy a data mart or data warehouse, you'll be glad to hear there are new features in Oracle8 designed specifically to help you accomplish these tasks. Without question, data marts and data warehouses are important tools that businesses can use to analyze data, detect trends, and make better business decisions for future success. This article provides a technical introduction to the features in Oracle8 that can help you build and maintain a data warehouse, as well as applications that fully utilize all of the data warehouse's capabilities.
A data warehouse can present many unique challenges, including situations that are not inherent in more-familiar transactional or operational databases. Whereas online transaction processing (OLTP) systems store information about business functions and allow users to execute transactions that insert, update, and delete information, a data warehouse is a database designed specifically for the detailed analysis of data. The creation, ongoing maintenance, and performance of a data warehouse are all major areas of functionality for which a database-management system must have special features if the data warehouse is to serve an organization's business needs adequately.
Most data warehouses are built by loading existing data from a transactional database or databases. To keep a data warehouse current, data loads may be frequent, depending on the type of information. And when a data load is large, it can take a lot of time to complete, especially if the database server managing the data warehouse is not up to the task. (For more information, see Building Effective Data Marts.)
Oracle8 offers several techniques for loading data into a data warehouse. The SQL*Loader data-loading utility, included free with Oracle Data Server, is one of the best choices, because it has many features that enable efficient data loads. SQL*Loader's direct-path load feature lets you load data into a table efficiently by building data blocks directly instead of processing arrays of INSERT statements through the SQL layer. The following SQL*Loader command loads data according to the LOAD1.CTL control file, using the direct-path load option:
SQLLDR USERID=sales/sales CONTROL=c:\data\load1.ctl DIRECT=TRUE
SQL*Loader supports three loading options: INSERT puts new rows into an empty table, APPEND adds rows to the end of an existing table, and REPLACE substitutes new rows for the table's existing rows. SQL*Loader lets you load data in parallel into a single table, using multiple concurrent user sessions, as long as you are only adding rows. The following statements, started by different sessions or as separate background jobs, will append data into the same table:
SQLLDR USERID=sales/sales CONTROL=c:\data\load1.ctl DIRECT=TRUE PARALLEL=TRUE
SQLLDR USERID=sales/sales CONTROL=c:\data\load2.ctl DIRECT=TRUE PARALLEL=TRUE
SQLLDR USERID=sales/sales CONTROL=c:\data\load3.ctl DIRECT=TRUE PARALLEL=TRUE
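The control files named in these commands describe the data being loaded. As a sketch, a minimal control file for such a load might look like the following (the data-file name and column list are hypothetical; note that parallel loads require the APPEND option):

```sql
-- Hypothetical control file (load1.ctl): appends comma-separated
-- order records to the ORDERS table.
LOAD DATA
INFILE 'c:\data\orders1.dat'
APPEND
INTO TABLE orders
FIELDS TERMINATED BY ','
( id,
  cust_id,
  order_date DATE 'DD-MM-YYYY' )
```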
Another common data-loading technique for a data-warehouse table is to use simple SQL statements. For example, if you have a distributed database system and your warehouse has a database link to the operational database, you can load data into the data warehouse from the operational database by using simple INSERT INTO ... SELECT ... or CREATE TABLE ... AS SELECT ... statements. To speed up these types of loads, Oracle8 can execute the statements in parallel. In the case of an INSERT INTO ... SELECT ... statement, Oracle8 can use parallel processing to execute both the subquery and the insert portions of the statement. (For this to happen transparently, however, you must configure your database instance with parallel-server processes, and the database or the tables referenced by the statement must have parallel-processing defaults.)
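As a sketch of this technique, the following statement copies an operational table into the warehouse in parallel and without logging; the database link name (oltp_db) and table names are assumptions for illustration, not from the article:

```sql
-- Build a warehouse copy of the operational ORDERS table in
-- parallel, without transaction logging, over a database link.
CREATE TABLE orders_copy
  PARALLEL (DEGREE 4)
  NOLOGGING
  AS SELECT * FROM orders@oltp_db;
```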
You can dramatically increase the speed of INSERT INTO ... SELECT ... statements by using Oracle8's new APPEND optimizer hint (an optimizer directive), which instructs the database to append new rows to unused blocks at the end of the table rather than incur the overhead of searching for available space in the table's used blocks. The following example demonstrates how to use the APPEND hint along with PARALLEL hints to override the default parallel-processing settings for the tables referenced by the different parts of an INSERT INTO ... SELECT ... statement:
INSERT --+APPEND PARALLEL(customers,2)
INTO customers
SELECT --+PARALLEL(temp_cust,2)
*
FROM temp_cust;
To reduce the time needed to complete a large data load into a table, you can disable the table's integrity constraints (and triggers) and drop all the table's indexes before starting the load operation. Then, after you have loaded the data, you can re-enable the table's constraints (and triggers) and re-create its indexes. This sounds straightforward, but if you are working with extremely large tables, the re-creation of indexes can be extremely time-consuming. Fortunately, Oracle8 has a few features that reduce the completion time for large index builds.
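As a sketch of this sequence, assuming a foreign-key constraint named ORDERS_CUST_FK and an index named ORD_CUST_IDX on the ORDERS table (both names are hypothetical):

```sql
-- Before the load: disable constraints and triggers, drop indexes.
ALTER TABLE orders DISABLE CONSTRAINT orders_cust_fk;
ALTER TABLE orders DISABLE ALL TRIGGERS;
DROP INDEX ord_cust_idx;

-- ... perform the data load ...

-- After the load: re-enable constraints and triggers, rebuild indexes.
ALTER TABLE orders ENABLE CONSTRAINT orders_cust_fk;
ALTER TABLE orders ENABLE ALL TRIGGERS;
CREATE INDEX ord_cust_idx ON orders (cust_id)
  PARALLEL (DEGREE 3) NOLOGGING;
```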
Building indexes for large tables can take a long time as the database server fetches and sorts table data and then builds the index. You can take advantage of parallel processing to build and rebuild large indexes more quickly.
You can disable the logging of index creation in the database's transaction log to further reduce the processing overhead of creating indexes for large tables. However, the trade-off is that indexes created without logging are not recoverable from subsequent media failures; if a media failure occurs, you must re-create the index. The following statement enables the PRIMARY KEY of the CUSTOMERS table without logging the implicit unique index creation:
ALTER TABLE customers ENABLE PRIMARY KEY USING INDEX NOLOGGING;
The following statement builds a bitmap index for the GENDER column of the CUSTOMERS table, using parallel processing. The index is also built without logging, to further increase the performance of the index build:
CREATE BITMAP INDEX cust_gender ON customers (gender) PARALLEL (DEGREE 3) NOLOGGING;
Rebuilding fragmented or unbalanced indexes is another common operation in a data warehouse. Oracle8 also lets you rebuild indexes by using parallel processing and without logging, as shown in this example:
ALTER INDEX sys_c001860 REBUILD PARALLEL (DEGREE 3) NOLOGGING;
When possible, Oracle8 automatically uses the existing index to rebuild the index rather than using the table itself. This transparent enhancement can increase the speed of an index rebuild, because all the data necessary to build the index is present in a much smaller number of data blocks.
Oracle's SQL statement optimizer can do its job well only when the data dictionary contains up-to-date statistics that accurately reflect the data in tables and indexes. If you constantly load data into a data warehouse, it is important to frequently generate and refresh the statistics for the tables and indexes by using the SQL command ANALYZE. To minimize the amount of time it takes to analyze large tables and corresponding indexes, Oracle8 uses parallel processing to execute an ANALYZE statement. No special command syntax is required to take advantage of this optimization; you need only ensure that the database, table, or index being analyzed is configured with a default degree of parallelism.
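For example, the following statements gather statistics for a table and one of its indexes (the names are illustrative):

```sql
-- Compute exact statistics for a table and one of its indexes.
ANALYZE TABLE orders COMPUTE STATISTICS;
ANALYZE INDEX ord_cust_idx COMPUTE STATISTICS;

-- For very large tables, estimating from a sample is faster.
ANALYZE TABLE orders ESTIMATE STATISTICS SAMPLE 10 PERCENT;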
Data warehouses typically contain extremely large tables and corresponding indexes. To ease the management of large tables and indexes, Oracle8 introduces a new feature called data partitioning. The purpose of data partitioning is to divide a table's storage into smaller, more manageable pieces called partitions that you can create in different tablespaces in the database. If you are not familiar with data partitioning and its benefits, see "Divide and Conquer," in the May/June 1997 issue of Oracle Magazine.
A common use for data partitioning in a data warehouse is for managing historical tables that contain a revolving set of records. In this scenario, as you load new records into a table, you also purge the oldest records. By doing so, you can control the overall size of the table and keep only the useful records. A typical data warehouse for a business might contain a large ORDERS table with historical information about the company's sales over the last 12 months. As you add the most recently completed month's orders to the table, you also delete the oldest month's orders.
Although it might seem straightforward, managing large historical tables such as an ORDERS table can present problems. For example, the availability of the entire table can be adversely affected as you load or purge older records from the table. Data partitioning is an elegant solution to help manage this situation. The strategy is to partition the table by a date range, creating a new partition at each interval and loading new records into the partition while dropping the partition that contains the oldest records.
Oracle8 allows you to range-partition a table's rows by the values in a partition key made up of 1 to 16 columns. The following example shows how to partition the ORDERS table by the ORDER_DATE column, with each partition holding one month's orders:
CREATE TABLE orders (
id INTEGER PRIMARY KEY,
cust_id INTEGER REFERENCES customers,
order_date DATE,
ship_date DATE,
paid_date DATE )
PARTITION BY RANGE ( order_date )
( PARTITION feb98
VALUES LESS THAN (TO_DATE('01-03-1998','DD-MM-YYYY'))
TABLESPACE tbs1,
PARTITION mar98
VALUES LESS THAN (TO_DATE('01-04-1998','DD-MM-YYYY'))
TABLESPACE tbs2,
PARTITION apr98
VALUES LESS THAN (TO_DATE('01-05-1998','DD-MM-YYYY'))
TABLESPACE tbs3 );
When it's time to add May's orders, add a new partition to the table with the following ALTER TABLE statement:
ALTER TABLE orders
ADD PARTITION may98
VALUES LESS THAN (TO_DATE('01-06-1998','DD-MM-YYYY'))
TABLESPACE tbs4;
After you load data into a new partition, update the table's optimizer statistics in the data dictionary. Rather than analyzing the entire table, you can use the PARTITION clause of the ANALYZE command to target just the new partition:
ANALYZE TABLE orders PARTITION (may98) COMPUTE STATISTICS;
You can continue adding partitions each month. You don't have to worry about running into a limit of partitions per table, because Oracle8 lets you create as many as 63,999 partitions per table or index. When it's time to drop the oldest records in the table, you can simply drop the partition that contains them. The following ALTER TABLE statement drops the oldest partition from the ORDERS table:
ALTER TABLE orders
DROP PARTITION feb98;
To improve the performance of data-analysis queries, data warehouses typically use a schema design known as a star schema. A star schema stores data by using one or more large "fact" tables surrounded by several smaller lookup or "dimension" tables. A star schema for a typical order-entry application might have CUSTOMERS, PARTS, EMPLOYEES, and SALES_PERSONS as dimension tables and a fact table such as the following SALES table:
-- FACT TABLE
CREATE TABLE sales (
  customer_id INTEGER,
  part_id INTEGER,
  order_date DATE,
  sales_person_id INTEGER,
  quantity INTEGER,
  sale_total NUMBER(10,2) );
A data-analysis application might execute the following query against a star schema:
SELECT SUM(sale_total)
FROM customers c,
parts p,
employees e,
sales s
WHERE p.description = 'BLUE WIDGET'
AND c.company_name = 'Animated Learning'
AND c.last_name = 'Bobrowski'
AND c.id = s.customer_id
AND p.id = s.part_id
AND e.id = s.sales_person_id;
With this type of query, Oracle8 uses a special optimization technique called the star query, in which the database joins the dimension tables with each other and then joins the result to the fact table. A star query executes best when there are concatenated indexes on the join columns of the fact tables. The idea is to defer scans of the large fact tables until the later steps in the query-execution plan, to minimize the overall cost of the statement.
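For instance, a concatenated index covering the SALES fact table's foreign-key columns might look like the following (the index name is an assumption for illustration):

```sql
-- Concatenated index on the fact table's join columns
-- to support star-query optimization.
CREATE INDEX sales_star_idx
  ON sales (customer_id, part_id, sales_person_id)
  PARALLEL (DEGREE 3) NOLOGGING;
```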
Star-query optimization has been available since Oracle7 Release 7.2, but Oracle8 includes a new technique called star transformation, in which the database transforms a join query into a query that uses several subqueries for efficient lookups. Star transformations are more efficient than typical star queries, but you must meet a couple of requirements to use them. First, you must enable star transformations for either the database or a particular session. To enable star transformations for all database sessions, set the server parameter STAR_TRANSFORMATION_ENABLED=TRUE. To enable star transformations for the current database session, you can have the application issue an ALTER SESSION statement:
ALTER SESSION SET STAR_TRANSFORMATION_ENABLED=TRUE;
Second, you must create bitmap indexes for the join keys in the fact tables. A bitmap index is a B-tree index that stores bitmaps. (A bitmap is a series of bits that are either on [1] or off [0].) A bitmap index has bitmaps for each distinct value in the indexed column (or columns). Consider a CUSTOMERS table that has a bitmap index on the GENDER column:
CREATE BITMAP INDEX cust_gender ON customers (gender) PARALLEL (DEGREE 3) NOLOGGING;
The bitmap index includes a series of bitmaps for the distinct values M (male) and F (female). For a very small CUSTOMERS table, a set of bitmaps in the index might look like that shown in Table 1. In this example, the rows with ROWIDs 11, 16-19, 21-23, and 25-27 correspond to male customers.
Because of the way that bitmap indexes are structured, they are typically used only in databases where data is updated infrequently, such as data warehouses. Furthermore, bitmap indexes are appropriate only for columns that have a low degree of cardinality relative to the number of rows in the table. For example, it would be equally appropriate to create a bitmap index for a column that contains 5 distinct values in a table of 1,000 rows and for a column that contains 100,000 distinct values in a table of 1 billion rows.
TABLE 1: A bitmap index for a small customers table.
| Key | Start ROWID | End ROWID | Bitmap     |
| --- | ----------- | --------- | ---------- |
| M   | 10          | 19        | 0100001111 |
| M   | 20          | 29        | 0111011100 |
| F   | 10          | 19        | 1011110000 |
| F   | 20          | 29        | 1000100011 |
Oracle8 has many features that can help you create and manage a successful data warehouse. For example, the SQL*Loader utility and parallel-processing features let you load large volumes of data into a database and then quickly index and analyze the data, using parallel processing. Plus, the comprehensive data-partitioning features can help you easily rotate data into and out of large history tables.
Steve Bobrowski is the CEO of OraWorld (www.oraworld.com). Bobrowski is also the author of the award-winning Mastering Oracle7 & Client/Server Computing (Sybex, 1996) and the new book Oracle8 Architecture (Oracle Press, 1997).
Copyright © 1994, 1995, 1996, 1997 & 1998 Oracle Corporation. All Rights Reserved.