Oracle8 VLDB Enhancement

By Guy Harrison

This article explores Oracle8's "big set" optimizations, including table partitioning together with improved parallel and DML capabilities, which support the creation and maintenance of monster-sized databases.

Although the press has almost exclusively concentrated on the object-relational features of Oracle8, such as as nested tables, array data types, large objects, and user-defined objects, there is another side to this new release. Oracle8 also has some significant "big set" optimizations that support the creation and maintenance of much larger databases than are practical with Oracle7. These features include table partitioning together with improved parallel and DML capabilities. This month, I look at some of the "big set" features, particularly those that impact performance.

Partitioned Tables and Indexes

As databases increase in size, maintaining and manipulating the data in very large tables can become problematic. For instance, it may be impractical to rebuild or export a very large table. Bulk updates and purges may consume excessive resources or over-run maintenance windows. Partitioning provides a "divide-and-conquer" solution to some of these problems. Oracle8 provides a partitioning scheme based on column ranges. For example, you might partition a SALES table by year, so that the table stores sales for each year in distinct partitions. The following SQL shows how to create such a partitioned table:

CREATE TABLE sales_partition
  (column definitions for SALES) 
  PARTITION BY RANGE (sale_date)
  (PARTITION s1 VALUES LESS THAN 
   (`01-JAN-93'),
   PARTITION s2 VALUES LESS THAN 
    (`01-JAN-94'),
  - Other years
   PARTITION s5 VALUES GREATER THAN 
    (`01-JAN-98'));

Each partition is a separate segment, and you can store partitions in separate tablespaces and assign the partitions distinct storage definitions. Partitions are implemented transparently and an application doesn't need to know the names of inividual paritions. However, your SQL statements can specify a partition name. For example, the following statement updates an individual partition (S1) only:

UPDATE sales_partition PARTITION(s1)
  SET purge_ind='Y'
  WHERE sale_value < 10000

Partition Indexes

You can also partition indexes. You will typically create a partitioned index on a partitioned table, although it is possible to create a partitioned index on an unpartitioned table. If an index on a partitioned table is itself unpartitioned or is partitioned on different column range conditions than those of the source table, then the index is known as a global index. If an index is paritioned in exactly the same fashion as the parent table, it is known as a local index. If the leading columns of the index are also the columns upon which the index is partitioned, then the index is known as a local prefixed index. Local indexes have some significant advantages over global indexes. If a partitioned table with a global index has a partition split, merge, or move, then the corresponding index partitions will automatically be subjected to the same maintenance operations. Global indexes also preclude some new Oracle8 operations - such as direct load inserts (which I discuss shortly).

Partition Maintenance

Oracle8 allows you to manage partitions in a number of ways:

You can export or import individual partitions. This also lets you export a partitioned table in parallel by running a separate export job for each partition.
You can convert a partition into a standalone table, or a table into a single partitioned table.
You can truncate individual partitions.
You can merge or split partitions. For instance the following statement splits a specific partition into two new partitions:

ALTER TABLE sales_partition
  SPLIT PARTITION s5
    AT (`30-JUN-97')
  INTO (  partition s5a,
    partition s5b);

The Advantages of Partitions

Oracle8's partitions enable you to perform administrative work on subsets of very large tables. In Oracle7, these very large tables pose special challenges. For example, in Oracle7 you may not be able export very large tables within an acceptable time frame. With Oracle8's partitioned tables, you can reduce maintenance time by restricting maintenance to an individual partition or by performing operations on multiple partitions in parallel. Partitioned tables can also improve the performance of table scans. If you issue a query based on the same columns by which a table is partitioned, the Oracle8 optimizer can restrict the scans to the relevant partitions. You can also manually restrict operations to a particular partition by specifying the partition name after the table name. You can subject partitioned tables to parallel update and delete operations and to more efficient parallel inserts. (I discuss these operations later in this article.) Partitioned tables also offer an excellent means of purging historical data. For example, in the SALES partition, you could quickly purge all data for a particular financial year simply by dropping the relevant partition.

Restrictions on Partitioning

Although Oracle8 partitioning is powerful and effective, there are some limitations:

You can't create global bitmap indexes and you can't partition a bitmap index or a clustered table.
Partitioned tables can't contain LONG columns or user-defined object types.
Oracle8 partitions are not automatically balanced. That is, there is no guarantee that each partition will hold equivalent data volumes. If a single partition contains a disproportionate number of of rows then the advantages of partitioning will be diminished.
Oracle8 only supports a column-range partitioning algorithm. Some other VLDB implementations support round-robin or hash partitioning, which automatically balance the height of each partition. (For more on these algorithms, see the article "VLDB Strategies, Part 2," DBMS magazine, March 1997, page 63. For an online copy, see www.dbmsmag.com.)

Direct Load and Unlogged Inserts

Oracle8 has two enhancements that improve the performance of inserts based on queries (for example, INSERT INTO table1 SELECT columns FROM table2). These are:

Direct load insert in which rows are created and written directly to data files - bypassing the Oracle buffer cache.
NOLOGGING mode, similar to the UNRECOVERABLE in Oracle7, but extended to include INSERT operations. Direct Load Insert. Users of SQL*Loader are familiar with the "direct path" mode, in which data blocks are constructed in memory and inserted directly into the database files - bypassing the Oracle buffer cache. Direct load insert uses a very similar model. Bypassing the buffer cache reduces overhead and contention and can lead to substantial improvements in insert performance. To invoke the direct load insert, you can use the APPEND hint. Because of the way direct load insert works, it may degrade the performance of full table scans. Oracle maintains a pointer to the "highest" block in a table that has ever contained data. This high water mark is advanced as rows are inserted into the table. As rows are deleted, the high water mark does not retreat, but may cause blocks below the high water mark to become re-eligible for inserts. Because direct load insert creates only new blocks, free space in existing blocks cannot be used. If the table is partitioned, the blocks can be inserted into existing extents above the high water mark.If the table is unpartitioned, then rows must be inserted into newly allocated extents. In either case, the high water mark is advanced. Because full table scans always read every block up to the high water mark, direct load inserts can increase full table scan overhead more than normal inserts. Following are some other limitations:
After a direct load insert, you cannot issue any subsequent statements (other than a COMMIT or a ROLLBACK).
You cannot perform direct load inserts on partitioned tables with global indexes.
A direct load insert applies a full table DML lock to the table in question. No other concurrent insert, delete, or update operations are permitted.

NOLOGGING Mode.

In Oracle7 you can specify that a CREATE TABLE … AS SELECT statement or a CREATE INDEX statement is UNRECOVERABLE. This instructs Oracle not to generate redo log information for the operation and can result in very significant reductions in build time - especially when you use it in conjunction with the PARALLEL clause. However, tables and indexes created this way cannot be recovered from information contained in redo logs. The NOLOGGING keyword in Oracle8 replaces the UNRECOVERABLE keyword in Oracle7, but UNRECOVERABLE is retained for compatibility reasons. You can use the NOLOGGING mode for direct load inserts, as well as in CREATE TABLE and CREATE INDEX statements. Because redo log information is not generated, the insert will be faster. But if there should be a disk failure before the next tablespace backup, you will lose all the information inserted. You can specify the NOLOGGING keyword in CREATE TABLE, CREATE INDEX, ALTER TABLE, or ALTER INDEX statements, or you can specify it as the default for a tablespace. Once you associate NOLOGGING with a table, any direct mode inserts will automatically be performed without logging. Watch out for this if you currently use the Oracle7 UNRECOVERABLE option to build tables: Under Oracle8 you'll need to issue an ALTER TABLE statement to turn logging mode back on if you need to log subsequent inserts. Because the data inserted with the NOLOGGING mode can't be restored from redo logs, you're not likely to use this facility for critical application data. However, you might use the NOLOGGING feature in the following circumstances:

loading a temporary table for intermediate processing in a long-running batch job
quickly building a dispensable summary or aggregate table New Parallel Options If you have a computer with multiple CPUs or have your database spread over multiple disk devices, you'll probably get the most significant performance improvements from Oracle8's extensions to parallel processing.In Oracle7, you could perform only the following operations in parallel:
queries based on full table scans
queries based on full index scans (only in 7.3.3 and above)
index creation and rebuild
creating a table from a full table scan

Oracle 8 offers the following additional parallel operations:

parallel insert based on a query
parallel update or delete on partitioned tables
parallel partition indexed query

Parallel Insert.

Unlike other forms of parallel DML, it's possible to perform an INSERT that uses a subquery in parallel, even if the table is not partitioned. A parallel insert into nonpartitioned tables works in APPEND mode by default. That is, blocks are created and formatted in session memory and inserted directly into the datafiles, bypassing the buffer cache. Even when you tell Oracle not to use the APPEND mode (by using the NOAPPEND hint), each parallel query server process must insert into a separate extent or partition. For parallel insert into nonpartitioned tablesthis means that for each degree of parallelism at least one new extent must be created. As a result, a parallel insert might result in a greater increase in storage allocation than a serial insert.

For a parallel insert into a partitioned table, each parallel query server process is assigned a separate partition. They may insert into an existing extent if blocks in the extent are on the free list. If there is no space available in existing extents, then new extents must be allocated. Parallel Update and Delete. Updates and deletes on partitioned tables may be processed in parallel. Multiple parallel query processes are allocated to the operation, although each partition is allocated no more than a single process. For this reason, the degree of parallelism (number of parallel threads of execution) may be no more than the number of partitions in the table. Using Parallel DML. To activate parallel DML, you should first issue the ALTER SESSION ENABLE PARALLEL DML command. As with parallel queries, parallel DML is initiated if there is a PARALLEL clause on the table or a PARALLEL hint within the SQL statement. For example, the following statement inserts all transactions older than one week into a TRANSACTION_ARCHIVE table. Note that the PARALLEL hint is used both in the INSERT statement and in the underlying SELECT. I also used the APPEND hint to invoke the direct insert mode.

insert /*+PARALLEL (t,5) APPEND */ 
  INTO transaction_archive ta
SELECT /*+PARALLEL (t,5) */ *
  FROM transactions t
  WHERE transaction_date < SYSTDATE -7;

When using parallel DML, you should try to avoid the following pitfalls:

For a parallel DML based on a query, ensure that the query is also parallelized. For instance, it may not be effective to use a serial (perhaps index-based) SELECT to feed rows into a parallel INSERT.
When using parallel DML on partitioned tables, ensure that the number of parallel query processes can be evenly divided into the number of partitions. Otherwise, some of the parallel query processes will have more partitions to process than others and you won't get the full benefit from your degree of parallelism.

Any Oracle user who has to deal with large tables can probably imagine a number of effective uses for parallel DML. Following are some of the more obvious:

Parallel inserts into temporary tables- you might use parallel insert to populate a temporary report table.
Bulk updates of tables for maintenance-giving every taxpayer a 10 percent reduction in their tax rate.
Purging and archiving-using parallel insert to copy rows older than one year to an archive table and using parallel delete to remove the same rows from the source table.

Restrictions on Parallel DML

There are a number of restrictions on parallel DML:

As with direct load inserts, a parallel DML statement cannot appear in a transaction with other DML statements. The only statement that can follow a parallel DML statement is COMMIT or ROLLBACK.
You can't perform parallel DML on a partitioned table with a unique global index. For parallel insert, even non-unique global indexes are not permitted.
Tables that have triggers or tables with self-referencing or delete cascade constraints cannot be subject to parallel DML. The restriction on triggers is particularly significant because it means that any table that is replicated (for example, it is the source table of a snapshot) cannot be subjected to parallel DML.
Parallel DML statements cannot be used in distributed transactions.
Parallel DML requires some additional resources. Each parallel slave requires it's own transaction and rollback segment entry. There are also more freelist and enqueue resources required. A two-phase commit is necessary to complete the parallel DML.
If you rollback a parallel DML statement, the rollback is executed in parallel and will complete in about the same time as the parallel DML itself. However, if your session or the database crashes, it will be the responsibility of SMON or PMON to perform the rollback. Because these processes are single-threaded, the time needed to rollback the parallel DML may be many times longer than the parallel DML's execution time.

Parallel Queries Using Partitioned Indexes

Prior to Oracle8, only queries that were based on a full table scan could be processed in parallel. In Oracle8, queries based on index range scans can be parallelized if the index is partitioned. The algorithm for this is similar to that used for parallel DML for a partitioned table: Each slave process is allocated one of the index partitions and performs a scan on that partition only. Therefore, the degree of parallelism can be no greater than the number of partitions in the index. A new hint, PARALLEL_INDEX, lets you request a parallel partitioned index scan.

Index-Organized Tables

In Oracle7 you may occasionally create an index that includes all or most of the columns in a table. The table itself may then become superfluous because all queries are satisfied from the index. However, the table still consumes storage and incurs insert/update/delete overhead. To address this situation, Oracle8 enables you to create an index-organized table in which the table itself is stored in an index format.

You can create an index-organized table using the ORGANIZATION keyword of the CREATE TABLE statement. For example:

CREATE TABLE iot_table
  (pk_col1   number NOT NULL,
  pk_col2    number NOT NULL,
  data_col1  VARCHAR2(30) NOT NULL,
  data_col2  LONG ,
  constraint iot_table_pk PRIMARY KEY 
   (pk_col1,pk_col2))
  ORGANIZATION index
  PCTTHRESHOLD 50
  OVERFLOW TABLESPACE long_tbs;

Index-organized tables are organized as a B-tree index constructed against their primary key. The primary key and additional columns are stored in the leaf blocks of the B-tree. However, only the columns that contribute to less than PCTTHRESHOLD of the total row length are stored in the B-tree leaf blocks. The remainder of the row is stored within chained blocks in the OVERFLOW TABLESPACE.

The ability to relocate longer columns away from the B-tree structure lets Oracle keep the B-tree relatively small and efficient, while keeping small, frequently accessed columns in the B-tree. When creating an index-organized table, define frequently accessed, small columns early in the column list and large, infrequently accessed columns later in the list. Define PCTTHRESHOLD so that only the frequently accessed columns are retained in the B-tree. The primary key should include all columns that might appear in the WHERE clause. It is not possible to create a secondary (that is, nonprimary key) index on an index-organized table, because the index-organized table rows are not associated with a ROWID and hence there is nothing to which the secondary index can point. Index-organized tables are suitable for the following circumstances:

all queries on the table can be satisfied using the primary key; secondary indexes are not required
a table contains a few frequently accessed short columns and infrequently accessed long columns
either the total row length or the length of frequently accessed columns is relatively small

Other Enhancements

I've looked at just some of the new performance-enhancing features in Oracle8. I've detailed table and index partitioning, direct load and unlogged insert, parallel DML, and partitioned index scans. These facilities work very well together: Many of the new parallel operations work only on partitioned tables and the direct mode and nologging options are often combined with parallel insert to rapidly populate temporary or derived tables.

In subsequent months I'll continue to examine Oracle8 features and facilities. In particular, look forward to an article on Oracle8 object-oriented facilities and how they can improve - or degrade - the performance of your Oracle system.

Guy Harrison is an independent Oracle consultant specializing in Oracle development and performance issues. He is the author of Oracle SQL High-Performance Tuning (Prentice Hall, 1997). You can contact Guy through the Internet at [email protected] or at his home page http://werple.net.au/~gharriso.

This is a copy of an article published @ http://www.oreview.com/