
Welcome to OReview's new performance-tuning column. In this column I hope to provide regular and reliable performance-related strategies, news, and information that will supplement, although not replace, the more in-depth performance-related feature articles that appear periodically in OReview.
Most performance experts agree that tuning application SQL is usually the most effective way of improving performance, and that SQL tuning is an essential prerequisite to effective server or operating-system tuning. Unfortunately, SQL tuning often takes a back seat to server tuning because SQL tuning is less of an exact science and is often outside the DBA's control.
I have 10 hints that can help you tune your SQL and improve your application performance. In this inaugural column, I give you five hints; look for five more hints in my August column.
It's not uncommon for SQL that works well in a development environment to exhibit poor performance once it is in a production system. One of the primary causes of such unpleasant surprises is an inadequate development or volume-testing environment. In particular, environments without realistic or representative data volumes are bound to lead to unrealistic SQL performance.
The ideal tuning or development environment is one in which:
- Data volumes are realistic or at least representative of production volumes.
- Tuning and diagnostic tools are installed and available.
- Application design documentation is available.
Become familiar with Oracle's EXPLAIN PLAN, SQL_TRACE, and tkprof facilities, and ensure that the relevant database options are effectively set. (See my article "Getting the Most from the SQL_TRACE Facility," OTJ, Winter 1996.)
Oracle provides powerful facilities for measuring the execution of SQL
statements, such as the EXPLAIN PLAN command, which can generate
the execution plan for a SQL statement. To use EXPLAIN PLAN:
- Create the plan table PLAN_TABLE by running the script $ORACLE_HOME/rdbms/admin/utlxplan.sql (the location might be slightly different on non-Unix hosts).
- Issue an EXPLAIN PLAN FOR sql_statement statement.
- Retrieve the execution plan from PLAN_TABLE with a hierarchical query such as:

  SELECT rtrim(lpad(' ',2*level)||
         rtrim(operation)||' '||
         rtrim(options)||' '||
         object_name) query_plan
    FROM plan_table
  CONNECT BY prior id=parent_id
   START WITH id=0;
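Putting the steps together, a minimal session might look like the following sketch. The customers table and the query itself are hypothetical; any statement can be substituted.

```sql
-- Clear out rows left over from earlier EXPLAIN PLAN runs
-- (assumes PLAN_TABLE has been created with utlxplan.sql).
DELETE FROM plan_table;

-- Generate the execution plan for a hypothetical query.
EXPLAIN PLAN FOR
  SELECT * FROM customers WHERE surname = 'SMITH';

-- Display the plan, indented by level in the plan tree.
SELECT rtrim(lpad(' ',2*level)||
       rtrim(operation)||' '||
       rtrim(options)||' '||
       object_name) query_plan
  FROM plan_table
CONNECT BY prior id = parent_id
 START WITH id = 0;
```

If an index on SURNAME exists, you would expect to see an INDEX RANGE SCAN step beneath a TABLE ACCESS BY ROWID operation; a TABLE ACCESS FULL step indicates a full table scan.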
A more powerful relative of the EXPLAIN PLAN command is
the SQL_TRACE facility. To use SQL_TRACE, do the following:
- Issue ALTER SESSION SET SQL_TRACE=TRUE from within an Oracle session, then run the SQL you want to measure.
- Locate the trace file that Oracle generates in the directory specified by the server parameter USER_DUMP_DEST.
- Format the trace file with the tkprof utility; for example:

  tkprof input_trace_file output_report_file sort='(prsela,exeela,fchela)' explain=username/password
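A minimal tracing session might look like the following sketch (the query and table name are illustrative):

```sql
-- Turn tracing on for the current session.
ALTER SESSION SET SQL_TRACE = TRUE;

-- Run the statement(s) you want to measure.
SELECT COUNT(*) FROM customers WHERE surname = 'SMITH';

-- Turn tracing off; the trace file appears in the
-- directory named by the USER_DUMP_DEST server parameter.
ALTER SESSION SET SQL_TRACE = FALSE;
```

You would then format the resulting trace file with tkprof as described above.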
Interpreting tkprof output and execution plans is a complex topic that I can't cover adequately here. For more information, see my article "Getting the Most from the SQL_TRACE Facility," OTJ, Winter 1996.
Indexes exist to improve the performance of SQL statements. To establish the "best" indexes and achieve optimal performance, you should do the following:
- Use concatenated indexes. If you frequently query on two columns together, such as SURNAME
and FIRSTNAME, don't create separate indexes for each column.
Instead, create a concatenated index on both SURNAME and FIRSTNAME.
You can then use the "leading" portion of the concatenated index
on its own; thus, if you sometimes query on the SURNAME column
without supplying the FIRSTNAME, SURNAME should
come first in the index. Note that you can't use the trailing portions
of a concatenated index. For example, the concatenated index on SURNAME
and FIRSTNAME can't help you find all people with a first
name of "Fred."
- Improve SQL execution with "overindexing." Overindexing involves
adding to the index columns that appear in the SELECT clause
but not in the WHERE clause. Imagine that you are
searching on SURNAME and FIRSTNAME in order to
find EMPLOYEE_ID. A concatenated index on SURNAME
and FIRSTNAME will let you quickly locate the row containing
the appropriate EMPLOYEE_ID, but you will need to access both
the index and the table. If there is an index on SURNAME,
FIRSTNAME, and EMPLOYEE_ID, then the query can
be satisfied using the index alone. This technique is particularly useful
when optimizing joins, because intermediate tables in a join are sometimes
queried merely to obtain the join key for the next table.
- Index clusters let you locate rows from one or more tables in cluster key order. Clustering tables can result in a substantial improvement in join performance. However, the performance of table scans of individual tables in the cluster may be severely degraded. Index clusters are usually recommended only for tables that are always accessed together. Even then, you should consider alternatives such as denormalization.
- In hash clusters, the key values are translated mathematically to a hash value. Rows are stored in the hash cluster based on this hash value. Locating a row when the hash key is known may require only a single I/O rather than the two or three I/Os required by an index lookup. However, you cannot perform range scans of the hash key. Furthermore, if the cluster is poorly configured or the size of the cluster changes, overflows on the hash keys may occur or the cluster may become sparsely populated. In the first case, the performance of hash-key retrieval can degrade; in the second case, table scans will be less efficient.
- Bitmapped indexes were introduced in Oracle 7.3. These indexes suit queries on multiple columns that each have only a few distinct values. They are more compact than a corresponding concatenated index and, unlike the concatenated index, you may query columns in any combination. However, bitmapped indexes are not suitable for tables with high modification rates, because locking of bitmapped indexes occurs at the block level rather than the row level. Bitmapped indexes are also not suitable for columns with large numbers of distinct values.
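The concatenated-index and overindexing techniques described above can be sketched as follows. The employees table and the index names are hypothetical:

```sql
-- Concatenated index: supports searches on SURNAME alone, or on
-- SURNAME and FIRSTNAME together, but not on FIRSTNAME alone.
CREATE INDEX emp_name_idx
    ON employees (surname, firstname);

-- "Overindexed" version: appending EMPLOYEE_ID lets Oracle satisfy
-- the query below from the index alone, without visiting the table.
CREATE INDEX emp_name_id_idx
    ON employees (surname, firstname, employee_id);

SELECT employee_id
  FROM employees
 WHERE surname   = 'SMITH'
   AND firstname = 'FRED';
```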
Parsing a SQL statement includes the process of validating the SQL and determining the optimal execution plan. For SQL that has low I/O requirements but is frequently executed (for example, SQL generated by OLTP-type applications), reducing the overhead of SQL parsing is extremely important.
When an Oracle session needs to parse a SQL statement, it first looks for an identical shared SQL statement in the Oracle shared pool. If it cannot find a matching statement, Oracle will determine the optimal execution plan for the statement and store the parsed representation in the shared pool.
The process of parsing SQL is CPU-intensive. When I/O is well tuned, the overhead of parsing a SQL statement can be a significant portion of the total overhead of executing that statement. You can reduce the parsing overhead by:
- Using bind variables in place of literals, so that repeated executions of otherwise identical statements can share a single parsed representation in the shared pool.
- Retaining SQL cursors in your application so that frequently executed statements need not be reparsed. (The techniques for doing this vary by development environment; see the accompanying table.)
- Implementing an Oracle cursor cache with the SESSION_CACHED_CURSORS
server parameter. If SESSION_CACHED_CURSORS is greater than
zero, Oracle will store recently executed cursors in a cache and, if you
reexecute such a SQL statement, you can avoid reparsing the statement.
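For example, compare a literal-based statement with its bind-variable equivalent (the customers table and :cust_id variable are illustrative); only the second form lets executions with different values share one parsed statement:

```sql
-- Each distinct literal produces a distinct SQL text that
-- must be parsed separately:
SELECT * FROM customers WHERE customer_id = 1001;
SELECT * FROM customers WHERE customer_id = 1002;

-- With a bind variable, every execution matches the same
-- shared SQL text and can reuse the cached parse:
SELECT * FROM customers WHERE customer_id = :cust_id;

-- Enable the session cursor cache (the value 50 is illustrative):
ALTER SESSION SET SESSION_CACHED_CURSORS = 50;
```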
The Oracle software component that determines the execution plan for a SQL statement is called the optimizer. Oracle supports two approaches to query optimization: rule-based and cost-based. The rule-based optimizer determines the execution plan based on a set of rules that rank various access paths. For example, an index-based retrieval has a lower rank than a full table scan; therefore, the rule-based optimizer will use indexes wherever possible.
The cost-based optimizer determines the execution plan based on an estimate of the computer resources (the cost) required to satisfy various access methods. The cost-based optimizer uses statistics, including the number of rows in a table and the number of distinct values in indexes, to determine this ideal plan.
Many people's early experiences with the cost-based optimizer in Oracle 7.0 and 7.1 were disappointing, which gave the cost-based optimizer a bad reputation in some quarters. However, the cost-based optimizer has been improving with each release of Oracle7, and many advanced SQL access methods, such as star and hash joins, are only available when you use the cost-based optimizer.
The cost-based optimizer is your best choice for almost all new projects, and it may be cost-effective to convert your existing projects from rule-based to cost-based optimization. Consider the following guidelines for getting the most from the cost-based optimizer:
- Set the optimizer goal appropriately. The default goal (OPTIMIZER_MODE=CHOOSE)
attempts to optimize the throughput (the time taken to retrieve all rows)
of SQL statements and often favors full table scans over index lookups.
When converting to cost-based optimization, many users are disappointed
to find that previously well-tuned index lookups change to long-running
table scans. To avoid this situation, set OPTIMIZER_MODE=FIRST_ROWS
in the server's parameter file or issue ALTER SESSION SET OPTIMIZER_GOAL=FIRST_ROWS
in your code. These settings instruct the cost-based optimizer to minimize
the time taken to retrieve the first row in your result set and encourage
the use of indexes.
- Use hints where necessary. You can direct the optimizer's choice of execution plan by embedding hints in your SQL; the accompanying table lists some common hints. For example, the following statement instructs the optimizer to use the cust_i2 index:

  SELECT /*+ INDEX(customers cust_i2) */ * FROM customers WHERE name=:cust_name
- Collect optimizer statistics with the ANALYZE command. Make sure that
you analyze all of your tables regularly and analyze them at peak volumes
(for example, don't analyze a table just before it is about to be loaded
by a batch job). For small or medium-sized tables, use the ANALYZE
TABLE table_name COMPUTE STATISTICS statement. For larger tables,
analyzing every row of the table may be too time-consuming. In this case,
analyze a random sample of rows using a statement such as ANALYZE
TABLE table_name ESTIMATE STATISTICS SAMPLE 20 PERCENT.
- Create column histograms for columns with uneven data distributions by using the FOR
COLUMNS clause of the ANALYZE command (for example,
ANALYZE TABLE table_name COMPUTE STATISTICS FOR ALL INDEXED COLUMNS).
Note that you can't take advantage of histograms if you are using bind
variables (which I discussed earlier). (For a further discussion of histograms,
please see Steve Bobrowski's article "Tuning
Application Logic," OTJ, Summer 1996.)
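Putting these guidelines together, a typical statistics-gathering and tuning sequence might look like this sketch (the departments and sales tables are hypothetical):

```sql
-- Gather complete statistics on a small table:
ANALYZE TABLE departments COMPUTE STATISTICS;

-- Sample a large table rather than reading every row:
ANALYZE TABLE sales ESTIMATE STATISTICS SAMPLE 20 PERCENT;

-- Build histograms on indexed columns with skewed distributions:
ANALYZE TABLE sales COMPUTE STATISTICS FOR ALL INDEXED COLUMNS;

-- Favor index-driven, first-row-fast plans for an interactive session:
ALTER SESSION SET OPTIMIZER_GOAL = FIRST_ROWS;
```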
To briefly review this month's SQL tuning hints:
Hint #1. Establish a tuning environment that will enable you to effectively write and tune efficient SQL. This environment should contain realistic or representative data volumes, include tuning and diagnostic tools, and have application design documentation available.
Hint #2. Use the tuning tools that are included in Oracle. In particular, learn to use EXPLAIN PLAN, SQL_TRACE, and tkprof.
Hint #3. Make effective use of Oracle indexes. Take advantage of concatenated indexes, overindexing, and "advanced" indexing options, such as bitmapped indexes and hash and index clusters.
Hint #4. Reduce the overhead of SQL statement parsing by using bind variables, retaining SQL "cursors," and implementing an Oracle cursor cache.
Hint #5. Take advantage of the Oracle cost-based optimizer. If
you're still using the older rule-based optimizer, consider switching to
the cost-based optimizer. When using the cost-based optimizer, ensure that
you analyze all of your tables regularly. Also, make sure that OPTIMIZER_MODE
is set appropriately. Using hints and column histograms can improve the
performance of SQL statements that are particularly difficult to optimize.
Next month, I'll look at some of the performance features of Oracle version 8. In August, I'll present a final list of five hints for optimizing SQL, which will include tips on avoiding accidental table scans, optimizing unavoidable scans, using array processing, effective locking strategies, and leveraging features introduced in recent releases of Oracle.
Guy Harrison is an independent Oracle consultant specializing
in Oracle development and performance issues. He is the author of Oracle
SQL High-Performance Tuning (Prentice Hall, 1997). You can contact Guy via
email at [email protected] or at his home page, werple.net.au/~gharriso.
| Development Environment | How to Reduce Parsing by Retaining Cursors |
|---|---|
| PowerBuilder | The SQLCache setting in the DBParm determines the number of cursors to keep open after execution. The default value is zero, which means that no cursors are retained. Setting this parameter to 20 or 30 usually reduces parse overhead significantly without using excessive memory. |
| SQL*Windows | It's hard to avoid destroying cursors in SQL*Windows, but it is possible. |
| Precompilers (Pro*C, Pro*COBOL, and so on) | The precompilers will cache SQL cursors. The number of SQL statements in the cache is determined by the MAXOPENCURSORS compiler option. You can use the HOLD_CURSOR and RELEASE_CURSOR options to determine which statements get preference in the cache. |
| Oracle Forms (Developer/2000) | Oracle Forms automatically retains and reuses cursors effectively. If you specify the OptimizeTP=NO option on the command line, only SELECT statements are allocated separate cursors. |
| Oracle Objects for OLE | Cursors are represented as dynasets in Oracle Objects for OLE. As long as your dynasets are not deallocated, your cursors will be reusable. |
| Oracle Call Interface (OCI) | In OCI, you have complete control over cursor management. To keep a cursor in memory, do not close the cursor with the oclose call after statement execution, which completely deallocates the memory and the SQL statement. You can reuse the cursor simply by rebinding and reexecuting it. |
| Oracle Power Objects | Cursor reuse is transparent and automatic. |
| PL/SQL | Cursors are kept in memory as required. If you specify CLOSE_CACHED_OPEN_CURSORS=TRUE in an ALTER SESSION statement or in the server's parameter file, cursors are discarded on COMMIT or ROLLBACK. |
| Hint | Usage |
|---|---|
| ALL_ROWS | Uses the cost-based optimizer to optimize for the retrieval of all rows. |
| CACHE | Encourages rows retrieved by a full table scan to remain in the buffer cache of the SGA. |
| FIRST_ROWS | Specifies that the cost-based optimizer should optimize the statement to reduce the cost of retrieving the first row only. |
| FULL(table_name) | Uses a full table scan to retrieve rows from the specified table. |
| INDEX(table_name [index_name]) | Uses the specified index to retrieve rows from the table or, if no index is specified, any suitable index. |
| ORDERED | Instructs the optimizer to join the tables in exactly the left-to-right order specified in the FROM clause. |
| PARALLEL(table_name, degree_of_parallelism) | Instructs the optimizer to perform parallel scans on the nominated table. If no degree of parallelism is specified, the default will be used. |
| RULE | Uses rule-based optimization. |