By Gita K. Gupta, Oracle Open Systems Performance
This article provides an overview of tips and techniques for getting optimal performance out of the Oracle7 database. The goals of this tuning approach are simple: to optimize utilization of limited resources and to maximize throughput for a given configuration. Certain key areas of Oracle tuning provide the greatest performance impact for the time invested. Part 1 of this article looks at tuning these areas using the performance tools Oracle provides, and introduces generic tuning issues. Part 2, in an upcoming issue, will focus on data-warehousing applications and performance features introduced in Oracle7 releases 7.2 and 7.3.
Key database-tuning considerations include memory, I/O, and database writer (DBWR) performance. This article discusses only database-tuning issues---keep in mind that application tuning is crucial for performance, so you should tune your applications carefully as well.
A common approach to tuning is to take a snapshot of system activity around a peak period, using utlbstat/utlestat.sql, and examine the report these scripts provide. Because this article provides a broad introduction to tuning concepts, I will not attempt to run through such a report in detail but will instead use queries that are adapted from the utlbstat/utlestat scripts to illustrate use of the V$ views in tuning performance.
You can see how memory is allocated among the main SGA components by querying V$SGASTAT:

select name, bytes from v$sgastat
where name in ('db_block_buffers', 'log_buffer', 'dictionary cache',
'sql area', 'library cache', 'free memory');
Incidentally, the free-memory statistic should be low rather than high; unlike on operating systems where a high value for free memory is healthy because it translates to a lower likelihood of paging, a high value from V$SGASTAT can indicate that Oracle7 has aged objects out of the shared pool and, as a result, that the shared pool is fragmented. Later sections discuss how to set the parameters that determine the size of each of the above SGA components.
There is a trade-off---an application that reuses private SQL areas won't need to allocate as many private SQL areas. This kind of application will therefore save on memory but will have to make more parse calls to reuse the private SQL areas. You can control the frequency of parse calls and reuse of private SQL areas at the level of most Oracle tools, including Oracle Precompilers, Oracle Call Interfaces (OCIs), and Developer/2000. If you decide to reduce the number of parse calls, you may need to increase the initialization parameter open_cursors to increase the number of cursors permitted for a session.
The shared pool becomes fragmented in the course of normal database operation. Once the pool is fragmented, if Oracle7 has to load a large object into the shared pool, it will flush objects that are not currently in use from the shared pool, using an LRU (least recently used) algorithm. Flushing shared-pool entities frees up memory, and the allocation will succeed if Oracle7 can find a sufficiently large chunk of contiguous memory; however, if a query references the flushed objects later, it will cause an implicit reparse, with the associated performance penalty.
A summary statistic for library-cache activity is library-cache misses, which you can compute by querying V$LIBRARYCACHE. Library-cache misses can occur in any of the following steps in processing a SQL statement:
Parse. If the application makes a parse call for a SQL statement and the parsed representation of the statement does not already exist in the shared SQL area, Oracle7 will have to parse the statement and allocate a shared SQL area. You can reduce library-cache misses in this phase by adopting the guidelines below for writing SQL statements:
1. Use bind variables rather than explicitly specified constants, and standardize bind-variable naming conventions.
2. Standardize the case and spacing conventions for SQL statements and PL/SQL blocks---for example, because of the difference in case alone, "Select * from emp" is not identical to "Select * from EMP".
3. Maximize use of packages and stored procedures to ensure that multiple users reuse the same SQL area, minimizing runtime parsing.
4. Where possible, use fully qualified table names---that is, prefix the table name with the schema name. With precompiler applications, you can reduce parse calls by setting the precompiler option HOLD_CURSOR to YES. Similarly, in situations where users repeatedly parse the same statement (such as when a user is switching between forms), setting the init.ora parameter session_cached_cursors to a positive value will allow Oracle7 to cache closed cursors within the session, eliminating the need for parsing on a subsequent parse call.
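To illustrate guideline 1, the two statements below with literal values occupy separate shared SQL areas, while the bind-variable form is shared across executions and users (table and variable names here are illustrative):

select * from emp where empno = 1234;    -- one shared SQL area
select * from emp where empno = 5678;    -- a second, nearly identical area
select * from emp where empno = :empno;  -- a single area, reused for any value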
Execute. Library-cache misses during the execute phase occur when the parsed representation exists in the library cache but has been aged out. You can monitor the misses, particularly on execution, by running the following query on V$LIBRARYCACHE:
select namespace, gets,
       round(decode(gethits,0,1,gethits)/decode(gets,0,1,gets),3) "GET HIT RATIO",
       pins,
       round(decode(pinhits,0,1,pinhits)/decode(pins,0,1,pins),3) "PIN HIT RATIO",
       reloads, invalidations
from v$librarycache;
The pin-hit ratio should be as close to 1.0 as possible. What can a DBA do to reduce misses on execution?
Reduce fragmentation: You can do this by pinning large objects, usually PL/SQL objects, in the shared pool. You can query V$DB_OBJECT_CACHE to identify these objects:
select * from v$db_object_cache where sharable_mem > <threshold>;

(Set the threshold based on your configuration.)
You can have Oracle7 "keep" the objects by using dbms_shared_pool.keep(), which you create by running dbmspool.sql. You should place all large PL/SQL objects into packages and mark the packages as "kept."
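For example, assuming a large application package (the owner and package names here are hypothetical), you could pin it in the shared pool with:

execute dbms_shared_pool.keep('APPOWNER.BIG_PKG');

Once kept, the package is not eligible for LRU flushing, so a large allocation elsewhere cannot age it out of the pool.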
Reserve shared-pool space. You can reserve memory within the shared pool for large allocations that would normally cause a high degree of flushing. You can treat part of the shared pool, specified by the init.ora parameter shared_pool_reserved_size, as a reserved list of memory chunks. To ensure that the memory on this list remains in large contiguous chunks, a configurable parameter (shared_pool_reserved_min_alloc) controls the size of allocations that can allocate memory from this list when there is insufficient memory on the shared pool's free lists. You can configure these parameters by using statistics on reserved-pool usage from the view V$SHARED_POOL_RESERVED, which tracks reserved-pool free memory, used space, request misses, and failures. These features are available in Oracle7 Release 7.1.5 and above.
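A sketch of the relevant init.ora entries; the values below are illustrative only and should be derived from the V$SHARED_POOL_RESERVED statistics for your workload:

shared_pool_size = 10000000            # total shared pool, in bytes
shared_pool_reserved_size = 1000000    # portion set aside for large allocations
shared_pool_reserved_min_alloc = 5000  # minimum request size served from the reserved list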
Tuning the Database Buffer Cache
Every request for a database block first scans the database buffer cache; if it finds a block in cache, a logical read will satisfy the request. If the request does not find a block in cache, it requires a physical read to read the block into cache from disk. Because minimizing the need for disk I/O can improve performance significantly, you need to optimize the number of data requests satisfied by memory. You can examine statistics on buffer-cache performance by querying V$SYSSTAT:
Select name, value from v$sysstat where name in
('db block gets','consistent gets','physical reads');
You can then compute the measure of interest---the hit ratio---as:
Hit Ratio = 1 - (Physical Reads / (DB Block Gets + Consistent Gets))
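Equivalently, you can compute the ratio in a single statement---a convenience rewrite of the V$SYSSTAT query above:

select 1 - (pr.value / (bg.value + cg.value)) "Hit Ratio"
from v$sysstat bg, v$sysstat cg, v$sysstat pr
where bg.name = 'db block gets'
and cg.name = 'consistent gets'
and pr.name = 'physical reads';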
The hit ratio should be as close to 1.0 as possible. If it is suboptimal, increasing the size of the buffer cache can improve performance. You can increase the size of the buffer cache by increasing db_block_buffers up to some optimal value; this value naturally varies among installations, but you can estimate the benefit of additional buffers by querying the pseudotable X$KCBRBH, which groups prospective extra buffers into intervals:
select 250*trunc(indx/250)+1 || ' to ' || 250*(trunc(indx/250)+1) "Interval",
       sum(count) "Cache_Hits"
from sys.x$kcbrbh
group by trunc(indx/250);
The cache_hits column from the query indicates the additional cache hits that might result from adding the number of buffers indicated by the interval, over and above the cache hits from the preceding interval. Similarly, you can use the pseudotable X$KCBCBH to evaluate the additional misses that would result from reducing the size of the buffer cache; if the hit ratio is already very good, this information can help you shrink the buffer cache and reclaim main memory.

If you have sufficient information about the applications that run against the database and can characterize certain small tables as frequently accessed, you can specify the CACHE clause when creating or altering these tables. This modifies Oracle7's normal behavior of putting blocks accessed in table scans at the cold, or less frequently accessed, end of the LRU list. You can apply the same treatment more generally to all tables below a certain size by setting the init.ora parameter cache_size_threshold.
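As an illustration, assuming a small, frequently read lookup table named dept (the table name is hypothetical):

alter table dept cache;

Blocks read by full scans of dept are then placed at the most recently used end of the LRU list instead of the cold end.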
Tuning I/O
Keep in mind that the recommendations given here are generic; a significant part of tuning I/O performance involves exploiting the I/O features provided by your operating environment. Features that are port-specific include using raw devices for database files, asynchronous I/O, scattered read capabilities, direct I/O, and memory-mapped files. Refer to your system's documentation to find out what I/O-specific features are available and how to use them.
Accesses grouped by tablespace. This query indicates which tablespaces are most heavily accessed by applications. You should spread hot files across disks. In addition, objects that are accessed concurrently, such as a table and its indexes, should be in separate data files on separate disks. Similarly, you should stripe large tables that are subject to a high degree of concurrent activity across disks (by using alter table allocate extent or the parallel loader with the file = clause) to allow multiple processes to access different parts of the table with minimal contention. Application performance can also benefit from operating-system striping, with the choice of interleave determined by the nature of the application. Use the following queries to examine file I/O on a per-tablespace basis:
create view stats$file_view as
select ts.name table_space, i.name file_name,
       x.phyrds phys_reads, x.phywrts phys_writes,
       x.readtim phys_rd_time, x.writetim phys_wrt_tim,
       x.phyblkrd phys_blks_rd, x.phyblkwrt phys_blks_wr
from v$filestat x, ts$ ts, v$datafile i, file$ f
where i.file# = f.file# and ts.ts# = f.ts# and x.file# = f.file#;
select table_space,
       sum(phys_reads) phys_reads, sum(phys_blks_rd) phys_blks_rd,
       sum(phys_rd_time) phys_rd_time,
       sum(phys_writes) phys_writes, sum(phys_blks_wr) phys_blks_wr,
       sum(phys_wrt_tim) phys_wrt_tim
from stats$file_view
group by table_space
order by table_space;
Accesses on a per-file basis. The following query provides data on the reads and writes to each individual file:
select df.name filename, fs.phyrds phys_reads, fs.phywrts phys_writes
from v$datafile df, v$filestat fs
where df.file# = fs.file#;
This query provides the access figures for data files; you can obtain the number of I/Os for non-Oracle files on the same disks by using operating-system utilities. You can then group the files by disk and compute the total number of I/Os per second, checking that no single disk is driven beyond the I/O rate it can sustain.
Tuning Sorts
You can determine how many sorts Oracle7 satisfies entirely in memory, and how many spill to disk, by querying V$SYSSTAT:

select name, value from v$sysstat
where name in ('sorts (memory)', 'sorts (disk)');
If many sorts go to disk, increasing the init.ora parameter sort_area_size allows more of them to complete entirely in memory; because each user process allocates its own sort area, however, this recommendation must be balanced against total memory consumption.
Where possible, you should eliminate sorts. For example, you can presort data by using operating-system utilities and then load it into the table in sorted order, which allows you to use the NOSORT option to create the index and thereby bypass the sorting overhead of index creation.
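A sketch of the technique, assuming the rows were loaded into emp already sorted in ascending empno order (table and index names are illustrative):

create index emp_empno_idx on emp (empno) nosort;

If the data is not actually in order, the statement fails with an error rather than building an incorrect index, so the option is safe to attempt.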
You can detect the presence of migrated/chained rows by using the ANALYZE command to generate statistics on a given table or cluster and specifying the LIST CHAINED ROWS option. This places each chained row into an output table you've created by running the Oracle-supplied script utlchain.sql. If this procedure identifies a large number of chained or migrated rows, it may be useful to consider rebuilding the table. To prevent as much migration as possible, you must set storage parameters (especially PCTFREE) carefully when creating tables. Chaining, in general, can be relatively difficult to avoid, since the Oracle7 data-block size is often the bottleneck. On hash clusters, however, an inappropriate hash function can cause chaining, so watch out for this.
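For example, assuming you have already run utlchain.sql to create the output table chained_rows, and a table named emp:

analyze table emp list chained rows into chained_rows;

select count(*) from chained_rows where table_name = 'EMP';

A large count relative to the total row count suggests rebuilding the table with a higher PCTFREE.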
Tuning DBWR
1. utlbstat/utlestat.sql reports a wealth of DBWR statistics, many of which are intended to provide feedback that dynamically adjusts internal thresholds. In that sense, DBWR should be largely self-tuning, although your system may benefit from tuning related init.ora parameters. Therefore, don't focus on each statistic in isolation; rather, use them to construct a general picture of DBWR activity.
2. The trade-off to keep in mind while setting init.ora parameters related to DBWR activity is that if DBWR does not clean out the buffer cache fast enough, foreground processes will wait for DBWR and performance can suffer. Some of the statistics reported by utlbstat.sql/utlestat.sql are detailed below, along with key indicators and suggestions for tuning actions.
DBWR timeouts: Incremented on DBWR timeout if there was no DBWR activity since the last timeout.
DBWR make free requests: The number of messages received requesting DBWR to make some more free buffers for the LRU.
DBWR free buffers found: Tracks the number of buffers DBWR found already clean when requested to make free buffers. Divide by DBWR make free requests to find the average number of reusable buffers at the cold end of the LRU queue. Note that this is not incremented when the LRU is scanned for any purpose other than a make-free request.
DBWR LRU scans: The number of times DBWR does a scan of the LRU queue, looking for more buffers to write, including times when the scan is to fill out a batch being written for another purpose such as checkpoint. Thus, this value will always be greater than or equal to DBWR make free requests.
DBWR summed scan depth: Oracle7 adds the current scan depth (an internally maintained parameter) to this statistic every time DBWR scans the LRU for dirty buffers. Divide by DBWR LRU scans to find the average scan depth.
Dirty buffers inspected: The number of times a foreground process looking for a buffer to reuse encountered a dirty buffer that had aged out through the LRU queue.
DBWR buffers scanned: The number of buffers looked at during scans of the LRU for dirty buffers to make clean. Divide by DBWR LRU scans to find the average number of buffers scanned. This count includes dirty as well as clean buffers. The average buffers scanned may be different from the average scan depth, due to write batches that fill up before a scan is complete.
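The derived averages described above can be computed directly in SQL. Note that the exact statistic name strings vary by release, so verify them in V$SYSSTAT (or in the utlestat report) before relying on this query:

select depth.value / decode(scans.value,0,1,scans.value) "Avg Scan Depth",
       bufs.value / decode(scans.value,0,1,scans.value) "Avg Buffers Scanned"
from v$sysstat depth, v$sysstat bufs, v$sysstat scans
where depth.name = 'DBWR summed scan depth'
and bufs.name = 'DBWR buffers scanned'
and scans.name = 'DBWR lru scans';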
Gita K. Gupta is a senior technical staff member for Oracle's Open Systems Performance Group. Other members of the Open Systems Performance Group provided valuable feedback to this article, and members of the Oracle RDBMS Development Group contributed as well.
Copyright © 1994, 1995 & 1996 Oracle Corporation. All Rights Reserved.