On the Server Side- OTJ

Guy Harrison
OTJ, Spring 1996

A DBA's Guide to Optimizing Oracle Sorts

Tips and techniques for avoiding free buffer waits and "sort-bound" systems while maximizing your Oracle sort performance.

Any text dealing with database programming techniques will devote substantial coverage to the implementation of sort routines. Efficient sort routines are fundamental to high-performance computing, and the search for "better sorts" is continuous. Sorts also play a major role in the performance of relational databases such as Oracle. Optimizing Oracle sorts can be one of the key factors in achieving a high-performance database system.

In this article I review the need for sort routines within Oracle, and offer ways to maximize your sort performance. I also concentrate on the weaknesses in the Oracle sort algorithms that can lead some Oracle systems to become "sort bound." I will conclude with some general principles for ensuring that your sort activity is optimized.

How and When Oracle Performs Sorts

Oracle may need to perform sorts under the following circumstances:

when you are creating an index
when an aggregation is requested via the group by or distinct keywords
when you request that the output be sorted via the order by clause;
when a join is performed but there are no suitable indexes (a sort merge join)
when you use the set operators union, intersect, or minus
when you perform a correlated subquery

Oracle sorts fall into two major categories: disk sorts and memory sorts. Two Oracle configuration parameters determine which type of sort is eventually performed: sort_area_size and sort_area_retained_size. sort_area_size sets the maximum amount of session memory available for all concurrent sort operations for an individual Oracle session. sort_area_retained_size specifies the maximum amount of session memory available for any individual sort. A single query may need to perform more than one sort (a good example is a sort merge from a nonindexed join), but it's unusual for more than two sorts to be required concurrently. If each sort can fit into a sort_area_retained_size and all sorts can fit into sort_area_size, then the sort can be performed in memory. Note that these parameters define memory allocated for each session, so even comparatively small values can translate to large amounts of system memory.

If the memory defined by sort_area_size and/or sort_area_retained_size is insufficient to complete the sort, a disk sort is necessary. Oracle will allocate a temporary segment and write intermediate sort results to this segment. Toward the completion of the sort, the intermediate results will be read back from the temporary segment and merged into the final sorted result set.

The temporary segment will be written to the tablespace that is defined as the user's "temporary tablespace" by the create user or alter user statement. In order to reduce fragmentation and contention for the system tablespace, it is important to create one or more dedicated tablespaces for temporary segments and set each user's temporary tablespace to one of these tablespaces.

The Oracle documentation states that the memory for sort_area_size and sort_area_retained_size is released once the sort is complete. However, this is somewhat misleading because the memory is not actually released back to the operating system. Rather, the memory is released into the session's local memory for reuse in future sorts. (See Oracle bug #214430, "Oracle doesn't return space to sort_area_retained_size," and documentation bug #218990.) The session must disconnect in order to return the memory to the operating system.

Measuring Disk Sorts

The following three important statistics are contained in the dynamic performance table v$sysstat:

sorts (memory)
sorts (disk)
sorts (rows)

These statistics report the number of sorts that were completed in memory, the number of sorts that required writes to disk, and the total number of rows sorted by both methods, respectively. You can obtain these statistics for a particular time period by running the utlbstat and utlestat utilities. The utlbstat/ utlestat performance diagnostic scripts are located in the rdbms/admin subdirectory of your Oracle installation. They are documented in most Oracle performance-tuning texts, as well as in the Oracle bulletin 100955.451, "Performance Tuning with estat/bstat." Alternately, to get a summary of sort activity since the instance began, you can use the following query:

select m.value "Memory Sorts",
 	d.value "Disk Sorts",
	r.value "Rows Sorted",
	round(d.value/decode((m.value+d.value),0,1,
		(m.value+d.value))*100) "% Disk Sorts"
  from v$sysstat m,
	v$sysstat d,
	v$sysstat r
 where m.name='sorts (memory)'
	and d.name='sorts (disk)'
 	and r.name='sorts (rows)'

Memory Sorts  Disk Sorts  Rows Sorted  % Disk Sorts
	9            19	        318394	     17

Unfortunately, these statistics are unreliable in some ports or versions of Oracle7.03. (See Oracle bug #184302, "No Sorts Statistics in v$sysstat [and v$sesstat].") They do appear to be accurate in versions 7.1 and 7.2.

Avoiding "Accidental" Sorts

Two things can cause Oracle to perform a sort that you don't really require:

Unnecessary use of the DISTINCT clause. The DISTINCT clause will almost always require a sort to eliminate duplicate rows. Some programmers like the DISTINCT clause so much that they use it in every SELECT. Unfortunately, many client-side tools will also throw in a DISTINCT for good measure. Sometimes the DISTINCT clause is necessary and unavoidable, but bear in mind that it does have a sort overhead, so use it only when necessary.
Using UNION instead of UNION ALL. The UNION operator sorts the result set to eliminate any rows that are duplicated within the subqueries. union all includes duplicate rows, and therefore does not require a sort. If you don't require that duplicate rows be eliminated, use UNION ALL.

Avoiding Sorts Via Indexes

If an index exists with some or all of the columns in the order by clause, Oracle may use the index to fetch the rows in sorted order, and therefore eliminate the sort operation. This can occur when there is no where clause that results in the use of a conflicting index, and when the columns to be sorted are not nullable (because any null values won't appear in the index).

Although using an index may eliminate the need to perform a sort, the overhead of reading index blocks and table blocks may be greater than the overhead of performing the sort. This often results in increased execution time. You can avoid both sort and table lookup overhead by creating an index that contains all of the columns in the select list, as well as the columns in the order by clause. Oracle can then resolve the query via an index lookup.

Table 1 shows an example of each type of query, and Figure 1 shows the relative execution times. As you can see, although using an index without a table lookup was the most efficient means of resolving the query, an indexed table lookup was less efficient than a full tablescan and sort. Your mileage will certainly vary based on the size of your table, power of your machine, and contention for resources. The important thing is to measure the performance of the various execution plans using sql_trace and tkprof (see Oracle7 Application Developer's Guide, Appendix B, Oracle Corp., part # A12716-A), and then decide upon the execution plan that best suits your application. You can then create appropriate indexes or use hints to promote the optimal plan.

Table 1



Type	Statement	Execution Plan

TABLE SCAN/SORT	select /*+ FULL (BIGEMP) */	SORT (ORDER BY)
(default)		empno,job	   TABLE ACCESS (FULL)
		from big emp
		order by empno,ename
INDEX SCAN/TABLE	select empno,job	TABLE ACCESS (BY ROWID)
LOOKUP		from bigemp		   INDEX (RANGE SCAN)
		order by empno,ename
INDEX ONLY	select empno,name	INDEX (RANGE SCAN)
		from
		    bigemp order by empno,ename

Figure 1

Harrison Figure 1

Avoiding Disk Sorts

Sorts in memory are substantially faster than sorts that require disk I/O, so it makes sense to avoid disk sorts whenever possible. As I discuss in a moment, aside from the overhead of writing to the temporary segment, there are a number of bottlenecks you may encounter with disk sorts. This can result in drastic performance problems, which makes avoiding disk sorts even more important.

The easiest way to reduce the number of disk sorts is to increase sort_area_size-- this is the amount of memory available for all concurrent memory sorts. It's usually not worth altering sort_area_retained_size, because it defaults to sort_area_size, and most queries require only a single sort_area_retained_size.

The higher the value of sort_area_size, the higher the likelihood that a sort can be completed in memory. However, the impact on your system's memory utilization could be dramatic. Because the memory acquired during a sort is not returned to the operating system until the session terminates, high values of sort_area_size can increase session memory requirements substantially and permanently, even if sessions perform large sorts only infrequently. Furthermore, you are likely to hit a point of diminishing returns as you increase sort_area_size because some sorts are not going to fit into memory without wildly excessive values for sort_area_size. Therefore, to reduce the incidence of disk sorts without consuming a disproportionately high amount of memory, increase sort_area_size only when you can detect a measurable decrease in the disk/memory sort ratio.

Disk Sorts and Free Buffer Waits

Disk sorts are worth avoiding simply because of their high I/O overhead. And, as if this I/O overhead isn't bad enough, a weakness in the Oracle disk sort algorithm can bring a busy database with frequent disk sorts to its knees.

When a disk sort occurs, the Oracle server process writes the blocks destined for the temporary segment to the buffer cache within the SGA. It is then up to the Oracle database writer(s) to write these "dirty" blocks to the files that make up the temporary segment. Later in the sort, the Oracle server process must read the blocks in the temporary segment back into the SGA.

This disk-sorting algorithm places a very heavy load on the database writers and the buffer cache. If, for whatever reason (I discuss some Unix-specific reasons later), the database writer cannot write dirty blocks out to the temporary segment as quickly as blocks are being read in by all of the Oracle servers, then sessions that need to write new blocks into the SGA will have to wait for the database writer. Therefore, they will experience a "free buffer wait." (See Figure 2.) These sessions may be performing sorts, or they may simply be trying to bring blocks into the SGA to satisfy a normal query. In severe cases, I've seen free buffer waits that account for more than half of the total instance-wide response times.

Figure 2 Harrison Figure 2

The "free buffer waits" event is recorded in the v$system_event, v$session_wait, and v$session_event dynamic performance tables. The utlbstat/utlestat utility can report on the events in v$system_event for a specified time period. Alternately, you can run the following query to determine the proportion of time that active sessions are waiting on free buffer waits in your system:

select sum(decode(event,'free buffer waits',time_waited,0))
	"free buffer waits",
	sum(time_waited) "all waits",
	round(sum(decode
	  (event,'free buffer waits',time_waited,0))
	/decode(sum(time_waited),0,1,
	  sum(time_waited))*100,2) "free buffer waits %"
  from v$system_event
 where event not in ('client message','pmon timer',
		'smon timer','rdbms ipc message')

free buffer waits	 all waits	   free buffer waits %
	973177	      5811657	      17

In this example, free buffer waits account for 17 percent of nontrivial wait times, and therefore contribute substantially to system response time.

Avoiding Free Buffer Waits

I only know of two ways to reduce free buffer waits in Oracle7.0 or 7.1: Either reduce the number of blocks being written into the buffer cache, or improve the efficiency of the Oracle database writes to the temporary segment. You can address the first issue only by reducing the number of disk sorts--either by increasing sort_ area_size or avoiding sorts altogether. To address the second issue, you can try the following tricks:

Map your temporary segment files across multiple devices or use operating system disk striping to reduce contention for the disk devices that contain your temporary tablespace.
If your system has multiple CPUs and the temporary tablespace is on multiple disks, increase the number of database writers.
Adjust database writer activity by increasing the number of checkpoints or by altering the undocumented parameter _db_write_batch.
If you are using Unix, create your temporary tablespace on raw devices and use operating system asynchronous I/O options.

The last recommendation stems from a significant decline in Oracle database writer performance when temporary tablespaces are on Unix filesystems. This topic is worth further discussion, so I'll tackle it in the next section.

DEALING with Unix

Unix inodes provide a "directory entry" for files on Unix filesystems. Most implementations of Unix provide a locking mechanism, based on the inodes, that prevents inconsistencies in the Unix filesystem cache. This locking mechanism prevents writes to a file while a read operation is in progress. As I explained earlier, it is common during a disk sort for the Oracle server process to be reading from the temporary tablespace at the same time that the database writer is trying to write to it. Therefore, there is a strong potential for contention among the inodes of temporary segment database files.

In a recent benchmark that involved multiple processes performing disk sorts, I observed that database writer processes were spending up to 80 percent of their time waiting for these inode locks -- dramatically reducing their throughput, and leading to very high free buffer wait rates. Therefore, you can try the following tricks to avoid these inode locks:

Create the temporary tablespace on a raw device that has no inodes and offers some additional I/O throughput improvements.
On at least one Unix port (such as Sequent running Oracle7.1.3), an undocumented configuration parameter (_direct_read=true) will allow Oracle to read from the database files without involving the Unix filesystem cache and without locking the inode (see Oracle bug #208135).

Sizing the Extents in the Temporary Segment

The default storage parameters for the temporary segment can also have a dramatic effect on the performance of disk sorts. In the benchmark that I previously mentioned, I noted a large number of the processes performing sorts were waiting on a "space management" lock. Observation and experimentation led me to conclude that contention on this lock occurs when a temporary segment requires a second or subsequent extent. When I rebuilt the temporary tablespace with extent sizes that were large enough to allow the disk sort to proceed in a single extent, contention for the lock was eliminated.

You can detect space management lock waits with the script in Listing 1. In this listing, Oracle session 23 holds the space transaction lock, and session 33 is waiting for it. Under normal loads, the lock is released quickly and contention is minimized. However, under heavy load, processes seem to "single thread" through this lock. If you observe contention for this lock, try increasing the default initial extent sizes for the temporary tablespace.

Listing 1



select sid,
	decode(lmode,
		0, 'None', 	/* Mon Lock equivalent */
		1, 'Null', 	/* N */
		2, 'Row-S (SS)', 	/* L */
		3, 'Row-X (SX)', 	/* R */
		4, 'Share', 	/* S */
		5, 'S/Row-X (SSX)',	/* C */
		6, 'Exclusive', 	/* X */
		to_char(lmode)) mode_held,
	decode(request,
		0, 'None', 	/* Mon Lock equivalent */
		1, 'Null', 	/* N */
		2, 'Row-S (SS)', 	/* L */
		3, 'Row-X (SX)',	 /* R */
		4, 'Share', 	/* S */
		5, 'S/Row-X (SSX)', 	/* C */
		6, 'Exclusive', 	/* X */
		to_char(request)) mode_requested
  from v$lock
 where type='ST'
/

 SID	MODE_HELD	MODE_REQUESTED
----	-------------	-------------------
   23	Exclusive	None
   33	None	Exclusive

Improvements in Oracle7.2

In version 7.2, Oracle made substantial improvements to the disk sort algorithm that address the problem of free buffer waits. A new parameter, sort_ direct_writes, allows disk sorts to be performed while bypassing the Oracle SGA buffer cache. Instead, the writes are buffered by areas of memory defined by sort_write_buffer_size and sort_write_buffers. By avoiding the buffer cache sessions, sorts no longer contend with other processes for free blocks in the SGA. This will usually result in better sort performance and will also reduce the impact of these sorts on the rest of the system (sort activity will no longer cause free buffer waits).

Figure 3 shows the result of a simple test under Oracle7.2. Setting sort_direct_writes to true eliminates free buffer waits for a 60,000 row sort. Therefore, overall elapsed time for the sort is reduced by 32 percent. It is important to know that sort_direct_writes is false by default. Therefore, I recommend that you set it to true if you observe any free buffer waits that are associated with sorting. The presence of this option also provides a strong incentive to upgrade to version 7.2 (or 7.3) if your system is currently suffering from sort-induced free buffer waits.

Figure 3 Harrison Figure 3

Some Basic Principles

I have explained how Oracle performs sorts, and how they can impact performance. In particular, I've discussed how the Oracle sorting algorithm can lead to free buffer waits and "sort bound" systems. I'll conclude with the following principles, which I hope will help you reduce the impact of sorts:

Create appropriate indexes. Use explain plan and tkprof to measure the difference between index-based and sort-based execution plans, and force the most efficient plan.
Create dedicated tablespaces for temporary segments and ensure that all users have this tablespace defined as their temporary tablespace.
Avoid disk sorts by setting an appropriate value for sort_area_size. Monitor the effect on disk sorts by querying v$sysstat. Avoid setting a needlessly high value and wasting memory.
Monitor your system for free buffer waits. If they occur, you can take the following actions: 1) tune your database writer(s); 2) spread your temporary tablespace across multiple physical disks; 3) create your temporary tablespace on raw devices (if running Unix); and 4) upgrade to Oracle7.2 or 7.3 and set sort_direct_ writes=true.
Size the extents in your temporary tablespace so that your sorts can be performed within a single extent.

Guy Harrison is an independent consultant working in Melbourne, Australia. He specializes in client/server development using Oracle and Oracle performance tuning. You can contact Guy via email at [email protected] or via the Internet at http://werple.mira.net.au/~gharriso.

This is a copy of an article published @ http://www.oreview.com/