By Eyal Aronoff
There's one question to which I never seem to get a straight answer: "How is the performance of your system?"
Invariably, the reply is something like: "Well, it's been in production for almost three years!"
One of the major benefits of the open systems environment is the ability to integrate third-party packaged solutions into the enterprise automation scheme. Packaged solutions have many attractions: they are cheap when compared with in-house development; the cost of continuous support and enhancement is divided between all the users; and the responsibility of keeping up with government regulations is shouldered by the vendor. On the other side of the same battlefield are the in-house solutions and the customized code. Ironically, after just one year of production support for either an off-the-shelf package or an aging customized application, we face very similar challenges.
There are several reasons for this similarity. One important reason is that the developers of the original system are no longer involved with the implementation. For both packaged implementations and production applications, management does not want to spend the money necessary to increase the level of in-house expertise. Often they cite the following "business reasons": "The entire package cost us US$100,000; now we have to spend another US$500,000 just to keep it running?" or "We already spent US$2,000,000 on this project! I won't spend any more!" In both cases we reach a situation where there is very little in-house knowledge of the code and of what the application is supposed to do.
The Challenges
Is there a problem? A batch job is taking more than six hours. You call Oracle support and the response that you get is "if it runs for six hours that's probably what it should take." Oh well!
The great forefathers: Many of us assume that the developers before us and specifically the developers of packaged software are very experienced Oracle people - surely they have tuned their programs to the limit!
If it works don't touch it: Management is reluctant to make any modification to the application because no one knows what the full impact will be.
"Here today, gone tomorrow" consultants: Many sites use consultants for installation and setup. Usually these consultants have no incentive to educate the permanent staff. When they leave, their knowledge leaves with them.
Where to Start
We have adopted a three-step approach to the performance management of production applications and packaged software.
- Tune the most "offensive" SQL statements.
Our experience has shown that, in general, a small number of SQL statements are responsible for most of the database activity. The first tuning effort should focus on that small number of "offensive" SQL statements: analyze what they are doing and take corrective action. This eliminates the need to learn and understand the programs in their entirety. Unfortunately, the collection and analysis of executing SQL statements is not trivial. We noticed that the amount of work a SQL statement does is in direct relation to the number of buffer gets (logical reads) and disk reads (physical reads) involved in its execution. These statistics can be found in the V$SQLAREA view. The full text of the SQL statement is found in the V$SQLTEXT view, stored in pieces of 64 bytes each.
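As a sketch, the most active statements can be pulled from V$SQLAREA and their full text reassembled from V$SQLTEXT. The 100,000-gets threshold is an arbitrary starting point, and the substitution variable is a SQL*Plus convenience, not part of the views themselves:

```sql
-- Find the most "offensive" cached statements by logical I/O
SELECT address, buffer_gets, disk_reads, executions
  FROM v$sqlarea
 WHERE buffer_gets > 100000
 ORDER BY buffer_gets DESC;

-- Reassemble the full text of one statement from its 64-byte pieces,
-- using an ADDRESS value returned by the query above
SELECT sql_text
  FROM v$sqltext
 WHERE address = '&stmt_address'
 ORDER BY piece;
```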
- Tune specific long-running applications and batch jobs.
Once the most offensive SQL statements were either tuned or accepted for what they were, we looked at long-running batch jobs and individual applications with poor response times. We tried using SQL Trace and TKPROF. Some of our applications could not issue the "ALTER SESSION SET SQL_TRACE=TRUE" statement, and we did not want to make any changes to the production version of the application. The output of the trace was accessible only to users with privileges on the Oracle installation directory. When multiple developers and batch jobs generated trace information, it was sometimes very difficult to correlate a given trace file with the user that generated it. In many cases, specifically when tracing a batch job, the amount of output produced was so large it was almost impossible to sift through the information. Instead, by joining V$SQLAREA to the currently executing SQL statements (found in V$SESSION) and the currently open cursors for the session (found in V$OPEN_CURSOR), we could specify criteria to retrieve only the most active SQL statements, as they were executed.
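A minimal version of that join might look like the following, assuming the Oracle7-era V$SESSION columns SQL_ADDRESS and SQL_HASH_VALUE; the threshold is again an arbitrary starting point:

```sql
-- Show which sessions are currently executing the heaviest statements
SELECT s.sid, s.username, a.buffer_gets, a.disk_reads, a.sql_text
  FROM v$session s, v$sqlarea a
 WHERE s.sql_address    = a.address
   AND s.sql_hash_value = a.hash_value
   AND a.buffer_gets    > 100000
 ORDER BY a.buffer_gets DESC;
```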
- Tune the access to individual tables.
In every application there is a set of common tables used throughout. Access to these tables should be monitored regularly to determine how the tables and indexes are being used. The location and the condition of the tables and indexes should be reviewed from time to time to ensure load balancing and to avoid I/O bottlenecks. Oracle does not provide any V$ view that relates SQL statements to the tables they use. To track that relationship, we had to build our own monitoring view.
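In the absence of such a view, a crude approximation is to search the cached statement text for a table's name. This is only a sketch: V$SQLAREA stores just the first portion of each statement, so it can miss matches and catch false positives.

```sql
-- Find cached statements that appear to reference a given table
SELECT address, buffer_gets, disk_reads
  FROM v$sqlarea
 WHERE UPPER(sql_text) LIKE '%GL_INTERFACE%'
 ORDER BY buffer_gets DESC;
```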
Case Study
To show the range of problems involved in tuning production applications and packaged software, we present some of the issues encountered in tuning an Oracle Financials implementation. Most of these considerations will apply to any implementation.
- Missing indexes
The number one cause of poor performance is missing indexes. Missing indexes mean that simple accesses to a table result in full table scans. Scans are both time-consuming and a cause of excessive I/O, since database blocks read by full table scans do not cache well in the buffer cache. A high number of disk reads relative to buffer gets on individual SQL statements may indicate missing indexes. Another such indication, on a broader level, is I/O to an entire file that is characterized by a high percentage of multi-block read requests.
By collecting the most "offensive" SQL statements, we identified that the installation did not have an index for FND_PROFILE_OPTION_VALUES. This table is used every time a menu selection is made. Naturally, an index on this table significantly improved menu selection response time and reduced the overall workload of the system. In total, we ended up adding about 50 new indexes to that one instance.
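The fix itself is a single DDL statement. The index name and column list below are purely illustrative; the real columns must match the predicates the application actually issues against the table:

```sql
-- Hypothetical index: choose columns to match the application's
-- actual WHERE clauses, most selective column first
CREATE INDEX fnd_profile_option_values_nx
    ON fnd_profile_option_values (profile_option_id, level_id, level_value);
```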
- Indexing scheme does not match the business use
The original indexing scheme may not match your ever-changing business profile. As tables grow, the selectivity of the indexes changes. By tracking the distribution of work among all access paths to a specific table, you can find which indexes are used the most. To accomplish this, we collected all the SQL statements that accessed some of the main tables over a period of three days. Here is what we found:
Some tables had too many indexes. In many cases, nobody knew why these indexes were created. We have identified indexes that were rarely used or not used at all. Those indexes could be dropped.
In some cases we found that only part of the primary key was used in the join clause. The result was that although the PLAN showed access by index, only the leading columns of the index were actually used, which caused non-selective index range scans. This usually meant adding a better qualifying join condition to the WHERE clause, or a change in the order of the columns in the index. In other cases, we identified indexes that contained columns with a few distinct values or that contained columns with a single distinct value. It was better to remove such a non-selective column from the index or change the order of the columns in the index to put the non-selective part of the index as the last column. For example, in an installation that uses only one ORGANIZATION_ID in the MTL_SYSTEM_ITEMS table, it is better to remove this column from the index completely.
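Reordering the columns of an index is a drop-and-re-create operation. The index and column names below are hypothetical; in the ORGANIZATION_ID example from the text, the non-selective column would simply be omitted:

```sql
-- Re-create the index with the selective column leading and the
-- non-selective column last (or removed entirely)
DROP INDEX mtl_system_items_nx;
CREATE INDEX mtl_system_items_nx
    ON mtl_system_items (segment1, organization_id);
```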
- Table stagnation - the "high water mark problem"
The high water mark of a database segment is a pointer to the last occupied block in the last extent of the segment. This pointer is stored in the segment header block. When a table is scanned, Oracle reads only up to the high water mark, rather than to the end of the last extent. However, if all the rows are deleted from the table, the high water mark is not adjusted. The result is that the table is actually much smaller than it appears to be. The high water mark can be reset by truncating the table or by re-creating it.
We found that tables that were used as temporary data containers as part of a batch process (such as temporary, interim and interface tables), were especially susceptible to this problem. Such tables may benefit from periodic reorganization.
We identified that all the programs that accessed the GL_INTERFACE table were very slow although the table was almost empty. The reason turned out to be that the initial load of the system extended the table to over 100 Megabytes. After the load, all rows were deleted from the table, but naturally the high water mark did not change. This meant that every full table scan scanned over 100 Megabytes of empty space. After table reorganization, the response time of the SQL statements accessing the GL_INTERFACE table improved dramatically. We then incorporated the reorganization script into the end of the GL interface run.
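The simplest reset, when the table is genuinely empty at the end of the run, is a TRUNCATE, which moves the high water mark back to the start of the segment:

```sql
-- Reclaim the space and reset the high water mark; DELETE does neither.
-- Only safe once every row has been processed and removed.
TRUNCATE TABLE gl_interface;
```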
- Index stagnation
Since indexes are always sorted, every update to a table that modifies indexed columns translates to a delete-and-insert in the index. For example, if a name is changed from Adam to Bob, the modified row remains in the same table block. In the index, the entry for Adam is deleted and an entry for Bob is inserted. The freed space in the index block that used to contain the entry for Adam may never be reused. Additionally, when rows are inserted into a table block and there is no more room, a new block is taken from the free list. This guarantees that during a load the blocks get filled up before being flushed to disk. In an index, after the initial creation, when entries are inserted into an index block and there is no more room, the block is split into two; each new index block contains half the entries of the original block. This means that if the table uses a sequential key, or the key contains a date, there is a good chance that the index is always half empty.
Our monitor showed that when accessing FND_CONCURRENT_REQUESTS the number of buffer gets was very high although only a few rows were retrieved. We identified that the amount of free space inside the indexes was at 98 percent. This meant that most of the index blocks were empty. Once again we reorganized the indexes and the result was much less overhead for the Concurrent Manager, the developers and users that monitor the execution of their own jobs.
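An index in this state is fixed by rebuilding it. From Oracle 7.3 onward this is a single statement; on earlier releases the index must be dropped and re-created. The index name here is hypothetical:

```sql
-- Compact a stagnant index, eliminating half-empty blocks
-- (pre-7.3: DROP INDEX followed by CREATE INDEX instead)
ALTER INDEX fnd_concurrent_requests_nx REBUILD;
```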
- I/O bottleneck
After we tuned most of the "offensive" SQL statements and reorganized many of the tables and indexes, we still suffered poor response time when accessing some of the large common tables. R*SQLab indicated that most of the I/O was directed to a small number of tables and indexes. Additionally, AdHawk Monitor showed that the I/O rate to the disk was approaching saturation (40-50 reads per second). We created scripts to move these tables and indexes to dedicated tablespaces on separate disks. We put the most active tablespaces on a RAID disk utilizing raw devices. This seemed to alleviate the response time problem.
- Multiple database writers
When monitoring the system with AdHawk Monitor during peaks of activity, we noticed what we call the "hiccup syndrome:" all database user activity would suddenly freeze for a fraction of a second and then continue. Additionally, the graph of the database write rate looked like a sawtooth, exhibiting very high peaks followed by relatively long periods of no activity. In many cases, the "hiccups" (the freezes in database activity) matched the peaks in the write rate graph. After increasing the number of buffer writers (setting the DB_WRITERS parameter in the init<SID>.ora file to 4), the "hiccups" disappeared and the database write rate graph took on a haystack appearance. This also reduced the number of buffer busy waits and enabled us to run many more batch jobs simultaneously.
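The change itself is a one-line initialization-file edit followed by an instance restart. The parameter name is as in Oracle7; later releases replaced DB_WRITERS with DB_WRITER_PROCESSES:

```
# init<SID>.ora fragment: spread buffer-cache writes across four writers
db_writers = 4
```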
- Chaining
We noticed that the amount of buffer gets for SO_LINES and SO_HEADERS was rather high. We tried to reorganize the indexes, but that did not help very much. AdHawk Monitor identified that there were many chained reads when accessing those tables. Once again, reorganization of the tables solved the problem.
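Chaining can be confirmed with ANALYZE, which records the affected row IDs into the CHAINED_ROWS table created by the utlchain.sql script shipped with Oracle:

```sql
-- Count chained/migrated rows in a suspect table
ANALYZE TABLE so_lines LIST CHAINED ROWS INTO chained_rows;
SELECT COUNT(*) FROM chained_rows;
```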
- Tuning views
Many implementations use views to simplify complex accesses to the database. The advantage of using a view is that you can tune the view without changing the application source code. Once again, we found it was difficult to get the view definition when needed, so we added this feature to R*SQLab. The idea is that if some of your "offensive" SQL statements use views, tuning the view can improve performance significantly. For example, if the view contains a GROUP BY operation, it forces the optimizer to resolve the entire view before other WHERE clause conditions can be applied. If, instead, the GROUP BY is moved out of the view and into the query itself, the response time may improve significantly.
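As a hypothetical illustration of the GROUP BY point (view, table and column names invented for this sketch):

```sql
-- Before: the GROUP BY inside the view is resolved for ALL customers
-- before the outer WHERE clause can narrow the result to one of them
CREATE OR REPLACE VIEW order_totals AS
  SELECT customer_id, SUM(amount) AS total
    FROM orders
   GROUP BY customer_id;

SELECT total FROM order_totals WHERE customer_id = 42;

-- After: filter first, then aggregate only the surviving rows
SELECT SUM(amount) AS total
  FROM orders
 WHERE customer_id = 42;
```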
- Concurrent Manager configuration
In Oracle Financials, a set of processes called the Concurrent Manager acts as a batch scheduling system. The Concurrent Manager defaults may not be suitable for the needs of a particular implementation. R*SQLab showed that 12 percent of the instance resources were consumed by two SQL statements that scanned the MTL_TRANSACTIONS_INTERFACE and MTL_SALES_ORDERS tables. These statements were executed every minute by the Sales Order Interface program. However, in this implementation, sales were accrued during the day and were submitted to the Sales Order Interface program at night. By changing the setting in the Concurrent Manager to run the interface program once at night, we saved 12 percent of the instance activity.
Summary
Tuning production applications and packaged software is like walking a tightrope. On the one hand, you do not want to make perturbations that shake the stability of the system. On the other hand, you want to improve performance. We have identified a three-step approach to accomplish just that. Unfortunately, the tools provided with a standard installation of Oracle (even with Enterprise Manager) do not give all the capabilities necessary to accomplish the job. We realized that third-party tools are needed and identified a few vendors that offer such products.
Using these products, we managed to accomplish multiple goals. We opened the "black box" and looked inside the guts of our package implementation (Oracle Financials). By doing so, we learned a great deal about what the programs were doing and how the tables and indexes were being used. We used reports and graphs to convince management to purchase more disk space. The additional disk packs provided us with more choices, which translated into better performance. System administrators and developers with little Oracle experience managed to tune the system with limited support from expensive consultants. Management achieved increased reliability and better predictability from the system.
About the Author Eyal Aronoff has been working with Oracle databases and tools since 1985. Eyal has performed benchmarking, tuning and capacity analysis for some of the world's largest Oracle databases.
In 1991, Eyal founded R*Tech Systems, Inc. to provide a range of products and services to Oracle projects. At R*Tech, Eyal headed the development of AdHawk Monitor, an Oracle performance monitor; AdHawk Spacer, a database layout manager; and, most recently, R*SQLab, a SQL collection and tuning tool.
Eyal publishes regularly in the major Oracle magazines and newsletters. He is also an active speaker at user conferences. He was nominated "Best Speaker" at the East Coast Oracle (ECO) User Group 1995 annual meeting.
Eyal can be reached at:
R*Tech Systems, Inc.
+1.609. 279.0709
+1.800.774.4295 (US only)
Email: 75547,[email protected]
This is a copy of an article published at http://www.ioug.org/