Performance Monitoring's Cutting Edge
Howard Fosdick
A monitoring tool must be able to lead the DBA straight to a problem across heterogeneous, highly distributed environments. How are the major vendors meeting the challenge?
If Rip Van Winkle were an Oracle DBA, he'd only need to have taken a three-year nap to be astounded by the advances made in performance and system monitoring. He might be a little overwhelmed by how important performance and database management are to deriving full value from his corporation's applications and by how essential it is to monitor the Oracle database to ensure that it's available and working efficiently. He would have to learn about security, managing tablespaces and other database objects, proper placement of data on disk, and space management.
Understanding current trends is essential to deciding upon your company's approach to performance monitoring and what tools you may need. Performance management occurs at several different levels. The highest level relies on systemwide or aggregate statistics that define the state of the system. This high-level monitoring takes place operating system-, DBMS-, and applicationwide. And the combination of these three high-level viewpoints should constitute a "macro" view that provides immediate information on the state of the system and on the general nature of any problems that may exist.
A good monitoring tool will give you sufficient information to address any operating system-, database-, or applicationwide problems. For example, it should be able to alert you to turn on OS asynchronous I/O, increase the size of the database buffers, or adjust the application's batch schedule to fit your nightly window. In addition to these global views, you'll also need views to monitor individual programs, SQL statements, or queries.
At this level, a good performance monitoring and analysis tool should lead the intelligent programmer or DBA straight to the source of the problem and provide sufficient information to help resolve it. It should provide details on a broad range of program or query behavior, such as locking traces, real I/Os, buffer hits, and the like.
The market currently offers three types of performance-monitoring tools: those specializing in global views, those that focus on specific aspects of SQL statements or program performance, and those that integrate both views into a single tool. If you choose this third type, look carefully to see how it handles the transition from the global to the specific. For example, make sure it can trace a particular global problem down to the particular SQL statement or query whose behavior causes the problem.
Performance-monitoring tools are currently able to measure and display a vast range of "counters" or performance analytics. Computer Associates' Unicenter-TNG, for example, was initially oriented toward operating system and network availability and management but has now added components such as an Oracle Agent to expand the product's purview into the realm of database availability and reliability. In contrast, BMC Software's traditional core strength has been database technology, and its Patrol products have grown to cover closely related networking and operating system problems.
Platinum Technology also evolved from a core expertise in DBMSs and widened its scope by purchasing a large number of companies in the past two years that produced specific "point" solutions to operating system, networking, and database issues. Platinum is now busy molding these products into a more coherent, integrated whole. Oracle's products have always focused on database issues but necessarily cover some closely related platform availability concerns. Oracle's tools address several operating systems but target only Oracle's DBMS.
TRENDS
The past few years have seen a variety of trends in database management and performance (see Table 1). One overarching trend is the shift from physically centralized homogeneous systems toward distributed systems on a variety of platforms. Most companies today expect to be able to oversee a shortlist of diverse platforms from advanced workstations at any physical location. Environments frequently consist of various operating systems--MVS, various flavors of Unix, and Windows NT--and a mixture of databases, including Oracle, Informix, SQL Server, and DB2. Networks also add to the blend, with companies mixing NetWare and LANs, wide-area TCP/IP networks, and mainframe networks based on SNA and VTAM.
Databases no longer function in isolation: They are bound up, inseparably, with operating systems and networks, with availability and performance issues cutting across all three areas. Companies want to move systems out to their users and design their computer support according to the companies' needs (not the other way around, as was necessary when expensive computer hardware dictated the plan). For this reason, monitoring tools must be able to handle heterogeneous, highly distributed environments.
The tools from Computer Associates, BMC, Platinum, and Oracle represent a new breed of monitoring/availability tools that have evolved to meet most of these requirements only in the past several years. How have they done it? One key is that all their products are based on intelligent agent architecture, a design that places independent modules, or agents, on each monitored node in a distributed system (see Figure 1). These agents are responsible for collecting performance data at the node on which they run and reporting it back to a central collector node. Workstations running GUI tools then access the data from the collector node and present it to administrators, DBAs, and other support personnel.
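To make the division of labor concrete, here is a minimal sketch of the agent-and-collector pattern just described. It is an illustration only, not any vendor's implementation: the class names (Agent, Collector, Sample), the node names, and the metric values are all hypothetical, and a real product would move the data over a network rather than through in-process calls.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Sample:
    """One measurement reported by one agent (hypothetical record layout)."""
    node: str
    agent: str
    metric: str
    value: float


@dataclass
class Agent:
    """Hypothetical agent: gathers metrics only on the node where it runs."""
    node: str
    name: str
    probe: Callable[[], Dict[str, float]]  # returns metric name -> value

    def collect(self) -> List[Sample]:
        return [Sample(self.node, self.name, m, v) for m, v in self.probe().items()]


@dataclass
class Collector:
    """Hypothetical central collector that stores whatever agents report."""
    samples: List[Sample] = field(default_factory=list)

    def report(self, batch: List[Sample]) -> None:
        self.samples.extend(batch)

    def latest(self, node: str, metric: str) -> float:
        # Scan newest-first so a console sees the most recent reading.
        for s in reversed(self.samples):
            if s.node == node and s.metric == metric:
                return s.value
        raise KeyError((node, metric))


# Each node runs only the agents it needs; the values here are made up.
collector = Collector()
agents = [
    Agent("db01", "oracle", lambda: {"buffer_hit_ratio": 0.93}),
    Agent("db01", "os", lambda: {"cpu_busy": 0.41}),
    Agent("db02", "os", lambda: {"cpu_busy": 0.77}),
]
for agent in agents:
    collector.report(agent.collect())
```

A GUI console would then query the collector (for example, collector.latest("db01", "buffer_hit_ratio")) rather than contacting each node directly, which is what keeps the workstation side independent of how many nodes or agent types exist.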
Separating remote agents from the collector mechanism can yield many benefits. For example, individual nodes only need to run the code or agents necessary to monitor that machine. One node might run an Oracle agent, another a Sybase agent, and another both. Similarly, machines can run operating system-specific agents for platforms such as AIX, HP/UX, Solaris, and NT.
Separating remote agents from the collector mechanism also ensures higher reliability in a distributed system. If a node goes down, only its agent(s) are out of the picture. The collector continues to oversee other platforms and agents. When the node recovers, it can "catch up" by receiving any queued messages, commands, or schedules from the collector node. Thus an agent architecture permits the greatest degree of reliability and flexibility possible in complex distributed systems.
The relative "intelligence" of agents is a competitive issue among vendors. Some agents are merely data-gathering and reporting tools. CA Unicenter-TNG, BMC Patrol, Platinum POEMS, and Oracle Enterprise Manager and Performance Pack all go beyond these simple tasks by raising event alerts or triggers that notify administrators when predefined thresholds are exceeded. For example, if the Oracle log archive directory fills up, intelligent agents in the tools monitoring the event can either notify a person (through console flags, email, phone, or pager) or trigger a program that can automatically address the problem. Either way, the agents are able to take measures proactively before Oracle is affected.
Ultimately, agents may develop a greater degree of independent decision making. After all, as a DBA, do I need to know at 3 a.m. that the log archive has filled up? Why not just have the intelligent agent run a corrective script to fix the problem, and then have it send me an email that I'll see when I log in first thing in the morning? The degree of "proactivity" in agents is as important as their intelligence.
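The choice between notifying a person and running a corrective script can be sketched in a few lines. This is a hedged illustration of the threshold-and-action idea, not the behavior of any product named here: ThresholdRule, evaluate, the metric name, the 90 percent limit, and the cleanup function are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ThresholdRule:
    """Hypothetical rule: a limit plus an optional automatic fix."""
    metric: str
    limit: float
    corrective_action: Optional[Callable[[], str]] = None


def evaluate(rule: ThresholdRule, value: float, notifications: List[str]) -> str:
    """Return 'ok', 'corrected', or 'alerted' for one reading."""
    if value <= rule.limit:
        return "ok"
    if rule.corrective_action is not None:
        # Proactive path: fix first, then leave a message for the morning.
        outcome = rule.corrective_action()
        notifications.append(f"{rule.metric} at {value} exceeded {rule.limit}; fix ran: {outcome}")
        return "corrected"
    # Passive path: a human has to respond (console flag, email, pager).
    notifications.append(f"{rule.metric} at {value} exceeded {rule.limit}; action needed")
    return "alerted"


def move_old_archive_logs() -> str:
    # Stand-in for a real cleanup script; it only reports what it would do.
    return "moved old archive logs to secondary storage"


inbox: List[str] = []
passive = ThresholdRule("archive_dir_pct_full", 90.0)
proactive = ThresholdRule("archive_dir_pct_full", 90.0, move_old_archive_logs)

results = [
    evaluate(passive, 72.0, inbox),    # under the limit
    evaluate(passive, 96.0, inbox),    # over the limit, no fix defined
    evaluate(proactive, 96.0, inbox),  # over the limit, fix runs
]
```

The only difference between the 3 a.m. page and the morning email is whether a corrective action is attached to the rule, which is why configurability of these rules matters as much as the agent's data gathering.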
The current trend is to shift from simple monitoring to sophisticated independent action and the ability to address and correct an increasingly wide range of problems. The ultimate goal is to fully automate systems monitoring and correction through agents.
Several competitive points also arise among vendors concerning the collector node. Can responsibility for the collector shift dynamically among nodes as performance or availability requirements dictate? What resources and overhead do the collector node and remote monitoring agents consume? Does the data reside in a standard database that administrators can query directly?
A key feature in making agent architecture work well is the ability to schedule and monitor batch jobs and scripts remotely. The collector node needs to be able to create job schedules of arbitrary complexity in the distributed system, then communicate with the affected remote nodes, assign them their portions of the schedule, and ensure that they successfully execute the programs. Not only must the monitoring tool have a sophisticated job scheduler embedded, but this job scheduler must also address all the complexities inherent to distributed environments. For example, if a node goes down, the tool must know whether and when to reschedule the job, ensure that the job still completes, and notify administrators as to what happened and why.
Intelligent agent architecture also yields design benefits in distributed scheduling. Individual agents are responsible for executing and monitoring their portions of the overall schedule on their individual machines: They can queue jobs for execution, organize scheduling, and reschedule jobs in a timely manner if the machine becomes unavailable for some reason or if a crisis occurs.
EXPERTISE ISSUES
Performance measurement, management, and improvement is typically a high-expertise endeavor. DBAs and support personnel often work with product-specific terminology. How well a tool masks this complexity--by explaining terms or concepts and rendering the environment easier to manage--is a key aspect of product differentiation. Of course, the trend is toward having software rather than humans manage system performance, reducing the need for expertise. For example, if a system is intelligent enough to see the need for a data reorganization, generate the script to achieve this task, schedule the script, and oversee its successful completion, fewer personnel will need to have intimate knowledge of reorganization utilities.
UNICENTER-TNG
Computer Associates first gained recognition as a leader in providing add-in software for data center management back in the mainframe era. In the 1990s, it expanded its purview to cover almost all the common operating systems as well as the networking components of distributed systems. Recently it has expanded into database monitoring and problem resolution as well.
The company tackled the challenge of remote database management with its Oracle Agent. As Table 4 shows, this intelligent agent handles a number of common database problems in the areas of table and tablespace use, sequence numbers, disk I/O, workload balancing, and licensing. As with competing products, Oracle Agent gathers key monitoring information from Oracle's dynamic V$ tables. It measures parameters in dba_users, dba_tablespaces, dba_extents, dba_free_space, dba_sequences, dba_data_files, dba_segments, and other areas.
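As an illustration of the kind of check such an agent might build on these views, the sketch below flags tablespaces that are running low on free space. It is an assumption-laden example, not CA's code: the query text is a plausible join of dba_data_files and dba_free_space on their tablespace_name and bytes columns and is shown only as a string (it is not executed here), while the function low_space, the 10 percent limit, and the sample rows are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# Illustrative query only; it is never run in this sketch. It totals
# allocated bytes per tablespace and joins in the free bytes, if any.
FREE_SPACE_SQL = """
SELECT df.tablespace_name, df.total_bytes, NVL(fs.free_bytes, 0)
FROM (SELECT tablespace_name, SUM(bytes) AS total_bytes
        FROM dba_data_files GROUP BY tablespace_name) df
LEFT JOIN (SELECT tablespace_name, SUM(bytes) AS free_bytes
             FROM dba_free_space GROUP BY tablespace_name) fs
  ON fs.tablespace_name = df.tablespace_name
"""

Row = Tuple[str, int, int]  # (tablespace name, total bytes, free bytes)


def low_space(rows: Iterable[Row], min_free_pct: float) -> Dict[str, float]:
    """Return tablespaces whose free percentage is below min_free_pct."""
    flagged: Dict[str, float] = {}
    for name, total_bytes, free_bytes in rows:
        pct_free = 100.0 * free_bytes / total_bytes
        if pct_free < min_free_pct:
            flagged[name] = round(pct_free, 1)
    return flagged


# Made-up rows standing in for the query's result set.
sample_rows = [
    ("SYSTEM", 200_000_000, 90_000_000),
    ("USERS", 500_000_000, 20_000_000),
    ("TEMP", 100_000_000, 0),
]
flagged = low_space(sample_rows, min_free_pct=10.0)
```

With these sample rows, USERS and TEMP fall below the limit and SYSTEM does not. Keeping the threshold logic separate from the query means the same check can be fed from a live connection or from data the collector has already stored.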
A key distinguishing factor among products is configurability: How customizable is the tool, and how much expertise does customization require? The Oracle Agent is highly configurable and can be tailored as necessary for unique environments.
Computer Associates' remote agent architecture relies on SNMP running on top of TCP/IP for communication. The tools use the management information base as their repository. Whether products employ widely recognized standards such as these wherever possible is another distinguishing competitive feature.
While CA's tools are exceptional for monitoring and managing operating systems, their Oracle-specific capabilities are still evolving. The Oracle Agent focuses primarily on availability issues and threshold alerts. It offers no administrative interface for the DBA and little that aids database performance analysis and tuning.
OEM AND PERFORMANCE PACK
As you'd expect, Oracle's tools have a different scope than those of its competitors; they monitor Oracle DBMSs only. However, they accomplish this task across different operating systems (and do an excellent job of masking operating system differences).
OEM's GUI enables four major functions: job scheduling (for scheduling and overseeing of remote jobs), event management (for remotely monitoring database events and raising alerts if thresholds are exceeded or problems occur), service discovery (for automating setup of the environment by discovering Oracle services throughout the network), and utilities (for database management, maintenance, and optimization). See Table 5 for a list of individual GUI tools.
Oracle emphasizes the fact that its OEM console permits the use of third-party plug-in tools and allows customers to integrate their own applications. Beyond OEM, Oracle offers a separate license for its Performance Pack, its basic tool for dynamic performance monitoring and tuning (see Table 6). This add-in integrates so seamlessly into OEM that many DBAs are not aware that a separate license is involved. Finally, Oracle offers additional systems management applications for specific database functions, such as Replication Manager, Media Server Manager, Parallel Server Manager, WebServer Manager, and Biometric Manager for sign-on via fingerprint.
MORE TO COME
This brief review of two major products for monitoring, managing, and tuning Oracle databases illustrates how far such products have come in the past three years and how far they have yet to go. Even the largest, most aggressive vendors are challenged by the tremendous scope of the tasks before them in meeting customer needs. One thing's certain: If we gave Rip Van Winkle another three-year nap, he'd wake up even more astounded by how far we'll have traveled toward automated monitoring of highly distributed heterogeneous systems.
Howard Fosdick is an independent consultant who works with Oracle, SQL Server, and DB2 under various Unix flavors and Windows NT; he specializes in performance and availability issues. You can reach Howard at [email protected].
This is a copy of an article published @ http://www.oreview.com/