
Backing Up the Oracle Enterprise
Strategies for managing the backup and recovery of Oracle databases
By Derek Ashmore
DBMS, May 1998

With the advancement and proliferation of client/server systems over the past few years, the number of database servers in most enterprises has grown enormously. The management of a small number of servers is not problematic, but as the number of servers grows, it becomes more and more difficult to administer databases at an individual level. Furthermore, during the migration from mainframe servers to other types of servers, backup procedures have become less standardized and often nonexistent. This article concentrates on ways to get a portion of these administrative duties (specifically backups and recoveries for Oracle databases) under control at an enterprise level. As much has been written about the technical aspects of backing up Oracle databases, I will concentrate on using those specifics to help develop and implement enterprise-wide strategies for backing up Oracle databases.

For technical background on how backup and recovery operations can be performed on Oracle databases, please see the accompanying technical sidebar.

In this article, I'll discuss the new features provided in Oracle8. I will then identify the major characteristics of backup strategies that should be considered when forming an enterprisewide plan, and the qualities an enterprisewide strategy should possess in order to be effective.

Planning Oracle Backup Strategies at an Enterprise Level

Oracle backup strategies should be conceived and implemented at an enterprise level. With the exponential growth in the number of servers that has occurred at most shops over the years, it is no longer possible to address backup concerns at a server level. Enterprise backup strategies for Oracle databases should be consistent, reliable, extensible, and usable.

Consistency is a key ingredient in an effective enterprisewide backup strategy. The scripts or programs used to conduct physical backups and full exports must be identical (minor differences between operating systems excepted) for all databases and servers; otherwise, the learning curve for new administrators will be tremendous. Consistency also reduces the overall staff hours required to administer backups. Backup scripts developed in house can be made consistent across servers by making them data and/or parameter driven, as the sketch below illustrates.
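As a concrete illustration, here is a minimal sketch of such a parameter-driven wrapper. The configuration file layout, paths, and function names are hypothetical, not a standard:

#!/bin/sh
# Minimal sketch of a parameter-driven backup wrapper. One copy of
# this script runs unchanged on every server; per-database behavior
# comes from a configuration file, not from edits to the script.
#
# Assumed (hypothetical) format of /etc/ora_backup.conf:
#   SID:MODE:DESTINATION    e.g.  PROD1:hot:/backup/PROD1

CONF=/etc/ora_backup.conf
LOG=/var/log/ora_backup.log

# Stubs; the real backup logic would live in shared functions.
do_hot_backup ()  { echo "`date`: hot backup of $1 to $2"  >> $LOG; }
do_cold_backup () { echo "`date`: cold backup of $1 to $2" >> $LOG; }
do_full_export () { echo "`date`: full export of $1 to $2" >> $LOG; }

while IFS=: read SID MODE DEST
do
    case "$MODE" in
        hot)    do_hot_backup  "$SID" "$DEST" ;;
        cold)   do_cold_backup "$SID" "$DEST" ;;
        export) do_full_export "$SID" "$DEST" ;;
        *)      echo "`date`: unknown mode '$MODE' for $SID" >> $LOG ;;
    esac
done < $CONF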

Reliability is also essential. Backups from which you can't recover are useless. Backups also need to be monitored for errors; ideally, this monitoring should be automated, with any errors brought to the attention of an administrator immediately. To ensure reliability, I recommend periodic, random restoration tests for all production databases.
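Automating that monitoring does not have to be elaborate. The following sketch, with an assumed log directory and mail alias, scans the previous night's backup logs and mails anything suspicious to an administrator:

#!/bin/sh
# Sketch: scan backup logs for errors and alert an administrator.
# The log directory and mail alias are assumptions.
LOGDIR=/var/log/ora_backup
ADMIN=dba-oncall
TMP=/tmp/backup_errs.$$

for F in $LOGDIR/*.log
do
    # ORA- covers Oracle server errors; EXP- covers export problems.
    grep -E 'ORA-|EXP-|ERROR' "$F" > $TMP 2>/dev/null
    if [ -s $TMP ]    # -s: file exists and is non-empty
    then
        mailx -s "Backup errors in `basename $F`" $ADMIN < $TMP
    fi
done
rm -f $TMP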

Extensibility is critical, and refers to the ability to include new servers and/or new databases into the backup schedule easily. Many of my clients have created utilities that sense when a new database has been added to a server and automatically include it in the backup schedule. Without extensibility, the manual labor (and probability of error) involved in incorporating a new database into the backup schedule would increase. Database changes should be automatically detected by the backup package as well.
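One way to get this behavior on Unix is to treat Oracle's oratab file as the authoritative list of databases on the server. A sketch follows; the file locations and configuration format are the same hypothetical ones used above:

#!/bin/sh
# Sketch: register any database found in oratab but missing from the
# backup configuration, so new instances are picked up automatically.
ORATAB=/etc/oratab           # /var/opt/oracle/oratab on some platforms
CONF=/etc/ora_backup.conf
LOG=/var/log/ora_backup.log

grep -v '^#' $ORATAB | cut -d: -f1 | while read SID
do
    [ -z "$SID" ] && continue
    grep "^$SID:" $CONF > /dev/null 2>&1 || {
        # Conservative default; a DBA reviews and adjusts it later.
        echo "$SID:hot:/backup/$SID" >> $CONF
        echo "`date`: added $SID to backup schedule" >> $LOG
    }
done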

Usability is desirable from a resource standpoint. The more difficult the backup software is for new people to learn and understand, the more it will cost to train the people who maintain it.

The backup apparatus should also be robust enough to handle different scheduling requirements for each database. Speed of backup and application load are key determinants as to when an Oracle database can be backed up.

Backup speed depends on the speed of the tape, network, and/or disk the backup uses. It also depends on the software performing the backup (for example, whether or not parallel streams are possible).

Application load (and server surplus capacity) needs to be identified and prioritized. For large databases, the backup will probably place a noticeable load on the server. The backup should be scheduled in such a way that peak transaction periods are avoided. If the application has periods where processing is more critical than others, these times should also be avoided.

It is possible with many systems to schedule a hot backup during online hours, as online processing is often much lighter than batch processing. Usually, there is a lot of surplus capacity when the online applications are active. Furthermore, most online applications issue fewer updates, deletes, and inserts than their batch counterparts. As a good rule of thumb, most production servers should be backed up once per day.
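On Unix servers, this scheduling usually comes down to cron entries. The times below are illustrative; ora_backup.sh is the hypothetical wrapper sketched earlier, and ora_backup_check.sh stands in for the log-monitoring sketch:

# crontab fragment (minute hour day-of-month month day-of-week command)
# Nightly backups at 2:00 a.m., after the batch window closes:
0 2 * * * /usr/local/bin/ora_backup.sh >> /var/log/ora_backup.log 2>&1
# Scan the logs and alert the on-call DBA once the backups finish:
0 6 * * * /usr/local/bin/ora_backup_check.sh >> /var/log/ora_backup.log 2>&1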

Resources, including personnel and hardware, must be dedicated to periodically testing backups and restores. Typically, a server used to test the integrity of backups does not need much memory or CPU power. It does, however, need enough disk to restore your largest database.

Backup strategies should be driven by business requirements. Business requirements determine availability requirements for an application and database. Business requirements also determine database size, which in conjunction with hardware/software configuration, determines restoration time. I consider an application to have high availability requirements when there is no window for cold physical backups. Usually, this only happens when there is a high dollar cost to downtime.

Restoration time needs to be considered, as it contributes to lost service time. Your hardware/software configuration determines restoration time.

Characteristics of Backup Strategies

Backup strategies differ along the following dimensions: locality, storage media, toolset, database size, availability requirements, and life cycle. All of my comments in this section pertain equally to Oracle7.x and Oracle8, except where specifically noted.

Locality

Locality refers to whether the backup occurs on the server where the database resides (local) or whether the backup occurs on a remote server via a network. Small shops with few servers tend to perform local backups. They are easier to create and administer. On a small number of servers, there are few benefits to centralizing backup hardware and software.

Remote backups are initiated and performed largely by a separate server via the network. Shops with large numbers of servers tend to centralize remote backups on a few servers. Administration costs for remote backups are lower because they can be shared across many applications. Typically, you must use a high-capacity tape jukebox, as the amount of data backed up by a centralized server far exceeds the capacity of most individual tapes.

I recommend that you install a second high-speed network for remote backups. As the volume of data transmitted through the network is likely to be large, online applications should not be made to compete with backups for network bandwidth. At most shops, all servers are typically in close proximity. This means that costs for a second network would probably be comparatively small.

Storage Media

Storage media refers to how backups are stored in the long term. Most shops store backups on tape. In some cases, depending on transaction volume, recovery requirements, and/or the size of the underlying database, the most recent backup may be kept on disk, which is typically faster to write to than tape. Recall from the sidebar that archive log generation is inflated during hot backups; backing up to disk shortens the window during which that inflated archive volume must be managed. Usually, a tape copy of the disk copy is taken as well.
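A minimal sketch of this disk-then-tape pattern follows; the device names and paths are assumptions:

#!/bin/sh
# Sketch: the backup itself writes datafile copies into a disk
# staging area; afterward, the staging area is copied to tape.
STAGE=/backup/stage/PROD
TAPE=/dev/rmt0
LOG=/var/log/ora_backup.log

# ... the hot or cold backup populates $STAGE here ...

cd $STAGE || exit 1
find . -print | cpio -o > $TAPE
if [ $? -ne 0 ]
then
    echo "`date`: tape copy of $STAGE failed" >> $LOG
    exit 1
fi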

For local tape backups, each server has a tape drive, and operators change tapes manually. For larger databases whose backups span several tapes, some sort of tape jukebox or carousel is helpful. Whatever tape device or subsystem is used, consistent clerical procedures are required for labeling and tracking tapes correctly and for loading and unloading them periodically. I've seen many clients who do not standardize or establish procedures for cataloging and storing tapes; this is a dangerous practice.

Toolset

The toolset used to perform backups can be developed in house or purchased. As with most decisions, there are advantages and disadvantages to each of these approaches.

In my experience, utilities developed in house are usually cheaper to develop and maintain than a purchased toolset (for example, Legato Systems Inc.'s NetWorker or IBM's ADSM), although purchase price is not the whole cost. Coding these utilities requires in-house expertise (scripting and DBA skills), however, which many companies don't have. And because homegrown utilities are typically tested less thoroughly than commercial products, and thus have a higher chance of error, the total cost of ownership should in some way reflect this increased risk.

The pricing structure for backup and recovery packages varies among vendors. The vendors I'm familiar with base price on a combination of hardware platform, number of servers, and amount of data backed up. Recently, for one of my clients, I priced the following configuration: four AIX servers, two Novell servers, two NT servers, three Oracle databases, one SQL Server database, 30GB of data to be backed up, and one tape jukebox. I received quotes ranging from $25,000 to $79,000 from several vendors. By contrast, homegrown scripts can be fully developed and tested in a few days to two weeks, depending on the skill of the developer, with maintenance costs significantly below the support fees charged by most vendors.

Because most developers try to minimize development time, most homegrown scripts are less robust than commercial toolsets (although I've seen exceptions). Beware of the assumptions coded into homegrown scripts; I've seen several nerve-wracking ones. For instance, I've seen scripts that blindly assume a tape is in the tape device (that is, no return-code or status check on the cpio or dd utilities). I've also seen database file names hardcoded into backup scripts; if a database administrator adds a datafile to the database and forgets to add that file to the scripts, the backup is silently incomplete.

For backup scripts developed in house, I make the following recommendations (the sketch after this list illustrates the first two):

- Drive the scripts from a configuration file or from the data dictionary; never hardcode database names or datafile names.
- Check the return code of every command, especially tape utilities such as cpio and dd.
- Log all activity and route any errors to an administrator automatically.
- Periodically test that you can actually restore from the backups the scripts produce.
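The sketch below pulls the datafile list from the data dictionary at run time and checks every return code. The credentials, paths, and the use of system/manager are purely illustrative; for a hot backup, each tablespace would also have to be placed in backup mode before its files are copied, which is omitted here for brevity:

#!/bin/sh
# Sketch: dictionary-driven datafile backup with return-code checks.
ORACLE_SID=PROD; export ORACLE_SID
STAGE=/backup/stage/$ORACLE_SID
LOG=/var/log/ora_backup.log
LIST=/tmp/dbfiles.$$
mkdir -p $STAGE

# Ask the database itself which datafiles currently exist.
sqlplus -s system/manager <<EOF > $LIST
set pagesize 0 feedback off heading off
select name from v\$datafile;
exit
EOF

while read F
do
    dd if="$F" of=$STAGE/`basename "$F"` bs=1048576
    if [ $? -ne 0 ]
    then
        echo "`date`: backup of $F failed" >> $LOG
        rm -f $LIST
        exit 1
    fi
done < $LIST
rm -f $LIST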

In my experience, two popular players in the backup market are Legato and IBM. Both products can use the Enterprise Backup Utility (EBU) for Oracle7.x databases to provide information about which physical files and volumes require backup. Both also require a database extension that interfaces with EBU at an API level. Under Oracle8, EBU's functionality is implemented as Recovery Manager, and GUI tools provided with Oracle Enterprise Manager make these features easier to use.

It is worth noting that DataTools (recently acquired by BMC) offers an alternative to EBU called SQL-BackTrack. SQL-BackTrack interfaces with the Legato and IBM products, as well as with the Oracle database software itself, to perform backups. I've seen many shops use SQL-BackTrack instead of EBU. This product should be considered if you are planning to purchase backup and recovery software.

Using commercial products enforces the consistency and reliability that are key to an effective enterprisewide backup strategy. And because some toolsets can back up a database in parallel streams, their backups can be more efficient than less sophisticated homegrown scripts.

Both products require significant configuration and testing time; configuring Legato's NetWorker or IBM's ADSM is not as simple as running setup.exe. Both products allow "user exit" scripts for pre- and post-processing tasks, and most implementations I've seen rely on this feature.

Do not assume that purchasing a package means you do not need to test your restore capability. You must also be aware of software version dependencies: for example, Oracle v7.0.16 is no longer supported by some toolsets, and specific versions of EBU may be required for particular v7.x databases.

When purchasing a commercial product to perform backups, I make the following recommendations:

- Verify that the product supports your exact database and operating-system versions before you buy; version dependencies such as the EBU requirements noted above are easy to overlook.
- Budget significant time for configuration and testing, including any user-exit scripts you will rely on.
- Evaluate alternatives such as SQL-BackTrack alongside EBU-based offerings.
- Test your restore capability just as rigorously as you would with homegrown scripts.

Database Size

Backup strategies also differ by database size. Very large databases (VLDBs), typically those over 100GB, are problematic in terms of backup and recovery time. With most disk or tape subsystems, backing up or restoring a VLDB takes hours, which makes finding a backup window difficult.

If a VLDB has high availability requirements (for example, 24x7x365), a lengthy recovery can be catastrophic. Because recovery times for VLDBs are so long, these systems are more likely to require a parallel server implementation to decrease the probability of a system crash and, thus, the need for a recovery. One way to decrease the possibility of a disk crash (one of the more common types of system failure) is to employ some type of disk redundancy, such as mirroring or RAID 5. I always recommend some type of disk redundancy for all production applications, including parallel server implementations.

I've seen some shops attach the server to an IBM mainframe (frequently referred to as an ESCON or channel attach) and perform backup and recovery operations over the channel to manage the large volume. This approach takes advantage of the backup methods and procedures already established in most mainframe shops.

Recovery Manager incorporates an incremental backup feature that drastically reduces the time and space requirements for backups (see the technical background sidebar, pg. XX). However, because incremental backups contain only changed blocks, recovery requires restoring the most recent full backup and then applying every incremental backup taken since; all blocks must ultimately be restored. Hence, on most platforms, no improvement in recovery time should be expected from incremental backups.
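For illustration, here is a hedged sketch of what an incremental backup looks like under Oracle8 Recovery Manager, driven from the shell. The connect strings are illustrative, and early Oracle8 releases use the keyword rcvcat where later ones use catalog:

#!/bin/sh
# Sketch: weekly level 0 (full) and nightly level 1 (changed blocks
# only) backups under Recovery Manager. Connect strings illustrative.
LEVEL=${1:-1}     # pass 0 for the weekly full backup

rman target / rcvcat rman/rman@rcat <<EOF
run {
  allocate channel t1 type 'SBT_TAPE';
  backup incremental level $LEVEL database;
  release channel t1;
}
exit
EOF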

Availability Requirements

Backup strategies also differ by availability requirements. A system that must be available only during normal business hours is much easier to back up than one that must be available 24 hours a day, 7 days a week. The unique needs of high-availability systems can be met by altering the hardware/software configuration.

One way to ensure higher availability is hardware redundancy: duplicate some or all pieces of the server and increase the odds that two identical pieces won't break at the same time. As disk crashes occur more often than other types of system crashes, one of the easiest and cheapest ways to increase availability is through disk redundancy. That is, use some type of disk mirroring or RAID so that there are multiple ways to access the same data.

Clustering technology allows multiple servers to share memory and disk. This type of technology provides redundancy for all machine hardware besides disks, such as controllers, backplane, and memory. As with disk redundancy, we play the odds that all resources of one type (for example, controllers) will not fail at the same time. Oracle's Parallel Server option takes advantage of clustering technology from a database perspective.

Some applications have an alternate server that is used only in the event of a system crash. Neither memory nor disk is shared between the production server and the failover server. Oracle supports this configuration with the "standby database" feature available in versions 7.3 and later. Because implementing the standby database feature is complicated and requires sophisticated user-developed scripts (a minimal sketch of one follows), I recommend clustering technologies for achieving high availability instead.
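To give a flavor of what those user-developed scripts involve, here is a minimal sketch of shipping archived redo logs to the standby host; the host names, paths, and use of rcp are assumptions:

#!/bin/sh
# Sketch: ship newly archived redo logs to the standby server, where
# they are applied with RECOVER STANDBY DATABASE. A real script would
# also track which logs have already been shipped.
ARCH=/oracle/PROD/arch
STANDBY=stbyhost
LOG=/var/log/ora_backup.log

for F in $ARCH/*.arc
do
    rcp "$F" $STANDBY:$ARCH/ || {
        echo "`date`: shipping $F to $STANDBY failed" >> $LOG
        exit 1
    }
done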

Life Cycle

Backup strategies can also be differentiated by environment: production, development, or test (that is, by life cycle). For example, many shops back up development environments by running full exports nightly. In many development (and even testing) environments, logical backups are sufficient, as it is usually more important to retain schema definitions (tables, indexes, and so on) than the actual data in those objects. Point-in-time recovery is rarely needed for development environments.
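A nightly full export is easy to script. A minimal sketch, with illustrative credentials and paths:

#!/bin/sh
# Sketch: nightly full logical backup of a development database.
ORACLE_SID=DEV1; export ORACLE_SID
DEST=/backup/DEV1
STAMP=`date +%Y%m%d`

exp system/manager full=y compress=y \
    file=$DEST/full_$STAMP.dmp log=$DEST/full_$STAMP.log

# Do not trust silence: check the export log for EXP- errors.
if grep 'EXP-' $DEST/full_$STAMP.log > /dev/null
then
    echo "`date`: export of $ORACLE_SID reported errors" \
        >> /var/log/ora_backup.log
fi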

Recommendations for Disaster Prevention

The most common forms of failure that necessitate recovery are disk or controller hardware failures and human error. Many of my recommendations are directed toward preventing these two types of failure.

As disk failure is by far the most common type of failure, you should use disk redundancy of some type for most production databases. Mirroring is often handled at the operating-system and/or hardware level; it involves keeping multiple copies of the same volume on different disks. If one copy becomes unavailable because of disk failure, the operating system shifts all read/write requests to the remaining copy or copies. Mirroring, however, obviously increases your disk requirements.

RAID 5 is also handled at the operating-system and/or hardware level. RAID 5 stores data across multiple disks (often called a RAID group) in such a way that the contents of a failed disk can be recomputed from the surviving disks. RAID 5 has lower space requirements than mirroring but is notoriously slow on write operations, so it is a good alternative for applications that are primarily read-oriented.

To achieve point-in-time recovery, at least one copy of a control file and one member of the active redo log group must survive a system crash. Hence, most production databases should keep two to four copies of these files on separate physical disks. (Note, however, that Recovery Manager can recreate the control file if you use a recovery catalog.)
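Multiplexing both files is a configuration matter. The sketch below shows an init.ora entry for three control file copies and the SQL for adding a second member to a redo log group; the paths and credentials are illustrative:

#!/bin/sh
# Sketch: multiplexed control files are named in init.ora, e.g.:
#
#   control_files = (/disk1/PROD/ctl1.ctl,
#                    /disk2/PROD/ctl2.ctl,
#                    /disk3/PROD/ctl3.ctl)
#
# A second member on a separate disk is added to redo log group 1:
sqlplus -s system/manager <<EOF
alter database add logfile member '/disk2/PROD/redo01b.log' to group 1;
exit
EOF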

It is also necessary to test backups periodically. In many cases, a flaw in the backup procedure is discovered only during a recovery, a crisis during which personnel shouldn't be learning contingency procedures for the first time. Some off-site arrangement for backups is also recommended; one company that provides off-site storage of backups is Data Base Inc. (www.dbi.com).

Safe and Sound

The new features provided with Oracle8 should favorably affect backup and recovery strategies across the enterprise. The introduction of incremental backup sets should make it easier to incorporate VLDBs into an enterprisewide backup strategy. Furthermore, the corrupted-block detection inherent in backups managed by Recovery Manager will provide useful information to database administrators.

Even with the introduction of Oracle8, the fundamental goals and issues of Oracle backup and recovery strategies, and the need to plan them at an enterprise level, have not changed. The fundamental goals of an enterprisewide backup strategy are consistency, reliability, extensibility, and usability. Without these properties, backup and recovery procedures will be a constant source of aggravation, lost data, and lost money.


Derek C. Ashmore is a consultant at Delta Vortex Technologies Inc. He has more than 10 years of experience with design, tuning, development, and administration of relational databases. He has supported and developed Oracle-based applications for more than 5 years. You can contact him at [email protected].


