By Bruce McCay
Oracle Corporation introduced the ability to replicate tables via snapshots in the first production release of Oracle7. This simple form of asymmetric, asynchrounous replication required the distributed and procedural options, and was fairly narrow in scope, limiting the breadth of its usefulness. Today, we find ourselves presented with a more dynamic, robust and flexible complement of replication features, substantially extending their usefulness and functionality.
This article outlines some of the new terminology associated with Oracle7's replication features, and provides a high-level overview of the configurations and capabilities of those features. While the symmetric replication facility also provides for job scheduling and the use of deferred transactions, the focus here remains on the replication aspects.
According to relational theory, among the requirements of a distributed database management system (DDBMS) is the ability to create and maintain multiple copies of source data on different CPUs. This is known as the replication characteristic, and there are two types -- synchronous and asynchronous. Synchronous replication (REAL TIME) is the ability to maintain more than one copy of the same data in different locations, and to keep the data identical at all times. Synchronous replication in pre Oracle7.1.6 releases required the use of database triggers or application code, and implementation/maintenance was sometimes problematic.
Asynchronous replication also maintains multiple copies of data, but they are updated as more of a batch process executed manually or at timed intervals. Versions of Oracle prior to 7.1.6, handle asynchronous replication via snapshots, which are read-only tables pointing to some table, view, other snapshot, or any combination of these objects. They are particularly useful for those objects whose values change infrequently. Snapshots are most often created on a local database in a client/server configuration to take advantage of the processing power of that CPU. Queries/reports are issued against the snapshot rather than the master object(s), increasing performance by bypassing the normal network traffic and overhead associated with those operations.
Snapshots can also be horizontally or vertically fragmented representations of some master object(s), essentially pre-screening non essential data for a given user, report or application. Being read-only, this kind of snapshot is sometimes known as "one-way," because it cannot update its source. This implementation is also called asymmetric replication.
Another kind of implementation is symmetric replication, which essentially represents an "update-anywhere" model, and is sometimes called N-way replication. The data source (MASTER), can actually be updated by its replicated subsets (SLAVES). While Oracle Corporation�s new symmetric replication is designed to implement N-way replication, other excellent features are also included in the product.
- Multiple copies of source data can be created and maintained at different sites throughout a distributed environment.
- Not only are data level changes propagated between sites, but structural schema changes such as altering a column or changing procedure code, can be propagated as well.
- Also included are conflict detection mechanisms for automating system supplied or user defined conflict resolution routines.
- Finally, data consistency between remote databases is guaranteed, including referential integrity, even if a remote site becomes unavailable.
The benefits of symmetric replication include avoidance of network traffic during updates, and the ability to continue to make local updates should the remote, master site become unavailable. It offers a means of providing some fault tolerance to applications by allowing for multiple master source sites. Changes continue to be maintained at the local sites until they are successfully propagated forward (usually as a separate, batch transaction). Within the replication facility exist many potential design configurations and combinations of configurations depending on the functional requirements for a given application.
Replication Components
Previously, we defined a snapshot as a read-only table pointing to some table, view, other snapshot, or any combination of these objects. This definition applies to a configuration called �primary site replication,� where the master site owns the data, and is the only site which can update that data. Snapshots may be simple -- pointing to a single master data set or master subset, or be complex -- combining master objects. Snapshots can be updated (refreshed) from the primary site master automatically at a specified time or time interval, or manually, by executing the DBMS_SNAPSHOT.refresh procedure or the DBMS_REFRESH.refresh procedure. Automatic refreshes require at least one snapshot process on the instance where the snapshot resides, and are controlled via the following init.ora parameters:
- snapshot_refresh_processes: Specify an integer designating the total number of processes to be allocated Process names are SNP1, SNP2, etc. Default is 0, maximum is max 10.
snapshot_refresh_interval: Specify (in seconds) how often the process(es) should wake up to see if refreshes are scheduled. Default is 60, range is 1 - 3600.
- snapshot_refresh_keep_connections: Default is FALSE which signifies that remote connections made in order to refresh snapshots are to be closed after all refreshes for that wakeup cycle are completed.
To avoid complete rebuilds of a snapshot whenever it�s refreshed, a snapshot log can be created which results, essentially, in an incremental update of the snapshot ("fast" refresh). Snapshot logs must reside on the instance containing the master table and may point only to a single table. Snapshot logs may not be created on tables with no rowids (fixed tables) or on tables owned by SYS. Since the tables underlying snapshots and snapshot logs are �real� tables in the database, they should be accounted for in the physical model and implementation/distribution plan.
Snapshot logs implicitly create an AFTER ROW trigger on the master table (TLOG$_snapshot_name).. For pre 7.1 versions of Oracle this requires merging any similar trigger types for affected tables.
Distributed transaction processing (DML commands, commits, rollbacks, etc) requires additional mechanisms and cannot take advantage of snapshots in pre 7.1 versions. This is, however, accounted for with symmetric replication. Snapshot names should be limited to 19 characters or less to avoid name truncation, and support user-defined uniqueness. Since complex snapshots require complete refreshes, set PCTFREE to 0 and PCTUSED to 100.
Time related specifications in the CREATE SNAPSHOT command must use fractions.
Sample Snapshot Create Statement
If creating snapshot logs, do them first (on same machine as base object):
CREATE SNAPSHOT LOG ON tablename: creates a table "MLOG$_tablename" limit: one snapshot log per table. TABLESPACE logs: assign to a tablespace (defaults to "DEFAULT TABLESPACE") Create snapshot in instance where master object is to be replicated: CREATE SNAPSHOT testshot: name snapshot TABLESPACE users PCTFREE: first assignment, storage & space utilization parameters (assumes defaults if not cited) STORAGE (INITIAL 5M PCTINCREASE 0) REFRESH FAST: refresh type (fast/complete - required) START WITH TRUNC (sysdate) + 11/24: first refresh time after initial create - snapshots are always created and populated when the CREATE SNAPSHOT statement is first executed. �START WITH� specifies the first refresh time. Use TRUNC to strip the time out of SYSDATE & fractions to designate the time (this example says to do 1st refresh at 11:00 am). NEXT sysdate + (4*60)/24*60): subsequent refresh intervals. "NEXT" tells Oracle how often to execute each subsequent request (this example says every 4 hours). AS SELECT * FROM miaco.student@testremote: snapshot content (ANY valid SQL statement). WHERE degree_program = 'U' Remote database names must be fully qualified, and for automatic refreshes, database links must have been created using both the CONNECT TO and USING clauses. The CREATE SNAPSHOT command results in the creation of a table called SNAP$_snapshotname, and a view with the same name as the snapshot. To create snapshots, required privileges include CREATE SNAPSHOT, CREATE VIEW, CREATE TABLE, SELECT for any master objects not within your schema, and for simple snapshots, CREATE INDEX.
To create snapshots in another schema, you must have the CREATE ANY SNAPSHOT system privilege, SELECT privilege on master objects, and the schema owner must have the other privileges previously cited.
Read-only snapshots are separate from the symmetric replication facility and do not require any additional support. Snapshots can also be created using procedures in the DBMS_REPCAT package.
Snapshot Refresh Groups
Properly synchronizing two or more snapshots whose master objects have referential integrity relationships requires the use of snapshot refresh groups (production 7.1+). The DBMS_REFRESH package is supplied to handle setup and coordination of refresh groups. Integrity constraints must first be enabled for the master parent and child objects.
Use the MAKE procedure to group snapshots together for refresh:
make ( name IN VARCHAR2, list IN VARCHAR2, next_date IN DATE, interval IN VARCHAR2, implicit_destroy IN BOOLEAN DEFAULT FALSE, lax IN BOOLEAN DEFAULT FALSE, job IN INTEGER DEFAULT 0, rollback_seg IN VARCHAR2 DEFAULT NULL )
NAME: [Schema.]Unique group name (30 characters max). LIST: Comma delimited list of snapshots (entire list quoted). NEXT_DATE: Same syntax as START WITH for regular snapshots. INTERVAL: Same syntax as NEXT for regular snapshots. IMPLICIT_DESTROY: Deletes memberless refresh groups when SUBTRACT procedure is executed. LAX Set to TRUE only when moving a snapshot to a new refresh group (can belong to only one group at a time). JOB: May not be functional yet (use DEFAULT value). ROLLBACK_SEG: Explicitly assigns refreshes to a rollback segment. Following is an example:
make('MIACO1','testshot, bruce.bigshot',sysdate,'sysdate + 1');
Use the ADD procedure to add new snapshots to the refresh group:
add (name IN VARCHAR2, list IN VARCHAR2, lax BOOLEAN DEFAULT FALSE)
SUBTRACT removes snapshots from a refresh group:
subtract ( name IN VARCHAR2, list IN VARCHAR2 )
DESTROY removes an entire group:
destroy ( name IN VARCHAR2 )
Refresh group information can be found in USER_REFRESH (refresh groups and refresh intervals) and USER_REFRESH_CHILDREN (group member information). To manually refresh groups, use the refresh procedure in the dbms_refresh package:
DBMS_REFRESH.REFRESH('group_name');
Symmetric Replication Configurations
A "replicated environment" can be viewed as a logical group which includes a replicated schema, replicated objects, one or more master sites containing a copy of the replicated schema and any related snapshots. Each replicated environment must have a single "master definition site," which acts as an administrative control point for that environment. Additional master sites can be created, but must have the same name (schema) and contain the same objects as the master definition site. Replicated objects include indexes, synonyms and packages in addition to tables and views.
A "snapshot site" can contain the read-only snapshots previously discussed or updateable snapshots. Updatable snapshots must be simple snapshots, use AS SELECT * FROM syntax, have no GROUP BY or CONNECT BY clauses, not use subqueries in the snapshot definition, nor use joins or set operators, and may not reference LONG columns. Snapshots are defined as updateable by adding �FOR UPDATE� following the snapshot name (i.e., CREATE SNAPSHOT student FOR UPDATE).
The CREATE SNAPSHOT command for updateable snapshots results in the creation of a table called SNAP$_snapshotname, a table called USLOG$_snapshotname (to store rowid and timestamp of changed rows in snapshot), an after row trigger called USTRG$_snapshotname on the base table to insert changed rows into the snapshot log, and a view with the same name as the snapshot.
Conflict Resolution
A conflict detection utility is supplied as part of the replication facility. It is capable of identifying conflicts involving deletes, uniqueness and updates. Built in routines are supplied for resolving update and uniqueness conflicts. Other situations would require user defined programs. Built in resolution routines for updates depend on the replication configuration. For example, a single master site configuration with multiple, updateable snapshot sites could specify that conflicting values should be averaged, that snapshot site values should be discarded, or that master site values should be overwritten.
For environments with two master sites the default resolution could be to:
- use the minimum or maximum value
- define site priorities and use the one with the highest or lowest priority
- use the data with the earliest or latest timestamp
For configurations with any number of master sites:
- use the earliest or latest timestamp
- for numerics, add difference between old and new values to current value
- use minimum or maximum column values
Resolving uniqueness conflicts for single master sites include:
- discarding the transaction from the remote site
- appending global name to value
- appending sequence number to value
Resolving uniqueness conflicts for multiple master sites requires user defined programs. Multiple resolution routines can be specified within the limitations cited.
Design and Setup
Prior to implementing a replicated environment:
- Identify schemas which qualify for replication
- Identify schema objects to be replicated
- Identify master sites and snapshot sites
- Decide whether to partition objects to avoid update conflicts, or select a conflict resolution method (supplied or user defined).
- Create a "replication administrator" user.
- Use the DBMS_REPCAT_ADMIN package to assign privileges to the account.
- Create fully qualified database links between replicated sites using the administrator account.
- Create a "surrogate administrator" account at each remote site.
- Create private database links to each master site using the surrogate user.
Creating a master site or multiple master sites requires using procedures in the DBMS_REPCAT package. Decide on the master definition site.
- Define an empty, replicated schema for an existing schema at that site using the CREATE_MASTER_REPSCHEMA procedure.
- Define the objects for the replicated schema using the CREATE_MASTER_REPOBJECT procedure.
- Identify conflict resolution routines for tables, where appropriate, using the ADD_CONFLICT_RESOLUTION procedure.
- Invoke the ADD_MASTER_DATABASE procedure from the master definition site to create additional master sites.
- Schedule propagation changes using DBMS_DEFER_SYS.SCHEDULE_EXECUTION.
- Run the RESUME_MASTER_ACTIVITY procedure to activate replication environment.
Changes are propagated between sites using database triggers, deferred transaction calls and Oracle's two-phase commit. Propagating schema changes (DDL) requires using the ALTER_MASTER_REPOBJECT procedure from the master definition site. To add a master site to an active replicated environment requires using the SUSPEND_MASTER_ACTIVITY procedure in DBMS_REPCAT before invoking the ADD_MASTER_DATABASE procedure. Similarly, propagating DDL changes requires "quiescing" the environment (SUSPEND_MASTER_ACTIVITY) prior to initiating a change.
Replication features are set-up and administered via the 90 procedures (ten packages) which are activated by being licensed for the Replication Option and then executing $Oracle_HOME/rdbms/admin/repcat.sql. Thirty dictionary views are also provided for managing and monitoring all aspects of the replication facility.
Summary
Oracle has expanded its distributed capabilities significantly by introducing symmetric replication features with the advanced replication option. Not only can snapshots be made updateable, but multiple master sites can be created and maintained using a new group of special packages and by copying replication information across remote locations.
About the Author
Bruce McCay has been a consultant with Miaco Corporation for more than six years. He specializes in all aspects of database DBA, development and design with emphasis in Oracle products. He has consistently spoken at conventions and user groups in the US He can be reached at: +1.303.741.0381. Find MIACO on http://www.miaco.com for more information.
This is a copy of an article published @ http://www.ioug.org/