Object Relational Reality Check

Seth Grimes

After the universal server debacle, object/relational proponents are rethinking their development and marketing strategies. But in the end, the availability of tools and applications will continue to determine O/R viability

Are object/relational DBMSs (ORDBMSs) "the next great wave"? I like them and use them myself, but my answer is no. They may wash up on the beach over the next few years, but they're no tsunami. The explanation lies in the nature of object/relational models as they exist in the context of a broader computing trend that emphasizes distributed objects.

No one would claim that object/relational (O/R) database technology is mature, but even so, it hasn't advanced far beyond promising. Several major vendor offerings appeared in 1997--the first notable commercial ORDBMS, Illustra, was launched in 1993--but large holes remain in all of them. Each of these ORDBMSs--Oracle8, IBM DB2 Universal Database, and Informix Universal Server--implements object extensions to the relational model incompletely and differently. Few O/R-capable design and development tools are available, and a mainstream market for O/R-based applications has been slow to develop. As a result, O/R vendors are rethinking their marketing and development strategies in light of the Internet-fueled boom in distributed object computing, while pure RDBMS vendors (such as Sybase and Microsoft) have chosen to hang back or deploy object support via middleware outside the DBMS.

In this article, I'll briefly explain how object/relational concepts are implemented in various products and describe some alternative approaches that provide O/R views of relational data. I'll then go on to survey ORDBMS tools and some key application areas.

CONCEPTS AND IMPLEMENTATIONS

RDBMSs displaced antecedent flat-file, hierarchical, and network database systems because their flexibility and interfaces insulated application code from physical storage schemes. Object/relational systems are based on familiar relational tables, design principles such as normalization, methodologies such as entity-relationship diagramming, SQL, and call interfaces such as ODBC and JDBC. They seek to move beyond relational systems by extending them with user-defined types and functions that model business objects more naturally and encapsulate business rules with data. Extended base types include large objects, rows types and nested tables, collection types, and reference pointers. O/R systems implement features in compliance with ANSI SQL-92 and emerging SQL 3 standards. Key O/R concepts are thoroughly covered elsewhere, starting with Michael Stonebraker's seminal Object/Relational DBMSs: The Next Great Wave (Morgan Kaufmann, 1996).

There are, of course, alternative paths to RDBMS modernization. Object DBMSs (ODBMSs) are one, although many dismiss them as elegant underachievers that will never displace relational and O/R systems. Furthermore, distributed object computing centered on application/

transaction servers may relegate database services, however they are provided and whatever their database models, to a lower-profile role.

Which of these technologies will be the "path of enlightenment" to object/

relational views: ORDBMSs, ODBMSs, or distributed objects? The answer is based on the four keys to market acceptance of database technology: implementation, performance, tools, and applications. (It's a bit of a stretch, but let's pretend we live in an ideal world where superior marketing of inferior products isn't enough.) But first, let's review the current implementations on the object/

relational side of the equation.

Oracle. Oracle's comprehensive vision of the network computing architecture (NCA) is compelling: distributed object computing through CORBA object extensions at the client, the application-server middle tier, and the back-end server. Does it matter that only the application server middle tier has been fully realized thus far, and that Oracle has been plagued by missteps such as Sedona--the never-was paragon of applications tools--or the more-expensive-than-a-PC network computer? No, because controlling nearly half the server RDBMS market gives Oracle nearly the same breathing room it would enjoy if it owned 90 percent of the desktop operating system market.

Oracle's approach to developing a "universal server" in the form of Oracle7.3 was to bundle several diverse data servers under a single name. For Oracle8, its ORDBMS entry, these diverse servers have been transformed into data cartridges--"cartridge" being Oracle's term for an object extension module that plugs into the NCA data or application tier. Currently, Oracle offers five data cartridges: ConText, Image/VIR, Spatial, Time Series, and Video. For general object support, Oracle adopted the tactics of its desktop-monopolist kindred spirit Microsoft: announce, deliver just enough and hope that no one notices the emperor's new clothes are a bit thin, and plan to deliver down the road what has been promised--but only if the market bears out.

According to David Ensor and Ian Stevenson in their book Oracle8 Design Tips (O'Reilly, 1997), "At this point in time, we can't recommend the use of Oracle8's current object support within any normal enterprise applications. The support itself is incomplete; the syntax is complex; and very few of the commonly used toolsets can cope with it even in pass-through mode (let alone actively support it)." Indeed, Oracle8 incompletely realizes the O/R model: It offers encapsulation, but only through PL/

SQL methods. Nested tables are limited to one level--a problem for applications that might model, for example, customers who place orders that contain line items--and it does not support inheritance. Functions can be overloaded, but there is no dynamic polymorphism. External procedures, written exclusively in C, can be called from PL/SQL but not SQL. And calls to external procedures are, of course, much more expensive than calls to procedures linked to the server, but Oracle does not offer the latter for supposed security reasons. Improvements, including a Java Virtual Machine (VM), are expected in version 8.1, slated for late-'98 release.

Informix. Antoine de Saint-Exupery's The Little Prince opens with a "true story" about a boa constrictor that swallows an elephant. Thus did Informix, the upstart that displaced Sybase a while back as Oracle's main client/server database rival, swallow Illustra, Michael Stonebraker's pioneering ORDBMS. This deal appeared at the time to be a marriage made in heaven: a union of an innovative architecture for object extensions with the fastest, most scalable mainstream relational engine then available.

Illustra's optimizer handles user-defined types, C, SQL server functions, and r-tree spatial indexes as well as conventional relational data; the system supports DataBlade extensions integrating full-text, geospatial, media, time series, and Web data management. Prior to the merger, Illustra had been selling like hotcakes for financial and Web applications, the latter due to its (for the mid-'90s) cutting-edge Web-DB Applications DataBlade. And Informix's mainsteam Online Dynamic Server (now Informix-Dynamic Server 7.x) addresses symmetric multiprocessing and parallelization better than any other mainstream RDBMS.

Informix-Dynamic Server is the best object/relational database technology currently available, although key features are missing from the current 9.13 release such as replication, multiple inheritance, Java server extensions, and robust support for collection types. I've also struggled mightily with the optimizer to get ported Illustra code to run as well as it did natively. These problems should be fixed with the 9.2 release, due in the summer of 1998, and interesting new query-interface and facilities providing data-management and transaction services to external objects are supposedly on their way. But unification of the O/R Universal Data option with the warehousing-oriented Extended Parallel option is further away.

IBM. IBM is the only company with the technology and resources to challenge Oracle's dominant position in the DBMS server market. Through new technology releases, ports to Windows NT and non-IBM Unix variants, and aggressive marketing of DB2 Universal Database version 5, IBM is achieving a respectable position in the open-systems DBMS market.

Unfortunately, I can write about DB2 Universal Database only based on its reputation as a solid ORDBMS implementation that rivals that of Informix. The only complaint I've heard from a developer is the absence of support for SQL callbacks from user-defined functions.

DISTRIBUTED OBJECTS AND VIRTUAL ORDBMSs

Strictly speaking, Sybase doesn't offer an ORDBMS. Rather, its database strategy, as implemented in Adaptive Server and the Adaptive Component Architecture, has three parts: First, Adaptive Server will federate diverse data management engines behind a unified interface and embed a Java VM in the DBMS. Sybase will implement rich data types as Java classes. The Java objects will be stored in conventional relational tables but managed by the Java VM; the server will be accessed through JDBC. This approach has been proposed as part of SQL3. (See Jim Melton's "Coming Attractions: The SQL/Java Interface," March 1998, page 72.) A compliant server essentially will act as a container for Java objects by internalizing the Java VM, layering an object/relational view on a conventional RDBMS. Presumably, the fact that Adaptive Server is a virtual ORDBMS will be transparent to application-development tools.

Second, third-party engines such as the FAME time-series manager, Vision International's Spatial Query Server, Virage's Image Engine, and Verity full-text search will provide extended data types. Sybase claims that its DirectConnect bridges Adaptive Server to dozens of external data stores.

Third, Sybase is not only grappling with management of database objects, it is also endorsing the Jaguar Component Transaction Server (CTS) as a middleware server combining transaction monitor and application server functions. I have not seen any information, however, explaining how Sybase intends to make the three parts of its strategy--embedded Java objects, externally managed complex data, and external transaction services--work together.

Microsoft's approach is similarly based on an object view of relational data. The company has a lot on its plate and no announced plans to create a "true" ORDBMS. Its immediate task (in the DBMS realm) is to launch SQL Server 7.0, which is expected (at press time) to bundle the Plato OLAP server with the most robust version of the SQL Server relational database engine yet available. However, selling SQL Server for large-scale, enterprise deployment beyond the organizations that use it by default will be a big marketing job. Its success will be tied to upgrading Windows NT for 64-bit, larger-scale SMP and clustered computing.

I don't get the impression that many users find SQL Server or Windows NT's allure irresistible for media management, time series, or geospatial processing--the most compelling O/R applications--although Microsoft is making significant progress on the Web. Given Microsoft's attempts to extend the Component Object Model (COM) for heterogeneous platforms, distributed computing via Microsoft Transaction Server--and inferring from the path taken by SQL Server's birth parent, Sybase--my guess is that Microsoft will eschew moving beyond SQL Server's ability to call external user code toward true object/relational support. Microsoft will instead continue to promote the OLE DB API for "universal data access," extending it to new types like text, rather than attempting to provide universal data management.

The near- and medium-term effects of Microsoft's strategy on the direction of O/R are slight. In the longer run, the success of O/R databases vs. O/R views may well be decided by the scope and penetration of COM and OLE DB, which in turn will be affected by the success of JavaBeans, JDBC, and Java-embedded SQL.

OBJECT AND POST-RELATIONAL DBMSs

What about ODBMSs? In theory, O/R (and relational) databases can be modeled and managed by them, so why bother with the more restrictive models?

Object databases didn't displace RDBMSs in the late-'80s and early-'90s because, at that early stage in their lives, they were little more than persistent stores for C++ and Smalltalk programs, and their APIs were limited to bindings to object-oriented programming languages. Eventually ODBMSs "grew up," gaining SQL interfaces as well as applications tools. But during this same time, vendors were rewriting RDBMS engines as multiprocess, multithreaded, parallel-query powerhouses, and the impedance mismatch between relational result sets and record-oriented procedural programming--and the incongruity of relational tables and application objects--were bridged by many application tools, that mitigating the data modeling advantage of ODBMSs. Furthermore, recent RDBMS and ORDBMS releases offer bitmapped, r-tree, and other indexes of great usefulness in data warehousing and management of geospatial data in addition to the usual b-trees and hashes. They now sport cost-based optimizers and allow data spaces and data tables to be partitioned (fragmented) across devices and even across servers.

Stonebraker likes to call Computer Associates a "home for retired software," perhaps an expression of disappointment that his Ingres RDBMS faded into relative obscurity with CA. Ingres never realized his vision of an object-extended relational DBMS despite the innovative object and knowledge management modules delivered in the early '90s, modules that constituted the first mainstream attempt at an ORDBMS. Object management, in particular, let users extend Ingres with user-defined types and methods, although the complexity of the C code required for implementation made them all but unusable. But CA is not the complacent cash-cow milker suggested by Stonebraker's characterization: its Unicenter-TNG enterprise-management offering and new Jasmine object database are hardly slacker products.

Ardent Software--recently formed by the merger of O2, UniData, and Vmark--is another vendor worth discussing. UniData is characterized as a "non-first normal form" (non-FNF) or "nested relational" database supporting nonatomic attributes--which would normally be disallowed by the first normal form--in the shape of repeating groups, hierarchies, and arrays. ORDBMSs support these nested relations as row types or nested tables, offering scalability far beyond UniData's. Ardent also sells Vmark's UniVerse RDBMS, which is oriented toward migration from Picklike non-FNF DBMSs like UniData to the relational model, and an object DBMS, O2. It's pretty clear that UniData and UniVerse are not ORDBMS killers--although O2, marketed as a "universal object server," is quite an impressive product.

InterSystems's Cach� is another example of a post-relational DBMS. Cach� looks and acts like a relational system but is based on a multidimensional data model optimized for complex transaction processing. Cach� has garnered enough standard analyst raves to merit a mention but nonetheless suffers a mindshare problem outside the healthcare arena.

I've reviewed a number of object and post-relational DBMS alternatives to ORDBMSs, but despite strong technology, standardized interfaces, relational compatibility, and the availability of a number of compelling applications, I don't see these alternatives supplanting relational or O/R systems. They don't provide significant functional advantages, certainly not enough to displace entrenched vendors like Oracle, IBM, and Informix. Distributed objects, however, present a more viable alternative.

DISTRIBUTED OBJECTS AND PERFORMANCE

I've claimed here that application-centric, distributed object computing systems can potentially relegate database services to a lower-profile role. This claim stems from the ascendancy of object-oriented programming languages and methodologies and visual, object-based tools. I discussed two companies that are betting on this direction, Sybase and Microsoft, and both Oracle and IBM are hedging their ORDBMS investments. (Oracle's object-extensibility has in fact been most completely realized in the application server middle tier.) All four are, surely not coincidentally, major programming tool vendors.

Performance is the chief argument for ORDBMSs and OODBMSs. As Jim Gray and Andreas Reuter write in their classic Transaction Processing: Concepts and Techniques (Morgan Kaufmann, 1993), "Crack open any system engineering book. Look at the relative costs of operations like reading memory and doing interprocess communication. Following a pointer in core memory takes about a microsecond. Doing a lightweight RPC has about a thousand times more overhead. Would you rather conduct a conversation in the same room or through U.S. mail?" In other words, as David Jordan of AT&T and the Object Data Magagement Group (ODMG) told me, "Does the operation get sent to the object, presumably managed by an object server somewhere in the network (the ORB approach)? Or does the object get moved into the application and processed locally (the database approach)? There is a performance trade-off with each approach, moving the object 'here' vs. sending the operation 'there.'... The decisions about whether to use an object database or an ORB are not mutually exclusive. An object server within an ORB environment may actually use an object database for storing the objects that it manages." You can add to the equation the fact that, as Informix researcher Paul Brown explained to me, the communication necessary to synchronize servers grows by the square of the number of servers involved.

Performance considerations point to an ORDBMS approach. But databases are not ends in themselves: they exist to support application programs. Therefore, the availability of tools and applications will determine the viability of O/R approaches.

HAMMERING THE TOOLS

From a tools and applications point of view, object/relational models and interfaces matter more than the implementation details, and I would question whether an O/R feature is directly implemented or only simulated as a (relative) detail. Database design and software development tools are key to helping DBAs and programmers work with O/R models.

Database design. Several database design tool vendors have made the leap to object/relational, including Platinum, which recently acquired LogicWorks and the O/R-Compass tool; Silverrun, publisher of Universal Modeler; and Visio, which acquired InfoModelers and the InfoModeler tool. Other object-application tools, Oracle Object Database Designer and Rational Rose among them, create O/R database schemas. Other vendors support only the relational portions of Oracle8 and DB2, preferring to wait for the market to develop.

I have two criticisms of available modeling methodologies and tools. First, most database design tools do not adequately capture business processes and rules, which would probably require that they move to the Object Management Group's Unified Modeling Language (UML). Second, no current methodologies or tools assist users in deciding whether to deploy code capturing business rules (beyond strictly data specific rules such as referential constraints and domains) as ORDBMS user defined functions or externally in application objects.

Application development. Traditional database-applications development is based on 4GLs, embedded SQL or call interfaces such as ODBC or JDBC, middleware APIs, or abstract data objects layered on a lower-level interface. The O/R tool story, like the server story, is one of partial successes and promising developments.

Sun Microsystems's JavaBlend, in beta at press time and slated for a June '98 release, maps Java objects and query methods to database objects and SQL over JDBC and creates Java classes for relational tables. Sun claims that the JavaBlend programmer's interface is specifically designed to conform to the ODMG standard for object/relational mappings and object databases. JavaBlend will supposedly even "automatically infer inheritance relationships among tables or create them for Java subclasses." According to product manager Sally Sheward, the second release will map to O/R object extensions.

Here's a sample JavaBlend code fragment showing how database transactions and objects are mapped to Java constructs:

Transaction t = Transaction.create();
Customer c = Customer.selectElement("custID=123");
SalesRep s = c.rep; /* follow foreign key */
c.address = newAddress;
s.sales = s.sales + thisOrder;
t.commit(); /* write c & s back to database */

JavaBlend is similar to Ardent Software's Java Relational Binding (JRB), which maps Java classes to relational tables over JDBC, and Informix's Data Director for Java (DDJ), an application tool for the creation of O/R data-aware Java applications. There are several implementation differences--DDJ uses Java RMI between the client and application server, while JavaBlend will use OQL--and it appears that JavaBlend's transaction and query management support will be modest compared to DDJ's. Also, DDJ and JRB are available now.

Formida Fire software, including the Universal Development Environment (UDE), is the only O/R-capable third-party application development package available of which I'm aware. That Formida is the only vendor offering these capabilities is evidence of the O/R market's immaturity. Unfortunately, I don't anticipate commodity tools like Crystal Reports having any similar ORDBMS links any time soon.

APPLICATIONS

Users will migrate to ORDBMSs if in-house or third-party applications require object extensions. A number of important application domains have embraced ORDBMSs, but they do not include the domains that are the relational model's bread and butter.

Success stories. The areas where ORDBMSs have done well include management of media such as images, audio, video, and text, time series, geospatial data, and Web applications.

Media management requires the application of conceptually simple methods to unstructured data. By contrast, management of time series and geospatial data requires relatively complex data structures and analytically oriented methods. Web applications lie somewhere between, with unstructured data in the form of dynamic page templates but complicated methods to manage variables, page logic, and database queries. The adoption of extensible markup language (XML) will help by adding more structure to Web page templates and other documents.

Before the advent of ORDBMSs, time series, textual, and geospatial data were typically managed by specialized software with proprietary, nonstandard interfaces and query languages. Media (and frequently text) were and are stored in file systems but indexed by specialized engines. ORDBMSs brought facilities to all these application domains that let you manipulate their data with a standard query language that integrate queries on fielded and exotic data.

OLAP and data warehousing. OLAP is interactive, exploratory data analysis that centers on slicing and dicing multidimensional cubes and aggregating along dimensions. More advanced OLAP involves relating individual values to aggregates, as in ranking and proportions, and to other values, as in time-dependent calculations such as month-on-month change. OLAP is defined by what's done with data rather than how it's stored: you may store cubes in a specialized multidimensional database or an RDBMS, in the latter case represented in multidimensional form via a relational OLAP (ROLAP) tool.

Each of the ORDBMS vendors has an OLAP tool: IBM is integrating Arbor Software's Essbase with DB2 and bundling Cognos tools with Visual Warehouse, Informix sells the Metacube ROLAP tool based on technology acquired from the Stanford Technology Group, and Oracle offers Express (bought from IRI Software in 1995), whose roots stretch back to 1970. The goal of embedding OLAP functions into ORDBMS engines, however, appears distant.

Additions to native DBMS analytical capabilities beyond those supplied by, for example, geospatial and time series extension modules are spotty at best. There are good examples such as the Fillmore Group's SQL Expander DB2 Extender, which provides financial, date, numerical, conversion, and character functions for fielded DB2 data, and the Informix S-Plus DataBlade from MathSoft, which provides statistical data analysis, modeling, and visualization capabilities. Otherwise, I don't know of any OLAP vendor--IBM/Arbor, Informix, and Oracle included--moving its code into ORDBMS functions.

Perhaps vendors have no incentive, perceiving that their current relational and multidimensional DBMSs serve their applications well. But there may also be technical problems, such as writing efficient code to manage large, sparse, multidimensional data arrays with multiple dimensional hierarchies and a need for uniform, fast, cross-dimensional calculations. Similar perceptions and technical difficulties may hold for data warehouses, which are typically based on dimensional data models captured as a star schema or a variant such as a snowflake or constellation. These models rely on bitmapped indexes, which are available in conventional RDBMSs. (Gary Dodge and Tim Gorman's Oracle8 Data Warehousing (Wiley, 1998) claims that it "fully covers Oracle7 and Oracle8," but you won't find object types, nested tables, or any other type of Oracle8 object support mentioned in the entire book.)

ERP AND TRANSACTIONAL SYSTEMS

Enterprise resource planning (ERP) encompasses several applications critical to large-scale organizations: financials, human resources, supply-chain management, and so on. The ERP market is dominated by a only a few vendors--SAP, Baan, PeopleSoft, and Oracle among them. Their software is typically layered on top of an RDBMS.

ERP applications perform many transactional functions, which rely on normalization to ensure high concurrency and throughput by minimizing disk access and locking. Non-ERP, transactional applications such as reservations, sales, and customer service are another major RDBMS application area. In neither case have I heard of any vendor planning to move to ORDBMSs. Formida's Universal Database Enabler provides a way of wrappering SAP R/3 objects and integrating them into Formida applications, but this extension does not imply SAP uses ORDBMS facilities. The only news that raises questions for me is Baan's participation in the development of JavaBlend.

My conclusion: current ORDBMSs may be universal servers, but they are and will remain far from universally used. They can manage new types of data, but enterprise-application vendors seem quite happy with the handful of types and limited extensibility offered by RDBMSs.

WHAT ABOUT DISTRIBUTED OBJECT COMPUTING?

Distributed object computing, fueled by the explosive growth of the Internet, is the prevailing computing trend of the mid- and late-'90s: the true next great wave. Today's battle isn't over desktop operating systems or office productivity software--Microsoft will own those arenas until the next paradigm shift--and it's not over browsers or DBMSs either. The real battle is over object models: COM vs. JavaBeans vs. (if it's not too late) CORBA/IIOP.

There's actually a potential role for ORDBMSs at the center of a distributed network of objects. Given the robust user and transaction management facilities, scalability, and CORBA, COM, and Java RMI and Beans interfaces now or imminently available in ORDBMSs, they appear to be a natural choice for brokering object requests and managing heterogeneous, distributed transactions. There are signs that IBM and Informix are moving to position their DBMSs to provide these services. Doing so in order to wait out development of the ORDBMS applications marketing is especially critical for Informix. The results should be interesting.

Seth Grimes is a principal of Alta Plana Corp., a consultancy specializing in database design and software development for Internet and decision-support applications. You can contact him through altaplana.com or at [email protected].

Two Highly Subjective Scorecards

Here I offer two highly subjective scorecards, the first comparing object/relational technology with alternatives, and the second rating O/R implementations. The game's far from over, however, and the scores aren't final. I've assigned one, two, or three points to each cell.

Technology

Technology	Implementers	Mindshare	Tools	Applications	Projected relative performance
RDBMS	3	3	3	3	2
ORDBMS	3	2	2	1	3
ODBMS	2	1	2	2	3
O/R and object middleware	2	3	3	3	1

Implementations

Company	Product(s)	Engine	Object extensions	Object-extended tools and apps	Company position
Computer Associates	Jasmine	2	2	3	2
IBM	DB2 Universal Database v5	3	3	2	3
Informix	Dynamic Server/Universal Data Option	3	3	3	2
Microsoft	SQL Server, OLE DB	2	1	1	3
Oracle	Oracle8	3	2	1	3
Sybase	Adaptive Server	3	1	1	2

This is a copy of an article published @ http://www.dbpd.com/