Although the Internet has existed for over 30 years, it was generally regarded as the exclusive playground of the military, research and academic worlds until very recently, when the World Wide Web and NCSA Mosaic emerged. The World Wide Web is a set of techniques, protocols, technology and software that greatly simplify access to the vast amount of information available on the Internet. Together with NCSA Mosaic, the first graphical interface to the World Wide Web, it sparked the incredible growth of the Internet because it made navigating through the Internet as simple as pointing and clicking with a mouse.
The Challenge of Managing Web-based Information
The exponential growth of the World Wide Web has not gone unnoticed in the corporate world. Corporations have started to realize its potential for information dissemination and electronic commerce and are scrambling to set up Web servers. Unfortunately, while putting up a pilot Web server is a relatively simple task, a number of challenges arise as these sites grow and start to be deployed in large-scale commercial projects. For instance, how does one store, manipulate, manage and retrieve large volumes of data? How does one provide secure access to information without compromise when there are 20 million potential users of this information? As corporations conduct electronic commerce over the Internet, how does one perform secure transactions and store results? As Web sites expand, how does one keep up with this growth and prevent their site from becoming an administrative nightmare?
The Oracle WebServer
Oracle is the world's largest information management company. We have been solving similar problems for corporate data since the early 1980s, before the World Wide Web was even conceived. Since Web-based data has a lot in common with corporate data, we feel that the Oracle database can be just as effective at solving these problems. To address this need, we have developed the Oracle WebServer.
The Oracle WebServer is a World Wide Web server with a tightly-integrated Oracle7 database. It is targeted at users whose Web sites have grown too large and too complex to be managed effectively using traditional servers such as the NCSA and CERN servers. It consists of the following components: the Oracle Web Listener, the Oracle Web Agent, and the Oracle7 database. The Oracle WebServer provides a number of benefits beyond those of the traditional Web servers in many areas, the most important of which are:
Easy Access to Oracle Data - There is a large amount of data stored in Oracle databases that customers want to make available via the World Wide Web. Realizing this, we have provided a mechanism by which data stored in an Oracle7 database can be easily and efficiently transformed into HTML documents for distribution via the World Wide Web. This feature, called the Oracle Web Agent, will be available in Release 1.0 of the Oracle WebServer.
High-Performance, Scalable, Portable, Commercial-quality Web Server - The Oracle WebServer will include the Oracle Web Listener, a full-fledged HTTP server engine that implements version 1.0 of the protocol as described in the RFC from the IETF HTTP Working Group. The server engine is designed to be a very reliable and high-performance solution suitable for high-traffic and mission-critical applications.
Utilization of the Database for All Aspects of Web Server Management - In Release 2.0, which we have been developing in parallel with Release 1.0, a key differentiating factor between the Oracle WebServer and traditional Web servers will be the ability to store Web objects, such as HTML documents and GIF files, or pointers to Web objects, in the database. Besides storing Web objects, we will also store the access log information and error log information in the database. Once the Web objects and other Web-related information are inside the database, Oracle's expertise in managing information can be put to work.
Although this paper will focus on providing technical information on the Oracle Web Listener and the Oracle Web Agent, two components of the Oracle WebServer Release 1.0, it will also touch briefly on Release 2.0 to give the reader a peek into the future of Web technology as we at Oracle see it.
The Oracle Web Listener
Why use a commercial Web server when free ones are available from CERN and NCSA? This is a question that anyone planning to put up a Web site is sure to ask at some stage in the planning process. The most obvious reason: a commercial site that is running a mission-critical application needs a commercially-supported Web server because any bugs encountered in the server need to be fixed in a reasonable amount of time.
Of course, the decision to go with a commercial Web server raises the question: which commercial Web server is most suitable to the application? In most cases, this can only be answered by looking closely at the architecture of the Web servers under consideration. This is exactly what we did when we were shopping around for a third-party commercial Web server to include in the Oracle WebServer. The server we ultimately chose met our criteria for high performance, scalability, portability and manageability.
For maximum performance and scalability, the Oracle Web Listener is designed to run as a single process with a single thread, as opposed to most servers which start a new thread or process whenever a new connection is made. This design improves performance in two ways: First, there is less overhead at the operating system level because the need for task switching and memory swapping is minimized. Second, the act of starting a new process or thread is time consuming and can significantly affect the latency between when a connection is established and when the server can begin to manipulate the data.
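The single-process, single-thread design described above can be illustrated with a minimal sketch in Python using the standard `selectors` module. This is not the Oracle Web Listener's implementation; the function name `serve_events` and the canned `RESPONSE` are assumptions made for illustration. One thread waits for readiness events and handles every connection itself, so no process or thread is started per request.

```python
import selectors
import socket

# Canned reply used only for illustration.
RESPONSE = b"HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n<html>ok</html>"

def serve_events(listener, max_requests):
    """Serve max_requests connections from a single thread using readiness events."""
    sel = selectors.DefaultSelector()
    listener.setblocking(False)
    sel.register(listener, selectors.EVENT_READ, data="accept")
    served = 0
    while served < max_requests:
        for key, _ in sel.select(timeout=1.0):
            if key.data == "accept":
                # A new connection is pending: accept it and watch it for input.
                conn, _addr = listener.accept()
                conn.setblocking(False)
                sel.register(conn, selectors.EVENT_READ, data="client")
            else:
                # A client sent its request: answer it and close the connection.
                conn = key.fileobj
                request = conn.recv(4096)
                if request:
                    conn.sendall(RESPONSE)
                sel.unregister(conn)
                conn.close()
                served += 1
    sel.unregister(listener)
    sel.close()
    return served
```

Because the loop never blocks on any one client, many connections can be open at once while the operating system schedules only a single task.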
Oracle has always been known for the portability of its products, so this was a key feature for us. The Web server we ultimately chose was designed with portability in mind. Over 75% of the source code in the Oracle Web Listener is shared between the Windows NT and UNIX versions. The code was also designed to be easily portable to Windows95, should we decide to release a version on that platform in the future.
Since we are targeting large, mission-critical Web sites, manageability was also an important factor. The Oracle Web Listener will include Simple Network Management Protocol (SNMP) support. In the first release, this will be limited to basic "I'm alive" messages. However, we will implement the different portions of the Management Information Base (MIB) for HTTP servers as it gets standardized.
The Oracle Web Agent
With Oracle holding approximately 35% of the relational database market, there is a huge amount of data stored in Oracle databases today. A significant percentage of this data would be even more useful and valuable if it could be accessed via the World Wide Web. To assist our customers with this task, we have developed the Oracle Web Agent. The Web Agent is a component of the Oracle WebServer that facilitates the creation of dynamic HTML documents via a language already familiar to most Oracle users: PL/SQL.
In addition to existing corporate data, the Oracle database could also be used to store data meant exclusively for the Web. If this data resides in tables in a relational database and simply gets transformed into dynamic HTML documents when accessed, it will be much simpler to manage and manipulate. Furthermore, dynamic documents provide a greater degree of customization, making it possible to tailor a document to the specific user requesting it. The Oracle Web Agent can be used to create these dynamic HTML documents quickly and easily in PL/SQL.
The Oracle Web Agent consists of the following:
A "C" program named owa which serves as the actual CGI program that the Web Listener executes when a POST or GET request is received for a dynamic document. This program makes OCI calls to log into the database and execute the user's PL/SQL stored procedures.
A PL/SQL package called wow which contains utility functions to set up the CGI environment variables, obtain the value of a specific environment variable, and extract the output generated by the user's PL/SQL code from a PL/SQL table.
An optional, but strongly recommended set of PL/SQL packages known as the WebServer Developer's Toolkit that ease the generation of HTML tags from within the user's PL/SQL code. HTP and HTF, which are two of the packages in this toolkit, automatically generate the HTML tags and put the document into the PL/SQL table where it gets retrieved by the OCI program called owa.
A set of HTML forms and corresponding code to ease the task of configuring the CGI applications written in PL/SQL with the Web Agent.
A PL/SQL development environment which allows the programmer to compile and submit code into the database directly from within an HTML form. This is a Release 2.0 feature.
The Oracle Web Agent provides an easy-to-use environment for building CGI applications in PL/SQL, which are then stored in the Oracle7 database. A programmer who has chosen PL/SQL as the language in which to implement a CGI application need only worry about implementing the logic specific to the application itself, because the Oracle Web Agent provides utilities which take care of the repetitive tasks associated with deploying a CGI application.
Why choose PL/SQL in the first place? Two reasons: First, PL/SQL is compiled and stored inside the database, so PL/SQL code tends to execute faster than regular SQL code that needs to be parsed and interpreted before executing. Second, since PL/SQL code resides inside the database, it is completely portable to any platform that Oracle7 runs on. Porting PL/SQL code is simply a matter of moving the PL/SQL packages from one database to another in the same way that moving regular database data is accomplished.
[Figure: request flow through the Oracle WebServer, Steps 1 through 6, starting at the Web client]
The figure above illustrates the sequence of events that take place in the Oracle WebServer when a typical GET request is received for a dynamic HTML document produced via the Oracle Web Agent.
1. The Web client issues a GET request for an HTML document which is actually a dynamic document produced by PL/SQL code residing in the database.
2. The Web Listener spawns a "C" stub program called owa.
3. owa makes OCI calls that log into the database and execute the PL/SQL procedure which produces the HTML document. It knows which PL/SQL procedure to execute because the name of the PL/SQL procedure is embedded in the URL.
4. owa takes the HTML document produced by the PL/SQL procedure from a PL/SQL table where the document has been placed by the HTP/HTF calls made by the PL/SQL procedure.
5. owa passes it back to the Web Listener via standard output.
6. The Web Listener passes it back to the client via the HTTP protocol.
To get a better technical understanding of the Oracle Web Agent, let's examine a number of its key features in greater detail.
I. CGI Stub
Since PL/SQL procedures are stored inside the database, there is no way for a Web server to directly execute them when the request from the browser is received. Furthermore, there is no way for the PL/SQL code to send the HTML document generated to standard output, which is where the Web server is expecting it to be. Thus, there is a need for a program which handles these tasks. The Oracle Web Agent includes a "C" program called owa which does the following using OCI calls:
1. Extracts the database service name embedded in the URL and logs on to the database using the userid/password defined by the service. Database services are defined in a file called owa.ora.
2. Sets up the parameters passed from the Web server so that they are accessible from within the database through a PL/SQL function call similar to getenv() in UNIX.
3. Executes the user's PL/SQL function which extracts the relevant data from the database and formats it into an HTML document that is stored in a PL/SQL table.
4. Checks the return code of the user's PL/SQL function. If it is zero, it takes the formatted HTML document from the PL/SQL table and passes it back to the web server via standard output. If it is non-zero, it returns the standard error HTML document supplied by the user or a default if the user doesn't supply one.
Thus, it is not necessary for the PL/SQL programmer to be intimately familiar with how CGI works. owa takes care of all those details and allows the programmer to concentrate on developing the logic to extract the proper data from the database. The user's PL/SQL code need only make function calls to extract the values of the necessary CGI environment variables, query the database, and format the results returned from the database into an HTML document. Everything else is taken care of by owa.
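The division of labor between the stub and the user's code can be sketched in Python. This is only a simulation under stated assumptions: the real owa is a "C" program that uses OCI, whereas here the database is replaced by in-memory structures, and the names `run_stub`, `get_cgi_env`, `PROCEDURES` and `hello` are hypothetical.

```python
# Hypothetical in-memory stand-ins for database state; a real stub would use OCI.
cgi_env = {}        # CGI variables made visible to the "stored procedure"
output_table = []   # plays the role of the PL/SQL table holding the document

def get_cgi_env(name):
    """getenv()-like lookup of a CGI variable from inside the procedure."""
    return cgi_env.get(name, "")

def hello():
    """Example 'stored procedure': writes HTML lines and returns 0 on success."""
    output_table.append("<HTML><BODY>")
    output_table.append("Hello, " + get_cgi_env("REMOTE_HOST"))
    output_table.append("</BODY></HTML>")
    return 0

# Registry standing in for the procedures stored in the database.
PROCEDURES = {"hello": hello}

def run_stub(procedure_name, environment):
    """Mimic the stub: publish the environment, run the procedure, collect output."""
    cgi_env.clear()
    cgi_env.update(environment)
    output_table.clear()
    procedure = PROCEDURES.get(procedure_name)
    if procedure is None:
        return 1, ""                      # nothing to execute: report failure
    return_code = procedure()
    if return_code != 0:
        return return_code, ""            # procedure signalled an error
    return 0, "\n".join(output_table)     # document read back from the "table"
```

The procedure itself only reads environment values and emits markup; everything concerning the request lifecycle lives in `run_stub`.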
Both the database service name and the user's PL/SQL function name are embedded in the URL, which is of the following format:
http://myhostname:port/.../myservicename/owa/myplsqlname?parameterlist
| URL section | Description |
| --- | --- |
| http | denotes the protocol used to obtain the document from the WebServer |
| myhostname | denotes the hostname or IP address of the machine the WebServer resides on |
| port | an optional parameter, this specifies the TCP port number that the Web Listener is listening on. If none is specified, it defaults to 80 |
| myservicename | denotes the database service name to use when logging on to Oracle7. The database service name immediately precedes 'owa' but doesn't have to directly follow the hostname/port section of the URL. |
| owa | must always exist. This instructs the Web Listener to spawn the owa executable |
| myplsqlname | denotes the name of the PL/SQL procedure that gets executed when owa logs on to Oracle7 |
| parameterlist | If a '?' exists, any text following it will be passed by owa to the PL/SQL procedure it just invoked. This will only exist if the GET method is used. |
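As an illustration of the URL layout in the table above, the following Python sketch splits such a URL into its parts. The function name `parse_owa_url` and the shape of the returned dictionary are assumptions; the sketch also assumes that the first path segment named `owa` is the marker, and that the parameter list uses the usual `name=value&name=value` form.

```python
from urllib.parse import urlsplit, parse_qsl

def parse_owa_url(url):
    """Split a Web Agent style URL into host, port, service, procedure, parameters."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    if "owa" not in segments:
        raise ValueError("URL does not contain the required 'owa' segment")
    i = segments.index("owa")
    if i == 0 or i + 1 >= len(segments):
        raise ValueError("'owa' must sit between a service name and a procedure name")
    return {
        "host": parts.hostname,
        "port": parts.port or 80,          # port is optional and defaults to 80
        "service": segments[i - 1],        # segment immediately before 'owa'
        "procedure": segments[i + 1],      # segment immediately after 'owa'
        "parameters": dict(parse_qsl(parts.query)),
    }
```

For example, `parse_owa_url("http://myhostname/apps/myservicename/owa/myplsqlname?id=7")` yields the service `myservicename`, the procedure `myplsqlname`, port 80 and the parameter `id` with value `7`.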
II. HTML tags
HTML documents are generated by inserting the necessary HTML tags around the text output which Web browsers can interpret. The HTP/HTF PL/SQL packages which are part of the Oracle WebServer Developer's Toolkit greatly simplify this task. The PL/SQL programmer need only call these functions at the appropriate place in the code with the actual text as a parameter. Then, the HTP (HyperText Procedure) and corresponding HTF (HyperText Function) take care of generating the correct tags around the text and putting it into a PL/SQL table.
It is important to note that the HTP/HTF packages do not eliminate the need for the PL/SQL programmer to know HTML syntax. The programmer must still realize that an anchor tag is needed to create a hyperlink, for example. What the HTP/HTF packages do is automate the process of creating these tags and putting the document in the PL/SQL table.
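The difference between the procedure and function flavors can be shown with a small Python analogy. This is not the toolkit's actual API: the names `htp_print`, `htp_anchor`, `htf_anchor` and `htf_bold` are hypothetical, and the list `document` merely stands in for the PL/SQL table.

```python
document = []  # stands in for the PL/SQL table that accumulates the page

def htf_anchor(url, text):
    """Function flavor: return the tagged string so it can be nested in other calls."""
    return '<A HREF="%s">%s</A>' % (url, text)

def htf_bold(text):
    """Function flavor of a bold tag."""
    return "<B>%s</B>" % text

def htp_print(line):
    """Append one line of output to the document."""
    document.append(line)

def htp_anchor(url, text):
    """Procedure flavor: generate the tag and append it to the document."""
    htp_print(htf_anchor(url, text))
```

Calling `htp_anchor("/catalog", htf_bold("Catalog"))` appends `<A HREF="/catalog"><B>Catalog</B></A>` to `document`. The programmer still has to know that an anchor is the right tag for a hyperlink; the helpers only take over the typing of the markup and the bookkeeping of the output.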
III. Development Environment
PL/SQL code is typically developed using SQL*Plus or an embedded development environment in tools such as Oracle Forms 4.5. Since CGI application programmers may not have access to these tools or may not be familiar with them, the Oracle Web Agent will provide a development environment accessible via a Web browser. The Oracle Web Agent will include an HTML form which will allow programmers to type in PL/SQL code. The form will have a push button to compile the code and submit it into the database. If there are errors in the compilation phase, these will be returned to the browser as an HTML document. This is a Release 2.0 feature.
IV. Error Handling
A CGI application written in PL/SQL can generate two broad types of errors: application errors and system errors. Application errors are specific to the application, and are generally meaningful to the end user. For example, if the query returns no rows, the PL/SQL programmer could add code to generate an HTML document that said, "The item you ordered is temporarily out of stock. Please try again later." These errors are transparent to owa because owa does not read HTML documents to determine their content. As far as owa is concerned, if the user's PL/SQL code generates an HTML document, the operation was successful. owa determines whether an HTML document has been generated by checking the return code from the original call to execute the user's PL/SQL program. If the return code is zero, an HTML document has been generated in the PL/SQL table.
System errors are errors encountered by owa itself. These are errors that prevent owa from executing the user's PL/SQL code, or failure of the user's PL/SQL code to generate an HTML document. A user's PL/SQL code signals its failure to generate an HTML document by returning an error to owa.
An example of the first type of system error would be the failure of owa to log on to the database. An example of the second type of system error would be an error in the SQL query because the table does not exist. In the latter case, the programmer could write code to format an HTML document with the specific reason for the error, but it would be more convenient for the programmer if there were a standard HTML document that could be returned for errors such as these. The Oracle Web Agent provides a mechanism for doing just that. To avail himself or herself of this mechanism, the programmer returns an error to owa (see Section I, step 4). owa will either return a default error message, or return a predefined error message when it gets the non-zero return code.
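The decision owa makes on the return code can be summarized in a short Python sketch. The function name `choose_response` and the text of `DEFAULT_ERROR_PAGE` are assumptions for illustration only.

```python
# Assumed wording; the actual default error document is not specified here.
DEFAULT_ERROR_PAGE = "<HTML><BODY><H1>Request failed</H1></BODY></HTML>"

def choose_response(return_code, generated_document, custom_error_page=None):
    """Pick the document to send back, based only on the return code."""
    if return_code == 0:
        # Success as far as the stub is concerned, even if the page itself
        # reports an application-level problem such as "out of stock".
        return generated_document
    if custom_error_page is not None:
        return custom_error_page      # predefined error document
    return DEFAULT_ERROR_PAGE         # fallback when none was supplied
```

Note that an application error such as an out-of-stock message travels through the first branch: the page was generated, so the return code is zero and the content is never inspected.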
The Oracle WebServer: The Future
Although the Oracle Web Agent greatly simplifies the development of CGI applications that need access to data stored in an Oracle7 database, we feel that Web servers can benefit from even tighter integration with the database. By this we mean the following:
Direct access to PL/SQL stored procedures
Although CGI works reasonably well for small Web servers with relatively light loads, it doesn't provide adequate performance and doesn't scale well for Web servers that are heavily accessed. This is because CGI requires the Web server to spawn a separate process for each document requested. When Web servers start experiencing hundreds of simultaneous requests, the number of spawned processes can easily bog down the machine. Realizing this, in Release 2.0 of the Oracle WebServer, we will remove the Oracle Web Agent's reliance on CGI. owa will no longer be a separate "C" executable that gets spawned by the Web server. It will be linked directly into the Web server and will pass on the name of the PL/SQL procedure to be executed to a configurable number of processes that are always connected to the Oracle7 database. Besides providing a much more scalable solution, this method will also enjoy increased performance, because spawning a process and logging on to the Oracle7 database are some of the slowest operations in a typical transaction.
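The benefit of keeping a fixed set of workers logged on can be sketched in Python. This is a simplified, single-threaded model and not the Release 2.0 design itself: `FakeConnection` and `ConnectionPool` are hypothetical names, and the "logon" is only counted, not performed.

```python
import queue

class FakeConnection:
    """Stand-in for a database session; counts how many logons take place."""
    logons = 0

    def __init__(self, service):
        FakeConnection.logons += 1   # the expensive step, paid once per worker
        self.service = service

    def execute(self, procedure_name):
        return "<HTML>output of %s</HTML>" % procedure_name

class ConnectionPool:
    """A configurable number of connections that stay logged on between requests."""

    def __init__(self, service, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(FakeConnection(service))

    def run(self, procedure_name):
        conn = self._idle.get()          # borrow an already-connected worker
        try:
            return conn.execute(procedure_name)
        finally:
            self._idle.put(conn)         # hand it back for the next request
```

With a pool of three workers, ten requests still cost only three logons, whereas a CGI-style stub would log on ten times.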
It is important to note, however, that we are not removing the CGI interface altogether. We realize that it is important to continue to support CGI because it is an open standard that will allow our Web server to support any application written for any other Web server that supports this standard. We are merely eliminating the need to go through CGI for applications written with the Oracle Web Agent to provide increased performance and scalability.
Replacing the File System Itself with the Database
Wherever a traditional Web server would use the file system, the Oracle WebServer Release 2.0 will use the database. This will include storage of static HTML documents and other Web objects, storage of access log information, and storage of error log information. The benefits of storing logging information in the database are immediately apparent: it will be possible to formulate much more sophisticated queries using SQL to generate reports on the actual usage and hit rate of the Web server. However, the benefits of storing actual Web objects in the database are not as obvious. Thus, we would like to focus on this area for the rest of this section.
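Before turning to Web objects, the logging benefit mentioned above can be made concrete with a small sketch. It uses Python's built-in `sqlite3` module purely as a stand-in for Oracle7; the table name `access_log`, its columns, and the function `top_documents` are assumptions.

```python
import sqlite3

def top_documents(rows, limit):
    """Load access-log rows into a table and report the most requested documents."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE access_log (client_host TEXT, url TEXT, status INTEGER)"
    )
    conn.executemany("INSERT INTO access_log VALUES (?, ?, ?)", rows)
    # Count successful hits per document, most requested first.
    result = conn.execute(
        "SELECT url, COUNT(*) AS hits FROM access_log "
        "WHERE status = 200 GROUP BY url ORDER BY hits DESC, url LIMIT ?",
        (limit,),
    ).fetchall()
    conn.close()
    return result
```

A report of this kind takes one query once the log is a table, whereas a flat log file would first have to be parsed by a separate script.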
Although the Oracle Web Listener, which is a component of the Oracle WebServer, is a full-fledged, industrial-strength HTTP server that can serve HTML documents off the file system like any other Web server, most of the benefits of the Oracle WebServer will not be realized until the database is utilized to store Web objects. Storing Web objects in the database has the following advantages:
Since a list of objects or the objects themselves reside in the database, it will be possible to use the database to enforce additional security using the concept of database roles. The web server administrator will be able to restrict groups of objects to certain users and have these users administer their own groups. The designated administrator for a particular group will be able to add users and objects to the group, but will not be able to tamper with objects in a group with a higher privilege. This allows a finer granularity of security than is available on traditional web servers.
It will be possible to write SQL and PL/SQL code to retrieve information about the objects in the Web server. This opens up a whole range of possibilities such as the ability to maintain different versions of objects much more easily, the ability to locate specific objects more easily, etc.
Searching and retrieving Web objects based on their content will be possible. This is slow and quite cumbersome to do with flat files.
Realizing that for large Web sites it will be a big task to load all existing documents into the database, we are providing the option to leave the documents in the file system and simply create pointers to them in the database. Creating pointers in the database will still bring you the first two advantages described above. Furthermore, for extremely large documents, storing them in the file system with pointers in the database may be preferable from a performance standpoint. Fetching a 10 megabyte document from the database, for example, could be significantly slower than fetching it from the file system. For smaller documents, this performance hit is small enough that the ability to do sophisticated searches based on the content of the document and increased security would outweigh the performance degradation.
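The choice between storing an object and storing a pointer to it might look like the following sketch, again using Python's `sqlite3` as a stand-in for Oracle7. The table `web_objects`, its columns, and the helper names are assumptions, not the Release 2.0 schema.

```python
import sqlite3

def create_store():
    """Create a table in which each object has either content or a file path."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE web_objects ("
        "name TEXT PRIMARY KEY, content BLOB, file_path TEXT)"
    )
    return conn

def add_inline(conn, name, data):
    """Store the object itself in the database."""
    conn.execute("INSERT INTO web_objects VALUES (?, ?, NULL)", (name, data))

def add_pointer(conn, name, path):
    """Leave the object in the file system and store only a pointer to it."""
    conn.execute("INSERT INTO web_objects VALUES (?, NULL, ?)", (name, path))

def fetch(conn, name):
    """Return the object's bytes, following the pointer if there is one."""
    row = conn.execute(
        "SELECT content, file_path FROM web_objects WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        raise KeyError(name)
    content, file_path = row
    if content is not None:
        return bytes(content)
    with open(file_path, "rb") as handle:
        return handle.read()
```

Either way the object has a row in the database, so queries about the objects work for both kinds; only inline objects expose their content to the database itself.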
Integration with Oracle ConText
The World Wide Web's greatest asset is the wealth of information available to anyone at the click of a mouse button. However, the Web may be the classic case of "too much of a good thing" because it is becoming increasingly difficult to find the information needed due to the sheer volume of information available. Oracle ConText, which will be integrated into the Oracle WebServer, is the language-processing core of Oracle's solution to this information overload.
ConText is a natural language technology that uses the linguistic information contained in every word, sentence, and paragraph to identify the themes and content of any text. With this knowledge, ConText can facilitate automatic classification, visual browsing, and easy management of ever-increasing volumes of electronic information. Its analysis delivers insights into documents that statistical methods cannot:
Main themes representing a selection of text form a "document fingerprint" for classifying or retrieving.
Prominent ideas can be highlighted while details are hidden, for summaries or abstracts.
Areas of text discussing similar themes can be identified, collapsed, or indexed, for hyperlink navigation or finding information quickly.
ConText is unique in that it appears to understand the English language: for example, it can distinguish between a document that merely contains the term "Computer Science" 100 times and a document that has Computer Science as its main theme. The actual words, "Computer Science", may not even appear anywhere in the latter document.
This capability is extremely useful for large Web sites that want to be able to automatically classify their documents to assist users in finding what they need quickly and easily. After all, information is only valuable if one can find it when it is needed.
Conclusion
We at Oracle believe that in the future, it will be difficult to envision a commercial World Wide Web site without a backend database to manage Web content as well as other information about the Web site, such as access statistics. The Oracle WebServer is only the first product from Oracle aimed at addressing this need. Stay tuned for more!