COGITATIVE CODING

Steven Feuerstein
OTJ, Winter 1996

Climbing the Spiral to PL/SQL Excellence, Part 2

In part two of this two-part series, the PL/SQL master reveals how you can improve both your code and your qualify of programming life.


In part one of this two-part series (see the Fall issue), I explored the "spiral toward excellence," using the building of a basic and (on the surface) trivial PL/SQL utility as an example. I created a very robust function called twice, which doubled any string passed to it.

After reading part one, you might have thought we'd be done with the twice function now. It is well structured, handles errors gracefully, and offers a reasonable amount of flexibility. It has come a long way from its original one-line version. A few days of programming go by, however, and I encounter a very interesting requirement: Take a string and repeat it three times, not just twice. I, of course, instantly think of the twice function and how it would be easy to create another function called thrice that performs an additional concatenation - but which is otherwise unchanged. Then I take a coffee break and realize in my moment away from the screen that tomorrow I could run into a need for four or five repetitions.

The twice function is finished - but only within its limited scope. It would be even more helpful to have a function that lets me perform any number of duplications, as specified by the user.

First, because I will let the users specify the number of repetitions, I need to (1) change the name of the function and (2) add a third parameter. Following is the new header for my new function:

 

CREATE OR REPLACE FUNCTION repeated
(string_in IN VARCHAR2,
action_in IN VARCHAR2
DEFAULT 'N',
num_in IN INTEGER DEFAULT 1)
RETURN VARCHAR2

The name of the function (repeated) reflects its general utility. It returns a string repeated any number of times. The third parameter, num_in, indicates the number of times to repeat the string. Notice that the default is one, which means a single repetition - thereby matching the functionality of twice. Otherwise the parameter list is the same.

It doesn't take much thought to realize two things about the implementation of the repeated function:

There is only one answer to this question: I don't know. Each different user may want or expect a different outcome. As the creator of the repeated function, either I can build the function to handle these two scenarios and other case conversion options, or I can simply decide that the repeated function will only offer one option.

For the purposes of this article, I provide the implementation for the repeated function whose case conversion is limited to applying the first half of the conversion to the input string and the second half of the conversion to all of the repetitions of that string. The following examples show what the repeated function will do:

SQL> exec DBMS_OUTPUT.PUT_LINE -
>(repeated ('abc','UL',3));
ABCabcabc
SQL> exec DBMS_OUTPUT.PUT_LINE -
>(repeated ('abc','LU',3));
abcABCABC

I will leave it to you, the reader, to discover the implementation of the repeated function that offers other patterns (or all patterns). Please send your solutions to me via email at [email protected]. The full implementation of the repeated function is shown in Listing 1; I step through that implementation below.

The first thing I want to do in the repeated function is assert the validity of all of my assumptions. I have the same assumption for action as I did in the twice function, but I have another assumption as well: The num_in argument will not be negative. So the repeated function will add the following call to assert:

assert
  (num_in >= 0,
   'Duplication count must be at
    least 0.');

After I know that my arguments are okay, I can proceed to my algorithm. With my new approach to case conversion, I have two different kinds of strings for repetition: the initial string and the repetition string. The cases of these two strings must be set separately, based on the action code. You can set the cases with the following IF statement:

 

IF v_action = 'UL'
THEN
initval := UPPER (string_in);
nextval := LOWER (string_in);
ELSIF v_action = 'LU'
THEN
initval := LOWER (string_in);
nextval := UPPER (string_in);
ELSE
initval := string_in;
nextval := string_in;
END IF;

Once I have set the initial and repetition (or next) strings, I can set the initial value for the return value and then use a for loop to generate the repeated string:


v_retval := initval;

FOR dup_ind IN 1 .. num_in

LOOP

   v_retval := v_retval || nextval;

END LOOP;


The return value variable is then ready to be RETURNed by the function.

Now that the repeated function is coded, let's walk through that code for some specific argument values to see if my logic holds up.

WHEN NUM_IN ARGUMENT IS 0

This is a "boundary check." Zero is the lowest allowable value for num_in. Any test case that lies on the boundary of a range of values is a prime candidate for failure. How often have you written an algorithm that works fine in general but breaks down exactly on the low or high end or another type of special case that is perfectly valid? When num_in is zero, the FOR loop does not execute. Therefore, the return value is set to the initial value, and that is what is returned: the string passed in by the user, converted as specified, repeated zero times. The repeated function checks for num_in equal to zero. Of course, I should and do execute the function for this case as well, but the code walk-through should come first. You should be able to deduce logically that your code will run fine even before you run it. Here goes:


SQL> exec DBMS_OUTPUT.PUT_LINE

 (repeated ('abc','UL',0));

ABC

WHEN STRING_IN IS NULL

This is another special case. The twice function handled it smoothly because NULL concatenated to NULL is still NULL. Will the repeated function act any differently? The answer is no. It might execute more concatenations, but it will still return NULL when a NULL string is passed to it for the first argument. In addition to my code walk-throughs for these cases, I executed the repeated function for a variety of different inputs and found that it works fine. I now have a very generic function to generate string repetitions with case conversion.

CONSIDERING IMPLEMENTATION OPTIONS

That's right. We're still not finished. As you build more and more complex programs, you should always remember the following piece of wisdom: In PL/SQL, just like in SQL, there is always more than one way to implement a requirement. SQL is notorious (at least to me, and I am far from a SQL guru) for having many different potential solutions for a single data request. Usually 99.95 percent of these solutions perform horribly. The situation is similar though perhaps less extreme for the PL/SQL language. You can usually find at least two or three ways to solve a given problem. And it is usually worth the trouble at least to consider more than one implementation. Why? Because even though your first, instinctive approach may work fine, another technique might be even better - from the standpoints of performance, maintainability, or readability. In addition, by coming at the problem from another angle, you may uncover logical flaws, such as unconsidered cases, in your prior implementation(s).

Let's look at the repeated function with these issues in mind. Actually, as I wrote the last sentence of the previous section, I was immediately reminded of another implementation. That sentence read, in part:

"I now have a very generic function to generate string repetitions. . ."

I thought to myself: "Big deal. I mean, PL/SQL itself has one or two very generic functions to generate repetitions of a string." These built-in functions don't perform case conversion, so I don't feel downright stupid about writing my own string-repeater. However, is it worth considering built-ins for use within the repeated function.

Which functions do I refer to as "string-repeaters"? RPAD and LPAD. These pad functions are commonly used to pad on the left or right with spaces, which is simply the default mode of operation for these functions. You can pad to the specified length with any pattern of character you want. The following use of LPAD, for example, pads the string "Eli" with the words "My son" to a length of 20 characters:

SQL>exec DBMS_OUTPUT.PUT_LINE (LPAD

 ('Eli', 20, 'My son '));

My son My son My Eli

Notice that it stuck "My" in three times. That's because this LPAD function will pad as far as possible to fill the 20 characters and then stop. This built-in repeater can work quite easily in the repeated function. The only trick is to calculate the total length of the string I want to generate. Listing 2 contains the full implementation of the RPAD version of the repeated function. It is identical to the for loop version, except that in place of the loop I use the following line:

v_retval := RPAD (initval, LENGTH

 (string_in) * (num_in+1), nextval);

The total length of the returned value is the length of the specified string multiplied by the number of repetitions plus one. So if a user specifies zero repetitions, the total length is the same as the original string and RPAD does nothing. If a user wants one repetition, the total length is double the original, leaving enough room for RPAD to pad initval on the right with nextval just once, resulting in twice the original string. This pattern works for additional multiples as well.

The RPAD approach requires fewer lines of code than the loop version. For example, with RPAD I don't even need to initialize the return value variable to initval. The single assignment covers the num_in = 0 case as well as the nontrivial repetitions. So which technique should I use? More to the point, which should I make available to others to use? The deciding factor should be: Which is more efficient? This is a low-level utility. It might be called many times deep down in the bowels of an application. So a minor difference in performance between the two implementations could have a multiplying effect on overall performance of the application.

CHOOSING THE BEST PERFORMER

If there is a difference in execution times for RPAD and LPAD, it will not be a big difference. In order to compare the cumulative difference, I must execute the functions many times. The best way to calculate the elapsed time of PL/SQL code execution is with the GET_TIME function of the DBMS_UTILITY built-in package. I have encapsulated GET_TIME inside the PLVtmr package (PL/Vision Timer) to make it easier to use. Listing 3 shows the type of script I used (by the way, this code was generated mostly with the PLVgen package) to compare the performance of repeated, rep_rpad, and also a recursion-based implementation of repeated (called "recrep"; the code for this implementation is not included in this article).

The SQL*Plus script (named timerep.sql for the purposes of this article) shown in Listing 3 takes three arguments. The first (&1) accepts the number of times to execute each function. The second (&2) accepts a string that is to be duplicated. I ran the script several times and got the following results:

SQL>@timerep 100 abc 1

duprpad Elapsed: .77 seconds.

 Factored: .0077 seconds.

duploop Elapsed: .66 seconds.

 Factored: .0066 seconds.

recrep Elapsed: .71 seconds.

 Factored: .0071 seconds.



SQL>@timerep 100 abc 10

duprpad Elapsed: .71 seconds.

 Factored: .0071 seconds.

duploop Elapsed: .99 seconds.

 Factored: .0099 seconds.

recrep Elapsed: 1.54 seconds.

 Factored: .0154 seconds.

I ran each of these tests several times to let the numbers stabilize. The results are interesting and certainly reinforce the need for a careful test plan. When repeating the string only once, the recursion-based implementation is superior. Upon reflection, this result should not be a surprise. The recursion-based implementation handles a single repetition as a single case: an unmediated concatenation of two strings. The loop-based implementation comes in second, but all of the timings are close. When I moved to multiple repetitions of the string, however, the recrep function became extremely slow; again, I would expect that behavior because of the extra work performed by the PL/SQL runtime engine to manage a recursive program. The big news from the results of the multiple repetitions of the string, however, is that the RPAD implementation of the repeated function clearly establishes itself as the fastest technique.

Of course, I also need to compare the performance for different types of strings. I ran the same timer script to see how each function handled NULL values, and I got the following results:


SQL>@timerep 200 null 10

duprpad Elapsed: 1.59 seconds.

 Factored: .00795 seconds.

duploop Elapsed: 2.03 seconds.

 Factored: .01015 seconds.

recrep Elapsed: 2.91 seconds.

 Factored: .01455 seconds.

In this scenario, the RPAD implementation was considerably faster than the loop and recursion techniques (although, once again, I found that if the number of repetitions was set to one, the recrep function was faster).

Finally, I greatly increased the number of string repetitions and it all became clear:


SQL>@timerep 100 abc 100

duprpad Elapsed: .77 seconds.

 Factored: .0077 seconds.

duploop Elapsed: 4.28 seconds.

 Factored: .0428 seconds.

recrep Elapsed: 5.22 seconds.

 Factored: .0522 seconds.

The RPAD technique offers a much more stable solution than the FOR loop technique. Regardless of the number of repetitions, RPAD takes about the same amount of time. With the FOR loop and recursion approaches, as the repetitions increase, the performance degrades. That is not a sign of a healthy algorithm.

Given the results, it would make sense to implement the repeated function using the RPAD technique. You might be able to optimize it further by using the for loop approach for small numbers of repetitions and then switching to RPAD for larger repetitions. The gain, however, is minimal and probably not worth the trouble.

I was glad to see that the RPAD approach is faster. You should always use a built-in PL/SQL function if it exists, rather than build your own. The FOR loop technique arose quite naturally from the way in which I expanded the scope of the twice function. It turned out, however, that it was not the optimal path to take. As for recursion, well, it is always an interesting phenomenon to watch and puzzle out, but it rarely offers the best implementation (except when it is the only implementation feasible).

DON'T FORGET BACKWARD COMPATIBILITY

Now that I have stabilized a version of the repeated function that performs best, I have one more issue to consider: What about all of those calls to the twice function? The repeated function (whichever implementation I use) handles the same requirement as the twice function. I would rather not have several different functions floating around in my environment, especially because they duplicate much of the same logic. For example, if I decide to add yet another type of case conversion, such as InitCap, I would have to enhance both the twice and the repeated functions (which would be a maintenance nightmare).

On the other hand, I don't necessarily want to delete the twice function. It is already used in a number of programs, some of which are in production. I would much rather leave the calls to the twice function in place and thereby minimize the disruption to existing code. But I need a path that offers backward compatibility while at the same time avoiding a maintenance nightmare.

The solution is a direct translation to code of that stated need: Keep the header to the twice function the same, but completely gut and replace its internals with a call to repeated! This approach follows:


CREATE OR REPLACE FUNCTION twice 
   (string_in IN VARCHAR2, 
    action_in IN VARCHAR2
     DEFAULT 'N')
RETURN VARCHAR2
IS
BEGIN
   RETURN (repeated (string_in,
    action_in, 1));
END;

I could leave off the third argument of 1, because that parameter value is the default and I explicitly designed the function so that the default would match the current functionality of the twice function. However, this approach is dangerous. What if the default changes? I am much better off being explicit - especially considering that I do not want the default value; I want a single repetition. That parameter value just happens to be the default today.

Now all programs that call the twice function will work as is with no changes required. Yet any changes I make to the repeated function will also automatically carry over into the twice function.

OBLITERATING THE LITERALS

Two things still bother me about the repeated function: First, the function is not defined in a package, and second, a user of the repeated function must know the correct literals to pass to it to get the right type of conversion action. On general principle I believe that everything built in PL/SQL should be placed inside a package. This construct is the cornerstone of programming in the PL/SQL language and offers many advantages, descriptions of which exceed the scope of this article. My second concern about literals can be answered by creating a package, so I will show you how to convert the standalone repeated function into a package.

I do not believe that users of my code should have to remember the specific characters to pass in a string literal. Is it "UL" or "BS"? Is it "n" for "no action" or "l" for "leave alone"? With the repeated function as implemented throughout this article, there is no way for developers to know at compile time if they called the repeated function properly.

Beyond this difficulty, applications would be much better off if their creators avoided using hard-coded literals in their code. Each time the repeated function is called, some string literal is being hard-coded into a program. If the repeated function is ever modified to expand the scope of action and different literals are used, all of those other programs could go haywire. A much better approach would be to provide named constants in the place of hard-coded strings so that: (1) at compile time a developer would know if the call to the function is correct, and (2) the actual string values for the action codes can be hidden from view (and changed as often as is necessary).

The best way (really, the only way) to create named constants for use throughout a PL/SQL application is to put these constants and the code with which they are used into a package. The stg package shown in Listing 4 offers the same functionality as the repeated function, with the additional benefit of named constants. Now instead of having a standalone repeated function, I have a dup function in the stg package and the following constants:

stg.ulIndicates you want an UPPER-lower case conversion
stg.luIndicates you want a lower-UPPER case conversion
stg.nIndicates you do not want any case conversion

So when I want to duplicate or repeat the string "abc" 10 times with UPPER-lower conversion, I execute the following statement:

stg.dup ('abc', stg.ul, 10);

By referencing the stg.ul constant, I discover at compile time if I am using a valid action code for case conversion. Notice that I placed the dup function within a very generic "string" package. I did this to anticipate future requirements for string processing. By creating this package, I have established a repository in which I can place other related functions and procedures, all of which will be called with the "stg" prefix, indicating that they are oriented to string processing.

GLANCING BACKWARD, LOOKING UPWARD

I started with a simple solution to what seemed to be a very simple request, and I ended up with a generic, well-structured function that handles the simple request and many others as well. Along the way, I applied many "best practices" for module construction. With each successive change to the twice function (and then to the repeated function), I traveled through another turn along the spiral that represents the rise in quality of my PL/SQL coding techniques. I now have a polished function with proven performance and wide applicability. Take a look at the final version of my repeater function (the dup package). Could you have predicted this endpoint from the first version of the twice function? I certainly could not have. But I found many improvements to make to the twice function.

And so we come face to face with one of the most extraordinary characteristics of the programming spiral. It's not like a Slinky. That toy has the right shape, but it has a beginning and an end. The spiral toward excellence for developers has a beginning (although you will probably have to make an arbitrary choice to locate it), but it certainly has no end. You can always find ways to improve your code, your coding philosophy, and your quality of programming life.


Steven Feuerstein is the author of Oracle PL/SQL Programming and Advanced Oracle PL/SQL: Programming with Packages (O'Reilly and Associates, 1995 and 1996, respectively) and director of the Oracle Practice for SSC, a systems management consulting firm based in Chicago (www.saraswati.com). Steven is a senior technology officer for RevealNet and coauthor of the Reveal for PL/SQL knowledge server (www.revealnet.com). He is also the developer of PL/Vision, an extensive library of PL/SQL packages available through RevealNet. He writes regular columns on PL/SQL for several Oracle-related newsletters and magazines. You can email Steven at [email protected].


LISTING 1. The Repeated Function



CREATE OR REPLACE FUNCTION repeated 
   (string_in IN VARCHAR2, 
    action_in IN VARCHAR2 DEFAULT 'N',
    num_in IN INTEGER DEFAULT 1)   
RETURN VARCHAR2
IS
   v_action VARCHAR2(10) := UPPER (action_in);   
   initval VARCHAR2(32000);
   nextval VARCHAR2(32000); 
   v_retval VARCHAR2(32000) := string_in;
   
BEGIN
   assert
      (v_action IN ('UL', 'LU', 'N'),
       'Please enter UL LU or N');
   assert
      (num_in >= 0,
       'Duplication count must be at least 0.');

   IF v_action = 'UL'
   THEN
      initval := UPPER (string_in);   
      nextval := LOWER (string_in);
   ELSIF v_action = 'LU'
   THEN
      initval := LOWER (string_in);      
      nextval := UPPER (string_in);
   ELSE
      initval := string_in;      
      nextval := string_in;
   END IF; 

   v_retval := initval;
   FOR dup_ind IN 1 .. num_in-1
   LOOP
      v_retval := v_retval || nextval;
   END LOOP;
   RETURN v_retval;
EXCEPTION
   WHEN OTHERS
   THEN
      RETURN NULL;      
END repeated;
/

LISTING 2. An RPAD-Based Implementation of Repeated



CREATE OR REPLACE FUNCTION rep_rpad
   (string_in IN VARCHAR2, 
    action_in IN VARCHAR2 DEFAULT 'N',
    num_in IN INTEGER DEFAULT 1)
RETURN VARCHAR2
IS
   v_action VARCHAR2(10) := UPPER (action_in);  
   initval VARCHAR2(32000);
   nextval VARCHAR2(32000); 
   v_retval VARCHAR2(32000);
   
BEGIN
   assert
      (v_action IN ('UL', 'LU', 'N'),
       'Please enter UL LU or N');
   assert
      (num_in >= 0,
       'Duplication count must be at least 0.');

   IF v_action = 'UL'
   THEN
      initval := UPPER (string_in);   
      nextval := LOWER (string_in);
   ELSIF v_action = 'LU'
   THEN
      initval := LOWER (string_in);      
      nextval := UPPER (string_in);
   ELSE
      initval := string_in;      
      nextval := string_in;
   END IF; 

   v_retval := RPAD (initval, LENGTH (string_in) * (num_in+1), nextval);

   RETURN v_retval;      
EXCEPTION
   WHEN OTHERS
   THEN
      RETURN NULL;
END rep_rpad;
/

LISTING 3. A Performance Comparison Script



 

DECLARE
a VARCHAR2(100) := '&2';
aa VARCHAR2(10000);
BEGIN
PLVtmr.set_factor (&1);
PLVtmr.capture;
FOR rep IN 1 .. &1
LOOP
aa := rep_rpad (a, 'UL', &3);
END LOOP;
PLVtmr.show_elapsed ('duprpad');
 
PLVtmr.set_factor (&1);
PLVtmr.capture;
FOR rep IN 1 .. &1
LOOP
aa := repeated (a, 'UL', &3);
END LOOP;
PLVtmr.show_elapsed ('duploop');
 

 

PLVtmr.set_factor (&1);
PLVtmr.capture;
FOR rep IN 1 .. &1
LOOP
aa := recrep (a, 'UL', &3);
END LOOP;
PLVtmr.show_elapsed ('recrep');
END;
/



LISTING 4. A Duplicate String Package



CREATE OR REPLACE PACKAGE stg
IS
   lu CONSTANT VARCHAR2(1) := 'A';
   ul CONSTANT VARCHAR2(1) := 'B';
   n  CONSTANT VARCHAR2(1) := 'X';

   FUNCTION dup 
      (stg_in IN VARCHAR2, 
       action_in IN VARCHAR2 := n,
       num_in IN INTEGER DEFAULT 1)
   RETURN VARCHAR2;
END stg;
/        
CREATE OR REPLACE PACKAGE BODY stg
IS
   FUNCTION stg
      (string_in IN VARCHAR2, 
       action_in IN VARCHAR2 DEFAULT n,
       num_in IN INTEGER DEFAULT 1)
   RETURN VARCHAR2
   IS
      v_action VARCHAR2(10) := UPPER (action_in);  
      initval VARCHAR2(32000);
      nextval VARCHAR2(32000); 
      v_retval VARCHAR2(32000);
      
   BEGIN
      assert
         (v_action IN (lu, ul, n),
          'Please use the package constants: ul, lu or n');
      assert
         (num_in >= 0,
          'Duplication count must be at least 0.');

      IF v_action = ul
      THEN
         initval := UPPER (string_in);   
         nextval := LOWER (string_in);

      ELSIF v_action = lu
      THEN
         initval := LOWER (string_in);      
         nextval := UPPER (string_in);

      ELSE
         initval := string_in;      
         nextval := string_in;
      END IF; 

      v_retval := 
         RPAD (initval, LENGTH (string_in) * (num_in+1), nextval);

      RETURN v_retval;         
   END dup;
END stg;
/



This is a copy of an article published @ http://www.oreview.com/