Select Magazine - April 1998

Tuning SQL Statements That Use the IN Operator

Roger Schrag

The product manager, suspecting the customer's database was not properly tuned or configured, asked me to examine the system. Gaining access to the customer's database was at first politically touchy, but in the end I got excellent cooperation from the customer's DBA staff and found their database was in fine shape.

I quickly switched to my application developer hat as I traced sessions on an active database and ran Tkprof to collect information about what was going on inside the customer's Oracle server. After an hour of generating trace files and looking at the results, the problem was clear: The application was using the IN operator in its SQL to an extreme I had never seen before. Plus, the customer's Oracle 7.3 database didn't take kindly to this.

The software development company, which I'll refer to as "Acme Software," had used an Oracle 7.2 database in-house to build the application. It turned out that Oracle 7.2 handled this application's far-out use of the IN operator somewhat better than Oracle 7.3. Also clouding the issue was a flaw in Acme's testing strategy that made it appear as if the application ran very fast on the Oracle 7.2 database when in reality performance was still not so hot.

So at the end of the day Acme had built an application whose core queries ran in five seconds against an Oracle 7.2 database, 70 seconds against an Oracle 7.3 database, and under one second against an Oracle8.0 database. All three databases had the exact same data and ran on comparable hardware. My mission was to explain to the product manager why different releases of Oracle would give such varied results and, more importantly, help his developers get the application to run quickly on all releases of Oracle.

The IN Operator

The IN operator in SQL gives you the ability to allow for multiple conditions in one SQL statement. The following query, for example, will give you the order numbers of all orders that are either canceled or invalid:

	SELECT order_number
	FROM   orders
	WHERE  status IN ('canceled', 'invalid')
The list of values enclosed in the parentheses is called an inlist. The above example had an inlist with two values in it. The following two queries will list the names of all employees in departments 10, 20 and 30. Both queries yield the same results-the IN operator and the = ANY operator mean the same thing to the Oracle server:
	SELECT ename
	FROM   emp
	WHERE  deptno IN (10, 20, 30)

	SELECT ename
	FROM   emp
	WHERE  deptno = ANY (10, 20, 30)
In each of these queries, the IN operator has been used to select data based on multiple constant values (order status codes in the first query and department numbers in the second and third). The IN operator can also be used with subqueries. The query below retrieves names of all employees in departments where the name of the department starts with the word "Shoes":
	SELECT ename
	FROM   emp
	WHERE  deptno IN
	       (
	       SELECT deptno
	       FROM   dept
	       WHERE  deptname LIKE 'Shoes %'
	       )
IN operators used with subqueries can often be rewritten using a join between the two tables instead. For example, the previous query is identical to:
	SELECT emp.ename
	FROM   dept, emp
	WHERE  dept.deptname LIKE 'Shoes %'
	AND    emp.deptno = dept.deptno
The query optimizer in the Oracle server is well aware of this trick, and will usually process IN operators with subqueries as joins instead wherever possible. This has been true since the Oracle V6 days. Developers still use the IN operator with subqueries, though, because sometimes it makes the code easier to understand. Also, there are times when this use of the IN operator is not the same as an ordinary join.

How Oracle7 Processes IN Operators

So far we've looked at examples of how the IN operator can be used to handle multiple conditions in one query-either from a list of enumerated constants or a subquery that returns multiple values. But how does Oracle actually process the IN operator when executing a SQL statement? The answer can depend on which release of Oracle you are using, and whether the IN operator is being used with a subquery or enumerated values.

First let's look at the scenario where the IN operator is being used with a subquery. This is the more straightforward case, and seems to be consistent across all releases of Oracle. Fig. 1 shows the same query used in a previous example, along with its execution plan. You can see that the Oracle server performs the query as if it were a simple join between the dept and emp tables: First Oracle finds the departments with names starting with the word "Shoes" and then it finds all of the employees in these departments.

SELECT ename
FROM   emp
WHERE  deptno IN
       (
       SELECT deptno
       FROM   dept
       WHERE  deptname LIKE 'Shoes %'
       )
	   
	   
Execution Plan
-------------------------------------------------------
SELECT STATEMENT
 NESTED LOOPS
  TABLE ACCESS BY ROWID (DEPT)
   INDEX RANGE SCAN (DEPT_NAME_IDX)
  TABLE ACCESS BY ROWID (EMP)
   INDEX RANGE SCAN (EMP_FK_DEPT_IDX)
Fig. 1: A query using the IN operator with a subquery, and its execution plan. Next let's look at how Oracle processes IN operators where constant values are enumerated in the inlist instead of a subquery. Fig. 2 shows the same query used in the very first example along with its execution plan. In this query we wish to retrieve the order numbers of all orders that have been canceled or are invalid.
SELECT order_number
FROM   orders
WHERE  status IN ('canceled', 'invalid')

Execution Plan
-------------------------------------------------------
SELECT STATEMENT
 CONCATENATION
  TABLE ACCESS BY ROWID (ORDERS)
   INDEX RANGE SCAN (ORDER_STATUS_IDX)
  TABLE ACCESS BY ROWID (ORDERS)
   INDEX RANGE SCAN (ORDER_STATUS_IDX)
Fig. 2: A simple query using the IN operator with a list of constant values, and the resulting execution plan. If we take it as a given that the status column in the orders table is indexed, you might expect Oracle to simply scan the status index twice (once for the value "canceled" and once for the value "invalid"), and then combine the results. In a roundabout way, that is in fact exactly what Oracle is doing. First, the Oracle server has transformed the original query into the following:
	SELECT order_number
	FROM   orders
	WHERE  (status = 'canceled' OR status = 'invalid')
Next, Oracle transforms the query further as follows:
	SELECT order_number
	FROM   orders
	WHERE  status = 'canceled'
	UNION ALL
	SELECT order_number
	FROM   orders
	WHERE  status = 'invalid'
This second transformation should help you understand the execution plan shown in Fig. 2. Conceptually, the Oracle server is doing about what we would expect: It is looking up all canceled orders and all invalid orders and combining the results together. Oracle calls this combining action a concatenation. But suppose the query were more complex. Fig. 3 shows a more sophisticated query using the IN operator with enumerated values in the inlist, and the resulting execution plan. In this query we are retrieving more information about canceled and invalid orders than just the order number as in the previous example. Because of the way Oracle7 transforms the IN operator into an OR clause and subsequently into a compound SQL statement using the UNION ALL set operator, you can see that the execution plan doubles in size for a query with an IN operator that has two constant values in its inlist.
SELECT   ORD.order_id, ORD.order_number, 
         ORD.total, CUST.customer_name
FROM     orders ORD, customers CUST
WHERE    ORD.status IN ('canceled', 'invalid')
AND      CUST.customer_id = ORD.customer_id

Execution Plan
-------------------------------------------------------
SELECT STATEMENT
 CONCATENATION
  NESTED LOOPS
   TABLE ACCESS BY ROWID (ORDERS)
    INDEX RANGE SCAN (ORDER_STATUS_IDX)
   TABLE ACCESS BY ROWID (CUSTOMERS)
    INDEX UNIQUE SCAN (CUSTOMERS_PK)
  NESTED LOOPS
   TABLE ACCESS BY ROWID (ORDERS)
    INDEX RANGE SCAN (ORDER_STATUS_IDX)
   TABLE ACCESS BY ROWID (CUSTOMERS)
    INDEX UNIQUE SCAN (CUSTOMERS_PK)
	
	
Fig. 3: A more complex query using the IN operator with a list of constant values, and the resulting execution plan. As you can imagine, Oracle7's strategy for processing SQL statements with IN operators and constant values works well for small inlists, but is not very scaleable. In fact Oracle7 will return an ORA-01795 error if the inlist for an IN operator contains more than 254 values. With this in mind, let's go back to the core queries in Acme's application. These queries, it turns out, hinge on a four table join and an IN operator with 200 constant values enumerated in the inlist. The execution plan for this query on their Oracle 7.2 database was over 2000 lines long! Fig. 4a shows a query similar to one of Acme's core queries with the extreme IN operators. Fig. 4b shows the execution plan generated by an Oracle 7.2 database. Part of it anyway; you don't need to see all fifty pages to get the idea!
SELECT   ORD.order_id, ORD.order_number, 
         ORD.status, CUST.customer_name, 
         CAT.category, SHIP.description
FROM     orders ORD, customers CUST, 
         customer_categories CAT,
         shipping_methods SHIP
WHERE    CUST.customer_id = ORD.customer_id
AND      CAT.category_id = CUST.category_id
AND      SHIP.shipping_method_id = 
           ORD.shipping_method_id
AND      ORD.order_id IN 
         (
          385,  409,  507,  511,  601,  633,  641,  651,
          715,  799,  972,  984, 1038, 1074, 1122, 1135,
         1143, 1189, 1241, 1242, 1370, 1442, 1565, 1581,
         1593, 1614, 1803, 1881, 1890, 1899, 1943, 2015,
         2049, 2217, 2269, 2389, 2409, 2411, 2486, 2527,
         2576, 2696, 2704, 2759, 2768, 2846, 2924, 2927,
         2973, 3129, 3138, 3162, 3181, 3252, 3341, 3357,
         3383, 3503, 3551, 3738, 3741, 3753, 3822, 3864,
         3914, 4090, 4096, 4185, 4221, 4236, 4263, 4271,
         4278, 4305, 4396, 4465, 4485, 4570, 4578, 4607,
         4656, 4742, 4777, 4799, 4824, 4838, 4940, 5030,
         5134, 5137, 5199, 5302, 5360, 5531, 5569, 5592,
         5787, 5946, 5986, 6051, 6161, 6191, 6193, 6196,
         6261, 6392, 6465, 6597, 6611, 6725, 6782, 6793,
         6813, 6822, 6843, 6896, 6995, 7037, 7076, 7141,
         7305, 7307, 7327, 7374, 7387, 7410, 7424, 7442,
         7443, 7473, 7488, 7496, 7579, 7637, 7662, 7664,
         7748, 7830, 7891, 7902, 7947, 7997, 8008, 8092,
         8130, 8180, 8224, 8304, 8351, 8400, 8466, 8475,
         8480, 8491, 8509, 8572, 8592, 8704, 8766, 8775,
         8860, 8897, 8965, 9024, 9034, 9052, 9083, 9089,
         9102, 9107, 9230, 9269, 9358, 9412, 9423, 9429,
         9490, 9507, 9530, 9581, 9591, 9730, 9768, 9770,
         9771, 9772, 9784, 9791, 9854, 9858, 9929, 9964,
         9974, 9982, 9987, 9996, 9998,10003,10004,10092
         )
Fig. 4a: A query using the IN operator with a long inlist of constant values.
Execution Plan
-------------------------------------------------------
SELECT STATEMENT
 CONCATENATION
  NESTED LOOPS
   NESTED LOOPS
    NESTED LOOPS
     TABLE ACCESS BY ROWID (ORDERS)
      INDEX UNIQUE SCAN (ORDERS_PK)
     TABLE ACCESS BY ROWID (SHIPPING_METHODS)
      INDEX UNIQUE SCAN (SHIPPING_METHODS_PK)
    TABLE ACCESS BY ROWID (CUSTOMERS)
     INDEX UNIQUE SCAN (CUSTOMERS_PK)
   TABLE ACCESS BY ROWID (CUSTOMER_CATEGORIES)
    INDEX UNIQUE SCAN (CUSTOMER_CATEGORIES_PK)
  NESTED LOOPS
   NESTED LOOPS
    NESTED LOOPS
     TABLE ACCESS BY ROWID (ORDERS)
      INDEX UNIQUE SCAN (ORDERS_PK)
     TABLE ACCESS BY ROWID (SHIPPING_METHODS)
      INDEX UNIQUE SCAN (SHIPPING_METHODS_PK)
    TABLE ACCESS BY ROWID (CUSTOMERS)
     INDEX UNIQUE SCAN (CUSTOMERS_PK)
   TABLE ACCESS BY ROWID (CUSTOMER_CATEGORIES)
    INDEX UNIQUE SCAN (CUSTOMER_CATEGORIES_PK)
  . . .
Fig. 4b: The execution plan generated by an Oracle 7.2 database for the query shown in Fig. 4a.

With release 7.3, Oracle apparently altered the optimizer's strategy for transforming SQL statements with large inlists. Instead of transforming one query into 200 queries and using UNION ALL to bring the results together, Oracle 7.3 will often ignore the relevant index and instead perform one full table scan. In this scenario, Oracle reads every row of data from the table and compares each to all values listed in the inlist of the IN operator. If the underlying table is large, this full table scan can clobber performance.

To illustrate this point, Fig. 4c shows the execution plan generated by an Oracle 7.3 database for the query shown in Fig. 4a. Instead of looking up each of the 200 order_id values in the primary key index on the orders table, the Oracle 7.3 server has elected to perform a full scan of the orders table, finding all rows that have any of the 200 values in the order_id column, and then join the results to the other three tables.

Execution Plan
-------------------------------------------------------
SELECT STATEMENT
 NESTED LOOPS
  NESTED LOOPS
   NESTED LOOPS
    TABLE ACCESS FULL (ORDERS)
    TABLE ACCESS BY ROWID (CUSTOMERS)
     INDEX UNIQUE SCAN (CUSTOMERS_PK)
   TABLE ACCESS BY ROWID (CUSTOMER_CATEGORIES)
    INDEX UNIQUE SCAN (CUSTOMER_CATEGORIES_PK)
  TABLE ACCESS BY ROWID (SHIPPING_METHODS)
   INDEX UNIQUE SCAN (SHIPPING_METHODS_PK)
Fig. 4c: The execution plan generated by an Oracle 7.3 database for the query shown in Fig. 4a.

Its not clear exactly where Oracle 7.3 draws the line. If the inlist is quite small (only a few constant values) then Oracle 7.3 will use the same transformation strategy as Oracle 7.2. But when the inlist is large (over perhaps 30 or 40 enumerated values) the full table scan seems to be used instead. You can force Oracle to use the transformation strategy, regardless of software release and inlist size, by using the USE_CONCAT optimizer hint.

Enter Oracle8

Oracle8 seems to handle IN operators with subqueries in much the same way that Oracle7 and V6 did. This was already very efficient and needed no improvement. But for IN operators with many constant values in the inlist, Oracle8 introduces the new inlist iterator data access path. This new method in the query optimizer makes handling of large inlists of constants extremely efficient and intuitive. Fig. 4d shows the execution plan generated by an Oracle8.0 database for the query shown in Fig. 4a.

Execution Plan
-------------------------------------------------------
SELECT STATEMENT                                                                
 NESTED LOOPS                                                                   
  NESTED LOOPS                                                                  
   NESTED LOOPS                                                                 
    INLIST ITERATOR CONCATENATED                                                
     TABLE ACCESS BY INDEX ROWID (ORDERS)                                       
      INDEX RANGE SCAN (ORDERS_PK)                                              
    TABLE ACCESS BY INDEX ROWID (CUSTOMERS)                                     
     INDEX UNIQUE SCAN (CUSTOMERS_PK)                                           
   TABLE ACCESS BY INDEX ROWID (CUSTOMER_CATEGORIES)                            
    INDEX UNIQUE SCAN (CUSTOMER_CATEGORIES_PK)                                  
  TABLE ACCESS BY INDEX ROWID (SHIPPING_METHODS)                                
   INDEX UNIQUE SCAN (SHIPPING_METHODS_PK)                     
Fig. 4d: The execution plan generated by an Oracle8.0 database for the query shown in Fig. 4a.

With the inlist iterator of Oracle8, there is no transforming of the SQL statement into something far more complex than the original. And there is also no setting aside indexes in favor of full table scans. Instead, the Oracle8 server just does what you would expect: It walks down the list of constant values spelled out in the inlist and looks each one up in the index, collects matching rows from the table, and moves on.

There is no hint to turn on the inlist iterator data access path. The optimizer will simply use it when appropriate. If for some reason you need Oracle8 to mimic the Oracle7 behavior, you can use the USE_CONCAT hint to force the UNION ALL transformation.

Back to the Original Puzzle

Using the query shown in Fig. 4a one can now explain the puzzling results Acme Software observed. Their in-house Oracle 7.2 database transformed the core queries in their application into extremely complex statements as demonstrated in Fig. 4b. It took their Oracle server about five seconds to parse such a query and build the enormous execution plan. Actually executing and fetching the query, though, proved fast because only unique index scans and concatenations were involved.

The first time Acme ran a regression test of their application, the core query took five seconds because of the long parse. Subsequent runs took under one second because the SQL statement was already in the shared pool and did not need to be parsed. So Acme had concluded that the query ran very fast in their regression test. (This was actually a flawed conclusion because in real life the criteria would be slightly different on each invocation, and so the five second parse would apply each time an end-user initiated a search.)

On an Oracle 7.3 database, meanwhile, the query always took approximately 70 seconds. This was directly attributable to the full table scan of the orders table-one of the largest tables in their database-as demonstrated in Fig. 4c.

And finally on the Oracle8.0 database, the query always ran in less than one second because the new inlist iterator data access path put an end to complex transformations and full table scans.

All of this served to explain why Acme's application performed differently on different Oracle releases. But the real interest was in getting the application to run fast on any release of Oracle. One approach would be to add a USE_CONCAT optimizer hint to the query in Fig. 4a to cause Oracle releases 7.2, 7.3, and 8.0 to all process the query in the same way (as shown in Fig. 4b). This could benefit Oracle 7.3 users, but would significantly harm Oracle8 users. And at five seconds per query, the performance would not be very satisfying.

A better solution was to do away with the enormous inlists altogether. After some research, it turned out that the application in question was a PowerBuilder screen where an end-user could query order information from the database based on many different types of criteria. Instead of building one query to retrieve order information based on the end-user's criteria, the application worked by retrieving order_ids of orders meeting the criteria and then retrieving the orders themselves using a second query along the lines of the query shown in Fig. 4a.

Breaking the operation into two separate SQL statements like this perhaps made the application easier for the developer to build, but offered no other benefits. At the same time, this strategy came at a high performance cost for Oracle 7.2 and 7.3 users. Acme ultimately got the application to run faster on all releases of Oracle by merging the two SQL statements together and replacing queries like the one in Fig. 4a with dynamically generated queries such as the one shown in Fig. 5.

SELECT   ORD.order_id, ORD.order_number, 
         ORD.status, CUST.customer_name, 
         CAT.category, SHIP.description
FROM     orders ORD, customers CUST, 
         customer_categories CAT,
         shipping_methods SHIP
WHERE    CUST.customer_id = ORD.customer_id
AND      CAT.category_id = CUST.category_id
AND      SHIP.shipping_method_id = 
           ORD.shipping_method_id
AND      ORD.order_date BETWEEN :date_lo AND :date_hi
AND      ORD.status = :order_status
AND      ORD.total BETWEEN :total_lo AND :total_hi 
Fig. 5: A query to efficiently retrieve orders based on ad hoc search criteria.

Conclusion

The IN operator allows one SQL statement to accommodate multiple possible conditions. These conditions could come from a subquery that retrieves multiple values or from an enumerated list of constant values. IN operators with subqueries are a very powerful tool in the SQL language. IN operators with relatively small numbers of constant values can also be very useful.

The Oracle server has handled IN operators that use subqueries in a very efficient manner since the days of Oracle V6. Oracle7 handles IN operators with a small set of constant values in a reasonably efficient manner, but it is not until Oracle8 that you can use arbitrarily large numbers of constant values in an inlist without paying performance penalties.

Until I began troubleshooting Acme Software's application, the idea of using the IN operator with large numbers of constant values in the inlist had never occurred to me. As was the case with Acme's application, I believe that the large inlists are often unnecessary. But if such uses of the IN operator are something you feel you cannot live without, then Oracle8 has you covered. Still using Oracle7? Then try to eliminate your giant inlists or consider an upgrade to Oracle8.

References
Oracle7 Server SQL Reference, Oracle Corporation
Oracle7 Server Tuning, Oracle Corporation
Oracle8 Server Tuning, Oracle Corporation
Oracle Performance Tuning, O'Reilly & Associates, Inc.

About the Author
Roger Schrag has been an Oracle DBA and application architect for over nine years, starting out at Oracle Corporation on the Oracle Financials development team. He is currently a senior consultant at Database Specialists, Inc., a small consulting group specializing in business solutions based on Oracle technology. You can visit Database Specialists on the web at www.dbspecialists.com, and you can reach Roger by calling +1.415.841.1429 or via email at [email protected].



This is a copy of an article published @ http://www.ioug.org/