subquery factoring in oracle 9i
Subquery factoring, also known as the WITH clause, provides a convenient and flexible way for us to define subqueries and in-line views in Oracle 9i. The primary purpose of subquery factoring is to reduce repeated table accesses by generating temporary datasets during query execution. However, even if we do not take advantage of this internal query optimisation, we can use subquery factoring to structure our complex SQL statements in a more logical and understandable format. In this article we will look at how we can benefit from this new syntax and related optimisation.
syntax overview
The syntax for subquery factoring is as follows.
WITH subquery_name AS ( SELECT ... ) , another_subquery_name AS ( SELECT ... ) SELECT ... FROM subquery_name sq1 , another_subquery_name sq2 WHERE sq1... = sq2...
As stated in the introduction to this article, subquery factoring is invoked via the new WITH clause. The WITH keyword is the first in the syntax diagram for SELECT, as we can see in the syntax example above. This example defines two subqueries and then joins them together in the "main body" of the SELECT statement. This is the semantic equivalent of the following pseudo-select.
SELECT ... FROM (SELECT ...) inline_view_name , (SELECT ...) another_inline_view_name WHERE ...
The following syntax example is possibly more common in everyday use. We define two named subqueries, but the second builds on the first.
WITH subquery_name AS ( SELECT ... FROM table ) , another_subquery_name AS ( SELECT ... FROM subquery_name ) SELECT ... FROM another_subquery_name WHERE ...
This form of subquery factoring is the semantic equivalent of the following pseudo-SQL.
SELECT ... FROM (SELECT ... FROM ( SELECT ...))
This is where subquery factoring is at its most useful. Using subqueries in this way helps to break down complex SQL statements and structure them more "procedurally" without having to rely on multiple levels of nested in-line views. This is particularly important when using analytic functions which often need at one least level of nesting. Oracle, meanwhile, manages the optimisation of such "procedural SQL" statements through its query merging techniques, such that what we define in our SQL is not necessarily what Oracle runs under the covers.
a simple optimisation example
We have seen subquery factoring introduced above as a convenient SQL-writing option (and its use as such cannot be understated). Its primary objective, however, is as a performance optimisation technique. Many of the subqueries we write using the WITH clause will be treated exactly as an-line view by Oracle. Sometimes, however, Oracle will go a stage further and "materialise" a temporary resultset from the subquery and use this in later parts of the SQL statement. It does this by creating an "on-the-fly" temporary table for use in the execution of that SQL statement only (subsequent uses of the same cursor will each generate a new temporary table).
In the following example, we will see this optimisation in operation. First, we will create a SALES table, using the ALL_OBJECTS view to generate 1 million rows of dummy sales data.
SQL> CREATE TABLE sales 2 NOLOGGING 3 AS 4 SELECT a1.owner AS region 5 , a1.object_type AS product 6 , a1.object_id AS order_amt 7 FROM all_objects a1 8 , all_objects a2 9 WHERE ROWNUM <= 1000000;
Table created.
SQL> exec DBMS_STATS.GATHER_TABLE_STATS(USER,'sales');
PL/SQL procedure successfully completed.
Next we will write a query that will tell us which sales regions accounted for more than one-third of all sales revenue. We will begin with a standard method (involving two accesses of SALES) and measure resource and time costs using Autotrace and sqlplus timing.
SQL> set autotrace on SQL> set timing on SQL> SELECT region 2 , total_sales 3 FROM ( 4 SELECT region 5 , NVL(SUM(order_amt),0) AS total_sales 6 FROM sales 7 GROUP BY 8 region 9 ) ilv 10 WHERE total_sales > (SELECT SUM(order_amt)/3 AS one_third_sales 11 FROM sales);
REGION TOTAL_SALES ------------------------------ ----------- PUBLIC 6842060686 SYS 6008003545 2 rows selected. Elapsed: 00:00:03.02 Execution Plan ---------------------------------------------------------- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2433 Card=2 Bytes=22) 1 0 FILTER 2 1 SORT (GROUP BY) (Cost=2433 Card=2 Bytes=22) 3 2 TABLE ACCESS (FULL) OF 'SALES' (Cost=320 Card=1000000 Bytes=11000000) 4 1 SORT (AGGREGATE) 5 4 TABLE ACCESS (FULL) OF 'SALES' (Cost=320 Card=1000000 Bytes=5000000) Statistics ---------------------------------------------------------- 255 recursive calls 0 db block gets 6670 consistent gets 5883 physical reads 0 redo size 517 bytes sent via SQL*Net to client 499 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 5 sorts (memory) 0 sorts (disk) 2 rows processed
This query took just 3 seconds to complete and used moderate levels of resources (including the hard-parse). However, we needed to access SALES twice in full to achieve this result.
Subquery factoring can reduce the SALES table access to a single scan by materialising the results of the first scan and use these to resolve the second subquery. We can see this below.
SQL> WITH regional_sales AS ( 2 SELECT region 3 , NVL(SUM(order_amt),0) AS total_sales 4 FROM sales 5 GROUP BY 6 region 7 ) 8 SELECT region 9 , total_sales 10 FROM regional_sales 11 WHERE total_sales > (SELECT SUM(total_sales)/3 AS one_third_sales 12 FROM regional_sales);
REGION TOTAL_SALES ------------------------------ ----------- PUBLIC 6842060686 SYS 6008003545 2 rows selected. Elapsed: 00:00:02.01
Execution Plan ---------------------------------------------------------- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=28 Bytes=840) 1 2 RECURSIVE EXECUTION OF 'SYS_LE_2_0' 2 0 TEMP TABLE TRANSFORMATION 3 2 VIEW (Cost=2 Card=28 Bytes=840) 4 3 TABLE ACCESS (FULL) OF 'SYS_TEMP_0FD9D660C_16AFD3' (Cost=2 Card=28 Bytes=308) 5 3 SORT (AGGREGATE) 6 5 VIEW (Cost=2 Card=28 Bytes=364) 7 6 TABLE ACCESS (FULL) OF 'SYS_TEMP_0FD9D660C_16AFD3' (Cost=2 Card=28 Bytes=308)
Statistics ---------------------------------------------------------- 1002 recursive calls 10 db block gets 3649 consistent gets 2944 physical reads 1344 redo size 517 bytes sent via SQL*Net to client 499 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 15 sorts (memory) 0 sorts (disk) 2 rows processed
This version of the query was much quicker and included hard-parse costs in addition to the cost of setting up a temporary table. Subsequent executions of both of the example SQL statements reduced each by 1 second, making the subquery version twice as fast. We can see from the statistics output that the amount of work done by Oracle to answer the query in this way is significantly less, in particular the physical I/O.
Looking at the execution plan, we can see that Oracle has introduced a new TEMP TABLE TRANSFORMATION step (Step 2) where it creates and loads a global temporary table from the first scan of SALES. It then uses this temporary dataset to determine the cut-off sales value needed to answer the overall question of which regions account for more than one-third of sales revenue (Step 7).
Note that the Autotrace plan makes no mention of the SALES table. There is clearly some missing information in our plan. We can use the DBMS_XPLAN package for a second opinion as follows (note that DBMS_XPLAN is an Oracle 9.2 feature).
SQL> EXPLAIN PLAN SET statement_id = 'SQ_FCT' 2 FOR 3 WITH regional_sales AS ( 4 SELECT region 5 , NVL(SUM(order_amt),0) AS total_sales 6 FROM sales 7 GROUP BY 8 region 9 ) 10 SELECT region 11 , total_sales 12 FROM regional_sales 13 WHERE total_sales > (SELECT SUM(total_sales)/3 AS one_third_sales 14 FROM regional_sales);
Explained.
SQL> SELECT plan_table_output 2 FROM TABLE(DBMS_XPLAN.DISPLAY('PLAN_TABLE','SQ_FCT'));
PLAN_TABLE_OUTPUT ------------------------------------------------------------------------------------------------------- ----------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost | ----------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 28 | 840 | 2 | | 2 | TEMP TABLE TRANSFORMATION | | | | | | 1 | RECURSIVE EXECUTION | SYS_LE_2_0 | | | | | 0 | INSERT STATEMENT | | 28 | 308 | 2433 | | 1 | LOAD AS SELECT | | | | | | 2 | SORT GROUP BY | | 28 | 308 | 2433 | | 3 | TABLE ACCESS FULL | SALES | 1000K| 10M| 320 | |* 3 | VIEW | | 28 | 840 | 2 | | 4 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6623_16AFD3 | 28 | 308 | 2 | | 5 | SORT AGGREGATE | | 1 | 13 | | | 6 | VIEW | | 28 | 364 | 2 | | 7 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6623_16AFD3 | 28 | 308 | 2 | ----------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - filter("REGIONAL_SALES"."TOTAL_SALES"> (SELECT /*+ */ SUM("REGIONAL_SALES"."TOTAL_SALES")/3 FROM (SELECT /*+ CACHE_TEMP_TABLE("T1") */ "T1"."C0" "REGION","T1"."C1" "TOTAL_SALES" FROM "SYS"."SYS_TEMP_0FD9D6623_16AFD3" "T1") "REGIONAL_SALES")) Note: cpu costing is off 27 rows selected.
With this plan we see the full picture. The SALES table is scanned in full during the direct-path load of the temporary table. DBMS_XPLAN shows us the recursive plan that Autotrace omitted.
As noted earlier, Oracle will not always materialise subqueries in this way. Sometimes it will either merge the subquery into the main query or treat it as a simple in-line view. In either of these cases, the WITH clause behaves no differently to the more familiar in-line view. On large datasets, however, the savings from this optimisation can be quite significant as we have seen. On tiny datasets, the time involved in the temporary table setup can take longer than the original query itself so is not a particularly useful mechanism.
multiple subqueries
As described in the syntax overview earlier in this article, subqueries can reference preceding subqueries. This can be useful for very complex queries that need to build up results in stages (in fact, this makes SQL even more powerful, flexible and descriptive than ever).
We can see an example of this approach to building complex SQL statements in multiple stages below. This example continues with the original regional sales query but breaks down the regional totals by products. Note that for this particular requirement we should ideally be using analytic functions, but it highlights the subquery factoring principle quite well.
SQL> WITH regional_sales AS ( 2 SELECT region 3 , NVL(SUM(order_amt),0) AS total_sales 4 FROM sales 5 GROUP BY 6 region 7 ) 8 , top_regions AS ( 9 SELECT region 10 FROM regional_sales 11 WHERE total_sales > (SELECT SUM(total_sales)/3 AS one_third_sales 12 FROM regional_sales) 13 ) 14 SELECT region 15 , product 16 , COUNT(*) AS product_units 17 , SUM(order_amt) AS product_sales 18 FROM sales 19 WHERE region IN (SELECT region FROM top_regions) 20 GROUP BY 21 region 22 , product;
REGION PRODUCT PRODUCT_UNITS PRODUCT_SALES ---------- ------------------ ------------- ------------- SYS TYPE 15575 409075100 SYS VIEW 74060 392488355 SYS INDEX 11095 24907715 SYS TABLE 12285 46731930 SYS CONTEXT 105 2375240 SYS LIBRARY 70 284305 SYS PACKAGE 5880 63982695 SYS SYNONYM 210 823095 SYS FUNCTION 1435 11199755 SYS OPERATOR 70 2189705 SYS DIRECTORY 210 6622035 SYS PROCEDURE 490 3671185 SYS JAVA CLASS 315262 4905172276 SYS JAVA RESOURCE 6194 138006359 SYS CONSUMER GROUP 70 269150 SYS EVALUATION CONTEXT 35 204645 PUBLIC SYNONYM 427040 6842060686 17 rows selected.
Note how the top_regions subquery builds on the regional_sales subquery to generate the overall picture of our top-performing regions. This subquery itself is then used in a visit back to SALES to generate the product-level summary.
subquery factoring and dml
Subquery factoring is part of the SELECT statement itself, which means it can be used anywhere a SELECT is used. For example, the following statement formats are all valid.
INSERT INTO table_name WITH subquery_name AS ( SELECT ... FROM ... ) SELECT ... FROM subquery_name; UPDATE table_name SET column_name = ( WITH subquery_name AS ( SELECT ... FROM ... ) SELECT ... FROM subquery_name ); DELETE FROM table_name WHERE column_name IN ( WITH subquery_name AS ( SELECT ... FROM ... ) SELECT ... FROM subquery_name ); INSERT INTO table_name VALUES ( (WITH subquery_name AS ( SELECT ... FROM ... ) SELECT ... FROM subquery_name) );
forcing materialisation
We have seen both forms of subquery factoring in this article so far; these being with and without materialisation. As stated, the latter is an optimisation technique that can provide good performance gains with large datasets. Sometimes we might wish to force a subquery to be materialised on the occasions when the CBO chooses not to. We can do this via the MATERIALIZE hint. Note that this hint is currently undocumented so there is always a risk that it will be either deprecated or will simply stop working.
In the following example, we will include a subquery that is unlikely to be materialised by Oracle and then attempt to force the temporary table execution using the MATERIALIZE hint. We will begin with a version of the query without hints.
SQL> set autotrace traceonly explain SQL> WITH inline_view AS ( 2 SELECT region 3 , SUM(order_amt) AS total_orders 4 FROM sales 5 GROUP BY 6 region 7 ) 8 SELECT * 9 FROM inline_view;
Execution Plan ---------------------------------------------------------- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2433 Card=28 Bytes=308) 1 0 SORT (GROUP BY) (Cost=2433 Card=28 Bytes=308) 2 1 TABLE ACCESS (FULL) OF 'SALES' (Cost=320 Card=1000000 Bytes=11000000)
It is not surprising that Oracle has chosen not to materialise this subquery. It is only used once so creating and loading a temporary table would be a waste of resources. However, we can force Oracle into taking this path by using the MATERIALIZE hint as follows.
SQL> WITH inline_view AS ( 2 SELECT /*+ materialize */ 3 region 4 , SUM(order_amt) AS total_orders 5 FROM sales 6 GROUP BY 7 region 8 ) 9 SELECT * 10 FROM inline_view;
Execution Plan ---------------------------------------------------------- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=28 Bytes=840) 1 2 RECURSIVE EXECUTION OF 'SYS_LE_2_0' 2 0 TEMP TABLE TRANSFORMATION 3 2 VIEW (Cost=2 Card=28 Bytes=840) 4 3 TABLE ACCESS (FULL) OF 'SYS_TEMP_0FD9D660A_16AFD3' (Cost=2 Card=28 Bytes=308)
This time we have the temporary table plan that we saw in previous examples (remember that Autotrace omits the recursive part of the plan that describes the direct-path load of the temporary table from the SALES table).
restrictions
Finally, there are a small number of restrictions with subquery factoring. These are outlined below.
unreferenced subqueries
Any subqueries we define must be referenced at least once otherwise Oracle will raise an exception as follows.
SQL> WITH unused_subquery AS ( 2 SELECT 'X' AS column_alias 3 FROM dual 4 ) 5 SELECT SYSDATE 6 FROM dual;
FROM dual * ERROR at line 3: ORA-32035: unreferenced query name defined in WITH clause
subqueries in views
Subqueries can be included in views but there is a small "gotcha". Some DDL-generators wrap the view's SQL in parentheses and this raises an exception when subqueries are used, as follows.
SQL> CREATE VIEW view_with_subquery 2 AS 3 ( 4 WITH subquery_name AS ( 5 SELECT SYSDATE AS date_column 6 FROM dual 7 ) 8 SELECT date_column 9 FROM subquery_name 10 );
) * ERROR at line 10: ORA-32034: unsupported use of WITH clause
The correct way to code this is to remove the outer parentheses from the SQL, as follows.
SQL> CREATE OR REPLACE VIEW view_with_subquery 2 AS 3 WITH subquery_name AS ( 4 SELECT SYSDATE AS date_column 5 FROM dual 6 ) 7 SELECT date_column 8 FROM subquery_name;
View created.
nesting subqueries
Subqueries can reference preceding subqueries (as we saw earlier), but they cannot be nested inside each other. The following example attempts to nest a subquery.
SQL> WITH outer_subquery AS ( 2 WITH nested_subquery AS ( 3 SELECT SYSDATE AS date_column 4 FROM dual 5 ) 6 ) 7 SELECT date_column 8 FROM outer_subquery;
WITH nested_subquery AS ( * ERROR at line 2: ORA-32034: unsupported use of WITH clause
subqueries and object types
This is a slightly more esoteric restriction, but if we use any object columns in our subqueries, Oracle will not materialise the dataset, even if we try to force it. In the following example, we create a small object type and try to materialise a dataset that includes it.
SQL> CREATE TYPE object_type AS OBJECT 2 ( attr1 VARCHAR2(30) 3 , attr2 VARCHAR2(30) 4 ); 5 /
Type created.
SQL> set autotrace traceonly explain SQL> WITH subquery_with_object AS ( 2 SELECT /*+ materialize */ 3 object_type(region,product) 4 FROM sales 5 ) 6 SELECT COUNT(*) 7 FROM subquery_with_object;
Execution Plan ---------------------------------------------------------- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=320 Card=1) 1 0 SORT (AGGREGATE) 2 1 TABLE ACCESS (FULL) OF 'SALES' (Cost=320 Card=1000000)
We can verify that the subquery can be materialised without the object type as follows.
SQL> WITH subquery_without_object AS ( 2 SELECT /*+ materialize */ 3 region 4 , product 5 FROM sales 6 ) 7 SELECT COUNT(*) 8 FROM subquery_without_object;
Execution Plan ---------------------------------------------------------- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1) 1 2 RECURSIVE EXECUTION OF 'SYS_LE_2_0' 2 0 TEMP TABLE TRANSFORMATION 3 2 SORT (AGGREGATE) 4 3 VIEW (Cost=2 Card=1000000) 5 4 TABLE ACCESS (FULL) OF 'SYS_TEMP_0FD9D660F_16AFD3' (Cost=2 Card=1000000 Bytes=15000000)
For more information read the subquery factoring clause in the SQL Reference.
source code
The source code for the examples in this article can be downloaded from here.
Adrian Billington, December 2005 (updated June 2007)
Back to Top