pl/sql optimisation in 10g

Oracle has made several claims that PL/SQL is now faster "out of the box" in 10g, due largely to a re-development of its underlying algorithms. Oracle also provides a new parameter for us to control how much of this optimisation we wish to adopt when running PL/SQL in 10g.

In addition to this, various elements of the language have also undergone improvements. For example, all integer arithmetic is now machine-based. Floating point arithmetic is possible in Oracle for the first time and provides for some impressive performance gains in heavily-computational code (even without native compilation).

This article will, however, concentrate solely on the underlying PL/SQL optimisation rather than the performance gains with using new datatypes as described above. This article will be divided into three sections. The first section will briefly highlight the performance improvements in 10g PL/SQL. The second section demonstrates a specific optimisation dedicated to cursor-for-loops and the final section is a new addition that describes some potential issues (bugs) with the hidden optimisation.

about the examples in this article

Most of the original examples in this article compare the performance of fairly trivial PL/SQL blocks between 10g Release One (10.1.0.1) and 9i Release Two (9.2.0.5). The databases both reside on the same twin CPU Linux x86 server and the general setups are as comparable as two different releases of Oracle can be. The examples in the new section on optimisation issues have been tested on 10.1.0.5, 10.2.0.3 and 11.1.0.6.

plsql_optimize_level

For the first time, Oracle has provided a parameter to control how much optimisation we wish our PL/SQL modules to undergo. There are three levels of optimisation, as follows.

As stated above, level 2 is the default and it is unlikely that we would ever wish to set it otherwise. Should we wish to change the level we can either set it instance-wide in our spfile/pfile or for our session only with an ALTER SESSION statement.

simple optimisation

To begin, we'll simply compare a computational PL/SQL block in 10g and 9i and time it by the "wall-clock". We'll begin by running the block on 10g. This runs 1 million iterations of a meaningless calculation.

10g> set timing on

10g> DECLARE
  2     c  NUMBER := 1;
  3     pi NUMBER := 3.142;
  4     r  NUMBER := DBMS_RANDOM.RANDOM;
  5  BEGIN
  6     FOR i IN 1 .. 1000000 LOOP
  7        /* Use a random, variable calculation... */
  8        c := pi*(r**2) + (MOD(r,c)*pi+i);
  9     END LOOP;
 10  END;
 11  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:02.13

Now we can run exactly the same block in 9i and see how long it takes to run.

 9i> set timing on
 
 9i> DECLARE
  2     c  NUMBER := 1;
  3     pi NUMBER := 3.142;
  4     r  NUMBER := DBMS_RANDOM.RANDOM;
  5  BEGIN
  6     FOR i IN 1 .. 1000000 LOOP
  7        /* Use a random, variable calculation... */
  8        c := pi*(r**2) + (MOD(r,c)*pi+i);
  9     END LOOP;
 10  END;
 11  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:04.15

Oracle 10g is roughly twice as fast with a simple task such as that above (repeated runs gave the same results). This is impressive given that there are no obvious re-write optimisations available on a variable calculation such as this. In the following example, we shall see the difference that re-writes can make to our timings. We'll run a large loop that contains only a constant assignment. Remember that optimisation level 2 in 10g will relocate constant assignments that appear inside loops.

This time, we'll start by running the block on 9i, again with timing on.

 9i> DECLARE
  2     c PLS_INTEGER;
  3  BEGIN
  4     FOR i IN 1 .. 100000000 LOOP
  5        c := 1;
  6     END LOOP;
  7  END;
  8  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:16.56

Now we can compare the performance of 10g for the same block. This is at optimisation level 2.

10g> DECLARE
  2     c PLS_INTEGER;
  3  BEGIN
  4     FOR i IN 1 .. 100000000 LOOP
  5        c := 1;
  6     END LOOP;
  7  END;
  8  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:01.94

This time, we've seen some impressive performance gains for no effort (though admittedly this is quite a contrived example). Using the PL/SQL Profiler, we can easily demonstrate the effects of the optimisation. We will run the previous example at optimisation levels 1 and 2 but with 10,000 loop iterations (because the PL/SQL Profiler writes its trace to tables). We will begin with level 1 as follows.

10g> ALTER SESSION SET PLSQL_OPTIMIZE_LEVEL = 1;

Session altered.

10g> DECLARE
  2     c PLS_INTEGER;
  3  BEGIN
  4     DBMS_PROFILER.START_PROFILER('OPTIMIZE LEVEL 1');
  5     FOR i IN 1 .. 10000 LOOP
  6        c := 1;
  7     END LOOP;
  8     DBMS_PROFILER.STOP_PROFILER();
  9  END;
 10  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.03

The PL/SQL Profiler report for this example is as follows (the SQL is provided in the download at the end of this article).


RUN_COMMENT          UNIT_NAME           LINE# TOTAL_OCCUR             TOTAL_TIME
-------------------- -------------- ---------- ----------- ----------------------
OPTIMIZE LEVEL 1     <anonymous>             5       10001              511829296
OPTIMIZE LEVEL 1     <anonymous>             6       10000              507050756
OPTIMIZE LEVEL 1     <anonymous>             8           1                 826641

3 rows selected.

We can see that line 6 (the assignment) occurs 10,000 times so clearly has not been compiled out. We will repeat the example at level 2 below.

10g> ALTER SESSION SET PLSQL_OPTIMIZE_LEVEL = 2;

Session altered.

10g> DECLARE
  2     c PLS_INTEGER;
  3  BEGIN
  4     DBMS_PROFILER.START_PROFILER('OPTIMIZE LEVEL 2');
  5     FOR i IN 1 .. 10000 LOOP
  6        c := 1;
  7     END LOOP;
  8     DBMS_PROFILER.STOP_PROFILER();
  9  END;
 10  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.01

The Profiler report for this anonymous block under level 2 optimisation is as follows.


RUN_COMMENT          UNIT_NAME           LINE# TOTAL_OCCUR             TOTAL_TIME
-------------------- -------------- ---------- ----------- ----------------------
OPTIMIZE LEVEL 2     <anonymous>             5       10000              468979157
OPTIMIZE LEVEL 2     <anonymous>             8           1                 617676

2 rows selected.

At optimisation level 2, the static assignment at line 6 has been removed from the loop.

pl/sql optimisation and cursor-for-loops

For most of us, the PL/SQL we write is a mix of some computational logic and heavy data-access. In many legacy systems, cursor-for-loops prevail, despite the SQL mantra being practiced by many developers in recent years. An online article by Jonathan Lewis (see the link at the end of this article) describes a cursor-for-loop optimisation included in the 10g compiler. At level 2 (default), static cursor-for-loops will be re-written during compilation as bulk fetches with a limit of 100 elements. This is quite a significant optimisation that can give 15-20% performance improvements to existing code.

To see the benefits of this optimisation, we'll run a simple cursor-for-loop on 10g with SQL trace enabled. We'll run it first at level 0 and then again at level 2. We'll compare the wall-clocks and then examine the trace files for detail.

10g> exec DBMS_MONITOR.SESSION_TRACE_ENABLE;

PL/SQL procedure successfully completed.

10g> ALTER SESSION SET PLSQL_OPTIMIZE_LEVEL = 0;

Session altered.

10g> set timing on

10g> DECLARE
  2     i PLS_INTEGER := 0;
  3  BEGIN
  4     FOR r IN ( SELECT * FROM dba_source level_0 ) LOOP
  5        i := i + 1;
  6     END LOOP;
  7  END;
  8  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:22.14

Optimisation at level 0 behaves the same as previous versions of Oracle. To loop through the records in DBA_SOURCE (nearly 290,000 records on the test database) took over 22 seconds. We can run the same example back at level 2 (default) as follows.

10g> ALTER SESSION SET PLSQL_OPTIMIZE_LEVEL = 2;

Session altered.

10g> DECLARE
  2     i PLS_INTEGER := 0;
  3  BEGIN
  4     FOR r IN ( SELECT * FROM dba_source level_2 ) LOOP
  5        i := i + 1;
  6     END LOOP;
  7  END;
  8  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:03.71

This time the loop completes in less than 4 seconds for truly impressive performance gains. We'll format the trace files with tkprof for evidence of why this is the case (note from the zero cache misses that this was not the first time these PL/SQL blocks ran. This was deliberate to avoid "first run interference" in the timings).


********************************************************************************

SELECT * 
FROM
 DBA_SOURCE LEVEL_0 


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch   289739      5.54       6.82       4894     290593          0      289738
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total   289741      5.54       6.82       4894     290593          0      289738

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 57     (recursive depth: 1)

********************************************************************************

SELECT * 
FROM
 DBA_SOURCE LEVEL_2 


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch     2898      1.65       3.78       3711       8397          0      289738
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total     2900      1.65       3.78       3711       8397          0      289738

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 57     (recursive depth: 1)

The statistics for level 2 demonstrate that Oracle is fetching in batches of 100. The CPU time (largely due to the reduced context switching) has dropped significantly.

There are two important points worth making about the cursor-for-loop optimisation. Firstly, it only applies to cursor-for-loops and not explicit "fetch loops". So PL/SQL of the following structure will not be optimised, nor will ref cursors (regardless of whether they are static or dynamic).

OPEN cursor_name;
LOOP
   FETCH cursor_name INTO record_name;
   EXIT WHEN cursor_name%NOTFOUND;
   <<do some processing>>
END LOOP;
CLOSE cursor_name;

The second important point to note is that Oracle does not optimise any DML statements within the loop (i.e. it does not turn them into FORALL behind the scenes). We can demonstrate this quite easily. We'll create a copy of DBA_OBJECTS and re-run the previous example but insert each fetch into the demonstration table. We'll then look at the resulting trace file.

10g> CREATE TABLE t
  2  NOLOGGING
  3  AS
  4     SELECT *
  5     FROM   dba_source
  6     WHERE  ROWNUM < 1;

Table created.

10g> exec DBMS_MONITOR.SESSION_TRACE_ENABLE;

PL/SQL procedure successfully completed.

10g> BEGIN
  2     FOR r IN ( SELECT * FROM dba_source ) LOOP
  3        INSERT INTO t VALUES r;
  4     END LOOP;
  5  END;
  6  /

PL/SQL procedure successfully completed.

We can see from the following tkprof output that the fetches have been optimised as expected but the inserts have not.


********************************************************************************

SELECT * 
FROM
 DBA_SOURCE 


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.01          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch     2898      1.89       6.03       5341       8397          0      289738
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total     2900      1.90       6.05       5341       8397          0      289738


********************************************************************************

INSERT INTO T 
VALUES
 (:B1 ,:B2 ,:B3 ,:B4 ,:B5 ) 


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute 289738     12.90      24.33          2       5976     334593      289738
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total   289739     12.90      24.33          2       5976     334593      289738

optimisation bugs

We have seen that the PL/SQL compiler in 10g provides good performance gains without any additional effort by us and that one of the methods Oracle uses to achieve this is code re-organisation. However, the fact that the compiler is re-organising our code can raise issues, especially when the algorithms we use are complex or when we have "hidden" some logic (we will see an example of this below). Adrian Christie has provided two good examples of PL/SQL optimiser bugs he has discovered and we will investigate these below. Note that these bugs are reproducible on both major releases of 10g but are fixed in 11g.

example one: collections and loops

The first issue we will see is a surprisingly simple one. We will extend a collection variable to ten elements and loop through it, trimming ("popping") one element on each iteration. We therefore expect ten loop iterations. We will first run the example at PL/SQL optimisation level 0 (i.e. no optimisation), as follows.

10g> ALTER SESSION SET PLSQL_OPTIMIZE_LEVEL = 0;

Session altered.

10g> DECLARE
  2     TYPE element_ntt IS TABLE OF PLS_INTEGER;
  3     v_elements element_ntt := element_ntt();
  4  BEGIN
  5     v_elements.EXTEND(10);
  6     FOR i IN 1 .. v_elements.COUNT() LOOP
  7        DBMS_OUTPUT.PUT_LINE(v_elements.COUNT());
  8        v_elements.TRIM();
  9     END LOOP;
 10  END;
 11  /
10
9
8
7
6
5
4
3
2
1

PL/SQL procedure successfully completed.

The simple output demonstrates the expected behaviour and this is repeated if we test using level 1 optimisation. We will now run the same example under level 2 optimisation, as follows.

10g> ALTER SESSION SET PLSQL_OPTIMIZE_LEVEL = 2;

Session altered.

10g> DECLARE
  2     TYPE element_ntt IS TABLE OF PLS_INTEGER;
  3     v_elements element_ntt := element_ntt();
  4  BEGIN
  5     v_elements.EXTEND(10);
  6     FOR i IN 1 .. v_elements.COUNT() LOOP
  7        DBMS_OUTPUT.PUT_LINE(v_elements.COUNT());
  8        v_elements.TRIM();
  9     END LOOP;
 10  END;
 11  /
10
9
8
7
6
5

PL/SQL procedure successfully completed.

Oracle has given a different result this time. We know from our tests and the documentation that Oracle will re-organise, relocate and optimise our code, but it's difficult to see how it has generated just five loop iterations. If we profile the examples at level 0 and 2 (using DBMS_PROFILER) and compare the results, we see the following output (the query is available in the accompanying download).


RUN_COMMENT               LINE#   L0_OCCUR   L2_OCCUR
-------------------- ---------- ---------- ----------
Level 0 only                  3          1
Both levels                   5          1          1
Both levels                   6         11          7
Both levels                   7         10          6
Both levels                   8         10          6
Both levels                  10          1          1

6 rows selected.

According to these results, the initialisation of the collection variable on line 3 has been removed under level 2 optimisation. We can also see fewer loop operations under level 2, which seems to indicate that the collection has been initialised with too few elements. One of the workarounds we can use for this is to capture the initial collection count into a variable and use the variable to constrain the loop, as follows.

10g> ALTER SESSION SET PLSQL_OPTIMIZE_LEVEL = 2;

Session altered.

10g> DECLARE
  2     TYPE element_ntt IS TABLE OF PLS_INTEGER;
  3     v_elements element_ntt := element_ntt();
  4     v_count    PLS_INTEGER;
  5  BEGIN
  6     v_elements.EXTEND(10);
  7     v_count := v_elements.COUNT;
  8     FOR i IN 1 .. v_count LOOP
  9        DBMS_OUTPUT.PUT_LINE(v_elements.COUNT());
 10        v_elements.TRIM();
 11     END LOOP;
 12  END;
 13  /
10
9
8
7
6
5
4
3
2
1

PL/SQL procedure successfully completed.

We can only speculate the cause from these results. Either the collection was initialised with too few elements, the collection assignment was relocated or the loop boundary was incorrectly computed. It is more likely to be related to the initialisation of the collection, with the introduction of the v_count variable preventing Oracle from re-organising code in such a way that it incorrectly generates five elements.

example two: global variables and functions with side effects

The second example is slightly more complex, but it illustrates a common technique used in PL/SQL programs; namely "control-break logic" in a loop. With control-break logic, we continually compare a current record's value with the previous record to see if it has changed. In the example below, we will use the following logic in a small loop:

This is a contrived yet simple example of control-break logic. We can see clearly from these steps that the second time we will capture will be two seconds greater than the first. However, the code we will use to perform this logic will have some surprises and complexity. We will run the example at level 0 optimisation below and follow with a description of the code.

10g> ALTER SESSION SET PLSQL_OPTIMIZE_LEVEL = 0;

Session altered.

10g> DECLARE
  2
  3     TYPE record_type IS RECORD
  4     ( attribute PLS_INTEGER );
  5
  6     g_global record_type;
  7
  8     --------------------------------------------------
  9     FUNCTION function_with_side_effects
 10        RETURN PLS_INTEGER IS
 11     BEGIN
 12        g_global.attribute := EXTRACT(SECOND FROM SYSTIMESTAMP);
 13        RETURN EXTRACT(SECOND FROM SYSTIMESTAMP);
 14     END function_with_side_effects;
 15
 16     --------------------------------------------------
 17     PROCEDURE optimisable_procedure IS
 18
 19        v_last_time PLS_INTEGER := g_global.attribute;
 20
 21     BEGIN
 22
 23        FOR i IN 1 .. 5 LOOP
 24
 25           DBMS_LOCK.SLEEP(2);
 26
 27           DBMS_OUTPUT.PUT(
 28              'Last time=[' || v_last_time || ']; ' ||
 29              'This time=[' || function_with_side_effects() || ']'
 30              );
 31
 32           IF v_last_time != g_global.attribute THEN
 33              DBMS_OUTPUT.PUT_LINE('; The time has changed.');
 34              v_last_time := g_global.attribute;
 35           ELSE
 36              DBMS_OUTPUT.PUT_LINE('; The time has not changed.');
 37           END IF;
 38
 39        END LOOP;
 40
 41     END optimisable_procedure;
 42
 43     --------------------------------------------------
 44
 45  BEGIN
 46     g_global.attribute := EXTRACT(SECOND FROM SYSTIMESTAMP);
 47     optimisable_procedure();
 48  END;
 49  /
Last time=[50]; This time=[52]; The time has changed.
Last time=[52]; This time=[54]; The time has changed.
Last time=[54]; This time=[56]; The time has changed.
Last time=[56]; This time=[58]; The time has changed.
Last time=[58]; This time=[60]; The time has changed.

PL/SQL procedure successfully completed.

The results are as we predicted. In every loop iteration, we follow a capture-sleep-capture pattern, so the times clearly differ by the sleep interval. Some points to note about the code used for this example are as follows.

The example is written as an anonymous block but uses forward declarations to simulate the structure of a package body. In general, the code is a contrived combination of inconsistent techniques and logic, but most developers can probably find equivalent examples in their legacy systems! For example, note the variety of ways in which the global variable is referenced or assigned and the repetitive calls to EXTRACT(SECOND FROM SYSTIMESTAMP). Of course, the point of this example is to show how "sloppy-programming" can hide the developer's intentions from the PL/SQL optimising compiler in such a way that it eventually makes a wrong decision. We will see this below as we repeat the PL/SQL block at optimisation level 2.

10g> ALTER SESSION SET PLSQL_OPTIMIZE_LEVEL = 2;

Session altered.

10g> DECLARE
  2
  3     TYPE record_type IS RECORD
  4     ( attribute PLS_INTEGER );
  5
  6     g_global record_type;
  7
  8     --------------------------------------------------
  9     FUNCTION function_with_side_effects
 10        RETURN PLS_INTEGER IS
 11     BEGIN
 12        g_global.attribute := EXTRACT(SECOND FROM SYSTIMESTAMP);
 13        RETURN EXTRACT(SECOND FROM SYSTIMESTAMP);
 14     END function_with_side_effects;
 15
 16     --------------------------------------------------
 17     PROCEDURE optimisable_procedure IS
 18
 19        v_last_time PLS_INTEGER := g_global.attribute;
 20
 21     BEGIN
 22
 23        FOR i IN 1 .. 5 LOOP
 24
 25           DBMS_LOCK.SLEEP(2);
 26
 27           DBMS_OUTPUT.PUT(
 28              'Last time=[' || v_last_time || ']; ' ||
 29              'This time=[' || function_with_side_effects() || ']'
 30              );
 31
 32           IF v_last_time != g_global.attribute THEN
 33              DBMS_OUTPUT.PUT_LINE('; The time has changed.');
 34              v_last_time := g_global.attribute;
 35           ELSE
 36              DBMS_OUTPUT.PUT_LINE('; The time has not changed.');
 37           END IF;
 38
 39        END LOOP;
 40
 41     END optimisable_procedure;
 42
 43     --------------------------------------------------
 44
 45  BEGIN
 46     g_global.attribute := EXTRACT(SECOND FROM SYSTIMESTAMP);
 47     optimisable_procedure();
 48  END;
 49  /
Last time=[47]; This time=[47]; The time has not changed.
Last time=[49]; This time=[49]; The time has not changed.
Last time=[51]; This time=[51]; The time has not changed.
Last time=[53]; This time=[53]; The time has not changed.
Last time=[55]; This time=[55]; The time has not changed.

PL/SQL procedure successfully completed.

This time the results are clearly incorrect. Despite our intended sleep between the capture of the previous and current time, the results show that there is no difference between the two values. Our control-break logic is broken! On first glance at the results it appears as though the sleep call on line 25 has been factored out. We can compare PL/SQL Profiler results as we did for the first example and the results are as follows.


RUN_COMMENT               LINE#   L0_OCCUR   L2_OCCUR
-------------------- ---------- ---------- ----------
Both levels                  12          5          5
Both levels                  13          5          5
Both levels                  14          5          5
Both levels                  19          1          1
Both levels                  23          6          6
Both levels                  25          5          5
Both levels                  27          5          5
Both levels                  32          5          5
Level 0 only                 33          5
Level 0 only                 34          5
Level 2 only                 36                     5
Both levels                  41          1          1
Both levels                  46          1          1
Both levels                  47          1          1
Both levels                  48          1          1

15 rows selected.

According to the PL/SQL Profiler report, the only differences are in the control-break logic (due to the second run taking a different decision path). Obviously, the profiler data doesn't show the interpreted representation of our code, and in this case offers us no clues.

Incidentally, there are any number of ways that this bug can be avoided with the above example. For example, we can change the global record variable to a simple scalar variable, or we can change all direct assignments to/from the global variable to use a single consistent method. We can even move the initialisation of the global variable into the OPTIMISABLE procedure. In other words, do it properly!

Of course, this optimisation bug has serious implications for upgrade testing (although we would have to be extremely unlucky to hit it). It will probably be very difficult to spot incorrect behaviour due to optimisation bugs and might be the last place that developers will look (especially those who are unaware of this 10g feature). As ever, careful regression testing will be critical in any upgrade to 10g and lowering the default optimisation level is always an option. However, for those systems that are unaffected by this bug (probably most), the original premise of this article remains: that performance gains can be quite significant.

further reading

For more background to the re-development of the PL/SQL compiler, see these Oracle white papers. For Jonathan Lewis' article on cursor-for-loop optimisation, see DBAZine.

acknowledgements

Thanks to Adrian Christie for bringing the optimisation bugs to my attention and providing good test cases. I've adapted these for this article.

source code

The source code for the examples in this article can be downloaded from here.

Adrian Billington, November 2004 (updated March 2008)

Back to Top