subprogram inlining in 11g
The release of Oracle 10g brought with it the first optimising compiler for PL/SQL. As discussed in this oracle-developer.net article, Oracle added two levels of optimisation to provide some impressive performance gains without changing a line of code.
Oracle 11g has taken compiler optimisation further, in particular with the concept of subprogram inlining. With subprogram inlining, Oracle will replace a call to a subroutine (such as a function) with the subroutine code itself during compilation. One of the benefits of this is that we can continue to write well-structured, modular code without any performance penalties. For SQL-intensive PL/SQL programs, the gains from inlining might be marginal, but for procedural code, inlining might provide some greater optimisation benefits, as we will see in this article.
a simple example
We will begin with an extremely simple example of subprogram inlining. In the following example, we request that Oracle inlines function F. We do this with the new PRAGMA INLINE syntax as follows.
SQL> DECLARE 2 3 n PLS_INTEGER; 4 5 FUNCTION f RETURN PLS_INTEGER IS 6 BEGIN 7 RETURN 10; 8 END f; 9 10 BEGIN 11 12 PRAGMA INLINE(f, 'YES'); 13 n := f(); 14 DBMS_OUTPUT.PUT_LINE(n); 15 16 END; 17 /
10 PL/SQL procedure successfully completed.
The PRAGMA INLINE syntax for a subprogram can take either a 'YES' or 'NO' value and precedes the first call to the subprogram. It will be effective for subsequent calls to the same subprogram within a statement (unless a subsequent 'NO' pragma overrides it). In the PL/SQL Language Reference, Oracle states:
When the INLINE pragma immediately precedes one of the following statements, the pragma affects every call to the specified subprogram in that statement:
- Assignment
- Call
- Conditional
- CASE
- CONTINUE-WHEN
- EXECUTE IMMEDIATE
- EXIT-WHEN
- LOOP
- RETURN
There is a wide range of permutations for the behaviour of the PRAGMA INLINE directive which can be found in the documentation (a link is provided at the end of this article).
To reiterate, with subprogram inlining, Oracle will re-write our code to replace a call or calls to a subprogram with the executable code of the subprogram itself. We can imagine that our example above would be re-ordered during compilation to resemble the following pseudo-code (or rather, what would be compiled if we wrote the following "manually-inlined" code).
SQL> DECLARE 2 n PLS_INTEGER; 3 BEGIN 4 n := 10; --<-- this might be moved to line 2 5 DBMS_OUTPUT.PUT_LINE(n); --<-- this might include the constant 10 6 END; 7 /
In the "optimised" version of our original program, we can see several further opportunities for optimisation and it is possible that Oracle will remove the variable "n" entirely. Of course, Oracle doesn't re-order or re-write the PL/SQL itself, just the compiled representation of it (either C or "p-code" depending on our compilation method). The PL/SQL we issue remains unchanged in the USER/ALL/DBA_SOURCE views.
investigating subprogram inlining
So far, we have seen a single example of the PRAGMA INLINE syntax and have stated that the PL/SQL compiler will re-write our code during compilation to include the subprogram's logic inline. But how can we investigate this without access to the re-written (and compiled) PL/SQL? Fortunately, there is a new application available in 11g named the PL/SQL Hierarchical Profiler. This new profiler (invoked via a new built-in package, DBMS_HPROF) records and reports the execution statistics for SQL and PL/SQL organised by subprograms (i.e. a hierarchy of function and procedure calls).
For the remainder of this article, we will use the Hierarchical Profiler to investigate when our subprograms are "optimised out" by the new PL/SQL compiler. We will begin by repeating our original example and demonstrating that the F function was inlined. To do this, we must include a couple of calls to DBMS_HPROF (to start and stop the trace collection) as follows.
SQL> DECLARE 2 3 n PLS_INTEGER; 4 5 FUNCTION f RETURN PLS_INTEGER IS 6 BEGIN 7 RETURN 10; 8 END f; 9 10 BEGIN 11 12 DBMS_HPROF.START_PROFILING( location => 'LOG_DIR', 13 filename => 'inline.trc' ); 14 15 PRAGMA INLINE(f, 'YES'); 16 n := f(); 17 DBMS_OUTPUT.PUT_LINE(n); 18 19 DBMS_HPROF.STOP_PROFILING; 20 21 END; 22 /
10 PL/SQL procedure successfully completed.
We start the trace by specifying a directory and trace file to write to (LOG_DIR is a directory created for a previous article). We stop profiling when the code we wish to trace is complete. The trace file itself is reasonably legible, but we can analyse it using the DBMS_HPROF.ANALYZE API. This will write the profiler data to a set of tables (to create these run ?/rdbms/admin/dbmshptab.sql in the schema that owns the code to be profiled). We analyse the trace file as follows.
SQL> DECLARE 2 n NUMBER; 3 BEGIN 4 n := DBMS_HPROF.ANALYZE('LOG_DIR','inline.trc'); 5 DBMS_OUTPUT.PUT_LINE('Runid = ' || n); 6 END; 7 /
Runid = 56 PL/SQL procedure successfully completed.
The RUNID gives us the key to accessing the profiler data. There are three tables, prefixed DBMSHP_, with a variety of information, some of which helps us to see the effects of subprogram inlining, as follows.
SQL> SELECT function 2 , line# 3 , calls 4 , subtree_elapsed_time AS sub_ela 5 , function_elapsed_time AS func_ela 6 FROM dbmshp_function_info 7 WHERE runid = 56;
FUNCTION LINE# CALLS SUB_ELA FUNC_ELA ---------------- ---------- ---------- ---------- ---------- STOP_PROFILING 53 1 0 0 NEW_LINE 117 1 5 5 PUT 77 1 47 31 PUT_INIT 67 1 16 16 PUT_LINE 109 1 64 12 5 rows selected.
We can see a number of subprogram calls (including the DBMS_OUTPUT module calls) but no "F" function. This is because it was optimised out (i.e. inlined) by the compiler. If we run the same example, but without the PRAGMA INLINE directive (or with it set to 'NO'), we see the F function appear in the trace, as follows.
SQL> SELECT function 2 , line# 3 , calls 4 , subtree_elapsed_time AS sub_ela 5 , function_elapsed_time AS func_ela 6 FROM dbmshp_function_info 7 WHERE runid = 57;
FUNCTION LINE# CALLS SUB_ELA FUNC_ELA -------------------- ---------- ---------- ---------- ---------- __anonymous_block.F 5 1 4 4 STOP_PROFILING 53 1 0 0 NEW_LINE 117 1 4 4 PUT 77 1 30 16 PUT_INIT 67 1 14 14 PUT_LINE 109 1 43 9 6 rows selected.
As a tool to help us understand the execution profile of our code in terms of subprograms, the new Hierarchical Profiler is extremely useful. It tells us the time spent inside a module and also the time spent in subprogram calls (i.e. the module we are tracing might not actually be the problem: it might be one of the subprogram calls). It also clearly demonstrates the effects of subprogram inlining and we will use this method for the remaining examples in this article. We will not discuss the Profiler itself in any further detail, however, as the online documentation gives a good overview of the reporting capabilities of the tool (see Further Reading below).
inlining multiple subprograms
When we supply the PRAGMA INLINE instruction for a given subprogram, this does not cascade to the further subprogram calls contained within that module. In the following example, we specify function F3 is to be inlined. This function makes calls to functions F1 and F2, which will not be inlined. First we profile the code, as follows.
SQL> DECLARE 2 3 n1 PLS_INTEGER := 0; 4 5 FUNCTION f1 RETURN PLS_INTEGER IS 6 BEGIN 7 RETURN DBMS_RANDOM.VALUE(1,100); 8 END f1; 9 10 FUNCTION f2 RETURN PLS_INTEGER IS 11 BEGIN 12 RETURN DBMS_RANDOM.VALUE(10,50); 13 END f2; 14 15 FUNCTION f3 (p IN PLS_INTEGER) RETURN PLS_INTEGER IS 16 BEGIN 17 RETURN p + f1() + f2(); 18 END f3; 19 20 BEGIN 21 22 DBMS_HPROF.START_PROFILING('LOG_DIR', 'inline.trc'); 23 24 FOR i IN 1 .. 10000 LOOP 25 PRAGMA INLINE(f3,'YES'); 26 n1 := n1 + f3(i); 27 END LOOP; 28 29 DBMS_HPROF.STOP_PROFILING; 30 31 DBMS_OUTPUT.PUT_LINE('Value of n1 is ' || n1); 32 33 END; 34 /
Value of n1 is 50810766 PL/SQL procedure successfully completed.
The profiler report below shows that F3 was inlined but F1 and F2 were not as they did not have their own PRAGMA INLINE instructions.
FUNCTION LINE# CALLS SUB_ELA FUNC_ELA ---------------------- ---------- ---------- ---------- ---------- __anonymous_block.F1 5 10000 131431 47645 __anonymous_block.F2 10 10000 132004 48282 STOP_PROFILING 53 1 0 0 VALUE 69 20000 68686 68686 VALUE 103 20000 167508 98822 5 rows selected.
There is a mechanism available to automatically inline as many of our subprograms as possible, however, as we will see below.
plsql_optimize_level
Subprogram inlining is not enabled by default. In our earlier examples, we asked Oracle to inline specific modules, but there is a way to enable subprogram inlining globally. Most readers will be aware of the PLSQL_OPTIMIZE_LEVEL parameter (it is discussed in the background article referenced earlier). This parameter controls the depth of the optimisations that the PL/SQL compiler will undertake. Oracle has extended the scope of this parameter in 11g, so it now has the following settings.
- Level 0: no compiler optimisations (PL/SQL compiled as is);
- Level 1: high-level optimisations (such as moving constants out of loops);
- Level 2: default level. Aggressive optimisations (such as rewriting cursor-for-loops as array fetches) and in 11g, also inlining any subprograms that we request with PRAGMA INLINE;
- Level 3: most aggressive level: New in 11g, this will inline all subprograms where possible (excluding those contained in built-in packages).
Our examples so far have been at the default level of 2, which is why we needed to request inlining for our subprograms using the PRAGMA INLINE syntax. At level 3, however, we do not need this syntax as the compiler will automatically attempt to inline all subprograms where possible. We can see this below, where we will remove the pragma and run an earlier example at optimisation levels 2 and 3. We will begin with level 2 (default).
SQL> ALTER SESSION SET PLSQL_OPTIMIZE_LEVEL = 2;
Session altered.
SQL> DECLARE 2 3 v_int PLS_INTEGER; 4 c_dir CONSTANT VARCHAR2(10) := 'LOG_DIR'; 5 c_trc CONSTANT VARCHAR2(10) := 'inline.trc'; 6 7 FUNCTION f RETURN PLS_INTEGER IS 8 BEGIN 9 RETURN 10; 10 END f; 11 12 BEGIN 13 14 DBMS_HPROF.START_PROFILING(c_dir, c_trc); 15 v_int := f(); 16 DBMS_HPROF.STOP_PROFILING; 17 18 DBMS_OUTPUT.PUT_LINE( 19 'Runid = ' || DBMS_HPROF.ANALYZE(c_dir, c_trc) 20 ); 21 22 END; 23 /
Runid = 61 PL/SQL procedure successfully completed.
Having analyzed the trace file above, the Hierarchical Profiler report shows that (unsurprisingly) the function F is not inlined at level 2 without the PRAGMA INLINE instruction.
SQL> SELECT function 2 , line# 3 , calls 4 , subtree_elapsed_time AS sub_ela 5 , function_elapsed_time AS func_ela 6 FROM dbmshp_function_info 7 WHERE runid = 61;
FUNCTION LINE# CALLS SUB_ELA FUNC_ELA --------------------- ---------- ---------- ---------- ---------- __anonymous_block.F 7 1 4 4 STOP_PROFILING 53 1 0 0 2 rows selected.
We will now run the same code under level 3 optimisation, as follows.
SQL> ALTER SESSION SET PLSQL_OPTIMIZE_LEVEL = 3;
Session altered.
SQL> DECLARE 2 3 v_int PLS_INTEGER;
<<...snip...>> 18 DBMS_OUTPUT.PUT_LINE( 19 'Runid = ' || DBMS_HPROF.ANALYZE(c_dir, c_trc) 20 ); 21 22 END; 23 / Runid = 62 PL/SQL procedure successfully completed.
SQL> SELECT function 2 , line# 3 , calls 4 , subtree_elapsed_time AS sub_ela 5 , function_elapsed_time AS func_ela 6 FROM dbmshp_function_info 7 WHERE runid = 62;
FUNCTION LINE# CALLS SUB_ELA FUNC_ELA ------------------- ---------- ---------- ---------- ---------- STOP_PROFILING 53 1 0 0 1 row selected.
At optimisation level 3, our subprogram is inlined automatically. To avoid this behaviour for a particular subprogram (should we wish to), we need to use the PRAGMA INLINE syntax with an instruction of 'NO'. Oracle states that it would be extremely rare for inlining to cause a performance degradation, however, so we might never have the cause to disable it.
performance benefits
So far we have concentrated on the mechanics of subprogram inlining and as stated earlier, the documentation has a wide range of permutations and rules regarding the PRAGMA INLINE syntax and behaviour. We will complete this article with a look at the performance improvements we might expect from this new optimisation.
pl/sql-only subprograms
We will begin our performance investigations by testing a simple loop with multiple calls to a PL/SQL-only subprogram (i.e. there is no SQL in the subprogram). We will measure the performance gains of inlining using a simple timer (note that the Hierarchical Profiler timings could also be used for this purpose).
SQL> DECLARE 2 3 n1 PLS_INTEGER := 0; 4 5 FUNCTION f (p IN PLS_INTEGER) RETURN PLS_INTEGER IS 6 BEGIN 7 RETURN DBMS_RANDOM.VALUE(1,1000); 8 END f; 9 10 BEGIN 11 12 timer.snap(); 13 FOR i IN 1 .. 1000000 LOOP 14 PRAGMA INLINE(f,'NO'); 15 n1 := n1 + f(i); 16 END LOOP; 17 timer.show('FUNCTION'); 18 19 timer.snap(); 20 FOR i IN 1 .. 1000000 LOOP 21 PRAGMA INLINE(f,'YES'); 22 n1 := n1 + f(i); 23 END LOOP; 24 timer.show('INLINE '); 25 26 END; 27 /
[FUNCTION] 2.82 seconds [INLINE ] 2.75 seconds PL/SQL procedure successfully completed.
We can see that the gains from inlining are marginal (the test was run several times and showed the same results from the second run onwards; the first run had "interference" associated with parsing/optimisation and DBMS_RANDOM initialisation). This is perhaps to be expected. The calls to F will comprise a very small proportion of the work carried out by Oracle. The majority of the time will be spent in DBMS_RANDOM (which cannot be inlined).
For most systems, the previous results will be less than inspirational. However, in scenarios where the call to the subprogram accounts for a larger proportion of the run time, the performance gains from inlining might be more significant. We will slightly modify our previous example to demonstrate this. In the loop (scaled by an order of magnitude to exaggerate the results), we will assign the result of the function call to the numeric variable rather than keep a running total. In addition, we will remove the calls to built-in packages (DBMS_RANDOM in the previous example). The example is as follows.
SQL> DECLARE 2 3 n1 PLS_INTEGER; 4 n2 PLS_INTEGER; 5 6 FUNCTION f (p IN PLS_INTEGER) RETURN PLS_INTEGER IS 7 BEGIN 8 RETURN p; 9 END f; 10 11 BEGIN 12 13 timer.snap(); 14 FOR i IN 1 .. 10000000 LOOP 15 n1 := i; 16 PRAGMA INLINE(f,'NO'); 17 n2 := f(n1); 18 END LOOP; 19 timer.show('FUNCTION'); 20 21 timer.snap(); 22 FOR i IN 1 .. 10000000 LOOP 23 n1 := i; 24 PRAGMA INLINE(f,'YES'); 25 n2 := f(n1); 26 END LOOP; 27 timer.show('INLINE '); 28 29 END; 30 /
[FUNCTION] 2.40 seconds [INLINE ] 0.46 seconds PL/SQL procedure successfully completed.
This time we have impressive performance gains from inlining. A repeat of the test with the Hierarchical Profiler running and using 1 million iterations per loop instead of 10 million shows the following results.
FUNCTION LINE# CALLS SUB_ELA FUNC_ELA ---------------------- ---------- ---------- ---------- ---------- __anonymous_block.F 6 1000000 1991944 1991944 STOP_PROFILING 53 1 0 0 NEW_LINE 117 2 7 7 PUT 77 2 24 13 PUT_INIT 67 1 11 11 PUT_LINE 109 2 43 12 GET_TIME 289 4 23 23 ELAPSED 38 2 29 17 REFORMAT 44 2 37 37 SHOW 63 2 197 88 SNAP 26 2 48 37 11 rows selected.
As we requested, the second call to function F was inlined and we can see this above (i.e. 1 million calls from 2 million loop interations in our example overall). It might appear as though the second loop was removed altogether, but we can use the PL/SQL Profiler (available since 8i) to confirm that the inlined version retains the loop. The following report is a summary of the profiler data from a re-run of the above example with the PL/SQL Profiler running.
RUNID UNIT_NAME LINE# TOTAL_OCCUR TOTAL_TIME ---------- -------------- ---------- ----------- ---------------- 24 <anonymous> 8 1000000 54193775745 24 <anonymous> 9 1000000 75635759014 24 <anonymous> 13 1 11127391 24 <anonymous> 14 1000001 57348801593 24 <anonymous> 15 1000000 54923122352 24 <anonymous> 16 1000000 0 24 <anonymous> 17 1000000 145286542284 24 <anonymous> 19 1 2864051 24 <anonymous> 21 1 350323 24 <anonymous> 22 1000001 54518495227 24 <anonymous> 23 1000000 49953127003 24 <anonymous> 24 1000000 0 24 <anonymous> 25 1000000 44516146884 24 <anonymous> 27 1 3561625 24 <anonymous> 28 1 1035606 24 TIMER 30 2 8896382 24 TIMER 31 2 421003 <<...snip...>> 24 TIMER 130 2 1948292 37 rows selected.
We can see in lines 16 and 24 that the PRAGMA INLINE instructions cost nothing in terms of time. We can also see quite clearly that both loops have been retained (lines 14 and 22) and they took approximately the same amount of time. The performance gain inside the second loop is attributable largely to inlining: the 1 million calls to subprogram F (line 17) took 3 times longer than the 1 million inlined direct assignments on line 25.
subprograms containing sql
Finally, we will measure the performance benefits of inlining a subprogram with embedded SQL. From our observations so far, we would expect these gains to be marginal. In the following example, we will make multiple calls to a small lookup function and measure the impact of inlining.
SQL> DECLARE 2 3 n1 PLS_INTEGER; 4 n2 PLS_INTEGER; 5 6 FUNCTION f (p IN PLS_INTEGER) RETURN PLS_INTEGER IS 7 n PLS_INTEGER; 8 BEGIN 9 SELECT MOD(p,100) INTO n FROM dual; 10 RETURN n; 11 END f; 12 13 BEGIN 14 15 timer.snap(); 16 FOR i IN 1 .. 100000 LOOP 17 n1 := i; 18 PRAGMA INLINE(f,'NO'); 19 n2 := f(n1); 20 END LOOP; 21 timer.show('FUNCTION'); 22 23 timer.snap(); 24 FOR i IN 1 .. 100000 LOOP 25 n1 := i; 26 PRAGMA INLINE(f,'YES'); 27 n2 := f(n1); 28 END LOOP; 29 timer.show('INLINE '); 30 31 END; 32 /
[FUNCTION] 3.86 seconds [INLINE ] 3.64 seconds PL/SQL procedure successfully completed.
Not surprisingly, the gains from inlining are tiny. The main consumer of time in this example is the SQL itself, as we can see if we repeat the example using the Hierarchical Profiler. The report is as follows.
FUNCTION LINE# CALLS SUB_ELA FUNC_ELA -------------------------- --------- ---------- ---------- ---------- __anonymous_block.F 6 100000 4375478 714279 STOP_PROFILING 53 1 0 0 NEW_LINE 117 2 6 6 PUT 77 2 26 15 PUT_INIT 67 1 11 11 PUT_LINE 109 2 46 14 GET_TIME 289 4 25 25 ELAPSED 38 2 28 16 REFORMAT 44 2 46 46 SHOW 63 2 206 86 SNAP 26 2 48 35 __static_sql_exec_line27 27 100000 3599126 3599126 __static_sql_exec_line9 9 100000 3661199 3661199 13 rows selected.
The embedded SQL statement in F (__static_sql_exec_line9) accounted for 84% of the elapsed time of the function overall, so it's clearly unlikely that moving this SQL statement around would make a difference. The benefits of inlining function F would have to be carved from the 0.7 seconds spent in the function but not executing the SQL. Some systems require this level of "tuning to the nth degree" but for most of us, we have lower hanging fruit to go after! The results of inlining the second set of calls to subprogram F can be seen at line 27 (__static_sql_exec_line27).
performance conclusion
To summarise, therefore, the gains from inlining will usually be marginal except under very specific circumstances where the executable section of the subprogram is very quick (i.e. similar to the time it takes to call the subprogram). That said, for packages with large numbers of subprograms, the gains might start to accumulate. For some applications, this will be critical. At the very least, developers can no longer use performance degradation as a reason not to modularise (not that performance was ever a good reason not to modularise of course). We can write well-structured, modular programs and have the PL/SQL optimiser take care of the performance.
further reading
For more information on PRAGMA INLINE, especially the rules on precedence and placement, see the PL/SQL Language Reference. For an overview of the PLSQL_OPTIMIZE_LEVEL parameter, read the Database Reference. A detailed discussion of the Hierarchical Profiler is available in the Advanced Application Developer's Guide. The TIMER package used in some of the performance examples is available here.
source code
The source code for the examples in this article can be downloaded from here.
Adrian Billington, November 2007
Back to Top