pipelined function issues

Pipelined functions have been available in Oracle for several versions (and years) since 9i Release 1 and there are several related articles on oracle-developer.net. This latest article summarises some of the issues we might encounter when using pipelined functions in our applications.

Readers wishing to familiarise themselves with pipelined functions should read this oracle-developer.net article which provides all the necessary background to understanding the issues presented below.

issue 1: parallel pipelined functions and cursor variables

One of the major benefits of pipelined functions is that they can be parallelised. This means that Oracle's parallel query mechanism can be exploited to execute PL/SQL code in parallel, providing excellent performance gains. For a pipelined function to be parallel-enabled, however, it must take its source dataset from a ref cursor parameter (rather than have a static cursor defined inside the PL/SQL). Such a function would be used as follows (in pseudo-code):

SELECT *
FROM   TABLE(
          pipelined_function(
             CURSOR(SELECT * FROM staging_table))) --<-- input rowsource

The cursor parameter in a parallel-enabled pipelined function can be defined as either a weak or strong refcursor. The first issue we will see is that when we use a cursor variable to pass to the pipelined function (instead of the direct CURSOR function call as above), Oracle's parallel DML fails.

It should be stated at the outset that this problem only occurs when we are using parallel DML to load a table from the resultset of a pipelined function (i.e. not when simply selecting from the function). This is important, however, because parallel DML from pipelined functions is how Oracle most commonly demonstrates the technology in its articles and documentation.

We will setup a small example of a parallel insert from a parallel pipelined function. We will begin by creating a source and target table of the same structure (for simplicity) as follows.

SQL> CREATE TABLE source_table ( x PRIMARY KEY, y )
  2  PARALLEL
  3  NOLOGGING
  4  AS
  5     SELECT ROWNUM, CAST(NULL AS INTEGER)
  6     FROM   dual
  7     CONNECT BY ROWNUM <= 10000;

Table created.

SQL> CREATE TABLE target_table
  2  PARALLEL
  3  NOLOGGING
  4  AS
  5     SELECT *
  6     FROM   source_table;

Table created.

We will now create the types required for our pipelined function (an object type to define a record and a collection type for buffering arrays of this record), as follows.

SQL> CREATE TYPE target_table_row AS OBJECT
  2  ( x INT, y INT );
  3  /

Type created.

SQL> CREATE TYPE target_table_rows
  2     AS TABLE OF target_table_row;
  3  /

Type created.

To complete our setup, we will create a parallel-enabled pipelined function (in a package) as follows. The cursor parameter in this example is defined as a SYS_REFCURSOR (built-in weak refcursor type).

SQL> CREATE OR REPLACE PACKAGE etl_pkg AS
  2  
  3     FUNCTION pipelined_fx (p_cursor IN SYS_REFCURSOR)
  4        RETURN target_table_rows PIPELINED
  5        PARALLEL_ENABLE (PARTITION p_cursor BY ANY);
  6  
  7  END etl_pkg;
  8  /

Package created.

The function, implemented in the package body below, simply pipes out the input dataset. Obviously, this is not a true representation of what pipelined functions are needed for, but it keeps the example short and simple.

SQL> CREATE OR REPLACE PACKAGE BODY etl_pkg AS
  2  
  3     FUNCTION pipelined_fx (p_cursor IN SYS_REFCURSOR)
  4        RETURN target_table_rows PIPELINED
  5        PARALLEL_ENABLE (PARTITION p_cursor BY ANY) IS
  6  
  7        TYPE cursor_ntt IS TABLE OF source_table%ROWTYPE;
  8        nt_src_data cursor_ntt;
  9  
 10     BEGIN
 11  
 12        LOOP
 13           FETCH p_cursor BULK COLLECT INTO nt_src_data LIMIT 100;
 14  
 15           FOR i IN 1 .. nt_src_data.COUNT LOOP
 16              PIPE ROW (target_table_row(
 17                           nt_src_data(i).x, nt_src_data(i).y
 18                           ));
 19           END LOOP;
 20  
 21           EXIT WHEN p_cursor%NOTFOUND;
 22        END LOOP;
 23  
 24        CLOSE p_cursor;
 25        RETURN;
 26  
 27     END pipelined_fx;
 28  
 29  END etl_pkg;
 30  /

Package body created.

We will now test the function by inserting its output into the target table we created earlier. We enable both parallel query and parallel DML to avoid having to use any hints (we already parallel-enabled both the source and target tables). Note how we use the CURSOR function to supply the refcursor parameter to our pipelined function.

SQL> ALTER SESSION ENABLE PARALLEL QUERY;

Session altered.

SQL> ALTER SESSION ENABLE PARALLEL DML;

Session altered.

SQL> BEGIN
  2
  3     INSERT INTO target_table
  4     SELECT *
  5     FROM   TABLE(
  6               etl_pkg.pipelined_fx(
  7                  CURSOR(SELECT * FROM source_table) ));
  8
  9     DBMS_OUTPUT.PUT_LINE( SQL%ROWCOUNT || ' rows inserted.' );
 10
 11     ROLLBACK;
 12
 13  END;
 14  /

10000 rows inserted.

PL/SQL procedure successfully completed.

As we are using a refcursor parameter in our pipelined function, we might wish to use a cursor variable instead of a direct CURSOR function call. For example, our source data cursor might be large and complex. Therefore, we might wish to avoid having to embed a large SQL statement in a packaged function call. The following example demonstrates, however, that this will not work with parallel DML statements. This is true of all versions up to and including 11.1.0.6 (this is the latest version tested).

SQL> DECLARE
  2
  3     rc SYS_REFCURSOR;
  4
  5  BEGIN
  6
  7     OPEN RC FOR SELECT * FROM source_table;
  8
  9     INSERT INTO target_table
 10     SELECT *
 11     FROM   TABLE(etl_pkg.pipelined_fx(rc));
 12
 13     DBMS_OUTPUT.PUT_LINE( SQL%ROWCOUNT || ' rows inserted.' );
 14
 15     ROLLBACK;
 16
 17  END;
 18  /

DECLARE
*
ERROR at line 1:
ORA-12801: error signaled in parallel query server P007
ORA-01008: not all variables bound
ORA-06512: at line 9

The cursor variable method works in serial mode. If we disable parallel DML, the statement succeeds as follows.

SQL> ALTER SESSION DISABLE PARALLEL DML;

Session altered.

SQL> DECLARE
  2
  3     rc SYS_REFCURSOR;
  4
  5  BEGIN
  6
  7     OPEN RC FOR SELECT * FROM source_table;
  8
  9     INSERT INTO target_table
 10     SELECT *
 11     FROM   TABLE(etl_pkg.pipelined_fx(rc));
 12
 13     DBMS_OUTPUT.PUT_LINE( SQL%ROWCOUNT || ' rows inserted.' );
 14
 15     ROLLBACK;
 16
 17  END;
 18  /

10000 rows inserted.

PL/SQL procedure successfully completed.

This issue has been recorded as bug 5349930 and still exists in 11.1.0.6.

issue 2: performance with wide records

One of the uses for pipelined functions is to replace row-by-row inserts/updates with a piped rowsource that is bulk loaded. For example, the following "traditional" PL/SQL-based ETL technique is slow and inefficient.

FOR rec IN (SELECT * FROM source_data) LOOP
   ...prepare table_A variables...
   INSERT INTO table_A VALUES (...);
END LOOP;

Assuming the "prepare table_A variables" stage in the above pseudo-code is sufficiently complex to warrant the use of PL/SQL, pipelined functions can be used to exploit bulk SQL techniques as follows.

INSERT INTO table_A (...)
SELECT ...
FROM   TABLE( pipelined_fx( CURSOR(SELECT * FROM source_table) ) );

We can achieve some good performance gains by adopting this technique, especially if we use parallel-enabled pipelined functions. There is, however, an issue with "wide" records where the efficiency of pipelined functions degrades quite dramatically to the point where the row-by-row alternative is faster.

To demonstrate this issue, we are going to manufacture some variable-width records and simply compare the time it takes to load one table using row-by-row inserts and bulk insert from a pipelined function. We will begin by creating a source and target table with just three columns to demonstrate the type of comparison we will be making.

SQL> CREATE TABLE src
  2  CACHE
  3  AS
  4     SELECT 'xxxxxxxxxx' AS c1
  5     ,      'xxxxxxxxxx' AS c2
  6     ,      'xxxxxxxxxx' AS c3
  7     FROM   dual
  8     CONNECT BY ROWNUM <= 50000;

Table created.

SQL> CREATE TABLE tgt
  2  AS
  3     SELECT *
  4     FROM   src
  5     WHERE  ROWNUM < 1;

Table created.

We will now create an ETL package containing the two methods of loading that we wish to compare. The specification is as follows.

SQL> CREATE PACKAGE etl_pkg AS
  2
  3     PROCEDURE row_by_row;
  4
  5     PROCEDURE bulk_from_pipeline;
  6
  7     TYPE piped_rows IS TABLE OF tgt%ROWTYPE;
  8
  9     FUNCTION pipelined_fx(p_cursor IN SYS_REFCURSOR)
 10        RETURN piped_rows PIPELINED;
 11
 12  END etl_pkg;
 13  /

Package created.

Note that we have defined a PL/SQL-based collection type for our pipelined function. This method can raise issues in itself (as we will see later in this article), but for the purposes of this demonstration, it aligns the collection to however many columns the target table happens to have at runtime (if the table changes, Oracle will recompile the package when we next execute it and create a new underlying collection type to support the pipelined function).

The package body is as follows. We can see that the ROW_BY_ROW and BULK_FROM_PIPELINE procedures implement the two types of loading we saw in pseudo-code earlier. For simplicity, the pipelined function pipes out the input dataset without any modifications.

SQL> CREATE PACKAGE BODY etl_pkg AS
  2
  3     PROCEDURE row_by_row IS
  4     BEGIN
  5        FOR r IN (SELECT * FROM src) LOOP
  6           INSERT INTO tgt VALUES r;
  7        END LOOP;
  8        COMMIT;
  9     END row_by_row;
 10
 11     PROCEDURE bulk_from_pipeline IS
 12     BEGIN
 13        INSERT INTO tgt
 14        SELECT *
 15        FROM   TABLE(
 16                  etl_pkg.pipelined_fx(
 17                     CURSOR( SELECT * FROM src )));
 18        COMMIT;
 19     END bulk_from_pipeline;
 20
 21     FUNCTION pipelined_fx(p_cursor IN SYS_REFCURSOR)
 22        RETURN piped_rows PIPELINED IS
 23        nt piped_rows;
 24     BEGIN
 25        LOOP
 26           FETCH p_cursor BULK COLLECT INTO nt LIMIT 100;
 27           FOR i IN 1 .. nt.COUNT LOOP
 28              PIPE ROW (nt(i));
 29           END LOOP;
 30           EXIT WHEN p_cursor%NOTFOUND;
 31        END LOOP;
 32        RETURN;
 33     END pipelined_fx;
 34
 35  END etl_pkg;
 36  /

Package body created.

With just three columns to begin, we will compare the two methods using the wall-clock as follows. Note that we have deliberately turned off the PL/SQL compiler optimisation introduced in 10g (this optimisation turns cursor-for-loops into implicit bulk fetches).

SQL> ALTER SESSION SET PLSQL_OPTIMIZE_LEVEL = 0;

Session altered.

SQL> set timing on

SQL> exec etl_pkg.row_by_row;

PL/SQL procedure successfully completed.

Elapsed: 00:00:02.82

SQL> exec etl_pkg.bulk_from_pipeline;

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.59

At such low data volume, the timings are reasonably meaningless (note that the source table data was already queried before the test, so the effects of PIOs vs LIOs should be mitigated). Nevertheless, the pipelined function method is noticeably faster.

We will now create a small procedure that will rebuild our tables with the number of columns we supply. We will use this to compare ROW_BY_ROW and BULK_FROM_PIPELINE at various record-widths.

SQL> CREATE PROCEDURE rebuild_tables( p_cols IN NUMBER ) IS
  2
  3     v_ddl VARCHAR2(32767) := 'CREATE TABLE src CACHE AS SELECT ';
  4
  5     PROCEDURE drop_table( p_table IN VARCHAR2 ) IS
  6        x_no_such_table EXCEPTION;
  7        PRAGMA EXCEPTION_INIT(x_no_such_table, -942);
  8     BEGIN
  9        EXECUTE IMMEDIATE 'DROP TABLE ' || p_table;
 10     EXCEPTION
 11        WHEN x_no_such_table THEN NULL;
 12     END drop_table;
 13
 14  BEGIN
 15
 16     drop_table('SRC');
 17     drop_table('TGT');
 18
 19     FOR i IN 1 .. p_cols LOOP
 20        v_ddl := v_ddl || '''xxxxxxxxxx'' AS c' || i || ',';
 21     END LOOP;
 22
 23     v_ddl := RTRIM(v_ddl,',') ||
 24              ' FROM dual CONNECT BY ROWNUM <= 50000';
 25
 26     EXECUTE IMMEDIATE v_ddl;
 27
 28     v_ddl := 'CREATE TABLE tgt
 29               AS
 30                  SELECT *
 31                  FROM src
 32                  WHERE ROWNUM < 1';
 33
 34     EXECUTE IMMEDIATE v_ddl;
 35
 36     DBMS_STATS.GATHER_TABLE_STATS(user,'SRC');
 37
 38     v_ddl := 'ALTER PACKAGE etl_pkg COMPILE';
 39
 40  END;
 41  /

Procedure created.

The following table shows the timings for both methods at 50, 100 and 150 columns on 10.2, 10.1 and 9.2 databases. All timings are from a second run of each procedure and, where relevant, 10g PL/SQL optimisation is disabled.

Columns	10.2 Row (s)	10.2 Bulk (s)	10.1 Row (s)	10.1 Bulk (s)	9.2 Row (s)	9.2 Bulk (s)
50	9.07	4.37	9.90	8.65	7.07	8.00
100	14.23	12.98	15.79	18.42	13.05	17.01
150	14.90	19.04	24.82	30.86	18.06	29.01

The point at which the bulk method becomes slower than row-by-row processing is highlighted for each database version and we can see that it has improved with each release (the patterns are the same with repeated runs of the test). In 9i, the pipelined function performance degrades with records of around 50 attributes or fewer, while in 10g Release 2 (and since confirmed in 11g Release 1) it is with records of somewhere around 150 attributes.

To dig deeper than the "wall-clock" allows, we will compare the row-by-row and bulk pipelined function inserts using a variation on Tom Kyte's RUNSTATS utility (available here). In the following example, we compare the loads at 150 columns on a 10.2 database to see if we can determine the cause of the performance degradation.

SQL> exec runstats_pkg.rs_start;

PL/SQL procedure successfully completed.

SQL> exec etl_pkg.row_by_row;

PL/SQL procedure successfully completed.

SQL> exec runstats_pkg.rs_middle;

PL/SQL procedure successfully completed.

SQL> exec etl_pkg.bulk_from_pipeline;

PL/SQL procedure successfully completed.

SQL> exec runstats_pkg.rs_stop(5000);

Run1 ran in 2111 hsecs
Run2 ran in 2243 hsecs
Run1 ran in 94.12% of the time


Name                                  Run1        Run2        Diff
LATCH.row cache objects             17,417       9,661      -7,756
STAT..IMU Redo allocation size           0       8,672       8,672
LATCH.object queue header oper      74,739      65,960      -8,779
LATCH.session allocation            26,480      17,555      -8,925
LATCH.simulator hash latch          32,361      21,450     -10,911
LATCH.simulator lru latch           29,913      18,400     -11,513
LATCH.cache buffers lru chain       63,670      51,941     -11,729
STAT..table scan blocks gotten      50,176      12,729     -37,447
STAT..consistent gets               66,133      28,152     -37,981
STAT..consistent gets from cac      66,133      28,152     -37,981
STAT..no work - consistent rea      52,058      14,005     -38,053
STAT..execute count                 50,870         505     -50,365
LATCH.shared pool                   57,480       5,370     -52,110
STAT..redo entries                 114,550      52,821     -61,729
STAT..db block gets                143,282      81,419     -61,863
STAT..db block gets from cache     143,282      81,419     -61,863
STAT..db block changes             165,904      66,971     -98,933
STAT..session logical reads        209,415     109,571     -99,844
LATCH.library cache pin            104,611       3,265    -101,346
LATCH.library cache                108,966       6,403    -102,563
STAT..recursive calls              111,548       7,402    -104,146
STAT..table scan rows gotten       204,973      55,321    -149,652
STAT..physical read bytes          319,488     548,864     229,376
STAT..physical read total byte     319,488     548,864     229,376
STAT..session uga memory max             0     261,856     261,856
STAT..session pga memory          -327,680      65,536     393,216
LATCH.cache buffers chains         755,709     360,644    -395,065
STAT..undo change vector size    3,181,600   1,004,352  -2,177,248
STAT..redo size                122,201,564  88,791,160 -33,410,404


Run1 latches total versus run2 -- difference and pct
Run1        Run2        Diff        Pct
1,350,719     635,670    -715,049    212.49%

PL/SQL procedure successfully completed.

We can see that the row-by-row method uses far more resources than the bulk pipelined function method and yet it is still quicker. Many of the resource differences we see above are directly attributable to row-by-row insert inefficiencies (such as the volume of redo and the additional latching). Note that the "table scan rows gotten" statistic is misleading: we haven't accidentally loaded TGT with 204,000 rows in the row-based version (these additional rows must be related to recursive SQL).

Reversing the running order of the test makes no difference to the overall patterns we've seen in timings and resource statistics. As a final investigation, we will run the load under SQL trace as follows.

SQL> exec DBMS_MONITOR.SESSION_TRACE_ENABLE;

PL/SQL procedure successfully completed.

SQL> exec etl_pkg.row_by_row;

PL/SQL procedure successfully completed.

SQL> exec etl_pkg.bulk_from_pipeline;

PL/SQL procedure successfully completed.

SQL> exec DBMS_MONITOR.SESSION_TRACE_DISABLE;

PL/SQL procedure successfully completed.

If we run the trace file through TKProf, we see the following statements for the row-by-row load.

********************************************************************************

SELECT * 
FROM
 SRC


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        0      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch    50001      1.79       1.67          0      50011          0       50000
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    50002      1.79       1.68          0      50011          0       50000

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 54     (recursive depth: 1)
********************************************************************************

INSERT INTO TGT 
VALUES
 (:B1 ,:B2 ,:B3 ,:B4 ,:B5 ,:B6 ,:B7 ,:B8 ,:B9 ,:B10 ,:B11 ,:B12 ,:B13 ,:B14 ,
  :B15 ,:B16 ,:B17 ,:B18 ,:B19 ,:B20 ,:B21 ,:B22 ,:B23 ,:B24 ,:B25 ,:B26 ,
  :B27 ,:B28 ,:B29 ,:B30 ,:B31 ,:B32 ,:B33 ,:B34 ,:B35 ,:B36 ,:B37 ,:B38 ,
  :B39 ,:B40 ,:B41 ,:B42 ,:B43 ,:B44 ,:B45 ,:B46 ,:B47 ,:B48 ,:B49 ,:B50 ,
  :B51 ,:B52 ,:B53 ,:B54 ,:B55 ,:B56 ,:B57 ,:B58 ,:B59 ,:B60 ,:B61 ,:B62 ,
  :B63 ,:B64 ,:B65 ,:B66 ,:B67 ,:B68 ,:B69 ,:B70 ,:B71 ,:B72 ,:B73 ,:B74 ,
  :B75 ,:B76 ,:B77 ,:B78 ,:B79 ,:B80 ,:B81 ,:B82 ,:B83 ,:B84 ,:B85 ,:B86 ,
  :B87 ,:B88 ,:B89 ,:B90 ,:B91 ,:B92 ,:B93 ,:B94 ,:B95 ,:B96 ,:B97 ,:B98 ,
  :B99 ,:B100 ,:B101 ,:B102 ,:B103 ,:B104 ,:B105 ,:B106 ,:B107 ,:B108 ,:B109 ,
  :B110 ,:B111 ,:B112 ,:B113 ,:B114 ,:B115 ,:B116 ,:B117 ,:B118 ,:B119 ,:B120 
  ,:B121 ,:B122 ,:B123 ,:B124 ,:B125 ,:B126 ,:B127 ,:B128 ,:B129 ,:B130 ,
  :B131 ,:B132 ,:B133 ,:B134 ,:B135 ,:B136 ,:B137 ,:B138 ,:B139 ,:B140 ,:B141 
  ,:B142 ,:B143 ,:B144 ,:B145 ,:B146 ,:B147 ,:B148 ,:B149 ,:B150 ) 


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        0      0.00       0.00          0          0          0           0
Execute  50000      3.39       7.45          2      11217     145770       50000
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    50000      3.39       7.45          2      11217     145770       50000

Misses in library cache during parse: 0
Misses in library cache during execute: 2
Optimizer mode: ALL_ROWS
Parsing user id: 54     (recursive depth: 1)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  db file sequential read                         2        0.02          0.02
  log buffer space                               18        0.45          2.85
  log file switch completion                      7        0.49          1.24
  latch: cache buffers chains                     1        0.00          0.00
  latch: shared pool                              3        0.00          0.00
********************************************************************************

We can see that the individual user-SQL components of the row-by-row load have not really accounted for much time, nor did they spend much time waiting for resources. If we look at the bulk pipelined load SQL, we see the following statistics.

********************************************************************************

SELECT "A3"."C1" "C1","A3"."C2" "C2","A3"."C3" "C3","A3"."C4" "C4","A3"."C5" 
  "C5","A3"."C6" "C6","A3"."C7" "C7","A3"."C8" "C8","A3"."C9" "C9","A3"."C10" 
  "C10","A3"."C11" "C11","A3"."C12" "C12","A3"."C13" "C13","A3"."C14" "C14",
  "A3"."C15" "C15","A3"."C16" "C16","A3"."C17" "C17","A3"."C18" "C18",
  
  ...<snip>...
  
  "A3"."C143" "C143","A3"."C144" "C144","A3"."C145" "C145","A3"."C146" "C146",
  "A3"."C147" "C147","A3"."C148" "C148","A3"."C149" "C149","A3"."C150" "C150" 
FROM
 "SRC" "A3"


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch      501      1.71       1.80          0      12511          0       50000
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      503      1.71       1.80          0      12511          0       50000

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 54     (recursive depth: 2)
********************************************************************************

INSERT INTO TGT SELECT * FROM TABLE( ETL_PKG.PIPELINED_FX( CURSOR( SELECT * 
  FROM SRC )))


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        0      0.00       0.00          0          0          0           0
Execute      1     16.98      17.50          1      12024      78652       50000
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        1     16.98      17.50          1      12024      78652       50000

Misses in library cache during parse: 0
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 54     (recursive depth: 1)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  db file sequential read                         1        0.00          0.00
  log file switch completion                      5        0.21          0.29
  log buffer space                                2        0.13          0.18
********************************************************************************

The source cursor for the bulk load is faster still, but the bulk insert statement has accounted for the majority of the runtime. Again the waits are trivial, but we'd need to be able to investigate lower still to find out what, if anything, is causing Oracle's reduced performance at this level. It has been suggested that it is Oracle's handling of object types (that underpins pipelined functions) that might contribute to this performance issue but we cannot prove this from the above data.

issue 3: versioned objects: ora-04043

The implementation of all pipelined functions is supported by object and collection types. Oracle provides three ways of defining these types, but most developers will either create their own object and collection types explicitly or rely on "versioned types" that Oracle generates from PL/SQL packaged record and collection declarations. There is an interesting bug with versioned types that is worthy of note in this article.

To demonstrate the issue, we will create a pipelined function that relies on versioned types for its implementation. To do this, we simply create a package with a global record type, a collection type based on this record type and the pipelined function itself, as follows.

SQL> CREATE PACKAGE etl_pkg AS
  2
  3     TYPE plsql_record_type IS RECORD
  4     ( a1 VARCHAR2(30)
  5     , a2 VARCHAR2(30)
  6     , a3 VARCHAR2(30)
  7     );
  8
  9     TYPE plsql_nested_table_type
 10        IS TABLE OF plsql_record_type;
 11
 12     FUNCTION parallel_fx
 13        RETURN plsql_nested_table_type
 14        PIPELINED;
 15
 16  END etl_pkg;
 17  /

Package created.

Remember that pipelined functions require SQL types to be able to pipe collections of data to the consumer. Because we have only declared PL/SQL types to support our pipelined function, Oracle creates the SQL types on our behalf. We can see these in the dictionary using the following type of query.

SQL> WITH t AS (
  2     SELECT object_id AS o
  3     FROM   user_objects
  4     WHERE  object_name = 'ETL_PKG'
  5     AND    object_type = 'PACKAGE'
  6     )
  7  SELECT type_name, typecode
  8  FROM   user_types
  9  WHERE  type_name LIKE 'SYS%'
 10  AND    type_name LIKE '%' || (SELECT o FROM t) || '%';

TYPE_NAME                      TYPECODE
------------------------------ ------------------------------
SYS_PLSQL_53640_33_1           COLLECTION
SYS_PLSQL_53640_9_1            OBJECT
SYS_PLSQL_53640_DUMMY_1        COLLECTION

3 rows selected.

The type names are system-generated but contain the object ID of the package that the types were created for (as an aside, if we base our PL/SQL types on a table%ROWTYPE, the corresponding object ID will be that of the table).

The issue we will see arises when we create synonyms for the versioned types. We are unlikely to do this knowingly (we have no need to access these types; access to the package is all that is required to execute the pipelined function). Nevertheless, some environments automatically generate synonyms for all objects created for their applications, so we will replicate something similar. In the following example, we will imagine that the supplied SH schema is our application user, with SCOTT being the application owner. SCOTT creates synonyms in the SH schema for all of its objects, as follows.

SQL> DECLARE
  2     v_ddl VARCHAR2(1024) := 'CREATE SYNONYM sh."%s" FOR "%s"';
  3  BEGIN
  4     FOR r IN (SELECT object_name FROM user_objects) LOOP
  5        EXECUTE IMMEDIATE
  6           REPLACE(v_ddl, '%s', r.object_name);
  7     END LOOP;
  8  END;
  9  /

PL/SQL procedure successfully completed.

Versioned types are so-named for a reason. If we recompile the package specification and repeat our query over USER_TYPES, we see that the trailing integer in the type name increases, as follows.

SQL> ALTER PACKAGE etl_pkg COMPILE;

Package altered.

SQL> WITH t AS (
  2     SELECT object_id AS o
  3     FROM   user_objects
  4     WHERE  object_name = 'ETL_PKG'
  5     AND    object_type = 'PACKAGE'
  6     )
  7  SELECT type_name, typecode
  8  FROM   user_types
  9  WHERE  type_name LIKE 'SYS%'
 10  AND    type_name LIKE '%' || (SELECT o FROM t) || '%';

TYPE_NAME
------------------------------
SYS_PLSQL_53640_33_2
SYS_PLSQL_53640_9_2
SYS_PLSQL_53640_DUMMY_2

3 rows selected.

Our supporting types have been re-generated by Oracle and a new "version" has been created. If we use CREATE OR REPLACE, the "version numbers" are reset to 1. However, because we have already recompiled this package, it appears as though we have no more chances to change its state, as follows.

SQL> ALTER PACKAGE etl_pkg COMPILE;

ALTER PACKAGE etl_pkg COMPILE
*
ERROR at line 1:
ORA-04043: object SYS_PLSQL_53640_33_1 does not exist

We have hit bug number 3744836. We receive the same message if we try to drop the package, as follows.

SQL> DROP PACKAGE etl_pkg;

DROP PACKAGE etl_pkg
*
ERROR at line 1:
ORA-04043: object SYS_PLSQL_53640_33_1 does not exist

Finally, if we try to replace the package specification, we receive the following compilation errors.

SQL> CREATE OR REPLACE PACKAGE etl_pkg AS
  2     c INTEGER;
  3  END etl_pkg;
  4  /

Warning: Package created with compilation errors.

SQL> sho err

Errors for PACKAGE ETL_PKG:

LINE/COL ERROR
-------- -----------------------------------------------------------------
0/0      ORA-04043: object SYS_PLSQL_53640_33_1 does not exist

Note the version number of the type that we are being told does not exist. We know from above that our system-generated types have been incremented to version 2. We can also verify that we have no version 1 types as follows.

SQL> WITH t AS (
  2     SELECT object_id AS o
  3     FROM   user_objects
  4     WHERE  object_name = 'ETL_PKG'
  5     AND    object_type = 'PACKAGE'
  6     )
  7  SELECT type_name
  8  FROM   user_types
  9  WHERE  type_name LIKE 'SYS%'
 10  AND    type_name LIKE 'SYS_PLSQL_' || (SELECT o FROM t) || '%_1';

no rows selected

Of course, we created some synonyms earlier in the application user's schema (SH) and these are at version 1. But these are in another schema and we are trying to drop our own objects. Fortunately, we are able to see the root of this problem quite easily in a SQL trace file, so we will enable SQL trace and attempt to drop the ETL_PKG package, as follows.

SQL> ALTER SESSION SET SQL_TRACE = TRUE;

Session altered.

SQL> DROP PACKAGE etl_pkg;

DROP PACKAGE etl_pkg
*
ERROR at line 1:
ORA-04043: object SYS_PLSQL_53640_33_1 does not exist

SQL> ALTER SESSION SET SQL_TRACE = FALSE;

Session altered.

The trace file leads us to the cause of this problem. Before Oracle can drop, recompile or replace a package with versioned types, it needs to identify the associated types and drop these first. In 10.2, Oracle uses the following SQL statement to identify the types. Each type is then dropped in turn.

=====================
PARSING IN CURSOR #25 len=109 dep=1 uid=0 oct=3 lid=0 tim=78367384109 hv=2962406971 ad='1d176f3c'
select UNIQUE name from obj$ where name like 'SYS_PLSQL@_53640@_%' escape '@'   and type# != 10 order by name
END OF STMT

Note that there is no schema reference in this query. We can run this query directly, as follows.

SQL> SELECT UNIQUE name
  2  FROM   obj$
  3  WHERE  name LIKE 'SYS_PLSQL@_53640@_%' ESCAPE '@'
  4  AND    type# != 10
  5  ORDER  BY
  6         name;

NAME
------------------------------
SYS_PLSQL_53640_33_1
SYS_PLSQL_53640_33_2
SYS_PLSQL_53640_9_1
SYS_PLSQL_53640_9_2
SYS_PLSQL_53640_DUMMY_1
SYS_PLSQL_53640_DUMMY_2

6 rows selected.

The query returns the synonyms before our own schema's types! This means that Oracle will try to drop the version 1 synonym first, which of course doesn't exist as a type in our schema. To resolve this issue on 10.2, we can simply drop the redundant synonyms, as follows.

SQL> BEGIN
  2     FOR r IN (SELECT synonym_name
  3               FROM   dba_synonyms
  4               WHERE  owner = 'SH'
  5               AND    synonym_name LIKE 'SYS_PLSQL%')
  6     LOOP
  7        EXECUTE IMMEDIATE
  8           'DROP SYNONYM sh.' || r.synonym_name;
  9     END LOOP;
 10  END;
 11  /

PL/SQL procedure successfully completed.

We can now test that Oracle's recursive SQL statement returns the correct versioned type details, as follows.

SQL> SELECT UNIQUE name
  2  FROM   obj$
  3  WHERE  name LIKE 'SYS_PLSQL@_53640@_%' ESCAPE '@'
  4  AND    type# != 10
  5  ORDER  BY
  6         name;

NAME
------------------------------
SYS_PLSQL_53640_33_2
SYS_PLSQL_53640_9_2
SYS_PLSQL_53640_DUMMY_2

3 rows selected.

We should now be able to drop the package.

SQL> DROP PACKAGE etl_pkg;

Package dropped.

a special note for 11g release 1

Note that this bug is fixed in 11g. Oracle has changed its versioned type naming rules to resolve the issue altogether. In 11g, regardless of whether we recompile or replace a package, Oracle retains the corresponding versioned type names (in other words, their number suffix doesn't increment). Hence, Oracle's recursive query against OBJ$ to determine the corresponding type names will always return the correct values, enabling the versioned types to be dropped.

a special note for 9i and 10g release 1

Prior to 10g Release 2, Oracle used a slightly different recursive statement to identify versioned types belonging to a package. The following is taken from a 10.1 trace file.

=====================
PARSING IN CURSOR #29 len=78 dep=1 uid=0 oct=3 lid=0 tim=79199728262 hv=1397641108 ad='21e1cf10'
select UNIQUE name from obj$ where name like 'SYS_PLSQL_53766_%' order by name
END OF STMT

The main difference with this earlier version of the recursive SQL is that there is no restriction on the OBJ$.TYPE# column. On a 10.1 database with the same example code as above, this query returns the following values both before and after the synonyms are dropped.

SQL> SELECT UNIQUE name
  2  FROM   obj$
  3  WHERE  name LIKE 'SYS_PLSQL_53766_%'
  4  ORDER  BY
  5         name;

NAME
------------------------------
SYS_PLSQL_53766_33_1
SYS_PLSQL_53766_33_2
SYS_PLSQL_53766_9_1
SYS_PLSQL_53766_9_2
SYS_PLSQL_53766_DUMMY_1
SYS_PLSQL_53766_DUMMY_2

6 rows selected.

This is critical because when we drop synonyms, Oracle updates the corresponding OBJ$ record to TYPE#=10, rather than delete the record from the dictionary. We can see this below by querying OBJ$ after the synonyms are dropped.

SQL> SELECT name, type#
  2  FROM   obj$
  3  WHERE  name LIKE 'SYS_PLSQL_53766_%'
  4  ORDER  BY
  5         name;

NAME                                TYPE#
------------------------------ ----------
SYS_PLSQL_53766_33_1                   10
SYS_PLSQL_53766_33_2                   13
SYS_PLSQL_53766_9_1                    10
SYS_PLSQL_53766_9_2                    13
SYS_PLSQL_53766_DUMMY_1                10
SYS_PLSQL_53766_DUMMY_2                13

6 rows selected.

This means, of course, that we cannot drop, recompile or recreate the package until the obsolete synonym entries (TYPE#=10) are removed from OBJ$. This cleanup will be performed by SMON after a database bounce (or sometimes after an indeterminate period of time), after which point the package and associated versioned types can be dropped.

source code

The source code for the examples in this article can be downloaded from here.

Adrian Billington, September 2007 (updated May 2008)