sorting collections

Steven Feuerstein once wrote that having spent years persuading PL/SQL developers to modularise their code, he was moving on to encourage their use of collections. Indeed, collections have become more widely used by developers in SQL and PL/SQL code, possibly due to the bulk PL/SQL techniques introduced in 8i and associative array features of 9i. Recent Oracle releases have also added more features for working with collections, such as the multiset operations introduced in 10g. In fact, Oracle is now rich in features for working with collections, with one important exception: sorting arrays of data. This article demonstrates a small number of techniques that we can adopt for sorting our collections.

setup

All of the examples in this article will use a nested table type, because this can be used in both SQL and PL/SQL (unlike associative arrays which are PL/SQL-only). Where alternative collection types can be used (i.e. associative arrays or VARRAYs), this will be noted. Our collection type is defined as follows.

SQL> CREATE TYPE varchar2_ntt AS TABLE OF VARCHAR2(4000);
  2  /

Type created.

To keep the initial examples short and simple, we will wrap a single small collection in a view, as follows.

SQL> CREATE VIEW v_collection
  2  AS
  3     SELECT varchar2_ntt(
  4               'Bananas', 'Oranges', 'Apples', 'Toaster Ovens'
  5               ) AS collection
  6     FROM   dual;

View created.

Our collection contains four unordered elements and it is returned to us in the same order when we query it from the view, as follows.

SQL> SELECT collection FROM v_collection;

COLLECTION
-----------------------------------------------------------------
VARCHAR2_NTT('Bananas', 'Oranges', 'Apples', 'Toaster Ovens')

1 row selected.

We will now demonstrate some techniques for re-ordering the four elements in our sample collection below.

sorting pre-populated collections in sql

The easiest way to sort a pre-populated collection is to use SQL, as follows.

SQL> SELECT column_value
  2  FROM   v_collection
  3  ,      TABLE(collection)
  4  ORDER  BY
  5         column_value;

COLUMN_VALUE
--------------------
Apples
Bananas
Oranges
Toaster Ovens

4 rows selected.

This method is extremely simple. We have used the TABLE() operator to convert the collection to a rowsource, meaning of course that it can be ordered with an ORDER BY clause. The TABLE() operator has been available since Oracle 8i and works with nested tables and VARRAYs (but not with associative arrays).

If we want to retain the collection itself (albeit with sorted elements), we can wrap this SQL technique in a simple function, as follows.

SQL> CREATE FUNCTION sort_collection (
  2                  p_collection IN varchar2_ntt
  3                  ) RETURN varchar2_ntt IS
  4
  5     v_collection varchar2_ntt;
  6
  7  BEGIN
  8
  9     SELECT column_value BULK COLLECT INTO v_collection
 10     FROM   TABLE(p_collection)
 11     ORDER  BY
 12            column_value;
 13
 14     RETURN v_collection;
 15
 16  END sort_collection;
 17  /

Function created.

In this collection sorter, we convert the original collection into an ordered rowsource with the TABLE() operator and an ORDER BY clause, then bulk fetch it into a collection of the same type for returning. We use it as follows.

SQL> SELECT SORT_COLLECTION(collection) AS sorted_collection
  2  FROM   v_collection;

SORTED_COLLECTION
-----------------------------------------------------------------
VARCHAR2_NTT('Apples', 'Bananas', 'Oranges', 'Toaster Ovens')

1 row selected.

As to be expected, the elements of the collection are re-ordered alphabetically. Note that to keep the example simple, it only supports ascending sorts. For a link to a version of this example that supports ascending, descending and distinct sorts, see the further reading section at the end of this article.

sorting pre-populated collections with pl/sql

An alternative technique for sorting collections is to use a PL/SQL associative array to re-order the elements. To demonstrate this, we will create our function first and then describe the technique below.

SQL> CREATE FUNCTION sort_collection_plsql (
  2                  p_collection IN varchar2_ntt
  3                  ) RETURN varchar2_ntt IS
  4
  5     TYPE sorter_aat IS TABLE OF PLS_INTEGER
  6        INDEX BY VARCHAR2(4000);
  7
  8     v_collection varchar2_ntt := varchar2_ntt();
  9     v_sorter     sorter_aat;
 10     v_sorter_idx VARCHAR2(4000);
 11
 12  BEGIN
 13
 14     /* Sort the collection using the sorter array... */
 15     FOR i IN 1 .. p_collection.COUNT LOOP
 16        v_sorter_idx := p_collection(i);
 17        v_sorter(v_sorter_idx) := CASE
 18                                     WHEN v_sorter.EXISTS(v_sorter_idx)
 19                                     THEN v_sorter(v_sorter_idx) + 1
 20                                     ELSE 1
 21                                  END;
 22     END LOOP;
 23
 24     /* Assign sorted elements back to collection... */
 25     v_sorter_idx := v_sorter.FIRST;
 26     WHILE v_sorter_idx IS NOT NULL LOOP
 27
 28        /* Handle multiple copies of same value... */
 29        FOR i IN 1 .. v_sorter(v_sorter_idx) LOOP
 30           v_collection.EXTEND;
 31           v_collection(v_collection.LAST) := v_sorter_idx;
 32        END LOOP;
 33
 34        v_sorter_idx := v_sorter.NEXT(v_sorter_idx);
 35
 36     END LOOP;
 37
 38     RETURN v_collection;
 39
 40  END sort_collection_plsql;
 41  /

Function created.

Some notes on the techniques used are as follows:

Lines 5-6: we use a string-indexed associative array to sort our collection and keep a log of the number of times each collection element occurs. The associative array index allows for collection elements of VARCHAR2(4000) and is of course ordered by definition;
Lines 15-22: we loop through our collection (this example assumes that it is densely packed) and assign each element to the index of the associative array. We also increment a count for the number of times each collection element appears (i.e. to handle duplicate values);
Lines 25-36: once our collection has been copied to the associative array, we can build our sorted return collection. We do this by cycling through the associative array in index order. Each index value (remember that these are our original collection values) is assigned to our return collection the same number of times it originally occurred;
Line 38: we return a sorted copy of our original collection.

We use our PL/SQL-based sort function in the same way as our original SQL-based function, as follows.

SQL> SELECT SORT_COLLECTION_PLSQL(collection) AS sorted_collection
  2  FROM   v_collection;

SORTED_COLLECTION
-----------------------------------------------------------------
VARCHAR2_NTT('Apples', 'Bananas', 'Oranges', 'Toaster Ovens')

1 row selected.

Note that this technique also supports the sorting and return of associative arrays, in addition to nested tables and VARRAYs. If the sorted collection is never to be referenced or used in SQL, then an associative array can be used in place of a nested table type. It is only when using the collection in SQL that we require a SQL type such as a nested table.

handling repeating elements

Note that our PL/SQL implementation explicitly handles repeating or duplicate collection elements. The SQL implementation handles this naturally, of course, which is one reason why our first sorter function is much more simple to code than the second. We can test that both functions handle repeating elements by passing two copies of the same collection to be sorted. We will use the MULTISET UNION collection operation for this (this appends one collection to another).

We will begin by testing our SQL implementation, as follows.

SQL> SELECT sort_collection(
  2            collection MULTISET UNION collection
  3            ) AS sorted_collection
  4  FROM   v_collection;

SORTED_COLLECTION
-----------------------------------------------------------------
VARCHAR2_NTT('Apples', 'Apples', 'Bananas', 'Bananas', 'Oranges',
 'Oranges', 'Toaster Ovens', 'Toaster Ovens')

1 row selected.

We can see that the collection is sorted and each element appears twice. We will repeat the test with our PL/SQL implementation, as follows.

SQL> SELECT sort_collection_plsql(
  2            collection MULTISET UNION collection
  3            ) AS sorted_collection
  4  FROM   v_collection;

SORTED_COLLECTION
-----------------------------------------------------------------
VARCHAR2_NTT('Apples', 'Apples', 'Bananas', 'Bananas', 'Oranges',
 'Oranges', 'Toaster Ovens', 'Toaster Ovens')

1 row selected.

Again, our collection is sorted and repeating elements are handled correctly.

performance considerations

We have two implementations of a collection sorting function, but which should we use? We will run two simple performance tests to determine which is the most efficient to use. First we will compare many sorts of a tiny collection and second we will compare a small number of sorts for a large collection. We will use a version of Tom Kyte's RUNSTATS utility to compare the resources and time used by each method.

Before we begin our tests, however, we will recompile our functions to use native compilation. This will make them as efficient as possible (although we would expect this to have a greater positive impact on our PL/SQL-based function).

SQL> ALTER FUNCTION sort_collection COMPILE PLSQL_CODE_TYPE = NATIVE;

Function altered.

SQL> ALTER FUNCTION sort_collection_plsql COMPILE PLSQL_CODE_TYPE = NATIVE;

Function altered.

SQL> SELECT name
  2  ,      plsql_code_type
  3  FROM   user_plsql_object_settings
  4  WHERE  name LIKE 'SORT_COLLECTION%';

NAME                           PLSQL_CODE_TYPE
------------------------------ --------------------
SORT_COLLECTION                NATIVE
SORT_COLLECTION_PLSQL          NATIVE

2 rows selected.

performance test one: sorting small collections

We will first compare 10,000 sorts of a tiny collection, as follows.

SQL> DECLARE
  2     v_small_collection  varchar2_ntt := varchar2_ntt();
  3     v_sorted_collection varchar2_ntt := varchar2_ntt();
  4     v_iterations        PLS_INTEGER := 10000;
  5  BEGIN
  6
  7     /* Assign small collection... */
  8     v_small_collection := varchar2_ntt('B','A','D','C');
  9
 10     runstats_pkg.rs_start();
 11
 12     /* Run 1: sort in SQL... */
 13     FOR i IN 1 .. v_iterations LOOP
 14        v_sorted_collection := sort_collection(v_small_collection);
 15     END LOOP;
 16
 17     runstats_pkg.rs_middle();
 18
 19     /* Run 2: sort in PL/SQL... */
 20     FOR i IN 1 .. v_iterations LOOP
 21        v_sorted_collection := sort_collection_plsql(v_small_collection);
 22     END LOOP;
 23
 24     runstats_pkg.rs_stop(100);
 25
 26  END;
 27  /

================================================================================
Runstats report : 20-NOV-2009 19:18:51
================================================================================


--------------------------------------------------------------------------------
1. Summary timings
--------------------------------------------------------------------------------
Run1 ran in 243 hsecs
Run2 ran in 21 hsecs
Run2 was 91.4% quicker than Run1


--------------------------------------------------------------------------------
2. Statistics report
--------------------------------------------------------------------------------


Type  Name                                        Run1         Run2         Diff
----- ----------------------------------- ------------ ------------ ------------
STAT  recursive cpu usage                          192            2         -190
STAT  CPU used by this session                     221           19         -202
STAT  execute count                             10,001            1      -10,000
STAT  opened cursors cumulative                 10,001            1      -10,000
STAT  recursive calls                           11,144        1,144      -10,000
STAT  session cursor cache hits                 10,001            1      -10,000
STAT  sorts (memory)                            10,000            0      -10,000
STAT  workarea executions - optimal             10,001            1      -10,000
LATCH shared pool                               10,003            1      -10,002
STAT  index fetch by key                        20,000            0      -20,000
STAT  rows fetched via callback                 20,000            0      -20,000
STAT  table fetch by rowid                      20,000            0      -20,000
STAT  calls to get snapshot scn: kcmgss         30,001            1      -30,000
STAT  buffer is not pinned count                40,000            0      -40,000
STAT  sorts (rows)                              40,000            0      -40,000
STAT  consistent gets                           60,000            0      -60,000
STAT  consistent gets - examination             60,000            0      -60,000
STAT  consistent gets from cache                60,000            0      -60,000
STAT  session logical reads                     60,000            0      -60,000
LATCH cache buffers chains                      60,005            1      -60,004
STAT  session uga memory                             0      246,880      246,880
STAT  session pga memory                             0      262,144      262,144


--------------------------------------------------------------------------------
3. Latching report
--------------------------------------------------------------------------------
Run1 used 70,405 latches
Run2 used 271 latches
Run2 used 99.6% fewer latches than Run1


================================================================================
End of report
================================================================================

PL/SQL procedure successfully completed.

As we can see, the time-spans for this test are small, but the PL/SQL method took one-tenth of the time of the SQL method. The report shows the higher LIOs and latching that the SQL method incurred (but also demonstrates that all sorts took place in memory). To sort small collections, therefore, it appears that the PL/SQL method is the most efficient.

performance test two: sorting large collections

For our second test, we will run fewer sorts, but with a much larger collection. We will populate a collection with the object names from ALL_OBJECTS (this gives us approximately 71,000 elements on this test 11.2 database). Our comparison is as follows.

SQL> DECLARE
  2     v_large_collection  varchar2_ntt := varchar2_ntt();
  3     v_sorted_collection varchar2_ntt := varchar2_ntt();
  4     v_iterations        PLS_INTEGER := 100;
  5  BEGIN
  6
  7     /* Assign large collection... */
  8     SELECT object_name BULK COLLECT INTO v_large_collection
  9     FROM   all_objects;
 10
 11     runstats_pkg.rs_start;
 12
 13     /* Run 1: sort in SQL... */
 14     FOR i IN 1 .. v_iterations LOOP
 15        v_sorted_collection := sort_collection(v_large_collection);
 16     END LOOP;
 17
 18     runstats_pkg.rs_middle;
 19
 20     /* Run 2: sort in PL/SQL... */
 21     FOR i IN 1 .. v_iterations LOOP
 22        v_sorted_collection := sort_collection_plsql(v_large_collection);
 23     END LOOP;
 24
 25     runstats_pkg.rs_stop(100);
 26
 27  END;
 28  /

================================================================================
Runstats report : 20-NOV-2009 19:21:04
================================================================================


--------------------------------------------------------------------------------
1. Summary timings
--------------------------------------------------------------------------------
Run1 ran in 1321 hsecs
Run2 ran in 3209 hsecs
Run1 was 58.8% quicker than Run2


--------------------------------------------------------------------------------
2. Statistics report
--------------------------------------------------------------------------------


Type  Name                                        Run1         Run2         Diff
----- ----------------------------------- ------------ ------------ ------------
STAT  execute count                                101            1         -100
STAT  opened cursors cumulative                    101            1         -100
STAT  recursive calls                            1,244        1,144         -100
STAT  session cursor cache hits                    101            1         -100
STAT  sorts (memory)                               100            0         -100
LATCH channel operations parent latch               65          187          122
LATCH checkpoint queue latch                        81          226          145
LATCH JS queue state obj latch                      72          252          180
LATCH messages                                     108          289          181
STAT  index fetch by key                           200            0         -200
STAT  rows fetched via callback                    200            0         -200
STAT  table fetch by rowid                         200            0         -200
STAT  workarea executions - optimal                201            1         -200
LATCH SQL memory manager workarea list lat         471          740          269
STAT  calls to get snapshot scn: kcmgss            301            1         -300
LATCH row cache objects                             31          419          388
STAT  buffer is not pinned count                   400            0         -400
LATCH enqueues                                     160          570          410
LATCH enqueue hash chains                          163          598          435
STAT  consistent gets                              600            0         -600
STAT  consistent gets - examination                600            0         -600
STAT  consistent gets from cache                   600            0         -600
STAT  session logical reads                        600            0         -600
STAT  recursive cpu usage                        1,126            2       -1,124
STAT  CPU used by this session                   1,202        2,883        1,681
STAT  session uga memory max                    65,512            0      -65,512
STAT  session pga memory max                    65,536            0      -65,536
STAT  session uga memory                             0      246,880      246,880
STAT  sorts (rows)                           7,149,800            0   -7,149,800
STAT  session pga memory                    -9,240,576    2,686,976   11,927,552


--------------------------------------------------------------------------------
3. Latching report
--------------------------------------------------------------------------------
Run1 used 2,117 latches
Run2 used 4,816 latches
Run1 used 56% fewer latches than Run2


================================================================================
End of report
================================================================================

PL/SQL procedure successfully completed.

This time the results are very different. The SQL method is far more efficient at sorting large collections and is over twice as fast as the PL/SQL-based method. The statistics report looks slightly strange in places (particularly the memory statistics) but we can conclude that we should use the SQL technique for sorting larger collections.

sort during fetch

The main focus of this article is sorting pre-populated collections. However, we can also generate pre-sorted collections using various techniques, as described below.

pre-sorting with multiset

The MULTISET function has been available since Oracle 8 and enables us to populate a collection from a SQL query. In the following example, we will load the ENAME values from the EMP table into a sorted collection.

SQL> SELECT CAST(
  2            MULTISET(
  3               SELECT ename FROM emp ORDER BY ename
  4               ) AS varchar2_ntt ) AS ordered_emps
  5  FROM   dual;

ORDERED_EMPS
-----------------------------------------------------------------
VARCHAR2_NTT('ADAMS', 'ALLEN', 'BLAKE', 'CLARK', 'FORD', 'JAMES',
 'JONES', 'KING', 'MARTIN', 'MILLER', 'SCOTT', 'SMITH', 'TURNER',
 'WARD')

1 row selected.

pre-sorting with the collect function

The COLLECT aggregate function was introduced in Oracle 10g and also enables us to populate a collection from a rowsource (although unlike MULTISET, we don't need to use a subquery). Since 11g Release 2, COLLECT also officially supports the ordering of the collection elements, as follows.

SQL> SELECT CAST(
  2            COLLECT(ename ORDER BY ename)
  3               AS varchar2_ntt ) AS ordered_emps
  4  FROM   emp;

ORDERED_EMPS
-----------------------------------------------------------------
VARCHAR2_NTT('ADAMS', 'ALLEN', 'BLAKE', 'CLARK', 'FORD', 'JAMES',
 'JONES', 'KING', 'MARTIN', 'MILLER', 'SCOTT', 'SMITH', 'TURNER',
 'WARD')

1 row selected.

Note the use of the word "officially" above. The COLLECT function has supported the ordering technique used in our example since its introduction in 10g Release 1, but Oracle has only documented this feature in 11g Release 2.

pre-sorting during a bulk collect in pl/sql

A third opportunity to pre-order collection elements is when using BULK COLLECT. With this technique, we can again take advantage of using an ORDER BY clause in the SQL that drives the fetch. In the following example, we will bulk fetch all of the employee names pre-sorted and simply print them out.

SQL> DECLARE
  2     v_enames varchar2_ntt;
  3  BEGIN
  4     SELECT ename BULK COLLECT INTO v_enames
  5     FROM   emp
  6     ORDER  BY
  7            ename;
  8     FOR i IN 1 .. v_enames.COUNT LOOP
  9        DBMS_OUTPUT.PUT_LINE(v_enames(i));
 10     END LOOP;
 11  END;
 12  /

ADAMS
ALLEN
BLAKE
CLARK
FORD
JAMES
JONES
KING
MARTIN
MILLER
SCOTT
SMITH
TURNER
WARD

PL/SQL procedure successfully completed.

Note that because BULK COLLECT is a PL/SQL construct, it supports associative arrays in addition to nested tables and VARRAYs.

source code

The source code for the examples in this article can be downloaded from here.

Adrian Billington, November 2009

oracle-developer.net