sorting collections
Steven Feuerstein once wrote that having spent years persuading PL/SQL developers to modularise their code, he was moving on to encourage their use of collections. Indeed, collections have become more widely used by developers in SQL and PL/SQL code, possibly due to the bulk PL/SQL techniques introduced in 8i and associative array features of 9i. Recent Oracle releases have also added more features for working with collections, such as the multiset operations introduced in 10g. In fact, Oracle is now rich in features for working with collections, with one important exception: sorting arrays of data. This article demonstrates a small number of techniques that we can adopt for sorting our collections.
setup
All of the examples in this article will use a nested table type, because this can be used in both SQL and PL/SQL (unlike associative arrays which are PL/SQL-only). Where alternative collection types can be used (i.e. associative arrays or VARRAYs), this will be noted. Our collection type is defined as follows.
SQL> CREATE TYPE varchar2_ntt AS TABLE OF VARCHAR2(4000); 2 /
Type created.
To keep the initial examples short and simple, we will wrap a single small collection in a view, as follows.
SQL> CREATE VIEW v_collection 2 AS 3 SELECT varchar2_ntt( 4 'Bananas', 'Oranges', 'Apples', 'Toaster Ovens' 5 ) AS collection 6 FROM dual;
View created.
Our collection contains four unordered elements and it is returned to us in the same order when we query it from the view, as follows.
SQL> SELECT collection FROM v_collection;
COLLECTION ----------------------------------------------------------------- VARCHAR2_NTT('Bananas', 'Oranges', 'Apples', 'Toaster Ovens') 1 row selected.
We will now demonstrate some techniques for re-ordering the four elements in our sample collection below.
sorting pre-populated collections in sql
The easiest way to sort a pre-populated collection is to use SQL, as follows.
SQL> SELECT column_value 2 FROM v_collection 3 , TABLE(collection) 4 ORDER BY 5 column_value;
COLUMN_VALUE -------------------- Apples Bananas Oranges Toaster Ovens 4 rows selected.
This method is extremely simple. We have used the TABLE() operator to convert the collection to a rowsource, meaning of course that it can be ordered with an ORDER BY clause. The TABLE() operator has been available since Oracle 8i and works with nested tables and VARRAYs (but not with associative arrays).
If we want to retain the collection itself (albeit with sorted elements), we can wrap this SQL technique in a simple function, as follows.
SQL> CREATE FUNCTION sort_collection ( 2 p_collection IN varchar2_ntt 3 ) RETURN varchar2_ntt IS 4 5 v_collection varchar2_ntt; 6 7 BEGIN 8 9 SELECT column_value BULK COLLECT INTO v_collection 10 FROM TABLE(p_collection) 11 ORDER BY 12 column_value; 13 14 RETURN v_collection; 15 16 END sort_collection; 17 /
Function created.
In this collection sorter, we convert the original collection into an ordered rowsource with the TABLE() operator and an ORDER BY clause, then bulk fetch it into a collection of the same type for returning. We use it as follows.
SQL> SELECT SORT_COLLECTION(collection) AS sorted_collection 2 FROM v_collection;
SORTED_COLLECTION ----------------------------------------------------------------- VARCHAR2_NTT('Apples', 'Bananas', 'Oranges', 'Toaster Ovens') 1 row selected.
As to be expected, the elements of the collection are re-ordered alphabetically. Note that to keep the example simple, it only supports ascending sorts. For a link to a version of this example that supports ascending, descending and distinct sorts, see the further reading section at the end of this article.
sorting pre-populated collections with pl/sql
An alternative technique for sorting collections is to use a PL/SQL associative array to re-order the elements. To demonstrate this, we will create our function first and then describe the technique below.
SQL> CREATE FUNCTION sort_collection_plsql ( 2 p_collection IN varchar2_ntt 3 ) RETURN varchar2_ntt IS 4 5 TYPE sorter_aat IS TABLE OF PLS_INTEGER 6 INDEX BY VARCHAR2(4000); 7 8 v_collection varchar2_ntt := varchar2_ntt(); 9 v_sorter sorter_aat; 10 v_sorter_idx VARCHAR2(4000); 11 12 BEGIN 13 14 /* Sort the collection using the sorter array... */ 15 FOR i IN 1 .. p_collection.COUNT LOOP 16 v_sorter_idx := p_collection(i); 17 v_sorter(v_sorter_idx) := CASE 18 WHEN v_sorter.EXISTS(v_sorter_idx) 19 THEN v_sorter(v_sorter_idx) + 1 20 ELSE 1 21 END; 22 END LOOP; 23 24 /* Assign sorted elements back to collection... */ 25 v_sorter_idx := v_sorter.FIRST; 26 WHILE v_sorter_idx IS NOT NULL LOOP 27 28 /* Handle multiple copies of same value... */ 29 FOR i IN 1 .. v_sorter(v_sorter_idx) LOOP 30 v_collection.EXTEND; 31 v_collection(v_collection.LAST) := v_sorter_idx; 32 END LOOP; 33 34 v_sorter_idx := v_sorter.NEXT(v_sorter_idx); 35 36 END LOOP; 37 38 RETURN v_collection; 39 40 END sort_collection_plsql; 41 /
Function created.
Some notes on the techniques used are as follows:
- Lines 5-6: we use a string-indexed associative array to sort our collection and keep a log of the number of times each collection element occurs. The associative array index allows for collection elements of VARCHAR2(4000) and is of course ordered by definition;
- Lines 15-22: we loop through our collection (this example assumes that it is densely packed) and assign each element to the index of the associative array. We also increment a count for the number of times each collection element appears (i.e. to handle duplicate values);
- Lines 25-36: once our collection has been copied to the associative array, we can build our sorted return collection. We do this by cycling through the associative array in index order. Each index value (remember that these are our original collection values) is assigned to our return collection the same number of times it originally occurred;
- Line 38: we return a sorted copy of our original collection.
We use our PL/SQL-based sort function in the same way as our original SQL-based function, as follows.
SQL> SELECT SORT_COLLECTION_PLSQL(collection) AS sorted_collection 2 FROM v_collection;
SORTED_COLLECTION ----------------------------------------------------------------- VARCHAR2_NTT('Apples', 'Bananas', 'Oranges', 'Toaster Ovens') 1 row selected.
Note that this technique also supports the sorting and return of associative arrays, in addition to nested tables and VARRAYs. If the sorted collection is never to be referenced or used in SQL, then an associative array can be used in place of a nested table type. It is only when using the collection in SQL that we require a SQL type such as a nested table.
handling repeating elements
Note that our PL/SQL implementation explicitly handles repeating or duplicate collection elements. The SQL implementation handles this naturally, of course, which is one reason why our first sorter function is much more simple to code than the second. We can test that both functions handle repeating elements by passing two copies of the same collection to be sorted. We will use the MULTISET UNION collection operation for this (this appends one collection to another).
We will begin by testing our SQL implementation, as follows.
SQL> SELECT sort_collection( 2 collection MULTISET UNION collection 3 ) AS sorted_collection 4 FROM v_collection;
SORTED_COLLECTION ----------------------------------------------------------------- VARCHAR2_NTT('Apples', 'Apples', 'Bananas', 'Bananas', 'Oranges', 'Oranges', 'Toaster Ovens', 'Toaster Ovens') 1 row selected.
We can see that the collection is sorted and each element appears twice. We will repeat the test with our PL/SQL implementation, as follows.
SQL> SELECT sort_collection_plsql( 2 collection MULTISET UNION collection 3 ) AS sorted_collection 4 FROM v_collection;
SORTED_COLLECTION ----------------------------------------------------------------- VARCHAR2_NTT('Apples', 'Apples', 'Bananas', 'Bananas', 'Oranges', 'Oranges', 'Toaster Ovens', 'Toaster Ovens') 1 row selected.
Again, our collection is sorted and repeating elements are handled correctly.
performance considerations
We have two implementations of a collection sorting function, but which should we use? We will run two simple performance tests to determine which is the most efficient to use. First we will compare many sorts of a tiny collection and second we will compare a small number of sorts for a large collection. We will use a version of Tom Kyte's RUNSTATS utility to compare the resources and time used by each method.
Before we begin our tests, however, we will recompile our functions to use native compilation. This will make them as efficient as possible (although we would expect this to have a greater positive impact on our PL/SQL-based function).
SQL> ALTER FUNCTION sort_collection COMPILE PLSQL_CODE_TYPE = NATIVE;
Function altered.
SQL> ALTER FUNCTION sort_collection_plsql COMPILE PLSQL_CODE_TYPE = NATIVE;
Function altered.
SQL> SELECT name 2 , plsql_code_type 3 FROM user_plsql_object_settings 4 WHERE name LIKE 'SORT_COLLECTION%';
NAME PLSQL_CODE_TYPE ------------------------------ -------------------- SORT_COLLECTION NATIVE SORT_COLLECTION_PLSQL NATIVE 2 rows selected.
performance test one: sorting small collections
We will first compare 10,000 sorts of a tiny collection, as follows.
SQL> DECLARE 2 v_small_collection varchar2_ntt := varchar2_ntt(); 3 v_sorted_collection varchar2_ntt := varchar2_ntt(); 4 v_iterations PLS_INTEGER := 10000; 5 BEGIN 6 7 /* Assign small collection... */ 8 v_small_collection := varchar2_ntt('B','A','D','C'); 9 10 runstats_pkg.rs_start(); 11 12 /* Run 1: sort in SQL... */ 13 FOR i IN 1 .. v_iterations LOOP 14 v_sorted_collection := sort_collection(v_small_collection); 15 END LOOP; 16 17 runstats_pkg.rs_middle(); 18 19 /* Run 2: sort in PL/SQL... */ 20 FOR i IN 1 .. v_iterations LOOP 21 v_sorted_collection := sort_collection_plsql(v_small_collection); 22 END LOOP; 23 24 runstats_pkg.rs_stop(100); 25 26 END; 27 /
================================================================================ Runstats report : 20-NOV-2009 19:18:51 ================================================================================ -------------------------------------------------------------------------------- 1. Summary timings -------------------------------------------------------------------------------- Run1 ran in 243 hsecs Run2 ran in 21 hsecs Run2 was 91.4% quicker than Run1 -------------------------------------------------------------------------------- 2. Statistics report -------------------------------------------------------------------------------- Type Name Run1 Run2 Diff ----- ----------------------------------- ------------ ------------ ------------ STAT recursive cpu usage 192 2 -190 STAT CPU used by this session 221 19 -202 STAT execute count 10,001 1 -10,000 STAT opened cursors cumulative 10,001 1 -10,000 STAT recursive calls 11,144 1,144 -10,000 STAT session cursor cache hits 10,001 1 -10,000 STAT sorts (memory) 10,000 0 -10,000 STAT workarea executions - optimal 10,001 1 -10,000 LATCH shared pool 10,003 1 -10,002 STAT index fetch by key 20,000 0 -20,000 STAT rows fetched via callback 20,000 0 -20,000 STAT table fetch by rowid 20,000 0 -20,000 STAT calls to get snapshot scn: kcmgss 30,001 1 -30,000 STAT buffer is not pinned count 40,000 0 -40,000 STAT sorts (rows) 40,000 0 -40,000 STAT consistent gets 60,000 0 -60,000 STAT consistent gets - examination 60,000 0 -60,000 STAT consistent gets from cache 60,000 0 -60,000 STAT session logical reads 60,000 0 -60,000 LATCH cache buffers chains 60,005 1 -60,004 STAT session uga memory 0 246,880 246,880 STAT session pga memory 0 262,144 262,144 -------------------------------------------------------------------------------- 3. Latching report -------------------------------------------------------------------------------- Run1 used 70,405 latches Run2 used 271 latches Run2 used 99.6% fewer latches than Run1 ================================================================================ End of report ================================================================================ PL/SQL procedure successfully completed.
As we can see, the time-spans for this test are small, but the PL/SQL method took one-tenth of the time of the SQL method. The report shows the higher LIOs and latching that the SQL method incurred (but also demonstrates that all sorts took place in memory). To sort small collections, therefore, it appears that the PL/SQL method is the most efficient.
performance test two: sorting large collections
For our second test, we will run fewer sorts, but with a much larger collection. We will populate a collection with the object names from ALL_OBJECTS (this gives us approximately 71,000 elements on this test 11.2 database). Our comparison is as follows.
SQL> DECLARE 2 v_large_collection varchar2_ntt := varchar2_ntt(); 3 v_sorted_collection varchar2_ntt := varchar2_ntt(); 4 v_iterations PLS_INTEGER := 100; 5 BEGIN 6 7 /* Assign large collection... */ 8 SELECT object_name BULK COLLECT INTO v_large_collection 9 FROM all_objects; 10 11 runstats_pkg.rs_start; 12 13 /* Run 1: sort in SQL... */ 14 FOR i IN 1 .. v_iterations LOOP 15 v_sorted_collection := sort_collection(v_large_collection); 16 END LOOP; 17 18 runstats_pkg.rs_middle; 19 20 /* Run 2: sort in PL/SQL... */ 21 FOR i IN 1 .. v_iterations LOOP 22 v_sorted_collection := sort_collection_plsql(v_large_collection); 23 END LOOP; 24 25 runstats_pkg.rs_stop(100); 26 27 END; 28 /
================================================================================ Runstats report : 20-NOV-2009 19:21:04 ================================================================================ -------------------------------------------------------------------------------- 1. Summary timings -------------------------------------------------------------------------------- Run1 ran in 1321 hsecs Run2 ran in 3209 hsecs Run1 was 58.8% quicker than Run2 -------------------------------------------------------------------------------- 2. Statistics report -------------------------------------------------------------------------------- Type Name Run1 Run2 Diff ----- ----------------------------------- ------------ ------------ ------------ STAT execute count 101 1 -100 STAT opened cursors cumulative 101 1 -100 STAT recursive calls 1,244 1,144 -100 STAT session cursor cache hits 101 1 -100 STAT sorts (memory) 100 0 -100 LATCH channel operations parent latch 65 187 122 LATCH checkpoint queue latch 81 226 145 LATCH JS queue state obj latch 72 252 180 LATCH messages 108 289 181 STAT index fetch by key 200 0 -200 STAT rows fetched via callback 200 0 -200 STAT table fetch by rowid 200 0 -200 STAT workarea executions - optimal 201 1 -200 LATCH SQL memory manager workarea list lat 471 740 269 STAT calls to get snapshot scn: kcmgss 301 1 -300 LATCH row cache objects 31 419 388 STAT buffer is not pinned count 400 0 -400 LATCH enqueues 160 570 410 LATCH enqueue hash chains 163 598 435 STAT consistent gets 600 0 -600 STAT consistent gets - examination 600 0 -600 STAT consistent gets from cache 600 0 -600 STAT session logical reads 600 0 -600 STAT recursive cpu usage 1,126 2 -1,124 STAT CPU used by this session 1,202 2,883 1,681 STAT session uga memory max 65,512 0 -65,512 STAT session pga memory max 65,536 0 -65,536 STAT session uga memory 0 246,880 246,880 STAT sorts (rows) 7,149,800 0 -7,149,800 STAT session pga memory -9,240,576 2,686,976 11,927,552 -------------------------------------------------------------------------------- 3. Latching report -------------------------------------------------------------------------------- Run1 used 2,117 latches Run2 used 4,816 latches Run1 used 56% fewer latches than Run2 ================================================================================ End of report ================================================================================ PL/SQL procedure successfully completed.
This time the results are very different. The SQL method is far more efficient at sorting large collections and is over twice as fast as the PL/SQL-based method. The statistics report looks slightly strange in places (particularly the memory statistics) but we can conclude that we should use the SQL technique for sorting larger collections.
sort during fetch
The main focus of this article is sorting pre-populated collections. However, we can also generate pre-sorted collections using various techniques, as described below.
pre-sorting with multiset
The MULTISET function has been available since Oracle 8 and enables us to populate a collection from a SQL query. In the following example, we will load the ENAME values from the EMP table into a sorted collection.
SQL> SELECT CAST( 2 MULTISET( 3 SELECT ename FROM emp ORDER BY ename 4 ) AS varchar2_ntt ) AS ordered_emps 5 FROM dual;
ORDERED_EMPS ----------------------------------------------------------------- VARCHAR2_NTT('ADAMS', 'ALLEN', 'BLAKE', 'CLARK', 'FORD', 'JAMES', 'JONES', 'KING', 'MARTIN', 'MILLER', 'SCOTT', 'SMITH', 'TURNER', 'WARD') 1 row selected.
pre-sorting with the collect function
The COLLECT aggregate function was introduced in Oracle 10g and also enables us to populate a collection from a rowsource (although unlike MULTISET, we don't need to use a subquery). Since 11g Release 2, COLLECT also officially supports the ordering of the collection elements, as follows.
SQL> SELECT CAST( 2 COLLECT(ename ORDER BY ename) 3 AS varchar2_ntt ) AS ordered_emps 4 FROM emp;
ORDERED_EMPS ----------------------------------------------------------------- VARCHAR2_NTT('ADAMS', 'ALLEN', 'BLAKE', 'CLARK', 'FORD', 'JAMES', 'JONES', 'KING', 'MARTIN', 'MILLER', 'SCOTT', 'SMITH', 'TURNER', 'WARD') 1 row selected.
Note the use of the word "officially" above. The COLLECT function has supported the ordering technique used in our example since its introduction in 10g Release 1, but Oracle has only documented this feature in 11g Release 2.
pre-sorting during a bulk collect in pl/sql
A third opportunity to pre-order collection elements is when using BULK COLLECT. With this technique, we can again take advantage of using an ORDER BY clause in the SQL that drives the fetch. In the following example, we will bulk fetch all of the employee names pre-sorted and simply print them out.
SQL> DECLARE 2 v_enames varchar2_ntt; 3 BEGIN 4 SELECT ename BULK COLLECT INTO v_enames 5 FROM emp 6 ORDER BY 7 ename; 8 FOR i IN 1 .. v_enames.COUNT LOOP 9 DBMS_OUTPUT.PUT_LINE(v_enames(i)); 10 END LOOP; 11 END; 12 /
ADAMS ALLEN BLAKE CLARK FORD JAMES JONES KING MARTIN MILLER SCOTT SMITH TURNER WARD PL/SQL procedure successfully completed.
Note that because BULK COLLECT is a PL/SQL construct, it supports associative arrays in addition to nested tables and VARRAYs.
further reading
A package for sorting collections, with support for descending and distinct sorts, is available as an oracle-developer.net utility. The version of RUNSTATS used in the performance examples is available here. For more information on the COLLECT function and its use, see this oracle-developer.net article. The 11g Release 2 new features for COLLECT are discussed in this oracle-developer.net article.
source code
The source code for the examples in this article can be downloaded from here.
Adrian Billington, November 2009
Back to Top