listing files with the external table preprocessor in 11g
The external table preprocessor was introduced in 11g Release 1 (patchset 11.1.0.7) and formally documented in 11g Release 2. Briefly, the preprocessor enables us to define an executable or shell/batch script that can be used to generate a rowsource for the external table as it is queried. The most obvious use for such a feature is to enable compressed files to be queried directly from external tables and this is described in another oracle-developer.net article (see references below).
Another obvious use for the preprocessor is to solve the common problem of how to list files in directories from within Oracle (i.e. using SQL or PL/SQL). There is no documented built-in method for doing this in Oracle and techniques described on the web include Java stored procedures or an undocumented DBMS_BACKUP_RESTORE procedure. With the introduction of the preprocessor, however, we will demonstrate how directory listings can be achieved with an external table and a preprocessor batch/shell script.
background reading
Developers who are not familiar with the new preprocessor can read this oracle-developer.net article on new features for external tables in 11g, which describes the feature in some detail and how it can be used to query compressed files.
a simple file listing example
To begin, we will create a simple example of a preprocessor directory listing and for this we require the following components:
- a read-write Oracle directory (for the external table);
- an executable Oracle directory (for the preprocessor);
- an external table;
- a shell/batch script;
- a dummy text file (for our table location).
As a SYSDBA, we will create the Oracle directories and grant the relevant privileges to SCOTT, as follows.
SQL> conn / as sysdba
Connected.
SQL> CREATE DIRECTORY xt_dir AS 'd:\oracle\dir\xt_dir';
Directory created.
SQL> GRANT READ, WRITE ON DIRECTORY xt_dir TO scott;
Grant succeeded.
SQL> CREATE DIRECTORY bin_dir AS 'd:\oracle\dir\bin_dir';
Directory created.
SQL> GRANT EXECUTE ON DIRECTORY bin_dir TO scott;
Grant succeeded.
We will now create the external table to read a standard DOS directory listing, as follows.
SQL> conn scott/tiger
Connected.
SQL> CREATE TABLE files_xt 2 ( file_date VARCHAR2(50) 3 , file_time VARCHAR2(50) 4 , file_size VARCHAR2(50) 5 , file_name VARCHAR2(255) 6 ) 7 ORGANIZATION EXTERNAL 8 ( 9 TYPE ORACLE_LOADER 10 DEFAULT DIRECTORY xt_dir 11 ACCESS PARAMETERS 12 ( 13 RECORDS DELIMITED BY NEWLINE 14 LOAD WHEN file_size != '<DIR>' 15 PREPROCESSOR bin_dir: 'list_files.bat' 16 FIELDS TERMINATED BY WHITESPACE 17 ) 18 LOCATION ('sticky.txt') 19 ) 20 REJECT LIMIT UNLIMITED;
Table created.
Note the following points about this external table:
- Line 14: we use the LOAD WHEN clause to exclude directories from our listings;
- Line 15: our preprocessor command simply calls a batch script named
list_files.bat
in the BIN_DIR; - Line 18: we need a location clause, even though we are executing a script and not reading a file. This file must exist (even if it is not used) and cannot be a directory. Note that the file name given for the location can be read by the preprocessor batch script, so we will exploit this feature later on in this article to build a dynamic file listing. For now, however, we have no interest in the
sticky.txt
file other than its presence.
We now need to create two files; namely the preprocessor script (i.e. list_files.bat
) and the location file (i.e. sticky.txt
). As this is a Windows system, the preprocessor script needs to be a batch script. In our case, list_files.bat
contains the following simple code:
@echo off dir d:\oracle\diag\rdbms\ora11\ora11\trace
The first line is mandatory for batch scripts and the script itself must be located in the executable Oracle directory (in our case BIN_DIR). Note that for now we have hard-coded the directory we wish to query (i.e. the user dump destination for this instance).
Moving on, we can now create our sticky.txt
location file in our Oracle read-write directory (XT_DIR), as follows.
SQL> host echo > d:\oracle\dir\xt_dir\sticky.txt
For this example, the file doesn't need to contain anything. It simply has to exist. We will now query our preprocessor external table to get a file listing for the 11g trace directory that we coded in the list_files.bat
script, as follows.
SQL> SELECT * FROM files_xt WHERE ROWNUM <= 5;
FILE_DATE FILE_TIME FILE_SIZE FILE_NAME ------------ ------------ ------------ ---------------------- Volume in drive D Volume Serial Number is 14/10/2009 22:39 618,508 alert_ora11.log 14/10/2009 22:38 1,474 ora11_cjq0_5148.trc 14/10/2009 22:38 175 ora11_cjq0_5148.trm 5 rows selected.
We can see our directory listing at work, but the data requires some cleanup. The easiest way to do this is with a view, to filter and format the data we require in a consistent manner, as follows.
SQL> CREATE VIEW files_vxt 2 AS 3 SELECT file_name 4 , TO_DATE( 5 file_date||','||file_time, 6 'DD/MM/YYYY HH24:MI') AS file_time 7 , TO_NUMBER( 8 file_size, 9 'fm999,999,999,999') AS file_size 10 FROM files_xt 11 WHERE REGEXP_LIKE( 12 file_date, 13 '[0-9]{2}/[0-9]{2}/[0-9]{4}');
View created.
We can now query our file listing more cleanly, as follows.
SQL> col file_time format a20 SQL> col file_name format a30 SQL> col file_size format 999,999,999 SQL> SELECT * FROM files_vxt WHERE ROWNUM <= 5;
FILE_NAME FILE_TIME FILE_SIZE ------------------------------ -------------------- ------------ alert_ora11.log 14/10/2009 22:39:00 618,508 ora11_cjq0_5148.trc 14/10/2009 22:38:00 1,474 ora11_cjq0_5148.trm 14/10/2009 22:38:00 175 ora11_dbrm_4016.trc 14/10/2009 23:20:00 20,184 ora11_dbrm_4016.trm 14/10/2009 23:20:00 991 5 rows selected.
Of course, because we are using SQL to access our file listing, we can use wildcard searches, date ordering and so on, as the following example demonstrates.
SQL> SELECT * 2 FROM files_vxt 3 WHERE file_name LIKE '%.trc' 4 ORDER BY 5 file_time DESC;
FILE_NAME FILE_TIME FILE_SIZE ------------------------------ -------------------- ------------ ora11_dbrm_4016.trc 14/10/2009 22:51:00 18,385 ora11_cjq0_5148.trc 14/10/2009 22:38:00 1,474 ora11_lgwr_2444.trc 14/10/2009 22:37:00 1,312 ora11_smon_2040.trc 14/10/2009 22:37:00 1,133 ora11_mmnl_3760.trc 14/10/2009 22:37:00 1,478 ora11_reco_5792.trc 14/10/2009 22:37:00 1,222 ora11_fbda_3052.trc 13/10/2009 17:41:00 1,167 ora11_w000_4796.trc 13/10/2009 06:52:00 905 8 rows selected.
dynamic file listing with preprocessor scripts
The small batch script example above uses a pre-defined directory listing (i.e. it only lists the 11g trace directory files). One of the features of the batch or shell script that we use is that it can read the path and name of the file used in the external table location clause. This means that it is possible to make the directory listing more dynamic by putting the paths to the directory or directories we wish to list inside the external table location file.
To demonstrate this, we will re-create our FILES_XT external table, using a new batch script and location file, as follows.
SQL> DROP TABLE files_xt;
Table dropped.
SQL> CREATE TABLE files_xt 2 ( file_date VARCHAR2(50) 3 , file_time VARCHAR2(50) 4 , file_size VARCHAR2(50) 5 , file_name VARCHAR2(255) 6 ) 7 ORGANIZATION EXTERNAL 8 ( 9 TYPE ORACLE_LOADER 10 DEFAULT DIRECTORY xt_dir 11 ACCESS PARAMETERS 12 ( 13 RECORDS DELIMITED by NEWLINE 14 LOAD WHEN file_size != '<DIR>' 15 PREPROCESSOR bin_dir: 'dynamic_list_files.bat' 16 FIELDS TERMINATED BY WHITESPACE 17 ) 18 LOCATION ('directories.txt') 19 ) 20 REJECT LIMIT UNLIMITED;
Table created.
This time we have a batch script in our BIN_DIR named dynamic_list_files.bat
and a location file called directories.txt
. The batch file contents are quite simple, as follows.
@echo off for /F %%X in (%1) do dir %%X
This small script reads the external table location file (%1
) and each line in the file (/F
) is assigned to a variable (%%X
) and used as the target for a directory listing (dir %%X
). Each line in the location file therefore needs to be a valid directory path. With this method, there is no hard-coding of paths required in the preprocessor script and we can change the contents of the location file to list different directories as and when required.
To test this, we will add the 11g trace file path to the directories.txt
file and run a query over our external table, as follows.
SQL> host echo d:\oracle\diag\rdbms\ora11\ora11\trace > d:\oracle\dir\xt_dir\directories.txt SQL> SELECT * 2 FROM files_vxt 3 WHERE file_name LIKE '%.trc' 4 ORDER BY 5 file_time DESC;
FILE_NAME FILE_TIME FILE_SIZE ------------------------------ -------------------- ------------ ora11_dbrm_4016.trc 14/10/2009 22:59:00 18,899 ora11_cjq0_5148.trc 14/10/2009 22:38:00 1,474 ora11_lgwr_2444.trc 14/10/2009 22:37:00 1,312 ora11_smon_2040.trc 14/10/2009 22:37:00 1,133 ora11_mmnl_3760.trc 14/10/2009 22:37:00 1,478 ora11_reco_5792.trc 14/10/2009 22:37:00 1,222 ora11_fbda_3052.trc 13/10/2009 17:41:00 1,167 ora11_w000_4796.trc 13/10/2009 06:52:00 905 8 rows selected.
As described above, the batch script is able to read inside the directories.txt
file and pass the contents (i.e. a path) to the dir command. If we wish, we can list multiple paths in the location file. In the following example, we will list the files in two different user dump destinations. First we will add the paths to the location file, as follows.
SQL> host echo d:\oracle\admin\ora92\udump > d:\oracle\dir\xt_dir\directories.txt SQL> host echo d:\oracle\admin\ora102\udump >> d:\oracle\dir\xt_dir\directories.txt
We are now able to list a sample of 9i and 10g trace files from the same external table, below.
SQL> SELECT * 2 FROM files_vxt 3 WHERE file_name LIKE '%.trc' 4 AND ROWNUM <= 10;
FILE_NAME FILE_TIME FILE_SIZE ------------------------------ -------------------- ------------ ora92_ora_3164.trc 29/04/2009 18:32:00 448,045,831 ora92_ora_3340_bv.trc 19/10/2007 18:14:00 9,802 ora92_ora_4364_sctx.trc 19/10/2007 18:23:00 10,536 ora92_ora_4532_lit.trc 19/10/2007 18:14:00 10,799 ora92_ora_4708_bvcs.trc 19/10/2007 18:28:00 10,513 ora92_ora_5156.trc 28/04/2009 18:00:00 1,784 ora102_ora_1344_x2.trc 18/10/2007 19:14:00 36,239 ora102_ora_1544.trc 17/03/2009 23:04:00 934 ora102_ora_1896.trc 13/03/2009 13:19:00 608 ora102_ora_2272.trc 17/11/2008 07:47:00 591 10 rows selected.
We can clearly see trace files from the two different paths we included in our directories.txt
location file.
a more secure dynamic method
Being able to dynamically inject any path into the directories.txt
location file has security implications. As an alternative, readers might like to consider the following, more secure, method. Rather than a single "injectable" location file for the FILES_XT external table, we can setup individual location files that each contain one hard-coded directory path. These files can be placed in a readonly directory and whenever a directory needs to be listed, the external table's location can be changed to the corresponding location file.
To demonstrate this, we will begin by creating a readonly Oracle directory as follows.
SQL> conn / as sysdba
Connected.
SQL> CREATE DIRECTORY readonly_dir AS 'd:\oracle\dir\readonly_dir';
Directory created.
SQL> GRANT READ ON DIRECTORY readonly_dir TO scott;
Grant succeeded.
We will now add three location files to the READONLY_DIR directory, one for each of the user dump destinations used in the previous examples. Each file contains the path to the directory indicated by the filename.
SQL> host echo d:\oracle\admin\ora92\udump > d:\oracle\dir\readonly_dir\ora92_udump.txt SQL> host echo d:\oracle\admin\ora102\udump > d:\oracle\dir\readonly_dir\ora102_udump.txt SQL> host echo d:\oracle\diag\rdbms\ora11\ora11\trace > d:\oracle\dir\readonly_dir\ora11_udump.txt
We will now modify our FILES_XT table to set the default directory to the READONLY_DIR, as follows.
SQL> conn scott/tiger
Connected.
SQL> ALTER TABLE files_xt DEFAULT DIRECTORY readonly_dir;
Table altered.
We are now ready to use the more secure file listing method. We will begin by listing the files in the ORA92 user dump destination, as follows.
SQL> ALTER TABLE files_xt LOCATION ('ora92_udump.txt');
Table altered.
SQL> SELECT * FROM files_vxt WHERE ROWNUM <= 5;
FILE_NAME FILE_TIME FILE_SIZE ------------------------------ -------------------- ------------ ora92_ora_3164.trc 29/04/2009 18:32:00 448,045,831 ora92_ora_3340_bv.trc 19/10/2007 18:14:00 9,802 ora92_ora_4364_sctx.trc 19/10/2007 18:23:00 10,536 ora92_ora_4532_lit.trc 19/10/2007 18:14:00 10,799 ora92_ora_4708_bvcs.trc 19/10/2007 18:28:00 10,513 5 rows selected.
To query the ORA102 and ORA11 destinations, we simply change the location of the external table to the corresponding (and more importantly, controlled) location files, as follows.
SQL> ALTER TABLE files_xt LOCATION ('ora102_udump.txt');
Table altered.
SQL> SELECT * FROM files_vxt WHERE ROWNUM <= 5;
FILE_NAME FILE_TIME FILE_SIZE ------------------------------ -------------------- ------------ ora102_ora_1344_x2.trc 18/10/2007 19:14:00 36,239 ora102_ora_1544.trc 17/03/2009 23:04:00 934 ora102_ora_1896.trc 13/03/2009 13:19:00 608 ora102_ora_2272.trc 17/11/2008 07:47:00 591 ora102_ora_2564.trc 20/08/2008 07:39:00 591 5 rows selected.
SQL> ALTER TABLE files_xt LOCATION ('ora11_udump.txt');
Table altered.
SQL> SELECT * FROM files_vxt WHERE ROWNUM <= 5;
FILE_NAME FILE_TIME FILE_SIZE ------------------------------ -------------------- ------------ alert_ora11.log 14/10/2009 22:39:00 618,508 ora11_cjq0_5148.trc 14/10/2009 22:38:00 1,474 ora11_cjq0_5148.trm 14/10/2009 22:38:00 175 ora11_dbrm_4016.trc 14/10/2009 23:25:00 20,441 ora11_dbrm_4016.trm 14/10/2009 23:25:00 1,003 5 rows selected.
This method is equally as flexible from an application's perspective, but it gives the DBA more control over which directories are available for listing and who can use the utility. Be aware, however, that if the application has access to any read-write directories, it is still possible for the external table's default directory to be altered and the less secure method used instead. If this is the case, it might be more secure to create a dedicated external table for each directory that needs to be listed and ensure that the application is unable to alter the external tables' directories or locations.
using a pipelined function for dynamic file listing
Finally, we will take the dynamic file listing to a logical conclusion and create a pipelined function to perform the actions in the previous example. For this, we will need to create some types to describe our pipelined function's output data, as follows.
SQL> CREATE TYPE file_listing_ot AS OBJECT 2 ( file_path VARCHAR(1000) 3 , file_name VARCHAR2(100) 4 , file_time DATE 5 , file_size NUMBER 6 ); 7 /
Type created.
SQL> CREATE TYPE file_listing_ntt AS TABLE OF file_listing_ot; 2 /
Type created.
As we can see, the function will return the basic file listing information, together with the path of the directory being listed. We will now create a simple pipelined function to list the files, as follows.
SQL> CREATE FUNCTION list_files( 2 p_directory_file IN VARCHAR2 3 ) RETURN file_listing_ntt 4 AUTHID CURRENT_USER 5 PIPELINED IS 6 7 v_path VARCHAR2(1000); 8 9 ---------------------------------------------------------- 10 PROCEDURE get_path( p_file IN VARCHAR2, 11 p_path OUT VARCHAR2 ) IS 12 v_file UTL_FILE.FILE_TYPE; 13 BEGIN 14 v_file := UTL_FILE.FOPEN('READONLY_DIR', p_file, 'r'); 15 UTL_FILE.GET_LINE(v_file, p_path); 16 UTL_FILE.FCLOSE(v_file); 17 END get_path; 18 19 ---------------------------------------------------------- 20 PROCEDURE set_location( p_file IN VARCHAR2 ) IS 21 PRAGMA AUTONOMOUS_TRANSACTION; 22 BEGIN 23 EXECUTE IMMEDIATE 24 'ALTER TABLE files_xt LOCATION (''' || p_file || ''')'; 25 END set_location; 26 27 BEGIN 28 29 /* Read the path from the directory file... */ 30 get_path( p_directory_file, v_path ); 31 32 /* Prepare the external table... */ 33 set_location( p_directory_file ); 34 35 /* Read the file listing... */ 36 FOR r_files IN (SELECT * FROM files_vxt) LOOP 37 PIPE ROW ( file_listing_ot( v_path, 38 r_files.file_name, 39 r_files.file_time, 40 r_files.file_size )); 41 END LOOP; 42 RETURN; 43 END list_files; 44 /
Function created.
This function wraps the previous interactive example and performs three simple tasks:
- Line 30 (calling lines 10-17): first, it reads the path of the listed directory from the static location file;
- Line 33 (calling lines 20-25): second, it sets the location of the FILES_XT to the static location file parameter;
- Lines 36-41: third, it queries the FILES_VXT view (that sits over the FILES_XT table) to output the directory listing.
We will now query the pipelined function to get a file listing of the user dump destination for the ORA92 instance on this system.
SQL> SELECT * 2 FROM TABLE(list_files('ora92_udump.txt')) 3 ORDER BY 4 file_size DESC;
FILE_PATH FILE_NAME FILE_TIME FILE_SIZE ------------------------------ ------------------------- -------------------- ------------ d:\oracle\admin\ora92\udump ora92_ora_3164.trc 29/04/2009 18:32:00 448,045,831 d:\oracle\admin\ora92\udump ora92_ora_4532_lit.trc 19/10/2007 18:14:00 10,799 d:\oracle\admin\ora92\udump ora92_ora_4364_sctx.trc 19/10/2007 18:23:00 10,536 d:\oracle\admin\ora92\udump ora92_ora_4708_bvcs.trc 19/10/2007 18:28:00 10,513 d:\oracle\admin\ora92\udump ora92_ora_3340_bv.trc 19/10/2007 18:14:00 9,802 d:\oracle\admin\ora92\udump ora92_ora_5156.trc 28/04/2009 18:00:00 1,784 6 rows selected.
We can see that this is a simple and relatively secure method for listing directory files with a preprocessor. To make this more robust, we would need to add some exception handling and an application lock around the FILES_XT access. We could also pass in a collection of directory files and have multiple locations set and listed in one query, but enhancements like this can be an exercise for the reader!
further reading
For more information on the external table preprocessor, read the ORACLE_LOADER Access Driver documentation for 11g Release 2.
source code
The source code for the examples in this article can be downloaded from here.
Adrian Billington, October 2009
Back to Top