db2Dean, a.k.a. Dean Compher, a.k.a. "The Dean of DB2", is here to answer your DB2 questions!

 

Utah Events

None currently Scheduled

 

Nevada Events

None currently Scheduled

 

Search db2Dean

 

 

Add your insights to the:

db2Dean and Friends

 

Get the Free Data Studio! 

Get DB2 Express-C Here!


dean@db2Dean.com

 

 

 

 

Online Events

Technical Conference Calls

 

Other Events

Think, San Francisco, May 4-7

   

 

Welcome to db2Dean's web site.  I'm Dean Compher, an IBM IT Specialist who, along with my team, helps customers and prospective customers with DB2 on Linux, UNIX and Windows (LUW) technical questions and issues.  As this page makes painfully clear, I am a DBA and not a web designer, but I would be happy to get your DB2 questions answered or talk to you about the great features of DB2 or IBM Integration Products.  If you are looking at a new database solution or want to compare us to your existing database vendor, please do not hesitate to contact me about getting a presentation or just to ask questions.  My e-mail address is dean@db2Dean.com.

 

I am located in Utah and primarily serve DB2/LUW and dashDB customers in the Western US, but I can forward requests to my peers in other technology and geographic areas as well.  My team also covers Big Data and Informix products.  There are questions that I get on a regular basis, and I will write articles relating to them here.  I hope that you find them useful.  I also welcome suggestions for future content.  Click here for more information about me.

 

 

Favorite Links

 

Previous

Articles

 

Hot Links

 

Manuals

 

Info Center

 

Informix

 

DB2/zOS

 

DB2 i5/OS

 

DB2 Connect

 

Best Practices

 


External Tables

Dean Compher

27 November 2019

 

 

Db2 11.5 introduces a great new feature called external tables.  The feature allows you to make data in a file appear to be a table in your database, letting you query it like any other table.  It also allows you to copy data into a file by inserting into the external table.  These files can appear as any other table in your database, or they can be referenced like a table in a query without actually issuing a create table command against the file.  Querying the data in files has all sorts of uses.  Examples include a fast way to examine data in a file using SQL, easily loading all or parts of a file into a regular table, and exporting columns of tables to a file.  Db2 only checks the validity of data at the time you query it, not when you actually create the table.  Rows with bad data are ignored, making it easy to access just the good rows.  The files can be local on your database server or in certain types of cloud storage like S3 storage.  This is great if your data is already in a supported cloud storage because you don't have to download the file to start using the data.  You just reference the cloud storage object in the CREATE TABLE command or query.

 

The files under your external tables can be local on the Db2 server, on a different server, or in the cloud.  In this article I'll just describe the local and cloud options.  Cloud files can be in S3, SWIFT and AZURE storage types.  The IBM Cloud supports both S3 and SWIFT.

 

For now I'll discuss local files; I'll talk about cloud storage later.  Db2 will only use local files in directories specified in the EXTBL_LOCATION database configuration parameter.  You can just place your files in that directory or any directory below it, or you can specify a list of directories and use any of them.  Again, you only need to specify a top-level directory.  Any subdirectories under the directories you specify can be created and used at any time.  Here is the command I used to specify two directories:

 

update db cfg using EXTBL_LOCATION '/db2fs/external;/database/config/db2inst1'
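If you want to confirm the setting afterward, one way is to read it back from SQL.  This is just a sketch; I'm assuming the SYSIBMADM.DBCFG administrative view is available in your database.

```sql
-- Sketch: confirm the EXTBL_LOCATION setting from SQL.
-- Assumes the SYSIBMADM.DBCFG administrative view is available.
SELECT NAME, VALUE
  FROM SYSIBMADM.DBCFG
 WHERE NAME = 'extbl_location';
```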

 

Once you've opened access to the directories you want, you can start creating external tables on files in those directories.  Those files can be delimited or have fixed-length fields.  In my examples I will use the gosales.order_header table that is used in many Db2 documents and is available for download.  In my case I extracted several rows from the table in a delimited format and then used that file for my purposes.
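If you want to build a similar test file yourself, an extract like mine could be produced with the EXPORT utility.  This is a sketch; the path and the row count are just my example values.

```sql
-- Sketch: extract 100 rows of gosales.order_header into a comma-delimited file.
-- Run on the server; the path must be writable by the instance owner.
EXPORT TO /db2fs/external/oh2.del OF DEL
  SELECT * FROM gosales.order_header
  FETCH FIRST 100 ROWS ONLY;
```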

 

Example 1:  Simple external table

 

CREATE EXTERNAL TABLE gosales.ext_order_header LIKE gosales.order_header

       USING (dataobject '/db2fs/external/oh2.del' maxerrors 1000

              DELIMITER ',')

 

Notice the following about this example:

      Used the EXTERNAL keyword in the CREATE table command

      Used the LIKE keyword to define the columns and data types of my external table using the values of an existing table.  I could have specified the columns and data types if I didn't already have a table formatted like my file.

      Specified the full path to my file called "oh2.del" that I'm using for my table. 

      Specified 1000 for the maximum number of errors that can be encountered before a query operation quits.  The default is quitting on one error.  Make this number really big if you want to ignore all bad rows.

      My DELIMITER is a comma.  This is a comma delimited file.  You can specify any character you like.

      There are many other parameters you can specify to control the behavior of the external table

      Note that not all data types are currently supported.  The link in the preceding bullet enumerates the ones allowed.  Don't let this stop you, though.  For example, the XML data type is not supported, but you could still have an external table with a column containing XML; the column would just need to be defined as a CLOB, VARCHAR, etc. 

      You can control access to the table using grants and revokes, roles, etc., just as you can with any other table.

 

The create external table command will work even if none of the records in the file meets the data types or lengths of the columns you defined in the create table command.  The validity of the data is checked each time you query the data.  Two files are written each time the data is queried.  A .log file shows the parameters used to create the external table along with statistics about the query, including the number of rows read, number of bad records, and the issue with each bad record.  A .bad file is also written with a copy of the invalid records.  This is pretty nice, since you can then easily see which rows are bad, edit them and then replace them in the original file. 

 

The files are created in one of the directories specified in the DB CFG EXTBL_LOCATION parameter. Here is what the file names looked like after I queried the ext_order_header external table created in Example 1 above.

 

SAMPLE.GOSALES.EXT_ORDER_HEADER.oh2.del.00000056.0000003.bad

SAMPLE.GOSALES.EXT_ORDER_HEADER.oh2.del.00000056.0000003.log

 

As I noted above, these files get created every time you query the table, so be careful about cleaning them up, especially if you have a lot of bad records.
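Since these audit files accumulate with every query, a periodic cleanup job is worth considering.  Here is a small sketch; it runs in a scratch directory standing in for a real EXTBL_LOCATION path, with file names copied from the pattern shown above.

```shell
# Sketch: prune external-table .log/.bad audit files.
# A scratch directory stands in for the real EXTBL_LOCATION path.
EXTDIR=$(mktemp -d)
touch "$EXTDIR/SAMPLE.GOSALES.EXT_ORDER_HEADER.oh2.del.00000056.0000003.bad"
touch "$EXTDIR/SAMPLE.GOSALES.EXT_ORDER_HEADER.oh2.del.00000056.0000003.log"
# Delete the audit files; add e.g. -mtime +7 to keep recent ones for review.
find "$EXTDIR" -type f \( -name '*.log' -o -name '*.bad' \) -delete
rmdir "$EXTDIR"   # succeeds only if the directory is now empty
```

On a real system you would point EXTDIR at your EXTBL_LOCATION directory instead of a scratch directory, and keep the -mtime filter so recent diagnostics survive.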

 

The file I used in Example 1 had 100 records, of which 11 had bad data.  If there are bad rows in the file but fewer than the maximum you specified when creating the table, then applications will only see the good rows when you query the data.  To illustrate, I ran the following query to count the rows in the table.  The number of rows returned was 89. 

 

select count(*) from gosales.ext_order_header

 

However, I also got a warning code "SQL5108W" and the warning message indicated that there are 11 bad records in the file.  The summary statistics in the log file looked like this after running the above query.

 

Statistics

  number of records read:      100

  number of bytes read:        10896

  number of records skipped    0

  number of bad records        11

  -------------------------------------------------

  number of records loaded:    89

 

The log file also described the issue for each bad record.  I then recreated the table specifying a value of 7 for maxerrors to see what would happen when I exceeded the number of bad rows.  The create table worked fine, but when I ran the query to count rows, I got error code "SQL20570N" with no data returned. 
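That recreate looked essentially like Example 1 with a smaller maxerrors value:

```sql
-- Same definition as Example 1, but quit after 7 bad rows.
DROP TABLE gosales.ext_order_header;
CREATE EXTERNAL TABLE gosales.ext_order_header LIKE gosales.order_header
       USING (dataobject '/db2fs/external/oh2.del' maxerrors 7
              DELIMITER ',');
-- With 11 bad records in the file, a query now fails with SQL20570N:
select count(*) from gosales.ext_order_header;
```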

 

Using external tables allows you to query data in a file quickly since you don't have to wait for a load to finish before querying it.  Just create an external table pointing at the file and you can start querying right away.  However, for extremely large files the queries may be slow because the files are not optimized as they would be if you actually loaded them into regular Db2 tables.  Note that you can pretty easily load the data into a regular table from an external table.  Maybe you want this for performance, or maybe you just like this method better than setting up load, import or ingest jobs.  In any case, here is one easy way to put the data into another table, possibly a column-organized table for analytics queries.

 

Example 2.  LOAD from CURSOR into a regular table. 

 

declare c1 cursor for select * from gosales.ext_order_header

                         order by order_date

;

load from c1 of cursor messages /db2fs/external/loadcsr.msg

     replace resetdictionary into gosales.order_header

;

 

One nice thing about this is that in the cursor you can do lots of manipulation: call Db2 functions, exclude columns, etc., to make the query look like the table being loaded.  Also, if you are using the same data types in both tables, then the load should be clean, since bad records will be excluded by the query and not forwarded to the load utility.  In this example I just sorted the rows by order_date, but you can use very involved SQL if you like.  Further, instead of using load you could use insert with select and other commands if you prefer.  If you need to run the load from cursor from a client, you will use the "admin_cmd" procedure, but the file still needs to be on the database server or in a supported cloud store. 
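A client-side version of Example 2 submitted through ADMIN_CMD could look something like the sketch below; treat it as a starting point and check the ADMIN_CMD LOAD documentation for the exact options your release supports.

```sql
-- Sketch: load-from-cursor submitted from a client through ADMIN_CMD.
-- MESSAGES ON SERVER keeps the message file on the database server.
CALL SYSPROC.ADMIN_CMD(
  'LOAD FROM (SELECT * FROM gosales.ext_order_header ORDER BY order_date)
   OF CURSOR MESSAGES ON SERVER
   REPLACE RESETDICTIONARY INTO gosales.order_header');
```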

 

Another thing I like about external tables is the easy access to JSON data.  I like the external table interface for accessing JSON better than the insert/load/ingest utilities, and the data is available immediately after creating the external table instead of having to wait until a load completes.  Here is an example of creating an external table and querying it as relational data.

 

Example 3.  External Table for JSON and Relational Query

 

CREATE EXTERNAL TABLE json.energy_json_ext

           (JSON_FIELD varchar(2000))

       USING (dataobject '/db2fs/external/july.json'

       maxerrors 1000 DELIMITER '|')

;

 

Notice the following about this example:

      JSON documents in a file are not delimited, so you need to specify a character as the delimiter that is NOT in the file.  In this case the pipe "|" symbol is not in my file.  This forces the entire record to go into one column.  Each record (document) goes into a different row, though.

      Notice that I created the table with just one column, because we are forcing the feature to consider the entire record as a column. 

      You have to use one of the character data types for the column, such as VARCHAR or CLOB.  You can NOT use binary types like VARBINARY because the data is character.

 

Now you can use SQL to do just about anything with the JSON documents, including a load from cursor as shown above or query the data in the documents with SQL as I show in my Query JSON in Db2 article.  This is a great way to quickly query JSON data using any application like a report writer that uses SQL, although for some tools you may need to build a view on the JSON table as shown in my article. 
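For instance, a relational query over the documents could use the SQL JSON functions.  This is a hypothetical sketch: the meter_id and kwh fields are invented for illustration and are not from my actual file.

```sql
-- Hypothetical sketch: extract scalar fields from each JSON document.
-- The $.meter_id and $.kwh paths are invented for illustration.
SELECT JSON_VALUE(json_field, '$.meter_id' RETURNING VARCHAR(20)) AS meter_id,
       JSON_VALUE(json_field, '$.kwh' RETURNING DECIMAL(10,2))    AS kwh
  FROM json.energy_json_ext;
```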

 

As I mentioned earlier, the files for the external tables can be on certain types of cloud storage.  This is handy for a number of reasons.  Many DBAs don't have access to place files on database servers, so this is a great workaround, but your database server does need to have access to the internet unless you use local cloud storage.  Another reason is convenience when your files are already in the cloud.  In that case you can just point your external tables at those files and have access to the data.  The CREATE EXTERNAL TABLE page in the knowledge center has lots of good information on how to create your tables using these various types of cloud storage.  I'll show an example that is very similar to my Example 1 above, except that my file is on cloud storage instead of local. 

 

Example 4.  External Table on Cloud Storage

 

CREATE EXTERNAL TABLE gosales.ext_order_header_cloud LIKE gosales.order_header

   USING (dataobject 'oh2.del'

          s3('s3.us-south.cloud-object-storage.appdomain.cloud',

             'ca6086c5ac504f9eb6c8d45874b080z0',

             'a77c827e9f3767a8984a1a8e80db85a55c76db71a2936508',

             'cc-dean-tutorial')

          maxerrors 1000 DELIMITER ',')

;

 

When I ran a query against the table in Example 4, the .log and .bad files were created on the same object storage.  This is really good if you don't have access to the database server.  However, it does mean that your database needs write access to the bucket/directory in the cloud storage. 

 

So far, we have created external tables that exist in the catalog until they are dropped.  This is frequently desirable, but if you are only going to use the file once, you can just reference it as an external table in a query so that you don't have to bother with the create and drop statements.  This is called a transient external table.  One reason for doing this is where the file is there strictly for a load or a single query. 

 

Example 5.  Selecting from a Transient External Table

 

select order_nbr, retailer_name from EXTERNAL '/db2fs/external/oh2.del'

  LIKE gosales.order_header

  USING (maxerrors 1000 DELIMITER ',')

  FETCH FIRST 5 ROWS ONLY

;

select order_nbr, retailer_name from EXTERNAL '/db2fs/external/oh2.del'

     (ORDER_NBR INT, RETAILER_NAME VARCHAR(150),

      RETAILER_NAME_MB VARCHAR(150), RETAILER_SITE_CODE INT,

      RETAILER_CONTACT_CODE INT, SALES_STAFF_CODE INT,

      SALES_BRANCH_CODE INT, ORDER_DATE TIMESTAMP,

      ORDER_CLOSE_DATE TIMESTAMP, ORDER_METHOD_CODE INT)

    USING (maxerrors 1000 DELIMITER ',')

    FETCH FIRST 5 ROWS ONLY

;

 

Notice the following about this example:

      Both queries do the same thing, but being able to use the LIKE parameter saves a lot of typing if you have a corresponding table available. 

      I query the data in the file in a select without first creating the table. 

      In the second query, I had to define all fields in the file, even though I only selected the first two columns.  If you don't define all columns, the query will fail, and all records will be copied to the ".bad" file.  The column definitions syntax can be used when creating a named table too. 

      A similar syntax to either of these queries can be used with the load from cursor feature and other things that involve a query. 

      I used local storage again in this example, but cloud storage could have been used.
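As a sketch of that last combination, the load from Example 2 could use a transient external table as its source.  The message file name here is just an example, following the convention of the earlier examples.

```sql
-- Sketch: transient external table as the source of a load-from-cursor.
declare c2 cursor for
  select * from EXTERNAL '/db2fs/external/oh2.del'
    LIKE gosales.order_header
    USING (maxerrors 1000 DELIMITER ',')
;
load from c2 of cursor messages /db2fs/external/loadcsr2.msg
     insert into gosales.order_header
;
```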

 

So far everything I've shown you relates to using data in a file you provide.  External tables also allow you to copy data out of Db2 in just the format you want.  Here's an example:

 

Example 6.  Exporting data from Db2 into a file

 

CREATE EXTERNAL TABLE '/db2fs/external/employee_join.csv'

        USING (INCLUDEHEADER ON  DELIMITER ',')

   AS SELECT b.PROJNO, a.empno, a.firstnme, a.lastname, a.workdept

            FROM db2inst1.employee a, db2inst1.empprojact b

           WHERE a.empno = b.empno

             AND workdept like 'E%'

;

 

Notice the following about this example:

      Data from the query will be placed in a file called employee_join.csv and the file will be in the path shown.  This path must be in one of the directories specified in the EXTBL_LOCATION parameter.

      Column names are placed in the first record of the file since I specified INCLUDEHEADER ON.  The default is off.  There are many other formatting options so check the ones available on the CREATE EXTERNAL TABLE page. 

      This feature makes it easy for you to get the output file in just about any format you like since you define it with a select.  You can put the columns in any order, join tables, use Db2 functions, use predicates to limit the rows output, etc.

      In this case I used a transient external table, but I could have created a permanent table instead and then copied the rows desired.

      If I didn't have access to the Db2 server, I could have used a cloud storage system instead of the local file system. 

 

I think that the external table functionality opens up a remarkable number of possibilities.  For example, you could put the free Db2 Community Edition on a server with lots of delimited or JSON files, and with some create statements you have JDBC/ODBC access to those files at no cost.  Alternatively, you can make lots of data available by building external tables on cloud storage and inserting the data you want into them. 

 

***

 

I'm sure that there are many other things that you can do, so please add any thoughts you have on the subject to my Facebook Page or my db2Dean and Friends Community.

 

Top | Search

 

 

Db2Dean's

Other  Interests

 

Real Estate Investing

 

Quail at my Bird Feeder

 

My Googlegängers

 

Fortune Cookie