Allan’s Musings

Just another WordPress.com weblog

Archive for the ‘PL/SQL’ Category

Oracle’s server side programming enhancements to SQL.

On Extracting Transforming and Loading

Posted by Allan Peda on May 19, 2009

ETL

Tricky stuff this ETL. One might think it as simple as copying data from one spot to another, but as usual things can get complicated real fast.

Some questions that may come up:

  1. Who is upstream? Who is downstream? In the parlance of project management we might want to confer closely with both these shareholders.
  2. How often to the data feeds coming in? Are these feeds being pushed, pulled, or some combination thereof?
  3. Are these deltas to existing data, or are these simple additions to a data set?
  4. How large are the feeds?
  5. What if we miss a feed, as might happen during an extended network outage?
  6. Is one “feed” really multiple files which must be processed at one time? If so, is there a particular order they must be processed in?
  7. How often do the downstream users expect the data to materialize? How often is it generated?
  8. What type of processing is necessary to “digest” the data so that it is usable?
  9. Are there issues with auditing and data integrity? Do we need to always be able to trace data back to the source?

With these questions as a starting point we can identify possible issues and hopefully we can head off possible problems.

What are the tools we have at our disposal on a typical *nix system in a typical big firm?

  1. FTP, try as you might to avoid it, you will encounter old insecure FTP.
  2. Oracle SQL *Loader
  3. Perl and it’s newfangled relative Ruby, in conjunction with some variant of DBI.
  4. Unix cron and the fancier asynchronous utilities such as fsniper
  5. Oracle’s internal scheduler, which has replaced the older DBMS_JOB scheduling mechanism.

The Nitty Gritty: Files

On the face of it, a flat tab delimited file would seem to have little information in it beyond the contents and the filename, but if you reflect in it, every time that file is transferred across business unit boundaries, there is some (potential) data that someone may want to know. The following questions may need to be answered:

  1. When was the file first generated?
  2. Can we be sure it was not tampered with?
  3. How large is it?
  4. What time period does this report represent? Is the generation time to be assumed the same as the reporting time, or is there a delay? For example are reports for the previous year generated at the start of the next year?
  5. Again, what are the auditing requirements?

A lot of times a file may be updated as it traverses business unit boundaries, and naturally the size of the file and the checksum will change. It may not be sufficient to simply save the most recent data. Similarly data derived from the filesystem (such as mtime and ctime) could be inaccurate.

Posted in PL/SQL, Programming, Ruby, Unix Utilities | Leave a Comment »

Oracle and rownum

Posted by Allan Peda on May 15, 2009

The other day I was looking at the pagination using the rownum pseudo column. Tom Kyte’s Oracle Column here has some useful hints and an excellent explanation of its dynamic nature. When I tried to recreate his query, it occurred to me that the following DQL would be more concise. I’m not sure if it is just as effective performance wise, but it is easier to read:

SQL> select T3.* from (select T2.*, rownum as outer_rownum
   2  from (select T1.report_date from mytab T1 order by T1.report_date )T2 )T3 
   3  where T3.outer_rownum between 3 and 5;


Which reminds me; much as I have a distaste for most things Microsoft®, it seems the Common Table Expressions (CTEs) are an overlooked way to make SQL more readable. It’s almost a secret, but Oracle 10g does supports CTE syntax.  Consider the following way of identifying violations of a business rule disallowing the same business unit being assigned to multiple companies (such as a Toyota Jeep being submitted in the same feed as a Chrysler Jeep):

with distinct_tab1_v as (select distinct
                              compid as COMP_ID,
                              unitid as UNIT_ID
                         from tab1)
select 'unit id: '|| unit_id || '  comp id: '|| comp_id
from distinct_tab1_v
where unit_id in ( select unit_id
                     from distinct_tab1_v
                 group by unit_id
                   having count(*) > 1 ) order by comp_id, unit_id;


The meat and potatoes of this query is the isolation of a unique collection of units and businesses, which are not needed elsewhere. Rather than create a view that sits around forever, a CTE accomplishes the same effect without globally modifying the schema.

It might be easier to look at this as three queries chained together. The first orders the select, the second assigns the row numbering, then the third imposes the limits on the desired rows. I rewrote the query using a table expression so as to make it easier to alias each derived select.

The raw data we are querying is as follows:

 select * from mytab order by commentary;

    ID DESCR        COMMENTARY
---------- -------------------- ----------
    10 ten         azure
     5 five        blue
     4 four        green
     6 six         indigo
     9 nine        ochre
     2 two         orange
     1 one         red
     8 eight       salmon
     7 seven       violet
     3 three       yellow


Selecting a subset from an ordered select is as simple (or as complex) as:

with ordered_sel as (
   select 
      id,
      descr,
      commentary
   from mytab
   order by commentary
)
select numbered_sel.* from (
   select
      ordered_sel.*,
      rownum as rnum
   from ordered_sel
) numbered_sel where numbered_sel.rnum between 2 and 5
/


Which results in:

	ID DESCR		COMMENTARY	 RNUM
---------- -------------------- ---------- ----------
	 5 five 		blue		    2
	 4 four 		green		    3
	 6 six			indigo		    4
	 9 nine 		ochre		    5


Yeah, SQL is weird, but after a while the logic becomes clearer.

Posted in PL/SQL, Programming | Tagged: | Leave a Comment »

Brushing up on PL/SQL

Posted by Allan Peda on April 27, 2009

A few weeks ago I came across what might be the best intermediate introduction to PL/SQL I have ever seen. While it is no substitute for the writings of masters such as Steven Feuerstein, it covers some essential aspects of PL/SQL and it clarified concepts such as block scope and dynamic cursors.


Doing SQL from PL/SQL: Best and Worst Practices

The Google html-ized version is available here

Posted in PL/SQL | Tagged: , | Leave a Comment »