Allan’s Musings

Just another weblog

Archive for the ‘Ruby’ Category

An agile object oriented scripting language that can be thought of as a hybrid of the Perl and Smalltalk languages.

On Extracting Transforming and Loading

Posted by Allan Peda on May 19, 2009


Tricky stuff this ETL. One might think it as simple as copying data from one spot to another, but as usual things can get complicated real fast.

Some questions that may come up:

  1. Who is upstream? Who is downstream? In the parlance of project management we might want to confer closely with both these shareholders.
  2. How often to the data feeds coming in? Are these feeds being pushed, pulled, or some combination thereof?
  3. Are these deltas to existing data, or are these simple additions to a data set?
  4. How large are the feeds?
  5. What if we miss a feed, as might happen during an extended network outage?
  6. Is one “feed” really multiple files which must be processed at one time? If so, is there a particular order they must be processed in?
  7. How often do the downstream users expect the data to materialize? How often is it generated?
  8. What type of processing is necessary to “digest” the data so that it is usable?
  9. Are there issues with auditing and data integrity? Do we need to always be able to trace data back to the source?

With these questions as a starting point we can identify possible issues and hopefully we can head off possible problems.

What are the tools we have at our disposal on a typical *nix system in a typical big firm?

  1. FTP, try as you might to avoid it, you will encounter old insecure FTP.
  2. Oracle SQL *Loader
  3. Perl and it’s newfangled relative Ruby, in conjunction with some variant of DBI.
  4. Unix cron and the fancier asynchronous utilities such as fsniper
  5. Oracle’s internal scheduler, which has replaced the older DBMS_JOB scheduling mechanism.

The Nitty Gritty: Files

On the face of it, a flat tab delimited file would seem to have little information in it beyond the contents and the filename, but if you reflect in it, every time that file is transferred across business unit boundaries, there is some (potential) data that someone may want to know. The following questions may need to be answered:

  1. When was the file first generated?
  2. Can we be sure it was not tampered with?
  3. How large is it?
  4. What time period does this report represent? Is the generation time to be assumed the same as the reporting time, or is there a delay? For example are reports for the previous year generated at the start of the next year?
  5. Again, what are the auditing requirements?

A lot of times a file may be updated as it traverses business unit boundaries, and naturally the size of the file and the checksum will change. It may not be sufficient to simply save the most recent data. Similarly data derived from the filesystem (such as mtime and ctime) could be inaccurate.


Posted in PL/SQL, Programming, Ruby, Unix Utilities | Leave a Comment »

The Power of Ruby

Posted by Allan Peda on May 7, 2009

The other day I came across Oracle error ORA-01555 during the backup of a database. Thankfully Oracle had some good advice on addressing the issue here and I tossed together a Ruby script to make adjusting the system a bit more automated. Admittedly this could be done in one block of PLSQL, but it was easier to generate the DDL on the fly and print it our prior to running the script. Ruby rocks in this way, being able to do everything perl does, but (honestly I must admit) better, if only because it handles exceptions more neatly. To top all this off, I am not even talking about full application deployment with Rails, although Oracle covers that topic at great length.

# database stuff
require 'dbi'
require 'oci8'

# require 'rubygems'
# require 'optparse'

@verbose = false
def echo( arg )
  if @verbose
    puts arg

   # db handle
   oph = DBI.connect('DBI:OCI8:DB',  'system',  'pass')
   # first fetch prior recorded timestamp from the monitorin DB
   # only need first row
   tuple_snapshot_undo = oph.select_one(
      %Q[ /* tune_undotbs.rb */
            MIN(snap_id) as min_snap_id,
            MAX(snap_id) as max_snap_id
         FROM dba_hist_snapshot
   min_snap_id = tuple_snapshot_undo[0].to_i
   max_snap_id = tuple_snapshot_undo[1].to_i
   # puts "#{min_snap_id}, #{max_snap_id}"
   tune_proc = %Q[
        l_task_name  VARCHAR2(30) := '#{min_snap_id}_#{max_snap_id}_AWR_SNAPSHOT_UNDO';
        l_object_id  NUMBER;
        -- Create an ADDM task.
        DBMS_ADVISOR.create_task (
          advisor_name      => 'Undo Advisor',
          task_name         => l_task_name,
          task_desc         => 'Undo Advisor Task');

        DBMS_ADVISOR.create_object (
          task_name   => l_task_name,
          object_type => 'UNDO_TBS',
          attr1       => NULL, 
          attr2       => NULL, 
          attr3       => NULL, 
          attr4       => 'null',
          attr5       => NULL,
          object_id   => l_object_id);

        -- Set the target object.
        DBMS_ADVISOR.set_task_parameter (
          task_name => l_task_name,
          parameter => 'TARGET_OBJECTS',
          value     => l_object_id);

        -- Set the start and end snapshots.
        DBMS_ADVISOR.set_task_parameter (
          task_name => l_task_name,
          parameter => 'START_SNAPSHOT',
          value     => #{min_snap_id});

        DBMS_ADVISOR.set_task_parameter (
          task_name => l_task_name,
          parameter => 'END_SNAPSHOT',
          value     => #{max_snap_id});

        -- Execute the task.
        DBMS_ADVISOR.execute_task(task_name => l_task_name);
rescue DBI::DatabaseError => e
     STDERR.puts "Error code:    #{e.err}"
     STDERR.puts "Error message: #{e.errstr}"
     oph.disconnect if oph

Posted in Programming, Ruby | Tagged: | Leave a Comment »