Allan’s Musings

Just another WordPress.com weblog

Why Windows Powershell Is Next To Useless

Posted by Allan Peda on August 21, 2009

So a few weeks ago I decided to learn some PowerShell in order to set up a backup routine for some mission-critical systems. One would expect that Microsoft would have put together something useful to *nix heads; in fact, all they succeeded in doing was frustrating me.

Why is this? Well here in a nutshell are my complaints:

  • PowerShell pipes are used for intra-process communication: they move objects around. That’s all well and good, but pipes are historically used by shells for inter-process communication.
  • Redirection locks files, so you cannot read them as they are being written. Terrific, so I cannot monitor PowerShell processes that are launched in the background.
  • Redirection of standard error mangles the output. Why is that a good thing?

Here is what happens:

1# ruby -e "STDERR.puts %Q[I AM AN ERROR]" 2> file
2# cat file
ruby.exe : I AM AN ERROR
At line:1 char:5
+ ruby <<<< -e "STDERR.puts %Q[I AM AN ERROR]" 2> file

So what the hell is going on here? But that’s not the end of it:

  • I cannot (easily) launch processes in the background, so forget launching something that writes to a log and then tailing the log. This really hinders watching the progress of scheduled tasks. As to shell transcripts: yeah, I tried that. Big failure.
  • This last one I think really put me over the edge. If I use Mark Russinovich’s psexec.exe to launch a terminal session, I get nothing, zip.

C:\>psexec \\appserver cmd.exe
PsExec v1.94 – Execute processes remotely
Copyright (C) 2001-2008 Mark Russinovich
Sysinternals – http://www.sysinternals.com
Microsoft Windows [Version 5.2.3790]
(C) Copyright 1985-2003 Microsoft Corp.
C:\WINDOWS\system32>powershell -command "echo hello"
*nothing*

Yes, where I typed the word “*nothing*” is where nothing is returned. It just hangs there. Some shell! Obviously PowerShell does not want to play well with others. I’ll bet this is a feature!

  • Shell transcripts do not capture the standard output of child processes. I didn’t bother trying to work around this. By this time I had decided to redo everything in Perl or Ruby if I ever got the chance.

Let’s look at the PowerShell answer to the *nix "find":

1# get-childitem C:\windows\system32 | where {$_.extension -eq ".log"}
Directory: Microsoft.PowerShell.Core\FileSystem::C:\windows\system32
Mode LastWriteTime Length Name
---- ------------- ------ ----
-a— 8/10/2009 12:35 PM 90 spupdwxp.log
-a— 4/16/2009 2:32 PM 211792 TZLog.log

Good luck trying to pipe that into pkzip or 7zip as a list of input filenames! As to the cmdlets to format this stuff, I don’t want to do extra work to coerce my output into a simple format. (POSIX calls them builtins, but why not come up with a trademarked term? What a great idea!)
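For contrast, the flat output a *nix head expects is trivial to produce: one bare filename per line, which any downstream tool can consume. A minimal Ruby sketch of that behavior (the directory and file names below are invented for the demo):

```ruby
require "tmpdir"

# List the bare names of *.log files in a directory, one per line --
# the kind of flat text you can pipe straight into pkzip or 7zip.
def log_files(dir)
  Dir.glob(File.join(dir, "*.log")).map { |f| File.basename(f) }.sort
end

# Demonstrate on a throwaway directory with hypothetical file names.
Dir.mktmpdir do |dir|
  ["TZLog.log", "spupdwxp.log", "notes.txt"].each do |name|
    File.write(File.join(dir, name), "")
  end
  puts log_files(dir)  # prints only the .log names, as plain text
end
```

No object pipeline, no formatting cmdlets: just strings, ready for the next program in the chain.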

At this point I rest my case. PowerShell exists only as a platform to prevent stringing programs together, which is contrary to the essential purpose of a shell. I am sure the party-line response is to rewrite everything in .NET. Well, sure, but the point of a shell is to let you get work done quickly, not to force you to rewrite everything. Or maybe Microsoft doesn’t see it quite that way?

PowerShell is a terrific example of "embrace and extend" and then break things as cleverly as possible!


Posted in Programming, Windows Frustrations

On Extracting Transforming and Loading

Posted by Allan Peda on May 19, 2009

ETL

Tricky stuff, this ETL. One might think it is as simple as copying data from one spot to another, but as usual, things can get complicated fast.

Some questions that may come up:

  1. Who is upstream? Who is downstream? In the parlance of project management, we might want to confer closely with both of these stakeholders.
  2. How often do the data feeds come in? Are these feeds being pushed, pulled, or some combination thereof?
  3. Are these deltas to existing data, or are these simple additions to a data set?
  4. How large are the feeds?
  5. What if we miss a feed, as might happen during an extended network outage?
  6. Is one “feed” really multiple files which must be processed at one time? If so, is there a particular order they must be processed in?
  7. How often do the downstream users expect the data to materialize? How often is it generated?
  8. What type of processing is necessary to “digest” the data so that it is usable?
  9. Are there issues with auditing and data integrity? Do we need to always be able to trace data back to the source?

With these questions as a starting point we can identify possible issues and hopefully we can head off possible problems.

What are the tools we have at our disposal on a typical *nix system in a typical big firm?

  1. FTP: try as you might to avoid it, you will encounter old, insecure FTP.
  2. Oracle SQL*Loader
  3. Perl and its newfangled relative Ruby, in conjunction with some variant of DBI.
  4. Unix cron and the fancier asynchronous utilities such as fsniper
  5. Oracle’s internal scheduler (DBMS_SCHEDULER), which has replaced the older DBMS_JOB scheduling mechanism.

The Nitty Gritty: Files

On the face of it, a flat tab-delimited file would seem to have little information in it beyond the contents and the filename, but if you reflect on it, every time that file is transferred across business-unit boundaries, there is some (potential) data that someone may want to know. The following questions may need to be answered:

  1. When was the file first generated?
  2. Can we be sure it was not tampered with?
  3. How large is it?
  4. What time period does this report represent? Is the generation time to be assumed the same as the reporting time, or is there a delay? For example are reports for the previous year generated at the start of the next year?
  5. Again, what are the auditing requirements?

A lot of the time a file may be updated as it traverses business-unit boundaries, and naturally the size of the file and its checksum will change. It may not be sufficient to simply save the most recent data. Similarly, data derived from the filesystem (such as mtime and ctime) could be inaccurate.
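One defensive habit is to snapshot the metadata at each hand-off, so later audits don’t have to trust the filesystem. A sketch in Ruby, assuming all we care about is size, a SHA-256 checksum, and the receive time (the field names are my own invention):

```ruby
require "digest"
require "tempfile"

# Record the facts about a feed file at the moment we receive it.
def ingest_metadata(path)
  stat = File.stat(path)
  {
    name:        File.basename(path),
    bytes:       stat.size,
    sha256:      Digest::SHA256.file(path).hexdigest,
    mtime:       stat.mtime.utc,   # may lie if the file was copied en route
    received_at: Time.now.utc      # what we actually know for certain
  }
end

# Example with a throwaway file standing in for a feed.
Tempfile.create("feed") do |f|
  f.write("a\tb\tc\n")
  f.flush
  meta = ingest_metadata(f.path)
  puts "#{meta[:name]}: #{meta[:bytes]} bytes, sha256 #{meta[:sha256][0, 12]}..."
end
```

Comparing the recorded checksum at each boundary crossing makes tampering and silent edits detectable, independent of mtime.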

Posted in PL/SQL, Programming, Ruby, Unix Utilities

Oracle Habits

Posted by Allan Peda on May 17, 2009

I am a big believer in following best practices, at least most of the time. In the case of Oracle databases that means creating surrogate keys. Even if they are of little use initially, I have found these unique identifiers are often useful later on. This is easy to do via auto-numbering triggers.

Let’s say we have a simple two-column table, described below:

SQL> create table mytab (id integer primary key, descr varchar2(20));


The creation of an associated sequence and trigger will allow anyone to insert data into this table without bothering with the details of generating unique ID numbers.

SQL> create sequence mytab_seq;
Sequence created.
SQL> create or replace
trigger mytab_trg
before insert or update on mytab
for each row
BEGIN
  select nvl(:new.id,mytab_seq.nextval) into :new.id from dual;
END;
2    3    4    5    6    7    8  /
Trigger created.
SQL> insert into mytab (id, descr) values (1,'one');
1 row created.
SQL> insert into mytab (id, descr) values (null,'two');
1 row created.
SQL> commit;
Commit complete.
SQL> select * from mytab;
	ID DESCR
---------- --------------------
	 1 one
	 2 two


Notice that the nvl() function always increments the sequence. This is important, as it allows users to manually add ID values without inadvertently triggering subsequent collisions. The only caveat I would add is that I am not sure this is specified behavior; I have only observed it empirically.

Posted in Programming, SQL

Oracle and rownum

Posted by Allan Peda on May 15, 2009

The other day I was looking at pagination using the rownum pseudo-column. Tom Kyte’s Oracle column here has some useful hints and an excellent explanation of its dynamic nature. When I tried to recreate his query, it occurred to me that the following DQL would be more concise. I’m not sure it is just as effective performance-wise, but it is easier to read:

SQL> select T3.* from (select T2.*, rownum as outer_rownum
   2  from (select T1.report_date from mytab T1 order by T1.report_date )T2 )T3 
   3  where T3.outer_rownum between 3 and 5;


Which reminds me: much as I have a distaste for most things Microsoft®, Common Table Expressions (CTEs) seem to be an overlooked way to make SQL more readable. It’s almost a secret, but Oracle 10g does support CTE syntax. Consider the following way of identifying violations of a business rule disallowing the same business unit being assigned to multiple companies (such as a Toyota Jeep being submitted in the same feed as a Chrysler Jeep):

with distinct_tab1_v as (select distinct
                              compid as COMP_ID,
                              unitid as UNIT_ID
                         from tab1)
select 'unit id: '|| unit_id || '  comp id: '|| comp_id
from distinct_tab1_v
where unit_id in ( select unit_id
                     from distinct_tab1_v
                 group by unit_id
                   having count(*) > 1 ) order by comp_id, unit_id;


The meat and potatoes of this query is the isolation of a unique collection of units and businesses, which are not needed elsewhere. Rather than create a view that sits around forever, a CTE accomplishes the same effect without globally modifying the schema.

It might be easier to look at this as three queries chained together: the first orders the select, the second assigns the row numbering, and the third imposes the limits on the desired rows. I rewrote the query using a table expression so as to make it easier to alias each derived select.

The raw data we are querying is as follows:

 select * from mytab order by commentary;

    ID DESCR        COMMENTARY
---------- -------------------- ----------
    10 ten         azure
     5 five        blue
     4 four        green
     6 six         indigo
     9 nine        ochre
     2 two         orange
     1 one         red
     8 eight       salmon
     7 seven       violet
     3 three       yellow


Selecting a subset from an ordered select is as simple (or as complex) as:

with ordered_sel as (
   select 
      id,
      descr,
      commentary
   from mytab
   order by commentary
)
select numbered_sel.* from (
   select
      ordered_sel.*,
      rownum as rnum
   from ordered_sel
) numbered_sel where numbered_sel.rnum between 2 and 5
/


Which results in:

	ID DESCR		COMMENTARY	 RNUM
---------- -------------------- ---------- ----------
	 5 five 		blue		    2
	 4 four 		green		    3
	 6 six			indigo		    4
	 9 nine 		ochre		    5


Yeah, SQL is weird, but after a while the logic becomes clearer.
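The three-stage shape of the query (order, then number, then filter) is easier to see outside SQL. Here is a rough Ruby equivalent over the same toy data; it is an illustration of the logic, not of how Oracle executes it:

```ruby
# (descr, commentary) rows from mytab, in insertion order
rows = [["one", "red"],     ["two", "orange"],  ["three", "yellow"],
        ["four", "green"],  ["five", "blue"],   ["six", "indigo"],
        ["seven", "violet"],["eight", "salmon"],
        ["nine", "ochre"],  ["ten", "azure"]]

# Stage 1: order the select
ordered = rows.sort_by { |_, commentary| commentary }
# Stage 2: assign the row numbering (rownum starts at 1)
numbered = ordered.each_with_index.map { |row, i| row + [i + 1] }
# Stage 3: impose the limits on the desired rows
page = numbered.select { |_, _, rnum| rnum.between?(2, 5) }

page.each { |descr, commentary, rnum| puts [rnum, descr, commentary].join("  ") }
```

Running this yields the same four rows as the SQL above: five/blue, four/green, six/indigo, and nine/ochre, numbered 2 through 5.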

Posted in PL/SQL, Programming

The Ulam Spiral

Posted by Allan Peda on May 11, 2009

A few years ago I wrote a Perl program to generate a map of prime numbers based on what is known as an Ulam spiral. Sure enough, the patterns are readily recognizable. The origin is barely discernible in the center of the image below as a lone red pixel, though you’ll need to zoom in to see it.

Prime numbers plotted as an Ulam spiral


A color pdf version of the source is available here, and a downloadable version should be here

#!/usr/bin/env perl
# -*- cperl -*-

BEGIN { $^W = 1 }

use strict;
use diagnostics;
use warnings;
use Carp;
use English;
use Math::Trig;
use GD;
use constant PI => 4*atan(1);

# Allan Peda
# June 30, 2007
# Ulam prime number spiral
# http://mathworld.wolfram.com/PrimeSpiral.html
# http://en.wikipedia.org/wiki/Ulam_spiral
#
# we iterate over a grid, but we want to follow a spiral.
# Direction is cartesian based x being horizontal motion,
# and y being vertical. Up and right are positive motion.

unless(defined($ARGV[0])){
    die "Please provide a png file to save to.";
}

my $pngfile = 'ulam_spiral.png';
if($ARGV[0] =~ m/\.png$/i){
    $pngfile = $ARGV[0];
} else {
    print "Supplied file not suffixed with .png defaulting to \"$pngfile\".\n";
}

# how many turns to make in the spiral
my $num_corners = 1600;
my @turning_points = mkturns($num_corners);
# which directions do we follow
my $max = $turning_points[$#turning_points];
print "Generating map for first $max primes.\n";
my ($x_width, $y_width) = (0,0);
$x_width = $y_width = int(sqrt($max+0.5));

print "Dimensions are: $x_width x $y_width\n";
my %primes = %{genprimes($max)};

my $im           = new GD::Image($x_width, $y_width);
my $white        = $im->colorAllocate(255,255,255);
my $black        = $im->colorAllocate(0,0,0);
my $origin_color = $im->colorAllocate(255,0,0);
my $position=$turning_points[0];
my ($x, $y) = (0,0);
my ($delx,$dely) = (1,0); # start out moving to the right

sub gdx { return( shift()+($x_width/2) ) }
sub gdy { return( ($y_width/2)-shift()) }

$im->setPixel(gdx($x),gdy($y),$origin_color);

foreach my $next_turn (@turning_points) {
    for (my $i=$position; $i<=$next_turn; $i++) {
        $x += $delx;
        $y += $dely;
        if (defined($primes{$i})) {
            $im->setPixel(gdx($x),gdy($y),$black);
        }
    }
    # change direction at each corner of the spiral
    ($delx, $dely) = nextDirection($delx, $dely);
    $position = $next_turn+1;
}

open(PNGFH, ">".$pngfile)
  or die "Could not open image file \"$pngfile\".";
print PNGFH $im->png;
close PNGFH;

# an array of run lengths up to each turn
# equation was guessed at after inspection of coordinates
# from a hand drawn cartesian spiral
sub mkturns{
    my $ulim = shift;
    my $len = 0;
    my @rv;
    foreach my $i (1..$ulim) {
        my $delta = ($i%2?$i+1:$i)/2;
        $len = $len+$delta;
        push( @rv, $len);
    }
    return @rv;
}

# generates prime numbers as hash keys
sub genprimes {
    my $max = shift;
    my $sieve = '';
    my %primes = ( 2 => 1 );
  GUESS: for (my $guess = 2; $guess <= $max; $guess++) {
        next GUESS if vec($sieve,$guess,1);
        $primes{$guess} = 1;
        for (my $mults = $guess * $guess; $mults <= $max; $mults += $guess) {
            vec($sieve,$mults,1) = 1;
        }
    }
    return \%primes;
}

# uses memoized trig function
# function modded to go past 180 degrees
{
    my %memo = ();
    sub nextDirection {
        my ($x, $y) = (shift, shift);
        my $nextoffset = undef;
        if( defined($memo{$x}{$y})){
            $nextoffset = $memo{$x}{$y};
        } else {
            my $acos = acos($x);
            my $asin = asin($y);
            my $offset;
            if( $asin < 0 ){
                $offset = int(2*(2*PI-$acos)/PI);
            } else {
                $offset = int(2*($acos>$asin?$acos:$asin)/PI);
            }
            # hard coded count of four directions in spiral (right, up, left down)
            $nextoffset = $memo{$x}{$y} = ($offset+1)%4;
        }
        return to_xy($nextoffset);
    }
}

# uses memoized trig function
{
    my %memo = ();
    sub to_xy {
        defined($memo{$_[0]}) and return @{$memo{$_[0]}};
        $memo{$_[0]}= [int(cos($_[0]*PI/2)),int(sin($_[0]*PI/2))];
        return @{$memo{$_[0]}};
    }
}
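The one non-obvious piece of the script above is mkturns(): the run lengths between corners of the spiral go 1, 1, 2, 2, 3, 3, ..., so the cumulative positions of the turns are 1, 2, 4, 6, 9, 12, ... The same guessed-at formula, sketched in Ruby for anyone who wants to poke at it interactively:

```ruby
# Cumulative step counts at which the spiral turns a corner,
# mirroring the Perl mkturns() above.
def turning_points(n)
  len = 0
  (1..n).map do |i|
    len += (i.odd? ? i + 1 : i) / 2   # run lengths: 1,1,2,2,3,3,...
  end
end

puts turning_points(8).inspect   # => [1, 2, 4, 6, 9, 12, 16, 20]
```

Trace it by hand on graph paper and the corners of a square spiral land exactly on those step counts.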

Posted in Mathematical Recreations

The Power of Ruby

Posted by Allan Peda on May 7, 2009

The other day I came across Oracle error ORA-01555 during the backup of a database. Thankfully Oracle had some good advice on addressing the issue here, and I tossed together a Ruby script to make adjusting the system a bit more automated. Admittedly this could be done in one block of PL/SQL, but it was easier to generate the DDL on the fly and print it out prior to running the script. Ruby rocks in this way: it can do everything Perl does, but (honestly, I must admit) better, if only because it handles exceptions more neatly. To top all this off, I am not even talking about full application deployment with Rails, although Oracle covers that topic at great length.

# database stuff
require 'dbi'
require 'oci8'

# require 'rubygems'
# require 'optparse'

@verbose = false
def echo( arg )
  if @verbose
    puts arg
  end
end

begin
   # db handle
   oph = DBI.connect('DBI:OCI8:DB',  'system',  'pass')
   
   # first fetch prior recorded timestamp from the monitoring DB
   # only need first row
   tuple_snapshot_undo = oph.select_one(
      %Q[ /* tune_undotbs.rb */
         SELECT
            MIN(snap_id) as min_snap_id,
            MAX(snap_id) as max_snap_id
         FROM dba_hist_snapshot
      ])
   min_snap_id = tuple_snapshot_undo[0].to_i
   max_snap_id = tuple_snapshot_undo[1].to_i
   # puts "#{min_snap_id}, #{max_snap_id}"
   
   tune_proc = %Q[
      DECLARE
        l_task_name  VARCHAR2(30) := '#{min_snap_id}_#{max_snap_id}_AWR_SNAPSHOT_UNDO';
        l_object_id  NUMBER;
      BEGIN
        -- Create an ADDM task.
        DBMS_ADVISOR.create_task (
          advisor_name      => 'Undo Advisor',
          task_name         => l_task_name,
          task_desc         => 'Undo Advisor Task');

        DBMS_ADVISOR.create_object (
          task_name   => l_task_name,
          object_type => 'UNDO_TBS',
          attr1       => NULL, 
          attr2       => NULL, 
          attr3       => NULL, 
          attr4       => 'null',
          attr5       => NULL,
          object_id   => l_object_id);

        -- Set the target object.
        DBMS_ADVISOR.set_task_parameter (
          task_name => l_task_name,
          parameter => 'TARGET_OBJECTS',
          value     => l_object_id);

        -- Set the start and end snapshots.
        DBMS_ADVISOR.set_task_parameter (
          task_name => l_task_name,
          parameter => 'START_SNAPSHOT',
          value     => #{min_snap_id});

        DBMS_ADVISOR.set_task_parameter (
          task_name => l_task_name,
          parameter => 'END_SNAPSHOT',
          value     => #{max_snap_id});

        -- Execute the task.
        DBMS_ADVISOR.execute_task(task_name => l_task_name);
      END;
   ]
   oph.do(tune_proc)
rescue DBI::DatabaseError => e
     STDERR.puts "Error code:    #{e.err}"
     STDERR.puts "Error message: #{e.errstr}"
ensure
     oph.disconnect if oph
end

Posted in Programming, Ruby

Brushing up on PL/SQL

Posted by Allan Peda on April 27, 2009

A few weeks ago I came across what might be the best intermediate introduction to PL/SQL I have ever seen. While it is no substitute for the writings of masters such as Steven Feuerstein, it covers some essential aspects of PL/SQL and clarifies concepts such as block scope and dynamic cursors.


Doing SQL from PL/SQL: Best and Worst Practices

The Google html-ized version is available here

Posted in PL/SQL

Timeout functions

Posted by Allan Peda on April 26, 2009

Recently I had to write some controlling code for batch jobs that run nightly. The problem was that transient network issues caused jobs to hang. The solution was to force hanging jobs to fail after a specified timeout, so that these jobs could be retried several times.

Usage:
timeout 5 /usr/local/bin/some_longtask

function timeout {
  if [[ ${1//[^[:digit:]]} != ${1} ]]; then
    echo "First argument of this function is timeout in minutes." >&2
    return 1
  fi

  declare -i timeout_minutes=${1:-1}
  shift
  # sanity check, can this be run at all?
  if [ ! -x "$1" ]; then
    echo "Error: attempt to locate background executable failed." >&2
    return 2
  fi

  "$@" &
  declare -i bckrnd_pid=$!
  declare -i jobspec=$(echo kill -9 $bckrnd_pid |\
    at now + $timeout_minutes minutes 2>&1 |\
    perl -ne 's/\D+(\d+)\b.+/$1/ and print')
  # echo kill -9 $bckrnd_pid | at now + $timeout_minutes minutes
  # echo "will kill -9 $bckrnd_pid after $timeout_minutes minutes" >&2
  wait $bckrnd_pid
  declare -i rc=$?
  # cleanup unused batch job
  atrm $jobspec 2>/dev/null
  return $rc
}
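For comparison, and in keeping with the earlier threat to redo things in Ruby: the same idea without at(1) is to spawn the job, arm a watchdog thread, and reap whichever finishes first. This is my own sketch of the pattern, not drop-in production code:

```ruby
# Run a command; kill it with SIGKILL if it exceeds timeout_sec.
# Returns the Process::Status; #signaled? tells you if it was killed.
def run_with_timeout(timeout_sec, *cmd)
  pid = Process.spawn(*cmd)
  watchdog = Thread.new do
    sleep timeout_sec
    begin
      Process.kill("KILL", pid)
    rescue Errno::ESRCH
      # job already finished; nothing left to kill
    end
  end
  _, status = Process.wait2(pid)
  watchdog.kill   # cleanup, like the atrm in the bash version
  status
end

status = run_with_timeout(5, "true")
puts status.success?   # => true
```

Unlike the bash version there is no stray at(1) job to clean up; the watchdog dies with the function.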

Posted in Programming