One of the things I most appreciate about Perl is that it requires code blocks to be surrounded by curly braces. In my mind, this is particularly important with nested if-else statements. Many programming languages don't require braces to surround code blocks, so nested conditionals can quickly become unreadable and much harder to maintain. Let's take a look at an example:

if (something)
    if (another_thing)
    {
        some_call;
        some_other_call;
        if (yet_another_thing)
        {
            do_it;
            do_it_again;
        }
    }

Note that the outer if-statement doesn't have corresponding curly braces. As surprising as it may seem, this is completely legal code in many languages. In my opinion, this is a dangerous programming practice. If I wanted to add additional logic to the contents of the outer if block, I would have to remember to put the appropriate braces in place.

Had I attempted to use this code in a Perl script, the interpreter would have complained immediately, even if warnings and strict parsing were both disabled! This kind of safety checking prevents me from shooting myself in the foot. Some may complain that requiring braces makes programming slightly more inefficient from a productivity standpoint. My response to that is that any code editor worth its salt can insert the braces for you. My favorite editor, SlickEdit, even supports dynamic brace surrounding, a feature I truly appreciate. It's a shame that more programming languages don't enforce this kind of safety net. Hopefully future languages will keep small matters like this in mind.
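
For comparison, here is a minimal sketch of how the same nesting might look in Perl, where every block has to carry its own braces. The run_checks subroutine and the action names are made up for illustration; they simply record which calls would have run.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the names of the actions that would run
# for a given combination of conditions.
sub run_checks {
    my ($something, $another_thing, $yet_another_thing) = @_;
    my @actions;

    if ($something) {                 # braces are mandatory here
        if ($another_thing) {
            push @actions, 'some_call', 'some_other_call';
            if ($yet_another_thing) {
                push @actions, 'do_it', 'do_it_again';
            }
        }
    }
    return @actions;
}

print join(', ', run_checks(1, 1, 0)), "\n";   # some_call, some_other_call
```

Adding more logic to the outer block is now just a matter of typing it inside the braces that are already there.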

One of my Perl scripts here at work used the Add_Delta_Days subroutine from the Date::Calc module to do some calendar date arithmetic. I'm in the process of building a new machine on which this script will run, and I don't have access to an external network. Unfortunately, the install process for Date::Calc is fairly difficult. The module relies on a C library which must be compiled with the same compiler as was used to build the local Perl install. To make matters worse, the modules that Date::Calc is dependent on have similar requirements. As a result, I decided to skip installing this non-standard module, and instead use a home-brew replacement. It turns out that Add_Delta_Days is fairly straightforward to replace:

use Time::Local; # Standard module

sub addDaysToDate
{
    my ($y, $m, $d, $offset) = @_;

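    # Convert the incoming date to epoch seconds. Noon is used instead of
    # midnight so that a daylight-saving shift can't push the result
    # into the neighboring day.
    my $TIME = timelocal(0, 0, 12, $d, $m-1, $y-1900);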

    # Convert the offset from days to seconds and add
    # to our epoch seconds value
    $TIME += 60 * 60 * 24 * $offset;

    # Convert the epoch seconds back to a legal 'calendar date'
    # and return the date pieces
    my @values = localtime($TIME);
    return ($values[5] + 1900, $values[4] + 1, $values[3]);
}

You call this subroutine like this:

my $year = 2009;
my $month = 4;
my $day = 22;

my ($nYear, $nMonth, $nDay) = addDaysToDate($year, $month, $day, 30);

This subroutine isn't a one-to-one replacement, obviously. Unlike Date::Calc, my home-brew subroutine suffers from the Year 2038 problem (at least on systems with a 32-bit time value). It likewise can't go back in time by incredible amounts (I'm bound to the range of dates around the epoch). However, this workaround saves me a bunch of setup time, and works well enough for my purposes.
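
One caveat with adding multiples of 86,400 seconds to a local time: if a daylight-saving change falls between the two dates, the result can drift by an hour. A possible way around that, sketched here under the assumption that only calendar dates matter (no times of day), is to do the arithmetic in UTC with timegm and gmtime from the same standard module. The name addDaysToDateUTC is mine, not part of any library.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);   # Standard module

# Hypothetical variant of addDaysToDate that works purely in UTC,
# where every day is exactly 86,400 seconds long.
sub addDaysToDateUTC {
    my ($y, $m, $d, $offset) = @_;

    # Midnight UTC on the incoming date, shifted by the offset in days
    my $time = timegm(0, 0, 0, $d, $m - 1, $y) + 86_400 * $offset;

    # Convert back to calendar pieces, again in UTC
    my @values = gmtime($time);
    return ($values[5] + 1900, $values[4] + 1, $values[3]);
}

my ($nYear, $nMonth, $nDay) = addDaysToDateUTC(2009, 4, 22, 30);
printf "%04d-%02d-%02d\n", $nYear, $nMonth, $nDay;   # 2009-05-22
```

Negative offsets work the same way, so the subroutine can also count backwards.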

I ran into a strange problem with a Perl CGI script yesterday. Upon script execution, I received the following error message from IIS:

CGI Error The specified CGI application misbehaved by not returning a complete set of HTTP headers.

A quick Google search of this error message turned up a number of discussions mentioning bugs in IIS, server configuration problems, etc. However, I suspected that my scripts were to blame (I had been hacking on them on Friday). But how could I determine whether I was at fault or if the server was to blame? Thankfully, the solution comes through one of the Perl CGI modules (here's the Perl tip):

use CGI::Carp qw(fatalsToBrowser warningsToBrowser);

The CGI::Carp module (and where does that name come from?) gives us fatalsToBrowser and warningsToBrowser. With fatalsToBrowser imported, any fatal Perl error is output into the browser window (very handy). Warnings show up as HTML comments once you call warningsToBrowser(1) after printing your header. After turning on these features, I immediately found my error. It resided in this line (here's the gotcha):

$safeProductName =~ s/\$/\\$/g;

It was my intent to replace any instances of the dollar sign character ($) with a backslash-dollar sign pair (\$). At first glance, this substitution rule may look all right. But it's not! The replacement portion of a substitution is treated as a double quoted string. So, the interpreter escapes the backslash just fine, but then hits a naked dollar sign, indicating a variable (for which I didn't provide a name). And so it chokes! The line should have read:

$safeProductName =~ s/\$/\\\$/g;

Note the three backslashes in the replacement string. Two to print an actual backslash character, and one to print the actual dollar sign. Subtle? You bet.
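
Here is a small self-contained check of the corrected substitution. The product name is a made-up sample value.

```perl
use strict;
use warnings;

# Made-up sample value; single quotes keep the dollar sign literal
my $safeProductName = 'Widget $5 edition';

# Three backslashes: two for a literal backslash, one to escape the $
$safeProductName =~ s/\$/\\\$/g;

print "$safeProductName\n";   # Widget \$5 edition
```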

I ran into an interesting side-effect with the foreach loop in Perl today. I'm surprised that I haven't hit this before, but it may be a subtle enough issue that it only pops up under the right circumstances. Here's a sample program that we'll use as an example:

#!/usr/bin/perl
use strict;
use warnings;

my @array = ("Test NUM", "Line NUM", "Part NUM");

for (my $i=0; $i < 3; $i++)
{
    foreach (@array)
    {
        s/NUM/$i/;
        print "$_\n";
    }
    print "------\n";
}

What should the output for this little script look like? Here's what I assumed it would be:

Test 0
Line 0
Part 0
------
Test 1
Line 1
Part 1
------
Test 2
Line 2
Part 2
------

But here's the actual output:

Test 0
Line 0
Part 0
------
Test 0
Line 0
Part 0
------
Test 0
Line 0
Part 0
------

So what's going on here? Well, it turns out that the foreach construct doesn't act quite like I thought it did. Let's isolate just that loop:

foreach (@array)
{
    s/NUM/$i/;
    print "$_\n";
}

We simply loop over each element of the array, we do a substitution, and we print the result. Pretty simple. Pay attention to the fact that we are storing each iteration through the loop in Perl's global $_. The point here is that $_ doesn't represent a copy of the array element, it represents the actual array element. From the Programming Perl book (which I highly recommend):

foreach VAR (LIST) {
    ...
}
If LIST consists entirely of assignable values (meaning variables, generally, not enumerated constants), you can modify each of those variables by modifying VAR inside the loop. That's because the foreach loop index variable is an implicit alias for each item in the list that you're looping over.

This is an interesting side effect, which can be unwanted in some cases. As a workaround, I simply created a temporary buffer to operate on in my substitution call:

foreach (@array)
{
    my $temp = $_;
    $temp =~ s/NUM/$i/;
    print "$temp\n";
}

An easy fix to a not-so-obvious problem.
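
The same idea can be sketched with a named loop variable, which is still an alias but makes the copy a little more obvious. The variable names here are mine; the copy-then-substitute idiom on one line is just a compact form of the temporary buffer above.

```perl
use strict;
use warnings;

my @template = ("Test NUM", "Line NUM", "Part NUM");
my @output;

for my $i (0 .. 2) {
    for my $line (@template) {            # $line aliases the array element
        (my $copy = $line) =~ s/NUM/$i/;  # copy first, then substitute on the copy
        push @output, $copy;
    }
}

print "$_\n" for @output;
print "Template is untouched: @template\n";
```

Because only the copy is modified, the NUM placeholders survive every pass through the outer loop.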

A little over a year ago, I inherited a productivity tool at work that allows users to enter weekly status reports for various products in our division. The tool is web-based and is written entirely in Perl. One of the managers who uses this tool recently suggested a new feature, and I decided to implement it using cookies. Having never implemented cookies from a programming perspective, I was new to the subject and had to do some research on how to do it in Perl. It turns out to be quite easy, so I figured I would share my newfound knowledge:

Creating a Cookie

Although there are other ways to do this (as always with Perl), this tutorial will be making use of the CGI module's cookie() method, which is built on the CGI::Cookie module. It makes creating and reading cookies very easy, which is a good thing. Furthermore, this module ships with virtually all Perl distributions! Here's a chunk of code that creates a cookie:

use CGI qw(:all);

my $cgi = CGI->new;
my $cookie = $cgi->cookie(-name => 'my_first_cookie',
                          -value => $someValueToStore,
                          -expires => '+1y',
                          -path => '/');

print $cgi->header(-cookie => $cookie);

I first import all of the CGI modules. This isn't exactly necessary, and it might be a little slower than using the :standard include directive, but I needed a number of sub-modules for the tool I was writing. I then create a new CGI object, and use it to call the cookie() subroutine. This routine takes a number of parameters, but the most important ones are shown.

The -name parameter is simply what you want to name this cookie. You should use something that clearly identifies what the cookie is being used for (though you should always be mindful of the associated security implications). The -value parameter is just that: the value you wish to store in the cookie. Browsers generally limit a cookie to around 4 KB, so remember to limit what you store. Next up is the -expires parameter, which specifies how far into the future (or past) the cookie should expire. The value of '+1y' that we specified in the example above indicates the cookie should expire in one year's time. Values in the past (specified with a minus sign) simply indicate that the cookie should be expired immediately. Leaving -expires out causes the cookie to expire when the user closes their browser. Finally, the -path parameter indicates for what paths on your site the cookie should apply. A value of '/cgi-bin/', for example, will only allow the cookie to work for scripts in the /cgi-bin folder of your site. We specified '/' in our example above, which means the cookie is valid for any path at our site.

Finally we print our CGI header, passing along a -cookie parameter with our cookie variable. As always, the documentation for the CGI module will give you lots more information on what's available.

Reading a Cookie

Reading back the value stored in a cookie is even simpler:

use CGI qw(:all);

my $cgi = CGI->new;
my $someValue = $cgi->cookie('my_first_cookie');

Again we create our CGI object, but this time we use it to read our cookie, simply by calling the cookie() routine with the name of the cookie we created before. If the cookie is found, the stored value is read and stored into our variable ($someValue in the example above). If the cookie is not found, an undefined value (undef) is returned.

One Gotcha

In the tool I was working with, I was handling storing and reading the cookie on the same page. Since we have to create our cookie via the header() call, I was concerned about how to handle the case where we weren't creating a cookie. The solution, it turns out, is pretty simple:

use CGI qw(:all);

my $cgi = CGI->new;
unless ($cgi->param())
{
    print $cgi->header;
}

In this example, we print out a generic CGI header only if no parameters were passed in (i.e. the user didn't push us either a POST or GET). If we do have parameters, we want to create a cookie, and we'll send the header after we have done so. Pretty easy!

Perl 5.10

Feb 11, 2008

I just found out about Perl 5.10, which has been out for some time now (released on December 18 ... how did I miss this?). The perldelta documentation goes into detail on what's new, but here's a brief overview of some of the features I find most appealing:

The 'feature' pragma

First and foremost is the feature pragma, which is used to turn on the new features added by 5.10. By default, the new features are disabled, and you explicitly have to request their support (a great idea, in my opinion). A simple use feature ':5.10'; statement (or just use 5.010;) will do the trick.

New 'Defined-Or' operator

A new // operator is now available, for handling the 'defined-or' case. For example:

$a // $b; # This is equivalent to the line below

defined $a ? $a : $b; # Same meaning as above

This new operator has the same precedence as the logical-or operator. In typical Perl fashion, the new operator is simply a shortcut that makes your scripts shorter and more difficult to read one month after you write them. ;)
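
Where the operator really earns its keep is with values that are defined but false, such as 0 or the empty string. A small sketch, with a made-up %config hash standing in for real settings:

```perl
use strict;
use warnings;

# Made-up settings: retries is deliberately 0, timeout is missing
my %config = (retries => 0);

my $withOr        = $config{retries} || 3;    # 0 is false, so this becomes 3
my $withDefinedOr = $config{retries} // 3;    # 0 is defined, so it stays 0
my $timeout       = $config{timeout} // 30;   # undefined, so the default wins

print "||: $withOr, //: $withDefinedOr, timeout: $timeout\n";
# ||: 3, //: 0, timeout: 30
```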

Switch statements

At long last, Perl has a switch statement. The syntax here is quite different from other programming languages with which you might be familiar:

given ($state)
{
    when ("state_1") { $a = 1; }
    when (/^abcdef/) { $b = 2; }
    default { $c = 0; }
}

The various when tests allow for some powerful options, including: array slices, string compares, regular expression matches, and beyond.

Named captures in regular expressions

Suppose we want to read in a configuration file that contains lines with the following structure: option = value. Today, we could write a regular expression to capture these values like this: /(\w+) = (\w+)/. We would then access the captured values with $1 and $2.

In Perl 5.10, we could write the same expression like this: /(?<option>\w+) = (?<value>\w+)/. Now, the captured values are accessed through either the %+ or %- magical hashes, using each label as the key into each hash (see the perldelta documentation for the differences between the two hashes). This will make complex regular expressions much easier to decipher, and gets rid of the annoying parenthesis counting that we currently have to do.
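
As a rough sketch of how this might look in practice, here is the option = value pattern applied to a few made-up configuration lines, with the results collected through %+. I've added anchors to the pattern so that malformed lines are skipped.

```perl
use strict;
use warnings;

# Made-up configuration lines; the last one should not match
my @lines = ('timeout = 30', 'color = blue', 'not a setting');
my %settings;

for my $line (@lines) {
    if ($line =~ /^(?<option>\w+) = (?<value>\w+)$/) {
        # %+ holds the named captures from the last successful match
        $settings{ $+{option} } = $+{value};
    }
}

print "$_ => $settings{$_}\n" for sort keys %settings;
```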

Just 'say' it

The new say keyword is just like print, but it automatically adds a newline at the end of what it prints. How great is that? This simplifies printing code a little bit, especially for loops. Instead of print "$_\n" for @items; we can now use say for @items;. Clean and simple!

Stackable file tests

Doing multiple file tests is much easier now. Instead of if (-f $file and -w $file and -z $file) we can now write if (-f -w -z $file). Again, this makes things much cleaner.
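
To see the two forms agree, here is a quick sketch that runs both against an empty temporary file. It uses the standard File::Temp module to create the file; the variable names are mine.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);   # Standard module

# A fresh temp file is a plain file, writable by us, and empty
my ($fh, $file) = tempfile(UNLINK => 1);
close $fh;

my $stacked  = (-f -w -z $file)                     ? 1 : 0;
my $longhand = (-f $file and -w $file and -z $file) ? 1 : 0;

print "stacked=$stacked longhand=$longhand\n";   # stacked=1 longhand=1
```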

Better error messages

Have you ever seen this error message? I know I have:

$str = "Hello $name! Today is $day and the time is $time.\n";

Use of uninitialized value in concatenation (.) or string at test.pl line 3.

In 5.10, this same error message will read:

$str = "Hello $name! Today is $day and the time is $time.\n";

Use of uninitialized value $time in concatenation (.) or string at test.pl line 3.

Now I can know exactly where the error occurred! Finally!

And lots more

There are plenty of other new features that I haven't touched here: recursive regular expressions, a new smart matching operator, state ("static") variables, inside-out objects, and lots more. I'm really looking forward to trying out some of these new features.

It's time once again for a programming tips grab bag. As with the previous grab bag, I'll focus on Perl tips since I've been doing some Perl coding recently. Next time, I'll present some tips for PHP.

1. Always use the 'strict' and 'warnings' pragmas for production code

This tip is pretty much a no-brainer. Whenever you write production level code, you must make use of the 'strict' pragma (enabled with 'use strict;'). Not only will it save you from a lot of pain in the long run, but it also forces you to write cleaner code. You should also enable warnings, just for good measure. And don't do this at the end of your development cycle; do it right from the beginning. Always start scripts that you think will be used by others with the following two lines:

#!/usr/bin/perl
use strict;
use warnings;

I can't tell you how many times turning on strict checking has saved me from some goofy problems (such as using square brackets instead of curly braces for a hash reference).

2. Use 'our' to fake global variables

Global variables are generally considered to be bad practice in the world of programming, and rightfully so. They can cause untold amounts of trouble and can be quite dangerous in the hands of novice programmers. Out of the box, every variable in Perl is global unless you declare it otherwise, which is both a blessing and a curse. For quick and dirty scripts, globals are fine (and encouraged). But for production level code (which uses the 'strict' pragma mentioned above), undeclared globals aren't an option.

But sometimes, you can't avoid having a global variable (and they even make more sense than locals in some instances). I recently made use of the File::Find module in one of my scripts, calling it like this:

#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

my $inSomeState;
find(\&mySearchFunction, $somePathVariable);

sub mySearchFunction {
    if ($inSomeState) {
        # Do something
    }
}

The find() call will execute the mySearchFunction subroutine for every file and folder it finds under $somePathVariable. I cannot pass any parameters to the mySearchFunction subroutine, but it needs to be able to check the value of the variable $inSomeState. We created this variable using the 'my' construct, which works only as long as the declaration appears in the same file and above the subroutine; if the subroutine is defined before the declaration, or lives in another file, it is out of that variable's scope and Perl will complain. We can avoid the problem by forcing the $inSomeState variable to be global, using 'our' instead of 'my':

#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

our $inSomeState;
find(\&mySearchFunction, $somePathVariable);

sub mySearchFunction {
    if ($inSomeState) {
        # Do something
    }
}

By declaring the variable with 'our,' we tell Perl to treat $inSomeState as a package (global) variable and let us refer to it by its short name within the current scope, which happens to be the script itself in this case. Very handy!
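
Another way to get state into the callback, which I didn't use in my script, is to hand find() an anonymous subroutine that closes over ordinary 'my' variables. The sketch below builds a throwaway folder with the standard File::Temp module so that it can run on its own; the file names and the $wantedSuffix variable are made up.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);   # Standard module

# Build a throwaway folder holding three empty files
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(a.log b.log c.txt)) {
    open my $fh, '>', "$dir/$name" or die "Cannot create $name: $!";
    close $fh;
}

my $wantedSuffix = '.log';   # state the callback needs to see
my @matches;                 # results the callback fills in

# The anonymous sub can read and write the lexicals declared above
find(sub {
    return unless -f $_;     # find() sets $_ to the current file name
    push @matches, $_ if /\Q$wantedSuffix\E$/;
}, $dir);

print scalar(@matches), " log files found\n";   # 2 log files found
```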

3. Capture regex matches inline

The parenthesis capturing functionality in regular expressions is extremely useful. However, I found that I always wrote my capture statements as a part of an if block:

if(m/(\w+)-(\d+)/)
{
    my $word = $1;
    my $number = $2;
}

I recently learned that this same code can be shortened into a one liner:

my ($word, $number) = (m/(\w+)-(\d+)/);

Of course, the match may not occur, so you'd have to test that the values of $word and $number are defined, but it's a cleaner way of capturing stuff from a regular expression.
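
One way to combine the one-liner with the success test is to put the list assignment directly inside the if condition. A failed match returns an empty list, so the condition is false. The parseTag subroutine and its inputs are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper that pulls a word and a number out of a string
sub parseTag {
    my ($text) = @_;

    # The assignment counts the captured values: 2 on success, 0 on failure
    if (my ($word, $number) = $text =~ m/(\w+)-(\d+)/) {
        return "$word #$number";
    }
    return 'no match';
}

print parseTag('build-42'), "\n";       # build #42
print parseTag('nothing here'), "\n";   # no match
```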

4. Make sure to shift by 8 for return codes

If you're trying to automate something (which I have been doing a lot of recently), the return codes from external processes are generally of great interest. The system call makes executing a process very easy, but getting the return code is (to me at least) a little non-intuitive. Here's how to do it:

system ("some_process.exe");

my $retval = ($? >> 8);

The return code from the some_process.exe program will be stored in the upper byte of the $? variable (the lower bits report things like a terminating signal), so you have to remember to shift the value right by 8 to get the actual return value.
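
Here is a slightly fuller sketch that also checks the other things $? can report. Since some_process.exe doesn't exist on your machine, I'm using $^X (the path to the running perl) as a stand-in process that exits with a code of 3.

```perl
use strict;
use warnings;

# Stand-in for some_process.exe: run perl itself and exit with code 3
system($^X, '-e', 'exit 3');
my $status = $?;

my $retval;
if ($status == -1) {
    print "Failed to execute: $!\n";
}
elsif ($status & 127) {
    printf "Died with signal %d\n", $status & 127;
}
else {
    $retval = $status >> 8;
    print "Exited with value $retval\n";   # Exited with value 3
}
```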

Another new recurring feature I'm going to try out here at the site is the programming tip 'grab bag.' These will often feature a few tips I've picked up over the years, which I find highly useful. We'll start out this inaugural article with a few Perl tips:

1. Don't parse command line options yourself

One thing I've learned a number of times over is to never parse command line options yourself. Why? Because the Getopt::Long and Getopt::Std modules do it for you (and they make it both easy and convenient). These standard modules allow you to store away your command line options either in separate variables, or in a hash. There are times you'll want to use Getopt::Long over Getopt::Std (and vice-versa), so know the differences between the two. Either one will save you lots of time and headache. Here's one way to make use of this module:

use Getopt::Std;

our($opt_c, $opt_d, $opt_t);
getopts("cdt:");

my $filename = shift;

This tiny snippet parses the given command line parameters, looking for either a 'c', a 'd', or a 't' option. In this example, the 'c' and 'd' options are flags and the 't' option expects a user supplied value (note the trailing colon). If the user passes either '-c' or '-d' on the command line, the $opt_c and $opt_d variables will get set appropriately (otherwise, they remain undefined). Likewise, if the user passes a '-t' on the command line, the $opt_t variable gets set to the value the user passed in (so the user would need to type something like myScript.pl -t someValue). Otherwise, $opt_t remains undefined. Also note that we are still able to retrieve other values passed in via the command line (in this example, a filename). Quite handy!

One other hidden benefit of the Getopt modules is the fact that they handle combined options. So, myScript.pl -cd would parse just the same as myScript.pl -c -d. Doing this kind of parsing by hand would be tricky, so don't try to do it. Let Getopt do all the work for you.

Getopt::Long allows for long options (which make use of the double dash, such as --verbose), but it can also handle single letter options. Storing options in a hash is also available to both modules, making it very easy to set up if you have lots of options to parse.
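
For a taste of the long-option flavor with hash storage, here is a minimal sketch. To keep it runnable on its own, it parses a made-up argument list with GetOptionsFromArray instead of the real @ARGV; the option names are invented.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Made-up command line, as if the user typed:
#   myScript.pl --verbose --tag nightly build.log
my @args = ('--verbose', '--tag', 'nightly', 'build.log');

my %opts;
GetOptionsFromArray(\@args, \%opts, 'verbose', 'debug', 'tag=s')
    or die "Bad command line options\n";

# Whatever wasn't an option is left behind in the array
my $filename = shift @args;

print "verbose is on\n" if $opts{verbose};
print "tag: $opts{tag}, file: $filename\n";   # tag: nightly, file: build.log
```

Options that weren't supplied (debug, in this case) simply never show up in the hash.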

2. Use printf (or variants) to print plurals

This tip comes from the excellent Perl Cookbook, and I've used it a number of times. Use either the printf or sprintf functions to handle printing the proper plural (or singular) of a value. For example:

printf "%d item%s returned", $size, $size == 1 ? "" : "s";

If there were only 1 item, we would print out 1 item returned. Likewise, if there were 2 or more items, we would print out something like 2 items returned (note the trailing 's'). You can use this trick to print the proper plural for words that have strange plurals, like "goose" and "geese."
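
For the strange plurals, the same ternary can pick between two whole words instead of tacking on a suffix. A quick sketch, with a helper name (countPhrase) that I made up:

```perl
use strict;
use warnings;

# Hypothetical helper: picks the singular or plural word based on the count
sub countPhrase {
    my ($count, $singular, $plural) = @_;
    return sprintf "%d %s returned", $count, $count == 1 ? $singular : $plural;
}

print countPhrase(1, 'goose', 'geese'), "\n";   # 1 goose returned
print countPhrase(3, 'goose', 'geese'), "\n";   # 3 geese returned
```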

3. Use File::Spec to handle cross platform file paths

The File::Spec module and its children allow one to easily make cross-platform file paths, useful for those scripts which must operate across operating systems. In one project at work, I made use of the File::Spec::Functions module, which exports a number of handy functions. I find the catfile function very handy, and I use it like so:

use File::Spec::Functions; # Exports catfile by default

my $logFile = catfile('weeklybuild', 'log', 'build.log');

The function takes care of putting the right separators between the values (backslash for Windows, forward slash for Linux, and colons for the classic Mac OS).
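
A couple of the module's other functions follow the same pattern. This sketch builds the path in two steps with catdir and catfile, then takes it apart again with splitdir; the exact separator you see depends on the platform you run it on.

```perl
use strict;
use warnings;
use File::Spec::Functions qw(catdir catfile splitdir);

# Build the folder first, then the full path to the file
my $logDir  = catdir('weeklybuild', 'log');
my $logFile = catfile($logDir, 'build.log');

# splitdir breaks a path back into its pieces
my @parts = splitdir($logFile);

print "$logFile\n";
print "Last piece: $parts[-1]\n";   # Last piece: build.log
```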

A Perl Module Primer

Aug 18, 2007

I've recently been wrangling with some Perl code for a project at work, and have been putting together a Perl module that includes a number of common functions that I need. As such, I had to remind myself how to create a Perl module. During my initial development, I ran into a number of problems, but I eventually worked through all of them. In the hopes of helping myself remember how to do this, and to help any other burgeoning Perl developers, I've written the following little guide. Hopefully it will help shed some light on this subject.

Let me preface this guide with two important statements:

  1. I'm not aiming to show you how to create a module for distribution. Most of the other tutorials cover that topic in depth.
  2. I am going to assume that you have a working knowledge of Perl.

To start, let's take a look at our sample module:

package MyPackage;
use strict;
use warnings;

require Exporter;
our @ISA = ("Exporter");

our %EXPORT_TAGS = ( 'all' => [ qw(sayHello whoAreYou $firstName
    %hashTable @myArray) ] );
our @EXPORT_OK = (@{ $EXPORT_TAGS{'all'} });
our @EXPORT = qw();

our $firstName = "Jonah";
our $lastName = "Bishop";

our %hashTable = ( a => "apple", b => "bird", c => "car" );
our @myArray = ("Monday", "Tuesday", "Wednesday");

sub sayHello
{
    print "Hello World!\n";
}

sub whoAreYou
{
    print "My name is $firstName $lastName\n";
}

1;

We start out by declaring our package name with the package keyword. Special Note: If you intend on having multiple modules, and you use the double colon (::) separator, you're going to need to set up your directory structure correspondingly. For example, if I had two modules, one named Jonah::ModuleOne and another named Jonah::ModuleTwo, I would need to have a folder named Jonah, inside of which would live the code to my two modules.

I next enable the strict and warnings pragmas, since that's good programming practice. Lines 5 and 6 are standard to virtually all Perl modules. First, we require inclusion of the standard Exporter module, then we indicate that our module inherits from said Exporter (the @ISA (is a) array is what sets this).

Line 8 is where things get interesting. We need to specify what symbols we want to export from this module. There are a number of ways of doing this, but I have chosen to use the EXPORT_TAGS hash. Special Note: This is a hash, not an array! I recently spent about an hour trying to debug a strange error message, and it all stemmed from the fact that I had accidentally created this as an array.

The EXPORT_TAGS hash gives us a means of grouping our symbols together. We essentially associate a label with a group of symbols, which makes it easy to selectively choose what you want to import when using the module. In this example, I simply have a tag named 'all' which, as you might guess, allows me to import all of the specified symbols I provide in the associated qw() list. Note that you must precede exported variable names with their appropriate character: $ for scalars, @ for arrays, and % for hashes. Exported subroutines don't need to have the preceding & character, but it doesn't hurt if you put it there.

Line 10 shows the EXPORT_OK array. This array specifies the symbols that are allowed to be requested by the user. I have placed the EXPORT_TAGS{'all'} value here for exporting. I will show how to import this symbol into a script in just a moment. Line 11 is the EXPORT array, which specifies the symbols that are exported by default. Note that I don't export anything by default. Special Note: It is good programming practice to not export anything by default; the user should specifically ask for their desired symbols when they import your package.

Lines 13 through 27 should be self explanatory. We set up two scalar variables, $firstName and $lastName, as well as a hash table and an array. Note that we precede all variables with the our declaration, which puts each variable into the global scope for the given context. Since we're using the strict pragma, we need these our declarations; otherwise we'd get some compilation errors.

Line 29 is very important and can easily be forgotten. When a Perl module is loaded via a use statement, the compiler expects the last statement to produce a true value when executed. This particular line ensures that this is always the case.

Now that we've taken a look at the module, let's take a look at a script that uses it:

#!/usr/bin/perl
use strict;
use warnings;
use MyPackage qw(:all);

sayHello();
whoAreYou();

print "$lastName\n"; # WRONG!
print $MyPackage::lastName . "\n"; # RIGHT!

Most of this should be pretty clear. Note, however, how we import the module on line 4. We do the typical use MyPackage statement, but we also include the symbols we want to import. Since we didn't export anything by default, the user has to explicitly ask for the desired symbols. All we exported was a tag name, so we specify it here. Note the preceding colon! When you are importing a tag symbol, it must be preceded by a single colon. This too caused me a great deal of frustration, and it's a subtlety that's easily missed.

One other interesting note: on line 9, we try to print the $lastName variable. Since we never exported that particular variable in our module, referencing it by name only will result in an error. The correct way to access the variable, even though it wasn't exported, is shown on line 10. You must fully qualify non-exported symbols!
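
You don't have to import a whole tag, either. Anything listed in EXPORT_OK can be requested by name. The sketch below squeezes a cut-down copy of the module and the script into a single file so that it can run on its own; the BEGIN blocks are only there to mimic what loading a separate MyPackage.pm with use would do at compile time, and are not something you'd write in a real module.

```perl
use strict;
use warnings;

# Stand-in for MyPackage.pm (trimmed down for the sketch)
BEGIN {
    package MyPackage;

    require Exporter;
    our @ISA = ("Exporter");

    our %EXPORT_TAGS = ( 'all' => [ qw(sayHello $firstName) ] );
    our @EXPORT_OK = (@{ $EXPORT_TAGS{'all'} });

    our $firstName = "Jonah";
    our $lastName = "Bishop";

    sub sayHello { return "Hello World!"; }
}

# Same effect as: use MyPackage qw(sayHello $firstName);
BEGIN { MyPackage->import(qw(sayHello $firstName)); }

# Imported symbols work by short name; $lastName must be fully qualified
my $greeting = sayHello() . " I am $firstName $MyPackage::lastName.";
print "$greeting\n";   # Hello World! I am Jonah Bishop.
```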

Hopefully this quick little guide has made things a little clearer for you. If for no other reason, it will help me remember these subtleties of Perl programming. :-)