
Giving Perl a Chance


The Perl programming language and I have had a few close encounters in the past. I was born in the late 90s and got into programming sometime around 2010, and at the time Perl still showed up as a serious contender when googling "how to learn programming".

I don't remember why I settled on Python in the end. Nor do I remember why, more than a decade later, I decided to give Perl a whirl again, but here we are. What follows is a personal account of my descent into the bizarre, arcane, and often entertaining world of Perl programming.

What always stuck out to me like a sore thumb about Perl was its strange mechanism for accepting arguments. The my (...) = @_ pattern below is simply a convention: it unpacks the implicit array variable @_ into the "parameters" $x, $y, and $z.

sub f {
    my ($x, $y, $z) = @_;
    $x + $y * $z
}

You see, in Perl, instead of explicitly declaring a function's input parameters, you always get all of the arguments in an array called @_ (handy!)

Also, unlike Python, Perl doesn't throw an error when unpacking an array of \( m \) elements into \( n \neq m \) variables, so this example accepts any number of arguments: extra ones are silently dropped, and missing ones become undef. The error, if one is even produced, appears later down the line, e.g. when $y = undef is used erroneously.
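To see that leniency concretely, here is a hypothetical variant of f that just reports whether its third parameter arrived (the return strings are made up for illustration). Neither call raises an error:

```perl
use v5.36;

# Old-style sub: nothing checks how many arguments the caller passed.
sub f {
    my ($x, $y, $z) = @_;
    return defined $z ? "got z=$z" : "z is undef";
}

my $extra   = f(1, 2, 3, 4);  # the fourth argument is silently dropped
my $missing = f(1, 2);        # $z is just undef; nothing complains yet

say $extra;    # got z=3
say $missing;  # z is undef
```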

But hang on, what is an implicit variable? I'll answer that question with yet another lucid Perl program.

sub g {
    my $arg = shift;
    my $arg_2 = shift;
    $arg - $arg_2
}

print "result: " . g(5, 3);
result: 2

shift removes and returns the first element of an array. But which array? Well, it takes an argument of course, but that argument can be implicit.

First, a note on the @ character used in @_: it is called a sigil.

my @vec = (1, 2, 3);

Here we declare an array called vec. Perl has separate namespaces for arrays (@), hashes, i.e. maps (%), scalars ($), and functions/subroutines (&). We need to prefix our variables with the appropriate sigil, like @, to tell the Perl interpreter which namespace we are referring to. A $name and a @name can exist simultaneously.
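A small sketch of those separate namespaces: the three variables below all share the name "name" and live side by side. Note that reading a single element of @name or %name switches the sigil to $, because the element itself is a scalar.

```perl
use v5.36;

# Three different variables that share the name "name".
my $name = "scalar";
my @name = ("first", "second");
my %name = (key => "hash value");

say $name;        # scalar
say "@name";      # first second
say $name[1];     # second      ($ because one element of @name is a scalar)
say $name{key};   # hash value  (likewise, one value of %name)
```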

Now we can pass @vec to shift and put the result into the scalar $x.

my $x = shift @vec;
$x == 1

But if shift is given no argument, it falls back on the implicit one, called _. And because shift expects an array, inside a subroutine it is given @_ (in the main program, outside any subroutine, it uses @ARGV instead). Other functions may be given $_ implicitly if they expect a scalar like a number or a string.
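A minimal sketch of these defaults. The greet sub and the strings are hypothetical, and @ARGV is assigned by hand so the script behaves the same however it is invoked:

```perl
use v5.36;

# Inside a sub, a bare `shift` means `shift @_`.
sub greet {
    my $name = shift;
    return "hello, $name";
}

# Outside any sub, a bare `shift` means `shift @ARGV` instead.
@ARGV = ("first-cli-arg");
my $arg = shift;

# Scalar built-ins such as uc and length fall back on $_.
my @out;
for (qw(perl awk)) {
    my $upper = uc;       # same as uc($_)
    my $len   = length;   # same as length($_)
    push @out, "$upper:$len";
}

say greet("world");  # hello, world
say $arg;            # first-cli-arg
say "@out";          # PERL:4 AWK:3
```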

Ruby inherited some of these Perlisms. Today Ruby linters will yell at you for using them, but in Perl they are still commonplace and indeed considered idiomatic.

Implicit variables seem like a really bad idea, but there's something about them that makes code flow nicely when used appropriately (well, under at least one definition of appropriate.)

/^DEBUG/ and print while (<>)

To the uninitiated, this means:

# The construct <FILE> reads a line from FILE. A bare <> reads from the files
# named in @ARGV, or from STDIN if there are none; this version reads STDIN.
while (my $line = <STDIN>) {
  if ($line =~ /^DEBUG/) {
    print $line;
  }
}

Using the implicit parameter $_ instead, we get:

# The special construct while (<>) will set $_ to the next line of input until EOF
while (<>) {
  # Equivalent to $_ =~ /^DEBUG/
  if (/^DEBUG/) {
    print;
  }
}

Then we can use and in place of if, and use a postfix while keyword to get back to our original example.

while (<>) { /^DEBUG/ and print }
# equivalent to
/^DEBUG/ and print while (<>)

The implicit variable also exists in AWK, one of the languages that Perl took inspiration from. In AWK, the last example would simply be written as

/^DEBUG/

Because AWK is a DSL for filtering text files line by line, every program is already wrapped in the equivalent of while (<>) { ... }, regular expressions are matched against the current line ($0) by default, and a pattern with no action means "print the current line". Indeed, perl has command-line flags such as -n, -p, and -a to operate more like AWK for this kind of line-oriented filtering.

perl -ne "print if /^DEBUG/"
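Here is roughly what the one-liner perl -lane 'print $F[1] if /^DEBUG/' does, written out by hand. -n supplies the while (<>) loop, -l chomps each line, and -a splits it on whitespace into @F, so $F[1] plays the role of AWK's $2. For a self-contained sketch, the input comes from an in-memory string instead of <>, and matches are collected into @printed rather than printed:

```perl
use v5.36;

# Stand-in for real input, so the sketch runs without files or STDIN.
my $input = "DEBUG alpha 1\nINFO beta 2\nDEBUG gamma 3\n";
open my $in, '<', \$input or die "cannot open in-memory handle: $!";

my @printed;
while (<$in>) {                        # -n: loop over every line, setting $_
    chomp;                             # -l: strip the trailing newline
    my @F = split ' ', $_;             # -a: split $_ on whitespace into @F
    push @printed, $F[1] if /^DEBUG/;  # the -e body, collecting instead of printing
}
say "@printed";   # alpha gamma
```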

Still an evolving language

Perl recently had a major release, 5.36 (2022), in which, some 35 years after Perl 1.0, conventional function contract declarations (subroutine signatures) finally became a stable feature.

# Activate the feature bundle of Perl 5.36, which includes signatures. New
# features are opt-in to maintain backwards-compatibility with older scripts.
use v5.36;

sub f($x, $y) {
  $x + $y
}

Signatures had been available as an experimental feature since Perl 5.20 (2014), but only became stable with 5.36, so you'll mostly run into the old form when reading Perl code out in the wild. It will also take some time for the Perl community at large to adopt them, so you should be familiar with both styles.
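A short sketch of what signatures buy you over the @_ style, assuming Perl 5.36 or newer; g and its default of 10 are made up for illustration. A wrong argument count now dies at the call instead of surfacing later as an undef:

```perl
use v5.36;

sub f($x, $y) { $x + $y }

# A default value makes a trailing parameter optional.
sub g($x, $y = 10) { $x + $y }

say f(1, 2);   # 3
say g(1);      # 11

# Unlike the old `my (...) = @_` style, a wrong argument count now dies.
my $ok = eval { f(1, 2, 3); 1 };
say $ok ? "no error" : "error: $@";
```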

There have also been earlier attempts to add parameter checks to Perl, and you can see remnants of these in code-bases like that of the game Frozen Bubble.

sub add_default_rect($) {
    my ($surface) = @_;
    $rects{$surface} = SDL::Rect->new(0,0,$surface->w, $surface->h);
}
sub put_image($$$) {
    my ($image, $x, $y) = @_;
    $image = translate_mini_image($image);
    $rects{$image} or die "please don't call me with no rects\n".backtrace();
    my $drect = SDL::Rect->new($x, $y, $image->w, $image->h);
    SDL::Video::blit_surface($image, $rects{$image}, $app, $drect);
    push @update_rects, $drect;
}
Frozen Bubble is Copyright (c) 2000-2012 The Frozen-Bubble Team, and is free software as defined by the GNU GPL. Its full source code is publicly available.

In this unique style of function declaration, the names of the arguments (e.g. my ($surface) = @_) and the function contract (e.g. put_image($$$) for 3 parameters, which Perl calls a prototype) are written in different places. I'll go as far as calling it an interesting approach, and I'm glad it didn't become the official declaration syntax (Frozen Bubble activates an extension for this).
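One wrinkle worth knowing if you mix the two styles: under use v5.36 the parentheses after a sub name are parsed as a signature, so a prototype like ($$$) has to be written with the :prototype attribute instead. A sketch with a hypothetical add3; the failing call sits in a string eval because prototypes are checked when the call is compiled, and the error would otherwise abort the whole script:

```perl
use v5.36;

# Under `use v5.36`, parentheses right after a sub name are parsed as a
# signature, so an old-style prototype has to use the :prototype attribute.
sub add3 :prototype($$$) {
    my ($x, $y, $z) = @_;
    return $x + $y + $z;
}

say add3(1, 2, 3);   # 6

# Prototypes are checked when a call is compiled, so the bad call is put in
# a string eval to catch the error instead of aborting the whole script.
my $ok = eval 'add3(1, 2); 1';
say $ok ? "compiled" : "rejected: $@";
```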

The addition of conventional function contract declarations shows a willingness to fix old and deeply rooted mistakes in the design of Perl, which I think bodes well for the language moving forwards.

Call of Cthulhu: Dark Corners of the Implementation

To really get a good grip on the language, I decided to give Advent of Code a try in Perl.

Iterators

I had some code that broke out of a list-iteration loop, and I couldn't figure out why it wasn't working. On a hunch, I then created the following example:

use v5.36;

my @xs = ("abc", "xyz", "qwerty", "hjkl");

# Loop over xs until we hit "xyz"
# note that `last` is the same as `break` in other languages.
while (my $i = each @xs) {
  say "Loop 1: $xs[$i]";
  last if $xs[$i] eq "xyz";
}

# broken out of the loop

# Run the loop again, this time over all of xs
while (my $i = each @xs) {
  say "Loop 2: $xs[$i]";
}
Loop 1: abc
Loop 1: xyz
Loop 2: qwerty
Loop 2: hjkl

So it turns out that iterators like each are not lexically scoped to loops, but are instead attached to the array or hash itself.

This is… somewhat horrifying, to the point that whether to call it a bug or a feature is really up for debate. It definitely feels like an oversight.
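A sketch of two workarounds on the same @xs: calling keys @xs in void context resets the array's each-iterator (perlfunc documents that keys resets the iterator of the hash or array it is given), and a foreach over the indices avoids hidden iterator state altogether. The @first, @second and @third arrays are only there to make the results easy to check:

```perl
use v5.36;

my @xs = ("abc", "xyz", "qwerty", "hjkl");

my @first;
while (my $i = each @xs) {
  push @first, $xs[$i];
  last if $xs[$i] eq "xyz";
}

# Reset the each-iterator that the `last` above left halfway through @xs.
keys @xs;

my @second;
while (my $i = each @xs) {
  push @second, $xs[$i];
}

# A foreach over the indices has no hidden state to leak in the first place.
my @third;
foreach my $i (0 .. $#xs) {
  push @third, $xs[$i];
}

say "@second";  # abc xyz qwerty hjkl
```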

Implicit trouble

Here I have a function f, and I'm trying to give it the regular expression /2/ as a parameter f($_, /2/):

use v5.36;

sub f {
  my ($x, $regex) = @_;
  #say "Given regex: $regex";
  $x =~ $regex
}

my @xs = (1,2,3);
foreach (@xs) {
  print "Does '$_' match the regex /2/: ";
  if (f($_, /2/)) { say "yes" }
  else { say "no" }
}
Does '1' match the regex /2/: yes
Does '2' match the regex /2/: no
Does '3' match the regex /2/: no

Oops, remember our old helpful friend, the implicit argument $_? Yeah, I sure didn't when writing this code. This code doesn't pass /2/ to f; it instead matches the regular expression against $_ and passes the result of that match to f.

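In this list context, a successful match returns the list (1), and that 1 is then reinterpreted as the regex /1/ and matched against the input. A failed match returns an empty list, so f gets no second argument at all: $regex is undef, which acts as an empty pattern, and Perl treats an empty pattern as "reuse the last successfully matched regex" (or match anything, if there is none). That is how '1' ends up matching.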

You have to explicitly quote regexes with qr// when passing them as values, so the invocation of f becomes:

f($_, qr/2/)
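For completeness, here is the earlier program with that one change. It collects the answers into an @answers array (an addition for this sketch) instead of printing them, so the result is easy to check:

```perl
use v5.36;

sub f {
  my ($x, $regex) = @_;
  $x =~ $regex
}

my @answers;
foreach (1, 2, 3) {
  # qr// builds a regex object instead of matching against $_ right away
  push @answers, f($_, qr/2/) ? "yes" : "no";
}
say "@answers";   # no yes no
```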

Unicode

Python went through its painful 2to3 transition, where the old US/EU-centric everything-is-ASCII-or-Latin-1 legacy was thrown out. Perl took a different path: Perl 6 fizzled out into its own very niche language called Raku, and Perl 5 instead received Unicode add-on features that need to be switched on.

In Perl today, there's quite a lot of ceremony involved in making everything speak UTF-8.

use v5.36;
use utf8;
use strict;
use warnings;
use warnings qw(FATAL utf8);
use open qw(:std :utf8);
use charnames qw(:full);
use feature qw(unicode_strings);
use Encode qw(decode);

if (grep /\P{ASCII}/ => @ARGV) {
   @ARGV = map { decode("UTF-8", $_) } @ARGV;
}

binmode(DATA, ":utf8");
binmode(STDOUT, ":utf8");
binmode(STDERR, ":utf8");
binmode(STDIN, ":utf8");
See the Stack Overflow question "Why does modern Perl avoid UTF-8 by default?"

Yeah.

Alternatively, you can drop some of this by setting the environment variable PERL_UNICODE=AS (A treats @ARGV as UTF-8, S puts UTF-8 layers on STDIN, STDOUT and STDERR), but relying on an environment variable for correct Unicode handling seems pretty wild to me. Having a shell script associated with every Perl script containing something like this would also be a bit ridiculous:

#!/usr/bin/env sh

export PERL_UNICODE=AS
# Run the .pl file that sits next to this wrapper, forwarding all arguments.
exec perl "${0%.sh}.pl" "$@"

Now it feels like I'm shipping a Java application.

I think it would be reasonable to expect use utf8, which today only declares that the script's own source code is UTF-8, to handle all of this, and to have it enabled by use v5.36 or any other sufficiently high version. Instead, one needs to either put that large chunk of text at the top of every script or rely on use utf8::all, which is an external module. Now you're no longer able to just throw your script onto any Linux or BSD box; you have to mess around with dependency management.
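To illustrate how little use utf8 covers on its own, here is a small sketch (the word naïve is just an example, and the file itself must be saved as UTF-8). The pragma makes the literal count as 5 characters instead of 6 bytes, but printing it correctly still takes an explicit output layer:

```perl
use v5.36;
use utf8;                  # only affects how this file's literals are decoded
use Encode qw(encode);

my $word  = "naïve";
my $chars = length $word;                    # 5: ï is one character
my $bytes = length encode("UTF-8", $word);   # 6: ï takes two bytes in UTF-8

# Without this layer, Perl would write ï (a code point below 256) to STDOUT
# as the single Latin-1 byte 0xEF rather than as UTF-8.
binmode STDOUT, ':encoding(UTF-8)';
say "$word has $chars characters and $bytes UTF-8 bytes";
```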

Perl really needs to upstream the external utf8::all module or something like it, and to enable it by default in a future version with use v5.xx.

The Perl ecosystem and CPAN

Perl was once as widespread as PHP or Ruby are today, but now I'd estimate its active user base to be closer to that of Vala, a language you probably haven't even heard of. But unlike Vala, you have heard of Perl, even if you've never written any.

And that explosion of usage in the 80s and 90s, followed by the precipitous decline ever since really makes the Perl community feel very empty.

That's not to say you can't find what you need though, and there are even some striking advantages.

The average quality of information related to Perl on the web is far greater than that of Ruby/Python resources. For the more recently popular languages, there's far too much noise from low-effort growth-hacking blogspam, written primarily for search engines and not people.

As for libraries, the Perl community was among the first programming-language communities to create a massive online library of code for use as building blocks in other projects. Like crates.io for Rust or RubyGems.org for Ruby, CPAN, the Comprehensive Perl Archive Network, has been serving the Perl community since 1995.

The standard CLI tool for interacting with CPAN, simply called cpan, is easy to use. To work with TOML files, you'd just run cpan TOML, and the module gets installed globally (more on this later).

The CPAN client makes a few strange decisions. It outputs a lot of information by default, including all the C-compiler invocations for extensions. It also runs all test-suites for installed libraries, which is something I've never seen a package manager do for user installations. This makes CPAN very slow, as test cases don't tend to be optimized for speed.

A real application

After using Perl to solve some programming exercises, I decided to reach for it again when a friend of mine asked me for a software development favor. It was just a simple web-scraping job, and since I was doing this Perl project at the time, I figured it was a good opportunity to write a real Perl program.

I knew Perl had WWW::Mechanize for this task thanks to the Ruby and Python modules named after it, so I decided to give it a shot.

I was also able to find WWW::Mailgun for sending emails, and WWW::Mechanize::TreeBuilder for scanning the DOM. The script was simple enough to write, and the documentation for the dependencies was adequate.

Now I needed to deploy this highly critical Perl service to production. I decided to use a systemd timer to run a oneshot Docker container hosting the script. It turned out to be feasible to create a relatively small Alpine container, sitting at a nice 69 MB:

FROM alpine:3.16.2

RUN mkdir -p /usr/src/perl
WORKDIR /usr/src/perl

ENV PATH="/opt/perl/bin:${PATH}"
COPY cpanfile /etc/cpanfile
RUN apk update && apk upgrade && apk add --no-cache \
        build-base \
        curl \
        gcc \
        gnupg \
        make \
        openssl \
        openssl-dev \
        tar \
        zlib \
        zlib-dev \
        ca-certificates \
    && rm -rf /var/cache/apk/* \
    && curl -SLO https://www.cpan.org/src/5.0/perl-5.36.0.tar.gz \
    && echo 'e26085af8ac396f62add8a533c3a0ea8c8497d836f0689347ac5abd7b7a4e00a *perl-5.36.0.tar.gz' | sha256sum -c - \
    && tar --strip-components=1 -xzf perl-5.36.0.tar.gz -C /usr/src/perl \
    && rm perl-5.36.0.tar.gz \
    && ./Configure -des \
        -Duse64bitall \
        -Dcccdlflags='-fPIC' \
        -Dccdlflags='-rdynamic' \
        -Dlocincpth=' ' \
        -Duselargefiles \
        -Duseshrplib \
        -Dd_semctl_semun \
        -Dusenm \
        -Dprefix='/opt/perl' \
        -Doptimize='-Os -march=x86-64-v3 -flto' \
    && make libperl.so \
    && make -j$(nproc) \
    && make install \
    && rm -rf /usr/src/perl \
    && curl -o /tmp/cpm -sL --compressed https://raw.githubusercontent.com/skaji/cpm/f437699963027b94951500d1e923937adf08efee/cpm \
    && chmod 755 /tmp/cpm \
    && /tmp/cpm install --cpanfile=/etc/cpanfile --show-build-log-on-failure -g \
    && rm -rf /root/.perl-cpm /tmp/cpm \
    && apk del \
           openssl-dev \
           zlib-dev \
           build-base \
           curl \
           gcc \
           gnupg \
           make \
           openssl \
           tar \
           zlib \
    && find /opt/perl -name "*.h" -exec rm '{}' \;

WORKDIR /opt/perl

COPY varsel.pl /usr/local/bin/varsel
CMD ["perl", "/usr/local/bin/varsel"]

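Importantly, many Perl modules require a C compiler to be present, and some also need external libraries. Our application, however, does not need these build tools at runtime, so they are installed and removed within the same RUN instruction, and therefore the same image layer, to avoid bloating the image. (Note: Alpine links dynamically, but removing the openssl and zlib packages still worked out here, because apk del only drops the named packages; the shared libssl, libcrypto and zlib libraries that the compiled modules link against should stay installed as long as other packages in the base image, such as apk-tools, still depend on them.)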

The Perl ecosystem hasn't settled on any single solution for managing project dependencies, and from what I can tell the conventional approach in Perl deployments has historically been to just kinda wing it and install everything into the global Perl module directory with cpan. This isn't entirely Perl's fault; its heyday predates a lot of currently accepted best practices. (App::Virtualenv, a Perl take on virtual environments, also exists.)

The default Perl package manager, cpan, does not support a standard file format for specifying a project's dependencies, but cpm, a fairly new package manager for Perl, understands the cpanfile format (which other tools such as cpanm and Carton also read).

requires 'WWW::Mechanize', '==2.15';
requires 'WWW::Mechanize::TreeBuilder', '==1.20000';
requires 'WWW::Mailgun', '==0.54';
requires 'LWP::Protocol::https', '==6.10';
requires 'TOML', '==0.97';

Annoyingly, specifying requires 'module', '1.0' does not restrict module to 1.0, or even to 1.x. It actually just means requires 'module', '>=1.0'. But you can specify an exact requirement with requires 'module', '==1.0'.
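If you want a range rather than an exact pin, the cpanfile format also accepts comma-separated version ranges in the CPAN::Meta style. A config sketch, assuming your CPAN client honors such ranges; the 3.0 upper bound is only an example:

```perl
# Any WWW::Mechanize release from 2.15 up to, but not including, 3.0
requires 'WWW::Mechanize', '>= 2.15, < 3.0';

# A bare version is only a minimum: this means '>= 6.10'
requires 'LWP::Protocol::https', '6.10';
```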

The cpanfile is then just copied into the container and loaded by the cpm package manager with cpm install --cpanfile=/etc/cpanfile -g, where -g installs the modules globally.

Kinda fun tho

It turns out that Perl is actually a really good language for recreational programming. Its implicit subject $_, though leaky, sometimes feels very natural. Its low-precedence or and and, which supplement the || and && operators, make for fun control flow without needing parentheses, and in general the abundance of built-ins with overlapping functionality adds a lot of flavour.
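A small sketch of why the low-precedence forms are handy; lookup and @log are hypothetical. The only difference between the two assignments is whether the logical operator binds more or less tightly than =:

```perl
use v5.36;

my @log;
sub lookup ($key) { return $key eq "known" ? "found" : 0 }

# `or` binds more loosely than `=`, so this parses as
#   (my $x = lookup("missing")) or push(@log, "lookup failed")
my $x = lookup("missing") or push @log, "lookup failed";

# `||` binds more tightly than `=`, so this parses as
#   my $y = (lookup("missing") || "default")
my $y = lookup("missing") || "default";

say "x=$x y=$y log=@log";   # x=0 y=default log=lookup failed
```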

Perl is big, with many strange corners and odd behaviors, and that kind of space is fun to explore in puzzles.

If I were a project manager working on a big "Software Engineering" project, I would ban Perl. But for single-author or small-team utilities with a limited scope, I think it works well. Most importantly, I think it's just more fun to write Perl than Python.

It's like the Vim of programming languages. Esoteric, hard to master, organically developed, but it makes boring things more fun. I think the additional cognitive load added because of all the novel ways to express solutions is comparable to a stress ball or fidget toy when working.

If you're just a hacker looking at a problem that a 200 line script can solve, Perl is still a really good option, and damn fun too.