Giving Perl a Chance
The Perl programming language and I have had a few close encounters in the past. I was born in the late 90s and got into programming sometime around 2010, when Perl still showed up as a serious contender if you googled "how to learn programming".
I don't remember why I settled on Python in the end. And I don't remember why, more than a decade later, I decided to give Perl a whirl again, but here we are. What follows is a personal account of my descent into the bizarre, arcane, and often entertaining world of Perl programming.
I remember the thing that always stuck out to me like a sore thumb about Perl was the strange mechanism used for accepting arguments. The my (...) = @_ pattern below is simply a convention; it unpacks the implicit array variable @_ into the "parameters" $x, $y, and $z.
sub f {
    my ($x, $y, $z) = @_;
    $x + $y * $z
}
You see, in Perl, instead of explicitly declaring the input parameters of a function, you just always get all the arguments as an array called @_ (handy!)
Also, Perl, unlike Python, doesn't throw an error when unpacking an array of \( m \) elements into \( n \neq m \) parameters, so this example accepts any number of arguments. The error, if one is even produced, appears later down the line when, e.g., $y = undef is used erroneously.
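To make the silent mismatch concrete, here is a minimal sketch. The sub name describe is made up for illustration; it just reports which of its three "parameters" actually received a value.

```perl
use strict;
use warnings;

# Hypothetical helper: shows what each of the three "parameters" ended up as.
sub describe {
    my ($x, $y, $z) = @_;
    return join ", ", map { defined $_ ? $_ : "undef" } ($x, $y, $z);
}

print describe(1, 2, 3, 4), "\n";   # the extra argument is silently dropped
print describe(1), "\n";            # the missing arguments become undef
```

Neither call raises an error; the second one only causes trouble once $y or $z is actually used.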
But hang on, what is an implicit variable? I'll answer that question with yet another lucid Perl program.
sub g {
    my $arg = shift;
    my $arg_2 = shift;
    $arg - $arg_2
}
print "result: " . g(5, 3);
result: 2
shift pops from the front of a vector. But which vector? Well, it takes an argument of course, but that argument can be implicit.
First, a note on the @ character used in @_: this is called a sigil.
my @vec = (1, 2, 3);
Here we declare a list called vec. Perl has a separate namespace for lists, which it calls arrays (@), maps, which it calls hashes (%), scalars ($), and functions/subroutines (&). We need to prefix our variables with the appropriate sigil, like @, to tell the Perl interpreter which namespace we are referring to. A $name and a @name can exist simultaneously.
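A small sketch of the separate namespaces, using three unrelated variables that all happen to be called name. One extra detail not mentioned above: when you access a single element, the sigil follows the value you get out, so one array element is written $name[1] and one hash value $name{key}.

```perl
use strict;
use warnings;

my $name = "scalar";           # lives in the scalar namespace
my @name = ("a", "list");      # a completely different variable
my %name = (key => "value");   # and yet another one

# Element access uses $ because a single element is a scalar.
print "$name / $name[1] / $name{key}\n";
```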
Now we can pass vec to shift, and put the result into the scalar x.
my $x = shift @vec;
$x == 1
But if given no arguments, shift falls back on the implicit argument, called _. And because shift expects a list variable, it will be given @_ (inside a subroutine, that is; at the top level of a script it falls back to @ARGV instead). Other functions may be given $_ implicitly if they expect a scalar like a number or a string.
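Here is a short sketch of a few built-ins picking up $_ on their own. The word list is made up; the point is only that for, uc, length, and a bare regex match all default to the same implicit variable.

```perl
use strict;
use warnings;

my @words = ("debug", "info", "warn");

# `for` without a loop variable aliases each element to $_.
my @report;
for (@words) {
    my $upper = uc;       # same as uc($_)
    my $len   = length;   # same as length($_)
    push @report, "$upper:$len";
}
print "@report\n";

# Inside grep, a bare regex is matched against $_ as well.
my @with_n = grep { /n/ } @words;
print "@with_n\n";
```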
Ruby inherited some of these Perlisms, though today Ruby linters will yell at you for using them. In Perl they are still commonplace, and indeed considered idiomatic.
Implicit variables seem like a really bad idea, but there's something about them that makes code flow nicely when used appropriately (well, under at least one definition of appropriate.)
/^DEBUG/ and print while (<>)
To the uninitiated, this means
# The construct <FILE> reads a line from FILE. A bare <> reads from the files
# named on the command line, or from STDIN if there are none.
while (my $line = <STDIN>) {
    if ($line =~ /^DEBUG/) {
        print $line;
    }
}
Using the implicit parameter $_ instead, we get:
# The special construct while (<>) will set $_ to the next line of input until EOF
while (<>) {
    # Equivalent to $_ =~ /^DEBUG/
    if (/^DEBUG/) {
        print;
    }
}
Then we can use and in place of if, and use a postfix while keyword to get back to our original example.
while (<>) { /^DEBUG/ and print }
# equivalent to
/^DEBUG/ and print while (<>)
The implicit variable also exists in AWK, one of the languages that Perl took inspiration from. In AWK, the last example would simply be written as
/^DEBUG/
Because AWK is a DSL for solving problems related to filtering text files line-by-line, everything is already wrapped in a while (<>) { ... }, regular expressions are always matched against the current line ($0) by default, and a pattern with no action attached prints the current line. Indeed, perl has command-line flags to operate more like AWK for this kind of line-oriented filtering.
perl -ne "print if /^DEBUG/"
Still an evolving language
Perl recently had a major release where, after some 35 years, they actually added conventional function contract declarations.
# Activate features from version 5.36 of Perl, this is to
# maintain backwards-compatibility with older Perl scripts.
use v5.36;
sub f($x, $y) {
    $x + $y
}
This is a very new feature, so you'll mostly run into the old form when reading Perl code out in the wild. It will also take some time for the Perl community at large to start using this feature, so you should be familiar with both styles.
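As a small sketch of what the declared contract buys you (the sub name add is just for illustration), a call with the wrong number of arguments now fails at the call instead of quietly producing undef somewhere downstream:

```perl
use v5.36;

sub add($x, $y) {
    return $x + $y;
}

say add(2, 3);

# Too few (or too many) arguments is a runtime error with signatures.
my $ok = eval { add(2); 1 };
say "add(2) died: $@" unless $ok;
```

The eval block is only there to catch the error so the script can keep running and print it.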
There have also been previous attempts to add parameter checks to Perl, and you can see remnants of these in code-bases like that of the game Frozen Bubble.
sub add_default_rect($) {
    my ($surface) = @_;
    $rects{$surface} = SDL::Rect->new(0,0,$surface->w, $surface->h);
}
sub put_image($$$) {
    my ($image, $x, $y) = @_;
    $image = translate_mini_image($image);
    $rects{$image} or die "please don't call me with no rects\n".backtrace();
    my $drect = SDL::Rect->new($x, $y, $image->w, $image->h);
    SDL::Video::blit_surface($image, $rects{$image}, $app, $drect);
    push @update_rects, $drect;
}
In this unique style of function declaration, the act of defining the names of arguments, e.g. my ($surface) = @_, and defining the function contract, e.g. put_image($$$) for 3 parameters, are written in different places. I'll go as far as calling it an interesting approach, and I'm glad it didn't make it as the official method declaration syntax (this is Perl's older prototype mechanism, which is built into the language and constrains how calls are parsed rather than naming the parameters.)
The addition of conventional function contract declarations shows a willingness to fix old and deeply rooted mistakes in the design of Perl, which I think bodes well for the language moving forwards.
Call of Cthulhu: Dark Corners of the Implementation
To really get a good grip on the language, I decided to give Advent of Code a try in Perl.
Iterators
I had some code that broke out of a list-iteration loop, and couldn't figure out why it wasn't working. I then created the following example on a hunch
use v5.36;
my @xs = ("abc", "xyz", "qwerty", "hjkl");
# Loop over xs until we hit "xyz"
# note that `last` is the same as `break` in other languages.
while (my $i = each @xs) {
    say "Loop 1: $xs[$i]";
    last if $xs[$i] eq "xyz";
}
# broken out of the loop
# Run the loop again, this time over all of xs
while (my $i = each @xs) {
    say "Loop 2: $xs[$i]";
}
Loop 1: abc
Loop 1: xyz
Loop 2: qwerty
Loop 2: hjkl
So it turns out that iterators like each are not lexically scoped to loops, but are instead associated with the objects themselves.
This is… somewhat horrifying, to the point that calling it a bug or a feature is really up for debate. It definitely feels like an oversight.
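One way to work around the shared iterator, sketched below, is to reset it by hand: the documentation for each and keys describes calling keys on the container as a way to reset its iterator. The @seen array is only there to record what the second loop visits.

```perl
use strict;
use warnings;

my @xs = ("abc", "xyz", "qwerty", "hjkl");
my @seen;

# First loop bails out early, leaving the iterator parked mid-array.
while (my $i = each @xs) {
    last if $xs[$i] eq "xyz";
}

keys @xs;   # void-context call: resets the iterator attached to @xs

# Second loop now starts from the beginning again.
while (my $i = each @xs) {
    push @seen, $xs[$i];
}
print "@seen\n";
```

A plain for my $i (0 .. $#xs) loop sidesteps the shared iterator entirely, which is probably the less surprising choice.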
Implicit trouble
Here I have a function f, and I'm trying to give it the regular expression /2/ as a parameter with f($_, /2/):
use v5.36;
sub f {
    my ($x, $regex) = @_;
    #say "Given regex: $regex";
    $x =~ $regex
}
my @xs = (1,2,3);
foreach (@xs) {
    print "Does '$_' match the regex /2/: ";
    if (f($_, /2/)) { say "yes" }
    else { say "no" }
}
Does '1' match the regex /2/: yes
Does '2' match the regex /2/: no
Does '3' match the regex /2/: no
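Oops, remember our old helpful friend the implicit argument $_? Yeah, I sure didn't when writing this code. This code doesn't pass /2/ to f; it instead matches the regular expression against $_, and passes the result of that match to f. When the match succeeds, the result (1) is then reinterpreted as the regex /1/ and matched against the input. When it fails, the result is an empty list, so f never receives a second argument at all and $regex ends up undefined.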
You have to explicitly quote regexes with qr when passing them as values, so the invocation of f becomes
f($_, qr/2/)
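A minimal sketch of the fixed version, with the helper renamed to matches (a made-up name) and returning a string so the result is easy to print. The qr// operator produces a compiled regex value, and nothing gets matched against $_ at the call site.

```perl
use strict;
use warnings;

sub matches {
    my ($x, $regex) = @_;
    return $x =~ $regex ? "yes" : "no";
}

my $two = qr/2/;   # a regex value that can be stored and passed around
my @answers = map { matches($_, $two) } (1, 2, 3);
print "@answers\n";
```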
Unicode
Python had its painful 2to3 transition, where the old US/EU-centric everything-is-ascii-or-latin1 legacy was thrown out with the bathwater. Perl took a different path: Perl 6 fizzled out into its own very niche language called Raku, and Perl 5 instead received Unicode add-on features that need to be switched on.
In Perl today, there's quite a lot of ceremony involved in making everything speak UTF-8.
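As a rough sketch of what that ceremony tends to look like (this is an assumption about a typical preamble, not a canonical or complete list), each line below switches on UTF-8 for a different part of the program:

```perl
use strict;
use warnings;
use utf8;                             # the source file itself is UTF-8
use open qw(:std :encoding(UTF-8));   # STDIN/STDOUT/STDERR, plus the default for open()
use Encode qw(decode);

# Command-line arguments arrive as raw bytes and need explicit decoding.
my @args = map { decode("UTF-8", $_) } @ARGV;

my $word = "blåbærsyltetøy";
# With `use utf8`, length counts characters rather than bytes.
print $word, " has ", length($word), " characters\n";
```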
Yeah.
Alternatively you can drop some of this by setting the environment variable PERL_UNICODE=AS, but relying on an environment variable for correct Unicode handling seems pretty wild to me. Having a shell script associated with every Perl script containing something like this would also be a bit ridiculous.
#!/usr/bin/env sh
export PERL_UNICODE=AS
exec perl "$(dirname "$0")/$(basename "$0" .sh).pl" "$@"
Now it feels like I'm shipping a Java application.
I think it would be reasonable to expect use utf8 to handle all of this, and to have that enabled by use v5.36 or any other sufficiently high version.
Instead one needs to either put that large chunk of text at the top of all scripts, or rely on use utf8::all, which is an external module. And now you're no longer able to just throw your script onto any Linux or BSD box; you have to mess around with dependency management.
Perl really needs to upstream the external utf8::all module or something like it, and to enable it by default in a future version with use v5.xx.
The Perl ecosystem and CPAN
Perl was once as widespread as PHP or Ruby are today, but now I'd estimate the active user-base to be more comparable to Vala, which is a language you probably haven't even heard of. But unlike Vala, even though you haven't written any Perl, you have heard of it.
And that explosion of usage in the 80s and 90s, followed by the precipitous decline ever since, really makes the Perl community feel very empty.
That's not to say you can't find what you need though, and there are even some striking advantages.
The average quality of information related to Perl on the web is far greater than that of Ruby/Python resources. For the more recently popular languages there's far too much noise from low effort growth-hacking blogspam, written primarily for search engines and not people.
As for libraries, Perl was the first programming language community to create a massive online library of code for use as building blocks in other projects. Like crates.io for Rust or RubyGems.org for Ruby, CPAN (the Comprehensive Perl Archive Network) has been serving the Perl community since 1995.
The standard CLI tool for interacting with CPAN, just called cpan, is easy to use. For working with TOML files you'd simply execute cpan TOML, and it gets installed globally (more on this later.)
The CPAN client makes a few strange decisions. It outputs a lot of information by default, including all the C-compiler invocations for extensions. It also runs all test-suites for installed libraries, which is something I've never seen a package manager do for user installations. This makes CPAN very slow, as test cases don't tend to be optimized for speed.
A real application
After having used Perl to solve some programming exercises, I decided to reach for it again when a friend of mine asked me for a software development favor. It was just a simple web-scraping job, and since I was doing this Perl project at the time, I figured it was a good opportunity to write a real Perl program.
I knew Perl had WWW::Mechanize for this task, because of the Ruby and Python modules that were named after it, so I decided to give it a shot with Perl.
I was also able to find WWW::Mailgun for sending emails, and WWW::Mechanize::TreeBuilder for scanning the DOM. The script was simple enough to write, and the documentation for the dependencies was adequate.
Now I needed to deploy this highly critical Perl service in production. I decided to use a systemd timer to run a oneshot Docker container that would host the script. It ended up being feasible to create a relatively small Alpine container, sitting at a nice 69 MB:
FROM alpine:3.16.2
RUN mkdir -p /usr/src/perl
WORKDIR /usr/src/perl
ENV PATH="/opt/perl/bin:${PATH}"
COPY cpanfile /etc/cpanfile
RUN apk update && apk upgrade && apk add --no-cache \
build-base \
curl \
gcc \
gnupg \
make \
openssl \
openssl-dev \
tar \
zlib \
zlib-dev \
ca-certificates \
&& rm -rf /var/cache/apk/* \
&& curl -SLO https://www.cpan.org/src/5.0/perl-5.36.0.tar.gz \
&& echo 'e26085af8ac396f62add8a533c3a0ea8c8497d836f0689347ac5abd7b7a4e00a *perl-5.36.0.tar.gz' | sha256sum -c - \
&& tar --strip-components=1 -xzf perl-5.36.0.tar.gz -C /usr/src/perl \
&& rm perl-5.36.0.tar.gz \
&& ./Configure -des \
-Duse64bitall \
-Dcccdlflags='-fPIC' \
-Dccdlflags='-rdynamic' \
-Dlocincpth=' ' \
-Duselargefiles \
-Duseshrplib \
-Dd_semctl_semun \
-Dusenm \
-Dprefix='/opt/perl' \
-Doptimize='-Os -march=x86-64-v3 -flto' \
&& make libperl.so \
&& make -j$(nproc) \
&& make install \
&& rm -rf /usr/src/perl \
&& curl -o /tmp/cpm -sL --compressed https://raw.githubusercontent.com/skaji/cpm/f437699963027b94951500d1e923937adf08efee/cpm \
&& chmod 755 /tmp/cpm \
&& /tmp/cpm install --cpanfile=/etc/cpanfile --show-build-log-on-failure -g \
&& rm -rf /root/.perl-cpm /tmp/cpm \
&& apk del \
openssl-dev \
zlib-dev \
build-base \
curl \
gcc \
gnupg \
make \
openssl \
tar \
zlib \
&& find /opt/perl -name "*.h" -exec rm '{}' \;
WORKDIR /opt/perl
COPY varsel.pl /usr/local/bin/varsel
CMD ["perl", "/usr/local/bin/varsel"]
Importantly, many Perl modules require a C compiler to be present, and many may require external libraries. Our application, however, does not need these build tools at runtime, so they are installed and removed in the same RUN instruction to avoid bloating the image (note: since everything is statically linked in Alpine Linux we are able to remove the openssl libs too.)
The Perl ecosystem hasn't settled on any one solution for managing project dependencies, and from what I can tell the conventional way to handle this in Perl deployments has historically been to just kinda wing it and put everything in the global Perl module directory with cpan. This isn't just Perl's fault; its heyday predates a lot of currently accepted best practices. (App::Virtualenv, a Perl virtual environment, also exists.)
The default Perl package manager cpan does not include a standard file format for specifying dependencies, but a fairly new package manager for Perl called cpm supports one, the cpanfile.
requires 'WWW::Mechanize', '==2.15';
requires 'WWW::Mechanize::TreeBuilder', '==1.20000';
requires 'WWW::Mailgun', '==0.54';
requires 'LWP::Protocol::https', '==6.10';
requires 'TOML', '==0.97';
Annoyingly, specifying requires 'module', '1.0' does not restrict module to 1.0, or even to 1.x. It actually just implies requires 'module', '>=1.0'. But you can specify a strict requirement with requires 'module', '==1.0'.
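These semantics can be poked at directly with CPAN::Meta::Requirements, a core module that implements this kind of version-requirement string. The sketch below assumes cpanfile requirements follow the same rules, and the three module names are made up for illustration.

```perl
use strict;
use warnings;
use CPAN::Meta::Requirements;

my $req = CPAN::Meta::Requirements->new;
$req->add_string_requirement('Loose::Module'  => '1.0');             # bare version: a minimum
$req->add_string_requirement('Strict::Module' => '== 1.0');          # exact pin
$req->add_string_requirement('Ranged::Module' => '>= 1.0, < 2.0');   # bounded range

for my $check (['Loose::Module', '3.2'], ['Strict::Module', '1.1'], ['Ranged::Module', '1.9']) {
    my ($module, $version) = @$check;
    my $verdict = $req->accepts_module($module, $version) ? "accepted" : "rejected";
    print "$module $version: $verdict\n";
}
```

The range form in the last requirement is a middle ground between "anything newer" and an exact pin.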
The cpanfile is then just copied into the container and loaded by the cpm package manager with cpm install --cpanfile=/etc/cpanfile -g.
Kinda fun tho
It turns out Perl is actually a really good language for recreational programming. Its implicit subject $_, though leaky, sometimes feels very natural. Its supplemental or and and, alongside the || and && operators, make for fun control flow without needing parentheses, and in general the abundance of built-ins with overlapping functionality adds a lot of flavour.
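To give a taste of that control flow, here is a small sketch with a made-up parse_port helper. Because or and and bind more loosely than almost everything else, each line reads as "check this, otherwise do that" without any parentheses around the condition.

```perl
use strict;
use warnings;

# Hypothetical helper: validates a port number given as text.
sub parse_port {
    my ($text) = @_;
    $text =~ /^(\d+)$/ or  return "invalid";     # bail out unless it is all digits
    $1 <= 65535        and return "port $1";     # happy path
    return "out of range";
}

print parse_port($_), "\n" for ("8080", "abc", "70000");
```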
Perl is big, with many strange corners and odd behaviors, and that kind of space is fun to explore in puzzles.
If I was a project manager, working on a big "Software Engineering" project, I would ban Perl. But for single-author or small-team utilities with a limited scope, I think it works well. Most importantly, I think it's just more fun to write Perl than Python.
It's like the Vim of programming languages. Esoteric, hard to master, organically developed, but it makes boring things more fun. I think the additional cognitive load added because of all the novel ways to express solutions is comparable to a stress ball or fidget toy when working.
If you're just a hacker looking at a problem that a 200 line script can solve, Perl is still a really good option, and damn fun too.