DTrace shows that readable Perl code is fastest.

I’ve always wondered why some people write unreadable Perl. The most common reason given seems to be ‘It’s faster that way’.

And so, using DTrace and the extra probes I added to a patched Perl 5.8.8 (see the Subversion repository), I thought I’d take a look.

# dtrace -l | grep perl
85614   perl1226        libperl.so                      Perl_sv_free del_sv
85615   perl1226        libperl.so                   Perl_sv_replace del_sv
85616   perl1226        libperl.so                          perl_run main_enter
85617   perl1226        libperl.so                        perl_parse main_enter
85618   perl1226        libperl.so                     perl_destruct main_enter
85619   perl1226        libperl.so                    perl_construct main_enter
85620   perl1226        libperl.so                        perl_alloc main_enter
85621   perl1226        libperl.so                          perl_run main_exit
85622   perl1226        libperl.so                        perl_parse main_exit
85623   perl1226        libperl.so                     perl_destruct main_exit
85624   perl1226        libperl.so                    perl_construct main_exit
85625   perl1226        libperl.so                        perl_alloc main_exit
85626   perl1226        libperl.so                       Perl_sv_dup new_sv
85627   perl1226        libperl.so                      Perl_newSVrv new_sv
85628   perl1226        libperl.so                      Perl_newSVsv new_sv
85629   perl1226        libperl.so                  Perl_newRV_noinc new_sv
85630   perl1226        libperl.so                      Perl_newSVuv new_sv
85631   perl1226        libperl.so                      Perl_newSViv new_sv
85632   perl1226        libperl.so                      Perl_newSVnv new_sv
85633   perl1226        libperl.so                    Perl_vnewSVpvf new_sv
85634   perl1226        libperl.so               Perl_newSVpvn_share new_sv
85635   perl1226        libperl.so                     Perl_newSVhek new_sv
85636   perl1226        libperl.so                     Perl_newSVpvn new_sv
85637   perl1226        libperl.so                      Perl_newSVpv new_sv
85638   perl1226        libperl.so                 Perl_sv_newmortal new_sv
85639   perl1226        libperl.so                Perl_sv_mortalcopy new_sv
85640   perl1226        libperl.so                        Perl_newSV new_sv
85641   perl1226        libperl.so                      Perl_pp_sort sub-entry
85642   perl1226        libperl.so                   Perl_pp_dbstate sub-entry
85643   perl1226        libperl.so                  Perl_pp_entersub sub-entry
85644   perl1226        libperl.so                      Perl_pp_last sub-return
85645   perl1226        libperl.so                    Perl_pp_return sub-return
85646   perl1226        libperl.so                     Perl_dounwind sub-return
85647   perl1226        libperl.so                Perl_pp_leavesublv sub-return
85648   perl1226        libperl.so                  Perl_pp_leavesub sub-return

Using these probes, we can write some ‘D’ that reports what Perl is doing in each of its phases: startup (perl_alloc, perl_construct), parsing (perl_parse), execution (perl_run) and cleanup (perl_destruct).

First off, accessing function call parameters:

Take three essentially identical programs, call1.pl, call2.pl and call3.pl, which differ only in how func reads its argument:

call1.pl:

#!/usr/local/bin/perl -Tw

use strict;

my $initial = "there once was a fish. Its feet were small";
my $post = func($initial);
print "$post\n";

sub func {
    $_[0] =~ s/there/There/;
    return $_[0];
}

call2.pl:

#!/usr/local/bin/perl -Tw

use strict;

my $initial = "there once was a fish. Its feet were small";
my $post = func($initial);
print "$post\n";

sub func {
    my ($val) = @_;
    $val =~ s/there/There/;
    return $val;
}

call3.pl:

#!/usr/local/bin/perl -Tw

use strict;

my $initial = "there once was a fish. Its feet were small";
my $post = func($initial);
print "$post\n";

sub func {
    my $val = shift;
    $val =~ s/there/There/;
    return $val;
}
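The programs are only “essentially” identical: @_ aliases the caller’s arguments, so the s/// on $_[0] in the $_[0] version also rewrites $initial in the caller, while “my ($val) = @_” and “shift” work on a copy. A minimal sketch that shows the difference (the sub names via_alias, via_list and via_shift and the %post and %after hashes are hypothetical, not taken from the scripts above):

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Hypothetical subs, one per calling convention in call1.pl .. call3.pl.
sub via_alias { $_[0] =~ s/there/There/; return $_[0] }                # $_[0]
sub via_list  { my ($val) = @_; $val =~ s/there/There/; return $val }  # my ($val) = @_
sub via_shift { my $val = shift; $val =~ s/there/There/; return $val } # shift

my %convention = (call1 => \&via_alias, call2 => \&via_list, call3 => \&via_shift);
my (%post, %after);    # value returned, and caller's variable after the call

for my $script (sort keys %convention) {
    my $initial = "there once was a fish. Its feet were small";
    $post{$script}  = $convention{$script}->($initial);
    $after{$script} = $initial;    # only call1 sees its $initial rewritten
    print "$script: returned '$post{$script}'; caller now has '$initial'\n";
}
```

All three subs return the same string, but only the $_[0] version changes the caller’s variable, so it is not a drop-in replacement for the other two.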

There is a myth that using $_[0] is faster, as it doesn’t create a temporary variable.
DTrace (using the general Perl stats-gathering D script) shows this to be untrue:

== call1.pl ==========================================================
  perl*::perl_alloc:main_enter
  perl*::perl_alloc:main_exit,  (0/0) (53119 nS)
  perl*::perl_construct:main_enter
  perl*::perl_construct:main_exit,  (12/0) (564370 nS)
  perl*::perl_parse:main_enter
   --> BEGIN, ./call1.pl
    --> bits, /usr/local/lib/perl5/5.8.8/strict.pm
    <-- bits, /usr/local/lib/perl5/5.8.8/strict.pm (3/2) (48060 nS)
    --> import, /usr/local/lib/perl5/5.8.8/strict.pm
    <-- import, /usr/local/lib/perl5/5.8.8/strict.pm (1/0) (15398 nS)
   <-- BEGIN, ./call1.pl (160/80) (1025874 nS)
  perl*::perl_parse:main_exit,  (299/42) (2856399 nS)
  perl*::perl_run:main_enter
   --> func, ./call1.pl
   <-- func, ./call1.pl (1/0) (47723 nS)
  perl*::perl_run:main_exit,  (0/1) (265677 nS)
  perl*::perl_destruct:main_enter
  perl*::perl_destruct:main_exit,  (0/2) (20763 nS)
total, total (0/0) (3789064 nS)
== call2.pl ==========================================================
  perl*::perl_alloc:main_enter
  perl*::perl_alloc:main_exit,  (0/0) (53251 nS)
  perl*::perl_construct:main_enter
  perl*::perl_construct:main_exit,  (12/0) (509684 nS)
  perl*::perl_parse:main_enter
   --> BEGIN, ./call2.pl
    --> bits, /usr/local/lib/perl5/5.8.8/strict.pm
    <-- bits, /usr/local/lib/perl5/5.8.8/strict.pm (3/2) (36748 nS)
    --> import, /usr/local/lib/perl5/5.8.8/strict.pm
    <-- import, /usr/local/lib/perl5/5.8.8/strict.pm (1/0) (9797 nS)
   <-- BEGIN, ./call2.pl (160/80) (924250 nS)
  perl*::perl_parse:main_exit,  (299/38) (2545953 nS)
  perl*::perl_run:main_enter
   --> func, ./call2.pl
   <-- func, ./call2.pl (1/0) (42165 nS)
  perl*::perl_run:main_exit,  (0/1) (142393 nS)
  perl*::perl_destruct:main_enter
  perl*::perl_destruct:main_exit,  (0/2) (20851 nS)
total, total (0/0) (3301007 nS)
== call3.pl ==========================================================
  perl*::perl_alloc:main_enter
  perl*::perl_alloc:main_exit,  (0/0) (52927 nS)
  perl*::perl_construct:main_enter
  perl*::perl_construct:main_exit,  (12/0) (607783 nS)
  perl*::perl_parse:main_enter
   --> BEGIN, ./call3.pl
    --> bits, /usr/local/lib/perl5/5.8.8/strict.pm
    <-- bits, /usr/local/lib/perl5/5.8.8/strict.pm (3/2) (37066 nS)
    --> import, /usr/local/lib/perl5/5.8.8/strict.pm
    <-- import, /usr/local/lib/perl5/5.8.8/strict.pm (1/0) (10171 nS)
   <-- BEGIN, ./call3.pl (160/80) (924824 nS)
  perl*::perl_parse:main_exit,  (297/37) (2543981 nS)
  perl*::perl_run:main_enter
   --> func, ./call3.pl
   <-- func, ./call3.pl (1/0) (41833 nS)
  perl*::perl_run:main_exit,  (0/1) (140527 nS)
  perl*::perl_destruct:main_enter
  perl*::perl_destruct:main_exit,  (0/2) (20273 nS)
total, total (0/0) (3395310 nS)

Total allocations / deallocations per script:
     474 /      122 call3.pl
     476 /      123 call2.pl
     476 /      127 call1.pl

Each (n/m) figure above is the number of allocations / deallocations recorded for that phase or sub.
The “<-- func” line is the same for all three scripts: (1/0), a single allocation.

After all the test runs, I also print out the total allocations for each script,
and the “my $val = shift” version comes out as the most efficient,
using two fewer allocations (apparently in the parse phase: 297 versus 299).

The deallocation count is interesting too: “$_[0]” uses 5 more deallocations during
the parse phase (42 versus 37), and “my ($val) = @_;” uses one more (38) than the “my $val = shift” option.

Trying to reduce the allocations further by passing the value in $_ doesn’t seem to help: the following code results in 474 allocations,
the same as the shift case, but with 3 extra deallocations, again in the parse phase. Calling func more times
only increases the benefit of using shift.

#!/usr/local/bin/perl -Tw

use strict;

my $initial = "there once was a fish. Its feet were small";
$_ = $initial;
my $post = func();
print "$post\n";

sub func {
    s/there/There/;
    return $_;
}
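DTrace counts allocations per phase; a complementary way to see whether the convention matters once func is called many times is the cmpthese function from Perl’s core Benchmark module. This is a sketch, not a measurement from this post: it times CPU cost per call rather than allocations, the sub names are hypothetical, and results depend on the machine and Perl version, so none are quoted here.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $initial = "there once was a fish. Its feet were small";

# Hypothetical subs, one per convention discussed above.
sub via_alias { $_[0] =~ s/there/There/; return $_[0] }                # call1.pl
sub via_list  { my ($val) = @_; $val =~ s/there/There/; return $val }  # call2.pl
sub via_shift { my $val = shift; $val =~ s/there/There/; return $val } # call3.pl
sub via_topic { s/there/There/; return $_ }                            # $_ version

# Run each entry for at least 1 CPU second and print a comparison chart.
# Every entry starts from an unmodified copy of $initial so each s/// matches.
my $rows = cmpthese(-1, {
    'alias' => sub { my $copy = $initial; via_alias($copy) },  # caller copies
    'list'  => sub { via_list($initial) },                     # sub copies
    'shift' => sub { via_shift($initial) },                    # sub copies
    'topic' => sub { local $_ = $initial; via_topic() },       # restore $_ after
});
```

Because via_alias edits its argument in place, its entry makes the copy on the caller’s side; the other subs make that copy themselves, so every entry does the same observable work. The topic entry uses local so that each iteration restores the global $_, which the $_ version above modifies.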

Interestingly, “my $val = shift” is not only the fastest of the conventions tested at run time (it has the lowest func and perl_run times above), but none of the conventions differ in their run-time allocations: func makes the same single allocation in every case, and all of the differences arise during the parse phase. I guess I’ll have to construct a more complex case, using references / hashes, next time 🙂