I always wondered why some people write unreadable Perl. The most common reason given seems to be ‘It’s faster that way’.
So, using DTrace and the extra probes I added (see the subversion repository with a patched Perl 5.8.8), I thought I’d take a look.
# dtrace -l | grep perl
85614 perl1226 libperl.so Perl_sv_free del_sv
85615 perl1226 libperl.so Perl_sv_replace del_sv
85616 perl1226 libperl.so perl_run main_enter
85617 perl1226 libperl.so perl_parse main_enter
85618 perl1226 libperl.so perl_destruct main_enter
85619 perl1226 libperl.so perl_construct main_enter
85620 perl1226 libperl.so perl_alloc main_enter
85621 perl1226 libperl.so perl_run main_exit
85622 perl1226 libperl.so perl_parse main_exit
85623 perl1226 libperl.so perl_destruct main_exit
85624 perl1226 libperl.so perl_construct main_exit
85625 perl1226 libperl.so perl_alloc main_exit
85626 perl1226 libperl.so Perl_sv_dup new_sv
85627 perl1226 libperl.so Perl_newSVrv new_sv
85628 perl1226 libperl.so Perl_newSVsv new_sv
85629 perl1226 libperl.so Perl_newRV_noinc new_sv
85630 perl1226 libperl.so Perl_newSVuv new_sv
85631 perl1226 libperl.so Perl_newSViv new_sv
85632 perl1226 libperl.so Perl_newSVnv new_sv
85633 perl1226 libperl.so Perl_vnewSVpvf new_sv
85634 perl1226 libperl.so Perl_newSVpvn_share new_sv
85635 perl1226 libperl.so Perl_newSVhek new_sv
85636 perl1226 libperl.so Perl_newSVpvn new_sv
85637 perl1226 libperl.so Perl_newSVpv new_sv
85638 perl1226 libperl.so Perl_sv_newmortal new_sv
85639 perl1226 libperl.so Perl_sv_mortalcopy new_sv
85640 perl1226 libperl.so Perl_newSV new_sv
85641 perl1226 libperl.so Perl_pp_sort sub-entry
85642 perl1226 libperl.so Perl_pp_dbstate sub-entry
85643 perl1226 libperl.so Perl_pp_entersub sub-entry
85644 perl1226 libperl.so Perl_pp_last sub-return
85645 perl1226 libperl.so Perl_pp_return sub-return
85646 perl1226 libperl.so Perl_dounwind sub-return
85647 perl1226 libperl.so Perl_pp_leavesublv sub-return
85648 perl1226 libperl.so Perl_pp_leavesub sub-return
Using these probes, we can write some ‘D’ that tells us what Perl is doing at each of its phases – startup, parsing, execution, and cleanup.
First off, accessing function call parameters.
Given three essentially identical programs:
#!/usr/local/bin/perl -Tw
# call1.pl: operate on $_[0] directly
use strict;
my $initial = "there once was a fish. Its feet were small";
my $post = func($initial);
print "$post\n";
sub func {
    $_[0] =~ s/there/There/;
    return $_[0];
}
#!/usr/local/bin/perl -Tw
# call2.pl: copy the argument list into a lexical
use strict;
my $initial = "there once was a fish. Its feet were small";
my $post = func($initial);
print "$post\n";
sub func {
    my ($val) = @_;
    $val =~ s/there/There/;
    return $val;
}
#!/usr/local/bin/perl -Tw
# call3.pl: shift the first argument into a lexical
use strict;
my $initial = "there once was a fish. Its feet were small";
my $post = func($initial);
print "$post\n";
sub func {
    my $val = shift;
    $val =~ s/there/There/;
    return $val;
}
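One caveat on "essentially identical": the elements of @_ are aliases to the caller's arguments, so the $_[0] version also changes $initial in the caller, while the other two work on a copy. The printed result is the same in all three programs, but the side effect differs. A minimal sketch of that difference (the sub names in_place and on_copy are made up for illustration):

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Edits $_[0] directly; because @_ aliases the arguments,
# this changes the caller's variable as well.
sub in_place {
    $_[0] =~ s/there/There/;
    return $_[0];
}

# Works on a private copy, leaving the caller's variable untouched.
sub on_copy {
    my $val = shift;
    $val =~ s/there/There/;
    return $val;
}

my $first  = "there once was a fish";
my $second = "there once was a fish";

my $r1 = in_place($first);
my $r2 = on_copy($second);

print "in_place: returned '$r1', caller now has '$first'\n";
print "on_copy:  returned '$r2', caller still has '$second'\n";
```

Both subs return the capitalised string, but only in_place leaves the caller's variable capitalised too.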
There is a myth that using $_[0] is faster, as it doesn’t create a temporary variable.
DTrace (using the general Perl stats gathering DTrace script) shows this to be untrue:
== call1.pl ==========================================================
perl*::perl_alloc:main_enter
perl*::perl_alloc:main_exit, (0/0) (53119 nS)
perl*::perl_construct:main_enter
perl*::perl_construct:main_exit, (12/0) (564370 nS)
perl*::perl_parse:main_enter
--> BEGIN, ./call1.pl
--> bits, /usr/local/lib/perl5/5.8.8/strict.pm
<-- bits, /usr/local/lib/perl5/5.8.8/strict.pm (3/2) (48060 nS)
--> import, /usr/local/lib/perl5/5.8.8/strict.pm
<-- import, /usr/local/lib/perl5/5.8.8/strict.pm (1/0) (15398 nS)
<-- BEGIN, ./call1.pl (160/80) (1025874 nS)
perl*::perl_parse:main_exit, (299/42) (2856399 nS)
perl*::perl_run:main_enter
--> func, ./call1.pl
<-- func, ./call1.pl (1/0) (47723 nS)
perl*::perl_run:main_exit, (0/1) (265677 nS)
perl*::perl_destruct:main_enter
perl*::perl_destruct:main_exit, (0/2) (20763 nS)
total, total (0/0) (3789064 nS)
== call2.pl ==========================================================
perl*::perl_alloc:main_enter
perl*::perl_alloc:main_exit, (0/0) (53251 nS)
perl*::perl_construct:main_enter
perl*::perl_construct:main_exit, (12/0) (509684 nS)
perl*::perl_parse:main_enter
--> BEGIN, ./call2.pl
--> bits, /usr/local/lib/perl5/5.8.8/strict.pm
<-- bits, /usr/local/lib/perl5/5.8.8/strict.pm (3/2) (36748 nS)
--> import, /usr/local/lib/perl5/5.8.8/strict.pm
<-- import, /usr/local/lib/perl5/5.8.8/strict.pm (1/0) (9797 nS)
<-- BEGIN, ./call2.pl (160/80) (924250 nS)
perl*::perl_parse:main_exit, (299/38) (2545953 nS)
perl*::perl_run:main_enter
--> func, ./call2.pl
<-- func, ./call2.pl (1/0) (42165 nS)
perl*::perl_run:main_exit, (0/1) (142393 nS)
perl*::perl_destruct:main_enter
perl*::perl_destruct:main_exit, (0/2) (20851 nS)
total, total (0/0) (3301007 nS)
== call3.pl ==========================================================
perl*::perl_alloc:main_enter
perl*::perl_alloc:main_exit, (0/0) (52927 nS)
perl*::perl_construct:main_enter
perl*::perl_construct:main_exit, (12/0) (607783 nS)
perl*::perl_parse:main_enter
--> BEGIN, ./call3.pl
--> bits, /usr/local/lib/perl5/5.8.8/strict.pm
<-- bits, /usr/local/lib/perl5/5.8.8/strict.pm (3/2) (37066 nS)
--> import, /usr/local/lib/perl5/5.8.8/strict.pm
<-- import, /usr/local/lib/perl5/5.8.8/strict.pm (1/0) (10171 nS)
<-- BEGIN, ./call3.pl (160/80) (924824 nS)
perl*::perl_parse:main_exit, (297/37) (2543981 nS)
perl*::perl_run:main_enter
--> func, ./call3.pl
<-- func, ./call3.pl (1/0) (41833 nS)
perl*::perl_run:main_exit, (0/1) (140527 nS)
perl*::perl_destruct:main_enter
perl*::perl_destruct:main_exit, (0/2) (20273 nS)
total, total (0/0) (3395310 nS)
allocations / deallocations:
474 / 122 call3.pl
476 / 123 call2.pl
476 / 127 call1.pl
Counting up the allocations and deallocations from the (allocations/deallocations) pairs in the output, the cost of the call itself,
e.g. “<-- func, ./call2.pl (1/0)”, is always the same: one allocation.
After all the test runs, I also print out the total allocations for the script,
and it seems that the “my $val = shift” version is the most efficient –
using two fewer allocations (apparently during the parse phase).
The deallocation count is interesting too, with “$_[0]” using 5 more deallocations during
the parse phase and “my ($val) = @_;” using one more than the “my $val = shift” option.
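The per-line pairs appear to be exclusive of nested calls, so adding them up reproduces the script totals. A small sketch of that arithmetic, using the counted lines from the call3.pl trace (sum_counts is a helper name invented here, not part of the DTrace script):

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Sum every "(allocations/deallocations)" pair found in a block of trace
# output. The "(NNN nS)" timings contain no slash, so they are skipped.
sub sum_counts {
    my ($trace) = @_;
    my ($allocs, $frees) = (0, 0);
    while ($trace =~ m{\((\d+)/(\d+)\)}g) {
        $allocs += $1;
        $frees  += $2;
    }
    return ($allocs, $frees);
}

# Lines carrying counts, copied from the call3.pl run.
my $call3 = <<'END';
perl*::perl_alloc:main_exit, (0/0) (52927 nS)
perl*::perl_construct:main_exit, (12/0) (607783 nS)
<-- bits, /usr/local/lib/perl5/5.8.8/strict.pm (3/2) (37066 nS)
<-- import, /usr/local/lib/perl5/5.8.8/strict.pm (1/0) (10171 nS)
<-- BEGIN, ./call3.pl (160/80) (924824 nS)
perl*::perl_parse:main_exit, (297/37) (2543981 nS)
<-- func, ./call3.pl (1/0) (41833 nS)
perl*::perl_run:main_exit, (0/1) (140527 nS)
perl*::perl_destruct:main_exit, (0/2) (20273 nS)
total, total (0/0) (3395310 nS)
END

my ($allocs, $frees) = sum_counts($call3);
print "$allocs / $frees call3.pl\n";   # prints "474 / 122 call3.pl"
```

The same sum over the call1.pl and call2.pl traces gives 476 / 127 and 476 / 123, matching the totals listed.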
Attempting to reduce the allocations further doesn’t seem to help: the following code results in 474 allocations,
the same as the shift case, but with 3 extra deallocations, again in the parsing phase. Increasing the number of times that func
is called only increases the benefits of using shift.
#!/usr/local/bin/perl -Tw
# variant that works on the global $_ instead of passing an argument
use strict;
my $initial = "there once was a fish. Its feet were small";
$_ = $initial;
my $post = func();
print "$post\n";
sub func {
    s/there/There/;
    return $_;
}
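For a rough cross-check over many calls without DTrace, the core Benchmark module can time the three argument conventions side by side. This is only a sketch: it measures elapsed CPU time rather than allocations, the sub names by_index, by_list and by_shift are invented for the example, and the iteration count is arbitrary.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $initial = "there once was a fish. Its feet were small";

# The three conventions from the programs above, as separate subs.
sub by_index { $_[0] =~ s/there/There/; return $_[0]; }
sub by_list  { my ($val) = @_; $val =~ s/there/There/; return $val; }
sub by_shift { my $val = shift; $val =~ s/there/There/; return $val; }

# Each case copies $initial first so that by_index, which edits its
# argument in place, sees the same input on every iteration.
cmpthese(200_000, {
    'by_index' => sub { my $s = $initial; by_index($s) },
    'by_list'  => sub { my $s = $initial; by_list($s) },
    'by_shift' => sub { my $s = $initial; by_shift($s) },
});
```

The numbers will vary between machines and Perl versions, so treat any ranking it prints as indicative at best.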
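Interestingly, “my $val = shift” not only makes the fewest allocations, it also has the shortest func and perl_run times of the conventions tested (41833 nS and 140527 nS), although “my ($val) = @_” had the lowest total time in this particular run. It also seems that none of the conventions differ in allocations at run time (each shows a single allocation in func); the differences all arise during the parse phase. I guess I’ll have to construct a more complex case, using references / hashes, next time.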