I always wondered why some people write unreadable Perl. The most common reason given seems to be ‘It’s faster that way’.
And so… using DTrace and the extra probes I added (see the Subversion repository with a patched Perl 5.8.8), I thought I’d take a look.
# dtrace -l | grep perl
85614 perl1226 libperl.so Perl_sv_free del_sv
85615 perl1226 libperl.so Perl_sv_replace del_sv
85616 perl1226 libperl.so perl_run main_enter
85617 perl1226 libperl.so perl_parse main_enter
85618 perl1226 libperl.so perl_destruct main_enter
85619 perl1226 libperl.so perl_construct main_enter
85620 perl1226 libperl.so perl_alloc main_enter
85621 perl1226 libperl.so perl_run main_exit
85622 perl1226 libperl.so perl_parse main_exit
85623 perl1226 libperl.so perl_destruct main_exit
85624 perl1226 libperl.so perl_construct main_exit
85625 perl1226 libperl.so perl_alloc main_exit
85626 perl1226 libperl.so Perl_sv_dup new_sv
85627 perl1226 libperl.so Perl_newSVrv new_sv
85628 perl1226 libperl.so Perl_newSVsv new_sv
85629 perl1226 libperl.so Perl_newRV_noinc new_sv
85630 perl1226 libperl.so Perl_newSVuv new_sv
85631 perl1226 libperl.so Perl_newSViv new_sv
85632 perl1226 libperl.so Perl_newSVnv new_sv
85633 perl1226 libperl.so Perl_vnewSVpvf new_sv
85634 perl1226 libperl.so Perl_newSVpvn_share new_sv
85635 perl1226 libperl.so Perl_newSVhek new_sv
85636 perl1226 libperl.so Perl_newSVpvn new_sv
85637 perl1226 libperl.so Perl_newSVpv new_sv
85638 perl1226 libperl.so Perl_sv_newmortal new_sv
85639 perl1226 libperl.so Perl_sv_mortalcopy new_sv
85640 perl1226 libperl.so Perl_newSV new_sv
85641 perl1226 libperl.so Perl_pp_sort sub-entry
85642 perl1226 libperl.so Perl_pp_dbstate sub-entry
85643 perl1226 libperl.so Perl_pp_entersub sub-entry
85644 perl1226 libperl.so Perl_pp_last sub-return
85645 perl1226 libperl.so Perl_pp_return sub-return
85646 perl1226 libperl.so Perl_dounwind sub-return
85647 perl1226 libperl.so Perl_pp_leavesublv sub-return
85648 perl1226 libperl.so Perl_pp_leavesub sub-return
Using these probes, we can write some ‘D’ that tells us what Perl is doing at each of its phases – startup, parsing, execution, and cleanup.
First off, accessing function call parameters:
Take three essentially identical programs:
#!/usr/local/bin/perl -Tw
use strict;
my $initial = "there once was a fish. Its feet were small";
my $post = func($initial);
print "$post\n";
sub func {
    $_[0] =~ s/there/There/;
    return $_[0];
}

#!/usr/local/bin/perl -Tw
use strict;
my $initial = "there once was a fish. Its feet were small";
my $post = func($initial);
print "$post\n";
sub func {
    my ($val) = @_;
    $val =~ s/there/There/;
    return $val;
}

#!/usr/local/bin/perl -Tw
use strict;
my $initial = "there once was a fish. Its feet were small";
my $post = func($initial);
print "$post\n";
sub func {
    my $val = shift;
    $val =~ s/there/There/;
    return $val;
}
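One behavioural difference between these variants is worth keeping in mind when comparing them: the elements of @_ are aliases for the caller's arguments, so the $_[0] version changes the caller's variable, while the versions that copy into $val leave it alone. A minimal sketch of that difference follows; the sub names func_index and func_shift are made up for this illustration and do not appear in the scripts above.

```perl
use strict;
use warnings;

# Hypothetical name: the "$_[0]" convention.
# $_[0] is an alias for the caller's variable, so this edits it in place.
sub func_index {
    $_[0] =~ s/there/There/;
    return $_[0];
}

# Hypothetical name: the "my $val = shift" convention.
# The argument is copied first, so only the copy is modified.
sub func_shift {
    my $val = shift;
    $val =~ s/there/There/;
    return $val;
}

my $first  = "there once was a fish";
my $second = "there once was a fish";

my $post_index = func_index($first);
my $post_shift = func_shift($second);

print "index: caller now has '$first'\n";     # first word is now "There"
print "shift: caller still has '$second'\n";  # unchanged
```

So the three scripts print the same line, but only the first one also rewrites $initial as a side effect.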
There is a myth that using $_[0] is faster, as it doesn’t create a temporary variable…
DTrace (using the general Perl stats-gathering DTrace script) shows this to be untrue:
== call1.pl ==========================================================
perl*::perl_alloc:main_enter
perl*::perl_alloc:main_exit, (0/0) (53119 nS)
perl*::perl_construct:main_enter
perl*::perl_construct:main_exit, (12/0) (564370 nS)
perl*::perl_parse:main_enter
--> BEGIN, ./call1.pl
--> bits, /usr/local/lib/perl5/5.8.8/strict.pm
<-- bits, /usr/local/lib/perl5/5.8.8/strict.pm (3/2) (48060 nS)
--> import, /usr/local/lib/perl5/5.8.8/strict.pm
<-- import, /usr/local/lib/perl5/5.8.8/strict.pm (1/0) (15398 nS)
<-- BEGIN, ./call1.pl (160/80) (1025874 nS)
perl*::perl_parse:main_exit, (299/42) (2856399 nS)
perl*::perl_run:main_enter
--> func, ./call1.pl
<-- func, ./call1.pl (1/0) (47723 nS)
perl*::perl_run:main_exit, (0/1) (265677 nS)
perl*::perl_destruct:main_enter
perl*::perl_destruct:main_exit, (0/2) (20763 nS)
total, total (0/0) (3789064 nS)

== call2.pl ==========================================================
perl*::perl_alloc:main_enter
perl*::perl_alloc:main_exit, (0/0) (53251 nS)
perl*::perl_construct:main_enter
perl*::perl_construct:main_exit, (12/0) (509684 nS)
perl*::perl_parse:main_enter
--> BEGIN, ./call2.pl
--> bits, /usr/local/lib/perl5/5.8.8/strict.pm
<-- bits, /usr/local/lib/perl5/5.8.8/strict.pm (3/2) (36748 nS)
--> import, /usr/local/lib/perl5/5.8.8/strict.pm
<-- import, /usr/local/lib/perl5/5.8.8/strict.pm (1/0) (9797 nS)
<-- BEGIN, ./call2.pl (160/80) (924250 nS)
perl*::perl_parse:main_exit, (299/38) (2545953 nS)
perl*::perl_run:main_enter
--> func, ./call2.pl
<-- func, ./call2.pl (1/0) (42165 nS)
perl*::perl_run:main_exit, (0/1) (142393 nS)
perl*::perl_destruct:main_enter
perl*::perl_destruct:main_exit, (0/2) (20851 nS)
total, total (0/0) (3301007 nS)

== call3.pl ==========================================================
perl*::perl_alloc:main_enter
perl*::perl_alloc:main_exit, (0/0) (52927 nS)
perl*::perl_construct:main_enter
perl*::perl_construct:main_exit, (12/0) (607783 nS)
perl*::perl_parse:main_enter
--> BEGIN, ./call3.pl
--> bits, /usr/local/lib/perl5/5.8.8/strict.pm
<-- bits, /usr/local/lib/perl5/5.8.8/strict.pm (3/2) (37066 nS)
--> import, /usr/local/lib/perl5/5.8.8/strict.pm
<-- import, /usr/local/lib/perl5/5.8.8/strict.pm (1/0) (10171 nS)
<-- BEGIN, ./call3.pl (160/80) (924824 nS)
perl*::perl_parse:main_exit, (297/37) (2543981 nS)
perl*::perl_run:main_enter
--> func, ./call3.pl
<-- func, ./call3.pl (1/0) (41833 nS)
perl*::perl_run:main_exit, (0/1) (140527 nS)
perl*::perl_destruct:main_enter
perl*::perl_destruct:main_exit, (0/2) (20273 nS)
total, total (0/0) (3395310 nS)

allocations / deallocations:
474 / 122 call3.pl
476 / 123 call2.pl
476 / 127 call1.pl
Each line of the output ends with an (allocations/deallocations) pair, such as (0/1), for the span it closes. The
“<-- func” line is the same in all three scripts, (1/0): one allocation.
After all the test runs, I also print out the total allocations for the script,
and it seems that the “my $val = shift” version is the most efficient –
using two fewer allocations (apparently during the parse phase).
The deallocation count is interesting too – with “$_[0]” using 5 more deallocations during
the parse phase and “my ($val) = @_;” using one more than the “my $val = shift” option.
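The script totals can be reproduced by adding up the per-line pairs. The sketch below does this for the call1.pl figures quoted above. It assumes each pair counts only the work done directly in that span and excludes nested spans, which is consistent with the numbers adding up to the reported 476 / 127. The helper name total_pairs and the shortened line labels are made up for this example.

```perl
use strict;
use warnings;

# (allocations/deallocations) pairs copied from the call1.pl output above;
# the labels are shortened, only the pairs matter here.
my @trace = (
    'perl_alloc     (0/0)',
    'perl_construct (12/0)',
    'bits           (3/2)',
    'import         (1/0)',
    'BEGIN          (160/80)',
    'perl_parse     (299/42)',
    'func           (1/0)',
    'perl_run       (0/1)',
    'perl_destruct  (0/2)',
);

# Hypothetical helper: sum every "(N/M)" pair found in the given lines.
sub total_pairs {
    my ($alloc, $free) = (0, 0);
    for my $line (@_) {
        next unless $line =~ m{\((\d+)/(\d+)\)};
        $alloc += $1;
        $free  += $2;
    }
    return ($alloc, $free);
}

my ($alloc, $free) = total_pairs(@trace);
print "$alloc / $free call1.pl\n";   # prints "476 / 127 call1.pl"
```

Running the same sum over the call2.pl and call3.pl pairs shows where the totals diverge: only the perl_parse line differs between the three scripts.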
Trying to reduce the allocations further by working on $_ directly doesn’t seem to help: the following code results in 474 allocations,
the same as the shift case, but with 3 extra deallocations, again in the parsing phase. Increasing the number of times that func
is called only increases the benefits of using shift.
#!/usr/local/bin/perl -Tw
use strict;
my $initial = "there once was a fish. Its feet were small";
$_ = $initial;
my $post = func();
print "$post\n";
sub func {
    s/there/There/;
    return $_;
}
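For a cross-check that does not need the patched interpreter, the four conventions can also be timed over many calls with Perl's core Benchmark module. This is only a sketch: it measures CPU time per call rather than allocations, the sub names are made up for this example, and the results will vary with the machine and Perl build.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $initial = "there once was a fish. Its feet were small";

# Hypothetical names, one per calling convention discussed in this post.
sub func_index { $_[0] =~ s/there/There/; return $_[0]; }
sub func_list  { my ($val) = @_; $val =~ s/there/There/; return $val; }
sub func_shift { my $val = shift; $val =~ s/there/There/; return $val; }
sub func_topic { s/there/There/; return $_; }

# Every case starts from a fresh copy of the string, so each call
# performs the same substitution. A negative count tells Benchmark
# to run each case for at least that many CPU seconds.
cmpthese(-1, {
    index => sub { my $s = $initial; func_index($s) },
    list  => sub { my $s = $initial; func_list($s) },
    shift => sub { my $s = $initial; func_shift($s) },
    topic => sub { local $_ = $initial; func_topic() },
});
```

cmpthese prints a table of calls per second with the relative differences between the cases, which is a quick way to see whether any gap stays the same as the call count grows.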
Interestingly, “my $val = shift” not only has the shortest func and perl_run times of the three conventions in these runs, but it also seems that the conventions do not differ in allocations at run time: each shows the same (1/0) for func, and all the differences arise during the parse phase. I guess I’ll have to construct a more complex case, using references / hashes – next time 🙂