Closing the first DTrace loop
I ended up spending what time I had for a couple of days at LCA optimizing Puppet's lexer. As I mentioned in my first DTrace post, I'm mostly just trying to use existing scripts for now and worry about understanding the darn thing later.
It turns out that the Swiss Army Knife of available scripts for Ruby is the rb_calltime.d script in the DTraceToolkit. This one gives us pretty much everything we might want to know in a first pass of debugging a Ruby program, including (most importantly for me) the inclusive and exclusive elapsed times for each method in the system (DTrace insists upon calling them functions, but it's Ruby, so we know better).
Unfortunately, I'd already hacked out most of the optimization before I discovered this script, and it doesn't seem to want to run at the moment, so all I can do is show the current data. Here are the methods that take the most time inclusively (meaning that it counts the time from entry to exit, and thus just tells us where the time is being spent in the overall program, not where the actual problems might lie):
lexer.rb func Puppet::Parser::Lexer::munge_token 12062622 lexer.rb func Puppet::Parser::Lexer::TokenList::lookup 12410291 branch.rb func Puppet::Parser::AST::Branch::initialize 19084001 methodhelper.rb func Hash::each 19144097 methodhelper.rb func Object::set_options 20712067 ast.rb func Puppet::Parser::AST::initialize 22431107 lexer.rb func Array::each 23868735 parser_support.rb func Class::new 28520157 lexer.rb func Puppet::Parser::Lexer::find_string_token 29610463 lexer.rb func Puppet::Parser::Lexer::find_regex_token 29844180 parser_support.rb func Puppet::Parser::Parser::ast 36132924 lexer.rb func Puppet::Parser::Lexer::find_token 44581509 lexer.rb func Object::catch 46187059
And here are the exclusive times (i.e., only counting the time spent in each method, not the time between entry and exit):
ast.rb func Puppet::Parser::AST::initialize 1719039 lexer.rb func Puppet::Parser::Lexer::Token::convert 1920189 branch.rb func Puppet::Parser::AST::Branch::initialize 2062436 lexer.rb func Object::catch 2262789 lexer.rb func StringScanner::match? 2471109 parser_support.rb func Class::new 3112051 lexer.rb func Puppet::Parser::Lexer::skip 3282333 lexer.rb func Puppet::Parser::Lexer::find_token 4930508 lexer.rb func Puppet::Parser::Lexer::munge_token 5232851 lexer.rb func Puppet::Parser::Lexer::find_regex_token 5572052 lexer.rb func Puppet::Parser::Lexer::TokenList::lookup 6659161 parser_support.rb func Puppet::Parser::Parser::ast 7144763 lexer.rb func Hash::[] 7179650 methodhelper.rb func Hash::each 14954253 lexer.rb func Puppet::Parser::Lexer::find_string_token 17045065 lexer.rb func Array::each 20768233 - total - 132806477
Given this data, we're spending about 1/7th of the total parsing time just in the find_string_token method, which currently looks like this:
def find_string_token
matched_token = value = nil
# We know our longest string token is three chars, so try each size in turn
# until we either match or run out of chars. This way our worst-case is three
# tries, where it is otherwise the number of string chars we have. Also,
# the lookups are optimized hash lookups, instead of regex scans.
[3, 2, 1].each do |i|
str = @scanner.peek(i)
if matched_token = TOKENS.lookup(str)
value = @scanner.scan(matched_token.regex)
break
end
end
return matched_token, value
end
The method is responsible for determining if the next token is a simple string-based token. I've optimized it by taking the fact that the longest string-based tokens are three characters (the <<| and |>> tokens), so I look for three character matches, then two, then one. If I don't get a match by then, then we don't have a match. I could probably optimize further, since these three character tokens are pretty darn rare, especially compared to the one character tokens, but I'd need to hard-code a lot more knowledge about the token list, and really, this iteration should be delimited by an automatic determination of the longest token, rather than hard-coding it.
This really isn't a very good write-up of what I did with DTrace or how it was helpful, other than showing the interesting differences between the exclusive and inclusive data, and letting you know that the rb_calltimes.d script is the one to start with, but hey, that's more than I could find when I started looking, so hopefully this will get you somewhere.
I expect to continue spending more time using DTrace for optimizations, and I'll hopefully start uploading my data so I don't have to worry about taking these snapshots. Graphs would sure be nice....
Tue, 05 Feb 2008 | Tags: tools, dtrace, puppet, performance, optimization, profiling, profiler
A bit more DTrace
(This should have been posted a while ago, but I guess I had a problem and it's been sitting uncommitted for a while.)
After pulling apart the skip method in the lexer, so that the various parts are in separate methods, I get this as my count:
Puppet::Parser::Lexer munge_token 56778 358 20335592 Class new 28242 889 25132822 Puppet::Parser::Parser ast 25881 1147 29695496 Fixnum < 1817071 16 30723097 StringScanner check 1829886 26 48732560 String length 3757782 20 78611361 Puppet::Parser::Lexer::TokenList each 56778 6618 375813485 Puppet::Parser::Lexer find_token 56778 6714 381227038 Hash each 84949 4563 387630769 Puppet::Parser::Parser import 9 45754308 411788774 Puppet::Parser::Parser _reduce_132 9 45755009 411795083 Object catch 56018 8086 452970031 Puppet::Parser::Lexer scan 173 2751816 476064309 Racc::Parser _racc_yyparse_c 173 2751907 476080064 Object __send__ 173 2751984 476093248 Racc::Parser yyparse 173 2752322 476151712 Puppet::Parser::Parser parse 173 2752742 476224530 Array collect 331 1446548 478807659 Array each 26303 18476 485983221
The interesting one there is the Lexer.find_token method -- I just created that, and it looks like it's taking 38/48 of the total parse time, which is a helluva lot.
This method is responsible for picking the token to return, and the complicated aspect of the method is that it has to return the longest match, which is currently done by matching each token in turn (skipping those that don't match), and picking the longest match. This is expensive, because it means that every token is iterated over for every returned token, which means it scales at O(N^2), which is bad.
Mon, 28 Jan 2008 | Tags: tools, dtrace, ruby, programming
A first pass at DTrace
I've never really spent much time optimizing Puppet except in those areas that get particular complaints (and not always then), but now that I'm forced to run Leopard I figured I should see if I can put DTrace to use.
The first pass used the functime.d script, which tells me how long Puppet spends in each function. I couldn't get the file to execute directly, and I also couldn't get it to execute my script for me (which is a pretty good indication that I don't really know how to use DTrace), so I added the ability to pause my test script, giving me time to start dtrace. So, I run my test script, which I'm using to test parse time:
~/puppet/ext/puppet-test --modulepath /Users/luke/Desktop/puppet-stanford/modules/ -s parser -t parse --manifest ~/Desktop/puppet-stanford/master/manifests/site.pp -p
Then I run the dtrace script:
sudo dtrace -s ./functime.d -p 45847 2>&1 | tee functimes.log
This takes a heckuva long time to run (380 seconds or so, vs. about 6 normally), but in the end I get a big file that has histograms for all of the classes and methods, along with a sorted list of how long Puppet spends in each method. E.g., here's a histogram:
Puppet::Parser::Parser parse
value ------------- Distribution ------------- count
8388608 | 0
16777216 | 1
33554432 | 0
67108864 | 1
134217728 |@@@@@@@ 30
268435456 |@@@@@@@@@ 39
536870912 |@@@@@@@@@@@ 49
1073741824 |@@@@@ 21
2147483648 |@@@ 14
4294967296 |@@ 10
8589934592 |@ 6
17179869184 | 2
34359738368 | 0
And here's a few of the methods:
Puppet::Parser::Parser ast 25881 1090 28219110 NilClass nil? 1982238 19 38713614 StringScanner check 2044008 24 50380323 Hash each 84949 2789 236945697 Puppet::Parser::Parser import 9 29288385 263595467 Puppet::Parser::Parser _reduce_132 9 29289048 263601440 Object catch 56018 5403 302711262 Puppet::Parser::Lexer scan 173 1752645 303207689 Racc::Parser _racc_yyparse_c 173 1752730 303222439 Object __send__ 173 1752803 303234951 Racc::Parser yyparse 173 1753138 303292971 Puppet::Parser::Parser parse 173 1753536 303361785 Array collect 331 925551 306357434 Array each 26303 11912 313340970
The first annoying thing to notice about this is that this test is clearly collecting total time between method entry and exit, not the total time that we're in a method, which makes it a bit less useful for testing.
The next thing to notice is that we're calling nil? and check a ton of times, which adds up even though they're individually very cheap.
If we add up all of the calls to check and nil?, we get a bit less than half of the total run time of the parse method (which is the entry point to all of this code), which means they're having a big impact.
This really isn't anything I couldn't get from normal Ruby profiling, but based on my experience working with Brendan a bit at OSCON last year, I know there's much more available.
My next post on DTrace will hopefully include me covering how I used it to drill down a bit further.