A bit more DTrace
(This should have been posted a while ago, but I guess I had a problem and it's been sitting uncommitted for a while.)
After pulling apart the skip method in the lexer, so that the various parts are in separate methods, I get this as my count:
Puppet::Parser::Lexer munge_token 56778 358 20335592 Class new 28242 889 25132822 Puppet::Parser::Parser ast 25881 1147 29695496 Fixnum < 1817071 16 30723097 StringScanner check 1829886 26 48732560 String length 3757782 20 78611361 Puppet::Parser::Lexer::TokenList each 56778 6618 375813485 Puppet::Parser::Lexer find_token 56778 6714 381227038 Hash each 84949 4563 387630769 Puppet::Parser::Parser import 9 45754308 411788774 Puppet::Parser::Parser _reduce_132 9 45755009 411795083 Object catch 56018 8086 452970031 Puppet::Parser::Lexer scan 173 2751816 476064309 Racc::Parser _racc_yyparse_c 173 2751907 476080064 Object __send__ 173 2751984 476093248 Racc::Parser yyparse 173 2752322 476151712 Puppet::Parser::Parser parse 173 2752742 476224530 Array collect 331 1446548 478807659 Array each 26303 18476 485983221
The interesting one there is the Lexer.find_token method -- I just created that, and it looks like it's taking 38/48 of the total parse time, which is a helluva lot.
This method is responsible for picking the token to return, and the complicated aspect of the method is that it has to return the longest match, which is currently done by matching each token in turn (skipping those that don't match), and picking the longest match. This is expensive, because it means that every token is iterated over for every returned token, which means it scales at O(N^2), which is bad.
Mon, 28 Jan 2008 | Tags: tools, dtrace, ruby, programming