Ragel generates very inefficient code (in Ruby), so I think you're better off writing a tokenizer by hand or using a generator built specifically for Ruby. I was able to speed up GraphQL's lexing by 2x by "using Ruby" https://github.com/rmosolgo/graphql-ruby/pull/4369
A lovely demonstration of using StringScanner for parsing. One of the less-appreciated libraries in Ruby’s stdlib, but one I wind up using surprisingly often.
RE: https://mastodon.social/users/tenderlove/statuses/109961965253467776
@tenderlove Your matcher for an unknown character is interesting. This seems to be a pain point with strscan, where the scanner gets stuck on input that no rule matches. I've seen a few different riffs on it (checking if pos changed and raising, for example) but I've never seen it just match and emit a token. Very nice!
@adam12 thanks, I think it's a common tactic among tokenizers (though I've not surveyed many tokenizers 😅), also it's what the existing tokenizer did. Nice thing is if it's a streaming tokenizer, the parser can give you a good error without trying to tokenize the entire stream
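For anyone following along, here is a rough sketch of the catch-all idea discussed above, using StringScanner. It is not the code from the linked PR: the `TinyLexer` class, its token names, and its rules are all hypothetical and only meant to illustrate the tactic. When no rule matches, the lexer consumes exactly one character with `getch` and emits it as an `:UNKNOWN` token, so the scanner always advances and a consumer pulling tokens one at a time can stop at the first bad one.

```ruby
require "strscan"

# Hypothetical minimal lexer; the class name, token names, and rules are
# illustrative assumptions, not taken from graphql-ruby.
class TinyLexer
  def initialize(source)
    @scanner = StringScanner.new(source)
  end

  # Returns the next [type, text] pair, or nil at end of input.
  def next_token
    @scanner.skip(/\s+/)
    return nil if @scanner.eos?

    if (text = @scanner.scan(/[A-Za-z_]\w*/))
      [:NAME, text]
    elsif (text = @scanner.scan(/\d+/))
      [:INT, text]
    elsif (text = @scanner.scan(/[{}()]/))
      [:PUNCT, text]
    else
      # Catch-all: consume one character so the scanner can never get stuck,
      # and let the caller decide what to do with the unknown token.
      [:UNKNOWN, @scanner.getch]
    end
  end
end

# A consumer standing in for a parser: pull tokens one at a time and
# stop at the first unknown one, without lexing the rest of the input.
lexer = TinyLexer.new("query { field $ 42 }")
tokens = []
while (tok = lexer.next_token)
  tokens << tok
  break if tok.first == :UNKNOWN
end
```

Here the loop stops at the `$`, so the trailing `42 }` is never tokenized. A real parser would presumably raise a syntax error at that point instead of breaking out of the loop.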