Greeb is a simple yet awesome, Unicode-aware, regexp-based tokenizer written in Ruby.
Author: Dmitry Ustalov
License: MIT
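To illustrate what "Unicode-aware, regexp-based tokenizer" means, here is a minimal sketch under stated assumptions. It is not Greeb's API or its actual rule set. The `RULES` table, the `Token` struct, the `tokenize` method and the token type names are all hypothetical. The idea is to try an ordered list of regular expressions at the current position. The Unicode property classes (`\p{L}`, `\p{N}`, `\p{P}`, ...) let letters, digits and punctuation from any script match without special casing.

```ruby
require 'strscan'

# Hypothetical token record: character offsets [from, to) plus a type tag.
Token = Struct.new(:from, :to, :type)

# Hypothetical, ordered rule table: the first pattern that matches at the
# current position wins. Unicode property classes make the rules script-agnostic.
RULES = [
  [:word,    /\p{L}+/],       # letters in any script (Latin, Cyrillic, ...)
  [:number,  /\p{N}+/],       # digits and other numeric characters
  [:newline, /[\r\n]+/],      # line breaks
  [:space,   /[\p{Zs}\t]+/],  # space separators and tabs
  [:punct,   /\p{P}/],        # a single punctuation character
  [:symbol,  /\p{S}/],        # a single symbol (math, currency, ...)
  [:other,   /./m]            # fallback: any one character, so we always advance
].freeze

def tokenize(text)
  scanner = StringScanner.new(text)
  tokens = []
  until scanner.eos?
    RULES.each do |type, pattern|
      from = scanner.charpos            # character (not byte) offset
      next unless scanner.scan(pattern) # try the next rule if this one fails
      tokens << Token.new(from, scanner.charpos, type)
      break                             # restart from the first rule
    end
  end
  tokens
end
```

Because the offsets are character positions, each token's text can be recovered with `text[token.from...token.to]`, for example:

```ruby
text = "Привет, мир! 42"
tokenize(text).each { |t| puts "#{t.type}: #{text[t.from...t.to].inspect}" }
```

Storing offsets rather than substrings keeps the original text authoritative and makes it cheap to map tokens back to their source positions.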