quicktok: a faster tokenizer

quicktok is a fast, exact BPE tokenizer written in C++. Token ids are byte-identical to tiktoken's, and encoding runs 2–3.5× faster than bpe-openai (the fastest alternative I know of) and 4–11× faster than tiktoken itself. I believe it’s the fastest exact CPU tokenizer available today for these encodings. It ships cl100k, o200k, GPT-OSS (o200k_harmony), Llama-3, and Qwen2.5/3, all byte-exact, plus bring-your-own Llama-4. This is useful for anyone doing large amounts of CPU-bound data processing (search indexing, corpus ingestion, token counting/billing), where it can significantly reduce the time and cost of ingestion. It can also be used for online request serving, such as CPU-bound inference paths (token counting, embedding serving). ...

June 11, 2026 · 3 min · dmatth1