It's a little visualizer that's like WinDirStat but for LM rollouts.
Right now, it's backed by Qwen3-4B-Base.
Each tile represents one rollout of the prompt you provide, taken out to the depth you set. A tile's area is proportional to the joint probability of that rollout among all rollouts of that depth under the sampling params you set.
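Concretely, a rollout's joint probability is the product of its per-step token probabilities, and each tile's area is that rollout's share of the total probability mass at the chosen depth. A minimal sketch (the function names here are illustrative, not the app's actual code):

```python
import math

def joint_prob(step_probs: list[float]) -> float:
    # One rollout's probability: the product of its per-token
    # probabilities (each already renormalized under the sampling params).
    return math.prod(step_probs)

def tile_areas(rollouts: list[list[float]], total_area: float) -> list[float]:
    # A tile's area is its rollout's share of the total probability
    # mass across all rollouts at the chosen depth.
    joints = [joint_prob(r) for r in rollouts]
    z = sum(joints)
    return [total_area * j / z for j in joints]
```

For example, `tile_areas([[0.5, 0.4], [0.5, 0.6]], 1.0)` gives `[0.4, 0.6]`: the second rollout is more probable, so it gets the bigger tile.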
Type a prompt and hit Go to start exploring!
How does it work?
Literally just exhaustive BFS on the tree of possible rollouts under the sampling conditions! vLLM does the heavy lifting with continuous batching, prefix caching, and (theoretically) paged KV caching, although at the depths we're rolling out to, paging probably doesn't have much of an effect. I've limited each exploration to 3000 nodes, which is plenty to get ~exhaustive sampling under the worst-case conditions I allow in the API.
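Here's a sketch of the search itself, with a toy next_token_distribution() standing in for the real vLLM-backed logprob queries (all names are illustrative, not the app's actual code):

```python
def next_token_distribution(text: str) -> list[tuple[str, float]]:
    # Toy stand-in: pretend exactly two tokens survive the sampling params.
    # In the real backend this comes from vLLM logprobs, truncated by the
    # sampling params and renormalized.
    return [("a", 0.7), ("b", 0.3)]

def enumerate_rollouts(prompt: str, depth: int, max_nodes: int = 3000):
    # Breadth-first expansion: one tree level per iteration.
    frontier = [(prompt, 1.0)]  # (text so far, joint probability)
    nodes = 1
    for _ in range(depth):
        next_frontier = []
        for text, joint_p in frontier:
            for tok, p in next_token_distribution(text):
                next_frontier.append((text + tok, joint_p * p))
        frontier = next_frontier
        nodes += len(frontier)
        if nodes > max_nodes:  # safety cap, mirroring the 3000-node limit
            break
    return frontier  # one entry per tile; area ∝ joint probability
```

Expanding level by level is also what makes continuous batching and prefix caching pay off: every node in a level extends a prefix the cache has already seen.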
What hardware does this run on?
1x RTX 5090. Pretty good, but nowhere near the throughput of an H100. I have a surplus MI250X I'm trying to get working, and I might shift the backend over to it if/when that happens.
Why not run the model on the frontend?
There aren't any efficient batched inference frameworks that run in the browser yet! I'm working on one, and this will be its first use case.
How did you implement this?
The overarching idea was mine, but the code is all Claude Opus 4.6's, frontend and backend, with credit of course to the many talented folks at vLLM who actually make inference run really fast.