Training environments to raise general intelligence.
We are an applied research lab in San Francisco. We build training environments and evaluations for autonomous systems.
Our environments are high-fidelity representations of real-world workflows: long-horizon, multi-app, built with world-class domain experts. Inside them, models carry out the work. Automated verifiers, designed by those same experts, score how it was done. Those scores become reward signals during training and evidence of what frontier models can and can't yet do.
Most environments today are built around work that fits in one tool, with rewards that fit in one function. Long-horizon, multi-app work breaks both assumptions. Building environments for it correctly takes the people who do that work as a profession, not researchers studying it.
Every environment we ship is a contract: train inside it, and the model gets meaningfully better at work a person is paid to do. The frontier moves where the work is.
- 01 The verification ceiling (2026)