What we believe
AI progress is not being measured correctly.
Models are getting better on benchmarks, but not reliably better to use. This is dangerous. We risk building systems that perform well on tests while missing what people actually want.
The most important problem in AI right now is simple:
Align the models with the capabilities that matter most for furthering humanity.
What we do
We build products used by millions of people around the world, which gives us a direct view of how people actually use models. We built our flagship product, DesignArena, because we believed taste, despite being one of the hardest things to measure, was too valuable to overlook. We've now scaled DesignArena to over 3 million users across 190+ countries and counting, making it the most widely cited benchmark for AI design. We've announced model releases with OpenAI, xAI, Google DeepMind, and more.
Why this matters
Every major model improvement is driven by evaluations.
Whoever defines the evaluations defines what models become good at.
This is a leverage point over the entire field, and it sits directly on the critical path to getting AI to actually serve humanity.
If you want to help steer the model capabilities that matter the most, join us.
