If you're excited about making AI systems actually work in production, not just building demos, this is a high-impact role on a small, senior team.
You'll be joining a venture-backed, AI-native product company building a new way for users to interact with complex systems through multi-agent AI. The product is already live, and the focus now is on scaling quality, reliability, and performance.
Why this role is different
Most AI roles stop at prototypes. This one doesn't.
You'll:
- Own real production AI systems used by customers
- Work on multi-agent architectures, not isolated models
- Drive evaluation, metrics, and system quality (the hardest part of applied AI)
- Operate in a small, senior team (~10 engineers) with real autonomy
This is where strong engineers come to turn AI into a product, not just a demo.
What you'll actually do
- Diagnose where AI systems break and fix them systematically
- Design evaluation frameworks, metrics, and KPIs
- Improve LLM prompts, agents, and workflows
- Run experiments and ship measurable improvements
- Work on recommendation, optimisation, and reasoning systems
You'll sit at the intersection of ML, product, and engineering, with direct ownership of outcomes.
What they're looking for
- Strong ML fundamentals (evaluation, ranking, embeddings, experimentation)
- Experience with LLMs in production (prompts, agents, evaluation)
- An evaluation-first mindset (you think in metrics, not guesses)
- Ability to ship and iterate quickly in real systems
Why it's compelling
- High ownership from day one
- Work directly on core product systems, not side projects
- Solve some of the hardest problems in applied AI: reliability, evaluation, and agent behaviour
- Join at the point where prototypes become scalable systems
