Tired of LLMs breaking in the wild? Come build the platform that prevents it.
One of the most exciting early-stage startups in the AI infrastructure space is hiring a Founding Machine Learning Engineer to help shape the future of LLM observability, testing, and evaluation.
They're backed by Y Combinator and founded by IIT Bombay alumni with experience at ETH Zurich and top-tier quant trading firms. The mission? To make sure LLM-powered voice agents actually work - before they go live.
Their platform automatically simulates thousands of real-world conversations - from ordering food to handling job interviews - to stress-test agents at scale and in depth. Think load testing meets GPT, with full evaluation, benchmarking, and monitoring.
🛠 What you'll build:
- AI tools to test, evaluate and benchmark large language models (LLMs)
- Scalable pipelines for real-time agent monitoring and performance feedback
- Core infrastructure for LLM agent reliability
- Customer-facing features, working directly with users and founders
🙋‍♂️ Who they're looking for:
- Strong Python and ML engineering experience
- Hands-on background in LLM product development or deployment
- Interest in agent infrastructure, evaluation frameworks, or LLM testing
- Bonus if you've worked in early-stage startups or on AI tooling
💡 This is your chance to be the technical co-founder of a product every AI team will need, building at the edge of what's possible in AI reliability.
If you (or someone you rate highly) are excited by the intersection of AI agents, LLM infrastructure, and startup ownership - drop me a message. Happy to share more.
