Tired of brittle, unpredictable LLM agents? Come help build the IDE that makes them actually work.
I'm partnering with one of the most exciting early-stage startups in the AI tooling space - a YC-backed company that's on a mission to bring structure (and joy) to human-AI collaboration.
They're building an IDE that lets anyone design, test, and deploy sophisticated AI agents using natural language - not code. Imagine the Notion of AI systems, empowering the next billion knowledge workers to create with AI.
They're now hiring an Applied AI Engineer to help architect and scale the core agent infrastructure - from memory and evaluation to real-world reliability.
🚀 What you'll build:
Multi-step, tool-using agents that call real APIs and handle auth, retries, timeouts, and all the tricky edge cases.
RAG pipelines that turn messy data into grounded, useful answers.
Memory systems that persist context - scratchpads, summary buffers, embedding stores.
Deterministic execution and replay tools so users can trace exactly how an agent thinks.
A robust eval framework blending automated checks with human-in-the-loop scoring.
Plus whatever greenfield ideas you want to bring to life.
🧑‍💻 Who they're looking for:
3+ years of engineering experience shipping production software.
You've built agent-like systems: multi-step LLM workflows, tool-using bots, or scripted assistants.
Hands-on with:
RAG (embeddings, vector DBs, chunking)
Agent memory (scratchpads, history compression, summaries)
Orchestrating real tools + APIs (auth flows, plugins)
Evaluation - defining success metrics, running regression tests, iterating on agent behavior
Obsessed with fast response times, predictable outputs, traceability, and uptime. This is production, not research.
Thrive in fast-moving, product-first teams that bias for shipping.
Bonus points for:
Experience with (or strong opinions about) LangChain, CrewAI, DSPy.
Shipped agents used by actual customers - beyond internal demos.
Deep familiarity with LLM ops, tracing, observability.
Time spent as a founder or early engineer who sweats the details of product quality.
🏢 The details:
Full-time, in-person role in San Francisco (Presidio) - 5 days a week.
Must have US work authorization (open to O-1 visas for exceptional folks).
You'll do the best work of your life alongside genuinely sharp, friendly people.
🎯 Why it matters:
Most LLM agents break in the wild. Here, you'll help build the platform that ensures they don't - enabling the world to create smarter AI systems without ever writing a line of code.
