Full-Stack DevTools Engineer

Mission

We are seeking a Full-Stack DevTools Engineer to work on the control plane for an agentic V&V (verification and validation) system.

This role involves taking the outputs from our agentic testing engines and turning them into an intuitive SaaS platform that enables developers to measure and improve the quality of their AI agents.

Role

Overview: This role is about designing and building the backend and frontend of our SaaS product, bridging between deep V&V technology and developers using it.

Backend Architecture: Implement Python backend services that can coordinate and track execution environments, testing workloads and results, and telemetry data, potentially including cloud and local LLM inference strategies.

Frontend Design: Implement intuitive frontend interfaces allowing users to visualize software quality KPIs, test results, and execution telemetry, without getting overwhelmed by the data.

AI Integration: Work closely with V&V and integration engineers to weave appropriate AI capabilities into the platform, ensuring a smooth flow from agentic frameworks onto the developer dashboards.

Qualities

  • Complexity Tamer: You are comfortable organizing non-deterministic data, fault clusters, and statistical output, and distilling it into a clean, actionable UI.
  • High-Assurance Builder: You want to make reliable, production-grade software. You prioritize robustness, performance, and rigor over chasing trendy frontend hype.
  • AI Pragmatist: You understand LLMs and agentic systems. You do not need to be an ML expert, but you need to understand the AI stack well enough to build tools that can serve agentic AI developers.

Bonus Points

  • Experience building developer tools, observability platforms, or data-heavy SaaS products from the ground up.
  • Familiarity with complex data visualization or canvas-based UI rendering.
  • Experience with local LLM orchestration and inference engines, such as vLLM, Ollama, or llama.cpp.

Experience

Extensive background as a Full-Stack Software Engineer, with deep proficiency in Python on the backend, modern frontend frameworks such as React or Vue, and a history of shipping complex, production-grade systems.

Familiarity with AI concepts, LLM API integration, or agentic orchestration frameworks.

Agentic Integration Engineer

Mission

We are seeking an Agentic Integration Engineer interested in instrumenting AI agent frameworks for V&V (verification and validation).

This role involves diving deep into the source code of emerging AI tools, building robust integrations between the agentic ecosystem and our verification and validation systems.

Role

Overview: This is a code-focused role centered on architecting, building, and maintaining hooks and integrations with popular AI agent frameworks and developer tools.

Low-Level Framework Mastery: Dive into the internals of major AI libraries, such as LangGraph, LangChain, Strands, and other popular frameworks. Build the instrumentation, telemetry capture, and execution control logic required to power our platform.

Tooling Ecosystem: Design and implement elegant SDKs and APIs, ensuring our fault-finding and fuzzing systems integrate cleanly into the tools and workflows developers already use.

Agentic Testing: Select, configure, fine-tune, and integrate LLM-as-a-judge components into agentic V&V pipelines.

Qualities

  • Systems Architect: You care about building resilient infrastructure that can reliably control and observe non-deterministic AI behaviors.
  • Open Source Hacker: You like to lurk inside repos, reading, writing, and debugging complex AI code, and you are not afraid to find and fix upstream bugs, even in poorly documented systems.
  • Execution Focused: You are excited about clean, performant solutions that prioritize quality over hype.

Bonus Points

  • Experience building and maintaining public-facing SDKs, APIs, or developer-focused CLI tools.
  • A background in low-level system instrumentation, AST manipulation, or extending coverage-guided fuzz testing into new environments.

Experience

  • Extensive background as a Software Engineer, with deep proficiency in Python and a history of shipping complex, production-grade systems.
  • Hands-on experience using, building, modifying, or integrating with agentic AI and LLM orchestration frameworks, or related software.

Agentic Reliability Engineer

Mission

We are seeking an Agentic Reliability Engineer to work at the intersection of core engineering and QA.

This role involves finding the patterns behind agentic AI software bugs, using the output of agentic V&V (verification and validation) systems to cluster faults, identify common root causes, and suggest remediations.

Role

Overview: This is a diverse role. It starts with understanding the telemetry of failing agents and extends to developing automated tools and methodologies for identifying and isolating faults, so that our users or their AI coding tools can remediate those faults and prevent them from recurring.

Fault Clustering: Analyze failed agent trajectories and group them into distinct root causes.

Fault Remediation: Harness software development best practices and coding agents to devise and prescribe long-term solutions for identified fault clusters in service of our users.

Qualities

  • Code Detective: You get a rush out of condensing piles of fault data into a few simple, elegant fixes.
  • AI & LLM Connoisseur: You are comfortable testing and bounding non-determinism and navigating the latest Python AI underpinnings.
  • Statistics Sorcerer: You like to use data analysis tools and statistics to transform complex telemetry data into actionable root cause insights.
  • Quality Quarterback: Robust, high-quality products and features are more important to you than trendy, hyped ones.

Bonus Points

  • Experience with coverage-guided fuzz testing.
  • A background in SRE or SDET, combined with a love of coding.

Experience

Experience with existing telemetry, logging, tracing, debugging, static or dynamic analysis, quality, testing, failure injection, and performance monitoring tools.

Experience developing AI or agentic systems or tools in Python.

Agentic Systems Researcher

Mission

We are seeking an Agentic Systems Researcher who wants to apply statistics, econometrics, and/or data science to the software quality discipline.

This role involves conquering the combinatorial explosion of agentic AI states by using advanced probabilistic modeling to intelligently sample, prioritize, and generate the most critical test cases for a given AI agent.

Role

Overview: This is a highly quantitative, code-forward role focused on building a statistical test prioritization and generation engine for a targeted agentic V&V (verification and validation) system.

Combinatorial Optimization: Design variance reduction, heuristic sampling, Extreme Value Theory (EVT), or other relevant strategies that prioritize mission-critical and/or black swan failure modes, instead of inefficiently executing a potentially infinite set of test cases.

Structural Mapping: Adapt structural econometric models, discrete choice frameworks, state-space models, or other approaches that can probabilistically map the decision trees of AI agents, identifying high-leverage risk surfaces.

Coverage Quantification: Develop defensible indices, capture-recapture estimates, or other techniques that quantify the surface area of tested versus untested agent logic, and provide ISO/IEC 9126 and ISO/IEC 25010 software quality KPIs and coverage measurements to our users.

Qualities

  • Quantitative Rigor: You love applying econometrics, causal inference, and advanced predictive modeling to messy, real-world systems to manage risk and ensure reliable behavior.
  • Dimensionality Destroyer: You are obsessed with finding the signal in the noise and intuitively understand how to reduce infinite state spaces into manageable, high-value heuristics.
  • Python Pragmatist: You do not stop at academic papers; you translate ideas into code that connects easily to real-world tech stacks.

Bonus Points

  • A background in econometrics, structural choice modeling, or advanced predictive modeling.
  • Experience adapting statistical models to evaluate AI, ML, LLM, and agentic systems.
  • Experience generating novel IP or published research in our domain.

Experience

  • Extensive background in quantitative analysis, data science, or statistics, with a history of deploying mathematical models into production-grade software.
  • Understanding of software testing, fuzzing, or LLM orchestration.

About Us

We are an early-stage, stealth-mode agentic AI QA startup, backed by Breyer Zou Ventures and successful entrepreneurs, investors, and operators such as Scott Sandell, Sheryl Sandberg, Jerry Yang, and KR Sridhar.

Contact: