Full-Stack DevTools Engineer
Mission
We are seeking a Full-Stack DevTools Engineer to work on the
control plane for an agentic V&V (verification and
validation) system.
This role involves taking the outputs from our agentic testing
engines and turning them into an intuitive SaaS platform that
enables developers to measure and improve the quality of their
AI agents.
Role
Overview: This role is about designing and
building the backend and frontend of our SaaS product, bridging
deep V&V technology and the developers who use it.
Backend Architecture: Implement Python backend
services that can coordinate and track execution environments,
testing workloads and results, and telemetry data, potentially
including cloud and local LLM inference strategies.
Frontend Design: Implement intuitive frontend
interfaces allowing users to visualize software quality KPIs,
test results, and execution telemetry, without getting
overwhelmed by the data.
AI Integration: Work closely with V&V and
integration engineers to weave appropriate AI capabilities into
the platform, ensuring a smooth flow from agentic frameworks
onto the developer dashboards.
Qualities
- Complexity Tamer: You are comfortable
organizing non-deterministic data, fault clusters, and
statistical output, and distilling it into a clean,
actionable UI.
- High-Assurance Builder: You want to make
reliable, production-grade software. You prioritize
robustness, performance, and rigor over chasing trendy
frontend hype.
- AI Pragmatist: You understand LLMs and
agentic systems. You do not need to be an ML expert, but you
need to understand the AI stack well enough to build tools
that can serve agentic AI developers.
Bonus Points
- Experience building developer tools, observability platforms,
or data-heavy SaaS products from the ground up.
- Familiarity with complex data visualization or canvas-based UI
rendering.
- Experience with local LLM orchestration and inference engines,
such as vLLM, Ollama, or llama.cpp.
Experience
Extensive background as a Full-Stack Software Engineer, with
deep proficiency in Python on the backend, modern frontend
frameworks such as React or Vue, and a history of shipping
complex, production-grade systems.
Familiarity with AI concepts, integrating LLM APIs, or working
alongside agentic orchestration frameworks.
Agentic Integration Engineer
Mission
We are seeking an Agentic Integration Engineer interested in
instrumenting AI agent frameworks for V&V (verification and
validation).
This role involves diving deep into the source code of emerging
AI tools and building robust integrations between the agentic
ecosystem and our verification and validation systems.
Role
Overview: This is a code-focused role centered
on architecting, building, and maintaining hooks and
integrations with popular AI agent frameworks and developer
tools.
Low-Level Framework Mastery: Dive into the
internals of major AI libraries and frameworks, such as LangGraph,
LangChain, and Strands. Build the instrumentation,
telemetry capture, and execution control logic required to power
our platform.
Tooling Ecosystem: Design and implement elegant
SDKs and APIs, ensuring our fault-finding and fuzzing systems
integrate cleanly into the tools and workflows developers
already use.
Agentic Testing: Select, configure, fine-tune,
and integrate LLM-as-a-judge apparatus into agentic V&V
pipelines.
Qualities
- Systems Architect: You care about building
resilient infrastructure that can reliably control and observe
non-deterministic AI behaviors.
- Open Source Hacker: You like to lurk inside
repos, reading, writing, and debugging complex AI code, and
you are not afraid to find and fix upstream bugs, even in
poorly documented systems.
- Execution Focused: You are excited about
clean, performant solutions that prioritize quality over hype.
Bonus Points
- Experience building and maintaining public-facing SDKs, APIs,
or developer-focused CLI tools.
- A background in low-level system instrumentation, AST
manipulation, or extending coverage-guided fuzz testing into
new environments.
Experience
- Extensive background as a Software Engineer, with deep
proficiency in Python and a history of shipping complex,
production-grade systems.
- Hands-on experience using, building, modifying, or integrating
with agentic AI and LLM orchestration frameworks, or related
software.
Agentic Reliability Engineer
Mission
We are seeking an Agentic Reliability Engineer to work at the
intersection of Core Engineering and QA.
This role involves finding the patterns behind agentic AI
software bugs, using the output of agentic V&V
(verification and validation) systems to cluster faults,
identify common root causes, and suggest remediations.
Role
Overview: This is a varied role. It starts
with understanding the telemetry of failing agents and extends
to developing automated tools and methodologies for identifying
and isolating faults, so that our users or their AI coding tools
can remediate those faults and keep them from recurring.
Fault Clustering: Analyze failed agent
trajectories and group them into distinct root causes.
Fault Remediation: Harness software
development best practices and coding agents to devise and
prescribe long-term solutions for identified fault clusters in
service of our users.
Qualities
- Code Detective: You get a rush out of
condensing piles of fault data into a few simple, elegant
fixes.
- AI & LLM Connoisseur: You are
comfortable testing and bounding non-determinism and
navigating the latest Python AI underpinnings.
- Statistics Sorcerer: You like to use data
analysis tools and statistics to transform complex telemetry
data into actionable root cause insights.
- Quality Quarterback: Robust, high-quality
products and features are more important to you than trendy,
hyped ones.
Bonus Points
- Experience with coverage-guided fuzz testing.
- A background in SRE or SDET, but you also like to code.
Experience
Experience with existing telemetry, logging, tracing,
debugging, static or dynamic analysis, quality, testing, failure
injection, and performance monitoring tools.
Experience developing AI or agentic systems or tools in
Python.
Agentic Systems Researcher
Mission
We are seeking an Agentic Systems Researcher who wants to apply
statistics, econometrics, and/or data science to the software
quality discipline.
This role involves conquering the combinatorial explosion of
agentic AI states by using advanced probabilistic modeling to
intelligently sample, prioritize, and generate the most critical
test cases for a given AI agent.
Role
Overview: This is a highly quantitative,
code-forward role focused on building a statistical test
prioritization and generation engine for a targeted agentic
V&V (verification and validation) system.
Combinatorial Optimization: Design variance
reduction, heuristic sampling, Extreme Value Theory (EVT),
or other relevant strategies that prioritize
mission-critical and/or black swan failure modes, instead of
inefficiently executing a potentially unbounded set of test cases.
Structural Mapping: Adapt structural
econometric models, discrete choice frameworks, state-space
models, or other approaches that can probabilistically map the
decision trees of AI agents, identifying high-leverage risk
surfaces.
Coverage Quantification: Develop defensible
indices, capture-recapture estimators, or other techniques that
quantify the surface area of tested versus untested agent logic,
and provide ISO/IEC 9126 and ISO/IEC 25010 software quality KPIs
and coverage measurements to our users.
Qualities
- Quantitative Rigor: You love applying
econometrics, causal inference, and advanced predictive
modeling to messy, real-world systems to manage risk and
ensure reliable behavior.
- Dimensionality Destroyer: You are obsessed
with finding the signal in the noise and intuitively
understand how to reduce infinite state spaces into
manageable, high-value heuristics.
- Python Pragmatist: You do not stop at
academic papers; you like to translate ideas into code that
connects easily to real-world tech stacks, bringing theory into
practice.
Bonus Points
- A background in econometrics, structural choice modeling, or
advanced predictive modeling.
- Experience adapting statistical models to evaluate AI, ML,
LLM, and agentic systems.
- Experience generating novel IP or published research in our
domain.
Experience
- Extensive background in quantitative analysis, data science,
or statistics, with a history of deploying mathematical models
into production-grade software.
- Understanding of software testing, fuzzing, or LLM
orchestration.