Services

Three engagement shapes. One discipline.

Each engagement begins with a written threat model and a measurable success criterion. We refuse work that can't be evaluated.

Adversarial Red Teaming

Pre-deployment elicitation of harmful, deceptive, or off-policy capabilities.

01 Multi-turn elicitation strategies with documented prompt lineages.
02 Jailbreak research across direct, indirect, and prompt-injection vectors.
03 Capability extraction (cyber tooling, biosecurity reasoning, deception proxies).
04 Findings ship with severity rubrics, reproduction transcripts, and mitigation hypotheses.

Cadence — 2–6 week engagements, sometimes longer for capability-frontier work.

Capability Evaluations

Task suites built to survive successor models without redesign.

01 Custom evaluation harnesses for dangerous-capability bands.
02 Long-horizon agentic task suites (tool-use, planning, multi-step exfiltration).
03 Calibration suites for measuring the elicitation gap between baseline and post-RLHF behavior.
04 Versioned datasets and harnesses delivered as code, runnable on your infrastructure.

Cadence — 4–12 weeks depending on suite scope. Maintenance retainers available.

Agentic Safety Infrastructure

Tooling for labs deploying autonomous agents where rollback is expensive.

01 Sandboxing primitives with provenance-tracked tool calls.
02 Action-log review tooling, alerting, and rollback-affordance design.
03 Interpretability hooks integrated into agent loops without measurable latency cost.
04 Operational playbooks for the human-on-the-loop reviewer.

Cadence — 6–16 weeks. Hand-off includes engineering knowledge transfer.

How a study runs

Threat model in writing

Two-page document. Adversary capabilities, in-scope behaviors, out-of-scope behaviors, falsifiable success criterion. Signed by both parties before work begins.

Bounded engagement

Fixed scope, fixed duration, fixed deliverable. Mid-engagement findings are reported on a defined cadence (usually weekly) rather than at the end.

Reproducible artifact

Every finding ships with the prompts, the transcripts, the harness code, and the rubric used. Anyone in your org can re-run the study and verify the result without us in the room.

Mitigation handoff

Where you ask, we draft hypothesized mitigations — and where possible, the eval that would falsify the mitigation. The work doesn't end at the diagnosis.

On commercial shape

We work fixed-scope when the threat model permits it and time-and-materials when the work is genuinely exploratory. We will not run an engagement on margins that incentivize us to find something where there is nothing.

We don't do retainer work that doesn't produce artifacts. We don't do compliance theater. We don't badge-rent our findings to people who weren't in the room.

The right engagement begins with the question we'd try to answer first.