Interpretability · Feb 2026
Feature dictionaries for frontier models at production scale
A method for extracting human-interpretable features from large models with low enough overhead to run on live traffic.
Research
Envariant is built on mechanistic interpretability research. We publish the methods that power the SDK.
We show that bounded activation edits can reliably shift named behaviors while preserving general capability.
A framework for declaring and enforcing behavioral guarantees, such as PII non-disclosure, at inference time.