How Laminae Actually Works: Architecture of a Rust AI Safety SDK
Most AI safety SDKs describe what they do. This post describes how.
I'm going to walk through the internals of Laminae's six crates, the actual data flow, the design decisions that aren't obvious from the API surface, and the places where I made trade-offs I'm still not sure about. If you've read the intro post, you know what Laminae is. This is about how the machinery works.
# Psyche: The Invisible Pipeline
The Psyche pipeline has three agents, but the interesting part isn't the agents themselves. It's the context injection protocol.
When a message arrives, the first thing Psyche does is classification. Before any LLM call happens, a pure-Rust function decides how much processing this message deserves:
```rust
pub fn classify_tier(message: &str) -> ResponseTier {
    let len = message.len();
    if len < 100 {
        return ResponseTier::Light;
    }
    let lower = message.to_lowercase();
    let complex_markers = [
        "explain", "analyze", "compare", "design",
        "architect", "refactor", "review", "debug",
        "implement", "build", "strategy", "plan",
        "optimize", "evaluate", "critique",
        "help me think", "what do you think",
        "pros and cons", "trade-offs",
        // ... plus a few more
    ];
    if complex_markers.iter().any(|m| lower.contains(m)) {
        return ResponseTier::Full;
    }
    if len < 300 { ResponseTier::Light } else { ResponseTier::Full }
}
```

Three tiers: Skip (known greetings and acknowledgments), Light (short messages that get compressed Id/Superego signals), and Full (complex requests that get the complete three-agent pipeline). The should_skip_psyche function catches things like "hello," "thanks," and "ok" against an explicit allowlist before any LLM call fires. This is deterministic, zero-cost routing.
For messages that do enter the pipeline, Id and Superego run concurrently via tokio::join!. Both hit a local Ollama instance (Qwen2.5:7b by default). Id runs at temperature 0.9 (divergent, creative). Superego runs at 0.3 (conservative, cautious). Their outputs never reach the user directly.
Instead, the outputs get compressed into a context block that's prepended to the Ego's system prompt:
```rust
pub fn ego_context(id_output: &str, superego_output: &str, config: &PsycheConfig) -> String {
    let weight_note = config.weight_instruction();
    let mut ctx = String::with_capacity(id_output.len() + superego_output.len() + 200);
    ctx.push_str("[COGNITIVE CONTEXT — invisible to user]\n");
    ctx.push_str(&weight_note);
    ctx.push('\n');
    if !id_output.is_empty() {
        ctx.push_str("\n[Creative signals]\n");
        ctx.push_str(id_output);
        ctx.push('\n');
    }
    if !superego_output.is_empty() {
        ctx.push_str("\n[Safety assessment]\n");
        ctx.push_str(superego_output);
        ctx.push('\n');
    }
    ctx.push_str("\n[END COGNITIVE CONTEXT]");
    ctx
}
```

The Ego (your actual LLM, Claude or GPT or whatever) receives this context block as part of its system prompt. It doesn't know it's being shaped. The user doesn't see the shaping. The weight instruction tells the Ego how much to lean on each signal: "Creative influence: significant (60%). Safety influence: moderate (40%)." These weights are configurable per-instance.
The COP (Compressed Output Protocol) variant is the Light-tier optimization. Instead of free-form prose, Id and Superego use structured output formats ("ANGLES: ... REFRAME: ... TONE: ..." for Id, "VERDICT: PASS/BLOCK RISKS: ... BOUNDS: ..." for Superego) capped at 80 tokens by default with a 15-second timeout (both configurable via PsycheConfig). If either times out or errors, the pipeline degrades gracefully: a double-unwrap chain handles both timeout and backend failures, falling back to an empty string so the Ego runs without that signal.
One thing I still go back and forth on: the Superego can issue a hard BLOCK. If it outputs "VERDICT: BLOCK" or starts with "BLOCK:", Psyche returns Err(PsycheError::Blocked(reason)) and the Ego never fires. This means a 7B local model has veto power over a frontier model. That's by design, but it means a bad Superego prompt or a hallucinating local model can block legitimate requests. The safety-side bias is intentional. Keep that in mind.
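The veto check itself is cheap string matching on the Superego's structured output. A hedged sketch of the detection logic described above (the real parsing in Psyche may differ in detail):

```rust
/// Returns Some(reason) when the Superego's output is a hard veto.
/// On a hit, Psyche returns Err(PsycheError::Blocked(reason)) and the
/// Ego backend is never called.
pub fn superego_blocked(output: &str) -> Option<String> {
    let trimmed = output.trim();
    if trimmed.contains("VERDICT: BLOCK") || trimmed.starts_with("BLOCK:") {
        return Some(trimmed.to_string());
    }
    None
}
```

Note the asymmetry: any occurrence of "VERDICT: BLOCK" triggers, while "BLOCK:" only counts at the start, which keeps a Superego that merely discusses blocking from tripping the veto.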
# Shadow: Multi-Stage Red-Teaming
Shadow's architecture is a pipeline of Analyzer trait implementations, each independent and fallible:
```rust
pub trait Analyzer: Send + Sync {
    fn name(&self) -> &'static str;
    async fn is_available(&self) -> bool;
    async fn analyze(
        &self,
        ego_output: &str,
        code_blocks: &[ExtractedBlock],
    ) -> Result<Vec<VulnFinding>, AnalyzerError>;
}
```

Three analyzers run in the default ShadowEngine pipeline, gated by an aggressiveness level (1-3):
Stage 1 (aggressiveness >= 1): StaticAnalyzer. Pre-compiled regex patterns via LazyLock. The rules are defined as const arrays and compiled into Regex objects exactly once, on first access. SQL injection, XSS, hardcoded secrets, path traversal, insecure deserialization, weak crypto, infinite loops. Each rule carries a CWE identifier and remediation text. Custom rules plug in via ShadowRule, which uses Cow<'static, str> for zero-copy when rules are compile-time constants.
Stage 2 (aggressiveness >= 2): LlmReviewer. A local Ollama model with an attacker-mindset prompt reviews the output for exploitability that regex can't catch (logic flaws, social engineering vectors, subtle data exfiltration).
Stage 3 (aggressiveness >= 3): SandboxManager. Ephemeral container execution with --network=none, --memory=128m, --read-only, --cap-drop=ALL.
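The Cow<'static, str> detail from Stage 1 is worth a sketch. Cow lets compile-time rules borrow static strings with zero allocation while user-supplied rules own heap Strings, behind one type. Field names here are assumptions, and plain substring matching stands in for the real LazyLock-compiled regexes to keep the sketch dependency-free:

```rust
use std::borrow::Cow;

/// Simplified stand-in for ShadowRule.
pub struct ShadowRule {
    pub title: Cow<'static, str>,
    pub needle: Cow<'static, str>, // the real field holds a regex pattern
    pub cwe: Cow<'static, str>,
}

// A built-in rule: all fields borrowed, constructible in const context.
const HARDCODED_SECRET: ShadowRule = ShadowRule {
    title: Cow::Borrowed("Hardcoded credential"),
    needle: Cow::Borrowed("password = \""),
    cwe: Cow::Borrowed("CWE-798"),
};

// A runtime rule: same type, owned strings, no API split.
fn custom_rule(title: String, needle: String) -> ShadowRule {
    ShadowRule {
        title: Cow::Owned(title),
        needle: Cow::Owned(needle),
        cwe: Cow::Borrowed("CWE-0"),
    }
}

fn rule_matches(rule: &ShadowRule, code: &str) -> bool {
    code.contains(rule.needle.as_ref())
}
```

The payoff is that the built-in rule table costs nothing at startup, while custom rules plug into the same scanning loop without a separate code path.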
Two additional standalone analyzers implement the same Analyzer trait for use outside the default pipeline:
SecretsAnalyzer. Format-specific credential detection for GitHub PATs (ghp_), OpenAI keys (sk-), Anthropic keys (sk-ant-api), Slack tokens (xox[bpoas]-), Stripe keys (sk_live_), JWTs, database connection strings. Evidence is redacted before it enters the report: redact_secret("ghp_ABCDEFGHIJKLMNOPQRSTUVWXYZab") produces "ghp_ABCD***YZab".
DependencyAnalyzer. Catches pipe-to-shell installs (curl ... | bash), insecure package indices (HTTP instead of HTTPS), previously-compromised npm packages (event-stream, ua-parser-js, coa), and git dependencies over unencrypted protocols.
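The redaction scheme is simple enough to sketch. The exact prefix and suffix lengths below are assumptions inferred from the "ghp_ABCD***YZab" example, not the actual source:

```rust
/// Redact a detected secret before it enters a report: keep a short
/// prefix (enough to identify the credential format) and suffix, mask
/// everything in between.
pub fn redact_secret(secret: &str) -> String {
    if secret.len() <= 12 {
        // Too short to safely show any part of it.
        return "***".to_string();
    }
    format!("{}***{}", &secret[..8], &secret[secret.len() - 4..])
}
```

Keeping the format prefix (ghp_, sk-ant-) visible is what lets a human triage the finding without the report itself becoming a credential leak.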
The deduplication is straightforward but important. After all stages run, findings are sorted by (category, title, evidence) and deduplicated via dedup_by. This prevents the same SQL injection from showing up three times because it was caught by the scanner, the static rules, and the LLM reviewer. The composite key is the tricky part: two findings with the same category and title but different evidence are distinct (same vulnerability class, different instances).
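A sketch of that sort-then-dedup pass, with the struct fields assumed from the composite key described above:

```rust
#[derive(Debug, Clone, PartialEq)]
struct VulnFinding {
    category: String,
    title: String,
    evidence: String,
}

/// Sort by the composite key, then collapse adjacent exact duplicates.
/// Same category + title but different evidence survives as two findings:
/// same vulnerability class, different instances.
fn dedup_findings(findings: &mut Vec<VulnFinding>) {
    findings.sort_by(|a, b| {
        (&a.category, &a.title, &a.evidence).cmp(&(&b.category, &b.title, &b.evidence))
    });
    findings.dedup_by(|a, b| {
        a.category == b.category && a.title == b.title && a.evidence == b.evidence
    });
}
```

dedup_by only removes consecutive duplicates, which is why the sort has to happen first and has to use the same key.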
The entire pipeline runs in a detached tokio::spawn. The caller gets an mpsc::Receiver<ShadowEvent> back immediately. Findings stream in as they're discovered. The report store is a bounded VecDeque<VulnReport> behind an Arc<RwLock<_>>, capped at 100 reports. When it fills up, the oldest report is evicted.
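The eviction logic on that store is a few lines. A minimal sketch, with VulnReport stubbed as a String:

```rust
use std::collections::VecDeque;
use std::sync::{Arc, RwLock};

const MAX_REPORTS: usize = 100;

/// Bounded report store: newest at the back, oldest evicted from the
/// front once the cap is hit.
type ReportStore = Arc<RwLock<VecDeque<String>>>;

fn push_report(store: &ReportStore, report: String) {
    let mut reports = store.write().unwrap();
    if reports.len() == MAX_REPORTS {
        reports.pop_front(); // evict the oldest report
    }
    reports.push_back(report);
}
```

A hard cap means an unattended long-running agent can't grow the store without bound; the trade-off is that old findings silently age out if nobody drains the queue.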
# Glassbox: LLM Containment in 150 Nanoseconds
Glassbox is the LLM containment layer, and architecturally the simplest crate. That's the point. Every validation method does the same thing: normalize the input with Unicode NFKC, lowercase it, scan against pattern lists, return Ok(()) or Err(GlassboxViolation).
The NFKC normalization is the non-obvious part. Without it, an attacker can bypass pattern matching using Unicode confusables. The fullwidth Latin letter "ｓ" (U+FF53) looks like "s" but won't match a lowercase ASCII comparison. NFKC normalization collapses these into their canonical ASCII equivalents before comparison:
```rust
pub fn validate_input(&self, text: &str) -> Result<(), GlassboxViolation> {
    let normalized: String = text.nfkc().collect();
    let lower = normalized.to_lowercase();
    for pattern in &self.config.input_injection_patterns {
        if lower.contains(pattern) {
            return Err(GlassboxViolation::Blocked {
                category: "prompt_injection".to_string(),
                reason: "Input contains an attempt to bypass safety systems.".to_string(),
            });
        }
    }
    Ok(())
}
```

Path validation uses std::path::Path::canonicalize() to resolve symlinks before checking against immutable zones. If the path doesn't exist yet (can't canonicalize), it falls back to manual component-based normalization that resolves .. segments. Double-slash normalization catches /protected//zone attempts.
The rate limiter is a Mutex<HashMap<String, Vec<Instant>>>. Every tool call records a timestamp. On check, it prunes timestamps older than 60 seconds, counts recent calls per-tool and globally, and enforces four separate limits: per-tool (30/min), total (100/min), writes (5/min), shells (10/min). The mutex uses unwrap_or_else(|e| e.into_inner()) to recover from poisoned locks rather than panicking, because a containment layer that panics under stress is worse than useless.
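Here's a sketch of the sliding-window mechanics, reduced to just the per-tool limit; the real Glassbox limiter enforces the global, write, and shell budgets from the same timestamp map:

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::time::{Duration, Instant};

struct RateLimiter {
    calls: Mutex<HashMap<String, Vec<Instant>>>,
    per_tool_limit: usize,
}

impl RateLimiter {
    /// Returns true if the call is allowed (and records it).
    fn check(&self, tool: &str) -> bool {
        // Recover from a poisoned lock instead of panicking: a containment
        // layer that dies under stress protects nothing.
        let mut calls = self.calls.lock().unwrap_or_else(|e| e.into_inner());
        let now = Instant::now();
        let timestamps = calls.entry(tool.to_string()).or_default();
        // Prune entries older than the 60-second window.
        timestamps.retain(|t| now.duration_since(*t) < Duration::from_secs(60));
        if timestamps.len() >= self.per_tool_limit {
            return false; // over budget, deny
        }
        timestamps.push(now);
        true
    }
}
```

Vec<Instant> per key is deliberately naive: at 100 calls per minute total, pruning a handful of timestamps is cheaper than maintaining a fancier structure.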
The benchmarked 150ns for validate_input comes from the fact that there's no allocation in the hot path for short inputs (NFKC of ASCII is a no-op), and pattern matching against the default 5-element injection list is effectively constant time.
# Ironclad: Process Sandboxing Across Three Platforms
Ironclad handles process sandboxing through a single SandboxProvider trait with three methods: sandboxed_command, is_available, name. Each platform implements it differently.
On macOS, SeatbeltProvider generates a Seatbelt profile string at runtime. The profile starts with (deny default) and explicitly allows only what's needed. File writes are restricted to the project directory, /tmp, /private/tmp, and /var/folders. Network outbound is limited to localhost; whitelisted hosts get port 443 only. All inbound connections are denied. The profile is passed to sandbox-exec -p which applies it at the kernel level before exec.
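For flavor, here is an illustrative fragment in Seatbelt's SBPL syntax, hand-written from publicly documented profiles; it is not the actual string SeatbeltProvider generates:

```scheme
(version 1)
(deny default)                 ; nothing is allowed unless listed below
(allow file-read*)
(allow file-write*
    (subpath "/tmp")
    (subpath "/private/tmp"))
(allow network-outbound
    (remote ip "localhost:*")) ; loopback only
(deny network-inbound)
```

Because the profile is generated per-invocation, the project directory and any whitelisted hosts get spliced in at runtime rather than living in a static file.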
On Linux, LinuxSandboxProvider uses pre_exec hooks that run between fork() and exec() in the child process. Three layers apply in sequence: prctl(PR_SET_NO_NEW_PRIVS) prevents privilege escalation through setuid binaries. unshare(CLONE_NEWUSER | CLONE_NEWNET) creates an isolated network namespace (no interfaces, no connectivity). Resource limits via setrlimit cap file size (256MB), CPU time (600s), address space (4GB), open FDs (256), and process count (64, to prevent fork bombs).
The Linux provider makes a critical fail-closed decision: if NetworkPolicy::None is requested but unshare fails (kernel doesn't allow unprivileged user namespaces), the child is never spawned. The function returns an error rather than running with full network access. For Restricted and LocalhostOnly, the failure is logged as a warning but execution continues, because those policies are best-effort by nature.
On Windows, WindowsSandboxProvider uses Job Objects for memory limits, process count limits, and environment variable scrubbing. It's the weakest of the three: no filesystem isolation, no network isolation. Those would require AppContainers or Windows Filtering Platform, which are significantly more complex to set up.
default_provider() returns the right implementation via cfg gates. The platform selection is compile-time, though method calls go through Box<dyn SandboxProvider> for API ergonomics.
# The Error Philosophy
Every crate has its own error enum: PsycheError, ShadowError, ClaudeError, OpenAIError, GlassboxViolation. No anyhow::Error in public APIs (though anyhow is used internally for plumbing).
The reason is match exhaustiveness. When Psyche returns Err(PsycheError::Blocked(reason)), the caller can distinguish "the Superego blocked this" from "Ollama is down" from "the Ego backend timed out." With anyhow, you'd be string-matching error messages.
Config structs are #[non_exhaustive]:
```rust
#[derive(Debug, Clone, Serialize, Deserialize)]
#[non_exhaustive]
pub struct ClaudeConfig {
    pub api_key: String,
    pub model: String,
    pub max_tokens: u32,
    // ...
}
```

This means adding a new field in a future release is a non-breaking change. Users can't construct the struct with literal syntax (they have to use Default::default() or builder methods), so a new field with a default value doesn't break their code.
The convenience vs. config constructor split is deliberate. ClaudeBackend::new("key") calls .expect() internally because the only way it can fail is if the TLS runtime can't initialize (catastrophic). ClaudeBackend::with_config(config) returns Result<Self, ClaudeError> because user-provided config can be invalid in recoverable ways. from_env() returns Result because the environment variable might not be set. The rule: if the failure is the caller's fault (missing env var, bad config), return Result. If the failure means the system is fundamentally broken (no TLS), panic.
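The Result side of that rule looks roughly like this sketch, using stand-in types (Config, Backend, and the error variant are mine; only the constructor names mirror the ClaudeBackend API described above):

```rust
#[derive(Default)]
struct Config {
    api_key: String,
}

struct Backend {
    config: Config,
}

#[derive(Debug)]
enum BackendError {
    MissingApiKey,
}

impl Backend {
    /// Caller's-fault failure (bad config) is recoverable: return Result.
    fn with_config(config: Config) -> Result<Self, BackendError> {
        if config.api_key.is_empty() {
            return Err(BackendError::MissingApiKey);
        }
        Ok(Self { config })
    }

    /// The environment might simply not be set up: also a Result.
    fn from_env() -> Result<Self, BackendError> {
        let api_key = std::env::var("API_KEY").map_err(|_| BackendError::MissingApiKey)?;
        Self::with_config(Config { api_key })
    }
}
```

The convenience constructor is the one place where panicking is acceptable, precisely because its only failure mode means the process couldn't do anything useful anyway.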
# Backend Abstraction
The EgoBackend trait is a single required method with one optional streaming override:
```rust
pub trait EgoBackend: Send + Sync {
    fn complete(
        &self,
        system_prompt: &str,
        user_message: &str,
        psyche_context: &str,
    ) -> impl std::future::Future<Output = Result<String>> + Send;

    fn complete_streaming(/* ... */)
        -> impl std::future::Future<Output = Result<mpsc::Receiver<String>>> + Send
    { /* default: wraps complete() */ }
}
```

Both ClaudeBackend and OpenAIBackend implement it. The psyche_context parameter is merged with system_prompt using a shared build_system pattern: if both are non-empty, they're joined with \n\n. If either is empty, the other is used alone. This means the Psyche context injection works identically regardless of which backend you use.
OpenAIBackend has convenience constructors for providers: ::groq("key"), ::together("key"), ::deepseek("key"), ::local("http://localhost:8080/v1"). Each sets the right base URL and default model. The ::local() constructor takes no API key because most local servers don't need one.
API keys are #[serde(skip_serializing)] on the config structs. If you serialize a ClaudeConfig to JSON (for debugging, logging, config files), the key is omitted. Deserialization fills it with the default empty string.
That's the architecture. Six crates, each with a clear boundary, each usable independently. The pipeline is Glassbox(input) -> Psyche(Id + Superego -> Ego) -> Glassbox(output) -> Shadow(async). LLM containment enforced in compiled Rust, at the syscall level, where no amount of prompt engineering can reach.
The code is on GitHub. If something here doesn't match the source, the source wins.
New to Laminae? Start with the origin story. For the API overview, read The Missing Layer. To build something hands-on, try the chatbot tutorial.
Related posts
Laminae: The Missing Layer Between Raw LLMs and Production AI
Why I built a modular Rust SDK for AI safety, personality, and containment. And what I learned from building it from scratch twice before getting it right.
MCPDome: Why Your AI Agents Need a Security Gateway
AI agents talk to tools over MCP with zero security in the middle. MCPDome is a Rust proxy that intercepts every JSON-RPC message and enforces auth, policy, rate limiting, and injection detection — without touching either side.
Build a Safe Chatbot with Laminae in 15 Minutes
A step-by-step tutorial: wire up Glassbox containment, Psyche personality, and Shadow red-teaming into a working Rust chatbot. Real code, real safety, no prompt engineering.