
MITM Proxy

The MITM proxy is Capsem’s HTTPS inspection layer. The Network Engine terminates TLS from the guest, parses HTTP/DNS/model traffic, lifts it into typed Security Events, asks the Security Engine for a decision, applies validated rewrites or blocks, forwards allowed traffic to the real upstream, and records resolved telemetry to the session database.

Each guest HTTPS connection flows through this pipeline:

graph TD
    A["Guest connection<br/>vsock:5002"] --> B["Read metadata prefix<br/>(optional process name)"]
    B --> C["TLS handshake<br/>MitmCertResolver captures SNI"]
    C --> D["Read HTTP request<br/>method, path, headers, body"]
    D --> E["Build http.request SecurityEvent"]
    E --> F{"Security Engine decision"}
    F -->|block| X["403 Forbidden<br/>+ resolved event"]
    F -->|ask| X
    F -->|rewrite| R["Validate/apply mutations"]
    F -->|allow| H["Upstream TLS connection<br/>(cached per-connection)"]
    R --> H
    H --> I["Forward request"]
    I --> J["Stream response to guest<br/>(inline SSE parsing for AI traffic)"]
    J --> K["Emit resolved telemetry<br/>SecurityEvent + projections"]

The proxy uses hyper for HTTP parsing and tokio-rustls for TLS. Each vsock connection can carry multiple HTTP requests via keep-alive — upstream connections are cached per-connection to avoid re-establishing TLS for each request.

graph LR
    CA["CertAuthority<br/>(static CA keypair)"]
    SEC["Security Engine<br/>(rules + detections + ask)"]
    DB["DbWriter<br/>(async telemetry)"]
    TLS["Upstream TLS config<br/>(webpki roots)"]
    PRICE["PricingTable<br/>(embedded JSON)"]
    TRACE["TraceState<br/>(multi-turn linking)"]

    CA --> CFG["MitmProxyConfig"]
    SEC --> CFG
    DB --> CFG
    TLS --> CFG
    PRICE --> CFG
    TRACE --> CFG

| Field | Type | Purpose |
|---|---|---|
| ca | Arc<CertAuthority> | Static Capsem CA for leaf cert minting |
| db | Arc<DbWriter> | Async telemetry writer to session.db |
| upstream_tls | Arc<rustls::ClientConfig> | Shared TLS config with webpki root CAs |
| telemetry | TelemetryDeps | Pricing, trace state, and canonical evidence writers |
| pipeline | Arc<Pipeline> | Transport chunk processing and telemetry hooks |
| mcp_endpoint | Option<Arc<McpEndpointState>> | Framed MCP endpoint for guest traffic |

The proxy mints per-domain TLS certificates on-the-fly, signed by a static Capsem CA.

sequenceDiagram
    participant G as Guest
    participant R as MitmCertResolver
    participant CA as CertAuthority
    participant C as Cache

    G->>R: TLS ClientHello (SNI: github.com)
    R->>C: Lookup github.com
    alt Cache hit
        C-->>R: Arc<CertifiedKey>
    else Cache miss
        R->>CA: mint_leaf("github.com")
        CA-->>R: CertifiedKey [leaf, ca]
        R->>C: Store in cache
    end
    R-->>G: TLS ServerHello + cert chain

| Parameter | Value |
|---|---|
| Algorithm | ECDSA P-256 |
| Validity | 24 hours |
| Back-dating | 1 hour (clock skew tolerance) |
| SAN | DNS name of the target domain |
| Extended key usage | ServerAuth |
| Chain | [leaf, CA] (2 certificates) |
| CA key source | config/capsem-ca.key (committed, compile-time include_str!) |

The cache uses double-checked locking: read lock for hits, write lock only on miss with a second check after acquiring the write lock. Concurrent requests for the same domain never mint duplicate certs.
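
The double-checked locking described above can be sketched with the standard library alone. This is a minimal illustration, not Capsem's actual code: `CertCache`, `LeafCert`, `get_or_mint`, and the `mints` counter are hypothetical names, and `LeafCert` stands in for the real rustls `CertifiedKey`.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for a minted leaf certificate.
#[derive(Debug)]
pub struct LeafCert {
    pub domain: String,
}

/// Hypothetical per-domain cache (assumption: keyed by SNI host name).
pub struct CertCache {
    certs: RwLock<HashMap<String, Arc<LeafCert>>>,
    /// Counts how often the mint closure actually ran (for illustration only).
    pub mints: AtomicUsize,
}

impl CertCache {
    pub fn new() -> Self {
        Self {
            certs: RwLock::new(HashMap::new()),
            mints: AtomicUsize::new(0),
        }
    }

    pub fn get_or_mint(
        &self,
        domain: &str,
        mint: impl FnOnce(&str) -> LeafCert,
    ) -> Arc<LeafCert> {
        // First check: shared read lock, the common path for cache hits.
        {
            let certs = self.certs.read().unwrap();
            if let Some(cert) = certs.get(domain) {
                return Arc::clone(cert);
            }
        } // read lock released here

        // Second check: another thread may have minted the cert while we
        // were waiting for the write lock.
        let mut certs = self.certs.write().unwrap();
        if let Some(cert) = certs.get(domain) {
            return Arc::clone(cert);
        }

        self.mints.fetch_add(1, Ordering::Relaxed);
        let cert = Arc::new(mint(domain));
        certs.insert(domain.to_string(), Arc::clone(&cert));
        cert
    }
}
```

Because minting happens while the write lock is held, a second caller for the same domain blocks until the first cert is stored and then takes the second-check hit.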

The MITM proxy CA private key is committed to the repository. This is intentional: the CA is trusted only inside Capsem’s own air-gapped VMs and carries no trust outside them. Publishing the key provides transparency, since anyone can verify there is no hidden interception. Per-installation key generation would reduce auditability.

The Network Engine owns parsing and transmission. It does not own policy semantics. For each synchronous decision point it builds a typed SecurityEvent and expects one of four final actions from the Security Engine:

| Action | Network behavior |
|---|---|
| allow | Forward the request or response unchanged. |
| ask | Pause/fail closed until the confirm path resolves the decision. |
| block | Stop transmission and return the protocol-appropriate denial. |
| rewrite | Apply only validated declarative mutations to allowlisted fields. |

The resolved event records the input, matched rule/finding ids, final decision, allowed mutations, and attribution before telemetry/log projections are written.
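
As a rough sketch of how a network layer might dispatch on the four final actions, the function below maps each decision to a next step. The `Decision` and `NextStep` types and the `next_step` function are hypothetical names for illustration. The 403 for `ask` follows the pipeline diagram, which routes both `block` and `ask` to the same denial for HTTP.

```rust
/// The four final actions a decision can carry.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision {
    Allow,
    Ask,
    Block,
    Rewrite,
}

/// What the network layer does next (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NextStep {
    ForwardUnchanged,
    ApplyMutationsThenForward,
    Deny { status: u16 },
}

pub fn next_step(decision: Decision) -> NextStep {
    match decision {
        Decision::Allow => NextStep::ForwardUnchanged,
        Decision::Rewrite => NextStep::ApplyMutationsThenForward,
        // Fail closed: in this sketch an unresolved `ask` is denied exactly
        // like `block`, matching the 403 branch in the pipeline diagram.
        Decision::Block | Decision::Ask => NextStep::Deny { status: 403 },
    }
}
```

Keeping the match exhaustive means a new action cannot be added without deciding how the network layer handles it.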

Profile-owned enforcement rules provide request and response control. Rules use canonical policy roots such as http.request.host, http.request.url, http.request.path, http.request.header("authorization").exists(), and http.request.body.text.contains("secret"). Authored rules do not target internal event.* fields.

| Subject field | Example use |
|---|---|
| http.request.host | Block a specific host or suffix. |
| http.request.method | Block write methods such as POST or DELETE. |
| http.request.path | Match repository, API, or organization paths. |
| http.request.url | Match the full normalized URL. |
| http.request.header(name) | Match, require, or strip request headers. |
| http.response.status | Match upstream status on response policy. |

Example:

[security.rules.http.block_openai_github]
on = "http.request"
if = 'http.request.host == "github.com" && http.request.path.startsWith("/openai")'
decision = "block"
priority = 10

Header stripping is expressed as a rewrite rule. It runs before the request is forwarded and before telemetry is captured, so stripped header values reach neither the upstream nor the session database:

[security.rules.http.strip_auth]
on = "http.request"
if = 'http.request.host == "api.example.com"'
decision = "rewrite"
priority = 20
strip_request_headers = ["authorization", "x-api-key"]
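
The effect of `strip_request_headers` can be pictured as a filter over the header list. The function below is a sketch under assumptions: it models headers as plain `(name, value)` pairs and matches names case-insensitively, as HTTP header names are case-insensitive. The function itself is hypothetical and is not Capsem's mutation code.

```rust
/// Drop every header whose name matches an entry in `strip`.
/// Matching ignores ASCII case, so "Authorization" matches "authorization".
pub fn strip_request_headers(
    headers: Vec<(String, String)>,
    strip: &[&str],
) -> Vec<(String, String)> {
    headers
        .into_iter()
        .filter(|(name, _)| !strip.iter().any(|s| s.eq_ignore_ascii_case(name)))
        .collect()
}
```

With the rule above, a request carrying `Authorization`, `X-API-Key`, and `Accept` would be forwarded with only `Accept`.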

For AI provider domains, the proxy parses SSE response streams inline to extract structured telemetry.

| Domain | Provider | API paths |
|---|---|---|
| api.anthropic.com | Anthropic | /v1/messages |
| api.openai.com | OpenAI | /v1/responses, /v1/chat/completions |
| generativelanguage.googleapis.com | Google | /v1beta/* |

graph LR
    A["HTTP response body<br/>(chunked)"] --> B["AiResponseBody<br/>(hyper Body wrapper)"]
    B --> C["SseParser<br/>(stateful wire format)"]
    C --> D["ProviderStreamParser<br/>(Anthropic/OpenAI/Google)"]
    D --> E["Vec&lt;LlmEvent&gt;<br/>(accumulated)"]
    E --> F["collect_summary()<br/>(pure function)"]
    F --> G["StreamSummary<br/>(text, tools, tokens, cost)"]

Parsing runs inline during poll_frame() — response bytes pass through unchanged to the guest with zero added latency.
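
To illustrate why the SSE parser has to be stateful, here is a minimal sketch that accepts arbitrary chunk boundaries and emits an event only once its terminating blank line has arrived. It is a simplified reading of the SSE wire format and not Capsem's `SseParser`: it assumes chunks arrive as valid UTF-8 strings, handles only `event:` and `data:` fields, and ignores `id:`, `retry:`, and comments.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SseEvent {
    pub event: Option<String>,
    pub data: String,
}

/// Hypothetical incremental parser; state survives between `feed` calls.
#[derive(Default)]
pub struct SseParser {
    buf: String,           // bytes of a line that is not yet complete
    event: Option<String>, // pending `event:` field
    data: Vec<String>,     // pending `data:` lines
}

impl SseParser {
    /// Feed one chunk; returns the events completed by this chunk.
    pub fn feed(&mut self, chunk: &str) -> Vec<SseEvent> {
        self.buf.push_str(chunk);
        let mut out = Vec::new();
        while let Some(pos) = self.buf.find('\n') {
            // Take one full line (including the newline) off the buffer.
            let raw: String = self.buf.drain(..=pos).collect();
            let line = raw.trim_end_matches(|c: char| c == '\n' || c == '\r');
            if line.is_empty() {
                // A blank line dispatches the pending event, if it has data.
                let event = self.event.take();
                if !self.data.is_empty() {
                    out.push(SseEvent { event, data: self.data.join("\n") });
                    self.data.clear();
                }
            } else if let Some(v) = line.strip_prefix("event:") {
                self.event = Some(v.strip_prefix(' ').unwrap_or(v).to_string());
            } else if let Some(v) = line.strip_prefix("data:") {
                self.data.push(v.strip_prefix(' ').unwrap_or(v).to_string());
            }
            // Other fields are ignored in this sketch.
        }
        out
    }
}
```

A JSON payload split across two network chunks produces no event on the first `feed` call and one complete event on the second, which is what lets parsing run on whatever frames the upstream happens to deliver.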

| Event | Fields | Description |
|---|---|---|
| MessageStart | message_id, model | Stream began |
| TextDelta | index, text | Incremental text output |
| ThinkingDelta | index, text | Reasoning/chain-of-thought output |
| ToolCallStart | index, call_id, name | Model invoked a tool |
| ToolCallArgumentDelta | index, delta | Incremental tool call JSON arguments |
| ToolCallEnd | index | Tool call arguments complete |
| ContentBlockEnd | index | Content block finished |
| Usage | input_tokens, output_tokens, details | Token usage update (details: cache_read, thinking, etc.) |
| MessageEnd | stop_reason | Stream finished (EndTurn, ToolUse, MaxTokens, ContentFilter) |
| Unknown | event_type, raw | Unrecognized SSE event (logged, not parsed) |
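
A pure summary function over the accumulated events might look like the sketch below. It covers only a subset of the events listed above. The field types, the `StreamSummary` layout, and the rule that a later `Usage` event supersedes an earlier one are all assumptions made for illustration.

```rust
/// Subset of the event types above, with assumed field types.
#[derive(Debug, Clone, PartialEq)]
pub enum LlmEvent {
    MessageStart { message_id: String, model: String },
    TextDelta { index: usize, text: String },
    ToolCallStart { index: usize, call_id: String, name: String },
    Usage { input_tokens: u64, output_tokens: u64 },
    MessageEnd { stop_reason: String },
}

/// Hypothetical summary shape; the real StreamSummary may differ.
#[derive(Debug, Default, PartialEq)]
pub struct StreamSummary {
    pub model: Option<String>,
    pub text: String,
    pub tool_names: Vec<String>,
    pub input_tokens: u64,
    pub output_tokens: u64,
    pub stop_reason: Option<String>,
}

/// Fold events into a summary. No I/O and no shared state, so the same
/// input always yields the same output.
pub fn collect_summary(events: &[LlmEvent]) -> StreamSummary {
    let mut s = StreamSummary::default();
    for ev in events {
        match ev {
            LlmEvent::MessageStart { model, .. } => s.model = Some(model.clone()),
            LlmEvent::TextDelta { text, .. } => s.text.push_str(text),
            LlmEvent::ToolCallStart { name, .. } => s.tool_names.push(name.clone()),
            // Assumption: usage updates are cumulative, so the last one wins.
            LlmEvent::Usage { input_tokens, output_tokens } => {
                s.input_tokens = *input_tokens;
                s.output_tokens = *output_tokens;
            }
            LlmEvent::MessageEnd { stop_reason } => {
                s.stop_reason = Some(stop_reason.clone())
            }
        }
    }
    s
}
```

Keeping the fold free of side effects makes it straightforward to test against recorded event sequences from each provider.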

| Origin | Criteria | Example |
|---|---|---|
| native | Default for tool names without __ | write_file, bash |
| local | Matches is_builtin_tool() | fetch_http, grep_http, http_headers |
| mcp_proxy | Name contains __ (MCP namespace separator) | github__list_repos |
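
The origin criteria above can be read as a small classification function. In this sketch the body of `is_builtin_tool` lists only the three example names from the table, and the order of the checks (builtin first, then the `__` separator) is an assumption. The real list and precedence may differ.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ToolOrigin {
    Native,
    Local,
    McpProxy,
}

/// Stand-in for the real `is_builtin_tool()`: only the example names
/// from the table are included here.
pub fn is_builtin_tool(name: &str) -> bool {
    matches!(name, "fetch_http" | "grep_http" | "http_headers")
}

pub fn classify_tool(name: &str) -> ToolOrigin {
    if is_builtin_tool(name) {
        ToolOrigin::Local
    } else if name.contains("__") {
        // `__` is the MCP namespace separator, e.g. `github__list_repos`.
        ToolOrigin::McpProxy
    } else {
        ToolOrigin::Native
    }
}
```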

Model pricing is loaded from config/genai-prices.json (embedded at compile time via include_str!). Cost = (input_tokens * input_price + output_tokens * output_price). Updated via just update_prices.
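
The cost formula translates directly into code. The sketch below assumes prices are expressed per token in a single currency unit. The `ModelPrice` struct and its field names are hypothetical, and the actual layout of genai-prices.json is not shown here.

```rust
/// Hypothetical per-model price entry (assumption: price per single token).
pub struct ModelPrice {
    pub input_price: f64,
    pub output_price: f64,
}

/// Cost = input_tokens * input_price + output_tokens * output_price.
pub fn cost(input_tokens: u64, output_tokens: u64, price: &ModelPrice) -> f64 {
    input_tokens as f64 * price.input_price + output_tokens as f64 * price.output_price
}
```

If the pricing table stores prices per thousand or per million tokens instead, the result would need to be divided by that factor.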

The TraceState tracks multi-turn agent conversations across request/response cycles:

sequenceDiagram
    participant Agent
    participant Proxy
    participant State as TraceState

    Agent->>Proxy: Request (no tool_responses)
    Note over Proxy: New trace_id = UUID
    Proxy->>State: Register tool call_ids
    Proxy-->>Agent: Response (stop: ToolUse, calls: [A, B])

    Agent->>Proxy: Request (tool_responses for A, B)
    Proxy->>State: Lookup call_ids [A, B]
    Note over State: Found trace_id from previous turn
    Proxy->>State: Register new call_ids [C]
    Proxy-->>Agent: Response (stop: ToolUse, calls: [C])

    Agent->>Proxy: Request (tool_responses for C)
    Proxy->>State: Lookup call_id [C]
    Proxy-->>Agent: Response (stop: EndTurn)
    Proxy->>State: Complete trace (cleanup)

All model_calls rows in the same trace share a trace_id, enabling per-turn cost and token aggregation.
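
The linking logic in the sequence diagram boils down to a map from tool call_id to trace_id. The sketch below is one possible shape, not Capsem's `TraceState`. The method names are hypothetical, and a simple counter stands in for the UUID so the example needs no external crates.

```rust
use std::collections::HashMap;

/// Hypothetical multi-turn linker: call_id -> trace_id.
#[derive(Default)]
pub struct TraceState {
    by_call_id: HashMap<String, String>,
    next_id: u64, // stand-in for UUID generation
}

impl TraceState {
    /// Find the trace a request belongs to, based on the call_ids its
    /// tool responses answer. A request with no known call_id starts a
    /// new trace.
    pub fn trace_for_request(&mut self, tool_response_ids: &[&str]) -> String {
        for id in tool_response_ids {
            if let Some(trace) = self.by_call_id.get(*id) {
                return trace.clone();
            }
        }
        self.next_id += 1;
        format!("trace-{}", self.next_id)
    }

    /// Remember the tool calls issued in a response so the next request
    /// can be linked back to the same trace.
    pub fn register_calls(&mut self, trace_id: &str, call_ids: &[&str]) {
        for id in call_ids {
            self.by_call_id.insert(id.to_string(), trace_id.to_string());
        }
    }

    /// Drop all bookkeeping for a finished trace.
    pub fn complete(&mut self, trace_id: &str) {
        self.by_call_id.retain(|_, t| t.as_str() != trace_id);
    }
}
```

Cleaning up on completion keeps the map bounded by the number of in-flight tool calls instead of growing for the whole session.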

Telemetry is emitted asynchronously after the response body completes (not during streaming):

| Event type | When | Data |
|---|---|---|
| SecurityEvent | Every enforced HTTP/model decision | Event family/type, subject, context, findings, decision, mutations, attribution |
| NetEvent projection | Every HTTP request | Domain, method, path, status, bytes, latency, final decision, body previews |
| ModelCall projection | AI provider requests only | Provider, model, tokens, cost, tool calls, text content, trace_id |

The TelemetryBody wrapper around the hyper response body triggers tokio::spawn(emitter.emit()) when the body stream reaches EOF.
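
The emit-on-EOF pattern can be shown without hyper or tokio by wrapping a plain iterator of chunks. This is a synchronous analogy only: the real wrapper sits around a hyper response body and spawns an async task, whereas this sketch calls a closure directly. The struct layout and the `on_eof` callback are assumptions.

```rust
/// Passes chunks through untouched and fires `on_eof` exactly once,
/// when the inner stream is exhausted.
pub struct TelemetryBody<I, F> {
    inner: I,
    bytes_seen: usize,
    on_eof: Option<F>,
}

impl<I, F> TelemetryBody<I, F>
where
    I: Iterator<Item = Vec<u8>>,
    F: FnOnce(usize),
{
    pub fn new(inner: I, on_eof: F) -> Self {
        Self { inner, bytes_seen: 0, on_eof: Some(on_eof) }
    }
}

impl<I, F> Iterator for TelemetryBody<I, F>
where
    I: Iterator<Item = Vec<u8>>,
    F: FnOnce(usize),
{
    type Item = Vec<u8>;

    fn next(&mut self) -> Option<Vec<u8>> {
        match self.inner.next() {
            Some(chunk) => {
                // Observe the chunk, then hand it on unchanged.
                self.bytes_seen += chunk.len();
                Some(chunk)
            }
            None => {
                // `take()` guarantees the callback runs at most once,
                // even if the consumer polls again after EOF.
                if let Some(emit) = self.on_eof.take() {
                    emit(self.bytes_seen);
                }
                None
            }
        }
    }
}
```

Storing the callback in an `Option` is what makes the emission idempotent. A consumer that polls past the end of the stream does not trigger a second telemetry write.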

| Optimization | Mechanism |
|---|---|
| Connection reuse | Upstream reqwest sender cached per-connection for keep-alive |
| TLS session reuse | Shared rustls::ClientConfig with webpki roots |
| Cert caching | Double-checked locking; each domain minted once |
| Inline parsing | SSE parsing runs in poll_frame(), zero-copy passthrough |
| Async telemetry | DB writes happen on a dedicated thread; never blocks the proxy |
| Compiled rule snapshots | Arc clone per request avoids holding registry locks during I/O |

| File | Purpose |
|---|---|
| capsem-core/src/net/mitm_proxy.rs | Connection handling, HTTP forwarding, telemetry emission |
| capsem-core/src/net/cert_authority.rs | CA loading, leaf cert minting, cache |
| crates/capsem-security-engine/ | SecurityEvent decisions, CEL/Sigma matching, resolved-event evidence |
| capsem-core/src/net/mitm_proxy/ | HTTP/model SecurityEvent projection and proxy pipeline |
| capsem-core/src/net/ai_traffic/ | SSE parsing, provider parsers, events, pricing |
| capsem-core/src/net/ai_traffic/mod.rs | TraceState for multi-turn linking |
| config/capsem-ca.key, config/capsem-ca.crt | Static ECDSA P-256 CA keypair |