Rule Corpus Workflow
The rule corpus is the shared test ledger for Capsem enforcement and
detection. It prevents capsem-admin, Detection IR, Rust CEL evaluation, and
expected backtest output from drifting apart.
Layout
| Path | Purpose |
|---|---|
| `data/policy-context/canonical-policy-contexts.jsonl` | Typed policy-context event fixtures. |
| `data/policy-context/session-*.jsonl` | Stable session-export fixtures captured from the installed-service policy-context export shape. |
| `data/enforcement/cel/` | CEL conditions consumed by Rust runtime tests. |
| `data/enforcement/packs/` | Enforcement pack fixtures consumed by capsem-admin. |
| `data/enforcement/backtest-expected/` | Expected enforcement backtest reports without timing fields. |
| `data/detection/sigma/` | Sigma-backed detection pack fixtures. |
| `data/detection/ir/` | Compiled `capsem.detection.ir.v1` fixtures. |
| `data/detection/backtest-expected/` | Expected detection backtest reports without timing fields. |
| `data/detection/hunt-expected/` | Expected session-backed detection hunt reports and projection-path summaries. |

Policy-context fixtures must use canonical roots such as
`http.request.host`, `http.request.header("authorization").exists()`, and
`http.request.body.text`. Internal `event.*` and legacy `subject.*` paths are
test failures. Unknown canonical-looking paths and cross-family roots are also
test failures: the admin enforcement compiler has an explicit family-scoped
allowlist, so a typo like `http.request.raw` must fail before replay.
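
To illustrate the rejection rules above, here is a minimal sketch of a family-scoped path check. It is not the admin enforcement compiler's implementation: the `ALLOWED_ROOTS` table, the `check_path` helper, the diagnostic wording, and the `dns` family used in the usage note are all hypothetical. Only the three `http.request.*` roots and the `event.*` and `subject.*` prefixes come from this page.

```python
from typing import Optional

# Hypothetical allowlist, keyed by family. The real list lives in the
# admin enforcement compiler and is not reproduced here.
ALLOWED_ROOTS = {
    "http": {
        "http.request.host",
        "http.request.header",
        "http.request.body.text",
    },
}

# Internal and legacy roots that fixtures must never use.
FORBIDDEN_PREFIXES = ("event.", "subject.")


def check_path(family: str, path: str) -> Optional[str]:
    """Return a diagnostic if the path is rejected, or None if it is allowed."""
    if path.startswith(FORBIDDEN_PREFIXES):
        return f"{path}: internal or legacy root is not allowed"
    # Drop any call suffix such as ("authorization").exists().
    base = path.split("(", 1)[0]
    root_family = base.split(".", 1)[0]
    if root_family != family:
        return f"{path}: root belongs to family {root_family!r}, not {family!r}"
    if base not in ALLOWED_ROOTS.get(family, set()):
        return f"{path}: unknown canonical path for family {family!r}"
    return None
```

Under these assumptions, `check_path("http", "http.request.raw")` returns an "unknown canonical path" diagnostic, and `check_path("http", "dns.query.name")` returns a cross-family diagnostic, both before any fixture is replayed.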
Update Order
1. Add or edit policy-context rows in `data/policy-context/canonical-policy-contexts.jsonl`.

2. Update enforcement CEL and enforcement packs together:

   ```bash
   uv run capsem-admin enforcement compile data/enforcement/packs/http-google-secret-enforcement.toml --json
   uv run capsem-admin enforcement backtest data/enforcement/packs/http-google-secret-enforcement.toml --events data/policy-context/canonical-policy-contexts.jsonl --json
   ```

3. Update detection Sigma and Detection IR together:

   ```bash
   uv run capsem-admin detection compile data/detection/sigma/google-secret-egress.yml
   uv run capsem-admin detection backtest data/detection/sigma/google-secret-egress.yml --events data/policy-context/canonical-policy-contexts.jsonl --json
   ```

4. Refresh the matching expected artifacts under `data/enforcement/backtest-expected/` and `data/detection/backtest-expected/`. If the change affects session-backed forensic search, refresh `data/detection/hunt-expected/` as well.

5. When a real VM/session behavior should graduate into the corpus, export the installed service's typed policy contexts:

   ```bash
   capsem export-policy-contexts <session-id> > data/policy-context/<name>.jsonl
   capsem export-policy-contexts <session-id> --json
   ```

   The JSONL form is for committed fixture rows. The `--json` form keeps the export envelope with `fixture_count` for local inspection.

6. Run both language gates:

   ```bash
   uv run pytest tests/test_admin_cli.py tests/test_security_packs.py tests/test_admin_docs.py tests/test_admin_hygiene.py -q
   cargo test -p capsem-core --test security_packs
   cargo test -p capsem-security-engine
   ```

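
Because the export step has two output forms, it is easy to commit the envelope by mistake. The following is a rough sketch of a pre-commit sanity check that accepts only JSONL fixture rows. The `parse_fixture_rows` helper is hypothetical and not part of capsem. It also assumes that the envelope is a JSON object with a top-level `fixture_count` key and that fixture rows never carry that key.

```python
import json


def parse_fixture_rows(text):
    """Parse JSONL fixture text and return one dict per non-blank line.

    Raises ValueError for lines that are not JSON objects, and for an
    object that looks like the --json export envelope (assumed here to
    be any object with a top-level "fixture_count" key).
    """
    rows = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            row = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {lineno}: not valid JSON ({exc.msg})") from exc
        if not isinstance(row, dict):
            raise ValueError(f"line {lineno}: expected a JSON object")
        if "fixture_count" in row:
            raise ValueError(f"line {lineno}: looks like the --json envelope, not a fixture row")
        rows.append(row)
    return rows
```

A caller would read the exported file as UTF-8 text and pass it to `parse_fixture_rows`. A pretty-printed envelope fails on its first line, since a lone `{` is not a complete JSON value.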
capsem-admin works offline. It validates public pack schemas, compiles the
admin-supported policy subset, compiles Sigma with pySigma into Detection IR,
and replays fixtures. It is not a substitute for the installed service’s
runtime rule registry.
Rust runtime tests remain the authority for CEL semantics. When a new CEL construct is added, add the fixture first, then add the Rust parity assertion, then decide whether the offline admin subset should support it or reject it with a clear diagnostic.
Expected artifacts omit timing so they stay deterministic. Keep event ids, session ids, rule ids, pack ids, decisions, findings, and matched fields exact. If the expected row changes, both the Python and Rust tests must explain why.
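
As an illustration of the determinism rule, a fresh report can be compared with a committed expected artifact by dropping its timing fields first and requiring everything else to match exactly. This is a sketch only: the key names in `TIMING_KEYS` and both helper functions are hypothetical, and the real report schema may name or nest its timing fields differently.

```python
# Hypothetical timing field names; substitute whatever the real reports emit.
TIMING_KEYS = {"duration_ms", "started_at", "finished_at"}


def strip_timing(value):
    """Recursively drop timing keys from nested dicts and lists."""
    if isinstance(value, dict):
        return {k: strip_timing(v) for k, v in value.items() if k not in TIMING_KEYS}
    if isinstance(value, list):
        return [strip_timing(v) for v in value]
    return value


def matches_expected(actual_report, expected_report):
    """True when the report equals the expected artifact once timing is removed."""
    return strip_timing(actual_report) == expected_report
```

Every remaining field is compared exactly, so a changed decision, rule id, or matched field shows up as a mismatch.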