90 Million Characters.
No Shortcuts.
New patterns added every release.
Every number on this page has a source. Every claim has a test behind it. We don't publish accuracy metrics we can't trace back to a specific document, a specific test run, and a specific version of our engine.
The benchmark set expands as new document patterns are reviewed.
Detection Architecture
15 Layers. One Mission.
Nothing Gets Through.
Redactorr doesn't rely on a single detection pass. The AEGIS pipeline runs fifteen verification layers in sequence — intake and classification, twelve specialist engines orchestrated by COMMANDER, then a final validation gate. Every layer has one job. Nothing gets through that hasn't earned it.
Pre-Detection
Intake & Classification
RECON maps every word, number, and structural element to an exact position before detection runs. SENTINEL reads the document structure and classifies what kind of document it is — activating the right specialist engines for what follows.
Orchestrated by Commander
12 Specialist Engines
COMMANDER routes each document through AEGIS’s pipeline stages. Broad identifier scan. Australian-specific formats — TFNs, Medicare, ABNs, BSBs. Industry-specific entities across 36 industry packs. Spatial layout analysis. Named entity recognition. False positive elimination. Each stage owns one thing and does it completely.
Final Gate
WARDEN Validation
Every entity claim passes through WARDEN before redaction. Detections are consolidated, confidence-scored, and context-arbitrated. Nothing is released on a guess. If it hasn't earned its place, it doesn't pass.
Universal Identifiers
Caught across every document, regardless of industry.
Australian-Specific Formats
Formats uniquely Australian and invisible to generic tools.
Industry-Specific Entities
Sensitive information that only appears in specific verticals.
What We Measure
We Measure What Matters — Not Just What Looks Good.
Four metrics. Every test run. They don't just measure what passes — they measure what the engine misses.
“Did we catch everything?”
Recall: of all the sensitive information in a document, what percentage did we actually find? Missing even one entity means a potential compliance breach. This is the metric we optimise for above all others.
“Was each flag correct?”
Of everything we flagged as sensitive, what percentage actually was? High precision means less noise — you spend less time reviewing false alarms.
“The balance of both.”
The harmonic mean of recall and precision. A single number that tells you whether the engine is both thorough and accurate. Every engine must pass our F1 threshold before release.
“Overall correctness.”
Accuracy: across every character in the document, flagged and unflagged, what percentage did we classify correctly? The broadest measure of detection quality.
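For readers who want the arithmetic behind these four metrics, here is a minimal sketch using the standard definitions. The `Counts` shape, the function names, and the sample numbers are illustrative assumptions, not part of the AEGIS codebase.

```typescript
// Confusion counts for one test run (hypothetical shape; the unit counted,
// entities or characters, is an assumption of this sketch).
interface Counts {
  truePositives: number;  // sensitive items that were flagged
  falsePositives: number; // harmless items that were flagged
  falseNegatives: number; // sensitive items that were missed
  trueNegatives: number;  // harmless items left alone
}

interface Metrics {
  recall: number;
  precision: number;
  f1: number;
  accuracy: number;
}

// Returns 0 instead of NaN when the denominator is 0.
function ratio(numerator: number, denominator: number): number {
  return denominator === 0 ? 0 : numerator / denominator;
}

function computeMetrics(c: Counts): Metrics {
  const recall = ratio(c.truePositives, c.truePositives + c.falseNegatives);
  const precision = ratio(c.truePositives, c.truePositives + c.falsePositives);
  // F1 is the harmonic mean of precision and recall.
  const f1 = ratio(2 * precision * recall, precision + recall);
  const total =
    c.truePositives + c.falsePositives + c.falseNegatives + c.trueNegatives;
  const accuracy = ratio(c.truePositives + c.trueNegatives, total);
  return { recall, precision, f1, accuracy };
}

// Made-up counts, purely for illustration.
console.log(
  computeMetrics({
    truePositives: 90,
    falsePositives: 10,
    falseNegatives: 10,
    trueNegatives: 890,
  }),
);
```

Because harmless content usually dominates a document, accuracy can stay high even when recall slips, which is one reason to report all four numbers together.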
How We Train
One Engine Doesn't Fit All Industries.
A hospital discharge summary looks nothing like a construction site permit. The PII in each is different. The formats are different. The stakes are different.
AEGIS is a single detection engine with 36 industry packs, each configured for the document patterns that industry actually produces. Australian identifiers, sector formats, and review expectations are handled without generic templates.
“Configured for Australian document patterns and the formats your industry actually uses.”
How We Test
We Test Against Documents
We Can't Cheat On.
Our control group is a frozen set of documents with known sensitive information — checksummed so they can never be altered. Every improvement to the engine is tested against this exact same group. If accuracy drops on even one document, we investigate before shipping.
Control Group
- Frozen document corpus
- Every entity hand-labelled
- Integrity-verified — never changes
- The benchmark that keeps us honest
“The benchmark that keeps us honest.”
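One way a frozen corpus can be integrity-checked is to record a hash for each document at freeze time and recompute it before every test run. The sketch below assumes SHA-256 and a simple manifest; the `ManifestEntry` shape, the function names, and the sample documents are hypothetical, and it uses Node's `crypto` module for brevity.

```typescript
import { createHash } from "node:crypto";

// Hypothetical manifest entry: a document id plus the SHA-256 recorded at freeze time.
interface ManifestEntry {
  id: string;
  sha256: string;
}

function sha256Hex(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

// Returns the ids of documents that are missing or no longer match the manifest.
function findTamperedDocuments(
  manifest: ManifestEntry[],
  corpus: Map<string, string>,
): string[] {
  return manifest
    .filter((entry) => {
      const content = corpus.get(entry.id);
      return content === undefined || sha256Hex(content) !== entry.sha256;
    })
    .map((entry) => entry.id);
}

// Freeze a two-document corpus, then simulate an edit to one document.
const corpus = new Map<string, string>([
  ["doc-001", "Sample discharge summary text"],
  ["doc-002", "Sample site permit text"],
]);
const manifest: ManifestEntry[] = Array.from(corpus.entries()).map(
  ([id, content]) => ({ id, sha256: sha256Hex(content) }),
);
corpus.set("doc-002", "Sample site permit text (edited)");

console.log(findTamperedDocuments(manifest, corpus)); // doc-002 is reported
```

A test run would refuse to start if this list is non-empty, so results are always comparable against the same inputs.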
Automated Pipeline
- Automated test runs on every change
- Base engine + domain engine tests
- Regression detection built in
- Accuracy trend tracking over time
“The watchdog that never sleeps.”
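The "accuracy drops on even one document" rule can be sketched as a per-document comparison between a baseline run and a candidate run. The data shapes, the choice of F1 as the compared score, and the document ids below are assumptions for illustration only.

```typescript
// Per-document scores from one run, keyed by document id (hypothetical shape).
type RunScores = Record<string, number>;

interface Regression {
  docId: string;
  baseline: number;
  candidate: number;
}

// Flags every document whose score dropped, or that is missing from the candidate run.
function findRegressions(
  baseline: RunScores,
  candidate: RunScores,
  tolerance = 0,
): Regression[] {
  const regressions: Regression[] = [];
  for (const [docId, baseScore] of Object.entries(baseline)) {
    const newScore = candidate[docId] ?? 0; // a missing document counts as a full drop
    if (newScore < baseScore - tolerance) {
      regressions.push({ docId, baseline: baseScore, candidate: newScore });
    }
  }
  return regressions;
}

// Made-up scores: doc-002 improved, doc-003 regressed.
const baselineRun: RunScores = { "doc-001": 1.0, "doc-002": 0.95, "doc-003": 0.9 };
const candidateRun: RunScores = { "doc-001": 1.0, "doc-002": 0.97, "doc-003": 0.85 };

console.log(findRegressions(baselineRun, candidateRun));
```

Comparing per document rather than on the corpus average matters here: an improvement on one document cannot hide a drop on another.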
Latest Validation Run — Spatial Detection Engine
The numbers from our most recent test cycle.
Every engine rewrite is validated end-to-end before release. These results come from the most recent full validation run — 483 real Australian documents across every supported format.
483
Documents
Latest validation run
3.78M
Characters
Across all document types
6,601
Entities
Detected and verified
94.7%
F1 Score — Structured Forms
Contracts, permits, applications
100%
F1 Score — Narrative
Reports, correspondence, records
0
Failures
No detection collapses
These results are added to the running totals above. Every test run compounds the evidence.
How We Improve
Edge Cases Don't Get Filed Away.
They Get Fixed.
Test
Run the full detection suite against the control group and live documents.
Analyse
Identify missed entities, false positives, and edge cases by domain.
Improve
Retrain engines, add new patterns, fine-tune detection thresholds.
Verify
Regression-test every change against the full control group. No fix ships if it breaks existing detections.
“Every improvement is regression-tested against the full control group. No fix ships if it breaks something that was already working.”
The corpus grows with every cycle
Each testing cycle adds new documents, new entities, and new edge cases. 90 million characters and counting. The engines never stop improving.
For Your Security Team
The AEGIS Engine — Full 15-Layer Methodology
The AEGIS engine runs a deterministic 15-layer pipeline on every document processed through it. Layers L1–L5 and L7–L15 are universal; L6 SPECTRE is domain-gated and activates only when SENTINEL classifies the document as matching a configured industry domain pack. All 15 layers are listed here because a procurement-grade evaluation requires the complete roster. Source: packages/engine/aegis/commander/stage-descriptors.ts
Document intake and structural mapping
RECON ingests the document and builds a coordinate-resolved word map — every token assigned a page position, bounding box, and structural element type — before any detection begins.
Document type classification and specialist routing
SENTINEL classifies the document type and activates the appropriate specialist detection configuration, including the domain gate that controls whether L6 SPECTRE fires.
Broad structured identifier scan
VANGUARD runs a broad identifier pass over normalised document text, targeting contact identifiers and government-format numbers present across document types; candidates are recorded for later fusion.
Australian-specific identifier detection
MARSHAL targets Australian government-issued identifiers — TFN, Medicare, ABN, ACN, BSB, and AHPRA — applying country-specific format validation where applicable.
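As an illustration of what country-specific format validation can look like, the sketch below applies the published ABN check-digit rule and the widely documented TFN weighted checksum. It is not a description of MARSHAL's internals; the function names are hypothetical, and only the 9-digit TFN form is covered.

```typescript
// Strips spaces and hyphens, then returns the digits, or null if anything else remains.
function digitsOf(value: string): number[] | null {
  const compact = value.replace(/[\s-]/g, "");
  return /^\d+$/.test(compact) ? compact.split("").map(Number) : null;
}

function weightedSum(digits: number[], weights: number[]): number {
  return digits.reduce((sum, digit, i) => sum + digit * weights[i], 0);
}

// ABN: 11 digits. Subtract 1 from the first digit, apply the weights,
// and the sum must be divisible by 89.
function isValidAbn(value: string): boolean {
  const digits = digitsOf(value);
  if (digits === null || digits.length !== 11) return false;
  const adjusted = [digits[0] - 1, ...digits.slice(1)];
  return weightedSum(adjusted, [10, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19]) % 89 === 0;
}

// TFN (9-digit form): the weighted sum must be divisible by 11.
function isValidTfn(value: string): boolean {
  const digits = digitsOf(value);
  if (digits === null || digits.length !== 9) return false;
  return weightedSum(digits, [1, 4, 3, 7, 5, 8, 6, 9, 10]) % 11 === 0;
}

console.log(isValidAbn("51 824 753 556")); // true
console.log(isValidTfn("123 456 782"));    // true (a commonly used test value)
console.log(isValidTfn("123 456 789"));    // false
```

A checksum like this separates a genuine identifier from an arbitrary digit run of the same length, which is where generic pattern matching tends to produce false positives.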
Service credential and access token detection
CIPHER detects service credentials, API tokens, and platform access keys embedded in document text using tuned patterns across common cloud and SaaS formats, with reduced-confidence output flagged for review.
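A pattern-based credential scan with a review flag might look roughly like the sketch below. The three patterns, their confidence values, and the review threshold are illustrative assumptions and are not CIPHER's actual rule set.

```typescript
interface CredentialPattern {
  name: string;
  regex: RegExp;
  confidence: number; // illustrative values only
}

interface CredentialHit {
  pattern: string;
  start: number;
  end: number;
  confidence: number;
  needsReview: boolean;
}

// Hypothetical pattern list: two well-known token shapes and one generic fallback.
const PATTERNS: CredentialPattern[] = [
  { name: "aws-access-key-id", regex: /\bAKIA[0-9A-Z]{16}\b/g, confidence: 0.95 },
  { name: "github-personal-token", regex: /\bghp_[A-Za-z0-9]{36}\b/g, confidence: 0.95 },
  {
    name: "generic-key-assignment",
    regex: /\b(?:api[_-]?key|secret|token)\s*[:=]\s*[A-Za-z0-9_-]{16,}/gi,
    confidence: 0.5,
  },
];

// Hits below the threshold are kept but flagged for human review.
function scanForCredentials(text: string, reviewBelow = 0.8): CredentialHit[] {
  const hits: CredentialHit[] = [];
  for (const pattern of PATTERNS) {
    // Copy the regex so each scan starts with a fresh lastIndex.
    const regex = new RegExp(pattern.regex.source, pattern.regex.flags);
    let match: RegExpExecArray | null;
    while ((match = regex.exec(text)) !== null) {
      hits.push({
        pattern: pattern.name,
        start: match.index,
        end: match.index + match[0].length,
        confidence: pattern.confidence,
        needsReview: pattern.confidence < reviewBelow,
      });
    }
  }
  return hits;
}

const sample = "Deploy key AKIAIOSFODNN7EXAMPLE and api_key = abcdef0123456789abcd";
console.log(scanForCredentials(sample));
```

The generic fallback is deliberately low-confidence: it catches unfamiliar formats at the cost of more noise, so its hits go to review rather than straight to redaction.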
Domain-gated industry-specific entity detection
SPECTRE applies domain-gated detection for industry-vertical entities — coverage activates when SENTINEL classifies the document as matching the relevant domain pack, and does not run outside its configured scope.
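Domain gating can be pictured as a stage list in which one stage carries a predicate on the classified document. The sketch below is a guess at the general shape, not the contents of `stage-descriptors.ts`; the `StageDescriptor` type, the `gate` field, and the pack names are hypothetical.

```typescript
// What the classifier is assumed to emit (hypothetical shape).
interface DocumentProfile {
  domain: string | null; // null when no industry pack matched
}

interface StageDescriptor {
  id: string;
  name: string;
  gate?: (profile: DocumentProfile) => boolean; // absent means the stage always runs
}

// Hypothetical pack names, used only for this example.
const CONFIGURED_PACKS = new Set(["healthcare", "construction"]);

// Three stages from the roster: two universal, one domain-gated.
const STAGES: StageDescriptor[] = [
  { id: "L5", name: "CIPHER" },
  {
    id: "L6",
    name: "SPECTRE",
    gate: (p) => p.domain !== null && CONFIGURED_PACKS.has(p.domain),
  },
  { id: "L7", name: "NAVIGATOR" },
];

// Returns the names of the stages that would run for this document, in order.
function planStages(profile: DocumentProfile): string[] {
  return STAGES.filter((s) => s.gate === undefined || s.gate(profile)).map(
    (s) => s.name,
  );
}

console.log(planStages({ domain: "healthcare" })); // CIPHER, SPECTRE, NAVIGATOR
console.log(planStages({ domain: null }));         // CIPHER, NAVIGATOR
```

Keeping the gate on the descriptor means the stage order stays fixed and deterministic; only membership changes per document.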
Spatial layout and label-proximity analysis
NAVIGATOR uses the coordinate map from RECON to analyse spatial token relationships — financial identifiers adjacent to field labels receive elevated detection confidence; isolated numbers in running prose do not — backed by the engine's 97%+ measured F1 baseline.
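The label-proximity idea can be sketched with plain bounding-box geometry: a number counts as label-adjacent when a known label sits on the same line, just to its left. The label list, the gap threshold, and the two confidence values below are invented for illustration and do not reflect NAVIGATOR's real scoring.

```typescript
interface Box {
  x: number;
  y: number;
  width: number;
  height: number;
}

interface Token {
  text: string;
  box: Box;
}

// Hypothetical label vocabulary.
const LABEL_WORDS = new Set(["bsb", "account", "tfn", "abn"]);

// Two boxes share a line when they overlap vertically by more than half the shorter one.
function onSameLine(a: Box, b: Box): boolean {
  const overlap = Math.min(a.y + a.height, b.y + b.height) - Math.max(a.y, b.y);
  return overlap > 0.5 * Math.min(a.height, b.height);
}

function hasLabelToLeft(value: Token, tokens: Token[], maxGap = 40): boolean {
  return tokens.some((t) => {
    const word = t.text.toLowerCase().replace(/[:.]$/, "");
    if (t === value || !LABEL_WORDS.has(word)) return false;
    const gap = value.box.x - (t.box.x + t.box.width);
    return onSameLine(t.box, value.box) && gap >= 0 && gap <= maxGap;
  });
}

// Illustrative scores: elevated next to a label, low for an isolated number.
function scoreNumber(value: Token, tokens: Token[]): number {
  return hasLabelToLeft(value, tokens) ? 0.9 : 0.4;
}

const label: Token = { text: "BSB:", box: { x: 10, y: 100, width: 30, height: 12 } };
const labelled: Token = { text: "062-000", box: { x: 50, y: 100, width: 50, height: 12 } };
const isolated: Token = { text: "062-000", box: { x: 200, y: 300, width: 50, height: 12 } };
const page = [label, labelled, isolated];

console.log(scoreNumber(labelled, page)); // 0.9
console.log(scoreNumber(isolated, page)); // 0.4
```

The same digit string gets two different scores depending on where it sits on the page, which is the point of running layout analysis after the coordinate map is built.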
Structured form field extraction
ADJUTANT processes structured input elements — form fields, table cells, checkbox groups — extracting values paired with their field-label context as validated detection candidates, distinct from visually similar values in free text.
Context-aware names — narrative prose
OVERWATCH detects person and organisation names in narrative prose using context-aware signals; marked critical — failure triggers pipeline abort.
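The critical versus non-critical distinction can be sketched as a small stage runner. Treating a non-critical failure as a skip, and returning no candidates on abort, are assumptions of this sketch rather than documented AEGIS behaviour; the types and stage bodies are hypothetical.

```typescript
interface Stage {
  name: string;
  critical: boolean;
  run: (candidates: string[]) => string[];
}

interface PipelineResult {
  status: "completed" | "aborted";
  candidates: string[];
  skipped: string[]; // non-critical stages that failed
  abortedAt?: string;
}

function runStages(stages: Stage[], initial: string[]): PipelineResult {
  let candidates = initial;
  const skipped: string[] = [];
  for (const stage of stages) {
    try {
      candidates = stage.run(candidates);
    } catch {
      if (stage.critical) {
        // Fail closed: an aborted run releases nothing for redaction.
        return { status: "aborted", candidates: [], skipped, abortedAt: stage.name };
      }
      skipped.push(stage.name);
    }
  }
  return { status: "completed", candidates, skipped };
}

// Stand-in stage bodies; real stages would do detection work.
const failing = (): string[] => {
  throw new Error("stage failed");
};
const passThrough = (c: string[]): string[] => c;

console.log(
  runStages(
    [
      { name: "OVERWATCH", critical: true, run: failing },
      { name: "ARBITER", critical: false, run: passThrough },
    ],
    ["Jane Citizen"],
  ),
);
```

Aborting on a critical failure fits a recall-first design: a run that silently skipped name detection would look clean while missing the entities that matter most.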
Context-aware names — boundary-spanning
TEMPEST runs a second pass to catch names spanning line breaks or chunk boundaries that L9 OVERWATCH may fragment; marked critical.
Context validation — false positive elimination
GUARDIAN validates pooled detection candidates from L3–L10 in context to filter false positives — road names, generic job titles, organisational terms — before fusion; marked critical.
Multi-layer detection fusion
NEXUS consolidates detections from all preceding detection layers (L3–L11) into a unified candidate set — overlapping detections deduplicated, false-negative and false-positive corrections applied, with a safety limit of 5,000 mentions sorted by confidence descending.
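A simple way to picture the fusion step is a greedy pass: sort by confidence, keep a mention only if it does not overlap one already kept, then apply the cap. The 5,000 limit and the descending sort come from the description above; the overlap rule, the `Mention` shape, and the function names are assumptions.

```typescript
interface Mention {
  start: number; // inclusive character offset
  end: number;   // exclusive character offset
  label: string;
  confidence: number;
  source: string; // which layer produced it
}

const MAX_MENTIONS = 5000; // safety limit stated in the description above

function overlaps(a: Mention, b: Mention): boolean {
  return a.start < b.end && b.start < a.end;
}

// Greedy deduplication: among overlapping mentions the highest confidence wins.
function fuse(mentions: Mention[], limit = MAX_MENTIONS): Mention[] {
  const byConfidence = [...mentions].sort((a, b) => b.confidence - a.confidence);
  const kept: Mention[] = [];
  for (const mention of byConfidence) {
    if (!kept.some((k) => overlaps(k, mention))) kept.push(mention);
  }
  return kept.slice(0, limit); // already sorted by confidence, descending
}

const fused = fuse([
  { start: 0, end: 10, label: "ID", confidence: 0.7, source: "VANGUARD" },
  { start: 5, end: 12, label: "TFN", confidence: 0.9, source: "MARSHAL" },
  { start: 20, end: 30, label: "NAME", confidence: 0.6, source: "OVERWATCH" },
]);
console.log(fused.map((m) => `${m.source}:${m.confidence}`));
```

The pairwise overlap check is quadratic in the number of mentions, which is acceptable for a sketch; an interval index would be the usual choice at scale.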
Confidence scoring and score emission
HERALD scores the fused detection clusters — weighting by zone (header, body, footer, form field), domain profile, and configuration mode — producing the finalClusters set, each carrying a composite confidence score; nothing proceeds to redaction without a quantified confidence.
Context signal arbitration and confidence adjustment
ARBITER applies co-occurrence signals and contextual rules as the final confidence adjustment — boosting label-adjacent values and suppressing footer noise — logging signal match counts for observability; non-critical, pipeline continues if no signals match.
Allowlist enforcement — final validation gate
WARDEN is the last stage before redaction output — applying the configured allowlist to remove any permitted values from the final set; nothing proceeds to redaction without passing every preceding layer and clearing the allowlist gate.
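Allowlist enforcement reduces to a filter over the final cluster set. The sketch below assumes exact matching after light normalisation (trimmed, collapsed whitespace, lower-cased); that normalisation, the `Cluster` shape, and the sample values are hypothetical.

```typescript
interface Cluster {
  text: string;
  label: string;
  confidence: number;
}

// Assumed normalisation so "Acme  Pty Ltd" and "acme pty ltd" compare equal.
function normalise(value: string): string {
  return value.trim().replace(/\s+/g, " ").toLowerCase();
}

// Removes any cluster whose text is on the allowlist of permitted values.
function applyAllowlist(clusters: Cluster[], allowlist: string[]): Cluster[] {
  const permitted = new Set(allowlist.map(normalise));
  return clusters.filter((c) => !permitted.has(normalise(c.text)));
}

const finalSet = applyAllowlist(
  [
    { text: "Acme  Pty Ltd", label: "ORG", confidence: 0.92 },
    { text: "Jane Citizen", label: "PERSON", confidence: 0.97 },
  ],
  ["acme pty ltd"],
);
console.log(finalSet.map((c) => c.text)); // only "Jane Citizen" remains
```

Running the allowlist last means a permitted value, such as your own organisation's name, is dropped only after every detection layer has had its say.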
Verify It Yourself
Open DevTools and watch the Network tab while you process a document. The detection and redaction workflow runs locally in your browser — verify the boundary in your own environment before relying on it for a high-risk workflow.
Architecture diagram and AI-features data-handling scope → /trust
“We'd rather flag something that isn't sensitive than miss something that is.”
This is a deliberate engineering decision. Our engines are tuned for recall — catching every instance of sensitive data, even at the cost of occasionally flagging something that turns out to be harmless.
For regulated industries, a missed detection isn't an inconvenience. It's a compliance failure. You always have the final say — review, accept, or dismiss any detection before redaction is applied.
Recall-First. Compliance-Driven. Human-Verified.
See For Yourself
See It Run.
Open a document. All 15 verification layers run live in your browser. No upload, no wait.