Accuracy Methodology

90 Million Characters.
No Shortcuts.

New patterns added every release.

Every number on this page has a source. Every claim has a test behind it. We don't publish accuracy metrics we can't trace back to a specific document, a specific test run, and a specific version of our engine.

90M Characters Analysed
Entities Detected
Real-World Documents Tested
Industries Covered

The benchmark set expands as new document patterns are reviewed.

Detection Architecture

15 Layers. One Mission.
Nothing Gets Through.

Redactorr doesn't rely on a single detection pass. The AEGIS pipeline runs fifteen verification layers in sequence — intake and classification, twelve specialist engines orchestrated by COMMANDER, then a final validation gate. Every layer has one job. Nothing gets through that hasn't earned it.

Pre-Detection

Intake & Classification

RECON maps every word, number, and structural element to an exact position before detection runs. SENTINEL reads the document structure and classifies what kind of document it is — activating the right specialist engines for what follows.

AEGIS Pipeline Phase: L1–L2
feeds into

Orchestrated by Commander

12 Specialist Engines

COMMANDER routes each document through the twelve specialist stages of the AEGIS pipeline. Broad identifier scan. Australian-specific formats: TFNs, Medicare, ABNs, BSBs. Industry-specific entities across 36 industry packs. Spatial layout analysis. Named entity recognition. False-positive elimination. Each stage owns one thing and does it completely.

AEGIS Pipeline Phase: L3–L14
builds on

Final Gate

WARDEN Validation

Every entity claim passes through WARDEN before redaction. By the time a detection reaches this gate, the preceding layers have consolidated, confidence-scored, and context-arbitrated it; WARDEN then applies the configured allowlist. Nothing is released on a guess. If it hasn't earned its place, it doesn't pass.

AEGIS Pipeline Phase: L15
15 AEGIS layers, three pipeline phases, one pass. 100% client-side: every detection runs entirely in your browser.

Universal Identifiers

Caught across every document, regardless of industry.

Names, emails, phone numbers, addresses, dates, organisations, and more: the identifiers present in every Australian professional document.

Australian-Specific Formats

Formats uniquely Australian and invisible to generic tools.

TFN · Medicare · ABN · BSB · ACN · AU Addresses · + 40 more AU formats

Industry-Specific Entities

Sensitive information that only appears in specific verticals.

Policy Numbers · Claim IDs · Patient IDs · Case Numbers · Permit IDs · + more, varying by industry across 36 industry packs

What We Measure

We Measure What Matters — Not Just What Looks Good.

Four metrics. Every test run. They don't just measure what passes — they measure what the engine misses.

“For regulated industries, a missed detection isn't an inconvenience — it's a compliance failure.”
Recall: North Star

“Did we catch everything?”

Of all the sensitive information in a document, what percentage did we actually find? Missing even one entity means a potential compliance breach. This is the metric we optimise for above all others.

Precision

“Was each flag correct?”

Of everything we flagged as sensitive, what percentage actually was? High precision means less noise — you spend less time reviewing false alarms.

F1 Score: Quality Gate

“The balance of both.”

The harmonic mean of recall and precision. A single number that tells you whether the engine is both thorough and accurate. Every engine must pass our F1 threshold before release.

Accuracy

“Overall correctness.”

Across every character in the document — flagged and unflagged — what percentage did we classify correctly? The broadest measure of detection quality.
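All four metrics come from simple counts once detections have been matched against hand-labelled ground truth. The sketch below is a minimal illustration. It assumes the true-positive, false-positive, false-negative and true-negative counts are already available. `Counts` and `computeMetrics` are hypothetical names, not Redactorr's code. This page measures accuracy per character, so the counts fed into the accuracy calculation would be character counts.

```typescript
// Hypothetical count structure; how counts are produced (entity- or
// character-level matching) is an assumption, not a documented scheme.
interface Counts {
  tp: number; // sensitive items correctly flagged
  fp: number; // items flagged that were not sensitive
  fn: number; // sensitive items the engine missed
  tn: number; // non-sensitive items correctly left unflagged
}

interface Metrics {
  recall: number;
  precision: number;
  f1: number;
  accuracy: number;
}

// Avoid division by zero, e.g. for a document with no sensitive content.
const ratio = (num: number, den: number): number => (den === 0 ? 0 : num / den);

function computeMetrics({ tp, fp, fn, tn }: Counts): Metrics {
  const recall = ratio(tp, tp + fn); // "Did we catch everything?"
  const precision = ratio(tp, tp + fp); // "Was each flag correct?"
  // Harmonic mean of precision and recall.
  const f1 = ratio(2 * precision * recall, precision + recall);
  const accuracy = ratio(tp + tn, tp + fp + fn + tn); // overall correctness
  return { recall, precision, f1, accuracy };
}

// Example: 95 entities found, 5 missed, 10 false alarms, 890 correct negatives.
console.log(computeMetrics({ tp: 95, fp: 10, fn: 5, tn: 890 }));
```

Recall only moves when the number of missed items (`fn`) changes. That is why it is the metric to optimise when a single miss is a compliance failure.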

How We Train

One Engine Doesn't Fit All Industries.

A hospital discharge summary looks nothing like a construction site permit. The PII in each is different. The formats are different. The stakes are different.

AEGIS is a single detection engine with 36 industry packs, each configured for the document patterns that industry actually produces. Australian identifiers, sector formats, and review expectations are handled without generic templates.

“Configured for Australian document patterns and the formats your industry actually uses.”

How We Test

We Test Against Documents
We Can't Cheat On.

Our control group is a frozen set of documents with known sensitive information, checksummed so any alteration is detected. Every improvement to the engine is tested against this exact same group. If accuracy drops on even one document, we investigate before shipping.

Control Group

  • Frozen document corpus
  • Every entity hand-labelled
  • Integrity-verified — never changes
  • The benchmark that keeps us honest

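A frozen corpus is only trustworthy if tampering can be detected. The minimal sketch below assumes a manifest of SHA-256 digests recorded when the control group was frozen. The manifest shape and the function names are hypothetical. It uses Node's built-in `crypto.createHash`.

```typescript
import { createHash } from "node:crypto";

// Hypothetical manifest: document ID -> expected SHA-256 hex digest,
// recorded once when the control group was frozen.
type Manifest = Record<string, string>;

function sha256Hex(content: string | Uint8Array): string {
  return createHash("sha256").update(content).digest("hex");
}

// Returns the IDs of documents that are missing or whose content no longer
// matches the manifest. An empty array means the corpus is intact.
function findTamperedDocuments(
  manifest: Manifest,
  documents: Map<string, string | Uint8Array>,
): string[] {
  const problems: string[] = [];
  for (const [id, expected] of Object.entries(manifest)) {
    const content = documents.get(id);
    if (content === undefined || sha256Hex(content) !== expected) {
      problems.push(id);
    }
  }
  return problems;
}
```

A checksum does not prevent edits. It makes any edit visible, so a test run can refuse to report results against a corpus that has drifted.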

Automated Pipeline

  • Automated test runs on every change
  • Base engine + domain engine tests
  • Regression detection built in
  • Accuracy trend tracking over time

“The watchdog that never sleeps.”
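The release rule above is simple: a drop on any single document triggers an investigation. It can be sketched as a per-document comparison between a baseline run and a candidate run. This is a minimal sketch that assumes each run yields one F1 score per control-group document. The types, the `tolerance` parameter and the choice to score a missing document as 0 are all assumptions.

```typescript
// Hypothetical per-document scores from one test run: document ID -> F1.
type RunScores = Map<string, number>;

interface Regression {
  documentId: string;
  baseline: number;
  candidate: number;
}

// Flags every control-group document whose score dropped versus the baseline.
// `tolerance` only absorbs floating-point noise; it is not a documented threshold.
function findRegressions(
  baseline: RunScores,
  candidate: RunScores,
  tolerance = 1e-9,
): Regression[] {
  const regressions: Regression[] = [];
  baseline.forEach((before, documentId) => {
    // A document missing from the candidate run is treated as a score of 0.
    const after = candidate.get(documentId) ?? 0;
    if (after < before - tolerance) {
      regressions.push({ documentId, baseline: before, candidate: after });
    }
  });
  return regressions;
}

// Release gate: ship only if no single document regressed.
function canShip(baseline: RunScores, candidate: RunScores): boolean {
  return findRegressions(baseline, candidate).length === 0;
}
```

Gating on per-document drops rather than on the corpus average stops a fix that helps many documents from hiding a new miss on one.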

Latest Validation Run — Spatial Detection Engine

The numbers from our most recent test cycle.

Every engine rewrite is validated end-to-end before release. These results come from the most recent full validation run — 483 real Australian documents across every supported format.

  • 483 documents (latest validation run)
  • 3.78M characters (across all document types)
  • 6,601 entities (detected and verified)
  • 94.7% F1 score on structured forms (contracts, permits, applications)
  • 100% F1 score on narrative documents (reports, correspondence, records)
  • 0 failures (no detection collapses)

These results are added to the running totals above. Every test run compounds the evidence.

How We Improve

Edge Cases Don't Get Filed Away.
They Get Fixed.

Test

Run the full detection suite against the control group and live documents.

Analyse

Identify missed entities, false positives, and edge cases by domain.

Verify

Regression-test every change against the full control group. No fix ships if it breaks existing detections.

Improve

Retrain engines, add new patterns, fine-tune detection thresholds.

CONTINUOUS

“Every improvement is regression-tested against the full control group. No fix ships if it breaks something that was already working.”

The corpus grows with every cycle

Each testing cycle adds new documents, new entities, and new edge cases. 90 million characters and counting. The engines never stop improving.

For Your Security Team

The AEGIS Engine — Full 15-Layer Methodology

The AEGIS engine runs a deterministic 15-layer pipeline on every document processed through it. Layers L1–L5 and L7–L15 are universal; L6 SPECTRE is domain-gated and activates only when SENTINEL classifies the document as matching a configured industry domain pack. All 15 layers are listed here because a procurement-grade evaluation requires the complete roster. Source: packages/engine/aegis/commander/stage-descriptors.ts
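The properties described here can be modelled with a small stage runner: a fixed layer order, a domain gate on L6, and critical layers whose failure aborts the run. This is a hedged sketch. The `StageDescriptor` shape below is hypothetical and is not taken from `stage-descriptors.ts`. Treating a non-critical failure as "skip and continue" is also an assumption.

```typescript
// Hypothetical stage descriptor; the real definitions may differ.
interface StageDescriptor<T> {
  layer: number; // 1..15
  name: string; // e.g. "OVERWATCH"
  critical: boolean; // a failure in a critical stage aborts the pipeline
  enabled?: (ctx: T) => boolean; // optional gate, e.g. a domain gate
  run: (ctx: T) => T;
}

interface PipelineResult<T> {
  ctx: T;
  aborted: boolean;
  skippedFailures: string[]; // non-critical stages that threw and were skipped
}

// Runs stages strictly in layer order, so identical input always follows the
// same path through the pipeline.
function runPipeline<T>(stages: StageDescriptor<T>[], initial: T): PipelineResult<T> {
  const ordered = [...stages].sort((a, b) => a.layer - b.layer);
  let ctx = initial;
  const skippedFailures: string[] = [];
  for (const stage of ordered) {
    if (stage.enabled && !stage.enabled(ctx)) continue; // gated off
    try {
      ctx = stage.run(ctx);
    } catch {
      if (stage.critical) return { ctx, aborted: true, skippedFailures };
      skippedFailures.push(stage.name);
    }
  }
  return { ctx, aborted: false, skippedFailures };
}
```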

Pre-Detection
L1
RECON

Document intake and structural mapping

RECON ingests the document and builds a coordinate-resolved word map — every token assigned a page position, bounding box, and structural element type — before any detection begins.

Pre-detection · Coordinate mapping · Structural element capture · No detection output
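A coordinate-resolved word map is, in essence, a list of tokens. Each token carries a page, a bounding box, a structural element type and character offsets. The sketch below shows one plausible shape. The field names, the `ElementType` values and the single-space joining rule are assumptions for illustration.

```typescript
// Hypothetical structural element types.
type ElementType = "heading" | "paragraph" | "table-cell" | "form-field" | "footer";

interface BoundingBox { x: number; y: number; width: number; height: number }

interface MappedToken {
  text: string;
  page: number; // 1-based page index
  bbox: BoundingBox; // position on the page
  element: ElementType; // structural element the token belongs to
  charStart: number; // offset into the normalised document text
  charEnd: number; // exclusive end offset
}

// Maps one line of already-positioned words. Offsets assume words are joined
// by single spaces, which is an illustrative normalisation rule.
function mapLine(
  words: { text: string; bbox: BoundingBox }[],
  page: number,
  element: ElementType,
  startOffset: number,
): MappedToken[] {
  let offset = startOffset;
  return words.map((w) => {
    const token: MappedToken = {
      ...w,
      page,
      element,
      charStart: offset,
      charEnd: offset + w.text.length,
    };
    offset = token.charEnd + 1; // skip the joining space
    return token;
  });
}
```

Each token keeps both its character offsets and its page box. Later layers can therefore report detections as text ranges and still map them back to a rectangle on the page for redaction.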
L2
SENTINEL

Document type classification and specialist routing

SENTINEL classifies the document type and activates the appropriate specialist detection configuration, including the domain gate that controls whether L6 SPECTRE fires.

Document classification · Specialist routing · Industry context · SPECTRE gate
Orchestrated by COMMANDER
L3
VANGUARD

Broad structured identifier scan

VANGUARD runs a broad identifier pass over normalised document text, targeting contact identifiers and government-format numbers present across document types; candidates are recorded for later fusion.

Phone numbers · Email addresses · Dates of birth · Government identifiers · Broad patterns
L4
MARSHAL

Australian-specific identifier detection

MARSHAL targets Australian government-issued identifiers — TFN, Medicare, ABN, ACN, BSB, and AHPRA — applying country-specific format validation where applicable.

TFN · Medicare · ABN / ACN · BSB · AHPRA · AU formats · Format validation
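For Australian identifiers, format validation usually means verifying a check digit. The sketch below applies the widely documented weighted-sum rules for 9-digit TFNs (weighted sum divisible by 11) and 11-digit ABNs (subtract 1 from the first digit, weighted sum divisible by 89). It illustrates the kind of validation MARSHAL describes. It is not MARSHAL's code, and older 8-digit TFNs are deliberately out of scope.

```typescript
// Strips spaces and hyphens and returns the digits, or null if the input is
// not exactly `length` digits.
function toDigits(raw: string, length: number): number[] | null {
  const d = raw.replace(/[\s-]/g, "");
  if (d.length !== length || !/^\d+$/.test(d)) return null;
  return Array.from(d, Number);
}

// 9-digit TFN: weighted sum must be divisible by 11.
const TFN_WEIGHTS = [1, 4, 3, 7, 5, 8, 6, 9, 10];
function isValidTfn(raw: string): boolean {
  const d = toDigits(raw, 9);
  if (!d) return false;
  const sum = d.reduce((acc, digit, i) => acc + digit * TFN_WEIGHTS[i], 0);
  return sum % 11 === 0;
}

// 11-digit ABN: subtract 1 from the first digit, then the weighted sum
// must be divisible by 89.
const ABN_WEIGHTS = [10, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19];
function isValidAbn(raw: string): boolean {
  const d = toDigits(raw, 11);
  if (!d) return false;
  d[0] -= 1;
  const sum = d.reduce((acc, digit, i) => acc + digit * ABN_WEIGHTS[i], 0);
  return sum % 89 === 0;
}

console.log(isValidTfn("123 456 782")); // true
console.log(isValidAbn("51 824 753 556")); // true
```

A checksum pass lets a 9- or 11-digit number in running text be promoted to a high-confidence identifier. A random number with the same length is rejected instead of being flagged.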
L5
CIPHER

Service credential and access token detection

CIPHER detects service credentials, API tokens, and platform access keys embedded in document text using tuned patterns across common cloud and SaaS formats, with reduced-confidence output flagged for review.

API tokens · Access keys · Platform secrets · Credential patterns
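Credential detection of this kind is typically a set of prefix-anchored patterns, each carrying a confidence. Weaker, generic patterns fall below a review threshold. The sketch below uses two commonly observed public formats (AWS access key IDs beginning `AKIA`, GitHub classic personal access tokens beginning `ghp_`) plus a generic `key = value` rule. The pattern list, confidence values and `REVIEW_THRESHOLD` are illustrative assumptions, not CIPHER's actual configuration.

```typescript
interface CredentialPattern {
  label: string;
  regex: RegExp; // must carry the global flag for matchAll
  confidence: number; // assumed 0..1 score
}

// Illustrative patterns only.
const PATTERNS: CredentialPattern[] = [
  // AWS access key ID: "AKIA" followed by 16 upper-case letters or digits.
  { label: "aws-access-key-id", regex: /\bAKIA[0-9A-Z]{16}\b/g, confidence: 0.95 },
  // GitHub classic personal access token: "ghp_" followed by 36 alphanumerics.
  { label: "github-pat", regex: /\bghp_[A-Za-z0-9]{36}\b/g, confidence: 0.95 },
  // Generic secret assignment; weaker evidence. The span covers the whole assignment.
  {
    label: "generic-secret",
    regex: /\b(?:api[_-]?key|secret|token)\s*[:=]\s*['"]?[A-Za-z0-9_\-]{16,}/gi,
    confidence: 0.6,
  },
];

const REVIEW_THRESHOLD = 0.8; // hypothetical cut-off for "flag for review"

interface CredentialHit {
  label: string;
  start: number;
  end: number;
  confidence: number;
  needsReview: boolean;
}

function scanCredentials(text: string): CredentialHit[] {
  const hits: CredentialHit[] = [];
  for (const { label, regex, confidence } of PATTERNS) {
    for (const m of text.matchAll(regex)) {
      const start = m.index ?? 0;
      hits.push({
        label,
        start,
        end: start + m[0].length,
        confidence,
        needsReview: confidence < REVIEW_THRESHOLD,
      });
    }
  }
  // Overlapping hits from different patterns are kept; deduplication is left
  // to a later fusion step.
  return hits.sort((a, b) => a.start - b.start);
}
```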
L6
SPECTRE

Domain-gated industry-specific entity detection

SPECTRE applies domain-gated detection for industry-vertical entities — coverage activates when SENTINEL classifies the document as matching the relevant domain pack, and does not run outside its configured scope.

Domain-gated · Industry entities · Scope-aware detection · SENTINEL-activated
L7
NAVIGATOR

Spatial layout and label-proximity analysis

NAVIGATOR uses the coordinate map from RECON to analyse spatial token relationships: financial identifiers adjacent to field labels receive elevated detection confidence, while isolated numbers in running prose do not. Spatial scoring is backed by the engine's 97%+ measured F1 baseline.

BSB numbers · Account numbers · Label-proximity scoring · Page coordinates · Spatial analysis
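Label-proximity scoring can be sketched as follows: raise a candidate's confidence when a financial field label sits close to it on the same page. This is a minimal sketch. The label vocabulary, the base and boost scores, the centre-to-centre distance measure and `MAX_LABEL_DISTANCE` (in page units) are all assumptions.

```typescript
interface Box { x: number; y: number; width: number; height: number }
interface PositionedToken { text: string; page: number; bbox: Box }

// Assumed label vocabulary, scores and distance threshold.
const FINANCIAL_LABEL = /^(bsb|account|acct)[:.]?$/i;
const BASE_CONFIDENCE = 0.5;
const LABEL_BOOST = 0.4;
const MAX_LABEL_DISTANCE = 150;

// Euclidean distance between the centres of two boxes.
function centreDistance(a: Box, b: Box): number {
  const dx = a.x + a.width / 2 - (b.x + b.width / 2);
  const dy = a.y + a.height / 2 - (b.y + b.height / 2);
  return Math.hypot(dx, dy);
}

// A candidate near a financial label on the same page is boosted; a number
// with no nearby label, such as one in running prose, keeps the base score.
function scoreByLabelProximity(candidate: PositionedToken, pageTokens: PositionedToken[]): number {
  const nearLabel = pageTokens.some(
    (t) =>
      t !== candidate &&
      t.page === candidate.page &&
      FINANCIAL_LABEL.test(t.text) &&
      centreDistance(t.bbox, candidate.bbox) <= MAX_LABEL_DISTANCE,
  );
  return nearLabel ? Math.min(1, BASE_CONFIDENCE + LABEL_BOOST) : BASE_CONFIDENCE;
}
```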
L8
ADJUTANT

Structured form field extraction

ADJUTANT processes structured input elements — form fields, table cells, checkbox groups — extracting values paired with their field-label context as validated detection candidates, distinct from visually similar values in free text.

Form labels · Structured tables · Checkbox groups · Field-confirmed values
L9
OVERWATCH

Context-aware names — narrative prose

OVERWATCH detects person and organisation names in narrative prose using context-aware signals; marked critical — failure triggers pipeline abort.

Person names · Organisation names · Context-aware detection · Critical layer
L10
TEMPEST

Context-aware names — boundary-spanning

TEMPEST runs a second pass to catch names spanning line breaks or chunk boundaries that L9 OVERWATCH may fragment; marked critical.

Line-break spanning · Second pass · Boundary-split names · Critical layer
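A boundary-spanning pass can be sketched as a merge step. Two name fragments separated only by whitespace that contains a line break (for example `Jane` and `Citizen` across a wrapped line) become one span, so the full name is redacted as a unit. The `Span` type, the `PERSON` label and the merge rule are illustrative assumptions, not TEMPEST's actual logic.

```typescript
interface Span { start: number; end: number; label: string }

// Merges adjacent PERSON spans whose gap is whitespace containing a newline.
// Input spans are copied, so the caller's array is not mutated.
function mergeLineBreakNames(text: string, spans: Span[]): Span[] {
  const sorted = [...spans].sort((a, b) => a.start - b.start);
  const out: Span[] = [];
  for (const span of sorted) {
    const prev = out[out.length - 1];
    const gap = prev ? text.slice(prev.end, span.start) : "";
    const splitByLineBreak = /^\s*\n\s*$/.test(gap);
    if (prev && prev.label === "PERSON" && span.label === "PERSON" && splitByLineBreak) {
      prev.end = Math.max(prev.end, span.end);
    } else {
      out.push({ ...span });
    }
  }
  return out;
}
```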
L11
GUARDIAN

Context validation — false positive elimination

GUARDIAN validates pooled detection candidates from L3–L10 in context to filter false positives — road names, generic job titles, organisational terms — before fusion; marked critical.

Context-based filtering · Confidence adjustment · False positive removal · Critical layer
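Context validation can be sketched as a filter over pooled candidates. It drops a person-name candidate when the surrounding text shows it is a road name ("Smith Street"), or when the candidate is only a generic job title. The suffix list, title list and `Candidate` shape are hypothetical examples, not GUARDIAN's rule set.

```typescript
interface Candidate {
  text: string;
  label: "PERSON" | "ORG" | "OTHER";
  start: number;
  end: number;
}

// Illustrative rules only.
const STREET_SUFFIX = /^\s+(street|st|road|rd|avenue|ave|drive|dr|lane|ln)\b/i;
const GENERIC_TITLES = new Set(["manager", "director", "nurse", "engineer", "officer"]);

function filterFalsePositives(text: string, candidates: Candidate[]): Candidate[] {
  return candidates.filter((c) => {
    if (c.label !== "PERSON") return true; // only person names are checked here
    if (STREET_SUFFIX.test(text.slice(c.end))) return false; // road name
    if (GENERIC_TITLES.has(c.text.trim().toLowerCase())) return false; // job title
    return true;
  });
}
```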
Fusion
L12
NEXUS

Multi-layer detection fusion

NEXUS consolidates detections from all preceding detection layers (L3–L11) into a unified candidate set: overlapping detections are deduplicated, false-negative and false-positive corrections are applied, and output is capped at a safety limit of 5,000 mentions sorted by descending confidence.

Multi-layer fusion · Deduplication · False-negative injection · False-positive correction · Critical layer
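The deduplication and safety-limit parts of fusion can be sketched with a greedy pass. Visit candidates from highest to lowest confidence, and keep one only if it does not overlap a detection already kept. The 5,000 cap and the descending-confidence order come from the description above. The "keep the strongest overlapping detection" rule is an assumption, and the false-negative and false-positive corrections are not modelled.

```typescript
interface Detection {
  start: number;
  end: number; // exclusive
  label: string;
  confidence: number;
  source: string; // originating layer, e.g. "MARSHAL"
}

const MAX_MENTIONS = 5000; // safety limit stated for NEXUS

const overlaps = (a: Detection, b: Detection): boolean => a.start < b.end && b.start < a.end;

// Greedy fusion: each group of overlapping detections collapses to its most
// confident member. O(n * k) for clarity; a production version would use an
// interval index.
function fuse(candidates: Detection[]): Detection[] {
  const byConfidence = [...candidates].sort((a, b) => b.confidence - a.confidence);
  const kept: Detection[] = [];
  for (const d of byConfidence) {
    if (kept.length >= MAX_MENTIONS) break;
    if (!kept.some((k) => overlaps(k, d))) kept.push(d);
  }
  return kept; // already sorted by descending confidence
}
```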
L13
HERALD

Confidence scoring and score emission

HERALD scores the fused detection clusters — weighting by zone (header, body, footer, form field), domain profile, and configuration mode — producing the finalClusters set, each carrying a composite confidence score; nothing proceeds to redaction without a quantified confidence.

Confidence scoring · Zone-weighted signals · Domain profile adjustment · Final cluster output · Critical layer
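Zone-weighted scoring can be sketched as follows: multiply a cluster's base confidence by a weight for its zone, a domain-profile factor and a configuration-mode weight, then clamp the result to [0, 1]. The zone names follow the description above. The multiplicative combination, every weight value and the `strict`/`balanced` mode names are assumptions for illustration only.

```typescript
type Zone = "header" | "body" | "footer" | "form-field";
type Mode = "strict" | "balanced"; // hypothetical configuration modes

// Assumed weights; real values are not published.
const ZONE_WEIGHT: Record<Zone, number> = { header: 0.9, body: 1.0, footer: 0.7, "form-field": 1.2 };
const MODE_WEIGHT: Record<Mode, number> = { strict: 1.1, balanced: 1.0 };

interface Cluster { id: string; baseConfidence: number; zone: Zone }
interface ScoredCluster extends Cluster { score: number }

// Every cluster leaves with a single composite score in [0, 1].
function scoreClusters(clusters: Cluster[], domainFactor: number, mode: Mode): ScoredCluster[] {
  return clusters.map((c) => {
    const raw = c.baseConfidence * ZONE_WEIGHT[c.zone] * domainFactor * MODE_WEIGHT[mode];
    return { ...c, score: Math.min(1, Math.max(0, raw)) };
  });
}
```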
L14
ARBITER

Context signal arbitration and confidence adjustment

ARBITER applies co-occurrence signals and contextual rules as the final confidence adjustment — boosting label-adjacent values and suppressing footer noise — logging signal match counts for observability; non-critical, pipeline continues if no signals match.

Co-occurrence signals · Confidence boosts · Confidence suppressions · Context arbitration
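Signal arbitration can be sketched as a list of rules. Each rule has a predicate and a confidence delta, and the arbiter counts how often each rule matched so the counts can be logged. If nothing matches, clusters pass through unchanged. The two rules below (boost values preceded by an identifier label, suppress footer values) mirror the examples in the description. The field names and delta values are hypothetical.

```typescript
interface ArbCluster {
  id: string;
  zone: "header" | "body" | "footer" | "form-field";
  score: number;
  precededBy: string; // text immediately before the value
}

interface Signal {
  name: string;
  applies: (c: ArbCluster) => boolean;
  delta: number;
}

// Illustrative signals only.
const SIGNALS: Signal[] = [
  { name: "label-adjacent", applies: (c) => /\b(tfn|medicare|bsb|account)\b[:#]?\s*$/i.test(c.precededBy), delta: 0.15 },
  { name: "footer-noise", applies: (c) => c.zone === "footer", delta: -0.2 },
];

function arbitrate(clusters: ArbCluster[]): { adjusted: ArbCluster[]; matchCounts: Record<string, number> } {
  const matchCounts: Record<string, number> = {};
  for (const s of SIGNALS) matchCounts[s.name] = 0;
  const adjusted = clusters.map((c) => {
    let score = c.score;
    for (const s of SIGNALS) {
      if (s.applies(c)) {
        score += s.delta;
        matchCounts[s.name] += 1; // observability: how often each signal fired
      }
    }
    return { ...c, score: Math.min(1, Math.max(0, score)) };
  });
  return { adjusted, matchCounts };
}
```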
Final Gate
L15
WARDEN

Allowlist enforcement — final validation gate

WARDEN is the last stage before redaction output — applying the configured allowlist to remove any permitted values from the final set; nothing proceeds to redaction without passing every preceding layer and clearing the allowlist gate.

Allowlist enforcement · Final validation · User-configured exclusions · Reviewed output only
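The allowlist gate reduces to a set lookup on normalised text. The minimal sketch below removes any final detection whose text the user has explicitly permitted. The normalisation rule (trim, collapse whitespace, lower-case) and the type names are assumptions.

```typescript
interface FinalDetection { text: string; start: number; end: number; score: number }

// Assumed normalisation so "ACME Pty Ltd" and "acme  pty ltd" compare equal.
const normalise = (s: string): string => s.trim().replace(/\s+/g, " ").toLowerCase();

// Removes allowlisted values; everything else proceeds to review and redaction.
function applyAllowlist(detections: FinalDetection[], allowlist: string[]): FinalDetection[] {
  const allowed = new Set(allowlist.map(normalise));
  return detections.filter((d) => !allowed.has(normalise(d.text)));
}
```

Running the allowlist last means a permitted value is only exempted after every other layer has had its say. User exclusions never short-circuit detection upstream.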

Verify It Yourself

Open DevTools and watch the Network tab while you process a document. The detection and redaction workflow runs locally in your browser — verify the boundary in your own environment before relying on it for a high-risk workflow.

Architecture diagram and AI-features data-handling scope → /trust

“We'd rather flag something that isn't sensitive than miss something that is.”

This is a deliberate engineering decision. Our engines are tuned for recall — catching every instance of sensitive data, even at the cost of occasionally flagging something that turns out to be harmless.

For regulated industries, a missed detection isn't an inconvenience. It's a compliance failure. You always have the final say — review, accept, or dismiss any detection before redaction is applied.

Recall-First. Compliance-Driven. Human-Verified.

See For Yourself

See It Run.

Choose a document. All 15 verification layers run live in your browser: no server upload, no wait.