Accuracy Methodology
90 Million Characters.
Zero Shortcuts.
And growing with every training cycle.
Every number on this page has a source. Every claim has a test behind it. We don't publish accuracy metrics we can't trace back to a specific document, a specific test run, and a specific version of our engine.
These numbers grow with every training cycle. Our engines never stop learning.
Detection Architecture
Three Engines. One Mission.
Nothing Gets Through.
Redactorr doesn't rely on a single detection pass. Three purpose-built engines work in concert — each responsible for a different layer of sensitive information. Together, they deliver comprehensive detection that a single engine never could.
Layer 1 — Universal Foundation
Core Engine
Universal PII detection. Catches names, emails, phones, addresses, dates — the sensitive information that exists in every document, regardless of industry. Everything else builds on this.
Layer 2 — Australian-Specific
Country Engine
Built for Australian entity formats: TFNs, Medicare numbers, ABNs, BSBs, ACNs, Australian addresses, state-specific formats. This is what makes Redactorr Australian-first, not a US product with AU bolted on.
Layer 3 — Industry-Specific
Domain Engine
29 industry configurations, one per vertical. Insurance claims, medical records, legal contracts, construction permits — each domain has sensitive information that generic detection misses.
Core Engine Catches
The universal PII that exists in every document, everywhere.
Country Engine Catches
The formats that are uniquely Australian and invisible to generic tools.
Domain Engine Catches
The sensitive information that only appears in specific industries.
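Conceptually, the three layers combine into a single detection pass. Here is a minimal sketch of that merge, assuming a shared entity format and a scan() method per engine; both are illustrative assumptions, not Redactorr's internal API:

```python
# Minimal sketch: three layered engines merged into one detection pass.
# The Entity shape and the scan() interface are illustrative assumptions,
# not Redactorr's internal API.
from dataclasses import dataclass

@dataclass(frozen=True)
class Entity:
    start: int    # character offset where the span begins
    end: int      # character offset where the span ends
    label: str    # e.g. "EMAIL" (core), "TFN" (country), "CLAIM_ID" (domain)
    engine: str   # which layer flagged it

def detect(text: str, engines: list) -> list[Entity]:
    """Run every layer and merge the results. Overlapping spans are kept,
    so an entity missed by one layer can still be caught by another."""
    found: set[Entity] = set()
    for engine in engines:
        found.update(engine.scan(text))
    return sorted(found, key=lambda e: (e.start, e.end))

# Usage (the engine objects are hypothetical):
# entities = detect(document_text, [core_engine, country_engine, domain_engine])
```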
What We Measure
We Measure What Matters — Not Just What Looks Good.
Four metrics govern every test run. Together they tell the full story of detection quality — not just the parts that look impressive.
Recall
“Did we catch everything?”
Of all the sensitive information in a document, what percentage did we actually find? Missing even one entity means a potential compliance breach. This is the metric we optimise for above all others.
Precision
“Was each flag correct?”
Of everything we flagged as sensitive, what percentage actually was? High precision means less noise — you spend less time reviewing false alarms.
F1 Score
“The balance of both.”
The harmonic mean of recall and precision. A single number that tells you whether the engine is both thorough and accurate. Every engine must pass our F1 threshold before release.
Accuracy
“Overall correctness.”
Across every character in the document — flagged and unflagged — what percentage did we classify correctly? The broadest measure of detection quality.
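All four fall directly out of a confusion matrix. A minimal sketch in Python, using hypothetical counts rather than results from any real validation run:

```python
# Sketch of the four metrics from true/false positive and negative counts.
# The example counts are hypothetical, not real validation results.

def detection_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # did we catch everything?
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # was each flag correct?
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)           # harmonic mean of both
    accuracy = (tp + tn) / (tp + fp + fn + tn)        # overall correctness
    return {"recall": recall, "precision": precision, "f1": f1, "accuracy": accuracy}

# 94 entities caught, 2 missed, 3 false alarms, 901 characters correctly untouched.
print(detection_metrics(tp=94, fp=3, fn=2, tn=901))
```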
How We Train
One Engine Doesn't Fit All Industries.
A hospital discharge summary looks nothing like a construction site permit. The PII in each is different. The formats are different. The stakes are different.
Each of Redactorr's 29 industry engines is trained on the document types that industry actually produces. Not synthetic data. Not templates. Real Australian documents in the formats your team works with every day.
“Real Australian documents. Not synthetic data. Not templates. Trained on the formats your industry actually uses.”
How We Test
We Test Against Documents
We Can't Cheat On.
Our control group is a frozen set of documents with known sensitive information — checksummed so they can never be altered. Every improvement to the engine is tested against this exact same group. If accuracy drops on even one document, we investigate before shipping.
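Integrity verification of a frozen corpus can be as simple as comparing each document's SHA-256 digest against a stored manifest. A minimal sketch, with hypothetical paths and manifest format rather than our internal tooling:

```python
# Sketch: verify a frozen corpus against a manifest of SHA-256 digests.
# Paths and the manifest format are hypothetical.
import hashlib
import json
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large documents never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def changed_documents(corpus_dir: Path, manifest_path: Path) -> list[str]:
    """Return any control-group documents whose bytes no longer match."""
    manifest = json.loads(manifest_path.read_text())  # {"doc_001.pdf": "ab12...", ...}
    return [
        name for name, expected in manifest.items()
        if sha256_of(corpus_dir / name) != expected
    ]

tampered = changed_documents(Path("control_group"), Path("manifest.json"))
if tampered:
    raise SystemExit(f"Corpus integrity failure: {tampered}")
```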
Control Group
- Frozen document corpus
- Every entity hand-labelled
- Integrity-verified — never changes
“The benchmark that keeps us honest.”
Automated Pipeline
- Automated test runs on every change
- Base engine + domain engine tests
- Regression detection built in
- Accuracy trend tracking over time
“The watchdog that never sleeps.”
Latest Validation Run — Spatial Detection Engine
The numbers from our most recent test cycle.
Every engine rewrite is validated end-to-end before release. These results come from the most recent full validation run — 483 real Australian documents across every supported format.
483
Documents
Latest validation run
3.78M
Characters
Across all document types
6,601
Entities
Detected and verified
94.7%
F1 — Structured Forms
Contracts, permits, applications
100%
F1 — Narrative
Reports, correspondence, records
0
Failures
Zero detection collapses
These results are added to the running totals above. Every test run compounds the evidence.
How We Improve
Every Edge Case Makes
the Engine Stronger.
Test
Run the full detection suite against the control group and live documents.
Analyse
Identify missed entities, false positives, and edge cases by domain.
Improve
Retrain engines, add new patterns, fine-tune detection thresholds.
Verify
Regression-test every change against the full control group. No fix ships if it breaks existing detections.
“Every improvement is regression-tested against the full control group. No fix ships if it breaks something that was already working.”
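The Verify step amounts to a per-document regression gate. A sketch of the idea, with hypothetical document names and F1 scores:

```python
# Sketch of a per-document regression gate: no fix ships if accuracy drops
# on even one control-group document. All names and scores are hypothetical.

def regressions(baseline: dict[str, float], candidate: dict[str, float]) -> list[str]:
    """Return every document whose F1 fell below the frozen baseline."""
    return [
        doc for doc, old_f1 in baseline.items()
        if candidate.get(doc, 0.0) < old_f1
    ]

baseline = {"discharge_summary_014.pdf": 0.97, "site_permit_201.pdf": 0.95}
candidate = {"discharge_summary_014.pdf": 0.98, "site_permit_201.pdf": 0.93}

dropped = regressions(baseline, candidate)
if dropped:
    raise SystemExit(f"Blocked: F1 regressed on {dropped}")  # investigate before shipping
```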
The corpus grows with every cycle
Each testing cycle adds new documents, new entities, and new edge cases. 90 million characters and counting. The engines never stop improving.
“We'd rather flag something that isn't sensitive than miss something that is.”
This is a deliberate engineering decision. Our engines are tuned for recall — catching every instance of sensitive data, even at the cost of occasionally flagging something that turns out to be harmless.
For regulated industries, a missed detection isn't an inconvenience. It's a compliance failure. You always have the final say — review, accept, or dismiss any detection before redaction is applied.
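In practice, recall-first tuning means a deliberately low confidence bar: anything plausible is surfaced for your review rather than silently dropped. A sketch with an illustrative threshold and detection shape:

```python
# Sketch of a recall-first decision rule. The threshold value and the
# Detection shape are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Detection:
    text: str
    label: str
    confidence: float

REVIEW_THRESHOLD = 0.35  # deliberately low: a false alarm costs a click, a miss costs a breach

def surface_for_review(detections: list[Detection]) -> list[Detection]:
    """Keep everything above the low bar; the reviewer has the final say."""
    return [d for d in detections if d.confidence >= REVIEW_THRESHOLD]
```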
Recall-First. Compliance-Driven. Human-Verified.
See For Yourself
Ready to See the Proof?
Upload a document and watch three engines work together — live in your browser.