Still Building

We Built This Because Nobody Else Would.

The story of how a frustrated observation about AI privacy turned into a spatial detection engine that reads 600+ entity types across 29 industries, all without your documents ever leaving your computer. Told through the milestones that mattered and the problems that almost broke us.

Chapter 01

How We Got Here

Thousands of improvements. Every one driven by a real document that deserved better protection.

MILESTONE 01: THE SPARK

Day One

It started with a simple question: what if you could strip sensitive data from a document before it ever left your desk? We picked a country, picked a document type, and started writing the rules. We had no idea how deep this would go.

MILESTONE 02: THE REALITY CHECK

Not Good Enough

Five days in, we ran a real insurance claim through the system. It found the email address and the phone number. It missed the policy number, the claim reference, and three people’s names. So we tore the whole thing apart and started over. The second version caught what the first one couldn’t.

MILESTONE 03: THE PROMISE

Nothing Leaves Your Machine

Most redaction tools send your documents to a server somewhere. We refused. We moved everything into the browser itself. Your files never leave your computer. Not to us, not to anyone. This was the moment the product became what it is.

See our privacy architecture →
MILESTONE 04: GOING GLOBAL

50 Countries. One Tool.

A Tax File Number in Australia looks nothing like a Social Security number in the US or an NHS number in the UK. Every country has its own ID formats, its own regulations, its own edge cases. We built recognition for 50+ of them, because your documents don’t stay inside one border.

MILESTONE 05: THE GREY AREA

How Sure Is Sure Enough?

Is "123-456-789" a phone number, a case reference, or a government ID? A blunt tool just highlights everything and leaves you to sort it out. We added confidence scoring. Now the system tells you how certain it is about each detection, so you can make the call. Your judgement, informed by ours.

MILESTONE 06: THE WALL

Teaching It What to Ignore

The system was catching too much. Street addresses in the middle of paragraphs. Common words flagged as names. "Victoria" the state, not "Victoria" the person. We spent weeks on over a thousand targeted improvements, teaching it the difference between sensitive data and ordinary text. We broke it twice along the way. Worth it.

MILESTONE 07: THE BREAKTHROUGH

Knowing, Not Guessing

A random 9-digit number is just a number. But a real Tax File Number follows a specific mathematical formula. We built validators that check whether an ID is structurally genuine, not just whether it looks like one. The difference matters. This single change cut false flags in half.

MILESTONE 08: LAYER BY LAYER

8 Layers of Verification

A single scan catches maybe 40% of the sensitive data in a document. That’s not good enough when your client’s Medicare number is on the line. So we built eight verification layers. Each one catches what the last one missed. By the eighth pass, what’s flagged is real, and what’s clean is actually clean.

See how our 8 layers work →
MILESTONE 09: 3,000 AND COUNTING

3,000 Things It Recognises

Medicare numbers. BSBs. ABNs. Passport numbers. Licence plates. Medical record IDs. Insurance claim references. Internal case numbers. Each one researched against real documents, each one validated. The library quietly crossed 3,000 recognised data types.

MILESTONE 10: ACROSS BORDERS

UK. Canada. New Zealand. Ireland.

An Australian medical referral is not a British one. A Canadian tax return has different identifiers than a New Zealand one. We added dedicated country configurations, each one tested against real document formats from that jurisdiction. Not textbook examples. The real thing.

MILESTONE 11: THE OBSESSION

Starting Over. Again.

Good enough wasn’t good enough. We redesigned the entire detection system from scratch. Industry-specific configurations. Smarter analysis. A completely new approach. The fourth major version. The one that finally works the way documents actually work.

MILESTONE 12: YOUR INDUSTRY, YOUR RULES

29 Industry-Specific Engines

An insurance claim has different sensitive data than a legal contract. A medical referral has nothing in common with a construction invoice. We built 29 industry-specific engines: legal, healthcare, government, finance, HR, insurance, and more. Each one tuned for exactly the documents your team handles every day.

Explore all 29 industries →
MILESTONE 13: THE FINAL ARCHITECTURE

Back to the Drawing Board. One Last Time.

We tore the whole thing apart again. Five detection layers, each doing one thing ruthlessly well. Three rule-based layers for speed and precision. A contextual recognition layer for the patterns rules can’t catch. And a final layer of 29 industry-trained engines, each scoring above 0.997 F1. 600+ entity types. This is the architecture we were always building toward.

Chapter 02

Why This Exists

We watched a colleague paste production server logs straight into ChatGPT. They needed help writing a SQL query. Reasonable enough, except those logs contained customer names, IP addresses, and database credentials. All of it, sent to a cloud server, stored who-knows-where.

The next week, a director composed a customer response using AI. The entire complaint email (name, account number, billing address, the whole thing) pasted in as context. They weren't being reckless. They were trying to do their job faster, and the AI was genuinely good at it.

That's the thing. They weren't careless. They needed AI to help with real work, and real work contains real data. There was no practical way to strip the sensitive parts first. We come from infrastructure. We know exactly where that data goes.

The “redaction tools” that existed were either server-side (the irony, trusting another cloud with your data) or basic keyword matching that missed everything that wasn't an obvious email address. A Tax File Number? A BSB? A person's name mid-paragraph? Invisible.

There had to be a better way. One that runs entirely in the browser. One where nothing, not a single character, ever leaves your machine.

Chapter 03

The First Attempt

The first commit. Country selection, document type patterns, and a detection engine that could find phone numbers and email addresses. It was basic. It was slow. But it worked. Kind of.

The first real test was humbling. We ran a genuine insurance claim through the engine. It found the obvious things: the email, the phone number. It completely missed the policy number, the claim reference, the BSB, the Medicare number, and three people's names. Pattern matching alone catches maybe 40% of the sensitive data in a real document. The other 60% blends in, hiding in plain text where pattern matching will never find it.

Honest Assessment, Week 1
Detection rate: ~40%
False positive rate: High
Determination level: Stubborn

Two days later, we ran a legal contract through the updated engine and it flagged a Tax File Number that would have gone straight into an AI prompt. That was the first moment we knew this mattered. Not because the technology was impressive. Because the absence of this tool was genuinely dangerous.

Chapter 04

The Hard Problems

Every breakthrough was preceded by a wall. These are the questions that kept us up at night, and the ones that, eventually, made the product what it is.


How do you detect a person's name without sending it to a server?

We found a way to identify names and other sensitive entities entirely in the browser. No API calls. No data leaves your machine. The entire screening engine runs client-side.


How do you tell the difference between a random 9-digit number and a Tax File Number?

Mathematics. Real government IDs follow specific checksum formulas. We validate the structure, not just the shape. Most random numbers fail the checksum. A real TFN always passes.
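
To make the checksum idea concrete, here is a minimal Python sketch of the commonly published weighted check for nine-digit Tax File Numbers. It is an illustration only, not our engine's code (which runs in the browser), and the function name `is_valid_tfn` is made up for this example.

```python
import re

# Commonly published weights for the nine-digit TFN check (weighted sum must divide by 11).
TFN_WEIGHTS = (1, 4, 3, 7, 5, 8, 6, 9, 10)


def is_valid_tfn(candidate: str) -> bool:
    """Return True if the candidate is nine digits whose weighted sum is a multiple of 11."""
    digits = re.sub(r"[\s-]", "", candidate)  # tolerate "123 456 782" or "123-456-782"
    if not re.fullmatch(r"[0-9]{9}", digits):
        return False
    total = sum(int(d) * w for d, w in zip(digits, TFN_WEIGHTS))
    return total % 11 == 0


print(is_valid_tfn("123 456 782"))  # structurally valid
print(is_valid_tfn("123 456 789"))  # same shape, fails the check
```

Roughly one in eleven arbitrary nine-digit strings will still pass a modulus-11 check by chance, which is why a checksum is strong structural evidence rather than proof that a number was ever issued.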


How do you handle 50+ countries, each with their own formats?

We built dedicated country packs. Not one-size-fits-all patterns, but region-specific validators that understand the difference between an Australian Medicare number and a UK NHS number.
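
One way to picture a country pack is as a lookup from a country code to the validators that apply there. The Python sketch below assumes that shape; `COUNTRY_PACKS`, `classify`, and both validator names are hypothetical. The check-digit rules are the commonly published ones for Medicare card numbers and NHS numbers, shown for illustration rather than as our implementation.

```python
import re
from typing import Callable, Dict, List


def _clean(candidate: str) -> str:
    """Strip the spaces and hyphens people use to group digits."""
    return re.sub(r"[\s-]", "", candidate)


def is_valid_medicare(candidate: str) -> bool:
    """Australian Medicare: first digit 2-6, ninth digit is a weighted sum mod 10."""
    digits = _clean(candidate)
    if not re.fullmatch(r"[2-6][0-9]{9,10}", digits):  # 10 digits, or 11 with a reference number
        return False
    weights = (1, 3, 7, 9, 1, 3, 7, 9)
    total = sum(int(d) * w for d, w in zip(digits[:8], weights))
    return total % 10 == int(digits[8])


def is_valid_nhs(candidate: str) -> bool:
    """UK NHS number: ten digits, tenth digit is a modulus-11 check digit."""
    digits = _clean(candidate)
    if not re.fullmatch(r"[0-9]{10}", digits):
        return False
    total = sum(int(d) * w for d, w in zip(digits[:9], range(10, 1, -1)))
    check = 11 - total % 11
    if check == 11:
        check = 0
    return check != 10 and check == int(digits[9])


# Hypothetical registry: each country code maps entity labels to validators.
COUNTRY_PACKS: Dict[str, Dict[str, Callable[[str], bool]]] = {
    "AU": {"MEDICARE_NUMBER": is_valid_medicare},
    "GB": {"NHS_NUMBER": is_valid_nhs},
}


def classify(candidate: str, country: str) -> List[str]:
    """Return the labels whose validator accepts the candidate in this country's pack."""
    pack = COUNTRY_PACKS.get(country, {})
    return [label for label, validator in pack.items() if validator(candidate)]


print(classify("943 476 5919", "GB"))
print(classify("943 476 5919", "AU"))
```

The same ten digits are accepted under one pack and rejected under the other, which is the point of keeping validators region-specific instead of sharing one generic "ten-digit ID" pattern.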


How do you stop false positives from making the tool useless?

You verify at eight different levels. Each layer filters the output of the one before it. Pattern match, then validate the structure, then check the context, then cross-reference with other detections. By the eighth layer, what's left is real.
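
The four steps named above can be sketched as a chain of small functions, each taking the previous layer's detections and returning a refined list. This Python sketch covers only those four steps, using ABNs as the example. The `Detection` class, the layer names, and the 20-character context window are all assumptions made for illustration, not a description of the real eight layers.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Detection:
    label: str
    text: str
    start: int
    has_context: bool = False


Layer = Callable[[str, List[Detection]], List[Detection]]

# Commonly published ABN check: subtract 1 from the first digit, weight, sum, mod 89.
ABN_WEIGHTS = (10, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19)
ABN_SHAPE = re.compile(r"\b[0-9]{2} ?[0-9]{3} ?[0-9]{3} ?[0-9]{3}\b")


def _digits(text: str) -> str:
    return text.replace(" ", "")


def pattern_layer(doc: str, found: List[Detection]) -> List[Detection]:
    """Layer 1: propose anything shaped like an 11-digit ABN."""
    return found + [Detection("ABN", m.group(), m.start()) for m in ABN_SHAPE.finditer(doc)]


def checksum_layer(doc: str, found: List[Detection]) -> List[Detection]:
    """Layer 2: drop candidates that fail the modulus-89 check."""
    def passes(d: Detection) -> bool:
        nums = [int(c) for c in _digits(d.text)]
        nums[0] -= 1
        return sum(n * w for n, w in zip(nums, ABN_WEIGHTS)) % 89 == 0
    return [d for d in found if passes(d)]


def context_layer(doc: str, found: List[Detection]) -> List[Detection]:
    """Layer 3: mark candidates that have an "ABN" cue just before them."""
    for d in found:
        window = doc[max(0, d.start - 20):d.start].lower()
        d.has_context = "abn" in window
    return found


def cross_reference_layer(doc: str, found: List[Detection]) -> List[Detection]:
    """Layer 4: keep every occurrence of a value if any occurrence had context."""
    supported = {_digits(d.text) for d in found if d.has_context}
    return [d for d in found if _digits(d.text) in supported]


def run_layers(doc: str, layers: List[Layer]) -> List[Detection]:
    found: List[Detection] = []
    for layer in layers:
        found = layer(doc, found)
    return found


LAYERS = [pattern_layer, checksum_layer, context_layer, cross_reference_layer]
doc = "ABN: 51 824 753 556. Invoice 12 345 678 901 refers to 51824753556 again."
print([d.text for d in run_layers(doc, LAYERS)])
```

In this example the invoice number has the right shape but fails the checksum, while the second, unlabelled copy of the ABN survives only because the first copy supplied the context. That is the sense in which layers compound rather than compete.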


How do you make a detection engine that understands insurance documents differently from legal documents?

You build 29 of them. Purpose-built engines, each tuned for a specific industry. What matters in a construction invoice is different from what matters in a medical referral.


How do you handle the things no engine can predict?

Every business has proprietary codes, internal references, and custom identifiers that no pre-built engine will ever know about. You let users build their own patterns and manually redact anything else. The engine proposes. You decide. Custom patterns for your business, manual redaction for surgical precision.
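
As a rough picture of how custom patterns and manual redaction can sit side by side, the Python sketch below treats both as plain character spans applied to the same text. The pattern names, the sample identifiers, and the functions `find_custom` and `redact` are invented for this example, and it assumes spans do not overlap.

```python
import re
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # (start, end, label)


def find_custom(doc: str, patterns: Dict[str, str]) -> List[Span]:
    """Run each user-defined regular expression and collect labelled spans."""
    spans: List[Span] = []
    for label, regex in patterns.items():
        for match in re.finditer(regex, doc):
            spans.append((match.start(), match.end(), label))
    return spans


def redact(doc: str, spans: List[Span]) -> str:
    """Replace each span with its label, working right to left so offsets stay valid.

    Assumes the spans do not overlap.
    """
    out = doc
    for start, end, label in sorted(spans, reverse=True):
        out = out[:start] + f"[{label}]" + out[end:]
    return out


doc = "Ticket PRJ-004217 was raised by J. Citizen about asset AX/9931."

# Hypothetical business-specific identifiers defined by the user.
patterns = {
    "PROJECT_CODE": r"\bPRJ-[0-9]{6}\b",
    "ASSET_ID": r"\bAX/[0-9]{4}\b",
}
spans = find_custom(doc, patterns)

# A manual redaction is just one more span, chosen by the reviewer.
name_start = doc.index("J. Citizen")
spans.append((name_start, name_start + len("J. Citizen"), "MANUAL"))

print(redact(doc, spans))
```

Keeping manual selections in the same span format as pattern matches means the reviewer's decisions and the engine's proposals go through one redaction step.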

Chapter 05

What We Believe

Nothing leaves your machine.

Ever.

Not the document, not the detections, not the metadata. Everything runs in your browser. We don't have a server to send data to even if we wanted to.

Detection without compromise.

8 layers, not 1.

A single detection pass is guesswork. Eight verification layers is rigour. We chose rigour.

Your industry, your rules.

Purpose-built, not generic.

A legal firm and a hospital handle completely different data. The detection engine should know that.

The engine proposes.

You decide.

Every detection is surfaced for review. Accept it, reject it, reclassify it, or add your own. Manual redaction for anything the engine misses. Custom patterns for your business-specific identifiers. Your document, your rules.

Chapter 06

The Final Iteration

We went back to the drawing board. Again. Because close enough isn't good enough when you're handling someone's most sensitive data.

The previous architecture worked. It caught most things. But “most things” is a dangerous phrase when the thing you missed is someone's Medicare number sitting in paragraph four of an insurance claim. We kept asking the same question: is this the best we can do? And the answer kept being no.

So we scrapped the single-pass approach entirely and rebuilt around a layered screening system. Each checkpoint does one thing, and does it well. The checkpoints don't compete. They compound. What one misses, the next catches.

This is not another pivot. This is the architecture we were always building toward. Every false positive we fought, every edge case we debugged, every domain we studied: it all led here.

How It Screens
01: Core Screening
First pass. Thousands of format rules, checksum validators, and structural checks. Fast, precise, zero ambiguity. If a Tax File Number passes the mathematical check, it’s a Tax File Number.
02: Australian Identifiers
Jurisdiction-specific screening. Tax File Numbers, Medicare cards, ABNs, BSBs, driver’s licences: every format unique to this country, validated against real-world registries bundled in the engine.
03: Industry Screening Bundle
Pre-compiled protocols for your industry. Insurance claim references, medical record IDs, construction certifications: the identifiers your colleagues see every day, handled automatically.
04: Context Verification
600+ entity types that format checks alone can’t catch. Names in context. Addresses that don’t follow templates. Whether “Victoria” is a state or a person depends on what’s around it.
05: Deep Domain Analysis
29 industry-specific screening protocols. Each one built from real documents in that vertical. The final checkpoint. The one that catches everything else.
The Numbers
Avg. screening time: 151ms
Industry bundles: 29
Entity types: 600+
Runs in your browser: 100%

The first three checkpoints are pure rules: format validators, checksum checks, and jurisdiction-specific registries. They run instantly. They don't guess. If a Tax File Number passes the mathematical check, it's a Tax File Number. No ambiguity.

Checkpoint four handles the grey area: the names, the addresses, the entities that only make sense in context. “Victoria” the state vs “Victoria” the person. 600+ entity types that format rules alone can't resolve.
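
To show why neighbouring words matter, here is a deliberately crude Python sketch that labels each "Victoria" by looking one word to either side. The hand-written cue lists stand in for whatever contextual recognition the real checkpoint uses and are not a description of it. Every name in the block is hypothetical.

```python
import re
from typing import List, Tuple

# Illustrative cue lists only; a production system would need far more than this.
PERSON_CUES_BEFORE = {"ms", "mrs", "miss", "dr", "dear"}
PLACE_CUES_BEFORE = {"in", "to", "from", "across", "regional"}
PLACE_CUES_AFTER = {"police", "government", "australia"}


def classify_mentions(doc: str, word: str = "Victoria") -> List[Tuple[int, str]]:
    """Label each occurrence of `word` as PERSON, PLACE, or UNSURE from adjacent tokens."""
    tokens = [(m.group(), m.start(), m.end()) for m in re.finditer(r"[A-Za-z0-9]+", doc)]
    results: List[Tuple[int, str]] = []
    for i, (tok, start, end) in enumerate(tokens):
        if tok != word:
            continue
        prev, nxt = "", ""
        # Only trust a neighbour separated by whitespace, not by punctuation.
        if i > 0 and doc[tokens[i - 1][2]:start].isspace():
            prev = tokens[i - 1][0].lower()
        if i + 1 < len(tokens) and doc[end:tokens[i + 1][1]].isspace():
            nxt = tokens[i + 1][0]
        looks_like_postcode = re.fullmatch(r"3[0-9]{3}", nxt) is not None
        if prev in PLACE_CUES_BEFORE or nxt.lower() in PLACE_CUES_AFTER or looks_like_postcode:
            label = "PLACE"
        elif prev in PERSON_CUES_BEFORE or nxt[:1].isupper():
            label = "PERSON"
        else:
            label = "UNSURE"
        results.append((start, label))
    return results


doc = "Dr Victoria Nguyen moved to Victoria in 2019. The clinic in Victoria 3056 kept her file."
print(classify_mentions(doc))
```

Anything the cues cannot settle comes back as UNSURE, so it can be surfaced for review instead of being guessed at.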

Checkpoint five is where it all comes together. Twenty-nine industry screening bundles, each built from real documents in its field. Healthcare. Legal. Construction. Finance. Not a general-purpose tool hoping for the best, but twenty-nine specialists, each one built for its domain.

Everything still runs in your browser. Nothing leaves your machine. The architecture changed. The promise didn't.

Chapter 07

Where We're Going

More industries. More languages. More countries. The list of sensitive data types grows every time a new regulation drops or a new industry reaches out. We're building for all of them.

What won't change: everything stays in the browser. Everything stays private. The detection gets better every week, but the architecture stays true to the promise we made on day one. Your data is yours. Period.

The Next Peaks

We know exactly where the trail leads. Here's what's over the horizon.

Google Drive
Select files from your Drive, redact in-browser, save back. No downloads required.
SharePoint & OneDrive
Enterprise document workflows. Connect your SharePoint and redact before sharing.
Bulk Processing
Upload 100 files, get 100 redacted versions. Process entire folders in one pass.
API Access
Integrate Redactorr's detection engine into your own pipelines. Programmatic redaction for automation.
More Languages
Multilingual document redaction. Because sensitive data isn't written in English only.
More Countries
Beyond 50+ jurisdictions. Region-specific patterns for wherever your documents come from.
Team Workspaces
Shared engines, shared patterns, team-wide redaction policies. Privacy as a team sport.
Audit & Compliance
Detailed logs of what was redacted, when, and by whom. Enterprise compliance, out of the box.

Want to help shape what comes next? We're listening.

Try Redactorr Free