
Building Custom Detection Patterns with YAML

Learn how to define industry-specific or proprietary PII patterns using Redactorr's YAML-based pattern builder.

Redactorr Team · 12 min read

While Redactorr includes 3,000+ built-in detection rules across the 15-layer AEGIS engine, every organisation has unique PII requirements. Custom patterns let you detect proprietary identifiers, internal codes, and industry-specific data.

Why Custom Patterns?

Common use cases:

  • Internal IDs: Employee numbers, asset tags, ticket IDs
  • Industry-specific: Policy numbers, claim IDs, loan numbers
  • Proprietary formats: Custom date formats, internal codes
  • Business logic: Pricing rules, customer tiers, territory codes

YAML Pattern Structure

Custom patterns use a simple YAML format:

name: Employee ID
category: internal
description: Company employee identification numbers
pattern: "EMP-\\d{6}"
examples:
  - "EMP-123456"
  - "EMP-987654"
confidence: high
replacement: "[REDACTED_EMPLOYEE_ID]"
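
To make the effect of these fields concrete, here is a minimal Python sketch of what applying the definition might look like. It is not Redactorr's implementation: the `redact` helper and the dict layout are hypothetical, and Python's `re` module stands in for the JavaScript regex engine (the `re.ASCII` flag keeps `\d` limited to 0-9, as in JavaScript). Note that the YAML string "EMP-\\d{6}" denotes the regex EMP-\d{6}, because `\\` is an escaped backslash inside a double-quoted YAML string.

```python
import re

# Mirrors the YAML definition above as a plain dict (hypothetical representation).
employee_id = {
    "name": "Employee ID",
    "pattern": r"EMP-\d{6}",
    "replacement": "[REDACTED_EMPLOYEE_ID]",
}

def redact(text: str, rule: dict) -> str:
    """Replace every match of rule['pattern'] with rule['replacement']."""
    regex = re.compile(rule["pattern"], re.ASCII)
    # A function replacement inserts the placeholder literally,
    # so backslashes in it are never parsed as group references.
    return regex.sub(lambda _match: rule["replacement"], text)

print(redact("Badge EMP-123456 was issued to EMP-987654.", employee_id))
```

One thing the sketch makes visible: without a trailing boundary such as `\b`, the pattern also matches the first six digits of a longer run like EMP-1234567, so it is worth deciding whether that is the behaviour you want.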

Required Fields

  • name: Human-readable pattern name
  • category: Group (e.g., "internal", "customer", "financial")
  • pattern: Regular expression (JavaScript flavor)
  • examples: Test cases for validation
  • replacement: Redaction placeholder text

Optional Fields

  • description: Pattern documentation
  • confidence: Detection confidence (low, medium, high)
  • context: Context-aware rules (see below)
  • validation: Custom validation logic
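
Before importing a definition, it can help to check it locally against the field lists above. The following Python sketch is one way to do that; `definition_problems` is a hypothetical helper rather than part of Redactorr, and the "Asset Tag" draft is invented purely for illustration.

```python
import re

# The five required fields listed above.
REQUIRED_FIELDS = ("name", "category", "pattern", "examples", "replacement")

def definition_problems(rule: dict) -> list[str]:
    """Return human-readable problems found in a pattern definition."""
    problems = [
        f"missing required field: {field}"
        for field in REQUIRED_FIELDS
        if not rule.get(field)
    ]
    pattern = rule.get("pattern")
    if pattern:
        try:
            re.compile(pattern)
        except re.error as exc:
            problems.append(f"pattern does not compile: {exc}")
    return problems

# Hypothetical draft that is still missing two required fields.
draft = {"name": "Asset Tag", "category": "internal", "pattern": r"AST-\d{5}"}
print(definition_problems(draft))
```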

Pattern Examples

1. Custom ID Format

name: Customer Reference Number
category: customer
pattern: "CRN-[A-Z]{2}-\\d{8}"
examples:
  - "CRN-US-12345678"
  - "CRN-CA-87654321"
confidence: high
replacement: "[REDACTED_CUSTOMER_REF]"

2. Internal Ticket System

name: Support Ticket ID
category: internal
pattern: "TICKET-\\d{4}-\\d{6}"
examples:
  - "TICKET-2024-123456"
  - "TICKET-2023-987654"
confidence: high
replacement: "[REDACTED_TICKET]"
context:
  preceding: ["ticket", "case", "issue"]

3. Custom Date Format

name: Internal Date Code
category: internal
description: YYYYMMDD-LOC format for shipment tracking
pattern: "\\d{8}-[A-Z]{3}"
examples:
  - "20241225-NYC"
  - "20240101-LAX"
confidence: medium
replacement: "[REDACTED_SHIPMENT]"

4. Industry-Specific Code

name: Insurance Policy Number
category: financial
description: Custom format for PolicyCo policies
pattern: "POL-\\d{4}-[A-Z]-\\d{6}"
examples:
  - "POL-2024-A-123456"
  - "POL-2023-B-987654"
confidence: high
replacement: "[REDACTED_POLICY]"
validation:
  checksum: format_validation

Context-Aware Patterns

Use context rules to reduce false positives:

name: Account Number
category: financial
pattern: "\\d{10,12}"
confidence: medium
replacement: "[REDACTED_ACCOUNT]"
context:
  preceding: ["account", "acct", "account number", "account #"]
  following: ["balance", "statement", "transaction"]
  window: 20  # tokens before/after

Result: "123456789012" is matched only when a preceding keyword such as "account" or a following keyword such as "balance" falls within the 20-token window.
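
The idea behind a context window can be sketched in a few lines of Python. This is an illustration under several assumptions, not Redactorr's algorithm: tokens are split on whitespace, keywords are matched as lowercase substrings, and a match counts if either a preceding or a following keyword is found. The `redact_with_context` helper is hypothetical.

```python
import re

# The Account Number rule from above, flattened into a dict for the sketch.
ACCOUNT_RULE = {
    "pattern": r"\d{10,12}",
    "replacement": "[REDACTED_ACCOUNT]",
    "preceding": ["account", "acct", "account number", "account #"],
    "following": ["balance", "statement", "transaction"],
    "window": 20,  # tokens before/after
}

def redact_with_context(text: str, rule: dict) -> str:
    """Redact pattern matches only when a context keyword is within the window."""
    regex = re.compile(rule["pattern"], re.ASCII)
    tokens = text.split()
    window = rule["window"]
    result = []
    for i, token in enumerate(tokens):
        if regex.search(token):
            before = " ".join(tokens[max(0, i - window):i]).lower()
            after = " ".join(tokens[i + 1:i + 1 + window]).lower()
            near_keyword = (
                any(keyword in before for keyword in rule["preceding"])
                or any(keyword in after for keyword in rule["following"])
            )
            if near_keyword:
                token = regex.sub(lambda _match: rule["replacement"], token)
        result.append(token)
    return " ".join(result)

print(redact_with_context("Your account 123456789012 has a new statement.", ACCOUNT_RULE))
print(redact_with_context("Order 123456789012 shipped today.", ACCOUNT_RULE))
```

In this sketch the first sentence is redacted and the second is left alone, because "Order ... shipped today" contains none of the keywords.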

Validation Functions

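Add custom validation to patterns. The snippets below show only the fields relevant to validation; a complete definition still needs the required fields listed earlier.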

Checksum Validation

name: Credit Card
pattern: "\\d{4}-\\d{4}-\\d{4}-\\d{4}"
validation:
  checksum: format_validation
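
For card numbers, the checksum conventionally used is the Luhn algorithm. This article does not say what Redactorr's `checksum` option computes, so the Python sketch below is only an illustration of what a checksum step filters out: strings that look like a card number but cannot be one. The `luhn_valid` helper is hypothetical.

```python
def luhn_valid(candidate: str) -> bool:
    """Return True if the digits in candidate pass the Luhn checksum."""
    digits = [int(ch) for ch in candidate if ch in "0123456789"]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0

print(luhn_valid("4111-1111-1111-1111"))  # widely used test card number
print(luhn_valid("1234-5678-9012-3456"))  # right shape, fails the checksum
```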

Format Validation

name: Custom Date
pattern: "\\d{8}"
validation:
  format: date
  format_string: "YYYYMMDD"
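
The point of a date format check is that not every eight-digit string is a date. A minimal Python sketch, assuming "YYYYMMDD" corresponds to the `strptime` format `%Y%m%d` (the helper name `is_valid_yyyymmdd` is hypothetical):

```python
import re
from datetime import datetime

def is_valid_yyyymmdd(candidate: str) -> bool:
    """Return True if candidate is eight digits that form a real calendar date."""
    # Check the shape first, since strptime alone tolerates some shorter inputs.
    if re.fullmatch(r"[0-9]{8}", candidate) is None:
        return False
    try:
        datetime.strptime(candidate, "%Y%m%d")
    except ValueError:
        return False
    return True

print(is_valid_yyyymmdd("20241225"))  # a real date
print(is_valid_yyyymmdd("20240230"))  # 30 February does not exist
```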

Range Validation

name: Employee Number
pattern: "EMP-\\d{6}"
validation:
  range:
    min: 100000
    max: 999999
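
A range check like this presumably applies to the numeric part of the match. The Python sketch below makes that assumption explicit by adding a capturing group around the digits; the group and the `in_range` helper are illustrative additions, not part of the YAML schema.

```python
import re

# Same pattern as above, with a capturing group around the numeric part.
EMPLOYEE_NUMBER = re.compile(r"EMP-(\d{6})", re.ASCII)

def in_range(candidate: str, low: int = 100000, high: int = 999999) -> bool:
    """Return True if candidate matches and its number lies within [low, high]."""
    match = EMPLOYEE_NUMBER.fullmatch(candidate)
    if match is None:
        return False
    return low <= int(match.group(1)) <= high

print(in_range("EMP-123456"))
print(in_range("EMP-000042"))  # six digits, but below the minimum
```

With these bounds, the range rule rejects IDs that have leading zeros, even though the regex alone would accept them.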

Importing Custom Patterns

Method 1: Upload YAML File

  1. Navigate to the Patterns page
  2. Click Import Custom Patterns
  3. Upload your YAML file
  4. Review and enable patterns

Method 2: Use Pattern Builder

  1. Navigate to the Pattern Builder tool
  2. Fill in the form (name, category, pattern, etc.)
  3. Test against examples
  4. Save to your pattern library

Testing Custom Patterns

Use the built-in tester to validate patterns:

name: Test Pattern
pattern: "TEST-\\d{4}"
replacement: "[REDACTED_TEST]"
examples:
  # Should match
  - input: "Ticket TEST-1234 was closed"
    expected: "Ticket [REDACTED_TEST] was closed"

  # Should NOT match
  - input: "Test results: PASS"
    expected: "Test results: PASS"

Redactorr checks that:

  • ✅ All examples produce the expected output
  • ✅ No unintended matches occur (false positives)
  • ✅ Added latency stays under 10 ms
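
The same input/expected cases can be run locally before uploading. The Python sketch below is a rough stand-in for the built-in tester, not a copy of it: `run_cases` is a hypothetical helper, and the replacement text `[REDACTED_TEST]` is taken from the expected output shown above.

```python
import re

RULE = {"pattern": r"TEST-\d{4}", "replacement": "[REDACTED_TEST]"}

CASES = [
    # Should match
    {"input": "Ticket TEST-1234 was closed", "expected": "Ticket [REDACTED_TEST] was closed"},
    # Should NOT match
    {"input": "Test results: PASS", "expected": "Test results: PASS"},
]

def run_cases(rule: dict, cases: list[dict]) -> list[tuple[str, str]]:
    """Return (input, actual output) pairs for every case that fails."""
    regex = re.compile(rule["pattern"], re.ASCII)
    failures = []
    for case in cases:
        actual = regex.sub(lambda _match: rule["replacement"], case["input"])
        if actual != case["expected"]:
            failures.append((case["input"], actual))
    return failures

# An empty list means every case produced its expected output.
print(run_cases(RULE, CASES))
```

The negative case passes because the pattern is case-sensitive and requires the hyphen and four digits, so "Test results" is never a candidate.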

Pattern Library Management

Organizing Patterns

Create pattern collections by use case:

collections:
  - name: "Internal Systems"
    patterns:
      - employee_id
      - ticket_id
      - asset_tag

  - name: "Customer Data"
    patterns:
      - customer_ref
      - account_number
      - policy_number

Version Control

Track pattern changes over time:

name: Employee ID
version: 2
changelog:
  - version: 2
    date: 2024-12-01
    changes: "Added support for 7-digit IDs"
  - version: 1
    date: 2024-01-01
    changes: "Initial pattern"

Performance Considerations

  • Keep patterns specific: Broad patterns (e.g., \d+) match far more text and slow down detection
  • Anchor on prefixes: Start patterns with a distinctive literal prefix (e.g., EMP-) so non-matching text can be rejected early
  • Test performance: Redactorr shows latency impact for each pattern
  • Limit context windows: Large context windows increase processing time

Recommended limits:

  • Max 100 custom patterns per workspace
  • Max 50 tokens for context windows
  • Max 500 characters per pattern
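
If your patterns live in a repository, the recommended limits above are easy to check automatically. The Python sketch below is one possible lint, assuming each definition has been loaded into a dict with the same keys as the YAML; `limit_warnings` and the "Wide Number" rule are hypothetical.

```python
# Recommended limits from the list above.
MAX_PATTERNS = 100
MAX_WINDOW_TOKENS = 50
MAX_PATTERN_CHARS = 500

def limit_warnings(rules: list[dict]) -> list[str]:
    """Return a warning for each recommended limit that is exceeded."""
    warnings = []
    if len(rules) > MAX_PATTERNS:
        warnings.append(f"{len(rules)} patterns exceeds the recommended {MAX_PATTERNS}")
    for rule in rules:
        name = rule.get("name", "<unnamed>")
        if len(rule.get("pattern", "")) > MAX_PATTERN_CHARS:
            warnings.append(f"{name}: pattern is longer than {MAX_PATTERN_CHARS} characters")
        window = rule.get("context", {}).get("window", 0)
        if window > MAX_WINDOW_TOKENS:
            warnings.append(f"{name}: context window of {window} tokens exceeds {MAX_WINDOW_TOKENS}")
    return warnings

rules = [
    {"name": "Account Number", "pattern": r"\d{10,12}", "context": {"window": 20}},
    {"name": "Wide Number", "pattern": r"\d+", "context": {"window": 80}},  # hypothetical
]
for warning in limit_warnings(rules):
    print(warning)
```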

Enterprise Features

Shared Pattern Libraries

Teams can share custom patterns across workspaces:

  • Centralized pattern repository
  • Role-based access (view, edit, admin)
  • Approval workflows for new patterns
  • Audit logs for pattern changes

Conclusion

Custom patterns extend Redactorr to handle any PII format, from internal IDs to proprietary codes. With YAML-based definitions, context-aware rules, and built-in testing, you can confidently detect organisation-specific sensitive data.

Ready to build? Try the Pattern Builder tool or explore our template library.
