# Automated PDF Compliance Test Harness
Multi-layer validation framework that programmatically tests 300 PDF documents against structural and content compliance rules — achieving 100% true positive and true negative rates across a controlled ground-truth corpus — with per-file failure diagnosis and a colour-coded Excel compliance report.
## The Challenge
Automated document generation pipelines produce outputs at scale — but without systematic validation, defects accumulate silently. A PDF generator producing thousands of letters, certificates, or regulated notices may drift from specification without triggering any visible error: spacing breaks, required fields go missing, formatting rules are violated.
The question is not whether the documents were generated, but whether they are compliant. Answering that reliably requires a test harness that defines compliance precisely, executes deterministically, and distinguishes true defects from false alarms — the same standard applied to any quality-critical system.
This project built exactly that: a two-layer automated framework capable of parsing generated PDFs, asserting compliance against a defined specification, and reporting results at both aggregate and per-file granularity.
**Why this matters for regulated environments:** In financial services, insurance, and legal contexts, document compliance is not a cosmetic concern — it is an audit and regulatory obligation. Manual spot-checking at volume is statistically insufficient. Automated harnesses with quantified false-positive and false-negative rates are the only defensible approach.
## Approach
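The two-layer design can be sketched as follows. The rule names, regex patterns, and the 30-point gap threshold below are illustrative assumptions rather than the project's actual specification, and the (y-coordinate, text) pairs stand in for the output of a PDF text-extraction step:

```python
import re

# Hypothetical compliance rules -- the real harness's specification differs.
REQUIRED_PATTERNS = {
    "salutation": re.compile(r"^Dear (Mr|Mrs|Ms|Dr)\. [A-Z][a-z]+,$"),
    "order_number": re.compile(r"Order number: \d{6}"),
}
MAX_Y_GAP = 30.0  # assumed spacing threshold, in points

def check_structure(lines):
    """Layer 1: pass only if no vertical gap between consecutive
    lines exceeds the allowed threshold (a spacing defect)."""
    ys = [y for y, _ in lines]
    return all(b - a <= MAX_Y_GAP for a, b in zip(ys, ys[1:]))

def check_content(lines):
    """Layer 2: every required pattern must match at least one line;
    returns the names of the rules that failed."""
    texts = [t for _, t in lines]
    return [name for name, pat in REQUIRED_PATTERNS.items()
            if not any(pat.search(t) for t in texts)]

def verdict(lines):
    """Combined verdict: a document is compliant only if both layers pass."""
    failures = check_content(lines)
    if not check_structure(lines):
        failures.append("y_gap")
    return ("PASS" if not failures else "FAIL", failures)

# Example input: (y_coordinate, text) pairs, as a text extractor would yield.
doc = [(72.0, "Dear Mr. Smith,"), (90.0, "Order number: 123456")]
print(verdict(doc))  # -> ('PASS', [])
```

Because each layer reports independently, a document's verdict carries its own failure attribution, which is what makes the per-file diagnosis in the results possible.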
## Results
The harness achieved perfect classification across all 300 documents — 210 true positives and 90 true negatives, with zero false positives and zero false negatives. The two validation layers were complementary: structural validation (y-gap) flagged 58 failures and content-integrity validation flagged 90, with the combined verdict correctly identifying every defective document.
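Against a labelled ground-truth corpus, classification quality reduces to a confusion-matrix count. A minimal sketch, treating a compliant document as the positive class (matching the counts above: 210 passes, 90 flagged failures); the label strings are illustrative:

```python
from collections import Counter

def confusion_counts(truth, predicted):
    """Tally TP/TN/FP/FN, with 'compliant' as the positive class."""
    pairs = Counter(zip(truth, predicted))
    return {
        "TP": pairs[("compliant", "PASS")],   # compliant, correctly passed
        "TN": pairs[("defective", "FAIL")],   # defective, correctly flagged
        "FP": pairs[("defective", "PASS")],   # defect missed by the harness
        "FN": pairs[("compliant", "FAIL")],   # false alarm
    }

truth     = ["compliant", "compliant", "defective", "defective"]
predicted = ["PASS",      "PASS",      "FAIL",      "FAIL"]
print(confusion_counts(truth, predicted))
# -> {'TP': 2, 'TN': 2, 'FP': 0, 'FN': 0}
```

Keeping FP and FN as explicit, separately reported quantities is what lets the harness claim quantified error rates rather than a bare pass count.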
Failure diagnosis identified the most common root causes, enabling the document generator to be targeted for fixes rather than requiring broad rework. Weather-line content was the highest-frequency failure mode (74 instances), followed by salutation format errors (58) — findings that directly informed prioritised corrections upstream.
| Failure Mode | Count |
|---|---|
| Weather line 1 — missing required phrase | 74 |
| Weather line 2 — missing required phrase | 74 |
| Salutation — incorrect format or missing | 58 |
| Order number — absent or malformed | 36 |
| Date — absent or malformed | 36 |
| Property line — absent or malformed | 36 |
| Phone — absent or incorrect | 20 |
| Website — absent or incorrect | 20 |
| Bold formatting — expected markers missing | 16 |
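Frequency tables like the one above fall out of the per-file diagnosis almost for free: aggregate each document's list of failed rules with a counter. A sketch with hypothetical file names and rule names:

```python
from collections import Counter

# Hypothetical per-file diagnosis output: file -> names of failed rules.
diagnoses = {
    "letter_001.pdf": ["weather_line_1", "weather_line_2"],
    "letter_002.pdf": ["salutation"],
    "letter_003.pdf": ["weather_line_1", "salutation"],
}

# Flatten all per-file failures and count occurrences of each mode.
failure_frequency = Counter(
    mode for failures in diagnoses.values() for mode in failures
)

for mode, count in failure_frequency.most_common():
    print(f"{mode}: {count}")
```

Sorting by frequency is exactly what turns raw test output into a prioritised fix list for the upstream generator.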
## Relevance to Production Contexts
This harness demonstrates capabilities that translate directly to production quality engineering: structured test design with explicit ground truth, multi-layer validation with independent failure modes, per-document failure attribution rather than aggregate-only reporting, and machine-readable output (Excel) for downstream audit processes.
The same architecture — parse, assert, report — scales to any document type where compliance rules can be formalised: regulated financial notices, insurance certificates, legal contracts, or automated correspondence systems. The pattern is domain-agnostic; only the assertion rules change.
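That domain-agnosticism can be made concrete by expressing compliance rules as named predicates over extracted text, so moving to a new document type means swapping only the rule table. An illustrative sketch with hypothetical rule tables:

```python
import re

def make_ruleset(patterns):
    """Build (name, compiled_pattern) rules from a table of regexes;
    a different domain supplies a different table, nothing else changes."""
    return [(name, re.compile(pat)) for name, pat in patterns.items()]

def run(text, rules):
    """Return the names of every rule the document text violates."""
    return [name for name, pattern in rules if not pattern.search(text)]

# Hypothetical rule tables for two domains.
letter_rules = make_ruleset({"salutation": r"Dear .+,"})
insurance_rules = make_ruleset({"policy_number": r"Policy No\. \d{8}"})

print(run("Dear Ms. Jones,", letter_rules))     # -> []
print(run("Dear Ms. Jones,", insurance_rules))  # -> ['policy_number']
```

The parse and report stages never inspect the rules themselves, which is what keeps the harness reusable across document types.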
For organisations running AI-assisted document generation pipelines, this kind of post-generation compliance testing is the QA layer that makes the pipeline auditable — not just functional.