Applied AI / RAG Architecture

Private Document Analysis AI

A privacy-first document intelligence system that answers questions about uploaded documents and spreadsheets using a three-tier query architecture. Deterministic logic handles what it can, a schema engine handles what it should, and a local LLM with RAG handles the rest. No data ever leaves the server.

Type
Client Delivery
Domain
Document Intelligence
Infrastructure
RunPod (NVIDIA A40, 48 GB)
Status
Delivered
[Hero graphic: Private Document AI — 3-Tier RAG. T1: deterministic, instant, 100% accurate · T2: schema engine, no AI · T3: AI + RAG, 6 specialist routes]

The Challenge

Organisations handling sensitive documents - legal transcripts, membership records, internal reports - need to query and analyse that data without sending it to third-party AI services. Off-the-shelf tools like ChatGPT require uploading content to external servers, which is unacceptable when confidentiality is non-negotiable.

The system needed to run entirely on private infrastructure, accept a wide range of file formats (spreadsheets, PDFs, Word documents, images), answer questions with verifiable accuracy where possible, and only invoke AI when simpler methods genuinely could not answer the question.

Approach

01
Three-Tier Query Architecture
Designed a layered system where questions pass through three tiers of increasing complexity. Tier 1 uses 16 deterministic patterns (counting, filtering, ranking) for instant, hallucination-free answers. Tier 2 applies a schema engine that learns column types, value meanings, and shorthand from uploaded data. Only questions that genuinely need reasoning reach Tier 3 and the AI model.
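The 16 production patterns aren't published here, but the tier-1 idea can be sketched in a few lines: each pattern is a regex mapped to a deterministic handler, and anything that matches nothing falls through to the next tier. The pattern set and handler names below are illustrative, not the real ones.

```python
import re

# Illustrative tier-1 patterns (the production system uses 16).
# Each maps a question shape to a deterministic, non-AI handler.
TIER1_PATTERNS = [
    (re.compile(r"^how many\b", re.I), "count"),
    (re.compile(r"\btop\s+\d+\b", re.I), "rank"),
    (re.compile(r"^(list|show)\b.*\bwhere\b", re.I), "filter"),
]

def route_tier1(question: str):
    """Return a deterministic handler name, or None to escalate to tier 2."""
    for pattern, handler in TIER1_PATTERNS:
        if pattern.search(question):
            return handler
    return None  # fall through: schema engine (tier 2), then AI (tier 3)
```

Because these handlers run plain code against the data, a tier-1 answer cannot hallucinate: it is either computed or the question escalates.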
02
RAG Pipeline with Semantic Search
Text documents are chunked into passages, embedded using nomic-embed-text (running locally), and stored in per-user ChromaDB indexes. Questions retrieve the most relevant passages by meaning rather than keyword match, grounding AI responses in the user's actual documents rather than general knowledge.
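The retrieval step can be sketched as follows. This toy version swaps the real nomic-embed-text vectors and ChromaDB index for a bag-of-words embedding and an in-memory sort, purely to show the shape of semantic retrieval; the production system uses dense embeddings, so it matches by meaning rather than by shared words as this sketch does.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words stand-in for real nomic-embed-text vectors.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k passages most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
```

The retrieved passages are what ground the tier-3 model: it answers from them, not from general knowledge.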
03
Intelligent Query Router
Built a priority-based router that classifies each question and directs it to the optimal handler: split reports (data from Python, narrative from AI, kept strictly separate), cross-sheet comparisons, document search, or one of six specialised analysis routes (sentiment, risk, negotiation, deflection, consistency, summarisation).
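A priority-based router of this kind amounts to an ordered list of classifiers where the first match wins. The route names below follow the handlers described above; the keyword match rules are illustrative placeholders for the real classification logic.

```python
# Routes are checked in priority order; the first match wins.
# Match rules here are simplified stand-ins for the real classifier.
ROUTES = [
    ("split_report",  lambda q: "report" in q),
    ("cross_sheet",   lambda q: "compare" in q and "sheet" in q),
    ("sentiment",     lambda q: "sentiment" in q or "tone" in q),
    ("risk",          lambda q: "risk" in q),
    ("negotiation",   lambda q: "negotiat" in q),
    ("deflection",    lambda q: "deflect" in q or "evasive" in q),
    ("consistency",   lambda q: "consisten" in q or "contradict" in q),
    ("summarisation", lambda q: "summar" in q),
]

def route(question: str) -> str:
    q = question.lower()
    for name, matches in ROUTES:
        if matches(q):
            return name
    return "document_search"  # default: plain RAG over the user's documents
```

Putting the split report first means any report request is guaranteed to take the path where numbers and narrative are generated separately.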
04
Nine-Step Spreadsheet Cleaning Pipeline
Automated data preparation handling header detection, sub-header removal, empty column/row stripping, date standardisation, colour extraction from cell formatting, and summary row identification. No AI involved - pure deterministic logic that turns messy real-world spreadsheets into reliable analytical bases.
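Two of the nine steps can be sketched with pandas to show the deterministic character of the pipeline: empty row/column stripping and date standardisation. The other steps (header detection, colour extraction, summary-row identification) are omitted for brevity, and this helper is a simplified illustration, not the production code.

```python
import pandas as pd

def clean_sheet(df: pd.DataFrame, date_cols=()) -> pd.DataFrame:
    """Sketch of two of the nine cleaning steps - no AI involved."""
    # Strip rows and columns that are entirely empty.
    df = df.dropna(axis=0, how="all").dropna(axis=1, how="all")
    # Standardise dates (here assumed day-first) to ISO 8601 strings.
    for col in date_cols:
        parsed = pd.to_datetime(df[col], errors="coerce", dayfirst=True)
        df[col] = parsed.dt.strftime("%Y-%m-%d")
    return df.reset_index(drop=True)
```

Because every step is rule-based, the cleaned output is reproducible: the same messy spreadsheet always yields the same analytical base.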
05
Privacy-First Deployment
Deployed on RunPod with an NVIDIA A40 GPU (48 GB). The LLM (Qwen 2.5 14B) and embedding model both run locally - no data is sent to OpenAI, Google, or any external API. Six layers of protection: local AI, server-only storage, per-user isolation, HTTPS encryption, PyArmor source code protection, and English-only enforcement across all 11 AI communication points.
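The local-only guarantee follows from where the model is served: Ollama exposes its generate endpoint on localhost, so every prompt and document passage stays on the box. A minimal sketch of building such a request is below; the endpoint and fields follow Ollama's documented API, while the prompt template (including the English-only instruction) is illustrative rather than the production one.

```python
import json

# Local Ollama endpoint - requests never leave the server.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(question: str, passages: list[str]) -> bytes:
    """Build a request body for the locally served Qwen model.
    The prompt template here is illustrative only."""
    context = "\n\n".join(passages)
    prompt = (
        "Answer in English, using only this context:\n"
        f"{context}\n\nQuestion: {question}"
    )
    return json.dumps({
        "model": "qwen2.5:14b",
        "prompt": prompt,
        "stream": False,
    }).encode()
```

The same pattern applies to the embedding model: both inference paths terminate at localhost, which is what makes the "zero data sent externally" claim checkable at the network level.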

Results

3 Tiers
Deterministic first, schema second, AI only when needed
25+
File formats supported including OCR for images
Zero
Data sent to external services

The three-tier architecture means the majority of data questions are answered instantly with guaranteed accuracy, reserving AI processing for questions that genuinely require reasoning or synthesis. Tiers 1 and 2 were validated against 96 test queries with zero hallucination by design - they use deterministic logic, not probabilistic generation.

The split report architecture solves a problem that plagues most AI document tools: when numbers and narrative are generated by the same model, the AI can invent statistics. Here, quantitative outputs (charts, counts, tables) are produced by Python code with guaranteed accuracy, while qualitative analysis (themes, quotes, insights) is generated by the AI from document passages. The two streams are kept strictly separate and stitched together only at presentation.
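The separation can be made concrete with a small sketch: figures come only from deterministic pandas code, the narrative arrives as an opaque string from the local LLM, and the two meet only in the final assembled report. The function names are hypothetical; the point is that no code path lets the model touch the numbers.

```python
import pandas as pd

def quantitative_section(df: pd.DataFrame, by: str) -> dict:
    """Figures come only from deterministic code - the model never sees them."""
    counts = df[by].value_counts().to_dict()
    return {"total": len(df), f"by_{by}": counts}

def build_report(df: pd.DataFrame, by: str, narrative: str) -> dict:
    """Stitch the two streams together only at presentation time.
    `narrative` is whatever the local LLM produced from retrieved
    passages; it has no influence on the figures."""
    return {
        "figures": quantitative_section(df, by),
        "narrative": narrative,
    }
```

Even a model that confidently hallucinated a statistic in its narrative could not alter the figures section, which is the property most AI report tools lack.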

For the client, this delivered something commercially unavailable: the analytical capability of a modern AI assistant with the data sovereignty of a fully private, on-premises system. Sensitive documents could be queried, cross-referenced, and analysed without any content leaving their infrastructure.

Technology Stack

Python Qwen 2.5 14B ChromaDB nomic-embed-text RunPod NVIDIA A40 Ollama Pandas PyArmor Tesseract OCR ngrok HTTPS/TLS
Interested in this work or something similar?