Applied AI / RAG Architecture

Private Document Analysis AI

A privacy-first document intelligence system that answers questions about uploaded documents and spreadsheets using a three-tier query architecture. Deterministic logic handles what it can, a schema engine handles what it should, and a local LLM with RAG handles the rest. No data ever leaves the server.

Type
Client Delivery
Domain
Document Intelligence
Infrastructure
RunPod (NVIDIA A40, 48 GB)
Status
Delivered
[Hero graphic: Private Document AI — 3-Tier RAG. T1: deterministic, instant, 100% accurate · T2: schema engine, no AI · T3: AI + RAG, 6 specialist routes]

The Challenge

Organisations handling sensitive documents - legal transcripts, membership records, internal reports - need to query and analyse that data without sending it to third-party AI services. Off-the-shelf tools like ChatGPT require uploading content to external servers, which is unacceptable when confidentiality is non-negotiable.

The system needed to run entirely on private infrastructure, accept a wide range of file formats (spreadsheets, PDFs, Word documents, images), answer questions with verifiable accuracy where possible, and only invoke AI when simpler methods genuinely could not answer the question.

Approach

01
Three-Tier Query Architecture
Designed a layered system where questions pass through three tiers of increasing complexity. Tier 1 uses 16 deterministic patterns (counting, filtering, ranking) for instant, hallucination-free answers. Tier 2 applies a schema engine that learns column types, value meanings, and shorthand from uploaded data. Only questions that genuinely need reasoning reach Tier 3 and the AI model.
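The 16 production patterns aren't published here, but the tier-1 idea can be sketched in a few lines: each pattern is a regex mapped to a deterministic handler, and anything that matches nothing falls through to the next tier. The pattern set and handler names below are illustrative, not the real ones.

```python
import re

# Illustrative tier-1 patterns (the production system uses 16).
# Each maps a question shape to a deterministic, non-AI handler.
TIER1_PATTERNS = [
    (re.compile(r"^how many\b", re.I), "count"),
    (re.compile(r"\btop\s+\d+\b", re.I), "rank"),
    (re.compile(r"^(list|show)\b.*\bwhere\b", re.I), "filter"),
]

def route_tier1(question: str):
    """Return a deterministic handler name, or None to escalate to tier 2."""
    for pattern, handler in TIER1_PATTERNS:
        if pattern.search(question):
            return handler
    return None  # fall through: schema engine (tier 2), then AI (tier 3)
```

Because these handlers run plain code against the data, a tier-1 answer cannot hallucinate: it is either computed or the question escalates.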
02
RAG Pipeline with Semantic Search
Text documents are chunked into passages, embedded using nomic-embed-text (running locally), and stored in per-user ChromaDB indexes. Questions retrieve the most relevant passages by meaning rather than keyword match, grounding AI responses in the user's actual documents rather than general knowledge.
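The retrieval step can be sketched as follows. This toy version swaps the real nomic-embed-text vectors and ChromaDB index for a bag-of-words embedding and an in-memory sort, purely to show the shape of semantic retrieval; the production system uses dense embeddings, so it matches by meaning rather than by shared words as this sketch does.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words stand-in for real nomic-embed-text vectors.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k passages most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
```

The retrieved passages are what ground the tier-3 model: it answers from them, not from general knowledge.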
03
Intelligent Query Router
Built a priority-based router that classifies each question and directs it to the optimal handler: split reports (data from Python, narrative from AI, kept strictly separate), cross-sheet comparisons, document search, or one of six specialised analysis routes (sentiment, risk, negotiation, deflection, consistency, summarisation).
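A priority-based router of this kind amounts to an ordered list of classifiers where the first match wins. The route names below follow the handlers described above; the keyword match rules are illustrative placeholders for the real classification logic.

```python
# Routes are checked in priority order; the first match wins.
# Match rules here are simplified stand-ins for the real classifier.
ROUTES = [
    ("split_report",  lambda q: "report" in q),
    ("cross_sheet",   lambda q: "compare" in q and "sheet" in q),
    ("sentiment",     lambda q: "sentiment" in q or "tone" in q),
    ("risk",          lambda q: "risk" in q),
    ("negotiation",   lambda q: "negotiat" in q),
    ("deflection",    lambda q: "deflect" in q or "evasive" in q),
    ("consistency",   lambda q: "consisten" in q or "contradict" in q),
    ("summarisation", lambda q: "summar" in q),
]

def route(question: str) -> str:
    q = question.lower()
    for name, matches in ROUTES:
        if matches(q):
            return name
    return "document_search"  # default: plain RAG over the user's documents
```

Putting the split report first means any report request is guaranteed to take the path where numbers and narrative are generated separately.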
04
Nine-Step Spreadsheet Cleaning Pipeline
Automated data preparation handling header detection, sub-header removal, empty column/row stripping, date standardisation, colour extraction from cell formatting, and summary row identification. No AI involved - pure deterministic logic that turns messy real-world spreadsheets into reliable analytical bases.
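Two of the nine steps can be sketched with pandas to show the deterministic character of the pipeline: empty row/column stripping and date standardisation. The other steps (header detection, colour extraction, summary-row identification) are omitted for brevity, and this helper is a simplified illustration, not the production code.

```python
import pandas as pd

def clean_sheet(df: pd.DataFrame, date_cols=()) -> pd.DataFrame:
    """Sketch of two of the nine cleaning steps - no AI involved."""
    # Strip rows and columns that are entirely empty.
    df = df.dropna(axis=0, how="all").dropna(axis=1, how="all")
    # Standardise dates (here assumed day-first) to ISO 8601 strings.
    for col in date_cols:
        parsed = pd.to_datetime(df[col], errors="coerce", dayfirst=True)
        df[col] = parsed.dt.strftime("%Y-%m-%d")
    return df.reset_index(drop=True)
```

Because every step is rule-based, the cleaned output is reproducible: the same messy spreadsheet always yields the same analytical base.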
05
Privacy-First Deployment
Deployed on RunPod with an NVIDIA A40 GPU (48 GB). The LLM (Qwen 2.5 14B) and embedding model both run locally - no data is sent to OpenAI, Google, or any external API. Six layers of protection: local AI, server-only storage, per-user isolation, HTTPS encryption, PyArmor source code protection, and English-only enforcement across all 11 AI communication points.
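The local-only guarantee follows from where the model is served: Ollama exposes its generate endpoint on localhost, so every prompt and document passage stays on the box. A minimal sketch of building such a request is below; the endpoint and fields follow Ollama's documented API, while the prompt template (including the English-only instruction) is illustrative rather than the production one.

```python
import json

# Local Ollama endpoint - requests never leave the server.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(question: str, passages: list[str]) -> bytes:
    """Build a request body for the locally served Qwen model.
    The prompt template here is illustrative only."""
    context = "\n\n".join(passages)
    prompt = (
        "Answer in English, using only this context:\n"
        f"{context}\n\nQuestion: {question}"
    )
    return json.dumps({
        "model": "qwen2.5:14b",
        "prompt": prompt,
        "stream": False,
    }).encode()
```

The same pattern applies to the embedding model: both inference paths terminate at localhost, which is what makes the "zero data sent externally" claim checkable at the network level.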

Results

3 Tiers
Deterministic first, schema second, AI only when needed
25+
File formats supported including OCR for images
Zero
Data sent to external services

The three-tier architecture means the majority of data questions are answered instantly with guaranteed accuracy, reserving AI processing for questions that genuinely require reasoning or synthesis. Tiers 1 and 2 were validated against 96 test queries with zero hallucination by design - they use deterministic logic, not probabilistic generation.

The split report architecture solves a problem that plagues most AI document tools: when numbers and narrative are generated by the same model, the AI can invent statistics. Here, quantitative outputs (charts, counts, tables) are produced by Python code with guaranteed accuracy, while qualitative analysis (themes, quotes, insights) is generated by the AI from document passages. The two streams are kept strictly separate and stitched together only at presentation.
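The separation can be made concrete with a small sketch: figures come only from deterministic pandas code, the narrative arrives as an opaque string from the local LLM, and the two meet only in the final assembled report. The function names are hypothetical; the point is that no code path lets the model touch the numbers.

```python
import pandas as pd

def quantitative_section(df: pd.DataFrame, by: str) -> dict:
    """Figures come only from deterministic code - the model never sees them."""
    counts = df[by].value_counts().to_dict()
    return {"total": len(df), f"by_{by}": counts}

def build_report(df: pd.DataFrame, by: str, narrative: str) -> dict:
    """Stitch the two streams together only at presentation time.
    `narrative` is whatever the local LLM produced from retrieved
    passages; it has no influence on the figures."""
    return {
        "figures": quantitative_section(df, by),
        "narrative": narrative,
    }
```

Even a model that confidently hallucinated a statistic in its narrative could not alter the figures section, which is the property most AI report tools lack.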

For the client, this delivered something commercially unavailable: the analytical capability of a modern AI assistant with the data sovereignty of a fully private, on-premises system. Sensitive documents could be queried, cross-referenced, and analysed without any content leaving their infrastructure.

Technology Stack

Python Qwen 2.5 14B ChromaDB nomic-embed-text RunPod NVIDIA A40 Ollama Pandas PyArmor Tesseract OCR ngrok HTTPS/TLS
Interested in this work or something similar?