G-SIB Risk Assessment System
An end-to-end NLP pipeline that analyses quarterly financial reports and earnings call transcripts from Global Systemically Important Banks (G-SIBs), extracting regulatory-style risk insights that would take analysts weeks to compile by hand.
The Challenge
Financial regulators face a growing structural problem: the volume of unstructured data published by systemically important banks far exceeds what human analysts can process consistently. Quarterly earnings reports, investor call transcripts, risk disclosures, and supplementary filings contain critical signals about capital adequacy, liquidity risk, and emerging vulnerabilities.
Manual review of these documents is slow, inconsistent, and prone to oversight. Key risk indicators are often buried in dense financial language, and cross-bank comparisons require analysts to synthesise information across hundreds of pages per quarter.
Approach
Results
Processed 81 quarterly reports across three global systemically important banks (UBS, Morgan Stanley, Barclays) spanning 2023-2025, extracting financial metrics, sentiment trajectories, and emerging risk themes at a speed and consistency that manual analyst review cannot replicate.
A five-model NLP pipeline (FinBERT, VADER, BERTopic, GPT-2, ARIMA) delivered structured intelligence from unstructured filings, turning earnings transcripts and regulatory disclosures into data that can be compared and queried across banks and quarters. Topic modelling surfaced cross-bank themes that reading each bank's filings in isolation would miss.
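To make "comparable, queryable data across banks and quarters" concrete, here is a minimal Python sketch of one possible aggregation step. It is not the project's actual code. The names Passage, toy_scorer, sentiment_table and trajectory are hypothetical, the bank labels and sentences in the usage note are invented, and the keyword scorer is only a stand-in for a real model such as FinBERT or VADER, which would be wrapped behind the same text-to-score interface.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Passage:
    """One extracted passage from a filing or transcript (hypothetical schema)."""
    bank: str
    quarter: str  # e.g. "2024Q1"; this format sorts chronologically as a string
    text: str


# A scorer maps text to a sentiment value in [-1, 1].
# Assumption: a FinBERT or VADER wrapper would be plugged in here.
Scorer = Callable[[str], float]

POSITIVE = {"growth", "strong", "improved"}
NEGATIVE = {"loss", "risk", "decline"}


def toy_scorer(text: str) -> float:
    """Keyword stand-in for a real sentiment model (illustration only)."""
    words = [w.strip(".,;:") for w in text.lower().split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def sentiment_table(
    passages: Iterable[Passage], scorer: Scorer
) -> Dict[Tuple[str, str], float]:
    """Average passage scores into one value per (bank, quarter)."""
    buckets: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for p in passages:
        buckets[(p.bank, p.quarter)].append(scorer(p.text))
    # Sorting the keys gives a stable bank-then-quarter ordering.
    return {key: mean(scores) for key, scores in sorted(buckets.items())}


def trajectory(
    table: Dict[Tuple[str, str], float], bank: str
) -> List[Tuple[str, float]]:
    """Return the quarter-by-quarter sentiment series for one bank."""
    return [(quarter, score) for (b, quarter), score in table.items() if b == bank]
```

With invented passages such as Passage("Bank A", "2024Q1", "Strong growth in wealth management.") and Passage("Bank A", "2024Q2", "Improved capital position."), calling sentiment_table(passages, toy_scorer) yields one row per bank and quarter, and trajectory(table, "Bank A") returns that bank's series in quarter order. Keeping the scorer as a plain callable means the aggregation and any downstream queries stay unchanged if the underlying model is swapped.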
For regulatory teams managing growing volumes of public disclosures, this approach replaces weeks of manual extraction per reporting cycle with automated, auditable analysis that scales without proportional headcount increases.
Important note: This is a prototype built using only publicly available data. It demonstrates the methodology and what it can do; it is not a production deployment.