Case study
A practical build spanning Python, BERT, PyTorch, scikit-learn, Hadoop, Spark, and Azure.
Overview
Built an intelligent transaction categorisation engine for Investec Private Banking that reduced manual labelling effort by 60%. The system uses a fine-tuned BERT model trained on proprietary tokenised transaction data, with a novel visual timeline labelling tool that solved the cold-start problem of having no labelled training data.
Context
Investec needed automated transaction categorisation for their private banking clients, who transact at very high volumes and often have multiple income sources. The fundamental challenge was a cold-start problem: Investec had no labelled transaction data to train on. Before building the categorisation model, I had to create the labelled dataset from scratch.
Architecture
I built a visual timeline labelling tool showing 3-month windows of transaction data, allowing domain experts to see patterns in context. I then fine-tuned a BERT model on the proprietary tokenised transaction descriptions. The critical innovation was integrating the BERT model back into the labelling tool itself: an expert could select one transaction and have the tool propose a batch of similar transactions to label together, which dramatically accelerated the labelling process. The 60% reduction in manual labelling was the result of both the categorisation model and the BERT-accelerated labelling workflow.
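The batch-selection idea can be sketched as a nearest-neighbour lookup over embedding vectors. This is a minimal illustration, not the tool's actual implementation: the function names `cosine` and `select_similar`, the 0.95 threshold, and the three-dimensional toy vectors are all assumptions standing in for real BERT embeddings of transaction descriptions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)

def select_similar(seed_index, embeddings, threshold=0.9):
    """Indices of transactions whose embedding is close to the seed's."""
    seed = embeddings[seed_index]
    return [
        i for i, vec in enumerate(embeddings)
        if i != seed_index and cosine(seed, vec) >= threshold
    ]

# Hypothetical descriptions and toy vectors; a real system would take
# these vectors from the fine-tuned model's encoder output.
descriptions = ["UBER TRIP 1234", "UBER TRIP 9876", "SALARY ACME LTD", "UBER EATS ORDER"]
embeddings = [
    [0.90, 0.10, 0.00],
    [0.88, 0.12, 0.01],
    [0.00, 0.10, 0.95],
    [0.60, 0.70, 0.05],
]

# The expert clicks transaction 0; the tool proposes a batch to label with it.
batch = select_similar(0, embeddings, threshold=0.95)
for i in batch:
    print(descriptions[i])
```

With a high threshold the proposed batch stays small and precise, so the expert confirms rather than corrects. Lowering it trades precision for larger batches.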
Outcome
The project delivered a 60% reduction in manual transaction labelling effort. The categorisation engine handles high-volume private banking transaction streams with multiple income sources and complex spending patterns. The labelling tool approach proved reusable for other classification problems within the bank.
Retrospective
The labelling tool was as valuable as the model itself, a reminder that ML infrastructure often matters more than model architecture. If starting again, I would explore active learning loops earlier to further reduce the labelling burden.
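One common form of active learning is uncertainty sampling: the model scores unlabelled items, and the ones it is least sure about go to the expert first. The sketch below assumes the classifier exposes per-class probabilities. The helper names `entropy` and `most_uncertain` and the probability values are hypothetical, and this was not part of the delivered system.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def most_uncertain(predictions, k):
    """Indices of the k predictions with the highest entropy."""
    ranked = sorted(
        range(len(predictions)),
        key=lambda i: entropy(predictions[i]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical class probabilities for four unlabelled transactions.
predictions = [
    [0.98, 0.01, 0.01],  # confident
    [0.40, 0.35, 0.25],  # unsure
    [0.70, 0.20, 0.10],  # fairly confident
    [0.34, 0.33, 0.33],  # close to uniform
]

# Send the two least certain transactions to the labelling queue first.
queue = most_uncertain(predictions, k=2)
print(queue)
```

Each labelling round would then retrain the model on the newly labelled items and re-rank the remainder.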