Transaction Categorisation Engine — Investec

Overview

Why this project matters

Built a transaction categorisation engine for Investec Private Banking that reduced manual labelling effort by 60%. The system uses a BERT model fine-tuned on proprietary tokenised transaction data. It is paired with a novel visual timeline labelling tool that solved the cold-start problem: at the outset there was no labelled training data at all.

Context

The problem

Investec needed automated transaction categorisation for its private banking clients, who transact at very high volumes and often have multiple income sources. The fundamental challenge was a cold-start problem: Investec had no labelled transaction data to train on. Before I could build the categorisation model, I had to create the labelled dataset from scratch.

Architecture

How it was built

I built a visual timeline labelling tool that shows 3-month windows of transaction data, so domain experts can judge each transaction in the context of the activity around it. I then fine-tuned a BERT model on the proprietary tokenised transaction descriptions. The critical innovation was integrating the BERT model back into the labelling tool itself. This enabled batch selection of similar transactions, letting an expert label a whole group in one action and dramatically accelerating the labelling process. The 60% reduction in manual labelling came from both the categorisation model and the BERT-accelerated labelling workflow.
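The batch-selection idea can be sketched as a nearest-neighbour lookup over transaction embeddings. This is a minimal illustration under stated assumptions, not the production code. It assumes each transaction already has an embedding vector from the fine-tuned BERT model (for example a pooled sentence vector). It also assumes "similar" means cosine similarity above a threshold. The function names `select_similar` and `propagate_label`, the `threshold` value and the `income:salary` category are all hypothetical. Small hand-written vectors stand in for real BERT output.

```python
import numpy as np


def select_similar(embeddings: np.ndarray, seed_index: int, threshold: float = 0.9) -> np.ndarray:
    """Return indices of transactions whose embedding is cosine-similar to the seed.

    `embeddings` is an (n_transactions, dim) array. In the described system these
    would come from the fine-tuned BERT model (assumption: one vector per transaction).
    """
    # L2-normalise each row so that a dot product equals cosine similarity
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    sims = unit @ unit[seed_index]
    # Order from most to least similar, then keep only those above the threshold
    order = np.argsort(-sims)
    return order[sims[order] >= threshold]


def propagate_label(labels: list, indices, label: str) -> list:
    """Apply one expert-chosen label to every selected transaction in a single action."""
    updated = list(labels)
    for i in indices:
        updated[i] = label
    return updated


# Toy 3-dimensional "embeddings" standing in for BERT output (hypothetical data)
emb = np.array([
    [1.0, 0.0, 0.0],    # seed, e.g. a salary credit
    [0.98, 0.05, 0.0],  # near-identical description
    [0.0, 1.0, 0.0],    # unrelated transaction
])
batch = select_similar(emb, seed_index=0, threshold=0.9)
labels = propagate_label([None, None, None], batch, "income:salary")
print(labels)  # ['income:salary', 'income:salary', None]
```

In a labelling tool, the expert would review the proposed batch before accepting it. The threshold trades recall of similar transactions against the risk of sweeping unrelated ones into the same label.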

Outcome

What was delivered

60% reduction in manual transaction labelling effort. The categorisation engine handles high-volume private banking transaction streams with multiple income sources and complex spending patterns. The labelling tool approach proved reusable for other classification problems within the bank.

Retrospective

What I would do differently

The labelling tool was as valuable as the model itself, a reminder that ML infrastructure often matters more than model architecture. If starting again, I would explore active learning loops earlier to reduce the labelling burden further.
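One common form of active learning is least-confidence uncertainty sampling. The model scores the unlabelled transactions, and the ones it is least sure about are routed to experts first. The sketch below illustrates that general technique and is not something the project implemented. It assumes the categorisation model outputs a probability per category for each transaction. The function name `least_confident` and the example probabilities are hypothetical.

```python
import numpy as np


def least_confident(probs: np.ndarray, k: int) -> np.ndarray:
    """Pick the k unlabelled transactions the model is least sure about.

    `probs` is an (n_transactions, n_categories) array of predicted class
    probabilities, e.g. softmax outputs from the categorisation model (assumption).
    """
    confidence = probs.max(axis=1)      # probability of the top predicted category
    return np.argsort(confidence)[:k]   # lowest-confidence transactions first


# Hypothetical model outputs for four unlabelled transactions over three categories
probs = np.array([
    [0.95, 0.03, 0.02],  # confident
    [0.40, 0.35, 0.25],  # uncertain
    [0.70, 0.20, 0.10],
    [0.34, 0.33, 0.33],  # most uncertain
])
queue = least_confident(probs, k=2)
print(queue.tolist())  # [3, 1]
```

A full loop would repeat three steps: label the queued transactions, retrain the model and re-score the remaining pool. This concentrates expert effort on the cases that are most likely to change the model.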

Technology stack

Python, BERT, PyTorch, Scikit-learn, Hadoop, Spark, Azure
