Case study

Production RAG Chatbot (Enterprise application)

A practical build spanning Python, Google Cloud Run, Gemini API, RAG, LangChain, Vector Store, Google Chat API, GCP.

Overview

Why this project matters

Led architecture and delivery of Drummie, a production-grade RAG chatbot for John Lewis Partnership's internal workforce, from first line of code to 90% adoption in its first week. Drummie answers HR, policy, and operational questions via Google Chat, serving hundreds of non-technical staff across the organisation.

Context

The problem

John Lewis Partnership needed an internal knowledge assistant, but deploying a RAG solution in a risk-averse retail environment with strict data governance posed unique challenges. The CISO's primary concern was that RAG solutions were still novel, and sending PII to the Gemini API without clear guarantees on how that PII would be handled was unacceptable. Google Chat was chosen over Slack because most non-technical staff already used it and it sat within existing GCP security boundaries.

Architecture

How it was built

The core architectural decision was to build a PII-parsing and redaction layer at ingress, inside the Cloud Run function. This layer was built, tested, and validated in-house, providing the proof of obfuscation that satisfied the security review. An automated intranet-crawling pipeline refreshes the knowledge base overnight, so answers always reflect current policies, and responses are grounded in retrieved documents with source attribution.
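The shape of such an ingress-side redaction layer can be sketched as below. This is a minimal illustration only: the actual patterns, placeholder scheme, and validation used in production are not public, and the regexes and labels here are assumptions chosen for the example.

```python
import re

# Illustrative PII patterns only; a production layer would cover far more
# categories and be validated against a labelled test corpus.
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "UK_PHONE": re.compile(r"\b(?:\+44\s?|0)\d{4}\s?\d{6}\b"),
    "NI_NUMBER": re.compile(r"\b[A-CEGHJ-PR-TW-Z]{2}\d{6}[A-D]\b"),
}

def redact(text: str) -> tuple[str, dict[str, str]]:
    """Replace PII with numbered placeholders before the text leaves the
    security boundary; return the mapping so originals can be restored
    in the response if needed."""
    mapping: dict[str, str] = {}
    for label, pattern in PII_PATTERNS.items():
        def _sub(match: re.Match) -> str:
            token = f"[{label}_{len(mapping) + 1}]"
            mapping[token] = match.group(0)
            return token
        text = pattern.sub(_sub, text)
    return text, mapping

redacted, found = redact("Email jane.doe@example.co.uk about policy 4.2")
# `redacted` now contains a placeholder instead of the address, and `found`
# maps that placeholder back to the original value.
```

The key design point is that redaction happens before any text reaches the Gemini API, and the placeholder-to-value mapping never leaves the Cloud Run function.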

Outcome

What was delivered

90% adoption in the first week. The DI&A management and HR teams had a definitive need for the tool, which made stakeholder buy-in easier. The approach was 'implement first, apologise later': a fail-fast methodology with modular design that allowed rapid iteration. Delivered in three weeks from first line of code to production deployment.

Retrospective

What I would do differently

The PII-redaction pattern proved reusable across other GCP-based AI initiatives. If starting again, I would invest earlier in evaluation harnesses for RAG response quality — we relied heavily on user feedback loops in the first weeks. The Google Chat integration constraint actually simplified the deployment surface area compared to a standalone web app.
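An evaluation harness of the kind suggested above could start very small. The sketch below is hypothetical, not the project's actual tooling: each case pairs a question with phrases a grounded answer must contain, and `answer_fn` stands in for the real chatbot pipeline.

```python
from typing import Callable

# Hypothetical evaluation cases; in practice these would come from HR and
# policy subject-matter experts, with expected source attributions too.
EVAL_CASES = [
    {"question": "How many days of annual leave do I get?",
     "must_contain": ["annual leave"]},
    {"question": "Who approves expense claims?",
     "must_contain": ["expense"]},
]

def run_eval(answer_fn: Callable[[str], str]) -> float:
    """Return the fraction of cases whose answer mentions every required phrase."""
    passed = 0
    for case in EVAL_CASES:
        answer = answer_fn(case["question"]).lower()
        if all(phrase in answer for phrase in case["must_contain"]):
            passed += 1
    return passed / len(EVAL_CASES)

# Usage with a stub pipeline in place of the real chatbot:
score = run_eval(lambda q: "Annual leave and expense policy details are in the handbook.")
```

Even a keyword-level harness like this gives a regression signal on every prompt or retrieval change, which is cheaper than waiting for user feedback loops to surface quality drift.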

Published recognition

External coverage

This project is featured in independent third-party case studies.

Technology stack

Python · Google Cloud Run · Gemini API · RAG · LangChain · Vector Store · Google Chat API · GCP

Next steps

Interested in similar work?

If you need secure GenAI delivery, RAG engineering, MLOps automation, or production ML systems support, feel free to get in touch.