Case study
A practical build spanning Python, Kubernetes, FastAPI, MLflow, Airflow, AWS S3, DynamoDB, EMR on EKS, Great Expectations, Datadog, PagerDuty, Docker.
Overview
Defined and delivered JUMO's internal ML platform from first principles — a config-driven orchestration layer that scaled model training from ~1 model per week to 50+ models per day, automated deployment pipelines, and established the artifact registry, feature store, and monitoring infrastructure that became the foundation for all production ML at JUMO.
Context
JUMO's data science team was bottlenecked: each model training run required manually launching EC2 instances, and each deployment consumed roughly a week of engineer time per model. The platform had to take the team from isolated, hand-run services to a unified system — one that could handle any ML framework, automate the full lifecycle, and scale horizontally, while operating in a FinTech environment where credit scoring models carry significant bias and governance risks.
Architecture
Config-driven orchestration layer: models built in any framework plug into a unified SKLearn-style interface, are assembled from YAML configs, and execute as isolated Kubernetes pods.

Artifact registry: model binaries stored in partitioned S3, metadata in DynamoDB, fronted by a versioned CRUD API built around a model-and-feature-set pair concept.

Feature materialisation: EMR on EKS with templated query construction (similar in spirit to modern dbt), exposed via FastAPI, writing partitioned features to S3.

Scoring service: a containerised FastAPI service that pulls daily features and scores them against registered model versions.

Airflow automation: a polling script queries the artifact registry every 5 minutes for new versions and auto-generates DAGs with cron schedules; new deployments go live the next day, and old versions are archived automatically.
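A minimal sketch of the config-driven assembly idea: a config names a model class by import path plus its parameters, and a loader instantiates anything that honours the SKLearn-style fit/predict contract. The names here (ModelConfig, build_model, MeanPredictor) are illustrative, not the platform's actual API.

```python
import importlib
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ModelConfig:
    """Hypothetical config shape (in practice this would come from YAML)."""
    model_class: str                               # dotted import path
    params: Dict[str, Any] = field(default_factory=dict)
    feature_set: List[str] = field(default_factory=list)


def build_model(cfg: ModelConfig):
    """Resolve the configured class and instantiate it with its params.

    Any class works as long as it exposes the SKLearn-style
    fit(X, y) / predict(X) interface the platform standardises on.
    """
    module_path, _, class_name = cfg.model_class.rpartition(".")
    cls = getattr(importlib.import_module(module_path), class_name)
    return cls(**cfg.params)


class MeanPredictor:
    """Toy estimator standing in for any framework-specific model."""

    def __init__(self, offset: float = 0.0):
        self.offset = offset
        self.mean_ = 0.0

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ + self.offset for _ in X]
```

Because the orchestrator only ever sees the fit/predict contract, swapping frameworks is a config change rather than a code change — which is what let each model run as an interchangeable, isolated Kubernetes pod.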
Outcome
Scaled from ~1 model/week to 50+ models/day. Deployment time dropped from a week to hours. Experimentation became roughly 70% faster through the generalised champion model architectures I identified. The platform architecture was a key asset during due diligence, contributing to a £2M+ funding round. During a 5-day emergency portfolio refresh, the platform supported ~60 experiments and the selection of a champion ensemble under extreme time pressure.
Retrospective
The config-driven approach proved essential for scaling — but the YAML configs became complex enough that a schema validation layer would have saved debugging time. I also identified bias risks in credit scoring that informed governance standards. If building again, I would add a model card system from day one and formalise the feature lineage tracking earlier.
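The schema validation layer I would add in hindsight could be as simple as a function that checks a parsed config before anything is scheduled, failing fast with every problem at once. This is a stdlib-only sketch; the key names (model_class, params, schedule) are assumptions for illustration, and a real version would likely use a library such as pydantic or jsonschema.

```python
from typing import List


def validate_config(cfg: dict) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid.

    Collecting every error (rather than raising on the first) gives the
    config author one complete report per failed validation pass.
    """
    errors: List[str] = []
    required = {"model_class": str, "params": dict, "schedule": str}

    for key, expected in required.items():
        if key not in cfg:
            errors.append(f"missing required key: {key}")
        elif not isinstance(cfg[key], expected):
            errors.append(
                f"{key!r} must be {expected.__name__}, "
                f"got {type(cfg[key]).__name__}"
            )

    unknown = set(cfg) - set(required)
    if unknown:
        errors.append(f"unknown keys: {sorted(unknown)}")
    return errors
```

Run at config-load time, a check like this turns silent YAML typos (a misspelled key, a string where a mapping was expected) into immediate, readable failures instead of debugging sessions inside a Kubernetes pod.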
More projects
Led architecture and delivery of a production-grade RAG chatbot for John Lewis Partnership's internal workforce — from first …
Designed and built a real-time model monitoring system at JUMO that reduced data anomaly detection time from weeks …
Built an intelligent transaction categorisation engine for Investec Private Banking that reduced manual labelling effort by 60%. The …
If you need secure GenAI delivery, RAG engineering, MLOps automation, or production ML systems support, feel free to get in touch.