Case study

The Prediction Factory: Designing an ML Platform from First Principles

A practical build spanning Python, Kubernetes, FastAPI, MLflow, Airflow, AWS S3, DynamoDB, EMR on EKS, Great Expectations, Datadog, PagerDuty, Docker.

Overview

Why this project matters

Defined and delivered JUMO's internal ML platform from first principles: a config-driven orchestration layer that scaled model training from roughly one model per week to 50+ models per day, automated the deployment pipelines, and established the artifact registry, feature store, and monitoring infrastructure that became the foundation for all production ML at JUMO.

Context

The problem

JUMO's data science team was bottlenecked: each model training run required manually launching EC2 instances, and every deployment cost an engineer roughly a week per model. The platform had to evolve from a set of isolated services into a unified system. I needed to design one that could handle any ML framework, automate the full model lifecycle, and scale horizontally, all while operating in a FinTech environment where credit scoring models carry significant bias and governance risks.

Architecture

How it was built

Config-driven orchestration layer: models built in any framework plug into a unified SKLearn-style interface, are assembled via YAML configs, and execute as isolated Kubernetes pods.

Artifact registry: model binaries stored in partitioned S3 prefixes, metadata in DynamoDB, behind a versioned CRUD API built around a model and feature-set pair concept.

Feature materialisation: EMR on EKS with templated query construction (similar in spirit to modern dbt), exposed via FastAPI, writing partitioned features to S3.

Scoring service: a containerised FastAPI service that pulls daily features and scores them against registered model versions.

Airflow automation: a polling script queries the artifact registry every 5 minutes for new versions and auto-generates cron-scheduled DAGs; new deployments go live the next day, and old versions are archived automatically.
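The config-driven assembly described above can be sketched as follows. This is a minimal illustration, not JUMO's actual code: the names `ModelAdapter`, `MODEL_REGISTRY`, and `build_model` are hypothetical, and the config is shown as a plain dict as it would look after parsing the YAML.

```python
from typing import Any, Callable, Dict, List

# Adapters wrap any framework behind a common fit/predict surface,
# mirroring the scikit-learn estimator convention.
class ModelAdapter:
    def fit(self, X: List[List[float]], y: List[float]) -> "ModelAdapter":
        raise NotImplementedError

    def predict(self, X: List[List[float]]) -> List[float]:
        raise NotImplementedError

class MeanBaseline(ModelAdapter):
    """Trivial stand-in model: always predicts the training-label mean."""
    def fit(self, X, y):
        self._mean = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self._mean] * len(X)

# Registry mapping config `type` strings to adapter classes; real frameworks
# (XGBoost, LightGBM, ...) would register their own adapters here.
MODEL_REGISTRY: Dict[str, Callable[..., ModelAdapter]] = {
    "mean_baseline": MeanBaseline,
}

def build_model(config: Dict[str, Any]) -> ModelAdapter:
    """Assemble a model from a parsed YAML config (here a plain dict)."""
    model_cfg = config["model"]
    cls = MODEL_REGISTRY[model_cfg["type"]]
    return cls(**model_cfg.get("params", {}))

# Example config, as it might appear after yaml.safe_load().
config = {
    "model": {"type": "mean_baseline", "params": {}},
    "features": ["txn_count_30d", "avg_balance"],
}
model = build_model(config).fit([[1], [2], [3]], [10, 20, 30])
print(model.predict([[4]]))  # [20.0]
```

The same pattern extends naturally to Kubernetes execution: the orchestrator only needs the config plus a container image, so any framework that provides an adapter runs through an identical pipeline.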

Outcome

What was delivered

Scaled from ~1 model/week to 50+ models/day. Deployment time dropped from a week to hours. Experimentation became 70% faster through generalised champion model architectures I identified across the portfolio. The platform architecture was a key asset during due diligence, contributing to a £2M+ funding round. During a 5-day emergency portfolio refresh, the platform enabled roughly 60 experiments and the selection of a champion ensemble under extreme time pressure.

Retrospective

What I would do differently

The config-driven approach proved essential for scaling — but the YAML configs became complex enough that a schema validation layer would have saved debugging time. I also identified bias risks in credit scoring that informed governance standards. If building again, I would add a model card system from day one and formalise the feature lineage tracking earlier.

More projects

Keep exploring

Production RAG Chatbot (Enterprise application)

Led architecture and delivery of a production-grade RAG chatbot for John Lewis Partnership's internal workforce — from first …

Production ML Monitoring: From Weeks to Minutes

Designed and built a real-time model monitoring system at JUMO that reduced data anomaly detection time from weeks …

Transaction Categorisation Engine — Investec

Built an intelligent transaction categorisation engine for Investec Private Banking that reduced manual labelling effort by 60%. The …

Technology stack

Python · Kubernetes · FastAPI · MLflow · Airflow · AWS S3 · DynamoDB · EMR on EKS · Great Expectations · Datadog · PagerDuty · Docker

Next steps

Interested in similar work?

If you need secure GenAI delivery, RAG engineering, MLOps automation, or production ML systems support, feel free to get in touch.