Case study

The Prediction Factory: Designing an ML Platform from First Principles

A practical build spanning Python, Kubernetes, FastAPI, MLflow, Airflow, AWS S3, DynamoDB, EMR on EKS, Great Expectations, Datadog, PagerDuty, Docker.

Overview

Why this project matters

Defined and delivered JUMO's internal ML platform from first principles: a config-driven orchestration layer that scaled model training from roughly one model per week to 50+ models per day, automated the deployment pipelines, and established the artifact registry, feature store, and monitoring infrastructure that became the foundation for all production ML at JUMO.

Context

The problem

JUMO's data science team was bottlenecked: each model training run required manually launching EC2 instances, and every deployment cost an engineer roughly a week per model. The platform had to evolve from a set of isolated services into a unified system. I needed to design one that could handle any ML framework, automate the full model lifecycle, and scale horizontally, all while operating in a FinTech environment where credit scoring models carry significant bias and governance risks.

Architecture

How it was built

Config-driven orchestration layer: models built in any framework plug into a unified SKLearn-style interface, are assembled via YAML configs, and execute as isolated Kubernetes pods.

Artifact registry: model binaries stored in partitioned S3 prefixes, metadata in DynamoDB, behind a versioned CRUD API built around a model and feature-set pair concept.

Feature materialisation: EMR on EKS with templated query construction (similar in spirit to modern dbt), exposed via FastAPI, writing partitioned features to S3.

Scoring service: a containerised FastAPI service that pulls daily features and scores them against registered model versions.

Airflow automation: a polling script queries the artifact registry every 5 minutes for new versions and auto-generates cron-scheduled DAGs; new deployments go live the next day, and old versions are archived automatically.
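The config-driven assembly described above can be sketched as follows. This is a minimal illustration, not JUMO's actual code: the names `ModelAdapter`, `MODEL_REGISTRY`, and `build_model` are hypothetical, and the config is shown as a plain dict as it would look after parsing the YAML.

```python
from typing import Any, Callable, Dict, List

# Adapters wrap any framework behind a common fit/predict surface,
# mirroring the scikit-learn estimator convention.
class ModelAdapter:
    def fit(self, X: List[List[float]], y: List[float]) -> "ModelAdapter":
        raise NotImplementedError

    def predict(self, X: List[List[float]]) -> List[float]:
        raise NotImplementedError

class MeanBaseline(ModelAdapter):
    """Trivial stand-in model: always predicts the training-label mean."""
    def fit(self, X, y):
        self._mean = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self._mean] * len(X)

# Registry mapping config `type` strings to adapter classes; real frameworks
# (XGBoost, LightGBM, ...) would register their own adapters here.
MODEL_REGISTRY: Dict[str, Callable[..., ModelAdapter]] = {
    "mean_baseline": MeanBaseline,
}

def build_model(config: Dict[str, Any]) -> ModelAdapter:
    """Assemble a model from a parsed YAML config (here a plain dict)."""
    model_cfg = config["model"]
    cls = MODEL_REGISTRY[model_cfg["type"]]
    return cls(**model_cfg.get("params", {}))

# Example config, as it might appear after yaml.safe_load().
config = {
    "model": {"type": "mean_baseline", "params": {}},
    "features": ["txn_count_30d", "avg_balance"],
}
model = build_model(config).fit([[1], [2], [3]], [10, 20, 30])
print(model.predict([[4]]))  # [20.0]
```

The same pattern extends naturally to Kubernetes execution: the orchestrator only needs the config plus a container image, so any framework that provides an adapter runs through an identical pipeline.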

Outcome

What was delivered

Scaled from ~1 model/week to 50+ models/day. Deployment time dropped from a week to hours. Experimentation became 70% faster through generalised champion model architectures I identified across the portfolio. The platform architecture was a key asset during due diligence, contributing to a £2M+ funding round. During a 5-day emergency portfolio refresh, the platform enabled roughly 60 experiments and the selection of a champion ensemble under extreme time pressure.

Retrospective

What I would do differently

The config-driven approach proved essential for scaling — but the YAML configs became complex enough that a schema validation layer would have saved debugging time. I also identified bias risks in credit scoring that informed governance standards. If building again, I would add a model card system from day one and formalise the feature lineage tracking earlier.

More projects

Keep exploring

Production RAG Chatbot (Enterprise application)

Led architecture and delivery of a production-grade RAG chatbot for John Lewis Partnership's internal workforce — from first …

Production ML Monitoring: From Weeks to Minutes

Designed and built a real-time model monitoring system at JUMO that reduced data anomaly detection time from weeks …

Transaction Categorisation Engine — Investec

Built an intelligent transaction categorisation engine for Investec Private Banking that reduced manual labelling effort by 60%. The …

Technology stack

Python · Kubernetes · FastAPI · MLflow · Airflow · AWS S3 · DynamoDB · EMR on EKS · Great Expectations · Datadog · PagerDuty · Docker

Next steps

Interested in similar work?

If you need secure GenAI delivery, RAG engineering, MLOps automation, or production ML systems support, feel free to get in touch.