Case study

BiteByte: Multi-Agent Orchestration for 80% Latency Reduction

A practical build spanning Python, OpenAI APIs, LangChain, Agentic AI, Multi-Agent Orchestration, Latency Optimisation, Django, Docker.

Overview

Why this project matters

BiteByte v2.0 extends my original meal-planning concept into a structured multi-agent system, achieving a ~80% latency reduction (from over a minute to ~20 seconds) by decomposing responsibilities across specialised agents. The architectural patterns were subsequently adopted for a commercial product at a major UK retailer.

Context

The problem

The original v1.0 used a single monolithic prompt chain that took over a minute to generate a complete meal plan with nutrition calculations and shopping lists. This was unacceptable for a responsive user experience. The redesign explored whether agent specialisation could reduce latency while improving output quality.

Architecture

How it was built

The system separates concerns across four specialised agents: intent handling (parsing user dietary requirements), nutrition rules (applying constraints and calculations), meal composition (generating recipes and combinations), and shopping-list generation (aggregating ingredients). Each agent has a focused prompt and operates on a smaller context window, enabling parallel execution where possible. Input validation and security controls prevent prompt injection.
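The four-agent flow described above can be sketched as an async pipeline. This is a minimal illustration with stubbed agent functions and hypothetical names (`intent_agent`, `nutrition_agent`, and so on are placeholders, not the project's actual code); each stub stands in for a focused LLM call, and the nutrition and meal-composition steps run concurrently because neither depends on the other's output.

```python
import asyncio

# Hypothetical sketch of the four-agent pipeline. Each function
# stands in for a focused LLM call with a short, role-specific prompt.

async def intent_agent(user_request: str) -> dict:
    # Parse dietary requirements from the raw user request.
    return {"diet": "vegetarian", "days": 3}

async def nutrition_agent(intent: dict) -> dict:
    # Apply nutritional constraints and calculations.
    return {"max_kcal_per_day": 2000, "diet": intent["diet"]}

async def meal_agent(intent: dict) -> list:
    # Generate candidate recipes for the requested period.
    return [f"day-{d}-recipe" for d in range(1, intent["days"] + 1)]

async def shopping_agent(meals: list) -> list:
    # Aggregate ingredients across the chosen meals.
    return sorted(f"ingredients-for-{m}" for m in meals)

async def plan(user_request: str) -> dict:
    intent = await intent_agent(user_request)
    # Nutrition rules and meal composition are independent of each
    # other, so they can execute in parallel.
    rules, meals = await asyncio.gather(
        nutrition_agent(intent), meal_agent(intent)
    )
    shopping = await shopping_agent(meals)
    return {"rules": rules, "meals": meals, "shopping_list": shopping}

result = asyncio.run(plan("3-day vegetarian plan"))
```

The structural point is that `asyncio.gather` lets independent agents overlap in wall-clock time, so total latency approaches the longest single branch rather than the sum of all steps.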

Outcome

What was delivered

~80% latency reduction: from over a minute down to ~20 seconds for a complete meal plan. Agent specialisation also improved output quality — each agent produces more consistent results with a narrower responsibility.

Commercial adoption

Real-world impact

The multi-agent orchestration patterns demonstrated in BiteByte were subsequently adopted by a separate team to accelerate development of a Recipe Builder tool for a major UK retailer. The architectural approach — separating responsibilities across intent handling, constraint validation, composition, and output generation — proved directly transferable to a commercial product context. It is rare for a consultant's personal R&D to influence a client's product roadmap directly; this adoption validated the architectural thinking behind BiteByte and showed that the agentic patterns scale beyond the original use case.

Retrospective

What I would do differently

The biggest win wasn't parallelisation (which helped) but reducing context window size per agent. Smaller, focused prompts generate faster and more reliably. This mirrors the microservices insight: separation of concerns improves both performance and maintainability.
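The context-size insight can be made concrete with a toy comparison. This is an illustrative sketch, not the project's code: the context keys and placeholder values are invented, and the elided strings simply stand in for bulky reference data. A monolithic v1.0-style prompt carries everything, while a v2.0-style agent prompt includes only the slice of context that agent actually needs.

```python
# Illustrative only: invented context keys with placeholder values.
FULL_CONTEXT = {
    "dietary_requirements": "vegetarian, nut allergy",
    "nutrition_tables": "large nutrition reference data",
    "recipe_library": "hundreds of candidate recipes",
    "pantry_inventory": "current pantry stock",
}

def monolithic_prompt(ctx: dict) -> str:
    # v1.0 style: one prompt containing every piece of context.
    return "\n".join(f"{k}: {v}" for k, v in ctx.items())

def agent_prompt(ctx: dict, needed: list) -> str:
    # v2.0 style: a focused prompt with only the keys this agent uses.
    return "\n".join(f"{k}: {ctx[k]}" for k in needed)

# The nutrition agent sees requirements and tables, nothing else.
nutrition_view = agent_prompt(
    FULL_CONTEXT, ["dietary_requirements", "nutrition_tables"]
)
```

Fewer input tokens per call means faster generation and less irrelevant context for the model to be distracted by, which is the mechanism behind both the latency and the consistency gains noted above.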

Related writing

Articles connected to this project

24 February 2026 · 16 min read

Tracing the Ethical Contours of Artificial Intelligence: From Antiquity to the Global Governance Paradigms of 2026

A long-form essay on AI ethics, governance, and the historical roots of responsible AI systems.

More projects

Keep exploring

Production RAG Chatbot (Enterprise application)

Led architecture and delivery of a production-grade RAG chatbot for John Lewis Partnership's internal workforce — from first …

The Prediction Factory: Designing an ML Platform from First Principles

Defined and delivered JUMO's internal ML platform from first principles — a config-driven orchestration layer that scaled model …

Production ML Monitoring: From Weeks to Minutes

Designed and built a real-time model monitoring system at JUMO that reduced data anomaly detection time from weeks …

Technology stack

Python · OpenAI APIs · LangChain · Agentic AI · Multi-Agent Orchestration · Latency Optimisation · Django · Docker

Next steps

Interested in similar work?

If you need secure GenAI delivery, RAG engineering, MLOps automation, or production ML systems support, feel free to get in touch.