Case study
A practical build spanning Python, OpenAI APIs, LangChain, agentic AI, multi-agent orchestration, latency optimisation, Django, and Docker.
Overview
BiteByte v2.0 extends my original meal-planning concept into a structured multi-agent system, cutting end-to-end latency by ~80% (from over a minute to ~20 seconds) by decomposing responsibilities across specialised agents. The architectural patterns were subsequently adopted for a commercial product at a major UK retailer.
Context
The original v1.0 used a single monolithic prompt chain that took over a minute to generate a complete meal plan with nutrition calculations and shopping lists. That was far too slow for a responsive user experience. The redesign explored whether agent specialisation could reduce latency while improving output quality.
Architecture
The system separates concerns across four specialised agents: intent handling (parsing user dietary requirements), nutrition rules (applying constraints and calculations), meal composition (generating recipes and combinations), and shopping-list generation (aggregating ingredients). Each agent has a focused prompt and operates on a smaller context window, enabling parallel execution where possible. Input validation and security controls guard against prompt injection.
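The four-stage pipeline can be sketched with asyncio. The agent bodies below are hypothetical stand-ins that return canned data; in the real system each would wrap a focused LLM prompt (e.g. an OpenAI chat-completions call). The field names and the day-level parallelism in composition are illustrative assumptions, since the original code is not shown.

```python
import asyncio

async def intent_agent(request: str) -> dict:
    # Parse dietary requirements from the raw user request.
    return {"diet": "vegetarian", "days": 3}

async def nutrition_agent(intent: dict) -> dict:
    # Apply nutrition constraints and calculations to the parsed intent.
    return {"diet": intent["diet"], "kcal_per_day": 2000}

async def composition_agent(day: int, rules: dict) -> str:
    # Compose one day's meals under the nutrition rules. Days are
    # independent of each other, so they can be generated concurrently.
    return f"day {day + 1}: {rules['diet']} plan at {rules['kcal_per_day']} kcal"

async def shopping_agent(meals: list[str]) -> list[str]:
    # Aggregate ingredients across the composed meals.
    return [f"ingredients for {meal}" for meal in meals]

async def plan(request: str) -> dict:
    intent = await intent_agent(request)   # sequential: everything needs intent
    rules = await nutrition_agent(intent)  # sequential: composition needs rules
    meals = await asyncio.gather(          # parallel: one task per day
        *(composition_agent(day, rules) for day in range(intent["days"]))
    )
    shopping = await shopping_agent(meals)  # sequential: needs all meals
    return {"meals": meals, "shopping": shopping}

result = asyncio.run(plan("a 3-day vegetarian meal plan"))
```

The sequencing constraints are the interesting part: only the per-day composition fans out, while intent, rules, and shopping-list aggregation remain a dependency chain.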
Outcome
~80% latency reduction: from over a minute down to ~20 seconds for a complete meal plan. Agent specialisation also improved output quality: each agent produces more consistent results with a narrower responsibility.
Commercial adoption
The multi-agent orchestration patterns demonstrated in BiteByte were subsequently adopted by a separate team to accelerate development of a Recipe Builder tool for a major UK retailer. The architectural approach of separating responsibilities across intent handling, constraint validation, composition, and output generation proved directly transferable to a commercial product. It is rare for a consultant's personal R&D to influence a client's product roadmap directly; this adoption validated the architectural thinking behind BiteByte and showed that the agentic patterns scale beyond the original use case.
Retrospective
The biggest win wasn't parallelisation (which helped) but reducing context window size per agent. Smaller, focused prompts generate faster and more reliably. This mirrors the microservices insight: separation of concerns improves both performance and maintainability.
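The context-window point can be made concrete by contrasting prompt shapes. The fragments below are illustrative assumptions, not the production prompts: a v1.0-style monolithic instruction versus a v2.0-style single-responsibility instruction for the nutrition agent.

```python
# Hypothetical prompt fragments; wording is illustrative, not the real text.
MONOLITHIC_PROMPT = (
    "Given the user's dietary requirements, generate a full multi-day meal "
    "plan, compute per-meal and per-day nutrition totals, check every "
    "dietary constraint, and produce an aggregated shopping list with "
    "quantities for each ingredient."
)

NUTRITION_AGENT_PROMPT = (
    "Given parsed dietary requirements, return the daily calorie and macro "
    "constraints as JSON. Do nothing else."
)

# The focused prompt is shorter and single-purpose: the model attends to
# less context, and the narrow output contract is easier to validate.
assert len(NUTRITION_AGENT_PROMPT) < len(MONOLITHIC_PROMPT)
```

A narrower output contract also makes each agent independently testable, which is part of why the microservices analogy holds.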