+++
date = '2025-07-11'
draft = true
title = 'LLM Orchestration and System Readiness'
tags = ["AI", "LLM", "SonarQube", "Refactoring", "Agentic AI", "WatsonX", "Ollama", "vLLM", "UCL", "IBM"]
+++

<a href="#tldr" class="btn">Jump to TL;DR</a>

Following last week’s groundwork in static analysis and semantic retrieval, our focus this week has shifted toward multi-model orchestration, LLM infrastructure, and preparing for our upcoming IBM showcase on the 16th.

## 1. Unified Provider Interface

To enable flexible experimentation and future-proofing, we've implemented a unified interface for interacting with multiple LLM providers.

Our abstraction currently supports **Ollama**, **vLLM**, and **WatsonX**, but, following the Open-Closed Principle, we've designed it so that new providers can be added with minimal friction.

This interface lays the foundation for benchmarking, adaptive routing, and graceful fallback based on provider availability and performance, and makes it easy to plug in different models for each provider.

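As a rough illustration of the shape this takes, here is a minimal sketch: a small base class that each provider implements, plus a router that falls back to the next available provider when one fails. The names used here (`LLMProvider`, `generate`, `is_available`, `FallbackRouter`) are illustrative placeholders rather than our actual code.

```python
from abc import ABC, abstractmethod


class LLMProvider(ABC):
    """Common interface implemented by every provider (Ollama, vLLM, WatsonX, ...)."""

    name: str = "base"

    @abstractmethod
    def generate(self, prompt: str, **options) -> str:
        """Return the model's completion for `prompt`."""

    @abstractmethod
    def is_available(self) -> bool:
        """Cheap health check used for routing and fallback decisions."""


class FallbackRouter:
    """Try providers in priority order; fall back when one is unavailable or errors."""

    def __init__(self, providers: list[LLMProvider]) -> None:
        self.providers = providers

    def generate(self, prompt: str, **options) -> str:
        failures = []
        for provider in self.providers:
            if not provider.is_available():
                failures.append(f"{provider.name}: unavailable")
                continue
            try:
                return provider.generate(prompt, **options)
            except Exception as exc:  # record the error and try the next provider
                failures.append(f"{provider.name}: {exc}")
        raise RuntimeError("All providers failed: " + "; ".join(failures))
```

Adding a new provider then means writing one more subclass; the agents that call `generate` don't need to change.
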
## 2. vLLM Integration for Local Scaling

We've begun setting up [vLLM](https://docs.vllm.ai/en/latest/) as a high-throughput, low-latency alternative to Ollama for serving models locally.

By running it on specialised hardware, we gain faster inference and more efficient memory use, both of which are crucial for scaling to real-world workloads.

vLLM now acts as a drop-in alternative to Ollama, helping us compare local model performance under different configurations.

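Because vLLM exposes an OpenAI-compatible HTTP API, wiring it into the unified interface is largely a matter of pointing a thin HTTP client at the local server. The sketch below assumes a server started locally with something like `vllm serve <model>` on the default port; the URL, model name, and parameters are placeholders rather than our actual configuration.

```python
import requests

# Default address of vLLM's OpenAI-compatible server (placeholder; adjust to your setup).
VLLM_URL = "http://localhost:8000/v1/completions"


def vllm_generate(prompt: str, model: str = "Qwen/Qwen2.5-7B-Instruct",
                  max_tokens: int = 512) -> str:
    """Send a completion request to a locally running vLLM server and return the text."""
    response = requests.post(
        VLLM_URL,
        json={"model": model, "prompt": prompt, "max_tokens": max_tokens},
        timeout=60,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["text"]
```

Wrapped in a provider subclass, this makes vLLM interchangeable with Ollama from the agents' perspective.
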
## 3. Preparing for the IBM presentation

With the system architecture stabilising (see [our Agentic Workflow post]({{< relref "agentic-workflow/index.md" >}})), we've started preparing for our upcoming presentation to IBM next Wednesday (July 16th).

This involved refining our workflows, scripting sample runs, and ensuring every component - especially our Scanner and Strategist agents - works seamlessly across models.

## TL;DR

This week we've focused on orchestration, flexibility, and preparing for next week's presentation:

- **Unified LLM Interface** to switch between Ollama, WatsonX, and vLLM without modifying core agent logic.
- **vLLM Integration Underway** to enable high-performance inference running on dedicated hardware.
- **IBM Presentation Preparations** are in full swing, as we polish our end-to-end flows and solidify our system.