🚀 Feature Description and Motivation
Currently, we patch the gateway configuration for EPP manually and statically; see https://github.com/vllm-project/aibrix/blob/main/config/gateway/gateway.yaml#L100
Here is the summary of this proposal:
- v1 (current): use Envoy Gateway EnvoyPatchPolicy (patches the original cluster config) + EnvoyExtensionPolicy (adds the EPP ext-proc config to the gateway). This is static and hard to maintain, orchestrate, or extend.
- v2 (planned): use Envoy Gateway BackendTrafficPolicy (BTP, adds a host-override LB policy) + EnvoyExtensionPolicy (EEP, adds the EPP ext-proc config to the gateway, plus receiving/forwarding namespaces with 'envoy.lb' if needed). This largely improves UX and adds fallback capability to the AIBrix gateway.
- v3 (after v2): use the AIBrix controller to automate what is configured manually in v2, and support the InferencePool API. This simplifies UX and lets us adopt the GIE conformance tests.
Use Case
- Simplify configuration and improve UX.
- Support fallback and improve reliability.
- Integrate with GIE and the upstream inference conformance tests.
varungup90