
feat: add support for lora affinity routing strategy #1181


Open · wants to merge 2 commits into main

Conversation

@Xunzhuo (Member) commented Jun 5, 2025

feat: add support for lora affinity routing strategy

@gemini-code-assist (bot, Contributor) left a comment

Hello @Xunzhuo, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

Hello team, gemini-code-assist here to provide a summary of this pull request. This PR introduces a new routing algorithm specifically designed for LoRA (Low-Rank Adaptation) workloads. The primary goal is to leverage pod affinity for specific LoRA models to improve performance by routing requests for a model to a pod that already has that model loaded or waiting to be loaded. It also incorporates a probabilistic approach to allow some requests to be routed to other available pods, enabling load balancing and preventing hot-spotting, even when affinity exists. The implementation involves adding a new routing algorithm file and making minor adjustments to metric definitions.

Highlights

  • New Routing Algorithm: Adds a new routing algorithm named lora-affinity to the gateway's routing plugins.
  • LoRA Affinity Logic: Implements logic to evaluate pods based on whether they are currently running or waiting for the requested LoRA model, using metrics like RunningLoraAdapters and WaitingLoraAdapters.
  • Probabilistic Routing: Introduces a configurable probability threshold (AIBRIX_LORA_AFFINITY_THRESHOLD) to decide whether to route a request to a pod with affinity or to a pod with available capacity (for load balancing) when both options exist; a sketch of this selection follows this list.
  • Metric Integration: Utilizes existing LoRA-related metrics (RunningLoraAdapters, WaitingLoraAdapters, MaxLora) to determine pod affinity and capacity.
  • Metric Name Refactor: Refactors the metric names in pkg/metrics/metrics.go by introducing a constant LoraRequestsInfo for consistency.
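
Concretely, the selection described in the Probabilistic Routing highlight might look roughly like the sketch below. Only the constant value, the AIBRIX_LORA_AFFINITY_THRESHOLD variable, and the threshold semantics come from the PR; loadThreshold, pickGroup, and the string-slice pod representation are illustrative.

package main

import (
	"fmt"
	"math/rand"
	"os"
	"strconv"
)

// Default taken from the PR; the env var overrides it at startup.
const defaultLoraAffinityThreshold = 0.999

// loadThreshold reads AIBRIX_LORA_AFFINITY_THRESHOLD, falling back to the
// default when the variable is unset or unparseable.
func loadThreshold() float64 {
	if v, err := strconv.ParseFloat(os.Getenv("AIBRIX_LORA_AFFINITY_THRESHOLD"), 64); err == nil && v > 0 && v <= 1 {
		return v
	}
	return defaultLoraAffinityThreshold
}

// pickGroup chooses which candidate group to route into. When both groups
// are non-empty, affinity pods win with probability threshold; the rest of
// the traffic goes to pods with spare capacity, which spreads load.
func pickGroup(affinityPods, availablePods []string, threshold float64) []string {
	switch {
	case len(affinityPods) > 0 && len(availablePods) > 0:
		if rand.Float64() < threshold {
			return affinityPods
		}
		return availablePods
	case len(affinityPods) > 0:
		return affinityPods
	default:
		return availablePods
	}
}

func main() {
	group := pickGroup([]string{"pod-a"}, []string{"pod-b", "pod-c"}, loadThreshold())
	fmt.Println("routing into group:", group)
}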

Changelog

  • pkg/metrics/metrics.go
    • Added a new constant LoraRequestsInfo on line 49.
    • Updated the RawMetricName field for MaxLora, RunningLoraAdapters, and WaitingLoraAdapters to use the new LoraRequestsInfo constant instead of the hardcoded string "lora_requests_info" on lines 286, 295, and 304 respectively.
  • pkg/plugins/gateway/algorithms/lora_affinity.go
    • Added a new file implementing the lora-affinity routing algorithm.
    • Defined the algorithm name LoraAffinity on line 43.
    • Defined a defaultLoraAffinityThreshold (0.999) and loaded the configurable loraAffinityThreshold from the environment variable AIBRIX_LORA_AFFINITY_THRESHOLD on lines 39 and 45.
    • Registered the new router NewLoraAffinityRouter on line 49.
    • Implemented the Route method (lines 74-138) which filters pods based on LoRA affinity and capacity, uses the probability threshold for selection, and falls back to a LeastRequest selection within the chosen group or a random selection if no suitable pods are found.
    • Implemented the evaluatePodForLoraAffinity helper method (lines 143-171) to check a pod's affinity and capacity using LoRA metrics.
    • Implemented the checkModelAffinity helper method (lines 174-176) to check if a target model is in the running or waiting sets (both helpers are sketched after this changelog).
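
Based on these changelog entries, the two helpers might be shaped roughly as follows. This is a sketch, not the PR's code: the maxCount parameter, the return shape, and the zero guard are assumptions, while sets.String comes from k8s.io/apimachinery/pkg/util/sets, which the code quoted later in this review also uses.

package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/util/sets"
)

// checkModelAffinity reports whether the target model is already running
// on the pod or queued to be loaded.
func checkModelAffinity(targetModel string, runningModels, waitingModels sets.String) bool {
	return runningModels.Has(targetModel) || waitingModels.Has(targetModel)
}

// evaluatePodForLoraAffinity classifies a pod: does it have affinity for
// the requested model, and does it have spare adapter slots?
func evaluatePodForLoraAffinity(targetModel string, runningModels, waitingModels sets.String, maxCount int) (hasAffinity, hasCapacity bool) {
	hasAffinity = checkModelAffinity(targetModel, runningModels, waitingModels)
	totalActiveAdapters := runningModels.Len() + waitingModels.Len()
	hasCapacity = maxCount > 0 && totalActiveAdapters < maxCount
	return hasAffinity, hasCapacity
}

func main() {
	running := sets.NewString("sql-lora")
	waiting := sets.NewString("chat-lora")
	affinity, capacity := evaluatePodForLoraAffinity("sql-lora", running, waiting, 4)
	fmt.Printf("affinity=%v capacity=%v\n", affinity, capacity)
}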


A router's choice, a path to find,
For LoRA models, one of a kind.
Affinity calls, a favored place,
Or load balance, just in case.


@gemini-code-assist (bot, Contributor) left a comment

Code Review

The pull request introduces a new LoRA affinity routing strategy, which aims to prioritize pods with existing LoRA model affinity while allowing for load balancing. The implementation appears well-structured and includes relevant logging for debugging. However, there are a few areas that could be improved for clarity and efficiency.

Summary of Findings

  • Incorrect Metric Usage: The evaluatePodForLoraAffinity function retrieves maxAdapters using metrics.WaitingLoraAdapters instead of metrics.MaxLora, potentially leading to incorrect capacity evaluations.
  • Missing Error Context in Logging: The log message for failed LoRA metric retrieval lacks specific error details, hindering debugging efforts (a sketch of a possible fix follows this list).
  • Potential Division by Zero: The code does not explicitly check if maxCount is zero before performing capacity checks, which could lead to unexpected behavior.
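
For the logging finding, here is one possible shape of a fix, assuming a klog-style logger; logMetricFailure and the message text are hypothetical, and errors.Join requires Go 1.20+.

package main

import (
	"errors"

	"k8s.io/klog/v2"
)

// logMetricFailure attaches the underlying lookup errors to the log line so
// operators can see which metric lookup failed and why, instead of a bare
// "failed to get LoRA metrics" message.
func logMetricFailure(podName string, runningErr, waitingErr, maxErr error) {
	if err := errors.Join(runningErr, waitingErr, maxErr); err != nil {
		klog.ErrorS(err, "failed to get LoRA metrics for pod", "pod", podName)
	}
}

func main() {
	logMetricFailure("pod-a", errors.New("metric scrape timed out"), nil, nil)
}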

Merge Readiness

The pull request introduces a valuable new routing strategy. However, the high severity issue regarding the incorrect metric usage in evaluatePodForLoraAffinity must be addressed before merging. Additionally, the medium severity issues should be considered for improvement. I am unable to approve this pull request, and recommend that others review and approve this code before merging.

// Get LoRA metrics for the pod
runningAdapters, runningErr := r.cache.GetMetricValueByPod(pod.Name, pod.Namespace, metrics.RunningLoraAdapters)
waitingAdapters, waitingErr := r.cache.GetMetricValueByPod(pod.Name, pod.Namespace, metrics.WaitingLoraAdapters)
maxAdapters, maxErr := r.cache.GetMetricValueByPod(pod.Name, pod.Namespace, metrics.WaitingLoraAdapters)

Severity: high

The maxAdapters value is retrieved using metrics.WaitingLoraAdapters, but the description suggests it should be metrics.MaxLora. This discrepancy could lead to incorrect capacity evaluations. It should be metrics.MaxLora to align with the intended logic.

Suggested change
- maxAdapters, maxErr := r.cache.GetMetricValueByPod(pod.Name, pod.Namespace, metrics.WaitingLoraAdapters)
+ maxAdapters, maxErr := r.cache.GetMetricValueByPod(pod.Name, pod.Namespace, metrics.MaxLora)

Comment on lines +36 to +39
// Default threshold for LoRA affinity selection probability
// When both affinity and available pods exist, this probability determines
// how often we select from affinity pods vs available pods for load balancing
defaultLoraAffinityThreshold = 0.999

Severity: medium

Consider adding a comment explaining why defaultLoraAffinityThreshold is set to 0.999. What is the reasoning behind this specific value? Is it empirically derived, or based on some theoretical consideration?
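
One possible shape for such a comment; the rationale given below is illustrative rather than taken from the PR:

// defaultLoraAffinityThreshold is deliberately close to 1.0: routing to a
// pod that already has the adapter loaded (or queued) avoids an expensive
// adapter load/swap, so affinity should win almost always. The remaining
// ~0.1% of requests go to pods with spare capacity, which seeds additional
// replicas over time and keeps a single pod from becoming a hot spot.
defaultLoraAffinityThreshold = 0.999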

Comment on lines +125 to +126
} else {
targetPod, err = utils.SelectRandomPod(readyPods, rand.Intn)

Severity: medium

In the case where len(targetPods) == 0, the code falls back to utils.SelectRandomPod(readyPods, rand.Intn). Since readyPods has already been filtered down to routable pods, this fallback always selects a usable pod when one exists; alternatively, consider returning an explicit error when no routable pods are available rather than falling back silently.
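
A sketch of the error-returning alternative suggested here; selectFallbackPod and its error text are hypothetical, and pods are simplified to strings:

package main

import (
	"fmt"
	"math/rand"
)

// selectFallbackPod prefers the filtered target pods, falls back to any
// routable pod, and surfaces an explicit error when nothing is routable
// instead of silently selecting from an empty list.
func selectFallbackPod(targetPods, readyPods []string) (string, error) {
	if len(targetPods) > 0 {
		return targetPods[rand.Intn(len(targetPods))], nil
	}
	if len(readyPods) > 0 {
		return readyPods[rand.Intn(len(readyPods))], nil
	}
	return "", fmt.Errorf("no routable pods available")
}

func main() {
	pod, err := selectFallbackPod(nil, []string{"pod-a", "pod-b"})
	fmt.Println(pod, err)
}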

Comment on lines +163 to +165
// Check if the pod has available capacity
totalActiveAdapters := runningModels.Len() + waitingModels.Len()
hasCapacity := totalActiveAdapters < maxCount

Severity: medium

Consider adding a check to ensure that maxCount is not zero before proceeding with the capacity check. A zero value (for example, from a missing or unparsed metric) would silently mark every pod as having no capacity.
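
In the spirit of this comment, the guard could be as small as the fragment below, which mirrors the quoted lines; treating a non-positive maxCount as "no capacity" is one policy choice, and returning an error is another:

// Check if the pod has available capacity
totalActiveAdapters := runningModels.Len() + waitingModels.Len()
// Guard against maxCount == 0 (e.g. a missing or unparsed MaxLora metric)
// so the result is an explicit policy decision, not an accident of `<`.
hasCapacity := maxCount > 0 && totalActiveAdapters < maxCount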

}

// checkModelAffinity checks if the pod has the target model active or waiting
func (r loraAffinityRouter) checkModelAffinity(targetModel string, runningModels, waitingModels sets.String) bool {
A Collaborator left a comment

One tradeoff here: runningModels are already loaded, while waitingModels may not be yet; under that assumption, pods that only have the model in waitingModels are a suboptimal routing target.
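
One hypothetical way to encode that preference is a two-tier ranking that favors pods where the adapter is already loaded and only then considers pods where it is merely queued; podAdapters and rankByAffinity are illustrative, not part of the PR:

package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/util/sets"
)

type podAdapters struct {
	name    string
	running sets.String
	waiting sets.String
}

// rankByAffinity splits candidates into two tiers: pods with the adapter
// already loaded, then pods where it is only queued.
func rankByAffinity(targetModel string, pods []podAdapters) (loaded, queued []string) {
	for _, p := range pods {
		switch {
		case p.running.Has(targetModel):
			loaded = append(loaded, p.name)
		case p.waiting.Has(targetModel):
			queued = append(queued, p.name)
		}
	}
	return loaded, queued
}

func main() {
	pods := []podAdapters{
		{"pod-a", sets.NewString("sql-lora"), sets.NewString()},
		{"pod-b", sets.NewString(), sets.NewString("sql-lora")},
	}
	loaded, queued := rankByAffinity("sql-lora", pods)
	fmt.Println("loaded:", loaded, "queued:", queued)
}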

@Jeffwan Jeffwan self-requested a review June 19, 2025 00:05
@Jeffwan (Collaborator) commented Aug 21, 2025

@Xunzhuo could you help update this PR? LoRA improvements are planned for v0.5.0, so let's keep moving and improve the quality.

@Xunzhuo (Member, Author) commented Aug 21, 2025

@Jeffwan sure
