[Feat] Support adapter scaling to desired replicas #1132

Merged

Conversation

dittops
Contributor

@dittops dittops commented May 23, 2025

Pull Request Description

Support adapter scaling to all replicas.

  • allow controller to sync adapter instances with all active pods
  • load adapter on each pod
  • update EndpointSlice with all pod IPs (see the sketch below)
  • adjust resources and tests for multi-pod support
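
Conceptually, the EndpointSlice piece does something like the following. This is an illustrative sketch only, not the literal diff; activePods holds the matched pods, eps is the adapter-owned EndpointSlice, and discoveryv1 is k8s.io/api/discovery/v1:

// Sketch: publish every ready pod IP on the adapter's EndpointSlice.
endpoints := []discoveryv1.Endpoint{}
for _, pod := range activePods {
    if pod.Status.PodIP == "" {
        continue // pod has no IP assigned yet
    }
    endpoints = append(endpoints, discoveryv1.Endpoint{
        Addresses: []string{pod.Status.PodIP},
    })
}
eps.Endpoints = endpoints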

Related Issues

Resolves: #1095

@dittops dittops force-pushed the feature/adapter-multi-pod-scaling branch from 52d03df to 1604126 Compare May 24, 2025 19:49
@varungup90 varungup90 requested a review from Jeffwan May 28, 2025 16:53
@Jeffwan
Collaborator

Jeffwan commented May 28, 2025

@dittops Great! I will spend some time this week to review this change

return nil
}
if targetPod.DeletionTimestamp != nil {
continue
Collaborator

TODO: add a log message here for any pods skipped
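
For example, something along these lines, assuming targetPod is the loop variable in the snippet above and instance is the ModelAdapter being reconciled (klog is already used by this controller):

if targetPod.DeletionTimestamp != nil {
    // Log skipped pods so scale-down and rollout behavior is easier to debug.
    klog.InfoS("skipping pod pending deletion", "pod", targetPod.Name, "modelAdapter", instance.Name)
    continue
}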

if err := r.Get(ctx, types.NamespacedName{Namespace: instance.Namespace, Name: instance.Name}, found); err != nil {
if apierrors.IsNotFound(err) {
klog.Warningf("Failed to fetch the endpoint slice %s", klog.KObj(instance))
podList := []corev1.Pod{}
Collaborator

Seems the logic has been simplified a lot. I will double-check the logic.

}
if len(activePods) != 0 {
selectedPod, err = r.schedulePod(ctx, instance, activePods)
Collaborator

Hmm, @dittops can you confirm that the scheduling logic has been disabled in the new workflow? Any reason?

@Jeffwan
Collaborator

Jeffwan commented Jun 16, 2025

@dittops I think the only part I'm not sure about is the scheduling part. Can you give more details?

@dittops
Contributor Author

dittops commented Jun 18, 2025

I have used the following logic for adapter loading/unloading:

  1. Pods are selected via label-based matching on adapter.model.aibrix.ai/enabled: "true"
  2. The selected pods are added to the Status.Instances list
  3. The reconcileLoading function iterates over the Instances list and loads the adapter on each pod (rough sketch below)
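
As a rough sketch of steps 1-3 (helper and field names are illustrative, not the exact code in this PR; corev1 is k8s.io/api/core/v1 and client is sigs.k8s.io/controller-runtime/pkg/client):

// Sketch: list adapter-enabled pods and record them as adapter instances.
podList := &corev1.PodList{}
if err := r.List(ctx, podList,
    client.InNamespace(instance.Namespace),
    client.MatchingLabels{"adapter.model.aibrix.ai/enabled": "true"}); err != nil {
    return err
}
instances := []string{}
for _, pod := range podList.Items {
    if pod.DeletionTimestamp != nil {
        continue // ignore terminating pods
    }
    instances = append(instances, pod.Name)
}
instance.Status.Instances = instances
// reconcileLoading then walks Status.Instances and loads the adapter on each pod.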

@Jeffwan
Collaborator

Jeffwan commented Jun 19, 2025

@dittops The workflow sounds good. From the code change, I notice the LoRA scheduling logic has been deleted. In this case, how are pods selected?

@dittops
Contributor Author

dittops commented Jun 19, 2025

schedulePod was used to pick one pod, which was then assigned to instance.Status.Instances. Instead of choosing one pod, the new approach takes ALL pods that match the selector and adds all of them to instance.Status.Instances.

If we need to keep schedulePod, we can move getActivePodsForModelAdapter inside schedulePod so that it selects all matching pods and returns them as a list (rough sketch below).
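
For reference, that variant would look roughly like this (schedulePods is a hypothetical name, and the ModelAdapter type import and getActivePodsForModelAdapter signature are assumed):

// Hypothetical list-returning variant of schedulePod.
func (r *ModelAdapterReconciler) schedulePods(ctx context.Context, instance *modelv1alpha1.ModelAdapter) ([]corev1.Pod, error) {
    activePods, err := r.getActivePodsForModelAdapter(ctx, instance) // existing helper, exact signature assumed
    if err != nil {
        return nil, err
    }
    // No single-pod selection: every matching active pod receives the adapter.
    return activePods, nil
}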

@Jeffwan
Collaborator

Jeffwan commented Jun 19, 2025

@dittops Yeah, I think the behavior has changed a bit recently.

Option 1: Schedule the LoRA model to specific pods based on the specified replicas.
Option 2: Load the LoRA into all base model replicas so that all models are identical — this is the approach you're switching to.

While Option 2 is a valid pattern that we can support, I strongly recommend sticking with Option 1 (with multi-replica support) as the primary solution for now. In our case, some high-rank LoRA models are quite large, and it's not practical to scale using Option 2. We could consider adding Option 2 as a separate feature later.

What do you think?

@dittops
Contributor Author

dittops commented Jun 19, 2025

@Jeffwan Are you referring to adding a replica count in the ModelAdapter and using that for scheduling?
e.g.:

  spec:
    replicas: 3  # Only load on 3 pods
    podSelector:
      matchLabels:
        model.aibrix.ai/name: base-model
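
The controller could then cap how many pods actually receive the adapter, roughly like this (sketch only; the field name and the absence of any ranking are placeholders):

// Sketch: limit adapter placement to spec.replicas pods.
func selectPodsForAdapter(activePods []corev1.Pod, replicas int) []corev1.Pod {
    if replicas <= 0 || replicas >= len(activePods) {
        return activePods // no cap, or fewer pods than requested: use them all
    }
    // A real scheduler could rank pods by load or locality before truncating.
    return activePods[:replicas]
}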

@Jeffwan
Collaborator

Jeffwan commented Jun 19, 2025

@Jeffwan
Collaborator

Jeffwan commented Aug 17, 2025

@dittops Apologies for the late response. I am currently refactoring the LoRA work to provide better production-level support. I want to merge this one first before I refactor the code. However, it seems I cannot rebase the main branch changes into this PR; could you help rebase the branch?


@dittops dittops force-pushed the feature/adapter-multi-pod-scaling branch 2 times, most recently from cab7429 to 2be1519 Compare August 18, 2025 01:55
@dittops
Contributor Author

dittops commented Aug 18, 2025

@Jeffwan, I have rebased. Could you take a look?

@Jeffwan Jeffwan closed this Aug 18, 2025
@Jeffwan Jeffwan reopened this Aug 18, 2025
@Jeffwan
Collaborator

Jeffwan commented Aug 18, 2025


@dittops I think the problem is that the vLLM community updated the CI after this PR was opened, resulting in some failures in the CI checks. I tried closing this PR and reopening it, but that doesn't seem to help much. Could you help cut a new PR? We can work directly on the new one and discard this one.

@dittops dittops force-pushed the feature/adapter-multi-pod-scaling branch from 2be1519 to f9c92f8 Compare August 18, 2025 08:18
@dittops dittops force-pushed the feature/adapter-multi-pod-scaling branch from f9c92f8 to 5ba12cd Compare August 18, 2025 08:22
@Jeffwan
Collaborator

Jeffwan commented Aug 18, 2025

@dittops Now it's working, thanks! No need to cut a new PR.

@Jeffwan
Collaborator

Jeffwan commented Aug 18, 2025

/gemini review

@Jeffwan Jeffwan requested review from Jeffwan and Copilot August 18, 2025 09:28
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces support for scaling model adapters to multiple replicas, which is a significant enhancement. The core logic in modeladapter_controller.go has been refactored to manage multiple pod instances, including scaling, loading/unloading adapters, and updating the EndpointSlice. The changes appear to be well-thought-out and correctly implemented for the most part. I've provided a few suggestions to improve efficiency and error handling in the controller logic. Overall, this is a great contribution.

Comment on lines +599 to +604
for j, p := range remainingPods {
if p.Name == pod.Name {
remainingPods = append(remainingPods[:j], remainingPods[j+1:]...)
break
}
}
Contributor

medium

The current implementation for removing a selected pod from remainingPods has a time complexity of O(N) for each removal because append needs to shift elements. Since this is inside a loop, the total complexity can be significant if the number of pods is large. A more efficient way to remove an element from a slice when the order doesn't matter is to swap the element with the last one and then shrink the slice. This operation is O(1) after finding the element.

for j := range remainingPods {
    if remainingPods[j].Name == pod.Name {
        // Swap with the last element for O(1) removal.
        remainingPods[j] = remainingPods[len(remainingPods)-1]
        remainingPods = remainingPods[:len(remainingPods)-1]
        break
    }
}

req.Header.Set("Authorization", fmt.Sprintf("Bearer %s", token))
}

httpClient := &http.Client{}
Contributor

medium

Creating a new http.Client inside a loop is inefficient. An http.Client is safe for concurrent use by multiple goroutines and for efficiency should be created once and reused. You should move the client initialization to before the for loop on line 766.
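
A sketch of the suggested change (the loop shape and names such as pods, payload, token, and urlForPod are assumptions, not copied from the PR):

// Create the http.Client once, before the per-pod loop, and reuse it for every request.
httpClient := &http.Client{Timeout: 10 * time.Second}
for _, pod := range pods {
    req, err := http.NewRequestWithContext(ctx, http.MethodPost, urlForPod(pod), bytes.NewReader(payload)) // urlForPod: hypothetical helper
    if err != nil {
        return err
    }
    if token != "" {
        req.Header.Set("Authorization", fmt.Sprintf("Bearer %s", token))
    }
    resp, err := httpClient.Do(req)
    if err != nil {
        return err
    }
    resp.Body.Close()
}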

}()

if resp.StatusCode != http.StatusOK && resp.StatusCode != http.StatusCreated {
body, _ := io.ReadAll(resp.Body)
Contributor

medium

The error returned by io.ReadAll is ignored. It's important to handle this error to avoid potential issues and to provide better diagnostics if reading the response body fails. You should check the error and log it if it's not nil.
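
One possible shape for that (sketch; the surrounding function's return style is assumed):

if resp.StatusCode != http.StatusOK && resp.StatusCode != http.StatusCreated {
    body, readErr := io.ReadAll(resp.Body)
    if readErr != nil {
        klog.Warningf("failed to read response body: %v", readErr)
    }
    return fmt.Errorf("unexpected status code %d: %s", resp.StatusCode, string(body))
}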

@@ -758,7 +848,7 @@ func (r *ModelAdapterReconciler) unloadModelAdapter(ctx context.Context, instanc

if resp.StatusCode != http.StatusOK && resp.StatusCode != http.StatusCreated {
body, _ := io.ReadAll(resp.Body)
Contributor

medium

The error returned by io.ReadAll is ignored. It's important to handle this error to avoid potential issues and to provide better diagnostics if reading the response body fails. You should check the error and log it if it's not nil.

Comment on lines +897 to 901
if err := r.Get(ctx, types.NamespacedName{Namespace: instance.Namespace, Name: podName}, &p); err == nil {
if p.DeletionTimestamp == nil {
podList = append(podList, p)
}
return ctrl.Result{}, err
}
Contributor

medium

The error from r.Get is silently ignored if it's not nil. While reconcileReplicas might clean up stale instances later, it's good practice to log other errors from r.Get (e.g., network issues, permissions) to aid in debugging. This helps to understand why a pod that is supposed to be an instance is not included in the EndpointSlice. Consider adding a log statement for non-NotFound errors.
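
For example (sketch, assuming this sits in a loop over Status.Instances and follows the variable names in the snippet above):

if err := r.Get(ctx, types.NamespacedName{Namespace: instance.Namespace, Name: podName}, &p); err != nil {
    if !apierrors.IsNotFound(err) {
        // Surface transient errors (network, RBAC, ...) instead of dropping the pod silently.
        klog.Warningf("failed to get pod %s/%s for EndpointSlice update: %v", instance.Namespace, podName, err)
    }
    continue
}
if p.DeletionTimestamp == nil {
    podList = append(podList, p)
}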

Contributor

@Copilot Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@Jeffwan
Collaborator

Jeffwan commented Aug 18, 2025

The change overall looks good to me. I will address Gemini's feedback in a later refactor.

@Jeffwan Jeffwan changed the title [Misc] Support adapter scaling to all replicas [Feat] Support adapter scaling to desired replicas Aug 18, 2025
@Jeffwan Jeffwan merged commit a709e29 into vllm-project:main Aug 18, 2025
14 checks passed
Development

Successfully merging this pull request may close these issues.

ModelAdapters do not dynamically route to new pods