[Feat] Support adapter scaling to desired replicas #1132
Conversation
Force-pushed from 52d03df to 1604126
@dittops Great! I will spend some time this week to review this change.
return nil
}
if targetPod.DeletionTimestamp != nil {
    continue
TODO: add a log message here for any pods skipped
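One possible shape for that log line, as a minimal sketch (targetPod and instance are assumed from the surrounding loop, and the message wording is illustrative):

if targetPod.DeletionTimestamp != nil {
    // Hypothetical log for skipped terminating pods; klog/v2 is already used in this controller.
    klog.V(4).InfoS("Skipping terminating pod for model adapter",
        "pod", klog.KObj(&targetPod), "modelAdapter", klog.KObj(instance))
    continue
}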
if err := r.Get(ctx, types.NamespacedName{Namespace: instance.Namespace, Name: instance.Name}, found); err != nil {
    if apierrors.IsNotFound(err) {
        klog.Warningf("Failed to fetch the endpoint slice %s", klog.KObj(instance))
podList := []corev1.Pod{}
Seems the logic has been simplified a lot. I will double-check the logic.
}
if len(activePods) != 0 {
    selectedPod, err = r.schedulePod(ctx, instance, activePods)
Hmm, @dittops can you confirm that the scheduling logic has been disabled in the new workflow? Any reason?
@dittops I think the only part I was not sure about is the scheduling part. Can you give more details?
I have used the following logic for adapter loading/unloading.
@dittops the workflow sounds good. From the change, I notice the LoRA scheduling logic has been deleted. In this case, how are pods selected?
schedulePod was used to pick one pod, which was then assigned to instance.Status.Instances. Instead of choosing one pod, the new approach takes ALL pods that match the selector and adds them to instance.Status.Instances. If we need to keep schedulePod, we can move getActivePodsForModelAdapter inside schedulePod to select all pods and then return a list.
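As a rough sketch of that refactor (illustrative only; the receiver and type names are assumed from the surrounding diff, and the body is a placeholder for a future replica-aware policy):

// Hypothetical signature: schedulePod returns a list instead of a single pod.
func (r *ModelAdapterReconciler) schedulePod(ctx context.Context, instance *modelv1alpha1.ModelAdapter) ([]corev1.Pod, error) {
    // Pod discovery moves inside the scheduler, per the suggestion above.
    activePods, err := r.getActivePodsForModelAdapter(ctx, instance)
    if err != nil {
        return nil, err
    }
    // For now, return every matching pod; a replica-aware policy could pick a subset here.
    return activePods, nil
}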
@dittops Yeah, I think the behavior has changed a bit recently. Option 1: schedule the LoRA model to specific pods based on the specified replica count. Option 2: load the adapter on all matching pods, as this PR does now. While Option 2 is a valid pattern that we can support, I strongly recommend sticking with Option 1 (with multi-replica support) as the primary solution for now. In our case, some high-rank LoRA models are quite large, and it's not practical to scale using Option 2. We could consider adding Option 2 as a separate feature later. What do you think?
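For illustration, Option 1 could look something like this in the API (a hypothetical field, not the actual aibrix spec):

type ModelAdapterSpec struct {
    // ... existing fields ...

    // Replicas is the number of pods the adapter should be scheduled onto.
    // Hypothetical field name and defaulting; defaults to 1 when unset.
    Replicas *int32 `json:"replicas,omitempty"`
}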
@Jeffwan Are you referring to adding a replica count to the ModelAdapter and using that for scheduling?
@dittops Apologies for the late response. I am currently refactoring the LoRA work to provide better production-level support, and I want to merge this one first before I refactor the code. However, it seems I cannot rebase the main branch changes; could you help rebase the branch?
Force-pushed from cab7429 to 2be1519
@Jeffwan, I have rebased. Could you take a look?
@dittops I think the problem is that the vLLM community updated the CI after this PR went out, resulting in some failures in the CI checks. I tried closing this PR and reopening it, but it doesn't seem to help much. Could you help cut a new PR? We can work directly on the new one and discard this one.
Force-pushed from 2be1519 to f9c92f8
…of the ModelAdapter (Signed-off-by: dittops <[email protected]>)
Force-pushed from f9c92f8 to 5ba12cd
@dittops Now it's working, thanks! No need to cut a new PR.
/gemini review
Code Review
This pull request introduces support for scaling model adapters to multiple replicas, which is a significant enhancement. The core logic in modeladapter_controller.go has been refactored to manage multiple pod instances, including scaling, loading/unloading adapters, and updating the EndpointSlice. The changes appear to be well thought out and correctly implemented for the most part. I've provided a few suggestions to improve efficiency and error handling in the controller logic. Overall, this is a great contribution.
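At a high level, the flow described here could be outlined like this (an illustrative sketch; reconcileEndpointSlice and the exact signatures are assumptions, while getActivePodsForModelAdapter and reconcileReplicas are named elsewhere in this thread):

// Illustrative outline only; the real methods in modeladapter_controller.go may differ.
func (r *ModelAdapterReconciler) reconcileAdapter(ctx context.Context, instance *modelv1alpha1.ModelAdapter) error {
    // 1. Find all active pods matching the adapter's selector.
    pods, err := r.getActivePodsForModelAdapter(ctx, instance)
    if err != nil {
        return err
    }
    // 2. Scale to the desired replicas, loading/unloading the adapter on pods as needed.
    if err := r.reconcileReplicas(ctx, instance, pods); err != nil {
        return err
    }
    // 3. Publish the chosen instances through the EndpointSlice.
    return r.reconcileEndpointSlice(ctx, instance)
}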
for j, p := range remainingPods {
    if p.Name == pod.Name {
        remainingPods = append(remainingPods[:j], remainingPods[j+1:]...)
        break
    }
}
The current implementation for removing a selected pod from remainingPods has a time complexity of O(N) for each removal because append needs to shift elements. Since this runs inside a loop, the total complexity can reach O(N²) when the number of pods is large. A more efficient way to remove an element from a slice when the order doesn't matter is to swap the element with the last one and then shrink the slice; this is O(1) after finding the element.
for j := range remainingPods {
if remainingPods[j].Name == pod.Name {
// Swap with the last element for O(1) removal.
remainingPods[j] = remainingPods[len(remainingPods)-1]
remainingPods = remainingPods[:len(remainingPods)-1]
break
}
}
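If the module targets Go 1.21+, slices.DeleteFunc is an order-preserving, single-pass alternative (a minor stylistic option, assuming the standard-library slices package is acceptable here):

// One O(N) pass; preserves order, unlike the swap-and-shrink approach.
remainingPods = slices.DeleteFunc(remainingPods, func(p corev1.Pod) bool {
    return p.Name == pod.Name
})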
req.Header.Set("Authorization", fmt.Sprintf("Bearer %s", token))
}

httpClient := &http.Client{}
}()

if resp.StatusCode != http.StatusOK && resp.StatusCode != http.StatusCreated {
    body, _ := io.ReadAll(resp.Body)
@@ -758,7 +848,7 @@ func (r *ModelAdapterReconciler) unloadModelAdapter(ctx context.Context, instanc

if resp.StatusCode != http.StatusOK && resp.StatusCode != http.StatusCreated {
    body, _ := io.ReadAll(resp.Body)
if err := r.Get(ctx, types.NamespacedName{Namespace: instance.Namespace, Name: podName}, &p); err == nil {
    if p.DeletionTimestamp == nil {
        podList = append(podList, p)
    }
    return ctrl.Result{}, err
}
The error from r.Get is silently ignored if it's not nil. While reconcileReplicas might clean up stale instances later, it's good practice to log other errors from r.Get (e.g., network issues, permissions) to aid in debugging. This helps explain why a pod that is supposed to be an instance is not included in the EndpointSlice. Consider adding a log statement for non-NotFound errors.
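A minimal sketch of that suggestion, following the variable names in the diff context above (the exact error-handling shape is illustrative):

if err := r.Get(ctx, types.NamespacedName{Namespace: instance.Namespace, Name: podName}, &p); err != nil {
    if !apierrors.IsNotFound(err) {
        // Surface unexpected errors (network, RBAC) instead of dropping them silently.
        klog.ErrorS(err, "Failed to get pod for model adapter instance",
            "pod", podName, "modelAdapter", klog.KObj(instance))
    }
    // NotFound pods are skipped; reconcileReplicas cleans up stale instances later.
} else if p.DeletionTimestamp == nil {
    podList = append(podList, p)
}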
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
The change overall looks good to me. I will address Gemini's feedback in a later refactor.
Pull Request Description
Support adapter scaling to all replicas.
Related Issues
Resolves: #1095