[Intel][Gaudi] Remove HPU's dependency on vllm #6061

Draft: wants to merge 11 commits into main

Conversation

yangw1234

Motivation

This PR removes HPU's dependency on vLLM.

It depends on #5252 and #5923.

Modifications

Checklist

fix import error and improve contiguous pa

clean up

clean up

clean up

style

pre-commit

clean up

enable warmup

add moe layer

fix accuracy

disable radix cache automatically

remove sort

clean up

fix

fix model type

refine

style

style

add hpu test

add change

try directly install vllm-fork

clean up unnecessary changes

fix awq

address comments

fix acc

address comments

fix warmup

fix

cpu scheduler

fix style

fix device

remove block scales

refactor

refactor

fix

fix

remove change

fix attn mask

fix attn bias

add to device

fix style

address comments

address comments

Update python/sglang/srt/model_executor/forward_batch_info.py

Co-authored-by: JieXin Liang <[email protected]>

Update python/sglang/srt/model_executor/model_runner.py

Co-authored-by: JieXin Liang <[email protected]>

address comments

optimize allocator and rope

add profile

add heap based allocator
pin numpy

try running scheduler in hpu

disable overlap schedule

fix style

remove schedule func
update xgrammar

fix tests

fix warmup

add fix

optimize perf a bit

optimize some

remove
revert registry

fix rope