[Model] Add Ling implementation #20482

ant-yy · 2025-07-04T09:02:33Z

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results

Purpose

Add [Model] Ling implementation

This PR adds support for the Ling Mixture-of-Experts (MoE) language model series open-sourced by InclusionAI (GitHub). The implementation includes:

Ling-lite: 16.8B total parameters (2.75B activated)
Ling-plus: 290B total parameters (28.8B activated)
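As a quick check on the figures above, the share of parameters activated per token can be computed directly. This is only an illustrative sketch: the dictionary layout and function name are assumptions and do not come from the PR.

```python
# Parameter counts in billions, copied from the list above.
LING_MODELS = {
    "Ling-lite": {"total_b": 16.8, "activated_b": 2.75},
    "Ling-plus": {"total_b": 290.0, "activated_b": 28.8},
}


def activated_fraction(total_b: float, activated_b: float) -> float:
    """Return the share of parameters activated per token (0.0 to 1.0)."""
    if total_b <= 0:
        raise ValueError("total_b must be positive")
    return activated_b / total_b


for name, params in LING_MODELS.items():
    frac = activated_fraction(params["total_b"], params["activated_b"])
    print(f"{name}: {frac:.1%} of parameters activated per token")
```

With these numbers, Ling-lite activates roughly 16% of its parameters per token and Ling-plus roughly 10%.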

Key features:

Scalable MoE architecture enabling flexible parameter allocation
State-of-the-art performance across NLP benchmarks
Task-adaptive structure for diverse applications
Apache 2.0 licensed open-source implementation

The implementation follows vLLM's model integration patterns and maintains compatibility with existing serving infrastructure. This addition will allow vLLM users to leverage Ling's efficient inference capabilities while benefiting from the framework's high-throughput serving optimizations.

Test Plan

The following benchmarks will be evaluated in subsequent phases, and the results will be added here as they become available:
Datasets to be evaluated: MMLU (EM), GPQA (Pass@1), HumanEval (Pass@1), LiveCodeBench 2408-2502 (Pass@1), LCBench (Pass@1), Math (EM), AIME2024 (Pass@1), OlympiadBench (Pass@1), BBH (EM), IFEval (Prompt Strict), BFCL_live1.
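The PR does not say which evaluation harness or answer normalization will be used. For readers unfamiliar with the metric names, here is a minimal sketch of exact match (EM) and the commonly used unbiased pass@k estimator, assuming a simple whitespace and case normalization for EM. The function names are hypothetical.

```python
from math import comb


def exact_match(prediction: str, reference: str) -> bool:
    """EM after trimming whitespace and ignoring case (one possible normalization)."""
    return prediction.strip().lower() == reference.strip().lower()


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n sampled solutions, c of which are correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For k = 1 the estimator reduces to c / n, the fraction of sampled solutions that pass.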

Test Result

Here are the results based on vLLM 0.7.3.

Signed-off-by: vito.yy <[email protected]>

github-actions · 2025-07-04T09:02:45Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

gemini-code-assist

Summary of Changes

Hello @ant-yy, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces comprehensive support for the Ling Mixture-of-Experts (MoE) language model series within the vLLM framework. It enables efficient inference for Ling-lite and Ling-plus models by integrating their specific architecture, including attention, MLP, and MoE layers, while leveraging vLLM's high-throughput serving optimizations. The changes involve adding new model definition files, registering the model, and defining its configuration parameters.

Highlights

New Model Integration: Added full support for the Ling Mixture-of-Experts (MoE) language model series, including Ling-lite and Ling-plus variants, enabling their efficient inference within vLLM.
MoE Architecture Implementation: Implemented the specific MoE architecture of Ling models, featuring BailingAttention, BailingMLP, and a BailingMoE layer with support for shared experts and FusedMoE.
vLLM Framework Integration: Integrated the Ling model into vLLM's serving infrastructure, including support for LoRA, quantization, and pipeline parallelism, ensuring compatibility with existing vLLM features.
Configuration Definition: Defined a new BailingMoeConfig class to specify the architectural parameters and default values for Ling MoE models, allowing vLLM to correctly load and configure them.
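To make the "MoE layer with shared experts" idea concrete, here is a minimal pure-Python sketch of top-k routing combined with an always-on shared expert. It is not the PR's BailingMoE code, which the summary says relies on vLLM's FusedMoE. The function names, the toy experts, and the renormalization of the selected weights are all assumptions made for illustration.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over router logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(
    x: Vector,
    router_logits: Sequence[float],
    experts: Sequence[Expert],
    shared_expert: Expert,
    top_k: int,
) -> Vector:
    """Combine the top_k routed experts with a shared expert applied to every token."""
    probs = softmax(router_logits)
    # Indices of the top_k experts with the highest routing probability.
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    # Renormalize so the selected weights sum to 1 (an assumption of this sketch).
    norm = sum(probs[i] for i in chosen)
    out = list(shared_expert(x))
    for i in chosen:
        weight = probs[i] / norm
        for d, value in enumerate(experts[i](x)):
            out[d] += weight * value
    return out


# Toy experts: expert i scales its input by (i + 1); the shared expert scales by 0.5.
experts = [lambda v, s=i + 1: [s * a for a in v] for i in range(4)]
shared = lambda v: [0.5 * a for a in v]

y = moe_forward([1.0, 2.0], [0.0, 0.0, 5.0, 5.0], experts, shared, top_k=2)
print(y)
```

In this toy call the router picks experts 2 and 3 with equal weight, so the routed path contributes 3.5 times the input and the shared expert adds 0.5 times the input. Only the selected experts run, which is why the activated parameter count is much smaller than the total.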

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page. Here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder at the root of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request adds support for the Ling (Bailing) Mixture-of-Experts model. The implementation looks solid and follows the existing patterns in vLLM. I've identified a few areas for improvement, mainly related to code cleanliness and consistency, such as removing unused imports, correcting LoRA configurations, and maintaining alphabetical order in registries and import statements. Addressing these points will enhance the maintainability of the new code.
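The point about alphabetical order in registries can be checked mechanically. The sketch below assumes a registry shaped as a dict mapping an architecture name to a (module, class) pair. The entries shown are illustrative and are not copied from vllm/model_executor/models/registry.py, and the helper name is hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical excerpt of an architecture -> (module, class) registry.
EXAMPLE_REGISTRY: Dict[str, Tuple[str, str]] = {
    "AquilaForCausalLM": ("llama", "LlamaForCausalLM"),
    "BailingMoeForCausalLM": ("bailing_moe", "BailingMoeForCausalLM"),
    "BloomForCausalLM": ("bloom", "BloomForCausalLM"),
}


def out_of_order_keys(registry: Dict[str, Tuple[str, str]]) -> List[str]:
    """Return keys that sort before the key preceding them (dicts keep insertion order)."""
    keys = list(registry)
    return [cur for prev, cur in zip(keys, keys[1:]) if cur < prev]


print(out_of_order_keys(EXAMPLE_REGISTRY))
```

An empty result means every key appears in ascending order. A new entry inserted in the wrong place would be reported by name.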

vllm/model_executor/models/bailing_moe.py

vllm/model_executor/models/registry.py

vllm/transformers_utils/configs/__init__.py

mergify · 2025-07-09T08:06:43Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @ant-yy.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

ant-yy · 2025-07-09T10:37:40Z

Due to an accidental Git operation, this pull request was mistakenly closed. A new PR will be resubmitted to add the Ling Model, and this existing PR is hereby marked as obsolete.

ant-yy · 2025-07-09T11:23:39Z

New PR: #20680

Add Bailing_moe

8bbd7e5

Signed-off-by: vito.yy <[email protected]>

gemini-code-assist bot reviewed Jul 4, 2025

jeejeelee changed the title ~~Add [Model] Ling implementation~~ [Model] Add Ling implementation Jul 4, 2025

jeejeelee added the new-model label (Requests to new models) Jul 4, 2025

ant-yy added 6 commits July 7, 2025 08:56

fix based on response

42f5f31

Signed-off-by: vito.yy <[email protected]>

Fix the response: E501 line too long

128f4cf

Signed-off-by: vito.yy <[email protected]>

Adjust import order

3173bc2

Signed-off-by: vito.yy <[email protected]>

Fix minor formatting issues

db76d88

Signed-off-by: vito.yy <[email protected]>

Merge remote-tracking branch 'upstream/main' into ling-vl

80d520f

Add content to supported_models.md and test files

adf9d27

Signed-off-by: vito.yy <[email protected]>

ant-yy requested review from hmellor, DarkLight1337 and ywang96 as code owners July 9, 2025 06:36

mergify bot added the documentation label (Improvements or additions to documentation) Jul 9, 2025

Small fix

0581586

Signed-off-by: vito.yy <[email protected]>

mergify bot added the needs-rebase label Jul 9, 2025

djmmoss and others added 10 commits July 9, 2025 09:37

[feat] enable SM100 CUTLASS block scaled group gemm for smaller batch…

bf58e57

… sizes (vllm-project#20640) Signed-off-by: Duncan Moss <[email protected]>

Fix bullets in incremental_build.md (vllm-project#20642)

cdff58b

[Misc] Fix the size of batched_dummy_mm_inputs in profile_run (vllm-p…

3e53b33

…roject#20434) Signed-off-by: bk-201 <[email protected]>

[XPU] Use spawn with XPU multiprocessing (vllm-project#20649)

0100e50

Signed-off-by: Dmitry Rogozhkin <[email protected]>

[Intel GPU] support ray as distributed executor backend for XPU. (vll…

a95d0d1

…m-project#20659) Signed-off-by: Kunshang Ji <[email protected]>

[Docs] fix minimax tool_calling docs error (vllm-project#20667)

bce930d

Signed-off-by: qingjun <[email protected]>

[Bugfix] Fix the issue where reasoning_content is None when Think…

63cfe24

…ng is enabled and `tool_choice` is set to `'required'`. (vllm-project#20662) Signed-off-by: chaunceyjiang <[email protected]>

[V1] [Doc] Update V1 docs for Mamba models (vllm-project#20499)

d57795f

Signed-off-by: Thomas Parnell <[email protected]> Co-authored-by: Cyrus Leung <[email protected]>

Add content to supported_models.md and test files

9143ce6

Signed-off-by: vito.yy <[email protected]>

Resolve conflicts

483fe2e

Signed-off-by: vito.yy <[email protected]>

ant-yy requested a review from robertgshaw2-redhat as a code owner July 9, 2025 09:59

ant-yy requested review from simon-mo, aarnphm, WoosukKwon, njhill, comaniac and alexm-redhat as code owners July 9, 2025 09:59

mergify bot added ci/build frontend v1 tool-calling labels Jul 9, 2025

github-project-automation bot added this to Tool Calling Jul 9, 2025

ant-yy closed this Jul 9, 2025

github-project-automation bot moved this to Done in Tool Calling Jul 9, 2025

ant-yy reopened this Jul 9, 2025

ant-yy closed this Jul 9, 2025
