Model client streaming from the selector of SelectorGroupChat #6145

Open
yingjiewei opened this issue Mar 29, 2025 · 7 comments · May be fixed by #6286
Labels
help wanted Extra attention is needed

Comments

@yingjiewei commented Mar 29, 2025

Feature Request

We can enable streaming for SelectorGroupChat's built-in selector by introducing an option in SelectorGroupChat, e.g., model_client_stream, so the model client is used in streaming mode: it will call create_stream rather than create.

As the next step, we can enable streaming of orchestration events through run_stream so the streaming output is visible to consumers of run_stream. Issue here: #6161
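
For illustration, the option could mirror AssistantAgent's existing model_client_stream parameter. A minimal sketch of the proposed API, not the current released one (agent_a, agent_b, and model_client stand in for real instances; the parameter on SelectorGroupChat is the proposal, not an existing flag):

# Sketch of the proposed option; model_client_stream on SelectorGroupChat
# does not exist yet in the released API.
team = SelectorGroupChat(
    [agent_a, agent_b],
    model_client=model_client,
    model_client_stream=True,  # selector would call create_stream instead of create
)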

--- Below is the original bug report ---

What happened?

Describe the bug
Some LLM models only support stream=True. The assistant agent supports this well by setting model_client_stream=True, but OpenAIChatCompletionClient does not allow passing stream=True to it. As a result, LLM models that only support stream=True are effectively unusable.
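
For context (not part of the original report): the client does expose a streaming path. ChatCompletionClient.create_stream yields string chunks and finishes with a CreateResult, and this is what AssistantAgent uses when model_client_stream=True. A minimal sketch of calling it directly, assuming autogen-core's model client interface:

from autogen_core.models import UserMessage

async def stream_directly(model_client) -> None:
    # create_stream yields str chunks, then a final CreateResult.
    async for chunk in model_client.create_stream(
        [UserMessage(content="Hello", source="user")]
    ):
        if isinstance(chunk, str):
            print(chunk, end="", flush=True)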

To Reproduce

    raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'error': {'code': 'invalid_parameter_error', 'param': None, 'message': 'This model only support stream mode, please enable the stream parameter to access the model. ', 'type': 'invalid_request_error'}, 'id': 


Which packages was the bug in?

Python AgentChat (autogen-agentchat>=0.4.0)

AutoGen library version.

Python dev (main branch)


@SongChiYoung (Contributor)

This is an interesting issue.

Would it be possible to provide a more specific reproduction example, including which model/client configuration triggers this error?

I'd love to help investigate further.

@yingjiewei (Author) commented Mar 30, 2025

@SongChiYoung, many thanks for helping.

Here is my code:

import asyncio

from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.conditions import TextMentionTermination
from autogen_agentchat.teams import SelectorGroupChat
from autogen_agentchat.ui import Console
from autogen_core.memory import ListMemory, MemoryContent, MemoryMimeType
from autogen_ext.models.openai import OpenAIChatCompletionClient

def get_model_client() -> OpenAIChatCompletionClient:
    return OpenAIChatCompletionClient(
        model="qwq-plus",
        api_key="",
        base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
        model_info={
            "json_output": False,
            "vision": False,
            "function_calling": True,
            "family": "unknown",
        },
    )

model_client = get_model_client()


async def create_assistant():
    # Example memory initialization; every async operation here is awaited.
    user_memory = ListMemory()
    await user_memory.add(
        MemoryContent(
            content="The user system is linux",
            mime_type=MemoryMimeType.TEXT,
        )
    )

    return AssistantAgent(
        name="assistant_agent",
        model_client=model_client,
        memory=[user_memory],  # Pass the memory instance directly.
        system_message='''...''',  # Original system message omitted.
        model_client_stream=True,
    )

async def create_assistant2():
    # Same setup as create_assistant, for a second agent.
    user_memory = ListMemory()
    await user_memory.add(
        MemoryContent(
            content="The user system is linux",
            mime_type=MemoryMimeType.TEXT,
        )
    )

    return AssistantAgent(
        name="assistant_agent2",
        model_client=model_client,
        memory=[user_memory],
        system_message='''...''',  # Original system message omitted.
        model_client_stream=True,
    )


async def main():
    # Initialize both agents.
    assistant = await create_assistant()
    assistant2 = await create_assistant2()

    # Build the team. The selector uses model_client in non-streaming mode,
    # which triggers the error below.
    team = SelectorGroupChat(
        [assistant, assistant2],
        termination_condition=TextMentionTermination("APPROVE"),
        model_client=model_client,
        selector_prompt="""...""",  # Original prompt template omitted.
    )

    # Run the task and render the stream to the console.
    stream = team.run_stream(task=input("Hello! How can I help?\n"))
    await Console(stream, output_stats=True)

if __name__ == "__main__":
    asyncio.run(main())

You can save this in a file, say test_flow.py, then run python test_flow.py and type something at the prompt.

From Alibaba: their most powerful model, qwq-plus, only supports streaming: https://www.alibabacloud.com/help/zh/model-studio/user-guide/qwq?spm=a3c0i.23458820.2359477120.1.35166e9b4ZlN2I

The error is:

    raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'error': {'code': 'invalid_parameter_error', 'param': None, 'message': 'This model only support stream mode, please enable the stream parameter to access the model. ', 'type': 'invalid_request_error'}, 'id': 'chatcmpl-fb835624-c232-90e1-81f6-0b3b19c1e633', 'request_id': 'fb835624-c232-90e1-81f6-0b3b19c1e633'}

@SongChiYoung (Contributor)

Just sharing a thought from an architectural perspective:

Rather than adding ad-hoc fixes or modifying SelectorGroupChat specifically for this, I'm considering whether it would make more sense — once PR #6063 is merged — to handle this kind of use case by configuring stream=True per model as needed.

The reason I hesitate to embed special logic for QwQ or similar models into GroupChat (or any group structure) is that future use cases or new types of GroupChats may again require exposing stream or other model-specific flags — which could become hard to maintain.

Curious to hear thoughts from maintainers on this!

@yingjiewei (Author)

By the way, the current restriction around stream=True is that OpenAIChatCompletionClient does not allow setting stream=True.

@SongChiYoung (Contributor) commented Mar 30, 2025

Thanks for the clarification!

Based on the error message "This model only support stream mode, please enable the stream parameter", this seems to be a server-side constraint from QwQ, not a limitation in the OpenAIChatCompletionClient itself.

That’s why I think one possible path forward is to handle this at the model level (e.g. via model config / registry), so the correct stream=True flag is automatically applied based on model requirements.

This is aligned with the goal of PR #6063: to encapsulate model-specific behaviors and avoid leaking model flags like stream into higher-level constructs such as GroupChat.
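
As a rough illustration of that idea (purely hypothetical: no such registry exists in autogen today, and this is not the actual #6063 design):

# Hypothetical sketch only: a per-model table of "stream-only" models
# that a client could consult to route create() through create_stream().
STREAM_ONLY_MODELS = {"qwq-plus"}

def requires_stream(model: str) -> bool:
    # True if the provider rejects non-streaming calls for this model.
    return model in STREAM_ONLY_MODELS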

That said, this is just my opinion — I believe the maintainers' judgment here is the most important.

@ekzhu (Collaborator) commented Mar 31, 2025

Is the constraint on stream only a temporary one for QwQ, or is it permanent?

I think we can enable streaming for SelectorGroupChat's built-in selector by introducing an option in SelectorGroupChat, e.g., model_client_stream, so the model client is used in streaming mode: it will call create_stream rather than create.

As the next step, we can enable streaming of orchestration events through run_stream so the streaming output is visible to consumers of run_stream. Issue here: #5127

@yingjiewei are you interested in submitting a PR for this? Just focus on adding the model_client_stream option to SelectorGroupChat. See the contributing guide for how to set up a local development environment: https://github.com/microsoft/autogen/blob/main/python/README.md

@ekzhu changed the title from "stream = True is required for the SelectorGroupChat" to "Model client streaming from the selector of SelectorGroupChat" on Apr 2, 2025
@ekzhu added the help wanted label and removed the needs-triage label on Apr 2, 2025
@SongChiYoung (Contributor)


Fixed via #6145 — SelectorGroupChat now supports streaming mode for select_speaker.
Please take a look when you have time.
