
add pluggable loadFormat support for model loaders #318


Open · jeremyeder wants to merge 6 commits into main

Conversation

jeremyeder (Member)
fixes #317

Also adds a runai_streamer example and tests.

- Add missing chart dependencies (common, redis)
- Fix YAML formatting in Chart.yaml (document start, line length)
- Bump chart version to 1.0.19 (required by chart-testing)
- Configure the typos checker to handle base64 SVG content
- All GitHub Actions lint checks now pass
- Add debug output to the test connection pod with service lookups and timing
- Add cluster state debugging before and after chart-testing
- Add verbose curl output and error handling
- Add pod log collection for failed pods
- Add a timeout to the service wait loop to prevent infinite hangs

The chart-testing install was failing because the Gateway resource requires the Istio validation webhook, which is not available in the CI environment. Disabling gateway creation in the CI values resolves the connection-refused error (see the first sketch below).

- Enhance the GitHub Actions workflow with verbose debugging for ct install failures
- Disable test pods in CI values to isolate chart installation issues
- Add helm release and cluster state checks on ct install failure
- Move the epp, prefill, and decode properties from the gateway section to the modelservice section (see the second sketch below)
- Merge duplicate modelservice sections in runai-streamer-values.yaml
- Both values files now pass helm template schema validation

Resolves the "Additional property epp/decode/prefill is not allowed" errors.
@jeremyeder changed the title from "DRAFT add pluggable loadFormat support for model loaders" to "add pluggable loadFormat support for model loaders" on Jun 13, 2025
nerdalert (Member) commented Jun 14, 2025

@jeremyeder dug around on why this was failing in a deployment. The controller only exposes the variables that are present in modelservice's TemplateVars struct when it renders the preset ConfigMap.

Since .LoadFormat wasn't part of the TemplateVars struct, `{{ .LoadFormat }}` evaluated to nil and the renderer threw `"error":"error executing template: template: template:19:20: executing \"template\" at <.LoadFormat>: can't evaluate field LoadFormat in type` in the modelservice pod.
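
For illustration, here is a minimal sketch of the kind of preset-template line that trips this; the arg names are assumptions, and only `{{ .LoadFormat }}` comes from the actual error:

    # illustrative sketch only -- not the real preset ConfigMap template
    args:
      - "--model={{ .ModelPath }}"          # renders fine when the field exists in TemplateVars
      - "--load-format={{ .LoadFormat }}"   # Go's text/template fails here when LoadFormat
                                            # is missing: "can't evaluate field LoadFormat"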

This PR gets things working for me: llm-d/llm-d-model-service#221. I didn't test it with the load-format validation yet; I just tested that modelservice doesn't blow up on the unknown var, which *I think* is all we need to get unblocked.

You can test it by swapping this into values.yaml; the image is v0.0.10 patched with the --load-format var:

    # -- Model Service controller image registry
    registry: ghcr.io

    # -- Model Service controller image repository
    repository: nerdalert/llm-d-model-service

    # -- Model Service controller image tag
    tag: "0.0.1"

Will look into CI as well. Not sure how this passes with modelservice crashing in a deployment.

cc/ @sriumcp PTAL and sanity check me here if you get a sec 🙏

@nerdalert
Copy link
Member

nerdalert commented Jun 14, 2025

Model service is disabled in .github/workflows/test.yaml (https://github.com/llm-d/llm-d-deployer/blob/main/charts/llm-d/ci/default-values.yaml#L12), so Helm skips modelservice and never renders those ConfigMaps. vllm-sim might be an option. We also have a full e2e on EC2, https://github.com/llm-d/llm-d-deployer/blob/main/.github/workflows/e2e-aws.yaml (only run on trigger), which I will change to be dispatched on a PR and not just main; that would give a real e2e that catches modelservice deployment issues.
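
For reference, a hedged sketch of what that CI values entry looks like; the `enabled` key name is an assumption, and the linked file is the source of truth:

    # charts/llm-d/ci/default-values.yaml (sketch; see the linked file)
    modelservice:
      enabled: false  # Helm skips the modelservice subchart entirely, so the
                      # preset ConfigMaps referencing {{ .LoadFormat }} never render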

Merging this pull request may close: Add support for loading models using the Run:AI Model Streamer (#317)