Pyproject.toml support for DockerSettings #3292
base: develop
Conversation
@htahir1 Any ideas for the general docs structure of this? Or do we just leave it as-is for now?
So I think the docs need to change in these places too:
Leaving some initial comments.
```
@@ -68,6 +68,12 @@
}
UV_DEFAULT_ARGS = {"no-cache-dir": None}

# TODO: these don't actually install any extras. Should we include all extras? No extras? Exclude dev extras?
```
Just a reminder for the TODO.
Is this still relevant for this PR?
```
- The packages specified via the `required_integrations`
- The packages defined inside a pyproject.toml file given by the
```
I think there is a misalignment with the docs here. In the docs, we say the priority is as follows:
- Packages in the local env
- Packages for the stack
- Required Integrations
- Requirements (could be a list or a path to a requirements.txt)
- Pyproject.toml
In the docstring here, 4 and 5 are switched.
I see that the docstring here is correct in the code and the docs need to be updated.
You're right, I fixed the error in the docs.
In general, there are two scenarios:
- You don't explicitly specify any requirements using docker settings: in this case we first look for a `requirements.txt` (highest priority), and if it doesn't exist we look for a `pyproject.toml`.
- You do explicitly specify some requirements (using e.g. `DockerSettings.required_integrations` or `DockerSettings.requirements`): in this case the order in the docstring is the one that is being used. Requirements being last means it has the highest priority (it will override values from all previous steps), which is in line with the first case.

Do you think these make sense?
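The two scenarios could be sketched roughly like this (a hypothetical illustration: the function and the returned source labels are invented for clarity, not ZenML's actual implementation):

```python
# Hypothetical sketch of the two requirement-resolution scenarios.
# Names and structure are illustrative, not ZenML's actual code.
import os
from typing import List, Optional

def resolve_requirement_sources(
    requirements: Optional[List[str]] = None,
    required_integrations: Optional[List[str]] = None,
    pyproject_path: Optional[str] = None,
    source_root: str = ".",
) -> List[str]:
    """Return requirement sources ordered from lowest to highest priority."""
    explicit = requirements or required_integrations or pyproject_path
    if not explicit:
        # Scenario 1: nothing specified explicitly. Fall back to file
        # discovery, preferring requirements.txt over pyproject.toml.
        for candidate in ("requirements.txt", "pyproject.toml"):
            if os.path.exists(os.path.join(source_root, candidate)):
                return [candidate]
        return []

    # Scenario 2: explicit settings. Later entries override earlier ones,
    # so `requirements` coming last gives it the highest priority.
    sources: List[str] = []
    if required_integrations:
        sources.append("integration requirements")
    if pyproject_path:
        sources.append("pyproject.toml requirements")
    if requirements:
        sources.append("user-specified requirements")
    return sources
```

In this sketch, explicitly given requirements always end up last and therefore win, which matches the "highest priority" reading of the docstring order.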
I left a comment about the confusion I had with the interaction between the implicit and explicit definition of requirements, but outside of that, the explanation looks good here.
```python
if not any(
    [
        docker_settings.replicate_local_python_environment,
        docker_settings.required_integrations,
```
Does this mean that adding an extra integration (or a package) to be installed disables the discovery of the implicit requirements? So, if I was not using any of these settings before and now want to add `sklearn` to my docker settings, suddenly the `requirements.txt` will not be detected. I am not sure if this is the optimal behavior here.
Correct. I'm not super convinced by it myself to be honest, but what would be a better alternative in your opinion?
Yeah, I guess I do not have a super clear alternative either.
The reason why I was a bit hesitant here is this: we have a default behavior that decides whether we look for these files or not. By adding an additional integration (which I intuitively thought would not change this behavior), I change the way that it looks for these files. I guess the `replicate_local_python_environment` flag at least allows people to control the behavior. Perhaps we can write a sentence about this in the docs?
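A minimal stand-in for the guard being discussed (not ZenML's actual code) makes the surprise concrete — any explicit setting flips the result of the `not any(...)` check and thereby disables file discovery:

```python
# Minimal stand-in for the discovery guard: implicit file discovery only
# happens when no explicit requirement settings are present. Illustrative
# only; field names mirror the PR, the function is invented.

def uses_implicit_discovery(
    replicate_local_python_environment: bool = False,
    required_integrations: tuple = (),
    requirements: tuple = (),
) -> bool:
    return not any(
        [replicate_local_python_environment, required_integrations, requirements]
    )

# No explicit settings: requirements.txt / pyproject.toml are auto-detected.
assert uses_implicit_discovery() is True

# Adding a single integration silently disables that auto-detection.
assert uses_implicit_discovery(required_integrations=("sklearn",)) is False
```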
```python
] = Field(default=None, union_mode="left_to_right")
pyproject_path: Optional[str] = None
pyproject_export_command: Optional[List[str]] = None
```
As far as I can see, it is possible to set this as follows: `DockerSettings(pyproject_export_command=[......])` without an explicit `pyproject_path`, as long as there is a `pyproject.toml` file to work with in the current source root. But as soon as someone puts an integration, requirement or the replicate flag, this becomes unused.
Perhaps a validator can be helpful to detect this early.
What would you validate here? I don't see the problem of this being unused if there is no `pyproject.toml`, to be honest.
I would validate the following two scenarios:
- If `pyproject_export_command` is set, there needs to be a `pyproject.toml` in the directory.
- If `required_integrations` are set, `pyproject_export_command` is unused.

However, both are not so critical; I just thought early detection might help us improve the UX a bit.
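The early detection being suggested could look something like this sketch, using a plain dataclass instead of ZenML's actual pydantic model (field names mirror the PR; the class, the warning texts, and the check logic are invented for illustration):

```python
# Sketch of early validation for conflicting/unusable settings.
# Not ZenML's implementation; warnings instead of hard errors since
# neither case is critical.
import os
import warnings
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class DockerSettingsSketch:
    pyproject_path: Optional[str] = None
    pyproject_export_command: Optional[List[str]] = None
    required_integrations: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.pyproject_export_command:
            # Scenario 1: export command set, but no pyproject.toml to export.
            path = self.pyproject_path or "pyproject.toml"
            if not os.path.exists(path):
                warnings.warn(
                    f"`pyproject_export_command` is set but `{path}` does not exist."
                )
            # Scenario 2: explicit integrations make the export command unused.
            if self.required_integrations:
                warnings.warn(
                    "`pyproject_export_command` is unused when "
                    "`required_integrations` are set."
                )
```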
@michael-zenml @htahir1 May I add my 5 cents here? Does adding
P.S. Just noticed that it is a long runner already; I would love to see it prioritised, as it will ease the use of Docker for our case, without the need to rely on an external build process.
A small companion PR to that Poetry activity: #3470
@avishniakov Users will have full control to specify a command that will be run to extract those requirements from the
Yeah, if we can do something like
@avishniakov Yep, we will use exactly this command as one of the default values even. I wasn't sure however if
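For illustration, a user-provided export command could be invoked along these lines (a sketch; `export_requirements` is an invented helper, not ZenML's API, and the stand-in command below is only there to keep the example self-contained — real settings would point at something like a Poetry or uv export invocation):

```python
# Hypothetical sketch of running a `pyproject_export_command` to obtain a
# requirements list. Not ZenML's actual code.
import subprocess
import sys
from typing import List

def export_requirements(command: List[str]) -> List[str]:
    """Run the export command and return one requirement per non-empty line."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return [line for line in result.stdout.splitlines() if line.strip()]

# Stand-in command that just prints a pinned requirement:
reqs = export_requirements(
    [sys.executable, "-c", "print('requests==2.31.0')"]
)
```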
🦞
This will run `pip freeze` to get a list of your local packages, and allows you to easily replicate your local Python environment in the Docker container, ensuring that your pipeline runs with the same dependencies.
If this does not pick up locally installed libraries, we should say so here.
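As a rough sketch of what "replicate the local environment" amounts to (an illustrative helper, not ZenML's implementation), the captured `pip freeze` output is simply reused as the image's pinned requirements:

```python
# Sketch: capture `pip freeze` output as a list of pinned requirements
# (e.g. `package==1.2.3`). Illustrative only, not ZenML's code.
import subprocess
import sys

def local_environment_requirements() -> list:
    output = subprocess.run(
        [sys.executable, "-m", "pip", "freeze"],
        capture_output=True, text=True, check=True,
    ).stdout
    # Each non-empty line is one pinned requirement.
    return [line for line in output.splitlines() if line.strip()]
```

Note that `pip freeze` can emit non-installable entries for locally installed, editable packages (e.g. `-e file:///...` lines), which is the caveat the review comment above is pointing at.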
```python
"Files must be included in the Docker image when trying to "
"install a local python package."
```
Suggested change:
```python
"Files must be included in the Docker image when trying to "
"install a local python package. You can do so by using the "
"`allow_including_files_in_images` attribute of your docker settings: "
"`DockerSettings(..., allow_including_files_in_images=True)`"
```
Let's save the user an additional odyssey to find out how to do this.
Isn't this literally in 100 code snippets on the same docs page?
Describe changes
This PR adds the following functionality:
- `pyproject.toml` files can be specified for installing requirements in Docker images. ZenML will extract the requirements from the `pyproject.toml` using one of the default or a custom user-provided command.
- Automatically use a `requirements.txt` or `pyproject.toml` file in the source root if no requirements or pyproject file was specified in the `DockerSettings`.
Pre-requisites
Please ensure you have done the following:
- The branch is based on `develop` and the open PR is targeting `develop`. If your branch wasn't based on develop, read the Contribution guide on rebasing branch to develop.

Types of changes