Skip to content

fix: tool.setuptools.packages.find exclude -> include #2618

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
May 19, 2025

Conversation

noooop
Copy link
Contributor

@noooop noooop commented May 2, 2025

when mteb is installed, it also installs docs, scripts, and tests into site-packages.

These file names are almost used in every project, so installing them in site-packages can cause conflicts.

chang to include = ["mteb*"], limiting the installation to only the mteb folder.

Current situation

pip uninstall mteb
Found existing installation: mteb 1.38.3
Uninstalling mteb-1.38.3:
  Would remove:
    /share/anaconda3/envs/docs/bin/mteb
    /share/anaconda3/envs/docs/lib/python3.12/site-packages/docs/*
    /share/anaconda3/envs/docs/lib/python3.12/site-packages/mteb-1.38.3.dist-info/*
    /share/anaconda3/envs/docs/lib/python3.12/site-packages/mteb/*
    /share/anaconda3/envs/docs/lib/python3.12/site-packages/scripts/...
    ....
    /share/anaconda3/envs/docs/lib/python3.12/site-packages/tests/...
    ....

proposal

pip uninstall mteb
Found existing installation: mteb 1.38.3
Uninstalling mteb-1.38.3:
  Would remove:
    /share/anaconda3/envs/docs/bin/mteb
    /share/anaconda3/envs/docs/lib/python3.12/site-packages/mteb-1.38.3.dist-info/*
    /share/anaconda3/envs/docs/lib/python3.12/site-packages/mteb/*
Proceed (Y/n)?

Code Quality

  • Code Formatted: Format the code using make lint to maintain consistent style.

Documentation

  • Updated Documentation: Add or update documentation to reflect the changes introduced in this PR.

Testing

  • New Tests Added: Write tests to cover new functionality. Validate with make test-with-coverage.
  • Tests Passed: Run tests locally using make test or make test-with-coverage to ensure no existing functionality is broken.

Adding datasets checklist

Reason for dataset addition: ...

  • I have run the following models on the task (adding the results to the pr). These can be run using the mteb -m {model_name} -t {task_name} command.
    • sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
    • intfloat/multilingual-e5-small
  • I have checked that the performance is neither trivial (both models gain close to perfect scores) nor random (both models gain close to random scores).
  • If the dataset is too big (e.g. >2048 examples), considering using self.stratified_subsampling() under dataset_transform()
  • I have filled out the metadata object in the dataset file (find documentation on it here).
  • Run tests locally to make sure nothing is broken using make test.
  • Run the formatter to format the code using make lint.

Adding a model checklist

  • I have filled out the ModelMeta object to the extent possible
  • I have ensured that my model can be loaded using
    • mteb.get_model(model_name, revision) and
    • mteb.get_model_meta(model_name, revision)
  • I have tested the implementation works on a representative set of tasks.

@KennethEnevoldsen
Copy link
Contributor

I am not sure what this PR attempts to solve? What is the current problem?

@KennethEnevoldsen KennethEnevoldsen self-assigned this May 2, 2025
@KennethEnevoldsen KennethEnevoldsen added the more information needed More information is needed before this issue can be resolved label May 2, 2025
@noooop
Copy link
Contributor Author

noooop commented May 2, 2025

I want to say that when mteb is installed, it also installs docs, scripts, and tests into site-packages.

These file names are almost used in every project, so installing them in site-packages can cause conflicts.

But it seems your CI also relies on installing these files in site-packages.

I'll find a way to fix this as much as possible.

@Samoed
Copy link
Member

Samoed commented May 2, 2025

I think this failing tests should be refactored to not rely on scripts folder

@noooop
Copy link
Contributor Author

noooop commented May 2, 2025

I will continue this work next week

@Samoed
Copy link
Member

Samoed commented May 2, 2025

I've created PR, that should solve your problems #2630

@noooop
Copy link
Contributor Author

noooop commented May 3, 2025

cool

@Samoed
Copy link
Member

Samoed commented May 5, 2025

@noooop Can you provide example project that would have conflict? Does your PR related to your vllm PR? vllm-project/vllm#17175

@noooop
Copy link
Contributor Author

noooop commented May 6, 2025

@noooop Can you provide example project that would have conflict? Does your PR related to your vllm PR? vllm-project/vllm#17175

Yes, installing mteb docs folder to site-packages conflicts with vllm docs build.

https://buildkite.com/vllm/ci/builds/19106#01968b1f-8e60-4e8c-9460-9cdcd5894576

After deleting the docs folder in site-packages, vllm docs build can be executed correctly.

vllm-project/vllm@b24c172

https://buildkite.com/vllm/ci/builds/19136#01968c34-76dc-4866-9f0b-6de0faa39559

Of course, the conditions that triggered this conflict were quite dramatic.

First, vllm docs build uses requirements/docs.txt, while mteb prepares to install using requirements/test.txt. Docs build should not encounter mteb. However, since vllm docs build use dev docker, two dependencies were installed.

https://github.com/vllm-project/vllm/blob/main/.buildkite/test-pipeline.yaml#L39

  - pip install -r ../../requirements/docs.txt

Secondly, vllm docs build does not use the root directory as the working directory, rather than through adding the root directory to sys.path,

https://github.com/vllm-project/vllm/blob/98834fefaaabe7219e35499ada8d6026a1f9b6a2/docs/source/conf.py#L26C1

sys.path.append(os.path.abspath(REPO_ROOT))

Just like that the root directory has lower priority than docs in site-packages.

@noooop
Copy link
Contributor Author

noooop commented May 6, 2025

By the way, mteb used an old datasets dependency (datasets>=2.19.0,<3.0.0), while vllm current testing relies on using datasets==3.0.2, I don't know if downgrading to datasets==2.19.0 (this will leads to more dependence downgrading)will have potential issues.

https://github.com/vllm-project/vllm/blob/98834fefaaabe7219e35499ada8d6026a1f9b6a2/requirements/test.txt#L98C1-L101C16

@Samoed
Copy link
Member

Samoed commented May 6, 2025

We have fixed it, but in a different branch. I've updated it #2661

@noooop
Copy link
Contributor Author

noooop commented May 6, 2025

It seems that the ci issue has been fixed.

What should I do to merge this PR?

@Samoed
Copy link
Member

Samoed commented May 6, 2025

I think you should wait to @KennethEnevoldsen review

@noooop
Copy link
Contributor Author

noooop commented May 6, 2025

I think you should wait to @KennethEnevoldsen review

ok

@Samoed Samoed requested a review from KennethEnevoldsen May 6, 2025 07:56
@KennethEnevoldsen KennethEnevoldsen removed the more information needed More information is needed before this issue can be resolved label May 8, 2025
Copy link
Contributor

@KennethEnevoldsen KennethEnevoldsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good - I will let @Samoed merge

@Samoed
Copy link
Member

Samoed commented May 8, 2025

Hm. Tried to run on colab and got error

Traceback (most recent call last):
  File "/usr/local/bin/mteb", line 5, in <module>
    from mteb.cli import main
  File "/usr/local/lib/python3.11/dist-packages/mteb/__init__.py", line 5, in <module>
    from mteb.benchmarks.benchmarks import (
ModuleNotFoundError: No module named 'mteb.benchmarks'
!pip install git+https://github.com/noooop/mteb@build
!pip install model2vec
!mteb run -m minishlab/potion-base-2M -t CEDRClassification
import mteb
      3 from importlib.metadata import version
      4 
----> 5 from mteb.benchmarks.benchmarks import (
      6     MTEB_ENG_CLASSIC,
      7     MTEB_MAIN_RU,

ModuleNotFoundError: No module named 'mteb.benchmarks'

Copy link
Member

@Samoed Samoed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now it's working

@KennethEnevoldsen
Copy link
Contributor

@Samoed just pinging you here

@Samoed
Copy link
Member

Samoed commented May 15, 2025

@KennethEnevoldsen I've made some changes and added comment to have your thoughts #2618 (comment) also it seems that documentation is not building I'll try to check why

@Samoed
Copy link
Member

Samoed commented May 15, 2025

I think I've fixed problem with documentation.

UPD No, I haven't

UPD2 Seems that folders without __init__ now can't be imported. I'll add to all dirs

@Samoed
Copy link
Member

Samoed commented May 15, 2025

It should be working now

@KennethEnevoldsen KennethEnevoldsen changed the title fix tool.setuptools.packages.find exclude -> include fix: tool.setuptools.packages.find exclude -> include May 19, 2025
@KennethEnevoldsen KennethEnevoldsen merged commit 1c803a1 into embeddings-benchmark:main May 19, 2025
9 checks passed
@noooop
Copy link
Contributor Author

noooop commented May 20, 2025

thanks for @Samoed 's help

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants