Exploit a hierarchical multi-table schema for sklearn API #400

Open
marcboulle opened this issue Apr 16, 2025 · 1 comment
Labels
Priority/0-High (To do now), Size/Days (Some days of work), Status/ReadyForDev (The issue is ready to be developed or to be investigated deeply)
Comments

@marcboulle

Description

The current dataset specification (10.3.1) for the sklearn API is based on an entity-relationship schema, similar to the one used by the FeatureTools tool.

X = {
    "main_table": "Accidents",
    "tables": {
        "Accidents": (accidents_df.drop("Gravity", axis=1), "AccidentId"),
        "Vehicles": (vehicles_df, ["AccidentId", "VehicleId"]),
        "Users": (users_df, ["AccidentId", "VehicleId"]),
        "Places": (places_df, "AccidentId"),
    },
    "relations": [
        ("Accidents", "Vehicles"),
        ("Vehicles", "Users"),
        ("Accidents", "Places", True),
    ],
}

The goal is to move to a hierarchical schema, which is the one used by Khiops multi-table dictionaries.

X = {
    "main_table": (accidents_df, ["AccidentId"]),
    "additional_data_tables": {
        "Vehicles": (vehicles_df, ["AccidentId", "VehicleId"]),
        "Vehicles/Users": (users_df, ["AccidentId", "VehicleId"]),
        "Place": (places_df, ["AccidentId"], True),
    },
}
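To make the mapping between the two layouts concrete, here is a minimal sketch of a converter from the entity-relationship form to the hierarchical form. The function name `to_hierarchical_spec` is hypothetical and not part of the Khiops API. The sketch assumes that each data path is built by joining table names from the main table downward, and that a table with no declared relation hangs directly off the main table. The actual path segments may differ (the example above uses "Place" where the old specification names the table "Places").

```python
def to_hierarchical_spec(legacy):
    """Convert an entity-relationship spec to a hierarchical one (illustrative only)."""
    main_name = legacy["main_table"]
    tables = legacy["tables"]

    # Index relations by child table: child -> (parent, is_one_to_one).
    parent_of = {}
    for relation in legacy.get("relations", []):
        parent, child = relation[0], relation[1]
        one_to_one = len(relation) > 2 and bool(relation[2])
        parent_of[child] = (parent, one_to_one)

    def data_path(name):
        # Walk up to the main table, collecting table names along the way.
        segments, seen = [], set()
        while name != main_name:
            if name in seen:
                raise ValueError(f"cyclic relation involving table '{name}'")
            seen.add(name)
            segments.append(name)
            # Assumption: tables without a relation attach to the main table.
            name = parent_of.get(name, (main_name, False))[0]
        return "/".join(reversed(segments))

    def as_key_list(key):
        # The old spec allows a single column name; normalize to a list.
        return [key] if isinstance(key, str) else list(key)

    main_df, main_key = tables[main_name]
    additional = {}
    for name, (df, key) in tables.items():
        if name == main_name:
            continue
        entry = (df, as_key_list(key))
        if parent_of.get(name, (main_name, False))[1]:
            entry += (True,)  # keep the one-to-one marker
        additional[data_path(name)] = entry

    return {
        "main_table": (main_df, as_key_list(main_key)),
        "additional_data_tables": additional,
    }
```

Applied to the old specification above, this sketch yields the paths "Vehicles", "Vehicles/Users" and "Places", with the one-to-one flag carried over as a third tuple element.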

This has the following advantages:

  • conformity with the data model of Khiops dictionaries
  • greater simplicity
  • easier transition from the sklearn API to the core API

Questions/Ideas

Plan a deprecation path: keep accepting the old specification, but emit a warning.
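One possible shape for that deprecation path is sketched below. The helper name `warn_if_legacy_spec` is hypothetical, and detecting the old layout by the presence of a "tables" key is an assumption based on the two examples in this issue. The removal version is passed as a parameter because the target release is not fixed here.

```python
import warnings


def warn_if_legacy_spec(X, removal_version):
    """Return True and emit a DeprecationWarning if X uses the old layout."""
    # Assumption: only the entity-relationship layout has a "tables" key.
    is_legacy = isinstance(X, dict) and "tables" in X
    if is_legacy:
        warnings.warn(
            "The 'tables'/'relations' dataset specification is deprecated and "
            f"is planned for removal in version {removal_version}; use "
            "'main_table' and 'additional_data_tables' instead.",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller's code
        )
    return is_legacy
```

An estimator could call such a helper at the start of `fit` and `predict`, then convert the old layout internally so that existing user code keeps working until the removal release.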

@marcboulle marcboulle changed the title Exploit a hirarchical multi-table schema for sklearn API Exploit a hierarchical multi-table schema for sklearn API Apr 16, 2025
@popescu-v
Collaborator

We need to set a target version for removing the older syntax. As a starting point, we can use 11.1.0.

@popescu-v popescu-v added Status/ReadyForDev The issue is ready to be developed or to be investigated deeply Priority/0-High To do now Size/Days Some days of work labels Apr 16, 2025
@popescu-v popescu-v added this to the 11.0.0.0 milestone Apr 16, 2025