
Add MLJ compliant docstrings #130


Merged: 11 commits merged into IQVIA-ML:master on Mar 18, 2025

Conversation

@josephsdavid (Contributor) commented Oct 26, 2022

In service of #913, as documented here!

@yaxxie (Contributor) commented Oct 27, 2022

Hi @josephsdavid
Thanks for the contribution!

A couple of remarks:

  1. Please avoid simply reformatting code. This makes diffs harder to read and muddies the purpose of the contribution.
  2. Please fill in the description of the PR. For example, a link to the documentation about "MLJ docstrings" would be useful to the reader.
  3. Please also don't forget to add yourself to the contributors list, CONTRIBUTORS.md 🙂

@yaxxie changed the title from "Add MLJ compliant docstrings!" to "Add MLJ compliant docstrings" on Oct 27, 2022
    weights=true,
    descr="Microsoft LightGBM FFI wrapper: Classifier",
    weights=true
    # descr="Microsoft LightGBM FFI wrapper: Classifier",
Contributor

Specifically, how come you're commenting these ones out? And if there's a good reason for it, I'd expect them to be deleted rather than commented out.

Contributor Author

Oh whoops! I meant to delete them! There is a good reason: the existence of a docstring after the model metadata is created overwrites the descr field, I believe, making it no longer needed (paging @ablaom to confirm; there is a reason, but I may have mixed it up :) )

Contributor

The MLJ model trait docstring (alias descr) used to be for a short summary string, which was not that useful, in retrospect. Now it is not to be overloaded but instead falls back to the full docstring (the one @josephsdavid has worked on here).
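As a minimal illustrative sketch of that fallback (not taken from the PR, and assuming the standard MLJModelInterface trait API):

    using MLJModelInterface
    import LightGBM.MLJInterface: LGBMClassifier

    # With no `descr` override, the `docstring` trait should fall back to the
    # type's own docstring (the full one being added in this PR).
    MLJModelInterface.docstring(LGBMClassifier)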

Contributor

So, yes, these should be deleted.

Contributor

Removed and replaced with human_name="LightGBM classifier" (and the regressor equivalent, respectively). @ablaom, I saw this was suggested in previous comments; let me know if this is fine, or whether I should remove this entry completely.
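For illustration, a rough sketch of what that replacement could look like, assuming the usual MLJModelInterface.metadata_model keyword form (the scitypes shown are placeholders, not values taken from this PR):

    import MLJModelInterface as MMI

    # `descr` dropped; `human_name` now supplies the short display name.
    MMI.metadata_model(
        LGBMClassifier;
        input_scitype = MMI.Table(MMI.Continuous),      # placeholder
        target_scitype = AbstractVector{<:MMI.Finite},  # placeholder
        supports_weights = true,
        human_name = "LightGBM classifier",
    )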

@josephsdavid (Contributor, Author)

  2. Please fill in the description of the PR. For example, a link to the documentation about "MLJ docstrings" would be useful to the reader.

Hah, I was so excited to have all the parameters documented that I missed the other pieces of work 😓

@ablaom (Contributor) left a comment

@josephsdavid Thanks for this mammoth effort. 🦣

I don't see sections "Fitted parameters" or "Report", which are required.

Given the fact that all the models have a lot of hyper-parameters in common, I wonder if you would consider, for easier maintenance, interpolating a string constant for the common ones?
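One possible shape for that, as a sketch only (the constant name and parameter descriptions below are illustrative, not the PR's actual text):

    # Shared fragment for hyper-parameters common to all the models:
    const COMMON_HYPERPARAMS_DOC = """
    - `num_iterations = 100`: number of boosting iterations
    - `learning_rate = 0.1`: shrinkage applied at each boosting step
    """

    # Interpolated into each model's docstring, together with the required
    # "Fitted parameters" and "Report" sections:
    @doc """
        LGBMClassifier

    # Hyper-parameters

    $COMMON_HYPERPARAMS_DOC

    # Fitted parameters

    The fields of `fitted_params(mach)` are: ...

    # Report

    The fields of `report(mach)` are: ...
    """ LGBMClassifier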

I've looked over the first docstring for now. Please ping me when you've addressed my comments and I'll review the others too.

@josephsdavid (Contributor, Author)


Will do! Going to go over it more closely over the weekend :)

@ablaom (Contributor) left a comment

Thanks @josephsdavid for the progress! We're getting there.

Particular attention is still needed in the examples. If you could please check they run, that will save me some review time.

@kainkad (Contributor) commented Jan 14, 2025

Hi @josephsdavid, it's been quite a while since this PR was submitted and you've put significant effort into adding the MLJ compliant docstrings. I was wondering if it would be possible for you to update your branch with the latest LightGBM.jl master and push it again, so hopefully the PR can be re-reviewed and merged :) Also, do you happen to know the best way to test that these docs render correctly?

@ablaom (Contributor) commented Feb 11, 2025

I think we can safely say @josephsdavid has moved on. This work was initiated as part of a Google Season of Docs project that finished more than a year ago.

I can have a go at rebasing, and adding my own suggestions above, but it would be good to have some reassurance that a maintainer is available to make a final review in the next few weeks, before I invest the time, thanks!

@kainkad (Contributor) commented Feb 12, 2025


That's amazing. Thank you so much @ablaom for offering to help with rebasing and for your suggestions. I'll make sure we can conduct a final review within the next few weeks so this work on MLJ compliant docs can be merged in. Just to double check, these changes would apply to the MLJInterface only and won't be required in the rest of the code? This is to get an idea of future maintenance in case further changes are made; for example, there's a piece of work to bring in all parameters (which are currently not available on master/released LightGBM.jl), so we can then document them in line with what's been done in this PR.

@kainkad (Contributor) commented Mar 4, 2025


Hi @ablaom. I have rebased with the latest LightGBM master and made updates in this initial PR to address the previous comments, so just wanted to let you know the rebase won't be required on your end. I updated all the examples, as they were not working before, and tested them locally, so they're fine now. I noticed the parameters don't have constraints in the docs, e.g. num_leaves has a default = 31 but the constraint is 1 < _ <= 131072. Would MLJ users benefit from including these constraints as well? If so, I'll go through the remainder of the params and update the docs where relevant. It would be great if you could then review the final version and see if there's anything from the MLJ point of view that should be included. Also, if you know of any good practice to follow for rendering/testing these docs to make sure they are displayed correctly when used with MLJ, that would be helpful. All CI tests passed except for the Documentation builder, so I'm looking into this as well.
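If constraints were added, a parameter entry could look something like this (format is only a suggestion, reusing the example above):

    - `num_leaves = 31`: maximum number of leaves in one tree (constraint: 1 < num_leaves <= 131072)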

@ablaom (Contributor) commented Mar 4, 2025

@kainkad Thanks for moving this along, which will make my job easier. And thanks for your patience.

I don't have a good suggestion for testing the docstring rendering, but I'll do my best in my review. The current state can always be checked by going here (for the regressor). This entry will only be updated after you have tagged a new release and the MLJ model registry has been updated. Ideally, post an issue at MLJModels.jl requesting the update, although this happens from time to time anyway, and the latest versions of all models are revisited in that update.

You can wrap example code to make it execute as part of doc generation, but we generally haven't done this, for the following reason: the MLJ idiomatic way of loading model code uses @load, but that function won't work until after the model is registered. In this case it wouldn't be a problem, because the models are already registered. However, unless you have some experience doing this sort of thing with Documenter.jl, you may want to leave the examples as static text.
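For reference, the idiomatic loading pattern looks roughly like this (a sketch; the hyper-parameter value is just an example):

    using MLJ

    # `@load` resolves the model through the MLJ registry, so no direct
    # `using LightGBM` or new exports are needed.
    LGBMRegressor = @load LGBMRegressor pkg=LightGBM verbosity=0
    model = LGBMRegressor(num_leaves = 31)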

Let me know if you have further questions, and when you are ready for a final review. You can find the full official docstring spec here.

@kainkad (Contributor) commented Mar 10, 2025


Thank you @ablaom for all the information and the links. I have updated the docs accordingly, so it's ready for the final review. Just a comment on the parameters: the currently released LightGBM.jl has about 60% of all available parameters, so the documentation in this PR includes only those available (both in the core lgbm wrapper and via the MLJInterface) and not all parameters available in the underlying C code. I have been working on implementing the remainder, which is currently on a separate branch. The only question I have, once that work is ready to be released and more params are available, is whether the current approach is to include all available parameters in the MLJ documentation, or whether, for example, an external reference to the lgbm parameters could be made instead? The reason I'm asking is that there are over 130, so the parameters section can be quite lengthy, but as long as that's not an issue I don't mind updating the remainder when it's fully implemented.

@ablaom (Contributor) commented Mar 10, 2025

No, you raise a good point. It's not good to have all this parameter documentation duplicated. I think it is fine to have an external link for parameters that correspond to parameters in the core implementation. We have allowed this for our XGBoost wrapper as well. Perhaps just list the parameters that are provided, especially if this differs from the full lightgbm set.

@kainkad (Contributor) commented Mar 12, 2025


I've updated the link to the parameters like it's done for xgboost, and given that the current version doesn't support all params and some of the defaults are different, I just listed the available params and their defaults instead of full descriptions and their interactions, which can be checked by following the link to the official docs. I also moved the docs to a separate file, which I think keeps the MLJInterface neat. When the release for all parameters is ready, a link alone should be fine, because that work also includes aligning the defaults with the official docs so there are no discrepancies.

@ablaom (Contributor) commented Mar 12, 2025

Thanks @kainkad, I'll try to review by the end of next week.

@ablaom (Contributor) commented Mar 16, 2025

Somehow, when I try to inspect the docstring, I'm not getting the new docstring. @kainkad Any idea what's going on here?

help?> LGBMRegression
search: LGBMRegression make_regression

  LGBMRegression(; [
      objective = "regression",
      boosting = "gbdt",
      num_iterations = 100,
      learning_rate = .1,
      num_leaves = 31,
      max_depth = -1,
      tree_learner = "serial",
      num_threads = 0,
      histogram_pool_size = -1.,
      min_data_in_leaf = 20,
      min_sum_hessian_in_leaf = 1e-3,
      max_delta_step = 0.,
      lambda_l1 = 0.,
      lambda_l2 = 0.,
      min_gain_to_split = 0.,
      feature_fraction = 1.,
      feature_fraction_bynode = 1.,
      feature_fraction_seed = 2,
      bagging_fraction = 1.,
      bagging_freq = 0,
      bagging_seed = 3,
      early_stopping_round = 0,
      extra_trees = false
      extra_seed = 6,
      max_bin = 255,
      bin_construct_sample_cnt = 200000,
      data_random_seed = 1,
      is_enable_sparse = true,
      save_binary = false,
      categorical_feature = Int[],
      use_missing = true,
      linear_tree = false,
      feature_pre_filter = true,
      is_unbalance = false,
      boost_from_average = true,
      alpha = 0.9,
      drop_rate = 0.1,
      max_drop = 50,
      skip_drop = 0.5,
      xgboost_dart_mode = false,
      uniform_drop = false,
      drop_seed = 4,
      top_rate = 0.2,
      other_rate = 0.1,
      min_data_per_group = 100,
      max_cat_threshold = 32,
      cat_l2 = 10.0,
      cat_smooth = 10.0,
      metric = [""],
      metric_freq = 1,
      is_provide_training_metric = false,
      eval_at = Int[1, 2, 3, 4, 5],
      num_machines = 1,
      local_listen_port = 12400,
      time_out = 120,
      machine_list_filename = "",
      device_type="cpu",
      gpu_use_dp = false,
      gpu_platform_id = -1,
      gpu_device_id = -1,
      num_gpu = 1,
      force_col_wise = false
      force_row_wise = false
  ])

  Return a LGBMRegression estimator.

@kainkad (Contributor) commented Mar 17, 2025


Thank you for sending this and for checking. So LGBMRegression is the name of the model in core LightGBM.jl, and the docs for it are pulled from the struct docstrings in estimators.jl. In the MLJInterface the models are named slightly differently: LGBMRegressor and LGBMClassifier. For the help info in Julia they need to be called as ?LightGBM.MLJInterface.LGBMRegressor, as shown below:

[screenshot: REPL output of help?> LightGBM.MLJInterface.LGBMRegressor showing the new MLJ docstring]

It's not very user-friendly. Exporting LGBMRegressor and LGBMClassifier could make it easier, as long as it wouldn't introduce any unnecessary breaking change on the MLJ side or in users' namespaces. Calling the full path shows where they come from, but I can imagine the user wouldn't necessarily know the provenance of the model types.

@ablaom (Contributor) commented Mar 17, 2025

Ah, I get it, thank you!

I don't think it's necessary to make any new exports, unless you want to for some other reason. In idiomatic MLJ, you use @load to load model code, and there is no need for these names to be public for that purpose.
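So, as a sketch, a user could reach the new docstring either way (assuming the models are registered, as noted above):

    using MLJ

    LGBMClassifier = @load LGBMClassifier pkg=LightGBM verbosity=0
    @doc LGBMClassifier   # displays the new MLJ-compliant docstring

    # Alternatively, after `import LightGBM`, the full path works in the REPL:
    #   help?> LightGBM.MLJInterface.LGBMClassifier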

@ablaom (Contributor) left a comment

Looks good to me. Just the one suggestion to improve readability.

@ablaom (Contributor) commented Mar 17, 2025

When you have tagged a new release, open an issue here (https://github.com/JuliaAI/MLJModels.jl/issues) to update the MLJ model registry.

@kainkad merged commit 1443672 into IQVIA-ML:master on Mar 18, 2025
25 checks passed
@kainkad (Contributor) commented Mar 18, 2025


Thank you for reviewing this PR. These changes have been released and tagged. I created an issue: JuliaAI/MLJModels.jl#586
