Add MLJ compliant docstrings #130
Hi @josephsdavid A couple of remarks;
src/MLJInterface.jl (outdated)
-    weights=true,
-    descr="Microsoft LightGBM FFI wrapper: Classifier",
+    weights=true
+    # descr="Microsoft LightGBM FFI wrapper: Classifier",
Specifically, how come you're commenting these ones out? And if there's a good reason for it, I'd expect it to be deleted rather than commented out.
Oh whoops! I meant to delete them! There is a good reason: I believe the existence of a docstring after the model metadata is created overwrites the descr field, making it no longer needed (paging @ablaom to confirm; there is a reason, but I may have mixed it up :) )
The MLJ model trait docstring (alias descr) used to be for a short summary string, which was not that useful, in retrospect. Now it is not to be overloaded but instead falls back to the full docstring (the one @josephsdavid has worked on here).
So, yes, these should be deleted.
Removed and replaced with human_name="LightGBM classifier" and regressor respectively. @ablaom I saw it was suggested in previous comments; let me know if this is fine, or whether I should remove this entry completely.
Hah, I was so excited to have all the parameters documented I missed the other pieces of work 😓
@josephsdavid Thanks for this mammoth effort. 🦣
I don't see sections "Fitted parameters" or "Report", which are required.
Given the fact that all the models have a lot of hyper-parameters in common, I wonder if you would consider, for easier maintenance, interpolating a string constant for the common ones?
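One way to read this suggestion is sketched below, assuming plain Julia docstring interpolation. The names COMMON_HYPERPARAMETERS, DemoRegressor and DemoClassifier are hypothetical stand-ins, not identifiers from LightGBM.jl. The three parameters and their defaults are taken from the LGBMRegression docstring quoted later in this thread; the one-line descriptions are illustrative only.

```julia
# Hypothetical shared fragment: written once, reused in every model docstring.
const COMMON_HYPERPARAMETERS = """
- `num_iterations = 100`: number of boosting iterations.
- `learning_rate = 0.1`: shrinkage rate.
- `num_leaves = 31`: maximum number of leaves in one tree.
"""

"""
    DemoRegressor

Hypothetical stand-in for a regressor model type.

# Hyper-parameters

$COMMON_HYPERPARAMETERS
"""
struct DemoRegressor end

"""
    DemoClassifier

Hypothetical stand-in for a classifier model type.

# Hyper-parameters

$COMMON_HYPERPARAMETERS
"""
struct DemoClassifier end
```

With this layout, a change to a shared default or description is made once in the constant rather than in each docstring. Model-specific parameters would still be listed directly in the docstring of the model they belong to.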
I've looked over the first docstring for now. Please ping me when you've addressed my comments and I'll review the others too.
Will do! Going to go over it more closely over the weekend :)
Co-authored-by: Anthony Blaom, PhD <[email protected]>
Thanks @josephsdavid for the progress! We're getting there.
Particular attention is still needed in the examples. If you could please check they run, that will save me some review time.
Hi @josephsdavid, it's been quite a while since this PR was submitted, and you've put significant effort into adding the MLJ compliant docstrings. I was wondering if it would be possible for you to update your branch with the latest LightGBM.jl master and push it again, so that the PR can hopefully be re-reviewed and merged :) Also, do you happen to know the best way to test that these docs are rendering correctly?
I think we can safely say @josephsdavid has moved on. This work was initiated as part of a Google Season of Docs project that finished more than a year ago. I can have a go at rebasing, and adding my own suggestions above, but it would be good to have some reassurance that a maintainer is available to make a final review in the next few weeks, before I invest the time, thanks!
That's amazing. Thank you so much @ablaom for offering to help with rebasing and your suggestions. I'll make sure we can conduct a final review within the next few weeks so this work on MLJ compliant docs can be merged in. Just to double check, these changes would apply to the MLJInterface only and won't be required in the rest of the code? I'm asking to get an idea of the future maintenance if further changes are made. For example, there's a piece of work to bring in all parameters (which are currently not available on master/released LightGBM.jl), so we can then document them according to what's been done in this PR.
Hi @ablaom. I have rebased onto the latest LightGBM master and updated this initial PR to address the previous comments, so I just wanted to let you know that you won't need to do this. I updated all the examples, as they were not working before, and tested them locally, so they're fine now. I noticed the parameters don't have constraints in the docs, e.g.
@kainkad Thanks for moving this along, which will make my job easier. And thanks for your patience.

I don't have a good suggestion for testing the docstring rendering, but I'll do my best in my review. The current state can always be checked by going here (for the regressor). This entry will only be updated after you have tagged a new release and the MLJ model registry has been updated. Ideally, post an issue at MLJModels.jl requesting the update, although this happens from time to time anyway, and the latest versions of all models are revisited in that update.

You can wrap example code to make it execute as part of doc-generation, but we generally haven't done this for the following reason: The MLJ idiomatic way of loading model code uses

Let me know if you have further questions, and when you are ready for a final review. You can find the full official docstring spec here.
Thank you @ablaom for all the information and the links. I have updated the docs accordingly, so it's ready for the final review. Just a comment on the parameters. The currently released LightGBM.jl has about 60% of all available parameters, so the documentation in this PR includes only those available (both in the core lgbm wrapper and via the MLJInterface) and not all parameters available in the underlying C code. I have been working on implementing the remainder, which is currently on a separate branch. My only question is this: once that work is ready to be released and more params are available, should the MLJ documentation still include all available parameters, or could it refer to the external lgbm parameter documentation instead? The reason I'm asking is that there are over 130, so the parameters section could be quite lengthy, but as long as that's not an issue I don't mind updating the remainder when it's fully implemented.
No, you raise a good point. It's not good to have all this parameter documentation duplicated. I think it is fine to have an external link for parameters that correspond to parameters in the core implementation. We have allowed this for our XGBoost wrapper as well. Perhaps you just list the parameters that are provided, especially if this is different from the full lightgbm set.
I've updated the link to the parameters like it's done for xgboost. Given that the current version doesn't support all params and some of the defaults are different, I just listed the available params and their defaults instead of full descriptions and their interactions, which can be checked by following the link to the official docs. I also moved the docs to a separate file, which I think keeps the MLJInterface neat. When the release for all parameters is ready, just a link should be fine, because that work also includes aligning the defaults with the official docs so there are no discrepancies.
Thanks @kainkad, I'll try to review by the end of next week.
Somehow, when I try to inspect the docstring I'm not getting the new docstring. @kainkad Any idea what's going on here?
help?> LGBMRegression
search: LGBMRegression make_regression
LGBMRegression(; [
objective = "regression",
boosting = "gbdt",
num_iterations = 100,
learning_rate = .1,
num_leaves = 31,
max_depth = -1,
tree_learner = "serial",
num_threads = 0,
histogram_pool_size = -1.,
min_data_in_leaf = 20,
min_sum_hessian_in_leaf = 1e-3,
max_delta_step = 0.,
lambda_l1 = 0.,
lambda_l2 = 0.,
min_gain_to_split = 0.,
feature_fraction = 1.,
feature_fraction_bynode = 1.,
feature_fraction_seed = 2,
bagging_fraction = 1.,
bagging_freq = 0,
bagging_seed = 3,
early_stopping_round = 0,
extra_trees = false
extra_seed = 6,
max_bin = 255,
bin_construct_sample_cnt = 200000,
data_random_seed = 1,
is_enable_sparse = true,
save_binary = false,
categorical_feature = Int[],
use_missing = true,
linear_tree = false,
feature_pre_filter = true,
is_unbalance = false,
boost_from_average = true,
alpha = 0.9,
drop_rate = 0.1,
max_drop = 50,
skip_drop = 0.5,
xgboost_dart_mode = false,
uniform_drop = false,
drop_seed = 4,
top_rate = 0.2,
other_rate = 0.1,
min_data_per_group = 100,
max_cat_threshold = 32,
cat_l2 = 10.0,
cat_smooth = 10.0,
metric = [""],
metric_freq = 1,
is_provide_training_metric = false,
eval_at = Int[1, 2, 3, 4, 5],
num_machines = 1,
local_listen_port = 12400,
time_out = 120,
machine_list_filename = "",
device_type="cpu",
gpu_use_dp = false,
gpu_platform_id = -1,
gpu_device_id = -1,
num_gpu = 1,
force_col_wise = false
force_row_wise = false
])
Return a LGBMRegression estimator.
Thank you for sending this and for checking. So the It's not very user friendly. Exporting the |
Ah, I get it, thank you! I don't think it's necessary to make any new exports, unless you want to for some other reason. In idiomatic MLJ, you use |
Looks good to me. Just the one suggestion to improve readability.
When you have tagged a new release, open an issue here (https://github.com/JuliaAI/MLJModels.jl/issues) to update the MLJ model registry.
Thank you for reviewing this PR. These changes have been released and tagged. I created an issue: JuliaAI/MLJModels.jl#586
In service of #913, as documented here!