
Python exercise [03_1_encoding_scaling]: the xgboost Python package is about to support categorical data, while R does not yet. #35

Open
sandylaker opened this issue May 17, 2025 · 0 comments


sandylaker commented May 17, 2025

However, it says "the feature is experimental and has limited features. Only the Python package is fully supported". So it is not supported in R, and AFAIK mlr3 does not support it yet either. I would therefore propose to keep it as is for now, but open an issue to be resolved once this categorical support is no longer experimental. Also, it looks like it is basically doing one-hot encoding internally: "categorical data the split is defined depending on whether partitioning or onehot encoding is used. For partition-based splits, the splits are specified as value $\in$ categories, where categories is the set of categories in one feature. If onehot encoding is used instead, then the split is defined as value == category".
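
To make the two split rules in the quote concrete, here is a minimal sketch in plain Python. The helper names `onehot_split` and `partition_split` and the toy `colors` data are hypothetical, chosen only for illustration. They are not part of xgboost and do not reflect how the library implements splits internally.

```python
# Illustrative only: hypothetical helpers, not xgboost API.

def onehot_split(value, category):
    """One-hot style rule: the split tests value == category."""
    return value == category


def partition_split(value, categories):
    """Partition-based rule: the split tests value in categories."""
    return value in categories


# Toy categorical feature (made-up data).
colors = ["red", "green", "blue", "red"]

# A one-hot split can only separate a single category from the rest.
onehot_left = [onehot_split(c, "red") for c in colors]

# A partition-based split can send a whole set of categories to one side.
partition_left = [partition_split(c, {"red", "blue"}) for c in colors]

print(onehot_left)
print(partition_left)
```

Under this reading, a one-hot split is the special case of a partition-based split whose category set has exactly one element.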

Originally posted by @giuseppec in #26 (comment)
