diff --git a/DESCRIPTION b/DESCRIPTION
index ebf3b8df..67195a64 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: mlr3fselect
 Title: Feature Selection for 'mlr3'
-Version: 1.3.0.9000
+Version: 1.4.0
 Authors@R: c(
     person("Marc", "Becker", , "marcbecker@posteo.de", role = c("aut", "cre"),
            comment = c(ORCID = "0000-0002-8115-0400")),
@@ -24,7 +24,7 @@ URL: https://mlr3fselect.mlr-org.com,
     https://github.com/mlr-org/mlr3fselect
 BugReports: https://github.com/mlr-org/mlr3fselect/issues
 Depends:
-    mlr3 (>= 0.23.0),
+    mlr3 (>= 1.0.1),
     R (>= 3.1.0)
 Imports:
     bbotk (>= 1.6.0),
diff --git a/NEWS.md b/NEWS.md
index 309d6f45..8bca0192 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -1,4 +1,4 @@
-# mlr3fselect (development version)
+# mlr3fselect 1.4.0
 
 * feat: Introduce asynchronous optimization with the `FSelectorAsync` and `FSelectInstanceAsync*` classes.
 * feat: Add `max_nfeatures` argument in the `pareto_front()` and `knee_points()` methods of an `EnsembleFSResult()`.
diff --git a/R/fselect.R b/R/fselect.R
index a492dc55..942716bf 100644
--- a/R/fselect.R
+++ b/R/fselect.R
@@ -8,7 +8,7 @@
 #' It executes the feature selection with the [FSelector] (`fselector`) and returns the result with the feature selection instance (`$result`).
 #' The [ArchiveBatchFSelect] and [ArchiveAsyncFSelect] (`$archive`) stores all evaluated feature subsets and performance scores.
 #'
-#' You can find an overview of all feature selectors on our [website](https://mlr-org.com/feature-selectors.html).
+#' You can find an overview of all feature selectors on our [website](https://mlr-org.com/fselectors.html).
 #'
 #' @details
 #' The [mlr3::Task], [mlr3::Learner], [mlr3::Resampling], [mlr3::Measure] and [bbotk::Terminator] are used to construct a [FSelectInstanceBatchSingleCrit].
diff --git a/README.Rmd b/README.Rmd
index 5fa61fb3..c9908bcf 100644
--- a/README.Rmd
+++ b/README.Rmd
@@ -23,7 +23,6 @@ Package website: [release](https://mlr3fselect.mlr-org.com/) | [dev](https://mlr
 [![r-cmd-check](https://github.com/mlr-org/mlr3fselect/actions/workflows/r-cmd-check.yml/badge.svg)](https://github.com/mlr-org/mlr3fselect/actions/workflows/r-cmd-check.yml)
 [![CRAN Status](https://www.r-pkg.org/badges/version/mlr3fselect)](https://cran.r-project.org/package=mlr3fselect)
-[![StackOverflow](https://img.shields.io/badge/stackoverflow-mlr3-orange.svg)](https://stackoverflow.com/questions/tagged/mlr3)
 [![Mattermost](https://img.shields.io/badge/chat-mattermost-orange.svg)](https://lmmisld-lmu-stats-slds.srv.mwn.de/mlr_invite/)
diff --git a/README.md b/README.md
index 8b277324..4d4a47f7 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,7 @@
 # mlr3fselect
 
-Package website: [release](https://mlr3fselect.mlr-org.com/) |
+Package website: [release](https://mlr3fselect.mlr-org.com/) \|
 [dev](https://mlr3fselect.mlr-org.com/dev/)
 
@@ -9,7 +9,6 @@ Package website: [release](https://mlr3fselect.mlr-org.com/) |
 [![r-cmd-check](https://github.com/mlr-org/mlr3fselect/actions/workflows/r-cmd-check.yml/badge.svg)](https://github.com/mlr-org/mlr3fselect/actions/workflows/r-cmd-check.yml)
 [![CRAN Status](https://www.r-pkg.org/badges/version/mlr3fselect)](https://cran.r-project.org/package=mlr3fselect)
-[![StackOverflow](https://img.shields.io/badge/stackoverflow-mlr3-orange.svg)](https://stackoverflow.com/questions/tagged/mlr3)
 [![Mattermost](https://img.shields.io/badge/chat-mattermost-orange.svg)](https://lmmisld-lmu-stats-slds.srv.mwn.de/mlr_invite/)
@@ -29,24 +28,24 @@ The package is built on the optimization framework
 There are several section about feature selection in the
 [mlr3book](https://mlr3book.mlr-org.com).
 
-  - Getting started with [wrapper feature
-    selection](https://mlr3book.mlr-org.com/chapters/chapter6/feature_selection.html#sec-fs-wrapper).
-  - Do a [sequential forward
-    selection](https://mlr3book.mlr-org.com/chapters/chapter6/feature_selection.html#sec-fs-wrapper-example)
-    Palmer Penguins data set.
-  - Optimize [multiple performance
-    measures](https://mlr3book.mlr-org.com/chapters/chapter6/feature_selection.html#sec-multicrit-featsel).
-  - Estimate Model Performance with [nested
-    resampling](https://mlr3book.mlr-org.com/chapters/chapter6/feature_selection.html#sec-autofselect).
+- Getting started with [wrapper feature
+  selection](https://mlr3book.mlr-org.com/chapters/chapter6/feature_selection.html#sec-fs-wrapper).
+- Do a [sequential forward
+  selection](https://mlr3book.mlr-org.com/chapters/chapter6/feature_selection.html#sec-fs-wrapper-example)
+  Palmer Penguins data set.
+- Optimize [multiple performance
+  measures](https://mlr3book.mlr-org.com/chapters/chapter6/feature_selection.html#sec-multicrit-featsel).
+- Estimate Model Performance with [nested
+  resampling](https://mlr3book.mlr-org.com/chapters/chapter6/feature_selection.html#sec-autofselect).
 
 The [gallery](https://mlr-org.com/gallery.html) features a collection of
 case studies and demos about optimization.
 
-  - Utilize the built-in feature importance of models with [Recursive
-    Feature
-    Elimination](https://mlr-org.com/gallery/optimization/2023-02-07-recursive-feature-elimination/).
-  - Run a feature selection with [Shadow Variable
-    Search](https://mlr-org.com/gallery/optimization/2023-02-01-shadow-variable-search/).
+- Utilize the built-in feature importance of models with [Recursive
+  Feature
+  Elimination](https://mlr-org.com/gallery/optimization/2023-02-07-recursive-feature-elimination/).
+- Run a feature selection with [Shadow Variable
+  Search](https://mlr-org.com/gallery/optimization/2023-02-01-shadow-variable-search/).
 
 The [cheatsheet](https://cheatsheets.mlr-org.com/mlr3fselect.pdf)
 summarizes the most important functions of mlr3fselect.
@@ -76,16 +75,18 @@ library("mlr3verse")
 tsk("spam")
 ```
 
-    ## (4601 x 58): HP Spam Detection
-    ## * Target: type
-    ## * Properties: twoclass
-    ## * Features (57):
-    ## - dbl (57): address, addresses, all, business, capitalAve, capitalLong, capitalTotal,
-    ## charDollar, charExclamation, charHash, charRoundbracket, charSemicolon,
-    ## charSquarebracket, conference, credit, cs, data, direct, edu, email, font, free,
-    ## george, hp, hpl, internet, lab, labs, mail, make, meeting, money, num000, num1999,
-    ## num3d, num415, num650, num85, num857, order, original, our, over, parts, people, pm,
-    ## project, re, receive, remove, report, table, technology, telnet, will, you, your
+    ##
+    ## ── (4601x58): HP Spam Detection ──────────────────────────────────────────────────────
+    ## • Target: type
+    ## • Target classes: spam (positive class, 39%), nonspam (61%)
+    ## • Properties: twoclass
+    ## • Features (57):
+    ## • dbl (57): address, addresses, all, business, capitalAve, capitalLong, capitalTotal, charDollar,
+    ## charExclamation, charHash, charRoundbracket, charSemicolon, charSquarebracket, conference,
+    ## credit, cs, data, direct, edu, email, font, free, george, hp, hpl, internet, lab, labs, mail,
+    ## make, meeting, money, num000, num1999, num3d, num415, num650, num85, num857, order, original,
+    ## our, over, parts, people, pm, project, re, receive, remove, report, table, technology, telnet,
+    ## will, you, your
 
 We construct an instance with the `fsi()` function. The instance
 describes the optimization problem.
@@ -103,7 +104,7 @@ instance
 
     ##
     ## * State: Not optimized
-    ## * Objective:
+    ## * Objective:
     ## * Terminator:
 
 We select a simple random search as the optimization algorithm.
@@ -132,19 +133,19 @@ instance.
 instance$result_feature_set
 ```
 
-    ##  [1] "address"           "addresses"         "all"               "business"
-    ##  [5] "capitalAve"        "capitalLong"       "capitalTotal"      "charDollar"
-    ##  [9] "charExclamation"   "charHash"          "charRoundbracket"  "charSemicolon"
-    ## [13] "charSquarebracket" "conference"        "credit"            "cs"
-    ## [17] "data"              "direct"            "edu"               "email"
-    ## [21] "font"              "free"              "george"            "hp"
-    ## [25] "internet"          "lab"               "labs"              "mail"
-    ## [29] "make"              "meeting"           "money"             "num000"
-    ## [33] "num1999"           "num3d"             "num415"            "num650"
-    ## [37] "num85"             "num857"            "order"             "our"
-    ## [41] "parts"             "people"            "pm"                "project"
-    ## [45] "re"                "receive"           "remove"            "report"
-    ## [49] "table"             "technology"        "telnet"            "will"
+    ##  [1] "address"           "addresses"         "all"               "business"
+    ##  [5] "capitalAve"        "capitalLong"       "capitalTotal"      "charDollar"
+    ##  [9] "charExclamation"   "charHash"          "charRoundbracket"  "charSemicolon"
+    ## [13] "charSquarebracket" "conference"        "credit"            "cs"
+    ## [17] "data"              "direct"            "edu"               "email"
+    ## [21] "font"              "free"              "george"            "hp"
+    ## [25] "internet"          "lab"               "labs"              "mail"
+    ## [29] "make"              "meeting"           "money"             "num000"
+    ## [33] "num1999"           "num3d"             "num415"            "num650"
+    ## [37] "num85"             "num857"            "order"             "our"
+    ## [41] "parts"             "people"            "pm"                "project"
+    ## [45] "re"                "receive"           "remove"            "report"
+    ## [49] "table"             "technology"        "telnet"            "will"
     ## [53] "you"               "your"
 
 And the corresponding measured performance.
@@ -153,7 +154,7 @@ And the corresponding measured performance.
 instance$result_y
 ```
 
-    ## classif.ce
+    ## classif.ce
     ## 0.07042005
 
 The archive contains all evaluated hyperparameter configurations.
@@ -162,19 +163,19 @@ The archive contains all evaluated hyperparameter configurations.
 as.data.table(instance$archive)
 ```
 
-    ##     address addresses   all business capitalAve capitalLong capitalTotal charDollar charExclamation
-    ##  1:    TRUE      TRUE  TRUE     TRUE       TRUE        TRUE         TRUE       TRUE            TRUE
-    ##  2:    TRUE      TRUE  TRUE    FALSE      FALSE        TRUE         TRUE       TRUE            TRUE
-    ##  3:    TRUE      TRUE FALSE    FALSE       TRUE        TRUE         TRUE       TRUE            TRUE
-    ##  4:    TRUE      TRUE  TRUE     TRUE       TRUE        TRUE         TRUE       TRUE            TRUE
-    ##  5:   FALSE     FALSE FALSE    FALSE      FALSE       FALSE        FALSE       TRUE           FALSE
-    ## ---
-    ## 16:   FALSE     FALSE FALSE    FALSE      FALSE       FALSE        FALSE      FALSE           FALSE
-    ## 17:   FALSE     FALSE FALSE     TRUE       TRUE        TRUE        FALSE      FALSE            TRUE
-    ## 18:   FALSE     FALSE  TRUE     TRUE      FALSE       FALSE        FALSE       TRUE           FALSE
-    ## 19:    TRUE      TRUE  TRUE     TRUE      FALSE        TRUE         TRUE       TRUE            TRUE
-    ## 20:    TRUE     FALSE  TRUE    FALSE      FALSE        TRUE        FALSE       TRUE           FALSE
-    ## 56 variables not shown: [charHash, charRoundbracket, charSemicolon, charSquarebracket, conference, credit, cs, data, direct, edu, ...]
+    ##     address addresses   all business capitalAve capitalLong capitalTotal charDollar
+    ##  1:    TRUE      TRUE  TRUE     TRUE       TRUE        TRUE         TRUE       TRUE
+    ##  2:    TRUE      TRUE  TRUE    FALSE      FALSE        TRUE         TRUE       TRUE
+    ##  3:    TRUE      TRUE FALSE    FALSE       TRUE        TRUE         TRUE       TRUE
+    ##  4:    TRUE      TRUE  TRUE     TRUE       TRUE        TRUE         TRUE       TRUE
+    ##  5:   FALSE     FALSE FALSE    FALSE      FALSE       FALSE        FALSE       TRUE
+    ## ---
+    ## 16:   FALSE     FALSE FALSE    FALSE      FALSE       FALSE        FALSE      FALSE
+    ## 17:   FALSE     FALSE FALSE     TRUE       TRUE        TRUE        FALSE      FALSE
+    ## 18:   FALSE     FALSE  TRUE     TRUE      FALSE       FALSE        FALSE       TRUE
+    ## 19:    TRUE      TRUE  TRUE     TRUE      FALSE        TRUE         TRUE       TRUE
+    ## 20:    TRUE     FALSE  TRUE    FALSE      FALSE        TRUE        FALSE       TRUE
+    ## 58 variables not shown: [charExclamation, charHash, charRoundbracket, charSemicolon, charSquarebracket, conference, credit, cs, data, direct, ...]
 
 We fit a final model with the optimized feature set to make predictions
 on new data.
diff --git a/man/fselect.Rd b/man/fselect.Rd
index 42365f78..f68e82fa 100644
--- a/man/fselect.Rd
+++ b/man/fselect.Rd
@@ -84,7 +84,7 @@ The function internally creates a \link{FSelectInstanceBatchSingleCrit} or \link
 It executes the feature selection with the \link{FSelector} (\code{fselector}) and returns the result with the feature selection instance (\verb{$result}).
 The \link{ArchiveBatchFSelect} and \link{ArchiveAsyncFSelect} (\verb{$archive}) stores all evaluated feature subsets and performance scores.
 
-You can find an overview of all feature selectors on our \href{https://mlr-org.com/feature-selectors.html}{website}.
+You can find an overview of all feature selectors on our \href{https://mlr-org.com/fselectors.html}{website}.
 }
 \details{
 The \link[mlr3:Task]{mlr3::Task}, \link[mlr3:Learner]{mlr3::Learner}, \link[mlr3:Resampling]{mlr3::Resampling}, \link[mlr3:Measure]{mlr3::Measure} and \link[bbotk:Terminator]{bbotk::Terminator} are used to construct a \link{FSelectInstanceBatchSingleCrit}.
diff --git a/tests/testthat/test_FSelectorDesignPoints.R b/tests/testthat/test_FSelectorBatchDesignPoints.R
similarity index 100%
rename from tests/testthat/test_FSelectorDesignPoints.R
rename to tests/testthat/test_FSelectorBatchDesignPoints.R