# mlr3fselect <img src="man/figures/logo.png" align="right" width="120" />

- Package website: [release](https://mlr3fselect.mlr-org.com/) |
+ Package website: [release](https://mlr3fselect.mlr-org.com/) \|
[dev](https://mlr3fselect.mlr-org.com/dev/)

<!-- badges: start -->

[![r-cmd-check](https://github.com/mlr-org/mlr3fselect/actions/workflows/r-cmd-check.yml/badge.svg)](https://github.com/mlr-org/mlr3fselect/actions/workflows/r-cmd-check.yml)
[![CRAN
Status](https://www.r-pkg.org/badges/version/mlr3fselect)](https://cran.r-project.org/package=mlr3fselect)
- [![StackOverflow](https://img.shields.io/badge/stackoverflow-mlr3-orange.svg)](https://stackoverflow.com/questions/tagged/mlr3)
[![Mattermost](https://img.shields.io/badge/chat-mattermost-orange.svg)](https://lmmisld-lmu-stats-slds.srv.mwn.de/mlr_invite/)
<!-- badges: end -->

@@ -29,24 +28,24 @@ The package is built on the optimization framework
There are several sections about feature selection in the
[mlr3book](https://mlr3book.mlr-org.com).

- - Getting started with [wrapper feature
- selection](https://mlr3book.mlr-org.com/chapters/chapter6/feature_selection.html#sec-fs-wrapper).
- - Do a [sequential forward
- selection](https://mlr3book.mlr-org.com/chapters/chapter6/feature_selection.html#sec-fs-wrapper-example)
- Palmer Penguins data set.
- - Optimize [multiple performance
- measures](https://mlr3book.mlr-org.com/chapters/chapter6/feature_selection.html#sec-multicrit-featsel).
- - Estimate Model Performance with [nested
- resampling](https://mlr3book.mlr-org.com/chapters/chapter6/feature_selection.html#sec-autofselect).
+ - Getting started with [wrapper feature
+ selection](https://mlr3book.mlr-org.com/chapters/chapter6/feature_selection.html#sec-fs-wrapper).
+ - Do a [sequential forward
+ selection](https://mlr3book.mlr-org.com/chapters/chapter6/feature_selection.html#sec-fs-wrapper-example)
+ Palmer Penguins data set.
+ - Optimize [multiple performance
+ measures](https://mlr3book.mlr-org.com/chapters/chapter6/feature_selection.html#sec-multicrit-featsel).
+ - Estimate Model Performance with [nested
+ resampling](https://mlr3book.mlr-org.com/chapters/chapter6/feature_selection.html#sec-autofselect).

The [gallery](https://mlr-org.com/gallery.html) features a collection of
case studies and demos about optimization.

- - Utilize the built-in feature importance of models with [Recursive
- Feature
- Elimination](https://mlr-org.com/gallery/optimization/2023-02-07-recursive-feature-elimination/).
- - Run a feature selection with [Shadow Variable
- Search](https://mlr-org.com/gallery/optimization/2023-02-01-shadow-variable-search/).
+ - Utilize the built-in feature importance of models with [Recursive
+ Feature
+ Elimination](https://mlr-org.com/gallery/optimization/2023-02-07-recursive-feature-elimination/).
+ - Run a feature selection with [Shadow Variable
+ Search](https://mlr-org.com/gallery/optimization/2023-02-01-shadow-variable-search/).

The [cheatsheet](https://cheatsheets.mlr-org.com/mlr3fselect.pdf)
summarizes the most important functions of mlr3fselect.
@@ -76,16 +75,18 @@ library("mlr3verse")
tsk("spam")
```

- ## <TaskClassif:spam> (4601 x 58): HP Spam Detection
- ## * Target: type
- ## * Properties: twoclass
- ## * Features (57):
- ## - dbl (57): address, addresses, all, business, capitalAve, capitalLong, capitalTotal,
- ## charDollar, charExclamation, charHash, charRoundbracket, charSemicolon,
- ## charSquarebracket, conference, credit, cs, data, direct, edu, email, font, free,
- ## george, hp, hpl, internet, lab, labs, mail, make, meeting, money, num000, num1999,
- ## num3d, num415, num650, num85, num857, order, original, our, over, parts, people, pm,
- ## project, re, receive, remove, report, table, technology, telnet, will, you, your
+ ##
+ ## ── <TaskClassif> (4601x58): HP Spam Detection ──────────────────────────────────────────────────────
+ ## • Target: type
+ ## • Target classes: spam (positive class, 39%), nonspam (61%)
+ ## • Properties: twoclass
+ ## • Features (57):
+ ## • dbl (57): address, addresses, all, business, capitalAve, capitalLong, capitalTotal, charDollar,
+ ## charExclamation, charHash, charRoundbracket, charSemicolon, charSquarebracket, conference,
+ ## credit, cs, data, direct, edu, email, font, free, george, hp, hpl, internet, lab, labs, mail,
+ ## make, meeting, money, num000, num1999, num3d, num415, num650, num85, num857, order, original,
+ ## our, over, parts, people, pm, project, re, receive, remove, report, table, technology, telnet,
+ ## will, you, your

We construct an instance with the `fsi()` function. The instance
describes the optimization problem.
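The diff shows only the printed instance, not the `fsi()` call itself, so the following is a minimal sketch with assumed arguments. The task (`spam`), learner id (`classif.svm`), measure (`classif.ce`) and an evaluation terminator can be read off the printed objective, result and archive (which has 20 rows); the holdout resampling and `type = "C-classification"` are illustrative choices, not necessarily what the README uses. The SVM learner comes from mlr3learners and needs the e1071 package.

```r
library("mlr3verse")

# Assumed arguments: spam task, SVM learner and classification error are
# read off the printed objective and result; holdout resampling and
# type = "C-classification" are illustrative choices
instance = fsi(
  task = tsk("spam"),
  learner = lrn("classif.svm", type = "C-classification"),
  resampling = rsmp("holdout"),
  measures = msr("classif.ce"),
  # stop after 20 evaluated feature sets
  terminator = trm("evals", n_evals = 20)
)

# Printing shows the state, objective and terminator of the instance
instance
```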
@@ -103,7 +104,7 @@ instance

## <FSelectInstanceBatchSingleCrit>
## * State: Not optimized
- ## * Objective: <ObjectiveFSelect:classif.svm_on_spam>
+ ## * Objective: <ObjectiveFSelectBatch:classif.svm_on_spam>
## * Terminator: <TerminatorEvals>

We select a simple random search as the optimization algorithm.
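A sketch of the random search step. So that the snippet runs on its own, it rebuilds an instance with illustrative settings (holdout resampling, `type = "C-classification"`, 20 evaluations); `batch_size = 5` is likewise an assumed value rather than one taken from the README.

```r
library("mlr3verse")

set.seed(1)

instance = fsi(
  task = tsk("spam"),
  learner = lrn("classif.svm", type = "C-classification"),
  resampling = rsmp("holdout"),
  measures = msr("classif.ce"),
  terminator = trm("evals", n_evals = 20)
)

# Random search draws feature subsets at random; batch_size sets how many
# subsets are evaluated per batch (5 is an illustrative choice)
fselector = fs("random_search", batch_size = 5)

# optimize() evaluates subsets until the terminator stops the search and
# stores the best subset as the instance result
fselector$optimize(instance)

instance$result_feature_set
instance$result_y
```

Because random search samples subsets independently of earlier results, the selected set varies between runs; fixing the seed with `set.seed()` makes a run repeatable.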
@@ -132,19 +133,19 @@ instance.
instance$result_feature_set
```

- ## [1] "address" "addresses" "all" "business"
- ## [5] "capitalAve" "capitalLong" "capitalTotal" "charDollar"
- ## [9] "charExclamation" "charHash" "charRoundbracket" "charSemicolon"
- ## [13] "charSquarebracket" "conference" "credit" "cs"
- ## [17] "data" "direct" "edu" "email"
- ## [21] "font" "free" "george" "hp"
- ## [25] "internet" "lab" "labs" "mail"
- ## [29] "make" "meeting" "money" "num000"
- ## [33] "num1999" "num3d" "num415" "num650"
- ## [37] "num85" "num857" "order" "our"
- ## [41] "parts" "people" "pm" "project"
- ## [45] "re" "receive" "remove" "report"
- ## [49] "table" "technology" "telnet" "will"
+ ## [1] "address" "addresses" "all" "business"
+ ## [5] "capitalAve" "capitalLong" "capitalTotal" "charDollar"
+ ## [9] "charExclamation" "charHash" "charRoundbracket" "charSemicolon"
+ ## [13] "charSquarebracket" "conference" "credit" "cs"
+ ## [17] "data" "direct" "edu" "email"
+ ## [21] "font" "free" "george" "hp"
+ ## [25] "internet" "lab" "labs" "mail"
+ ## [29] "make" "meeting" "money" "num000"
+ ## [33] "num1999" "num3d" "num415" "num650"
+ ## [37] "num85" "num857" "order" "our"
+ ## [41] "parts" "people" "pm" "project"
+ ## [45] "re" "receive" "remove" "report"
+ ## [49] "table" "technology" "telnet" "will"
## [53] "you" "your"

And the corresponding measured performance.
@@ -153,7 +154,7 @@ And the corresponding measured performance.
instance$result_y
```

- ## classif.ce
+ ## classif.ce
## 0.07042005

The archive contains all evaluated feature sets.
@@ -162,19 +163,19 @@ The archive contains all evaluated feature sets.
as.data.table(instance$archive)
```

- ## address addresses all business capitalAve capitalLong capitalTotal charDollar charExclamation
- ## 1: TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
- ## 2: TRUE TRUE TRUE FALSE FALSE TRUE TRUE TRUE TRUE
- ## 3: TRUE TRUE FALSE FALSE TRUE TRUE TRUE TRUE TRUE
- ## 4: TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
- ## 5: FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
- ## ---
- ## 16: FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
- ## 17: FALSE FALSE FALSE TRUE TRUE TRUE FALSE FALSE TRUE
- ## 18: FALSE FALSE TRUE TRUE FALSE FALSE FALSE TRUE FALSE
- ## 19: TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE
- ## 20: TRUE FALSE TRUE FALSE FALSE TRUE FALSE TRUE FALSE
- ## 56 variables not shown: [charHash, charRoundbracket, charSemicolon, charSquarebracket, conference, credit, cs, data, direct, edu, ...]
+ ## address addresses all business capitalAve capitalLong capitalTotal charDollar
+ ## 1: TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
+ ## 2: TRUE TRUE TRUE FALSE FALSE TRUE TRUE TRUE
+ ## 3: TRUE TRUE FALSE FALSE TRUE TRUE TRUE TRUE
+ ## 4: TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
+ ## 5: FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
+ ## ---
+ ## 16: FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
+ ## 17: FALSE FALSE FALSE TRUE TRUE TRUE FALSE FALSE
+ ## 18: FALSE FALSE TRUE TRUE FALSE FALSE FALSE TRUE
+ ## 19: TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE
+ ## 20: TRUE FALSE TRUE FALSE FALSE TRUE FALSE TRUE
+ ## 58 variables not shown: [charExclamation, charHash, charRoundbracket, charSemicolon, charSquarebracket, conference, credit, cs, data, direct, ...]

We fit a final model with the optimized feature set to make predictions
on new data.
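A minimal sketch of the final-model step. To stay self-contained it uses the `fselect()` convenience function, which constructs and optimizes the instance in one call, in place of the separate `fsi()` and `$optimize()` steps the README walks through. The resampling, `batch_size`, evaluation budget and learner settings are assumptions, and the first five rows of the spam task only stand in for genuinely new data.

```r
library("mlr3verse")

set.seed(1)
task = tsk("spam")
learner = lrn("classif.svm", type = "C-classification")

# fselect() builds the instance and runs the fselector in one call
# (used here instead of fsi() + $optimize())
instance = fselect(
  fselector = fs("random_search", batch_size = 5),
  task = task,
  learner = learner,
  resampling = rsmp("holdout"),
  measures = msr("classif.ce"),
  term_evals = 20
)

# Restrict the task to the optimized feature set and fit the final model
task$select(instance$result_feature_set)
learner$train(task)

# Stand-in for new data: the first five rows, selected features only
newdata = task$data(rows = 1:5, cols = task$feature_names)
prediction = learner$predict_newdata(newdata)
prediction
```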