Commit 8b4d6b3

minor fix and sync exercise
1 parent f188d4c commit 8b4d6b3

2 files changed: +15 -11 lines changed

day_1/4E_Feature Selection.ipynb

Lines changed: 10 additions & 6 deletions
@@ -174,9 +174,9 @@
 "\n",
 "- `SelectKBest` removes all but the highest scoring features\n",
 "\n",
-"- `SelectPercentile` removes all but a user-specified highest scoring percentage of features using common univariate statistical tests for each feature: false positive rate `SelectFpr`, false discovery rate `SelectFdr`, or family wise error `SelectFwe`.\n",
+"- `SelectPercentile` removes all but a user-specified highest scoring percentage of features \n",
 "\n",
-"- `GenericUnivariateSelect` allows to perform univariate feature selection with a configurable strategy. This allows to select the best univariate selection strategy with hyper-parameter search estimator.\n",
+"- `GenericUnivariateSelect` allows to perform univariate feature selection with a configurable strategy\n",
 "\n",
 "These objects take as input a scoring function that returns univariate scores and p-values (or only scores for `SelectKBest` and `SelectPercentile`):\n",
 "\n",
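
Note on this hunk: the three selectors described in the changed cell can be compared side by side. The following is a minimal sketch, assuming scikit-learn is installed; the iris data and the `f_classif` scoring function are illustrative choices, not necessarily what the notebook uses.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import (
    GenericUnivariateSelect,
    SelectKBest,
    SelectPercentile,
    f_classif,
)

# Illustrative data only: 150 samples, 4 numeric features, 3 classes.
X, y = load_iris(return_X_y=True)

# Keep the 2 features with the highest ANOVA F-scores.
kbest = SelectKBest(score_func=f_classif, k=2).fit(X, y)

# Keep the top 50% of features by score.
percentile = SelectPercentile(score_func=f_classif, percentile=50).fit(X, y)

# Same strategy as SelectKBest, expressed through the configurable selector.
generic = GenericUnivariateSelect(score_func=f_classif, mode="k_best", param=2).fit(X, y)

X_kbest = kbest.transform(X)
X_percentile = percentile.transform(X)
X_generic = generic.transform(X)
```

With `mode="k_best"` and `param=2`, `GenericUnivariateSelect` should pick the same columns as `SelectKBest(k=2)`; other modes such as `"percentile"` switch the strategy without changing the surrounding code.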
@@ -210,15 +210,17 @@
 "id": "7e76a9cc",
 "metadata": {},
 "source": [
-"### Pearson Correlation Coefficient\n",
+"### Correlation Coefficient\n",
 "Correlation is a measure of the linear relationship of 2 or more variables. We would assume that the **good variables** are **highly correlated** with the target. Also, sometimes we would want to remove either one of the two variables that are highly correlated. \n",
 "<br><br>\n",
 "<div align=\"center\">\n",
 " <img alt=\"Several sets of (x, y) points, with the correlation coefficient of x and y for each set.\" src=\"https://upload.wikimedia.org/wikipedia/commons/thumb/d/d4/Correlation_examples2.svg/1920px-Correlation_examples2.svg.png\" width=\"400\" height=\"200\"><br>\n",
 " <sup>Sample datasets and their pearson correlation coefficients.<sup>\n",
 "</div>\n",
 " \n",
-"We will show an example that drop the variable which has a lower correlation coefficient value with the target variable. We need to set an absolute value, for example, 0.4 as the threshold for selecting the variables."
+"We will show an example that drop the variable which has a lower *Pearson* correlation coefficient value with the target variable. We need to set an absolute value, for example, 0.4 as the threshold for selecting the variables.\n",
+" \n",
+"You may use other correlation coefficient such as *Spearman* or *Kendall*."
 ]
 },
 {
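
Note on this hunk: the thresholding step described in the cell can be sketched as follows. This assumes pandas is available; the helper name `select_by_target_correlation` and the toy columns are hypothetical, and only the 0.4 threshold comes from the notebook text.

```python
import pandas as pd


def select_by_target_correlation(df, target, threshold=0.4, method="pearson"):
    """Return the feature names whose absolute correlation with `target`
    is at least `threshold`. `method` is passed to DataFrame.corr, which
    accepts "pearson", "spearman" or "kendall"."""
    corr_with_target = df.corr(method=method)[target].drop(target)
    return list(corr_with_target[corr_with_target.abs() >= threshold].index)


# Toy data (hypothetical): one feature tracks the target, one does not.
df = pd.DataFrame(
    {
        "x_strong": [1, 2, 3, 4, 5, 6],
        "x_weak": [1, -1, 1, -1, 1, -1],
        "y": [2, 4, 6, 8, 10, 12],
    }
)

selected = select_by_target_correlation(df, target="y", threshold=0.4)
```

Here `x_weak` has a Pearson coefficient of roughly -0.29 with `y`, so it falls below the 0.4 cut-off and only `x_strong` is kept. Passing `method="spearman"` or `method="kendall"` swaps the coefficient without changing the rest of the helper.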
@@ -248,7 +250,9 @@
 "metadata": {},
 "source": [
 "### Variance Threshold\n",
-"`VarianceThreshold` is a simple baseline approach to feature selection. It removes all features whose variance doesn’t meet some threshold. By default, it removes all zero-variance features, i.e. features that have the same value in all samples."
+"`VarianceThreshold` is a simple baseline approach to feature selection. It removes all features whose variance doesn’t meet some threshold. By default, it removes all zero-variance features, i.e. features that have the same value in all samples.\n",
+"\n",
+"The estimator only works with numeric data and it will raise an error if there are categorical features present in the dataframe."
 ]
 },
 {
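
Note on this hunk: one way to respect the numeric-only restriction added here is to filter the dataframe before fitting. A minimal sketch, assuming pandas and scikit-learn; the toy dataframe and its column names are hypothetical.

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Toy dataframe (hypothetical): one constant column, one varying column,
# and one string-valued categorical column.
df = pd.DataFrame(
    {
        "constant": [1.0, 1.0, 1.0, 1.0],
        "varying": [0.1, 0.5, 0.9, 1.3],
        "city": ["a", "b", "a", "c"],
    }
)

# Keep only numeric columns so the string column never reaches the estimator.
numeric = df.select_dtypes(include="number")

# Default threshold is 0.0, i.e. only zero-variance features are removed.
selector = VarianceThreshold()
selector.fit(numeric)

kept_columns = list(numeric.columns[selector.get_support()])
```

The categorical column is set aside rather than dropped from the analysis; it can be encoded separately and rejoined after selection if needed.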
@@ -365,7 +369,7 @@
 "metadata": {},
 "source": [
 "### Exhaustive Feature Selection\n",
-"This is a brute-force evaluation of each feature subset. It tries every possible combination of the variables and returns the best performing subset but also take longer time."
+"This is a brute-force evaluation of each feature subset. It tries every possible combination of the variables and returns the best performing subset but also takes longer time."
 ]
 },
 {
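
Note on this hunk: the brute-force idea can be illustrated with a hand-rolled loop over all subsets. This is only a sketch of the technique, not the notebook's implementation; `exhaustive_feature_search` and `toy_score` are hypothetical names, and in practice the scoring callback would be something like a cross-validated model score.

```python
from itertools import combinations


def exhaustive_feature_search(features, score_subset, min_size=1, max_size=None):
    """Score every subset of `features` with between `min_size` and
    `max_size` elements and return (best_subset, best_score, n_evaluated)."""
    max_size = len(features) if max_size is None else max_size
    best_subset, best_score = None, float("-inf")
    n_evaluated = 0
    for size in range(min_size, max_size + 1):
        for subset in combinations(features, size):
            score = score_subset(subset)
            n_evaluated += 1
            if score > best_score:
                best_subset, best_score = subset, score
    return best_subset, best_score, n_evaluated


def toy_score(subset):
    # Hypothetical scorer: reward the two "useful" features and apply a
    # small penalty per feature so that noise columns are not worth keeping.
    useful = {"age", "income"}
    return len(useful & set(subset)) - 0.1 * len(subset)


best_subset, best_score, n_evaluated = exhaustive_feature_search(
    ["age", "income", "noise"], toy_score
)
```

With n features the loop evaluates 2^n - 1 non-empty subsets (7 for the three features here), which is why the approach becomes slow as the feature count grows.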

day_1/4S_Feature Selection.ipynb

Lines changed: 5 additions & 5 deletions
@@ -340,9 +340,9 @@
 "\n",
 "- `SelectKBest` removes all but the highest scoring features\n",
 "\n",
-"- `SelectPercentile` removes all but a user-specified highest scoring percentage of features using common univariate statistical tests for each feature: false positive rate `SelectFpr`, false discovery rate `SelectFdr`, or family wise error `SelectFwe`.\n",
+"- `SelectPercentile` removes all but a user-specified highest scoring percentage of features \n",
 "\n",
-"- `GenericUnivariateSelect` allows to perform univariate feature selection with a configurable strategy. This allows to select the best univariate selection strategy with hyper-parameter search estimator.\n",
+"- `GenericUnivariateSelect` allows to perform univariate feature selection with a configurable strategy\n",
 "\n",
 "These objects take as input a scoring function that returns univariate scores and p-values (or only scores for `SelectKBest` and `SelectPercentile`):\n",
 "\n",
@@ -409,9 +409,9 @@
 " <sup>Sample datasets and their pearson correlation coefficients.<sup>\n",
 "</div>\n",
 " \n",
-"We will show an example that drop the variable which has a lower Pearson correlation coefficient value with the target variable. We need to set an absolute value, for example, 0.4 as the threshold for selecting the variables.\n",
+"We will show an example that drop the variable which has a lower *Pearson* correlation coefficient value with the target variable. We need to set an absolute value, for example, 0.4 as the threshold for selecting the variables.\n",
 " \n",
-"You may use other correlation coefficient such as Spearman or Kendall."
+"You may use other correlation coefficient such as *Spearman* or *Kendall*."
 ]
 },
 {
@@ -655,7 +655,7 @@
 "metadata": {},
 "source": [
 "### Exhaustive Feature Selection\n",
-"This is a brute-force evaluation of each feature subset. It tries every possible combination of the variables and returns the best performing subset but also take longer time."
+"This is a brute-force evaluation of each feature subset. It tries every possible combination of the variables and returns the best performing subset but also takes longer time."
 ]
 },
 {
