minor fix and sync exercise

annabelle1217 · annabelle1217 · commit 8b4d6b31ea42 · 2021-10-06T18:59:31.000+08:00
diff --git a/day_1/4E_Feature Selection.ipynb b/day_1/4E_Feature Selection.ipynb
@@ -174,9 +174,9 @@
     "\n",
     "- `SelectKBest` removes all but the  highest scoring features\n",
     "\n",
-    "- `SelectPercentile` removes all but a user-specified highest scoring percentage of features using common univariate statistical tests for each feature: false positive rate `SelectFpr`, false discovery rate `SelectFdr`, or family wise error `SelectFwe`.\n",
+    "- `SelectPercentile` removes all but a user-specified highest scoring percentage of features \n",
     "\n",
-    "- `GenericUnivariateSelect` allows to perform univariate feature selection with a configurable strategy. This allows to select the best univariate selection strategy with hyper-parameter search estimator.\n",
+    "- `GenericUnivariateSelect` performs univariate feature selection with a configurable strategy\n",
     "\n",
     "These objects take as input a scoring function that returns univariate scores and p-values (or only scores for `SelectKBest` and `SelectPercentile`):\n",
     "\n",
@@ -210,15 +210,17 @@
    "id": "7e76a9cc",
    "metadata": {},
    "source": [
-    "### Pearson Correlation Coefficient\n",
+    "### Correlation Coefficient\n",
     "Correlation is a measure of the linear relationship of 2 or more variables. We would assume that the **good variables** are **highly correlated** with the target. Also, sometimes we would want to remove either one of the two variables that are highly correlated. \n",
     "<br><br>\n",
     "<div align=\"center\">\n",
     "  <img alt=\"Several sets of (x, y) points, with the correlation coefficient of x and y for each set.\" src=\"https://upload.wikimedia.org/wikipedia/commons/thumb/d/d4/Correlation_examples2.svg/1920px-Correlation_examples2.svg.png\" width=\"400\" height=\"200\"><br>\n",
     "  <sup>Sample datasets and their pearson correlation coefficients.<sup>\n",
     "</div>\n",
     "      \n",
-    "We will show an example that drop the variable which has a lower correlation coefficient value with the target variable. We need to set an absolute value, for example, 0.4 as the threshold for selecting the variables."
+    "We will show an example that drops the variables whose *Pearson* correlation coefficient with the target variable is low. We need to set a threshold on the absolute value of the coefficient, for example 0.4, for selecting the variables.\n",
+    "      \n",
+    "You may use other correlation coefficients such as *Spearman* or *Kendall*."
    ]
   },
   {
@@ -248,7 +250,9 @@
    "metadata": {},
    "source": [
     "### Variance Threshold\n",
-    "`VarianceThreshold` is a simple baseline approach to feature selection. It removes all features whose variance doesn’t meet some threshold. By default, it removes all zero-variance features, i.e. features that have the same value in all samples."
+    "`VarianceThreshold` is a simple baseline approach to feature selection. It removes all features whose variance doesn’t meet some threshold. By default, it removes all zero-variance features, i.e. features that have the same value in all samples.\n",
+    "\n",
+    "The estimator only works with numeric data and it will raise an error if there are categorical features present in the dataframe."
    ]
   },
   {
@@ -365,7 +369,7 @@
    "metadata": {},
    "source": [
     "### Exhaustive Feature Selection\n",
-    "This is a brute-force evaluation of each feature subset. It tries every possible combination of the variables and returns the best performing subset but also take longer time."
+    "This is a brute-force evaluation of each feature subset. It tries every possible combination of the variables and returns the best performing subset, but it also takes longer to run."
    ]
   },
   {
diff --git a/day_1/4S_Feature Selection.ipynb b/day_1/4S_Feature Selection.ipynb
@@ -340,9 +340,9 @@
     "\n",
     "- `SelectKBest` removes all but the  highest scoring features\n",
     "\n",
-    "- `SelectPercentile` removes all but a user-specified highest scoring percentage of features using common univariate statistical tests for each feature: false positive rate `SelectFpr`, false discovery rate `SelectFdr`, or family wise error `SelectFwe`.\n",
+    "- `SelectPercentile` removes all but a user-specified highest scoring percentage of features \n",
     "\n",
-    "- `GenericUnivariateSelect` allows to perform univariate feature selection with a configurable strategy. This allows to select the best univariate selection strategy with hyper-parameter search estimator.\n",
+    "- `GenericUnivariateSelect` performs univariate feature selection with a configurable strategy\n",
     "\n",
     "These objects take as input a scoring function that returns univariate scores and p-values (or only scores for `SelectKBest` and `SelectPercentile`):\n",
     "\n",
@@ -409,9 +409,9 @@
     "  <sup>Sample datasets and their pearson correlation coefficients.<sup>\n",
     "</div>\n",
     "      \n",
-    "We will show an example that drop the variable which has a lower Pearson correlation coefficient value with the target variable. We need to set an absolute value, for example, 0.4 as the threshold for selecting the variables.\n",
+    "We will show an example that drops the variables whose *Pearson* correlation coefficient with the target variable is low. We need to set a threshold on the absolute value of the coefficient, for example 0.4, for selecting the variables.\n",
     "      \n",
-    "You may use other correlation coefficient such as Spearman or Kendall."
+    "You may use other correlation coefficients such as *Spearman* or *Kendall*."
    ]
   },
   {
@@ -655,7 +655,7 @@
    "metadata": {},
    "source": [
     "### Exhaustive Feature Selection\n",
-    "This is a brute-force evaluation of each feature subset. It tries every possible combination of the variables and returns the best performing subset but also take longer time."
+    "This is a brute-force evaluation of each feature subset. It tries every possible combination of the variables and returns the best performing subset, but it also takes longer to run."
    ]
   },
   {