Merge pull request #1049 from PowerGridModel/feature/split-validation-fails-xfail-raises

mgovers · web-flow · commit 098ad7734f62 · 2025-07-25T08:11:51.000Z
Validation cases: split validation case `fail` into `xfail` and `raises`
diff --git a/docs/examples/Make Test Dataset.ipynb b/docs/examples/Make Test Dataset.ipynb
@@ -87,7 +87,9 @@
     "\n",
     "You need to specify the method to use for the calculation, the relative and absolute tolerance to compare the calculation results with the reference results. For `rtol` you always give one number. For `atol` you can also give one number, or you can give a dictionary with regular expressions to match the attribute names. In this way you can have fine control of individual tolerance for each attribut (e.g. active/reactive power). In the example it has an absolute tolerance of `1e-4` for attributes which ends with `_residual` and `1e-8` for everything else.\n",
     "\n",
-    "The `calculation_method` can be one string or list of strings. In the latter case, the test program will run the validation test mutilple times using all the specified methods."
+    "The `calculation_method` can be a single string or a list of strings. In the latter case, the test program will run the validation test multiple times, once for each specified method.\n",
+    "\n",
+    "See [below](#detailed-configuration-with-the-paramsjson) for details."
    ]
   },
   {
@@ -258,9 +260,9 @@
       "      {\"id\": 6, \"u_rated\": 10500}\n",
       "    ],\n",
       "    \"line\": [\n",
-      "      {\"id\": 3, \"from_node\": 1, \"to_node\": 2, \"from_status\": 1, \"to_status\": 1, \"r1\": 0.25, \"x1\": 0.2, \"c1\": 1e-05, \"tan1\": 0, \"i_n\": 1000},\n",
-      "      {\"id\": 5, \"from_node\": 2, \"to_node\": 6, \"from_status\": 1, \"to_status\": 1, \"r1\": 0.25, \"x1\": 0.2, \"c1\": 1e-05, \"tan1\": 0, \"i_n\": 1000},\n",
-      "      {\"id\": 8, \"from_node\": 1, \"to_node\": 6, \"from_status\": 1, \"to_status\": 1, \"r1\": 0.25, \"x1\": 0.2, \"c1\": 1e-05, \"tan1\": 0, \"i_n\": 1000}\n",
+      "      {\"id\": 3, \"from_node\": 1, \"to_node\": 2, \"from_status\": 1, \"to_status\": 1, \"r1\": 0.25, \"x1\": 0.20000000000000001, \"c1\": 1.0000000000000001e-05, \"tan1\": 0, \"i_n\": 1000},\n",
+      "      {\"id\": 5, \"from_node\": 2, \"to_node\": 6, \"from_status\": 1, \"to_status\": 1, \"r1\": 0.25, \"x1\": 0.20000000000000001, \"c1\": 1.0000000000000001e-05, \"tan1\": 0, \"i_n\": 1000},\n",
+      "      {\"id\": 8, \"from_node\": 1, \"to_node\": 6, \"from_status\": 1, \"to_status\": 1, \"r1\": 0.25, \"x1\": 0.20000000000000001, \"c1\": 1.0000000000000001e-05, \"tan1\": 0, \"i_n\": 1000}\n",
       "    ],\n",
       "    \"sym_load\": [\n",
       "      {\"id\": 4, \"node\": 2, \"status\": 1, \"type\": 0, \"p_specified\": 20000000, \"q_specified\": 5000000},\n",
@@ -422,6 +424,95 @@
     "\n",
     "print(batch_result[ComponentType.sym_load][\"p\"])"
    ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "3c018413",
+   "metadata": {},
+   "source": [
+    "## Detailed configuration with the params.json\n",
+    "\n",
+    "### Validation cases with exceptions\n",
+    "\n",
+    "In certain cases, you may want to create test cases that \"should raise an exception\", that \"currently have a behavior that differs from the intended one\", or both. Examples are:\n",
+    "\n",
+    "* To create a test case for a known [explicitly forbidden input](../advanced_documentation/terminology.md#bad-input), such as an unobservable grid.\n",
+    "  * In this case, the validation case _should raise an exception_.\n",
+    "* To create a repro case for a bug.\n",
+    "  * In this case, the validation case _currently has behavior that differs from the intended behavior_.\n",
+    "* To create a test for behavior that is not implemented yet, like a new calculation method.\n",
+    "  * In this case, the validation case _currently has behavior that differs from the intended behavior_.\n",
+    "* To create a behavioral test for bad input when applying BDD (behavior driven development) practices.\n",
+    "  * In this case, the validation case both _should raise an exception_ and _currently has behavior that differs from the intended behavior_.\n",
+    "\n",
+    "To support these use cases, two additional keywords are exposed: `raises` (to denote that raising is the intended behavior) and `xfail` (to denote that the current behavior differs from the intended behavior). Each value is a dict with a `raises` entry that names the exact intended/expected exception type and a `reason` entry that gives a human-readable explanation of why that exception is intended/expected. Only exceptions known to the PGM tests can be used in the configuration. Note that the list of known exceptions is not exhaustive, so it may be necessary to add a new exception type to it.\n",
+    "\n",
+    "In addition to the PGM exception types and some Python built-in exception types, there are two types worth explicitly mentioning:\n",
+    "\n",
+    "* `AssertionError` can be used to denote tests in which the actual values are different from the expected values.\n",
+    "* `Failed` can be used to denote tests that should raise (`raises`) but actually do not raise any exceptions at all (`xfail`).\n",
+    "  * **NOTE:** If `_pytest.outcomes.Failed` is not found, the tests fall back to the default [`pytest.mark.xfail`](https://docs.pytest.org/en/stable/reference/reference.html#pytest-mark-xfail) implementation. In that case, a test that throws any exception other than the intended one is marked `xfail` instead of `fail`. This is a known limitation of the current implementation.\n",
+    "\n",
+    "The following example shows how to create a test for future behavior.\n",
+    "\n",
+    "```json\n",
+    "{\n",
+    "  \"calculation_method\": \"iterative_linear\",\n",
+    "  \"rtol\": 1e-8,\n",
+    "  \"atol\": {\n",
+    "    \"default\": 1e-8,\n",
+    "    \".+_residual\": 1e-4\n",
+    "  },\n",
+    "  \"raises\": {\n",
+    "    \"raises\": \"NotObservableError\",\n",
+    "    \"reason\": \"The grid does not contain a voltage sensor\"\n",
+    "  },\n",
+    "  \"xfail\": {\n",
+    "    \"raises\": \"SparseMatrixError\",\n",
+    "    \"reason\": \"A sufficient observability check for meshed grids is not yet implemented. See also https://github.com/PowerGridModel/power-grid-model/issues/864.\"\n",
+    "  }\n",
+    "}\n",
+    "```\n",
+    "\n",
+    "By default, the Power Grid Model test configuration accepts `XFAIL` cases - because they represent known issues - but rejects `XPASS` cases, i.e., the actual behavior was as intended, even though we expected it to be different. `XPASS` cases can happen because a known issue was fixed, or it can be the result of a newly introduced bug. It is up to the developer to resolve the conflict. Providing a good reason when marking something `xfail` can help with this decision. See https://docs.pytest.org/en/stable/reference/reference.html#pytest-mark-xfail for details.\n",
+    "\n",
+    "The following table shows how the different configurations can be used. Exception types `AError`, `BError` and `CError` denote exceptions known to the PGM tests.\n",
+    "\n",
+    "| Intended behavior | Expected behavior | Keywords                                                          |              Actual behavior: pass              |           Actually raises `AError`            |                                                         Actually raises `BError`                                                          |                                                         Actually raises `CError`                                                          |\n",
+    "| ----------------- | ----------------- | ----------------------------------------------------------------- | :---------------------------------------------: | :-------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------: |\n",
+    "| pass              | pass              | `{}`                                                              | <span style=\"color : green\">**.** (PASS)</span> |  <span style=\"color : red\">F (AError)</span>  |                                                <span style=\"color : red\">F (BError)</span>                                                |                                                <span style=\"color : red\">F (CError)</span>                                                |\n",
+    "| pass              | raises `AError`   | `{\"xfail\": {\"raises\": \"AError\"}}`                                 |   <span style=\"color : red\">F (XPASS)</span>    | <span style=\"color : orange\">x (XFAIL)</span> |                                                <span style=\"color : red\">F (BError)</span>                                                |                                                <span style=\"color : red\">F (CError)</span>                                                |\n",
+    "| raises `AError`   | pass              | `{\"raises\": {\"raises\": \"AError\"}, \"xfail\": {\"raises\": \"Failed\"}}` |  <span style=\"color : orange\">x (XFAIL)</span>  |  <span style=\"color : red\">F (XPASS)</span>   | <span style=\"color : red\">F (BError)</span> if `_pytest.outcomes.Failed` is available, else <span style=\"color : orange\">x (XFAIL)</span> | <span style=\"color : red\">F (CError)</span> if `_pytest.outcomes.Failed` is available, else <span style=\"color : orange\">x (XFAIL)</span> |\n",
+    "| raises `AError`   | raises `AError`   | `{\"raises\": {\"raises\": \"AError\"}}`                                |   <span style=\"color : red\">F (Failed)</span>   |  <span style=\"color : green\">. (PASS)</span>  |                                                <span style=\"color : red\">F (BError)</span>                                                |                                                <span style=\"color : red\">F (CError)</span>                                                |\n",
+    "| raises `AError`   | raises `BError`   | `{\"raises\": {\"raises\": \"AError\"}, \"xfail\": {\"raises\": \"BError\"}}` |   <span style=\"color : red\">F (Failed)</span>   |  <span style=\"color : red\">F (XPASS)</span>   |                                               <span style=\"color : orange\">x (XFAIL)</span>                                               |                                                <span style=\"color : red\">F (CError)</span>                                                |\n",
+    "\n",
+    "### Calculation method-specific configuration\n",
+    "\n",
+    "In rare cases, for instance when creating a new calculation method, calculation method-specific configuration may be necessary. This can be done via the `extra_params` keyword. For each calculation method that needs overrides, it contains an object similar to the root `params.json`, which is applied as a patch for that specific calculation method run. The following example illustrates this.\n",
+    "\n",
+    "```json\n",
+    "{\n",
+    "  \"calculation_method\": [\n",
+    "    \"iterative_linear\",\n",
+    "    \"newton_raphson\"\n",
+    "  ],\n",
+    "  \"rtol\": 1e-8,\n",
+    "  \"atol\": {\n",
+    "    \"default\": 1e-8,\n",
+    "    \".+_residual\": 5e-4\n",
+    "  },\n",
+    "  \"extra_params\": {\n",
+    "    \"newton_raphson\": {\n",
+    "      \"experimental_features\": \"enabled\",\n",
+    "      \"xfail\": {\n",
+    "        \"raises\": \"SparseMatrixError\",\n",
+    "        \"reason\": \"Current sensors are not yet implemented for this calculation method\"\n",
+    "      }\n",
+    "    }\n",
+    "  }\n",
+    "}\n",
+    "```"
+   ]
   }
  ],
  "metadata": {
@@ -440,7 +531,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.13.0"
+   "version": "3.13.5"
   }
  },
  "nbformat": 4,
diff --git a/tests/cpp_validation_tests/test_validation.cpp b/tests/cpp_validation_tests/test_validation.cpp
@@ -388,7 +388,7 @@ std::optional<CaseParam> construct_case(std::filesystem::path const& case_dir, j
         }
     }
 
-    param.fail = calculation_method_params.contains("fail");
+    param.fail = calculation_method_params.contains("xfail") || calculation_method_params.contains("raises");
     if (calculation_type == "short_circuit") {
         calculation_method_params.at("short_circuit_voltage_scaling").get_to(param.short_circuit_voltage_scaling);
     }
diff --git a/tests/data/power_flow/automatic-tap-regulator/auto-tap-changer-meshed-any-max-iter/params.json b/tests/data/power_flow/automatic-tap-regulator/auto-tap-changer-meshed-any-max-iter/params.json
@@ -3,7 +3,7 @@
   "tap_changing_strategy": "any_valid_tap",
   "rtol": 1e-05,
   "atol": 1e-05,
-  "fail": {
+  "raises": {
     "raises": "MaxIterationReached",
     "reason": "TapPositionOptimizer::iterate"
   }
diff --git a/tests/data/power_flow/non-existent-id-update-batch/params.json b/tests/data/power_flow/non-existent-id-update-batch/params.json
@@ -2,7 +2,7 @@
   "calculation_method": "linear",
   "rtol": 1e-5,
   "atol": 1e-5,
-  "fail": {
+  "raises": {
     "raises": "PowerGridError",
     "reason": "Same invalid ID in update data set"
   }
diff --git a/tests/data/state_estimation/current-sensor/global-current-sensor/params.json b/tests/data/state_estimation/current-sensor/global-current-sensor/params.json
@@ -11,7 +11,7 @@
   "extra_params": {
     "newton_raphson": {
       "experimental_features": "enabled",
-      "fail": {
+      "xfail": {
         "raises": "SparseMatrixError",
         "reason": "Current sensors are not yet implemented for this calculation method"
       }
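
The notebook describes `extra_params` as a patch that is applied on top of the root `params.json` for one calculation method run. The Python sketch below illustrates that idea under stated assumptions: the helper name `params_for_method` is hypothetical, and the shallow, top-level merge is a guess at the semantics, not the actual test harness code.

```python
import copy


def params_for_method(params: dict, method: str) -> dict:
    """Return the effective parameters for one calculation method run.

    Assumption: keys under extra_params[method] override top-level keys
    of the root params one by one (a shallow patch).
    """
    # Start from the root parameters, without the extra_params section itself.
    merged = {
        key: copy.deepcopy(value)
        for key, value in params.items()
        if key != "extra_params"
    }
    # Each run uses exactly one calculation method.
    merged["calculation_method"] = method
    # Apply the method-specific patch, if there is one.
    patch = params.get("extra_params", {}).get(method, {})
    merged.update(copy.deepcopy(patch))
    return merged


# Root parameters modelled on the global-current-sensor case in this diff.
root_params = {
    "calculation_method": ["iterative_linear", "newton_raphson"],
    "rtol": 1e-8,
    "atol": {"default": 1e-8, ".+_residual": 5e-4},
    "extra_params": {
        "newton_raphson": {
            "experimental_features": "enabled",
            "xfail": {
                "raises": "SparseMatrixError",
                "reason": "Current sensors are not yet implemented for this calculation method",
            },
        }
    },
}

linear_run = params_for_method(root_params, "iterative_linear")
newton_run = params_for_method(root_params, "newton_raphson")
```

With this model, only the `newton_raphson` run carries the `xfail` entry, while the `iterative_linear` run is expected to pass as usual.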
diff --git a/tests/data/state_estimation/ill-conditioned-system/leaf-without-power-sensor-meshed/params.json b/tests/data/state_estimation/ill-conditioned-system/leaf-without-power-sensor-meshed/params.json
@@ -5,7 +5,7 @@
     "default": 1e-8,
     ".+_residual": 5e-4
   },
-  "fail": {
+  "xfail": {
     "raises": "SparseMatrixError",
     "reason": "Bug in Sparse LU solver found in #864"
   }
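
The `atol` entry in these `params.json` files can be a single number or a dictionary that maps regular expressions to tolerances. The Python sketch below shows one way such a dictionary could be resolved for a given attribute. The helper name `resolve_atol` is hypothetical, and both the use of a full match and the treatment of `default` as the fallback key are assumptions; the real test utilities may differ.

```python
import re
from typing import Union


def resolve_atol(atol: Union[float, dict], attribute: str) -> float:
    """Pick the absolute tolerance for one attribute name.

    Assumptions: patterns must match the whole attribute name, and the
    "default" key is used when no pattern matches.
    """
    if not isinstance(atol, dict):
        # A single number applies to every attribute.
        return float(atol)
    for pattern, value in atol.items():
        if pattern != "default" and re.fullmatch(pattern, attribute):
            return value
    return atol["default"]


atol_config = {"default": 1e-8, ".+_residual": 5e-4}
residual_tolerance = resolve_atol(atol_config, "u_residual")
power_tolerance = resolve_atol(atol_config, "p")
```

Here `u_residual` picks up the looser `5e-4` tolerance, while `p` falls back to the default.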
diff --git a/tests/data/state_estimation/not-independent-sensor/not-independent-sensor-with-phasor-sensors/params.json b/tests/data/state_estimation/not-independent-sensor/not-independent-sensor-with-phasor-sensors/params.json
@@ -5,7 +5,7 @@
     "default": 1e-8,
     ".+_residual": 5e-4
   },
-  "fail": {
+  "raises": {
     "raises": "NotObservableError",
     "reason": "Voltage phasor, power and current sensor's measurements are not independent"
   }
diff --git a/tests/data/state_estimation/not-independent-sensor/not-independent-sensor/params.json b/tests/data/state_estimation/not-independent-sensor/not-independent-sensor/params.json
@@ -5,7 +5,7 @@
     "default": 1e-8,
     ".+_residual": 5e-4
   },
-  "fail": {
+  "raises": {
     "raises": "NotObservableError",
     "reason": "Power measurements are not independent"
   }
diff --git a/tests/unit/test_0Z_model_validation.py b/tests/unit/test_0Z_model_validation.py
diff --git a/tests/unit/utils.py b/tests/unit/utils.py
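
The outcome table in the notebook can be read as a two-step decision: first compare the actual behavior with the intended behavior (`raises`), then compare any resulting failure with the expected deviation (`xfail`). The Python sketch below is a simplified model of that table for the case where `_pytest.outcomes.Failed` is available. The function name `classify` is hypothetical, and the code is not taken from the test harness.

```python
from typing import Optional


def classify(params: dict, actual: Optional[str]) -> str:
    """Model the outcome of a validation case.

    params: parsed params.json content; only `raises` and `xfail` are read.
    actual: name of the exception actually raised, or None if none was raised.
    """
    intended = params.get("raises", {}).get("raises")  # None: should pass
    expected = params.get("xfail", {}).get("raises")   # None: no known deviation

    # Step 1: compare the actual behavior with the intended behavior.
    if actual == intended:
        failure = None        # behaves as intended
    elif actual is None:
        failure = "Failed"    # should have raised, but did not
    else:
        failure = actual      # an unintended exception propagates

    # Step 2: compare the failure with the expected deviation.
    if expected is None:
        return "PASS" if failure is None else f"F ({failure})"
    if failure is None:
        return "F (XPASS)"    # unexpectedly behaves as intended
    if failure == expected:
        return "x (XFAIL)"    # known issue
    return f"F ({failure})"


both = {"raises": {"raises": "AError"}, "xfail": {"raises": "BError"}}
outcomes = {actual: classify(both, actual) for actual in (None, "AError", "BError", "CError")}
```

For the last table row (`raises` set to `AError`, `xfail` set to `BError`), this model reproduces the four cells: `F (Failed)`, `F (XPASS)`, `x (XFAIL)` and `F (CError)`.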
