Skip to content

Commit 098ad77

Browse files
authored
Merge pull request #1049 from PowerGridModel/feature/split-validation-fails-xfail-raises
Validation cases: split validation case `fail` into `xfail` and `raises`
2 parents 13298b8 + 571bb52 commit 098ad77

File tree

10 files changed

+224
-89
lines changed

10 files changed

+224
-89
lines changed

docs/examples/Make Test Dataset.ipynb

Lines changed: 96 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -87,7 +87,9 @@
8787
"\n",
8888
"You need to specify the method to use for the calculation, the relative and absolute tolerance to compare the calculation results with the reference results. For `rtol` you always give one number. For `atol` you can also give one number, or you can give a dictionary with regular expressions to match the attribute names. In this way you can have fine control of individual tolerance for each attribut (e.g. active/reactive power). In the example it has an absolute tolerance of `1e-4` for attributes which ends with `_residual` and `1e-8` for everything else.\n",
8989
"\n",
90-
"The `calculation_method` can be one string or list of strings. In the latter case, the test program will run the validation test mutilple times using all the specified methods."
90+
"The `calculation_method` can be one string or list of strings. In the latter case, the test program will run the validation test mutilple times using all the specified methods.\n",
91+
"\n",
92+
"See [below](#detailed-configuration-with-the-paramsjson) for details."
9193
]
9294
},
9395
{
@@ -258,9 +260,9 @@
258260
" {\"id\": 6, \"u_rated\": 10500}\n",
259261
" ],\n",
260262
" \"line\": [\n",
261-
" {\"id\": 3, \"from_node\": 1, \"to_node\": 2, \"from_status\": 1, \"to_status\": 1, \"r1\": 0.25, \"x1\": 0.2, \"c1\": 1e-05, \"tan1\": 0, \"i_n\": 1000},\n",
262-
" {\"id\": 5, \"from_node\": 2, \"to_node\": 6, \"from_status\": 1, \"to_status\": 1, \"r1\": 0.25, \"x1\": 0.2, \"c1\": 1e-05, \"tan1\": 0, \"i_n\": 1000},\n",
263-
" {\"id\": 8, \"from_node\": 1, \"to_node\": 6, \"from_status\": 1, \"to_status\": 1, \"r1\": 0.25, \"x1\": 0.2, \"c1\": 1e-05, \"tan1\": 0, \"i_n\": 1000}\n",
263+
" {\"id\": 3, \"from_node\": 1, \"to_node\": 2, \"from_status\": 1, \"to_status\": 1, \"r1\": 0.25, \"x1\": 0.20000000000000001, \"c1\": 1.0000000000000001e-05, \"tan1\": 0, \"i_n\": 1000},\n",
264+
" {\"id\": 5, \"from_node\": 2, \"to_node\": 6, \"from_status\": 1, \"to_status\": 1, \"r1\": 0.25, \"x1\": 0.20000000000000001, \"c1\": 1.0000000000000001e-05, \"tan1\": 0, \"i_n\": 1000},\n",
265+
" {\"id\": 8, \"from_node\": 1, \"to_node\": 6, \"from_status\": 1, \"to_status\": 1, \"r1\": 0.25, \"x1\": 0.20000000000000001, \"c1\": 1.0000000000000001e-05, \"tan1\": 0, \"i_n\": 1000}\n",
264266
" ],\n",
265267
" \"sym_load\": [\n",
266268
" {\"id\": 4, \"node\": 2, \"status\": 1, \"type\": 0, \"p_specified\": 20000000, \"q_specified\": 5000000},\n",
@@ -422,6 +424,95 @@
422424
"\n",
423425
"print(batch_result[ComponentType.sym_load][\"p\"])"
424426
]
427+
},
428+
{
429+
"cell_type": "markdown",
430+
"id": "3c018413",
431+
"metadata": {},
432+
"source": [
433+
"## Detailed configuration with the params.json\n",
434+
"\n",
435+
"### Validation cases with exceptions\n",
436+
"\n",
437+
"In certain cases, you may want to create test cases that either \"should raise an exception\" and/or \"currently have a behavior that differs from the intended one\". Examples are:\n",
438+
"\n",
439+
"* To create a test case for a known [explicitly forbidden input](../advanced_documentation/terminology.md#bad-input), such as an unobservable grid.\n",
440+
" * In this case, the validation case _should raise an exception_.\n",
441+
"* To create a repro case for a bug.\n",
442+
" * In this case, the validation case _currently has behavior that differs from the intended behavior_.\n",
443+
"* To create a test for behavior that is not implemented yet, like a new calculation method.\n",
444+
" * In this case, the validation case _currently has behavior that differs from the intended behavior_.\n",
445+
"* To create a behavioral test for bad input when applying BDD (behavior driven development) practices.\n",
446+
" * In this case, the validation case both _should raise an exception_ and _currently has behavior that differs from the intended behavior_.\n",
447+
"\n",
448+
"To support the two use cases, two additional keywords are exposed: `raises` (to denote that raising is intended behavior) and `xfail` (to denote that the current behavior differs from the intended behavior). The values are dicts that contain a `raises` phrase and a `reason` to denote the exact intended/expected exception type, as well as a human-readable explanation about why that exception is intended/expected. Only exceptions known to the PGM tests can be added to the configuration. Note that the list of known exceptions is non-exhaustive, so it may be necessary to add a new exception type to the list of known exceptions.\n",
449+
"\n",
450+
"In addition to the PGM exception types and some Python built-in exception types, there are two types worth explicitly mentioning:\n",
451+
"\n",
452+
"* `AssertionError` can be used to denote tests in which the actual values are different from the expected values.\n",
453+
"* `Failed` can be used to denote tests that should raise (`raises`) but actually do not raise any exceptions at all (`xfail`).\n",
454+
" * **NOTE:** If `_pytest.outcomes.Failed` is not found, there is a fall-back to the default [`pytest.mark.xfail`](https://docs.pytest.org/en/stable/reference/reference.html#pytest-mark-xfail) implementation. That means, that a test is marked `xfail` if it throws any exception that is not the intended exception, not `fail`. This is a known limitation of the current implementation.\n",
455+
"\n",
456+
"The following example shows how a test can be created intended to test future behavior.\n",
457+
"\n",
458+
"```json\n",
459+
"{\n",
460+
" \"calculation_method\": \"iterative_linear\",\n",
461+
" \"rtol\": 1e-8,\n",
462+
" \"atol\": {\n",
463+
" \"default\": 1e-8,\n",
464+
" \".+_residual\": 1e-4\n",
465+
" },\n",
466+
" \"raises\": {\n",
467+
" \"raises\": \"NotObservableError\",\n",
468+
" \"reason\": \"The grid does not contain a voltage sensor\"\n",
469+
" },\n",
470+
" \"xfail\": {\n",
471+
" \"raises\": \"SparseMatrixError\",\n",
472+
" \"reason\": \"A sufficient observability check for meshed grids is not yet implemented. See also https://github.com/PowerGridModel/power-grid-model/issues/864.\"\n",
473+
" }\n",
474+
"}\n",
475+
"```\n",
476+
"\n",
477+
"By default, the Power Grid Model test configuration accepts `XFAIL` cases - because they represent known issues - but rejects `XPASS` cases, i.e., the actual behavior was as intended, even though we expected it to be different. `XPASS` cases can happen because a known issue was fixed, or it can be the result of a newly introduced bug. It is up to the developer to resolve the conflict. Providing a good reason when marking something `xfail` can help with this decision. See https://docs.pytest.org/en/stable/reference/reference.html#pytest-mark-xfail for details.\n",
478+
"\n",
479+
"The following table shows how the different configurations can be used. Exception types `AError`, `BError` and `CError` denote exceptions known to the PGM tests.\n",
480+
"\n",
481+
"| Intended behavior | Expected behavior | Keywords | Actual behavior: pass | Actually raises `AError` | Actually raises `BError` | actually raises `CError` |\n",
482+
"| ----------------- | ----------------- | ----------------------------------------------------------------- | :---------------------------------------------: | :-------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------: |\n",
483+
"| pass | pass | `{}` | <span style=\"color : green\">**.** (PASS)</span> | <span style=\"color : red\">F (AError)</span> | <span style=\"color : red\">F (BError)</span> | <span style=\"color : red\">F (CError)</span> |\n",
484+
"| pass | raises `AError` | `{\"xfail\": {\"raises\": \"AError\"}}` | <span style=\"color : red\">F (XPASS)</span> | <span style=\"color : orange\">x (XFAIL)</span> | <span style=\"color : red\">F (BError) | <span style=\"color : red\">F (CError)</span> |\n",
485+
"| raises `AError` | pass | `{\"raises\": {\"raises\": \"AError\"}, \"xfail\": {\"raises\": \"Failed\"}}` | <span style=\"color : orange\">x (XFAIL) </span> | <span style=\"color : red\">F (XPASS)</span> | <span style=\"color : red\">F (BError)</span> if `_pytest.outcomes.Failed` is available, else <span style=\"color : orange\">x (XFAIL)</span> | <span style=\"color : red\">F (CError)</span> if `_pytest.outcomes.Failed` is available, else <span style=\"color : orange\">x (XFAIL)</span> |\n",
486+
"| raises `AError` | raises `AError` | `{\"raises\": {\"raises\": \"AError\"}}` | <span style=\"color : red\">F (Failed)</span> | <span style=\"color : green\">. (PASS)</span> | <span style=\"color : red\">F (BError) | <span style=\"color : red\">F (CError) |\n",
487+
"| raises `AError` | raises `BError` | `{\"raises\": {\"raises\": \"AError\"}, \"xfail\": {\"raises\": \"BError\"}}` | <span style=\"color : red\">F (Failed)</span> | <span style=\"color : red\">F (XPASS)</span> | <span style=\"color : orange\">x (XFAIL) | <span style=\"color : red\">F (CError)</span> |\n",
488+
"\n",
489+
"### Calculation method-specific configuration\n",
490+
"\n",
491+
"In rare cases, for instance when creating a new calculation method, calculation method-specific configuration may be necessary. This can be done via the `extra_params` keyword, which, for each overloaded calculation method, contains objects similar to the root `params.json`, but that is applied as a patch for that specific calculation method run. The following example illustrates that.\n",
492+
"\n",
493+
"```json\n",
494+
"{\n",
495+
" \"calculation_method\": [\n",
496+
" \"iterative_linear\",\n",
497+
" \"newton_raphson\"\n",
498+
" ],\n",
499+
" \"rtol\": 1e-8,\n",
500+
" \"atol\": {\n",
501+
" \"default\": 1e-8,\n",
502+
" \".+_residual\": 5e-4\n",
503+
" },\n",
504+
" \"extra_params\": {\n",
505+
" \"newton_raphson\": {\n",
506+
" \"experimental_features\": \"enabled\",\n",
507+
" \"xfail\": {\n",
508+
" \"raises\": \"SparseMatrixError\",\n",
509+
" \"reason\": \"Current sensors are not yet implemented for this calculation method\"\n",
510+
" }\n",
511+
" }\n",
512+
" }\n",
513+
"}\n",
514+
"```"
515+
]
425516
}
426517
],
427518
"metadata": {
@@ -440,7 +531,7 @@
440531
"name": "python",
441532
"nbconvert_exporter": "python",
442533
"pygments_lexer": "ipython3",
443-
"version": "3.13.0"
534+
"version": "3.13.5"
444535
}
445536
},
446537
"nbformat": 4,

tests/cpp_validation_tests/test_validation.cpp

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -388,7 +388,7 @@ std::optional<CaseParam> construct_case(std::filesystem::path const& case_dir, j
388388
}
389389
}
390390

391-
param.fail = calculation_method_params.contains("fail");
391+
param.fail = calculation_method_params.contains("xfail") || calculation_method_params.contains("raises");
392392
if (calculation_type == "short_circuit") {
393393
calculation_method_params.at("short_circuit_voltage_scaling").get_to(param.short_circuit_voltage_scaling);
394394
}

tests/data/power_flow/automatic-tap-regulator/auto-tap-changer-meshed-any-max-iter/params.json

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,7 @@
33
"tap_changing_strategy": "any_valid_tap",
44
"rtol": 1e-05,
55
"atol": 1e-05,
6-
"fail": {
6+
"raises": {
77
"raises": "MaxIterationReached",
88
"reason": "TapPositionOptimizer::iterate"
99
}

tests/data/power_flow/non-existent-id-update-batch/params.json

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2,7 +2,7 @@
22
"calculation_method": "linear",
33
"rtol": 1e-5,
44
"atol": 1e-5,
5-
"fail": {
5+
"raises": {
66
"raises": "PowerGridError",
77
"reason": "Same invalid ID in update data set"
88
}

tests/data/state_estimation/current-sensor/global-current-sensor/params.json

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -11,7 +11,7 @@
1111
"extra_params": {
1212
"newton_raphson": {
1313
"experimental_features": "enabled",
14-
"fail": {
14+
"xfail": {
1515
"raises": "SparseMatrixError",
1616
"reason": "Current sensors are not yet implemented for this calculation method"
1717
}

tests/data/state_estimation/ill-conditioned-system/leaf-without-power-sensor-meshed/params.json

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,7 @@
55
"default": 1e-8,
66
".+_residual": 5e-4
77
},
8-
"fail": {
8+
"xfail": {
99
"raises": "SparseMatrixError",
1010
"reason": "Bug in Sparse LU solver found in #864"
1111
}

tests/data/state_estimation/not-independent-sensor/not-independent-sensor-with-phasor-sensors/params.json

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,7 @@
55
"default": 1e-8,
66
".+_residual": 5e-4
77
},
8-
"fail": {
8+
"raises": {
99
"raises": "NotObservableError",
1010
"reason": "Voltage phasor, power and current sensor's measurements are not independent"
1111
}

tests/data/state_estimation/not-independent-sensor/not-independent-sensor/params.json

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,7 @@
55
"default": 1e-8,
66
".+_residual": 5e-4
77
},
8-
"fail": {
8+
"raises": {
99
"raises": "NotObservableError",
1010
"reason": "Power measurements are not independent"
1111
}

0 commit comments

Comments
 (0)