Make `run_tests` output progressive #345

rmonat · 2025-06-09T19:28:31Z

I was tired of having to wait for the whole tests to finish to get any output. This PR factors computation of results and statistics in a single list traversal, but now the output of results is progressive.

I find this more convenient but I would understand if there are good reasons not to do it that way :)

Factors computation of results and statistics in a single list traversal, but now the output is progressive, which I find convenient for large testsuites

jmid · 2025-06-19T10:44:15Z

Thanks - and apologies for the delay! 🙏

I completely agree that having to wait for, e.g., 10 tests to complete is sub-optimal if the error is already happening in number 1 or 2. Even worse if, e.g., number 8 hangs, that means one will not get any earlier counterexamples printed
without manually commenting it out and rerunning. 😬

The question is how we can make the output progressive while still keeping it manageable.
At the bottom, I attach the output of running dune exec example/QCheck_runner_test.exe -- -v
as an example of a long list of tests with various behaviours:

some fail and print a counterexample
some err with an exception
some print "Collect" labelled statistics
some print numeric statistics
some negative tests fail by not finding a counterexample
some negative tests pass and print Info with a counterexample
some use custom counterexample printing showing up in a Message

I think we should take a step backward and consider how we would like the output of those cases to be structured?

Many client users will likely just want to see a list of all green tests printed neatly under each other
(as such I think we should strive to keep that kind of behaviour)
With failing tests in the mix, the header generated error fail pass / total time test name flows away from the lines below, intended to be nicely aligned under it. Should we go away from printing a header? Or reprint it after inferring output?
When there is at most one Failure, Error, Info or Messages(!?) for each test having just run, with progressive output we should be able to remove some redundancy (Failure ... Test ... failed, Messages ... Messages for ... and repeatedly printing the same test name). The situation is just different from when they are printed apart from each other and hence requires repetition ("Here is the counterexample for test X42 which we ran above")
Do we want labeled and numeric stats printed progressively as well, or should they continue to go at the bottom?
(I personally think either is fine, but would like the choice to be a conscious decision)

qcheck$ dune exec example/QCheck_runner_test.exe -- -v
random seed: 390040087
 generated  error   fail   pass /  total     time test name
[✓]    100      0      0    100 /    100     0.0s list_rev_is_involutive
[✗]      1      0      1      0 /     10     0.0s should_fail_sort_id

--- Failure --------------------------------------------------------------------

Test should_fail_sort_id failed (12 shrink steps):

[1; 0]
[✗]      1      1      0      0 /     10     0.0s should_error_raise_exn

=== Error ======================================================================

Test should_error_raise_exn errored on (63 shrink steps):

0

exception Dune__exe__QCheck_runner_test.Error

[✓]    100      0      0    100 /    100     0.0s collect_results

+++ Collect ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Collect results for test collect_results:

4: 17 cases
3: 22 cases
2: 26 cases
0: 19 cases
1: 16 cases
[✓]    100      0      0    100 /    100     0.0s with_stats

+++ Stats for with_stats ++++++++++++++++++++++++++++++++++++++++++++++++++++++++

stats mod4:
  num: 100, avg: 1.58, stddev: 1.12, median 2, min 0, max 3
  0: ###########################################                      22
  1: ###################################################              26
  2: ###############################################                  24
  3: #######################################################          28

stats num:
  num: 100, avg: 57.90, stddev: 35.78, median 57, min 4, max 120
    4..  9: #######################################################          12
   10.. 15: ################################                                  7
   16.. 21: #########                                                         2
   22.. 27: ###########################                                       6
   28.. 33: #############                                                     3
   34.. 39: ###########################                                       6
   40.. 45: #########                                                         2
   46.. 51: ###########################                                       6
   52.. 57: ###########################                                       6
   58.. 63: ######################                                            5
   64.. 69: ###########################                                       6
   70.. 75: ##################                                                4
   76.. 81: #########                                                         2
   82.. 87: ###########################                                       6
   88.. 93: ################################                                  7
   94.. 99: ####                                                              1
  100..105: ################################                                  7
  106..111: ######################                                            5
  112..117: ######################                                            5
  118..123: #########                                                         2
[✓]      2      0      1      1 /    100     0.0s neg test pass (failing as expected)

--- Info -----------------------------------------------------------------------

Negative test neg test pass (failing as expected) failed as expected (0 shrink steps):

7
[✗]    100      0      0    100 /    100     0.0s neg test unexpected success

--- Failure --------------------------------------------------------------------

Test neg test unexpected success failed:

Negative test neg test unexpected success succeeded but was expected to fail
[✗]      1      1      0      0 /    100     0.0s neg fail with error

=== Error ======================================================================

Test neg fail with error errored on (7 shrink steps):

0

exception Dune__exe__QCheck_runner_test.Error

[✗]      1      0      1      0 /    100     0.0s FAIL_pred_map_commute

--- Failure --------------------------------------------------------------------

Test FAIL_pred_map_commute failed (82 shrink steps):

([1], {_ -> 0}, {0 -> true; _ -> false})
[✗]      5      0      1      4 /    100     0.0s FAIL_fun2_pred_strings

--- Failure --------------------------------------------------------------------

Test FAIL_fun2_pred_strings failed (1 shrink steps):

{"some random string" -> true; _ -> false}
[✗]      1      0      1      0 /   1000     0.0s fold_left fold_right

--- Failure --------------------------------------------------------------------

Test fold_left fold_right failed (39 shrink steps):

(0, [1], {(1, 0) -> 1; _ -> 0})

+++ Messages ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Messages for test fold_left fold_right:

l=[1], fold_left=1, fold_right=0

[✗]      1      0      1      0 /   1000     0.0s fold_left fold_right uncurried

--- Failure --------------------------------------------------------------------

Test fold_left fold_right uncurried failed (43 shrink steps):

({(0, 7) -> 1; _ -> 0}, 0, [7])
[✗]      1      0      1      0 /    100     0.0s long_shrink

--- Failure --------------------------------------------------------------------

Test long_shrink failed (152 shrink steps):

([0], [-1])
[✓]    100      0      0    100 /    100     0.0s find_example
[✗]      3      0      1      2 /   1000     0.0s mod3_should_fail

--- Failure --------------------------------------------------------------------

Test mod3_should_fail failed (73 shrink steps):

-2232333
[✓]   5000      0      0   5000 /   5000     0.0s stats_neg

+++ Stats for stats_neg ++++++++++++++++++++++++++++++++++++++++++++++++++++++++

stats dist:
  num: 5000, avg: 0.59, stddev: 28.67, median 0, min -99, max 99
  -99..-90: #                                                                56
  -89..-80: #                                                                51
  -79..-70: #                                                                66
  -69..-60: #                                                                54
  -59..-50: #                                                                55
  -49..-40: #                                                                53
  -39..-30: #                                                                76
  -29..-20: #                                                                53
  -19..-10: #                                                                67
   -9..  0: #######################################################        2131
    1.. 10: ##############################################                 1793
   11.. 20: #                                                                49
   21.. 30: #                                                                63
   31.. 40: #                                                                56
   41.. 50: #                                                                71
   51.. 60: #                                                                48
   61.. 70: #                                                                68
   71.. 80: #                                                                53
   81.. 90: #                                                                74
   91..100: #                                                                63
[✓]   2200      0      0     10 /   2000     0.0s WARN_unlikely_precond

!!! Warning !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Warning for test WARN_unlikely_precond:

WARNING: only 0.5% tests (of 2000) passed precondition for "WARN_unlikely_precond"

NOTE: it is likely that the precondition is too strong, or that the generator is buggy.
[✗]   2200      0      0     10 /   2000     0.0s FAIL_unlikely_precond

--- Failure --------------------------------------------------------------------

Test FAIL_unlikely_precond failed:

ERROR: only 0.5% tests (of 2000) passed precondition for "FAIL_unlikely_precond"

NOTE: it is likely that the precondition is too strong, or that the generator is buggy.

[✓]   1000      0      0   1000 /   1000     0.0s tree_rev_is_involutive
[✗]      0      0      0      0 /    100     0.0s FAIL_#99_1

--- Failure --------------------------------------------------------------------

Test FAIL_#99_1 failed:

ERROR: uncaught exception in generator for test FAIL_#99_1 after 100 steps:
Exception: QCheck.No_example_found("<example>")
Backtrace: 
[✓]     10      0      0     10 /     10     0.0s should_succeed_#99_2
[✓]   1000      0      0   1000 /   1000     0.0s stat_display_test_1

+++ Stats for stat_display_test_1 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++

stats dist:
  num: 1000, avg: 1.49, stddev: 30.45, median 0, min -99, max 99
  -99..-90: ##                                                               15
  -89..-80: #                                                                10
  -79..-70: ##                                                               15
  -69..-60: #                                                                 9
  -59..-50:                                                                   6
  -49..-40: #                                                                14
  -39..-30: ##                                                               17
  -29..-20: #                                                                11
  -19..-10: #                                                                13
   -9..  0: #######################################################         401
    1.. 10: #################################################               363
   11.. 20:                                                                   6
   21.. 30: #                                                                10
   31.. 40: ##                                                               18
   41.. 50: ##                                                               18
   51.. 60: #                                                                14
   61.. 70: ##                                                               18
   71.. 80: #                                                                12
   81.. 90: ##                                                               19
   91..100: #                                                                11
[✓]   1000      0      0   1000 /   1000     0.0s stat_display_test_2

+++ Stats for stat_display_test_2 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++

stats dist:
  num: 1000, avg: 15.71, stddev: 24.11, median 6, min 0, max 99
    0..  4: #######################################################         390
    5..  9: ######################################################          388
   10.. 14: #                                                                11
   15.. 19: #                                                                10
   20.. 24: #                                                                 9
   25.. 29: #                                                                 8
   30.. 34: ##                                                               19
   35.. 39: #                                                                13
   40.. 44: #                                                                10
   45.. 49: ##                                                               19
   50.. 54: #                                                                11
   55.. 59: #                                                                13
   60.. 64: #                                                                10
   65.. 69: #                                                                14
   70.. 74: ##                                                               20
   75.. 79: #                                                                10
   80.. 84: ##                                                               16
   85.. 89: #                                                                 9
   90.. 94: #                                                                 9
   95.. 99: #                                                                11
[✓]   1000      0      0   1000 /   1000     0.0s stat_display_test_3

+++ Stats for stat_display_test_3 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++

stats dist:
  num: 1000, avg: 201011.76, stddev: 138557.43, median 208327, min -43381, max 434559
  -43381..-19485: ###########################################                      48
  -19484..  4412: ############################################                     49
    4413.. 28309: ##########################################                       47
   28310.. 52206: ####################################################             58
   52207.. 76103: ########################################                         45
   76104..100000: ########################################                         45
  100001..123897: ########################################                         45
  123898..147794: #######################################                          44
  147795..171691: #####################################                            42
  171692..195588: ##########################################                       47
  195589..219485: #################################################                55
  219486..243382: #############################################                    50
  243383..267279: #######################################################          61
  267280..291176: ############################################                     49
  291177..315073: ###############################################                  53
  315074..338970: #################################################                55
  338971..362867: ###############################################                  53
  362868..386764: ############################################                     49
  386765..410661: ###############################################                  53
  410662..434558: #############################################                    51
  434559..458455:                                                                   1
[✓]   1000      0      0   1000 /   1000     0.0s stat_display_test_4

+++ Stats for stat_display_test_4 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++

stats dist:
  num: 1000, avg: -152.62, stddev: 23111.02, median -166, min -39965, max 39990
  -39965..-35968: ##############################################                   51
  -35967..-31970: ###############################################                  52
  -31969..-27972: #############################################                    50
  -27971..-23974: #############################################                    50
  -23973..-19976: ##############################################                   51
  -19975..-15978: #########################################                        45
  -15977..-11980: #############################################                    50
  -11979.. -7982: #######################################################          60
   -7981.. -3984: ############################################                     48
   -3983..    14: ##########################################                       46
      15..  4012: ################################################                 53
    4013..  8010: ##################################################               55
    8011.. 12008: ############################################                     48
   12009.. 16006: #################################                                37
   16007.. 20004: ##############################################                   51
   20005.. 24002: ############################################                     49
   24003.. 28000: ################################################                 53
   28001.. 31998: ################################################                 53
   31999.. 35996: #############################################                    50
   35997.. 39994: ############################################                     48
[✓]   1000      0      0   1000 /   1000     0.0s stat_display_test_5

+++ Stats for stat_display_test_5 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++

stats dist:
  num: 1000, avg: -0.00, stddev: 2.59, median 0, min -4, max 4
  -4: ##############################################                  108
  -3: #################################################               116
  -2: ##################################################              118
  -1: #############################################                   105
   0: ################################################                112
   1: ###############################################                 111
   2: #######################################                          93
   3: #######################################################         128
   4: ##############################################                  109
[✓]   1000      0      0   1000 /   1000     0.0s stat_display_test_6

+++ Stats for stat_display_test_6 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++

stats dist:
  num: 1000, avg: 6.71, stddev: 6.38, median 7, min -4, max 17
  -4..-3: ################################################                 97
  -2..-1: ########################################                         81
   0.. 1: ###########################################                      86
   2.. 3: ##############################################                   92
   4.. 5: #################################                                67
   6.. 7: #######################################################         109
   8.. 9: #########################################                        83
  10..11: ##############################################                   92
  12..13: ##################################################              101
  14..15: ####################################################            104
  16..17: ############################################                     88
  18..19:                                                                   0
  20..21:                                                                   0
  22..23:                                                                   0
  24..25:                                                                   0
  26..27:                                                                   0
  28..29:                                                                   0
  30..31:                                                                   0
  32..33:                                                                   0
  34..35:                                                                   0
[✓] 100000      0      0 100000 / 100000     0.0s stat_display_test_7

+++ Stats for stat_display_test_7 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++

stats dist:
  num: 100000, avg: 4.332e+14, stddev: 2.663e+18, median 14428394352656942, min -4611660177243528925, max 4611678045347350296
  -4611660177243528925..-4150493266113984990: #####################################################          4994
  -4150493266113984989..-3689326354984441054: #####################################################          5041
  -3689326354984441053..-3228159443854897118: ######################################################         5070
  -3228159443854897117..-2766992532725353182: #####################################################          5011
  -2766992532725353181..-2305825621595809246: #####################################################          4988
  -2305825621595809245..-1844658710466265310: ###################################################            4861
  -1844658710466265309..-1383491799336721374: #####################################################          4959
  -1383491799336721373.. -922324888207177438: ####################################################           4914
   -922324888207177437.. -461157977077633502: #####################################################          4988
   -461157977077633501..       8934051910434: #####################################################          5014
         8934051910435..  461175845181454370: #######################################################        5145
    461175845181454371..  922342756310998306: #####################################################          5016
    922342756310998307.. 1383509667440542242: #####################################################          4987
   1383509667440542243.. 1844676578570086178: #####################################################          5025
   1844676578570086179.. 2305843489699630114: #####################################################          5009
   2305843489699630115.. 2767010400829174050: #####################################################          5018
   2767010400829174051.. 3228177311958717986: #####################################################          4995
   3228177311958717987.. 3689344223088261922: ####################################################           4914
   3689344223088261923.. 4150511134217805858: #####################################################          5023
   4150511134217805859.. 4611678045347349794: #####################################################          5027
   4611678045347349795.. 4611686018427387903:                                                                   1
================================================================================
1 warning(s)
failure (10 tests failed, 2 tests errored, ran 28 tests)

jmid · 2025-06-19T13:28:40Z

I just thought of another thing: 🤔

When running a list of tests without the --verbose / -v option, users still need enough context printed to understand, e.g., an error. So an option is to print those differently depending on running verbose (with progressive output providing context) and without (no context provided)

rmonat · 2025-06-26T10:33:48Z

From my understanding of qcheck, this change makes the output progressive, but I'm not sure it concerns all outputs. I wasn't aware of all these kind of ouputs. I agree that some options could help too!

With failing tests in the mix, the header generated error fail pass / total time test name flows away from the lines below, intended to be nicely aligned under it. Should we go away from printing a header? Or reprint it after inferring output?

I'm not sure it's easy to do properly without new dependencies and in a cross-platform fashion, but we could probably keep some space for the header and print the rest/logs afterwards?

When there is at most one Failure, Error, Info or Messages(!?) for each test having just run, with progressive output we should be able to remove some redundancy (Failure ... Test ... failed, Messages ... Messages for ... and repeatedly printing the same test name). The situation is just different from when they are printed apart from each other and hence requires repetition ("Here is the counterexample for test X42 which we ran above")

Agreed!

Do we want labeled and numeric stats printed progressively as well, or should they continue to go at the bottom?
(I personally think either is fine, but would like the choice to be a conscious decision)

Is this displayed by default? It could be an option? Or displayed after the summary, and the logs/messages?

jmid · 2025-07-01T08:24:43Z

Do we want labeled and numeric stats printed progressively as well, or should they continue to go at the bottom?
(I personally think either is fine, but would like the choice to be a conscious decision)

Is this displayed by default? It could be an option? Or displayed after the summary, and the logs/messages?

Yes, the statistics are currently printed in both verbose and non-verbose mode.
I agree that an option could also make sense, e.g., to retain stats but not have them printed, e.g., in CI.
Otherwise, keeping them at the end also makes sense to me 👍

Finally I thought of (yet another) printing option I had forgotten above. With max_fail one can print several encountered counterexamples (also with the potential of removing some redundancy):

# let t = QCheck.(Test.make ~name:"List.rev test" ~max_fail:3 (list int) (fun xs -> xs = List.rev xs));;
val t : QCheck.Test.t = QCheck2.Test.Test <abstr>
# QCheck_base_runner.run_tests ~verbose:true [t];;
random seed: 361065173
generated error fail pass / total     time test name
[✗]    3    0    3    0 /  100     0.0s List.rev test

--- Failure --------------------------------------------------------------------

Test List.rev test failed (132 shrink steps):

[0; 1]

--- Failure --------------------------------------------------------------------

Test List.rev test failed (122 shrink steps):

[0; -1]

--- Failure --------------------------------------------------------------------

Test List.rev test failed (123 shrink steps):

[0; 1]
================================================================================
failure (1 tests failed, 0 tests errored, ran 1 tests)
- : int = 1

rmonat added 2 commits June 9, 2025 21:25

Make output progressive

dc52048

Factors computation of results and statistics in a single list traversal, but now the output is progressive, which I find convenient for large testsuites

Merge remote-tracking branch 'origin/main'

c0aed50

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Make `run_tests` output progressive #345

Make `run_tests` output progressive #345

Uh oh!

rmonat commented Jun 9, 2025

Uh oh!

jmid commented Jun 19, 2025

Uh oh!

jmid commented Jun 19, 2025

Uh oh!

rmonat commented Jun 26, 2025

Uh oh!

jmid commented Jul 1, 2025

Uh oh!

Uh oh!

Make run_tests output progressive #345

Are you sure you want to change the base?

Make run_tests output progressive #345

Uh oh!

Conversation

rmonat commented Jun 9, 2025

Uh oh!

jmid commented Jun 19, 2025

Uh oh!

jmid commented Jun 19, 2025

Uh oh!

rmonat commented Jun 26, 2025

Uh oh!

jmid commented Jul 1, 2025

Uh oh!

Uh oh!

Make `run_tests` output progressive #345

Make `run_tests` output progressive #345