Quick implementation of "is_nearly_double" #246
base: master
Conversation
I like the "nearly" twist in the naming! Following the "natural reading" principle of Cgreen, constraints should probably be …

---
I can't say I'm familiar with what "natural reading" is. If the idea is that you should be able to read an assert fluently from left to right, like an English sentence, then …
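For what it's worth, here is how that left-to-right reading looks in practice. This is a minimal sketch: `is_nearly_double` is the constraint proposed in this PR, and the test around it is purely illustrative:

```c
#include <cgreen/cgreen.h>

/* Reads as: "assert that computed_pi is nearly (the) double 3.14159..." */
Ensure(ratio_approximates_pi) {
    double computed_pi = 355.0 / 113.0;
    assert_that(computed_pi, is_nearly_double(3.14159265358979));
}
```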
---
Thanks, @sampotter, for the comments. The example you gave was exactly what I meant, but I'm not a native English speaker, so my language sense might be off when it comes to "natural reading"; your comment seems reasonable.
---
Edit: I want to make sure what I'm trying to say about the current value of … is clear.

When we look at relative errors, we're trying to normalize the error so that the magnitude of the quantities being compared is on the order of 1.0, making it sensible to compare with machine epsilon or some constant multiple thereof. To come up with a more precise tolerance, you need to know more about what you're testing. For example, the expected round-off error from summing N numbers in sequence is ≈ ε·sqrt(N), where ε is machine epsilon. So, a default tolerance of 1e-13 is similar to the expected error from summing ~200,000 numbers in sequence. It isn't particularly easy to choose a one-size-fits-all default floating point tolerance...

Anyway, I'll make these changes shortly and add them to this PR for you to take a look at. (The reason I'd like to use the …)
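A minimal sketch of the normalized comparison described above (the names here are illustrative, not Cgreen's API):

```c
#include <math.h>
#include <stdbool.h>

/* Compare x and y up to a relative tolerance: the error is divided by the
   larger magnitude so that it can be compared against (a multiple of)
   machine epsilon regardless of the scale of x and y. */
bool nearly_equal(double x, double y, double relative_tolerance) {
    double scale = fmax(fabs(x), fabs(y));
    if (scale == 0.0) return true; /* both are exactly zero */
    return fabs(x - y) <= relative_tolerance * scale;
}

/* e.g. nearly_equal(1.0 + 1e-14, 1.0, 1e-13) is true: the error is roughly
   45 eps, comfortably within the 1e-13 default-style tolerance above. */
```

---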
Hi! Any update on this? No rush, just want to see if there is progress...

---
Hi Thomas, sorry for the delay! I should be able to reply more quickly now. I went ahead and simplified src/constraint.c a bit by replacing the …

What should be done next? There are evidently some failing tests. I'm not sure if they're failing because of changes I introduced.

---
Looking into the Travis output I can actually see only one test that fails (we have a build matrix in Travis for gcc, g++ and dynamic/static linking, so that's 4 Travis jobs).
This is the code of the test:
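(The test's source did not survive in this transcript. Judging from the description just below, it was presumably something along these lines — a reconstruction using Cgreen's documented `significant_figures_for_assert_double_are()`, not the verbatim test:)

```c
#include <cgreen/cgreen.h>

Describe(Constraint);
BeforeEach(Constraint) {}
AfterEach(Constraint) {}

Ensure(Constraint, compares_doubles_by_significant_figures) {
    significant_figures_for_assert_double_are(2);
    assert_that(337.0, is_equal_to_double(339.0));     /* equal at 2 figures */

    significant_figures_for_assert_double_are(3);
    assert_that(337.0, is_not_equal_to_double(339.0)); /* differ at 3 figures */
}
```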
It obviously tests that significant figures work as expected: 337 and 339 are equal at only 2 significant digits, but different at 3 significant figures. (Again, the exact mathematics here is beyond me. Hope you can make some sense of that.)

---
Ah, thanks. I missed that because of the many lines reading …

How can I debug this test? From …?

(Thanks for bearing with me while I figure out how this project is organized!)

---
No problem. Being a CMake project, things are a little more complicated than you really want them to be. (Sometimes I wonder if the portability benefits are worth it....)

TIP: I most often use … But that doesn't help when you want to debug. You are using the correct command. To be really sure, ensure that you are using the …

I checked out the PR and tried exactly that command on Cygwin and WSL. On Cygwin it passed, but on WSL it failed, which maybe makes sense in one way, since Travis is on Linux. But on Cygwin …
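(The exact commands referenced above were lost in this transcript; with a CMake build, running and narrowing down tests is typically done through ctest. These are standard ctest flags, not necessarily the ones used here:)

```sh
cd build
ctest --output-on-failure    # run the whole suite, print output for failures
ctest -V -R constraint       # verbose run of just the tests matching "constraint"
```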
(The "messages" tests run tests and matches output to some golden output, to ensure that user level messages are not trashed, but again these are not run with That indicates that even when the constraints test passes some other tests related to doubles might fail. And that usually indicates reading/writing outside memory/buffer overruns or some other horrible thing. So let's valgrind it; Nope, no problem. So the only thing that I can think of is
Maybe there is something different in the libraries that makes that statement not hold on all platforms?

To actually debug, you can probably adapt what the … does:
$1 and $2 are the library and the test function respectively, like …
The first couple of lines convert the Cgreen-style test name to the linkage name, dump a break and a run command into a temporary file, then … The "set breakpoint pending on" instructs gdb to set breakpoints even though that code is not loaded yet. (It isn't when the cgreen-runner starts.)
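(The script itself was lost in this transcript; a rough reconstruction from the description above might look like the following, assuming Cgreen's `CgreenSpec__<context>__<test>__` symbol naming scheme:)

```sh
#!/bin/sh
# Usage: debug_test.sh <test-library.so> <Context:test_name>
library=$1
# Convert the Cgreen-style test name to the generated linkage symbol
symbol=$(echo "$2" | sed -e 's/:/__/' -e 's/^/CgreenSpec__/' -e 's/$/__/')
commands=$(mktemp)
cat > "$commands" <<EOF
set breakpoint pending on
break $symbol
run $library
EOF
gdb -x "$commands" cgreen-runner
```

---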
So, did you get a handle on the problem here? Again, if it is a mathematical problem (rounding?) then it's over my head...

---
Sorry for the hold-up, I simply did not have time to get to this recently... The failures all seemed to be because I tried to remove the …

Looks like my new commit is passing. I'm not sure how important the decrease in coverage is to you. If you'd like me to add a new section to the docs, please give me some basic instructions and I should be able to take care of this soon.

---
No problem. Life.
That's probably right for now, maybe we need a mathematician ;-)
Passing is good. The coverage decrease is not a problem per se, but I'd really like a few tests that …
Good, thanks. The docs/guide are written in AsciiDoc (which is like Markdown's bigger brother), residing in the …
I also remembered that you had a question in your first comment, about switching on the name of the constraint. (Maybe this should go in code comments...) I think you actually should use the …
Maybe I can help out with some of these.

---
Thanks for the detailed instructions, much appreciated! I'll start looking into these tasks as I'm able this week.

Re: getting a mathematician: that's basically what I am... I looked into a good way to reliably count the number of significant figures and couldn't come up with something that I had much confidence in (because of edge cases relating to floating point arithmetic). That …

---
No problem. Again, if you think I can help, just say the word.
I did realise that you were much more knowledgeable than me in the area. Sorry for not picking up on the mathematician bit ;-)

I hear what you are saying, I think. From a "maths" perspective accuracy is the right way to go. And I buy that, so your contribution is very valuable. But I guess that to @d-meiser, who did the original …

I also want to bring up the naming of the user level macro again. After some time I realized what bugged me with … Coming from … Looking around, I see that both Jasmine and Hamcrest use some variation of "is close to", while FluentAssertions uses "BeApproximately" (yuck... ;-). Also, the "double" type indicator (and "_string") for …

So, thinking that we don't actually need the typing information here (an int will be automatically type converted into a double, and there is no "int-typed" version of …

Last point, do we need …

---
Sure, will do. Thanks.
Of course, it's not a problem at all. I'm not so egotistical that it would offend me (only a little egotistical). ;-)
Well, it's two things. Let me clarify.

Issue #1: computing significant digits is of limited practical utility. Anyone who thinks they need it for something probably doesn't. I can see maybe an argument for it when comparing amounts of currency, but outside of that it is basically always better to use one of these normalized "relative error" tests, like I added. You can think of these tests as being the continuous version of the discrete "significant digits" test. Because of this, they're more natural and robust when dealing with floating point numbers.

Issue #2: I don't have a lot of confidence in the current implementation of "sig digits". What's checked in currently passes all the tests, but I have a hunch that if I did some aggressive testing I'd find plenty of corner cases that fail. Part of my lack of confidence comes from not knowing the provenance of …

(Incidentally, we can assume that floating point behavior should be consistent across platforms. This should be guaranteed by the C standard's conformance to the IEEE 754 standard. If things vary across platforms, it's probably a compiler bug.)

Now, as far as Issue #1 goes: whatever. ;-) This isn't my library and it doesn't bother me if there's a set of test functions in it that I don't think make a lot of sense and won't use myself. If people depend on them now, I won't assume I know their situation better than they do!

As for Issue #2: I think it would be good to have an implementation of "check sig digits" that we can be confident in. I spent about an hour trying to come up with an approach based just on floating point arithmetic, but because of rounding wasn't able to find something that worked robustly. I came up with a bit of a more heavyweight solution instead. Let me know what you think:

```c
// digits.c
#include <math.h>
#include <stdio.h>

// Base 10 version of frexp from math.h
double frexp10(double x, int *exp) {
    *exp = floor(log10(fabs(x)));
    return x/pow(10.0, (double)*exp);
}

int get_sig_digits(double x, double y) {
    if (x == y) return 16; // max digits in double mantissa
    if (x == 0 || y == 0) return 0; // avoid log10(0) in frexp10 below
    // Get (m)antissa and (e)xponent of x and y
    int ex, ey;
    double mx = frexp10(x, &ex);
    double my = frexp10(y, &ey);
    if (ex != ey) return 0; // no shared digits if diff. exps
    // A number is "positive" if >= 0 --- should be enough for the comparisons below
    int sx = mx >= 0, sy = my >= 0;
    if (sx != sy) return 0; // no shared digits if diff. signs
    // Make mx and my nonnegative for snprintf (don't want leading '-' char)
    if (!sx) mx *= -1;
    if (!sy) my *= -1;
    // Print out normalized mantissas of x and y
    char xstr[19], ystr[19]; // 18 chars ("d." plus 16 digits) and a NUL
    snprintf(xstr, sizeof(xstr), "%1.16f", mx);
    snprintf(ystr, sizeof(ystr), "%1.16f", my);
    // Count common leading digits (index 1 is the '.')
    if (xstr[0] != ystr[0]) return 0; // no shared digits if leading digit differs
    int d = 1;
    int c = 2;
    while (c < 18 && xstr[c] == ystr[c]) { // bounded: don't run past the buffers
        ++d;
        ++c;
    }
    return d;
}

int main() {
    // Test cases...
    double pairs[][2] = {
        {0.1, -0.1},
        {1.0012, 1.0012},
        {1.0100, 1.0012},
        {7543.20001, 75},
        {-987.4, -987.0004},
        {1010, 1012},
        {0.1, -0.1},
        {-1.2, -1.19},
        {20.2, 20.3},
        {-21.1, -21.13},
        {-21.1, -21.11},
        {-21.1, -21.09},
        {-21.1, -21.07},
        {-21.1, -21.05},
        {-21.1, -21.03},
        {-21.1, -21.01},
    };
    int num_pairs = sizeof(pairs)/sizeof(double[2]);
    double x, y;
    int d;
    for (int i = 0; i < num_pairs; ++i) {
        x = pairs[i][0];
        y = pairs[i][1];
        d = get_sig_digits(x, y);
        printf("x = %f, y = %f: sig digits = %d\n", x, y, d);
    }
}
```

Compiled on macOS (using …) and run:

```
sfp@Samuels-MBP ~ % ./digits
x = 0.100000, y = -0.100000: sig digits = 0
x = 1.001200, y = 1.001200: sig digits = 16
x = 1.010000, y = 1.001200: sig digits = 2
x = 7543.200010, y = 75.000000: sig digits = 0
x = -987.400000, y = -987.000400: sig digits = 3
x = 1010.000000, y = 1012.000000: sig digits = 3
x = 0.100000, y = -0.100000: sig digits = 0
x = -1.200000, y = -1.190000: sig digits = 1
x = 20.200000, y = 20.300000: sig digits = 2
x = -21.100000, y = -21.130000: sig digits = 3
x = -21.100000, y = -21.110000: sig digits = 4
x = -21.100000, y = -21.090000: sig digits = 2
x = -21.100000, y = -21.070000: sig digits = 2
x = -21.100000, y = -21.050000: sig digits = 2
x = -21.100000, y = -21.030000: sig digits = 2
x = -21.100000, y = -21.010000: sig digits = 2
```

Note the line where …

I get the feeling you're more of a C hacker than I am. Maybe you can think of some clever ways to optimize this, or some corner cases I missed. :-)
I agree, that's unfortunate... I've been using these macros in my own branch for a library that I've been developing for a little while now and have basically learned to just ignore the type in each of these lines. I guess there's a bit of a tension between wanting to provide fluent tests that indicate the type on the one hand, and needing to accommodate C's weak type system on the other. E.g., I don't think this would be as much of an issue in C++. (I see you mentioned this in your reply, as well.)
To make it a bit more explicit for the user, how about this option? Add functions to be called in `BeforeEach`:

```c
BeforeEach(blah) {
    do_single_precision_floating_point_tests();
    absolute_tolerance_is(1e-6);
}

Ensure(blah, ...) {
    ...
    float x = 1.01, y = 1.02;
    assert_that(x, is_not_nearly(y));
    ...
}
```

Incidentally, a cool side benefit here would be that Cgreen would then be in a position to tell the user when what they're doing doesn't make sense. If the library knows it's going to do single precision tests, then it can alert the user that it passed a relative tolerance that's too small. For example, …
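A minimal sketch of that kind of sanity check, assuming the hypothetical `do_single_precision_floating_point_tests()` / `relative_tolerance_is()` functions floated above (none of this is Cgreen's actual interface):

```c
#include <float.h>
#include <stdio.h>

static int single_precision = 0;
static double relative_tolerance = DBL_EPSILON;

void do_single_precision_floating_point_tests(void) { single_precision = 1; }

void relative_tolerance_is(double tolerance) {
    /* In single precision mode, a tolerance below FLT_EPSILON can never
       be met by float arithmetic, so warn the user up front. */
    if (single_precision && tolerance < FLT_EPSILON)
        fprintf(stderr, "warning: relative tolerance %g is smaller than "
                "FLT_EPSILON (%g)\n", tolerance, (double)FLT_EPSILON);
    relative_tolerance = tolerance;
}
```

---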
I think …

---
At first I didn't really follow here, but I think I now understand (by reading the code ;-) that you aimed to explore whether significant figures actually "work" in various situations, right? If so, not being fluent in floating point arithmetic, I'm not sure I can help with the actual core issue. I initially thought you could just take any floating point number, say 1.001, and then add the next negative power of ten (1e-4) to it, which should give 1.0011; these are equal to 4 digits but not to 5. Then continue with 1.0001 for 5 & 6, and so on. But again, I'm not sure I even understand where to look, this is probably only one of many vectors in the problem, and again, rounding... Just throwing around ideas.
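One way to mechanize that probing idea, reusing `get_sig_digits` from the digits.c listing above (purely illustrative; link it against digits.c):

```c
#include <math.h>
#include <stdio.h>

int get_sig_digits(double x, double y); /* from digits.c above */

int main(void) {
    double base = 1.001;
    /* Perturb in the k-th decimal place; the pair should share roughly
       k digits, which we can compare against what get_sig_digits reports. */
    for (int k = 4; k <= 10; ++k) {
        double bumped = base + pow(10.0, -k);
        printf("%.*f vs %.*f: %d shared digits\n",
               k, base, k, bumped, get_sig_digits(base, bumped));
    }
}
```

---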
This is a really good idea! Not only can we check some things, like you mentioned, but we also get a way to stepwise deprecate the "significant digits" approach if we need/want/should: …
(I see that I think of the two "methods" as "significant digits" and "accuracy" respectively, but you talk about them as "single precision" and "double precision". Am I (again) miss(understand)ing something here? I feel really thick here...)

---
I'm checking in on this to see where we are ;-) I've pondered this off and on for a while and my position (today ;-) is that …
NOTE: I'm still a bit confused about your comment … because the previous implementation is all about "double" too. I'd like to better understand the reasoning about "single" and "double", in particular why you consider the "significant figures" approach to be "single precision", before making a final decision on those names. But assuming that is well-founded, I'd like to have a function with an argument to switch implementations. Similar to …
I'm trying not to sound rigid, but I'd like this to move forward and I don't have the knowledge to continue from this point (yet). But I think part of the stall is a lack of decisions, so that's what I tried to provide better this time. If you have any disagreements, I'm happy to take them and discuss further.

---
Sorry about the silence on my end. So that you're aware, I'm in the middle of finishing up my PhD dissertation and have no extra time to spare at the moment. This is 100% of where the delay is coming from on my end. I'll be done by August or so, and then I'll have some more time to help out here.

I'm using my own fork of cgreen with the improved floating point tests on a daily basis, and it's worked quite nicely for me. I'm very happy with cgreen and plan to continue using it, so don't worry about me disappearing. ;-)

Re: your most recent response: I agree completely with points 1-6. For 7, I still prefer …

For the rest of your response, I can't remember all the details of our previous conversations, but I'll try to summarize what I was thinking in case it helps: …

---
So I think we're basically in agreement about everything, apart from some naming details? (BTW, even though I don't have time to code here now, I can continue discussing this, so I should reply rather quickly.)

---
Thanks for the update, and the collection of thoughts! Yes, it seems like we mostly agree. So I'll let you focus on your PhD and we'll touch base in August!

---
@sampotter hopefully your PhD program went well. Here's a ping in case you have some time over your holidays :)

---
Thanks for the ping. PhD successfully defended and now fully engaged in a postdoc, so quite busy. I have time to help now, and would like to get this off my back burner (!), but I'm frankly lost as to where to start. I haven't thought about this in several months and the details are no longer at my fingertips. From what I can tell, …

If someone could feed me a single, self-contained task to focus on, I'd be happy to take care of it. Happy holidays!

---
Congratulations!
I think your implementation of double assertions has great merit, and I'd like to see it as the future default. (My only reservation is that I have not had any use for doubles, and thus no experience with them, and am no "mathematician", so I feel a bit hesitant to make this call, but I trust you on this.) To get there we need an implementation that can exist side by side with the current one, and allowing programmatic selection at run-time is the way to go. Then we can decide to switch the default at some breakpoint and deprecate the "old" style after that. So, that would pretty much point to your post on Jun 21. I'm not sure if that is "single, self-contained" enough, but consider it a Christmas wish ;-)
Thanks! And to you too!

---
Here's a quick implementation of `is_nearly_double` for issue #245. Some things that jump out at me:

- `assert_that_double_` is hard-coded assuming that the user will have specified a relative error in terms of significant figures. To emit a more helpful message on test failure when "nearly" is being used, I switched on the constraint name. This seems like a bad way to go. I figured I would get @thoni56's input on how best to handle this.
- … `significant_figures`) and `absolute_tolerance` (I wasn't sure how the default value of `DBL_MIN/1e-8` was chosen).
- Replace `significant_figures` and `absolute_tolerance` with `double_relative_tolerance` and `double_absolute_tolerance`: when setting the number of significant figures, just set `double_relative_tolerance = pow(10, 1 - significant_figures)`. This is handy, too, because then there's no need to call `log10` when calling `accuracy`. (A sketch of this mapping follows below.)

Anyway, let me know what you think of what I've thrown together so far and if I have the right idea. I don't know much about the design of Cgreen, so please feel free to clue me in. :-)
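A minimal sketch of the significant-figures-to-relative-tolerance mapping suggested in the last bullet; the variable name echoes the PR description, and the function name is hypothetical, not a final API:

```c
#include <math.h>

static double double_relative_tolerance = 1e-8; /* illustrative default */

void significant_figures_for_doubles_are(int significant_figures) {
    /* e.g. 3 significant figures -> 10^(1-3) = 0.01 relative tolerance,
       so no log10 call is needed later when comparing values. */
    double_relative_tolerance = pow(10.0, 1 - significant_figures);
}
```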