-
Notifications
You must be signed in to change notification settings - Fork 0
Improve fib, prevent regression in parse_int #3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the proposed changes. However, there are multiple things in consideration that are making me reluctant to merge this. Some of them depend on a general consensus in the main Julia micro-benchmarks repository, others not as much.
Can you provide the raw values which you obtained before and after the change? Not all improvements will be reproducible in another machine.
Here's what I got in mine, they don't seem to be very conclusive regarding the regressions that you mentioned. The speed-up in fib is also very minor.
Before:
rust,iteration_pi_sum,6.077312
rust,matrix_multiply,29.457496
rust,matrix_statistics,6.059977
rust,parse_integers,0.137050
rust,print_to_file,10.579359
rust,recursion_fibonacci,0.035762
rust,recursion_quicksort,0.315747
rust,userfunc_mandelbrot,0.060763
After:
rust,iteration_pi_sum,6.077004
rust,matrix_multiply,29.338795
rust,matrix_statistics,6.062510
rust,parse_integers,0.133108
rust,print_to_file,9.086284
rust,recursion_fibonacci,0.034280
rust,recursion_quicksort,0.311959
rust,userfunc_mandelbrot,0.061079
.map(move |_| { | ||
let t = Instant::now(); | ||
op(); | ||
t.elapsed() | ||
}).min().unwrap() | ||
}).collect(); | ||
*results.iter().min().unwrap() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would seriously wish to know why this prevents loop-invariant code motion. 😮 Even when considering this part as a loop, I wonder what stops the compiler from just moving fib(fibarg)
out of the inner loop declared in line 266.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I strongly suspect it's because there may be a syscall (to do the memory allocation), and LLVM stops trying to analyze at that point.
What concerns me more is that the black_box
doesn't work at this place in the code... or doesn't seem to work. At least on my machine if I crank up the value of fibarg
to something above 40, the benchmark starts taking longer and longer to run, but the function that attempt to report the fastest run still reports 0.0
. Probably worth more investigation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you show how you added black_box
to this section of the code? I wouldn't be surprised it these optimizations cannot be avoided at this level. Only once measure_best
is monomorphized can the compiler effectively know what can be done. Avoiding unwanted optimizations would require two black_box
es: one to assume nothing about fibarg
(so as to call fib
every time instead of just once), and another one to assume that f
is used (otherwise everything is optimized out).
It might also be worth pointing out that if I fold the vector into the best measurement while manually avoiding the check for an empty iterator, the optimization reappears:
#[inline]
fn measure_best<F: FnMut()>(niters: u32, mut op: F) -> Duration {
let first = {
let t = Instant::now();
op();
t.elapsed()
};
let results: Vec<_> = (0..niters - 1)
.map(move |_| {
let t = Instant::now();
op();
t.elapsed()
}).collect();
results.iter().fold(first, |a, &b| Duration::min(a,b))
}
This might suggest that min().unwrap()
is preventing the optimizations, rather than the memory allocation. LLVM is even known to optimize allocations away in some cases, which only makes me more surprised about this.
There's more to it than just the measuring: rustc nightly and GCC 4.8.5 produce different machine codes for fib
in isolation (Clang 6 provides the exact same outcome, which is reasonable). This might just be a matter of how well optimized the compiler is.
In the mean time, please try replacing *results.iter().min().unwrap()
with results.into_iter().min.unwrap()
, which provided a 1.06 speed-up on this end.
src/main.rs
Outdated
@@ -275,7 +275,7 @@ fn main() { | |||
let n: u32 = rng.gen(); | |||
let s = format!("{:x}", n); | |||
let m = u32::from_str_radix(&s, 16).unwrap(); | |||
assert_eq!(m, n); | |||
debug_assert_eq!(m, n); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm nope, we can't do this. At least while the issue 10 is unresolved. There is an inconsistency on how this is done across languages, but since the C version does these checks, it would currently be unfair to change this.
src/main.rs
Outdated
@@ -287,7 +287,7 @@ fn main() { | |||
let m = mandelperf(); | |||
if j == 0 { | |||
let mandel_sum: u32 = m.iter().sum(); | |||
assert_eq!(mandel_sum, 14791); | |||
debug_assert_eq!(mandel_sum, 14791); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, this would also be unfair towards the other benchmarks. We'll keep track of issue 10, but this one should be reverted as well.
old:
new:
I suggest you just try getting these changes run on the official benchmark machine. Who knows, maybe it will make a difference there. |
Please consider running the program multiple times and showing the shortest times of each task. The |
Doing some housekeeping on my open PRs. No idea how I lost track of this one, but I agree with the feedback. Nothing to see here. |
Motivation:
There's no reason for the Rust version of
fib
to be any slower than the C one. The only difference I see is that there's ablack_box
in the Rust one, so let's try to get rid of it.I modified the implementation of
measure_best
to put the benchmark results behind a memory allocation which should prevent them from being optimized away. It's janky and criterion would probably be better but it works. This should bring the Rustfib
in line with the C one; I'd love to test personally but I don't currently have the time to figure out the system that the Julia repository is using to run them (trying to load libraries from inside the source tree and all that).This change would cause a regression in
parse_int
, but switching the assert inside its hot loop to adebug_assert
prevents the regression though I'm not entirely certain why.