
Add Rust microbenchmark #1


Merged: 2 commits into JuliaLang:master, May 22, 2018

Conversation

@Enet4 (Contributor) commented Apr 17, 2018

This PR adds a Rust implementation of the benchmark. This contribution was originally in JuliaLang/julia#22433, but the merge was postponed until the microbenchmarks were migrated here. It should be ready to go now. 👍

The necessary preparation steps for the benchmark machine:

  1. Install Rustup
  2. Add the toolchain mentioned in the rust-toolchain file: rustup toolchain add nightly-2018-04-16

Then: make benchmark/rust.csv

@Enet4 force-pushed the rust_microbenchmark branch from b211a62 to f5167da on April 17, 2018 18:50
let k = k as f64;
1. / (k * k)
})
.sum();
Member

This code doesn't seem reasonable to me. Wouldn't it be faster and simpler to just write a loop to do the summation in Rust (since it is a compiled statically typed language, ordinary loops should be fast), rather than allocating a temporary array of summands and then calling .sum()?

Reply:

There is no allocation there: the abstractions in Rust have zero cost. This is the idiomatic way to write this code in Rust.

@Enet4 (Contributor, author) Apr 18, 2018

The Iterator API is a very idiomatic part of Rust, and it very often incurs no performance penalty. It is usually not worth rewriting this into plain loops (even index-range loops such as for i in 0..10 to traverse collections are discouraged). Also note that iterator adaptors such as map and flat_map are lazy, and do not allocate any extra memory.
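
To make the zero-cost claim concrete, here is a minimal sketch (not code from this PR) comparing the iterator pipeline with an explicit loop over a pisum-style kernel; with optimizations on, both compile to essentially the same machine code, and neither allocates:

```rust
fn sum_iter(n: u64) -> f64 {
    (1..=n)
        .map(|k| {
            let k = k as f64;
            1.0 / (k * k)
        })
        .sum() // folds the lazily produced values; no intermediate array
}

fn sum_loop(n: u64) -> f64 {
    let mut s = 0.0;
    for k in 1..=n {
        let k = k as f64;
        s += 1.0 / (k * k);
    }
    s
}

fn main() {
    // Both versions sum in the same order, so the results coincide.
    assert!((sum_iter(10_000) - sum_loop(10_000)).abs() < 1e-12);
}
```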

Member

Thanks for the clarification; I didn't realize that map was lazy in Rust.

rust/src/main.rs (outdated)
})
.take(80)
.take_while(|z| z.norm() <= 2.0)
.count() as u32
Member

Is this kind of iterator-composition code really the fastest way to do it in Rust? Again, wouldn't you just write a loop?

Member

I can't speak to this code in particular, or whether the style is ideal for performance, but I have noticed that this general style is very common in Rust.

@real-felix Apr 18, 2018

Functional-style iteration is the idiomatic way to do this. In some cases it is even faster than a procedural loop, because there are no bounds checks.

@Enet4 (Contributor, author)

As already mentioned by @Boiethios, this is pretty idiomatic and performant. Previously, I had a version of this function without extensive use of the Iterator API (here), and the subsequent changes to what we have now actually improved performance a tiny bit.
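
For illustration, here is a self-contained sketch of this escape-time style (assuming the num_complex crate; std::iter::successors is a modern stand-in, not necessarily what the PR uses):

```rust
use num_complex::Complex64;

// Count how many iterates of z -> z^2 + c stay within the escape
// radius, capped at 80. The pipeline is lazy: no collection is built.
fn mandel(c: Complex64) -> u32 {
    std::iter::successors(Some(c), |z| Some(z * z + c))
        .take(80)
        .take_while(|z| z.norm() <= 2.0)
        .count() as u32
}

fn main() {
    // The origin never escapes, so it hits the 80-iteration cap.
    assert_eq!(mandel(Complex64::new(0.0, 0.0)), 80);
}
```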

}

fn fib(n: i32) -> i32 {
let n = black_box(n); // prevent over-optimization
Member

What's going on here?

@Enet4 (Contributor, author)

That is there to prevent the compiler from doing invariant code motion: it would recognize that the function is called with the same argument on every iteration, and choose to memoize the output, which would distort the whole benchmark. black_box hints to LLVM that it should make no assumptions about the value, thus forcing it to always compute fib.
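
For context, a minimal sketch of the pattern (black_box lived in the unstable test crate at the time, hence the nightly toolchain; today std::hint::black_box plays the same role):

```rust
#![feature(test)] // nightly-only at the time of this PR
extern crate test;
use test::black_box;

fn fib(n: i32) -> i32 {
    if n < 2 { n } else { fib(n - 1) + fib(n - 2) }
}

fn main() {
    for _ in 0..100 {
        // black_box makes the argument opaque, so LLVM cannot prove that
        // fib is called with the same value on every iteration and hoist
        // the call out of the loop.
        let n = black_box(20);
        black_box(fib(n)); // also keeps the result observably "used"
    }
}
```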

@vks

#[inline(never)] might work as well.

@Enet4 (Contributor, author) May 22, 2018

@vks I'm afraid not; I just tried that too. What makes the compiler perform invariant code motion is the perception that the pure function is being called with the exact same argument value across the iterations, so the compiler can simply move the call out of the loop. This is orthogonal to whether the function is inlined.

Volatile argument reads would force fibarg to be read inside the loop without making additional assumptions about its value, thus preventing invariant code motion.
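
A sketch of what that volatile-read alternative might look like on a stable toolchain (hypothetical code, not part of this PR):

```rust
use std::ptr;

fn fib(n: i32) -> i32 {
    if n < 2 { n } else { fib(n - 1) + fib(n - 2) }
}

fn main() {
    let fibarg: i32 = 20;
    let mut sink = 0;
    for _ in 0..100 {
        // The volatile read forces fibarg to be re-read on each pass, so
        // the compiler cannot treat the argument as loop-invariant...
        let n = unsafe { ptr::read_volatile(&fibarg) };
        // ...and the volatile write keeps the result from being
        // dead-code-eliminated.
        unsafe { ptr::write_volatile(&mut sink, fib(n)) };
    }
    println!("{}", sink);
}
```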

let a = nrand((n, n), &mut rng);
let b = nrand((n, n), &mut rng);
let c = nrand((n, n), &mut rng);
let d = nrand((n, n), &mut rng);
Member

Wouldn't performance-conscious code allocate the arrays once and then call fill_rand in the loop?

@Enet4 (Contributor, author) Apr 18, 2018

Granted, this function can be slightly boosted. Moving the array allocations to the beginning improves the benchmark score (roughly from 6.15 to 5.85 in raw time measurements). If I recall correctly, I did this because the Julia implementation also declares these arrays inside the loop.
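
For reference, a sketch of the allocate-once restructuring under discussion (fill_rand is a hypothetical helper here, and the rand calls are illustrative):

```rust
use rand::Rng;

// Hypothetical helper: overwrite an existing buffer with fresh samples.
fn fill_rand<R: Rng>(buf: &mut [f64], rng: &mut R) {
    for x in buf.iter_mut() {
        *x = rng.gen();
    }
}

fn main() {
    let n = 5;
    let mut rng = rand::thread_rng();
    // Allocate the buffer once, outside the hot loop...
    let mut a = vec![0.0f64; n * n];
    for _ in 0..1000 {
        // ...and only refill it on each trial.
        fill_rand(&mut a, &mut rng);
        // ... run the randmatstat trial on `a` ...
    }
}
```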

Member

Okay.

This is kind of a silly benchmark anyway, because it mostly measures the speed of the random-number generation and matmul libraries.

@Enet4 (Contributor, author)

Indeed! The same can be said about parse_integers, which measures the RNG and number printing alongside the actual number parsing.

@Enet4 (Contributor, author) commented Apr 19, 2018

A small but important update: the mandel function was modified to compare the complex value's squared norm with 4, rather than the value's norm with 2. This seems to be a fair optimization, since the other implementations are doing the same thing. The performance of this particular task is now on par with the Julia implementation.
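
The change boils down to the equivalence |z| <= 2 iff |z|^2 <= 4, where norm_sqr avoids the square root that norm computes (sketch assuming num_complex):

```rust
use num_complex::Complex64;

// Same predicate as z.norm() <= 2.0, but without the square root.
fn still_bounded(z: &Complex64) -> bool {
    z.norm_sqr() <= 4.0
}

fn main() {
    let z = Complex64::new(1.2, 1.5);
    assert_eq!(still_bounded(&z), z.norm() <= 2.0);
}
```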

@johnfgibson (Contributor)

The necessary preparation steps for the benchmark machine:

  1. Install Rustup
  2. Add the toolchain mentioned in the rust-toolchain file: rustup toolchain add nightly-2018-04-16

How about rustup toolchain add stable instead? That would be more consistent with the Julia versioning. I've been running the benchmarks and posting the data only when Julia updates its stable version: 0.6.0, 0.6.1, 0.6.2, etc.

@Enet4 (Contributor, author) commented May 15, 2018

How about rustup toolchain add stable instead?

As previously mentioned here, the project relies on the test crate for the black_box function, which is currently only available in a nightly Rust compiler. Stabilization of this function is unlikely to happen soon; it is an ongoing concern without a trivial solution.

The only way to make this work with a stable toolchain is to replace black_box with something that also prevents undesirable optimizations (namely the invariant code motion seen in fib and a few others). From the experiments I made some months ago, this seemed to be possible with volatile reads and writes. I can look into this again soon.

I've been running the benchmarks and posting the data only when Julia updates its stable version: 0.6.0, 0.6.1, 0.6.2, etc.

Would the other microbenchmarks also be updated in the process? Rust has a regular 6-week release cycle, which means that pointing this project to the stable toolchain will also lead to iteratively updated versions of the compiler.

@StefanKarpinski (Member) commented May 15, 2018

I think we should go ahead and merge this soon without needing to debate all the fine details of how we get and update the rust compiler, otherwise I fear this may languish and never get merged. Of course, we can open a separate issue about how the rust compiler should be acquired and run.

@johnfgibson (Contributor)

Sounds good to me. This and #2 look good for merging. I'll use the Rust nightly for now and aim to include Rust data in the julialang.org table and plot when julia-0.7.0 drops.

@StefanKarpinski (Member)

@johnfgibson, I've given you write access to this repo, so feel free to merge PRs as they seem ready. Let me know if there are any permission issues and we'll get them sorted out.

@johnfgibson (Contributor)

The Rust code runs fine and produces sensible results. I don't understand Rust well enough to vouch for the code's correctness & consistency with other languages. I see @stevengj reviewed to some extent above. I'll plan to merge in a day or two if I don't hear any objections.

Rust results, with julia-0.7.0-DEV.5096 (7-day-old master):

[benchmarks plot]

cputime	lang
1.000	C
1.080	LuaJIT
1.089	Rust
1.186	Julia
1.450	Go
1.592	Fortran
2.929	Java
4.133	JavaScript
9.964	Matlab
14.072	Mathematica
16.794	Python
69.365	R
515.363	Octave

@Enet4 (Contributor, author) commented May 21, 2018

I haven't run the benchmarks with every language available, but they seem to be roughly in line with my local run:

perf lang
1.000 C
1.060 Rust
1.180 Julia

Feel free to ask about any other details of the code. I noticed that the readme also contains a list of languages, so maybe I could update that as well? The PR might also be updated in time for the decisions made in #10.

@waldyrious (Contributor)

@johnfgibson, just to be sure: will the issue with the Mathematica label, seen in your comment above, be present in the final rendered image?

@johnfgibson (Contributor)

@waldyrious: No, that Mathematica label problem arose in an Inkscape SVG -> PNG conversion needed to post the image to this thread. The SVG that'll appear on julialang.org is just fine.

@eira-fransham

Is there a reason that you use vectors instead of arrays on the stack? All the Vecs in directblas::randmatstat could be turned into fixed-size arrays.

@StefanKarpinski (Member)

@Vurich, if I'm understanding correctly (it's quite possible that I'm not, since I'm not a Rust programmer), that would be comparable to using StaticArrays in Julia. That's a legitimate optimization, but in Julia it goes beyond the basics. In Rust, how advanced of a change would that be?

@eira-fransham commented May 22, 2018

@StefanKarpinski I would actually consider using Vec an antipattern when you know the length of an array is constant; fixed-size arrays express intent much better. I think it would just be a case of turning let n = 5 into const N: usize = 5 and let foo = vec![val; something] into let foo = [val; something].
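
A minimal sketch of that suggestion (illustrative names, not the PR's code):

```rust
const N: usize = 5;

fn main() {
    // Heap-allocated: the length is a run-time value.
    let v: Vec<f64> = vec![0.0; N * N];

    // Stack-allocated: the length is part of the type and no allocator
    // call is involved.
    let a: [f64; N * N] = [0.0; N * N];

    // Both coerce to &[f64], so slice-taking code accepts either.
    assert_eq!(v.as_slice(), &a[..]);
}
```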

@Enet4 (Contributor, author) commented May 22, 2018

I went for dynamic allocations because the C version is doing the same thing. At first glance, the Julia benchmark isn't using StaticArrays either, right? I agree that the avoidance of heap allocations is a plus, but I would have to ensure that this is a fair optimization.

In practice, changing this should be trivial: fixed-size arrays deref to a slice just the same, and ndarray views can be created from arbitrary slices. However, since Rust doesn't have const generics yet, I would have to either hardcode the outputs of nrand to the intended dimensions or just construct the arrays inline.
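
For instance, a sketch of building an ndarray view over a stack-allocated array (assumes the ndarray crate; names are illustrative):

```rust
use ndarray::ArrayView2;

const N: usize = 5;

fn main() {
    let data: [f64; N * N] = [0.0; N * N];
    // The view borrows the existing memory; no heap allocation here.
    let view = ArrayView2::from_shape((N, N), &data).unwrap();
    assert_eq!(view.dim(), (N, N));
}
```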

@johnfgibson (Contributor) commented May 22, 2018

Hmm, I'd assumed the main point of randmatstat was to test the speed of tr((P'*P)^4) on large matrices. But it's n x n for n=5. Fixed-size arrays seem natural for that. Even so, the microbenchmarks aim to test identical algorithms in many languages. I'm inclined to think doing randmatstat with static arrays where possible (C, Fortran, Julia, Rust...) and heap arrays where not (Python, Matlab, Java, ...?) would not be in the spirit of identical algorithms.

Unless the benchmark code is the algorithm and the array implementation is just infrastructure...

@StefanKarpinski (Member) commented May 22, 2018

The point of the way these were written is that in library code you don't generally know that you're allocating a specific array size: you have to make your library code generic across different sizes and do dynamic allocations. Sure, in this specific case you can do static allocations, but you can also just replace the entire benchmark computation with a constant, so drawing some lines about what is or is not "allowed" is in the nature of benchmarking. The line is currently drawn at "everyone does dynamic memory allocation of non-fixed-size arrays". To a significant extent, this is to give the Pythons and Rs in the comparison an easier time, since that's all they can do and they're already getting clobbered; it seems unsporting not to at least give them a fighting chance.

@johnfgibson merged commit ffd7051 into JuliaLang:master on May 22, 2018
@Enet4 deleted the rust_microbenchmark branch on May 22, 2018 16:33
@Enet4 (Contributor, author) commented May 22, 2018

🎉

Just to let you know, I experimented with turning the heap allocations into static arrays in randmatstat (branch). This resulted in a minor speed-up of around 1.02x on my desktop (in raw values, from 6.10 to 5.97). Still, I agree that keeping the heap allocations is a fair decision.

@StefanKarpinski (Member) commented May 22, 2018

Good to hear—I kind of figured that would be the case. Dynamic allocation with a decently implemented allocator shouldn't be a bottleneck here. If we switched Julia to use StaticArrays, on the other hand, we might see a significant speedup since the package provides specialized fully unrolled matmul implementations for fixed matrix sizes. The speedup would be from code specialization, however, rather than avoiding dynamic memory allocation.

@Enet4 (Contributor, author) commented May 22, 2018

If we switched Julia to use StaticArrays, on the other hand, we might see a significant speedup since the package provides specialized fully unrolled matmul implementations for fixed matrix sizes. The speedup would be from code specialization, however, rather than avoiding dynamic memory allocation.

In a way, that is what the Rust community intends to achieve with the const generics proposal. :) The folks working on embedded systems are particularly concerned about this, since they usually cannot rely on heap allocations.
