Optimization Suggestion: Replace np.ediff1d with array slicing for faster difference-based splitting #5033

SaFE-APIOpt · 2025-04-18T06:56:39Z

mdanalysis/package/MDAnalysis/lib/util.py

Line 1862 in e64755c

return np.split(arr, np.where(np.ediff1d(arr) - 1 > 0)[0] + 1)

Hi, I’d like to suggest a performance improvement in the following line:
return np.split(arr, np.where(np.ediff1d(arr) - 1 > 0)[0] + 1)
This can be rewritten more efficiently as:

diff = arr[1:] - arr[:-1]
return np.split(arr, np.where(diff > 1)[0] + 1)
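To check that the two forms agree, here is a minimal self-contained sketch that wraps the current line and the proposed replacement in two functions. The names split_consecutive_ediff1d and split_consecutive_slicing are hypothetical and used only for illustration. Both assume a 1-D integer array, sorted, as the surrounding code appears to expect.

```python
import numpy as np

def split_consecutive_ediff1d(arr):
    # Current form: split wherever the gap between neighbours exceeds 1.
    return np.split(arr, np.where(np.ediff1d(arr) - 1 > 0)[0] + 1)

def split_consecutive_slicing(arr):
    # Proposed form: same split points, computed with plain slicing
    # and a direct "> 1" comparison (no "- 1" temporary).
    diff = arr[1:] - arr[:-1]
    return np.split(arr, np.where(diff > 1)[0] + 1)

groups = split_consecutive_slicing(np.array([1, 2, 3, 5, 6, 9, 9, 10]))
print([g.tolist() for g in groups])  # [[1, 2, 3], [5, 6], [9, 9, 10]]
```

Runs of repeated values (the two 9s) stay in the same group, because a difference of 0 is not greater than 1.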

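Although np.ediff1d is designed to compute discrete differences between adjacent elements, it is a Python-level wrapper: when called without to_begin or to_end, current NumPy versions convert the input with np.asanyarray, flatten it with ravel(), and then return ary[1:] - ary[:-1]. Writing the slicing expression directly skips that per-call wrapper overhead, which matters most when the function is called many times on small arrays. In addition, comparing diff > 1 instead of computing np.ediff1d(arr) - 1 > 0 removes one temporary array of length len(arr) - 1 and one elementwise pass over it, which is where any saving on large arrays would come from. Both versions still allocate the difference array itself, and for 1-D input the resulting split points are identical.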
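Since np.split is common to both versions, a fair comparison times only the split-index computation. The sketch below does that with the standard-library timeit module. The function names, array sizes, and the random sorted test data are assumptions chosen for illustration; actual timings will depend on the NumPy version and hardware, so no numbers are claimed here.

```python
import timeit
import numpy as np

def split_points_ediff1d(arr):
    # Split indices as computed by the current line.
    return np.where(np.ediff1d(arr) - 1 > 0)[0] + 1

def split_points_slicing(arr):
    # Split indices as computed by the proposed replacement.
    diff = arr[1:] - arr[:-1]
    return np.where(diff > 1)[0] + 1

def bench(n, number=1000, seed=0):
    """Return total seconds for `number` calls of each variant on a sorted array of size n."""
    rng = np.random.default_rng(seed)
    arr = np.sort(rng.integers(0, 2 * n, size=n))  # sorted ints with some gaps
    t_old = timeit.timeit(lambda: split_points_ediff1d(arr), number=number)
    t_new = timeit.timeit(lambda: split_points_slicing(arr), number=number)
    return t_old, t_new

if __name__ == "__main__":
    # Small sizes mostly expose per-call overhead; large sizes expose
    # the cost of the extra "- 1" temporary array.
    for n in (10, 10_000, 1_000_000):
        t_old, t_new = bench(n, number=100)
        print(f"n={n:>9,}: ediff1d {t_old:.4f} s  slicing {t_new:.4f} s")
```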

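Since this difference array is only used to locate split indices, np.ediff1d offers no advantage over plain slicing here. The one behavioral difference is that np.ediff1d always flattens its input, whereas arr[1:] - arr[:-1] takes differences along the first axis; for the 1-D array this line operates on, the two are identical. The slicing form is also a common NumPy idiom and arguably states the intent more directly.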
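The equivalence only holds for 1-D input, because np.ediff1d flattens its argument while slicing works along axis 0. This small sketch, using arbitrary example arrays, shows both cases side by side.

```python
import numpy as np

v = np.array([3, 1, 4, 1, 5])
# 1-D input: np.ediff1d and slicing agree element for element.
same_1d = np.array_equal(np.ediff1d(v), v[1:] - v[:-1])

m = np.array([[1, 2], [4, 8]])
flat_diff = np.ediff1d(m)   # flattens to [1, 2, 4, 8] first, giving [1, 2, 4]
row_diff = m[1:] - m[:-1]   # differences between consecutive rows, giving [[3, 6]]
print(same_1d, flat_diff.tolist(), row_diff.tolist())  # True [1, 2, 4] [[3, 6]]
```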
