Optimization Suggestion: Replace np.ediff1d with array slicing for faster difference-based splitting #5033

Open
SaFE-APIOpt opened this issue Apr 18, 2025 · 0 comments


Hi, I’d like to suggest a performance improvement in the following line:
return np.split(arr, np.where(np.ediff1d(arr) - 1 > 0)[0] + 1)
This can be rewritten more efficiently as:

diff = arr[1:] - arr[:-1]
return np.split(arr, np.where(diff > 1)[0] + 1)
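As a quick sanity check, the sketch below wraps both variants in functions and compares them on a small sorted integer array. The names `split_runs_ediff1d` and `split_runs_slicing` are hypothetical, chosen only for illustration, since the enclosing function is not shown in this issue.

```python
import numpy as np

def split_runs_ediff1d(arr):
    # Current form, as quoted in this issue.
    return np.split(arr, np.where(np.ediff1d(arr) - 1 > 0)[0] + 1)

def split_runs_slicing(arr):
    # Proposed form: adjacent differences from two views of the same array.
    diff = arr[1:] - arr[:-1]
    return np.split(arr, np.where(diff > 1)[0] + 1)

arr = np.array([1, 2, 3, 7, 8, 10])
old = split_runs_ediff1d(arr)
new = split_runs_slicing(arr)

# Both variants should split at the gaps 3 -> 7 and 8 -> 10.
same = len(old) == len(new) and all(
    np.array_equal(a, b) for a, b in zip(old, new)
)
print(same)
print([chunk.tolist() for chunk in new])
```

This only checks a 1-D signed-integer input; other dtypes or multi-dimensional input would need their own checks.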

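Although np.ediff1d computes the differences between adjacent elements, it adds a small fixed overhead per call: it converts the input to an array, flattens it, and, when to_begin or to_end is given, performs extra type checks before subtracting. When neither optional argument is passed, as here, it ends up computing the same slice difference internally. For a 1-D array, arr[1:] - arr[:-1] therefore gives the same result while skipping the extra function call and input handling. Both forms allocate one result array (arr[1:] and arr[:-1] are views), so the saving is per call rather than proportional to array size, and it should matter most when the function is called many times or on small arrays.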
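The size of the gain is easier to judge from a measurement than from reasoning alone. Below is a minimal timing sketch using the standard-library `timeit` module. The helper `time_diff_variants`, the array sizes, and the repeat count are assumptions made for illustration, and the numbers will vary with machine and NumPy version.

```python
import timeit
import numpy as np

def time_diff_variants(n, repeats=200):
    """Return (ediff1d_seconds, slicing_seconds) for a sorted int array of length n."""
    rng = np.random.default_rng(0)
    arr = np.sort(rng.integers(0, 10 * n, size=n))
    # Time only the difference step, which is the part the two variants do differently.
    t_ediff = timeit.timeit(lambda: np.ediff1d(arr), number=repeats)
    t_slice = timeit.timeit(lambda: arr[1:] - arr[:-1], number=repeats)
    return t_ediff, t_slice

for n in (10, 1_000, 100_000):
    t_ediff, t_slice = time_diff_variants(n)
    print(f"n={n:>7}: ediff1d={t_ediff:.6f}s  slicing={t_slice:.6f}s")
```

Comparing the rows for different `n` shows whether the difference grows with array size or stays roughly constant per call.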

Since the difference array is only used to locate split indices, and neither to_begin nor to_end is passed, np.ediff1d offers no benefit here over plain slicing. The replacement also reads more directly: the condition becomes diff > 1 instead of np.ediff1d(arr) - 1 > 0.
