kshyatt
Member

@kshyatt kshyatt commented Aug 26, 2025

No description provided.

@kshyatt kshyatt added the cuda libraries Stuff about CUDA library wrappers. label Aug 26, 2025
Contributor

@github-actions github-actions bot left a comment

CUDA.jl Benchmarks

| Benchmark suite | Current: 371ada7 | Previous: 8ff92f9 | Ratio |
|---|---|---|---|
| latency/precompile | 42920971255.5 ns | 43336576934.5 ns | 0.99 |
| latency/ttfp | 7042028159 ns | 6996576586 ns | 1.01 |
| latency/import | 3562776071 ns | 3575630035 ns | 1.00 |
| integration/volumerhs | 9614991 ns | 9614254.5 ns | 1.00 |
| integration/byval/slices=1 | 147276.5 ns | 147031 ns | 1.00 |
| integration/byval/slices=3 | 426131 ns | 426033 ns | 1.00 |
| integration/byval/reference | 145072 ns | 145044 ns | 1.00 |
| integration/byval/slices=2 | 286642 ns | 286613 ns | 1.00 |
| integration/cudadevrt | 103573 ns | 103601 ns | 1.00 |
| kernel/indexing | 14285 ns | 14203 ns | 1.01 |
| kernel/indexing_checked | 14987 ns | 15046 ns | 1.00 |
| kernel/occupancy | 670.6518987341772 ns | 669.4810126582279 ns | 1.00 |
| kernel/launch | 2240.4444444444443 ns | 2096.1 ns | 1.07 |
| kernel/rand | 15926 ns | 16215 ns | 0.98 |
| array/reverse/1d | 20119 ns | 20044.5 ns | 1.00 |
| array/reverse/2d | 23971 ns | 24931 ns | 0.96 |
| array/reverse/1d_inplace | 10748 ns | 10596 ns | 1.01 |
| array/reverse/2d_inplace | 13219 ns | 12129 ns | 1.09 |
| array/copy | 21018 ns | 21283 ns | 0.99 |
| array/iteration/findall/int | 159529 ns | 158352 ns | 1.01 |
| array/iteration/findall/bool | 140948.5 ns | 140146 ns | 1.01 |
| array/iteration/findfirst/int | 162156 ns | 157269.5 ns | 1.03 |
| array/iteration/findfirst/bool | 163629 ns | 158043 ns | 1.04 |
| array/iteration/scalar | 73000 ns | 74613 ns | 0.98 |
| array/iteration/logical | 217359 ns | 216352 ns | 1.00 |
| array/iteration/findmin/1d | 47446 ns | 46817 ns | 1.01 |
| array/iteration/findmin/2d | 97072.5 ns | 96972.5 ns | 1.00 |
| array/reductions/reduce/Int64/1d | 43736.5 ns | 44160 ns | 0.99 |
| array/reductions/reduce/Int64/dims=1 | 46854.5 ns | 51107 ns | 0.92 |
| array/reductions/reduce/Int64/dims=2 | 63051.5 ns | 61712 ns | 1.02 |
| array/reductions/reduce/Int64/dims=1L | 89466 ns | 88969.5 ns | 1.01 |
| array/reductions/reduce/Int64/dims=2L | 88245 ns | 88740 ns | 0.99 |
| array/reductions/reduce/Float32/1d | 35398 ns | 35533 ns | 1.00 |
| array/reductions/reduce/Float32/dims=1 | 42076.5 ns | 48101 ns | 0.87 |
| array/reductions/reduce/Float32/dims=2 | 59882 ns | 60093 ns | 1.00 |
| array/reductions/reduce/Float32/dims=1L | 52411.5 ns | 52663 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2L | 70265 ns | 70814 ns | 0.99 |
| array/reductions/mapreduce/Int64/1d | 43767 ns | 44462 ns | 0.98 |
| array/reductions/mapreduce/Int64/dims=1 | 48401.5 ns | 54310.5 ns | 0.89 |
| array/reductions/mapreduce/Int64/dims=2 | 62309.5 ns | 62878.5 ns | 0.99 |
| array/reductions/mapreduce/Int64/dims=1L | 89275 ns | 89203 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2L | 87931.5 ns | 87190 ns | 1.01 |
| array/reductions/mapreduce/Float32/1d | 35358 ns | 35099 ns | 1.01 |
| array/reductions/mapreduce/Float32/dims=1 | 52256 ns | 41869.5 ns | 1.25 |
| array/reductions/mapreduce/Float32/dims=2 | 59986 ns | 60074 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1L | 52752 ns | 52723 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2L | 70581 ns | 70625.5 ns | 1.00 |
| array/broadcast | 20269 ns | 20313 ns | 1.00 |
| array/copyto!/gpu_to_gpu | 11318 ns | 13052 ns | 0.87 |
| array/copyto!/cpu_to_gpu | 218470 ns | 216249 ns | 1.01 |
| array/copyto!/gpu_to_cpu | 285905 ns | 283227 ns | 1.01 |
| array/accumulate/Int64/1d | 125498 ns | 125402 ns | 1.00 |
| array/accumulate/Int64/dims=1 | 84337 ns | 84149 ns | 1.00 |
| array/accumulate/Int64/dims=2 | 158645 ns | 158389 ns | 1.00 |
| array/accumulate/Int64/dims=1L | 1711480.5 ns | 1712930 ns | 1.00 |
| array/accumulate/Int64/dims=2L | 966860 ns | 968016 ns | 1.00 |
| array/accumulate/Float32/1d | 109988 ns | 109280.5 ns | 1.01 |
| array/accumulate/Float32/dims=1 | 81017 ns | 80759 ns | 1.00 |
| array/accumulate/Float32/dims=2 | 148277 ns | 148278.5 ns | 1.00 |
| array/accumulate/Float32/dims=1L | 1619031 ns | 1619105.5 ns | 1.00 |
| array/accumulate/Float32/dims=2L | 699136.5 ns | 698810 ns | 1.00 |
| array/construct | 1278.7 ns | 1273.7 ns | 1.00 |
| array/random/randn/Float32 | 44951 ns | 48132.5 ns | 0.93 |
| array/random/randn!/Float32 | 25311 ns | 25307 ns | 1.00 |
| array/random/rand!/Int64 | 27435 ns | 27592 ns | 0.99 |
| array/random/rand!/Float32 | 8772.666666666666 ns | 8757 ns | 1.00 |
| array/random/rand/Int64 | 30101 ns | 38376 ns | 0.78 |
| array/random/rand/Float32 | 13094 ns | 13207 ns | 0.99 |
| array/permutedims/4d | 60695.5 ns | 60719.5 ns | 1.00 |
| array/permutedims/2d | 54490 ns | 54666 ns | 1.00 |
| array/permutedims/3d | 55599.5 ns | 55885 ns | 0.99 |
| array/sorting/1d | 2758204 ns | 2758224 ns | 1.00 |
| array/sorting/by | 3344878 ns | 3344978 ns | 1.00 |
| array/sorting/2d | 1081602 ns | 1081293.5 ns | 1.00 |
| cuda/synchronization/stream/auto | 1026.6 ns | 1048.4 ns | 0.98 |
| cuda/synchronization/stream/nonblocking | 6826.700000000001 ns | 7536.8 ns | 0.91 |
| cuda/synchronization/stream/blocking | 808.8421052631579 ns | 808.7849462365591 ns | 1.00 |
| cuda/synchronization/context/auto | 1170.5 ns | 1201.8 ns | 0.97 |
| cuda/synchronization/context/nonblocking | 6898.9 ns | 8845.7 ns | 0.78 |
| cuda/synchronization/context/blocking | 908.9245283018868 ns | 906.8139534883721 ns | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.
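
For readers skimming the benchmark figures, the Ratio column appears to be the current timing divided by the previous one, rounded to two decimals, so values above 1.00 mean the current commit was slower. The sketch below checks that reading against a few rows copied from the report. The function names and the 1.10 slowdown threshold are hypothetical, chosen only for illustration; they are not taken from github-action-benchmark or from this PR.

```python
# Sketch: reproduce the Ratio column for a few rows of the benchmark report.
# Assumption (inferred from the numbers, not stated in the report):
#   ratio = current / previous, rounded to two decimals.

# (name, current ns, previous ns) copied from the report above.
ROWS = [
    ("kernel/launch", 2240.4444444444443, 2096.1),
    ("array/reductions/mapreduce/Float32/dims=1", 52256.0, 41869.5),
    ("array/random/rand/Int64", 30101.0, 38376.0),
    ("cuda/synchronization/context/nonblocking", 6898.9, 8845.7),
]


def ratio(current_ns: float, previous_ns: float) -> float:
    """Return current/previous rounded to two decimals (>1 means slower)."""
    return round(current_ns / previous_ns, 2)


def slower_than(rows, threshold: float = 1.10):
    """Return names whose ratio exceeds a hypothetical slowdown threshold."""
    return [name for name, cur, prev in rows if ratio(cur, prev) > threshold]


if __name__ == "__main__":
    for name, cur, prev in ROWS:
        print(f"{name}: {ratio(cur, prev):.2f}")
    print("above threshold:", slower_than(ROWS))
```

With these four rows, only the `mapreduce/Float32/dims=1` entry exceeds the illustrative threshold; single-run swings of this size on microsecond-scale benchmarks can also be noise, so the ratio alone does not establish a regression.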

codecov bot commented Aug 27, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.64%. Comparing base (8ff92f9) to head (371ada7).
⚠️ Report is 1 commit behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2855      +/-   ##
==========================================
+ Coverage   89.56%   89.64%   +0.08%     
==========================================
  Files         150      150              
  Lines       13250    13250              
==========================================
+ Hits        11867    11878      +11     
+ Misses       1383     1372      -11     
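
The coverage figures in the diff can be cross-checked from the Hits and Misses rows. The sketch below assumes coverage is hits divided by total lines (hits plus misses) and that the percentage is truncated, not rounded, to two decimals. The truncation is an inference from the numbers shown (11878 / 13250 is roughly 89.645%, reported as 89.64%), not something the report states, and the function name is hypothetical.

```python
# Sketch: recompute the coverage percentages from the Hits/Misses rows.
# Assumptions (inferred, not stated in the report):
#   coverage = hits / (hits + misses), truncated to two decimals.

def coverage_percent(hits: int, misses: int) -> float:
    """Coverage percentage truncated to two decimals, via integer arithmetic."""
    lines = hits + misses
    return (hits * 10000 // lines) / 100


if __name__ == "__main__":
    base = coverage_percent(11867, 1383)   # base (8ff92f9) row
    head = coverage_percent(11878, 1372)   # head (371ada7) row
    print(f"base: {base:.2f}%  head: {head:.2f}%  delta: {head - base:+.2f}%")
```

Both rows sum to the 13250 lines reported, and the 11 additional hits match the 11 fewer misses.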

@kshyatt kshyatt mentioned this pull request Aug 29, 2025
@kshyatt kshyatt enabled auto-merge (squash) August 29, 2025 11:52
@RomeoV
Contributor

RomeoV commented Aug 30, 2025

I think this needs a rebase? Or manual merge, seems like auto-merge won't trigger because the branch is out of date...
