Performance impact of disabling MKLDNN in Tensorflow #47991
Comments
assign core, ml |
New categories assigned: core,ml @Dr15Jones,@makortel,@smuzaffar,@valsdav,@y19y19 you have been requested to review this Pull request/Issue and eventually sign? Thanks |
cms-bot internal usage |
A new Issue was created by @makortel. @Dr15Jones, @antoniovilela, @makortel, @mandrenguyen, @rappoccio, @sextonkennedy, @smuzaffar can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here |
Workflow 29834.21, 80 events, single threaded, comparing CMSSW_15_1_X_2025-04-21-2300 and CMSSW_15_1_MKLDNN0_X_2025-04-21-2300 for the RECO step and the PAT step. |
@gartung What CPU did the node have? |
AMD EPYC-Genoa Processor |
I'm a bit surprised the JIT for AVX512 ("With MKLDNN") does not result in a bigger difference with respect to the AVX2 binaries ("Without MKLDNN"). |
The Genoa cores have AVX-512 instructions. |
On an Intel CPU with deep learning instructions OneDNN might be faster. |
That is, the VNNI extension to AVX-512. |
The VNNI optimizations are only included in the OneDNN bundled with Tensorflow 2.17.0. |
Right, but only when JITted in the "with MKLDNN" setup, right? The "without MKLDNN" setup should use only the x86-64-v3 instructions. |
If I recall correctly, yes. |
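For anyone wanting to check which of these instruction sets a given node actually advertises, here is a minimal sketch. It assumes a Linux node where `/proc/cpuinfo` has a `flags` line, and it uses the Linux kernel flag names `avx2`, `avx512f` and `avx512_vnni`. The helper names `cpu_isa_flags`, `summarize` and `read_local_flags` are hypothetical and not part of CMSSW or Tensorflow. It only reports what the CPU exposes, not what OneDNN ends up JITting.

```python
def cpu_isa_flags(cpuinfo_text):
    """Return the feature flags of the first CPU listed in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no "flags" line found (e.g. not an x86 Linux node)


def summarize(flags):
    """Report the instruction sets discussed in this thread."""
    return {
        "avx2": "avx2" in flags,                # part of the x86-64-v3 baseline
        "avx512f": "avx512f" in flags,          # AVX-512 foundation
        "avx512_vnni": "avx512_vnni" in flags,  # VNNI extension to AVX-512
    }


def read_local_flags(path="/proc/cpuinfo"):
    """Read the flags of the local node (assumes Linux)."""
    with open(path) as f:
        return cpu_isa_flags(f.read())
```

On the node itself, `summarize(read_local_flags())` gives a quick yes/no per instruction set.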
@makortel Since compiling the stack with frame pointers seems to resolve the segfaults in libunwind when profiling workflows with Tensorflow/OneDNN JITing, should this issue be closed? |
Yes, let's close |
After changing the profiling script so that it no longer configures the build with the Tensorflow from the MKLDNN0 IBs, the profiling jobs are segfaulting again. I was mistaken in assuming that enabling frame pointers was enough to prevent the segfaults. |
This issue is a follow-up to @gartung's talk in the Core Software meeting on 2025-04-29 (https://indico.cern.ch/event/1543134/#17-profiling-and-tensorflow), to record the performance impact and discuss how to proceed.