[Performance] Solve high memory usage issue during model compilation using OpenVINO backend on Keras 3 #31482
@@ -1291,7 +1291,7 @@ void fix_inputs_with_0d_ellipsis(ov::OutputVector& input_nodes,
 /// 8. Transpose dimensions to match the layout required by the output subscript.
 /// 9. Replace the original Einsum node with the last node from the decomposed sub-graph,
 ///    preserving the original node's name and runtime information.
-ov::pass::EinsumDecomposition::EinsumDecomposition() {
+ov::pass::EinsumDecomposition::EinsumDecomposition(bool check_const) : m_check_const(check_const) {
     MATCHER_SCOPE(EinsumDecomposition);
     auto einsum = ov::pass::pattern::wrap_type<ov::op::v7::Einsum>();
     matcher_pass_callback callback = [=](ov::pass::pattern::Matcher& m) {

@@ -1300,6 +1300,20 @@ ov::pass::EinsumDecomposition::EinsumDecomposition() {
             return false;
         }
+        if (m_check_const) {
+            bool has_const = false;
+            for (auto& input : einsum_node->input_values()) {
+                auto node_ptr = input.get_node_shared_ptr();
+                auto constant_ptr = ov::as_type_ptr<ov::op::v0::Constant>(node_ptr);
+                if (constant_ptr) {
+                    has_const = true;
+                    break;
+                }
+            }
+            if (!has_const)
+                return false;
+        }
Reviewer:

Could you provide more details about the einsum operation you want to optimize? Maybe link to the code of the model or a picture of the subgraph.

Author:

This optimization targets specific Einsum operations in transformer models like GPT-2, where at least one input is a constant tensor. After ConstantFolding, weight matrices become constants, enabling more efficient decomposition patterns.

Specific Einsum Operations Being Optimized:

1. Query-Key Attention Scores Computation:
2. Attention-Value Combination:
3. Weight Matrix Projections (Q/K/V Transformations):

Optimization Application:

Note: the optimization is only applied when at least one einsum input is constant. In the examples above:

✅ Weight Matrix Projections (example 3):
❌ Attention Scores (examples 1 & 2): Both

For more details and examples visit:
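The three patterns described above can be sketched with numpy. The equation strings and tensor shapes below are assumptions for illustration (the original comment does not spell them out); the point is which operands are constants in the compiled graph.

```python
# Illustrative GPT-2-style einsum patterns (equations and shapes are assumed).
import numpy as np

batch, heads, seq, d = 2, 4, 8, 16

# 1. Query-Key attention scores: both operands are runtime activations,
#    so neither is a graph Constant and the check_const pass skips this einsum.
q = np.random.rand(batch, heads, seq, d)
k = np.random.rand(batch, heads, seq, d)
scores = np.einsum("bhqd,bhkd->bhqk", q, k)

# 2. Attention-value combination: again two runtime tensors, also skipped.
probs = scores / scores.sum(-1, keepdims=True)
v = np.random.rand(batch, heads, seq, d)
context = np.einsum("bhqk,bhkd->bhqd", probs, v)

# 3. Weight projection: the weight matrix is a Constant after
#    ConstantFolding, so this einsum qualifies for the optimized path.
x = np.random.rand(batch, seq, heads * d)
w = np.random.rand(heads * d, heads * d)  # constant in the compiled graph
proj = np.einsum("bsi,io->bso", x, w)

print(scores.shape, context.shape, proj.shape)
```

Only pattern 3 has a constant operand, which matches the ✅/❌ split in the comment above.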
         // Parse the Einsum equation to get input and output subscripts
         auto equation = einsum_node->get_equation();
         std::vector<std::string> input_subscripts;
Reviewer:

Please add a clearer comment stating before which transformation this pass should be called.

Author:

Done!