Support for multi graph build #1174
Open

dimdano wants to merge 49 commits into fastmachinelearning:main from dimdano:make_multi_graph

+2,379 −42

Commits (49):
- ee3b51d test commit (dimdano)
- cbeee24 split ModelGraph at specified layer name (dimdano)
- 03111c9 feat: add make_multi_graph classmethod to ModelGraph (dimdano)
- f4a77bb make_multi_graph can now support arbitrary number of graphs (dimdano)
- 851e835 Pass output_shapes to make_multi_graph to detect input shapes of spli… (dimdano)
- 0e0cf11 fixed layer index in the newly created graph (dimdano)
- 323236b fix minor mistakes (dimdano)
- f759a3e Add TCL script for automatic connection of subgraph IPs in Vivado (dimdano)
- a5f8277 some minor fixes in tcl script and make_multi_graph (dimdano)
- 07d23ae support for parallel subgraph builds. Also, make_multi_graph now retu… (dimdano)
- 5dc4ac6 new tcl script (dimdano)
- 202991d connected external and control signals (dimdano)
- dc60722 integrate ip_stitcher tcl script in hls4ml (dimdano)
- bba704b fix in tcl. folder creation for stitch project (dimdano)
- da3efb0 package final stitched ip in hls4ml (dimdano)
- 0f40e2a support for multiple inputs/outputs in first/last layer of stitched ip (dimdano)
- d24c42b initial support for stitched ip simulation (dimdano)
- 6e8f462 generate verilog testbench for stitched ip (dimdano)
- 27c76b3 read testbench output (dimdano)
- 704a874 minor changes (dimdano)
- 9d69355 improvements in testbench generation and build interface (dimdano)
- d1dd0fd general improvements (dimdano)
- 0bb10df only simulate stitched_design, better verilog testbench (dimdano)
- f1e2e57 prepare testbench input from user (dimdano)
- 55db302 support for user-defined input in verilog testbench of stitched IP (dimdano)
- 0af75e7 fix for multi input/output layers in graph splitting (dimdano)
- db95628 documentation for MultiModelGraph flow (dimdano)
- 738d489 faster rtl simulation (dimdano)
- 7829e41 unwrap list if it has single element (dimdano)
- f9fd4c0 Make MultiModelGraph adaptable to user-defined names (dimdano)
- 05ea6c9 stitch script time verbose (dimdano)
- 193381d fix with existing stitch project folder (dimdano)
- 04ac0f4 initial support for multigraph compilation in bridge file (dimdano)
- 10e95a8 stitched report fix for VivadoSynth aggregate (dimdano)
- 8c5a13b use log_to_stdout flag for parallel builds (dimdano)
- 4a7e6c3 small change (dimdano)
- d6c19d5 remove bridged multigraph compilation for now (dimdano)
- 0225845 [pre-commit.ci] auto fixes from pre-commit hooks (pre-commit-ci[bot])
- 89f5eb3 fix 'ap_rst' port polarity for active high case (dimdano)
- e21cb53 support for partition interface in verilog testbench (dimdano)
- e070ea1 support for MultiModelGraph predict using chained bridge file (dimdano)
- 7fbf439 Add pytest for multi-graph and fix minor issues (dimdano)
- ba86132 pre-commit fixes (dimdano)
- 773c411 removed pandas dependency in read_testbench_log (dimdano)
- b91f97a Ensure stitched RTL simulation results align with CSim output (dimdano)
- 3dcd0d5 parallel subgraph compilation (dimdano)
- fa3e679 added additional checks in ip_stitcher (dimdano)
- 05d22d3 small improvements on MultiModelGraph (dimdano)
- 3a74eea correct AXIS port slicing for Verilog simulation (dimdano)
=======================
MultiModelGraph Class
=======================

This page documents the ``MultiModelGraph`` class, which enables handling multiple subgraphs (each represented as a ``ModelGraph``) derived from a single original model.
The central concept is the division of a larger model into multiple smaller subgraphs at given layers, which can be useful for:

* Very large models
* Step-wise optimization
* Modular design flows

A ``MultiModelGraph`` manages these subgraphs, facilitating:

* Parallel building and synthesis
* Stitched designs (merging the subgraphs in hardware after synthesis)
* Simulation and performance estimation of the stitched design

--------------
Keras Example
--------------

For example, when converting a Keras model, you can specify the layers at which to split the model directly:

.. code-block:: python

    config = hls4ml.utils.config_from_keras_model(model, granularity='model')

    hls_model = hls4ml.converters.convert_from_keras_model(
        model,
        hls_config=config,
        backend='vitis',
        split_layer_names=['layer3', 'layer7'],
    )

Here, the ``hls_model`` is actually a ``MultiModelGraph`` containing three subgraphs. Each subgraph is a ``ModelGraph``, accessible via indexing: ``hls_model[i]``.
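As a toy illustration of this indexing behaviour (a stand-in class for the sketch, not the hls4ml API), a container of three subgraphs can be modelled as a simple sequence type:

```python
# Toy stand-in for MultiModelGraph indexing (illustrative only, not the hls4ml API).
class ToyMultiGraph:
    def __init__(self, subgraphs):
        self._subgraphs = list(subgraphs)

    def __getitem__(self, i):
        # hls_model[i] returns the i-th subgraph
        return self._subgraphs[i]

    def __len__(self):
        return len(self._subgraphs)

# Splitting at two layers ('layer3', 'layer7') yields three subgraphs.
mmg = ToyMultiGraph(['graph0', 'graph1', 'graph2'])
print(len(mmg), mmg[0])  # -> 3 graph0
```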
----------------------------------
Key Methods for MultiModelGraph
----------------------------------

* :ref:`compile <mmg-compile-method>`
* :ref:`predict <mmg-predict-method>`
* :ref:`build <mmg-build-method>`
* :ref:`trace <mmg-trace-method>`
* :ref:`make_multi_graph <make_multi_graph-method>`

----

.. _make_multi_graph-method:

``make_multi_graph`` method
===========================

The ``make_multi_graph`` method of ``ModelGraph`` takes a configuration, a full list of layers, the output shapes, and a list of split layers. It returns a ``MultiModelGraph`` that contains multiple ``ModelGraph`` instances.

.. code-block:: python

    from hls4ml.model.graph import ModelGraph

    multi_graph = ModelGraph.make_multi_graph(config, layer_list, output_shapes, split_layer_names=['fc2', 'fc3'])

This allows modular design flows and easier debugging of large models.
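The splitting logic can be pictured as partitioning the ordered layer list at the named split layers, with each split layer starting a new subgraph. The sketch below is a hedged simplification (the real hls4ml implementation also propagates shapes and precision between subgraphs):

```python
def split_layers(layer_names, split_layer_names):
    """Partition an ordered layer list; each split layer starts a new subgraph."""
    groups, current = [], []
    for name in layer_names:
        if name in split_layer_names and current:
            groups.append(current)  # close the previous subgraph
            current = []
        current.append(name)
    if current:
        groups.append(current)
    return groups

layers = ['fc1', 'relu1', 'fc2', 'relu2', 'fc3', 'softmax']
print(split_layers(layers, ['fc2', 'fc3']))
# -> [['fc1', 'relu1'], ['fc2', 'relu2'], ['fc3', 'softmax']]
```

Splitting at *n* layers yields *n* + 1 subgraphs, matching the three subgraphs produced by the two split points in the Keras example above.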
----

.. _mmg-compile-method:

``compile`` method
==================

Compiles all the individual ``ModelGraph`` subgraphs within the ``MultiModelGraph``. It also compiles a chained bridge file that links all the subgraphs together, which is used by the predict function.

.. code-block:: python

    multi_graph.compile()

----

.. _mmg-build-method:

``build`` method
================

Builds all subgraphs in parallel, each as if it were a standalone ``ModelGraph`` project, and returns reports for each subgraph. If configured, it then runs the stitching flow in Vivado, connecting the individual exported IPs and allowing you to simulate the stitched design at the RTL level.

.. code-block:: python

    report = multi_graph.build(export=True, stitch_design=True)

The returned ``report`` contains data from each subgraph's build and, if stitching was performed, a combined report of the stitched design.
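The parallel build can be sketched with a Python thread pool, which is how the PR discussion describes the flow. Everything below is a hedged stand-in: ``build_one`` is a dummy placeholder for a per-subgraph synthesis call, not a real hls4ml function.

```python
from concurrent.futures import ThreadPoolExecutor

def build_one(name):
    # Placeholder for a per-subgraph build (e.g. running its project's build script).
    return {'graph': name, 'status': 'ok'}

subgraph_names = ['graph1', 'graph2', 'graph3']

# One worker per subgraph; pool.map preserves input order in its results.
with ThreadPoolExecutor(max_workers=len(subgraph_names)) as pool:
    reports = list(pool.map(build_one, subgraph_names))

print(reports)
```

Any stitching step would run only after the pool has drained, i.e. after every subgraph build has completed.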
----

.. _mmg-predict-method:

``predict`` method
==================

Performs a forward pass through the chained bridge file using C simulation (``sim='csim'``). Data is automatically passed from one subgraph's output to the next subgraph's input. For large stitched designs, you can also use RTL simulation (``sim='rtl'``) to perform the forward pass at the register-transfer level. In this case, a Verilog testbench is dynamically generated and executed against the stitched IP design, providing a behavioral simulation that accurately verifies latency and output at the hardware level. Note that the input data for the RTL simulation must have a single batch dimension.
.. code-block:: python

    # Perform prediction using C simulation (default)
    y_csim = hls_model.predict(X, sim='csim')

    # Perform prediction using RTL simulation (behavioral)
    y_rtl = hls_model.predict(X, sim='rtl')
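The chained forward pass can be pictured as function composition over the subgraphs. This is a toy sketch only: each lambda stands in for one compiled subgraph's predict call, not for real hls4ml objects.

```python
# Each stage stands in for one compiled subgraph's predict().
stages = [
    lambda x: [v * 2 for v in x],   # subgraph 1
    lambda x: [v + 1 for v in x],   # subgraph 2
    lambda x: [v ** 2 for v in x],  # subgraph 3
]

def chained_predict(x):
    # The output of each subgraph feeds the next, as in the chained bridge file.
    for stage in stages:
        x = stage(x)
    return x

print(chained_predict([1, 2]))  # -> [9, 25]
```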
.. _mmg-trace-method:

``trace`` method [TODO]
=======================

Provides detailed layer-by-layer outputs across all sub-models, which is essential for debugging or for tuning quantization and precision settings.

.. code-block:: python

    final_output, trace_outputs = hls_model.trace(X)

``trace_outputs`` includes intermediate results from each subgraph, enabling insights into the data flow.
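Conceptually, tracing is the same chained pass with the intermediate output of each stage recorded along the way (again a toy sketch, not the hls4ml implementation):

```python
def chained_trace(x, stages):
    # Run the chained pass, recording each stage's output.
    trace_outputs = []
    for stage in stages:
        x = stage(x)
        trace_outputs.append(x)
    return x, trace_outputs

# Two stand-in subgraphs: increment, then scale by 10.
final, traces = chained_trace(1, [lambda v: v + 1, lambda v: v * 10])
print(final, traces)  # -> 20 [2, 20]
```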
--------------------------
Summary
--------------------------

The ``MultiModelGraph`` class is a tool for modular hardware design. By splitting a large neural network into multiple subgraphs, building each independently, and then stitching them together, you gain flexibility and parallelism, and you enable hierarchical design, incremental optimization, and integrated system-level simulations.

--------------------------
Other Notes
--------------------------

* Branch splitting limitation: splitting in the middle of a branched architecture (e.g., ResNet skip connections or multi-path networks) is currently unsupported. Also, each split subgraph must have a single input and a single output.
* Handling multiple NN inputs and outputs: the final NN output can support multiple output layers. However, for networks with multiple input layers, proper synchronization is required to drive the inputs, especially for stream interfaces. A fork-join mechanism in the Verilog testbench can help manage input synchronization effectively.
* RTL simulation issue: RTL simulation of stitched IPs with ``io_type='io_parallel'`` and a split at a flatten layer leads to improper simulation behavior and should be avoided.
* Array partitioning for parallel I/O: for ``io_parallel`` interfaces, all IPs must use the ``partition`` pragma instead of ``reshape``.
@@ -214,7 +214,10 @@ def convert_from_keras_model(

     _check_hls_config(config, hls_config)

-    return keras_to_hls(config)
+    # Retrieve 'split_layer_names' from kwargs, if provided, for multi-graph creation
+    split_layer_names = kwargs.get('split_layer_names', [])
+
+    return keras_to_hls(config, split_layer_names=split_layer_names)


 @requires('_torch')

Review comment (truncated): For clarity, I would suggest …
Review comment:

    I would suggest exporting this into the build_prj.tcl and invoking it from there, as having hls4ml create the model and then moving it to another machine for HLS/logic synthesis could be a common workflow.

Reply:

    The stitch_command is relatively fast and only runs after all the individual subgraph builds are complete. However, since hls4ml manages these builds in parallel using a Python thread pool, supporting this workflow on a remote server would require a Python script that mimics this behavior: essentially looping over each subgraph directory and running its corresponding build_prj.tcl in parallel using threads or processes. It's not hard to set up, and I will do it once we finalize the flow.