Support for multi graph build #1174

Open: dimdano wants to merge 49 commits into main from make_multi_graph

Conversation

@dimdano dimdano commented Jan 24, 2025

📝 This feature enables splitting a larger model into multiple smaller subgraphs at given layers. The new MultiModelGraph class manages these subgraphs (each represented as a ModelGraph), enabling parallel building and synthesis, stitched designs (merging the subgraphs in hardware after synthesis), and simulation and performance estimation of the stitched design (a minimal usage sketch follows the list below). This can be useful for:

  • Very large models
  • Step-wise optimization
  • Modular design flows
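
For context, a minimal usage sketch assembled from the snippets in this thread; the toy model, output_dir, and split point are illustrative, and the exact MultiModelGraph API (e.g. build and stitch options) may differ from what is shown here.

import numpy as np
import keras
import hls4ml

# Toy model with named layers that can serve as split points.
inp = keras.Input(shape=(16,), name='inp')
x = keras.layers.Dense(8, activation='relu', name='dense1')(inp)
x = keras.layers.Dense(8, activation='relu', name='dense2')(x)
out = keras.layers.Dense(4, activation='relu', name='dense3')(x)
model = keras.Model(inputs=inp, outputs=out)

hls_config = {'Model': {'Precision': 'fixed<32,16>', 'ReuseFactor': 1}}

# split_layer_names marks where the model is cut; the converter then returns a
# MultiModelGraph that wraps one ModelGraph per subgraph.
hls_model_multi = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=hls_config,
    backend='Vitis',
    output_dir='multi_graph_prj',
    split_layer_names=['dense3'],
)

hls_model_multi.compile()  # compiles the subgraphs (in parallel, per this PR)
X = np.random.rand(10, 16).astype(np.float32)
y_csim = hls_model_multi.predict(X)  # CSimulation of the chained subgraphs
# After the HLS builds and IP stitching, the stitched RTL can be simulated, e.g.:
# y_rtl = hls_model_multi.predict(X[:1], sim='rtl')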

Type of change

  • New feature (non-breaking change which adds functionality)

Tests

Test Configuration:

A unit test was added in test/pytest/test_multi_graph.py

Checklist

  • I have read the guidelines for contributing.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation.
  • My changes generate no new warnings.
  • I have installed and run pre-commit on the files I edited or added.
  • I have added tests that prove my fix is effective or that my feature works.

@JanFSchulte
Contributor

pre-commit.ci autofix

@JanFSchulte
Contributor

The changes to the Keras frontend seem to be quite minimal. It would be great if we could also support the others (PyTorch, QONNX), if it's not too much hassle.

@dimdano dimdano force-pushed the make_multi_graph branch from 3c16abe to c6da873 Compare March 3, 2025 14:09
@dimdano dimdano marked this pull request as ready for review March 3, 2025 16:01
dimdano added 25 commits March 4, 2025 15:42
- The method returns two instances of the `ModelGraph` class.
- Each instance is initialized with the same config; only the output folder changes, allowing separate models to be created in one call.
- This improves usability by simplifying the process of generating multiple graphs from a single configuration input.
* takes as input the split_layer_names as split points
* returns a list of ModelGraph
* works for dense/fc layers at the moment
* need to find the input_shape of the split layer for conv layers; currently for dense/fc layers we find it through the 'n_in' key
* Automatically scans and adds HLS IP cores for subgraphs in Vivado

* Automatically detects interface types used by the IPs (either unpacked or AXI stream) and configures the connections accordingly.

* Also, updated the multigraph logic to copy the precision of the last layer from the previous graph and apply it to the input layer of the next graph.
Notes:
* missing X_INTERFACE_INFO for AXI interfaces in the generated HDL during packaging
* Vivado throws a warning: "Misformed interface info"
* We ignore this warning for the moment, as the IP can still be packaged
@dimdano dimdano force-pushed the make_multi_graph branch from 1b6067d to 7fbf439 Compare March 4, 2025 15:06
@bo3z bo3z added this to the v1.1.0 milestone Mar 7, 2025
Contributor

@calad0i calad0i left a comment

Individual issues are added in the code comments; most are minor. I have not yet checked how io_stream is handled, nor examined HDL simulation behavior. Good job!

The major concern from my side is that the optimizers only see each subgraph in isolation. This would break model-level optimizers and many of the graph-manipulation operations.

I strongly suggest presenting the full graph to the optimizers, e.g. through a proxy generated with an overloaded @property. Alternatively, consider constructing the MultiModelGraph from an already optimized ModelGraph.

With the changes in #1158, ModelGraph instantiation will be decoupled from the optimizer flows, and making subgraphs from an already optimized graph could make it more compatible.

bugs?:

  • Splitting before a merge layer can cause accesses to non-existent properties, likely related to the incomplete-graph issue. I am seeing similar issues here and there.

  • If a non-last graph has a port to the output, the stitched graph will lose that interface.

import keras
from hls4ml.converters import convert_from_keras_model

inp = keras.Input(shape=(16,), name='inp')
o1 = keras.layers.Dense(8, activation='relu', name='dense1')(inp)
o2 = keras.layers.Dense(8, activation='relu', name='dense2')(o1)
o3 = keras.layers.Dense(8, activation='relu', name='dense3')(o2)
o4 = keras.layers.Dense(16, activation='relu', name='dense4')(o3)
model = keras.Model(inputs=inp, outputs=[o3, o4])

hls_conf = {
    'Model': {
        'Precision': 'fixed<32,16>',
        'ReuseFactor': 1,
    }
}
model_hls = convert_from_keras_model(model, output_dir='/tmp/tt', backend='Vitis', hls_config=hls_conf, split_layer_names=['dense4'])

"""
Tests the multi-graph splitting and stitching process.
- Verifies that predictions from the monolithic and multi-graph versions match with the CSimulation.
- When granularity='name', an additional HLS build and stitched RTL simulation step is performed.
Contributor

What's special about granularity here? Is it just a proxy to switch those steps on in the test, or are there deeper reasons?

Author

I left the option to switch tests in case the behavior of the two approaches diverges in the future. Also, with granularity='name' we have to, for example, correctly inherit the precision from the previous graph, and the I/O ports of each IP in Vivado must be stitched with equal-width ports.

inp = np.expand_dims(X_input[0], axis=0)
sim_results = hls_model_multi.predict(inp, sim='rtl')
for sim_out, pred_out in zip(sim_results, [pred_multi[0][0], pred_multi[1][0]]):
    np.testing.assert_allclose(sim_out, pred_out, rtol=0, atol=0.3)
Contributor

An atol of 0.3 is large. Since you mentioned that bit-exactness was fixed, can we switch to testing for exactness?

print('Creating HLS model...')
merge_layers = ['add', 'subtract', 'multiply', 'average', 'maximum', 'minimum', 'concatenate', 'dot']
if split_layer_names:
    if any(any(layer in name for layer in merge_layers) for name in split_layer_names):
Contributor

@calad0i calad0i Apr 5, 2025

Checking the layer type by name may not always work; it's better to use the class_name field in the generated config.
For merge_layers one can check the number of I/O interfaces.
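
For illustration, a minimal sketch of the class_name-based check; the layer_list variable and its dict keys are a hypothetical stand-in for the converter's parsed config, not this PR's exact data structure.

# Hypothetical parsed-config entries, e.g. {'name': 'add_1', 'class_name': 'Add', ...}
MERGE_CLASSES = {'Add', 'Subtract', 'Multiply', 'Average', 'Maximum', 'Minimum', 'Concatenate', 'Dot'}

def split_hits_merge_layer(layer_list, split_layer_names):
    """Return True if any requested split point is a merge-type layer."""
    class_by_name = {layer['name']: layer['class_name'] for layer in layer_list}
    return any(class_by_name.get(name) in MERGE_CLASSES for name in split_layer_names)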

Author

Done. The check is now handled by MultiModelGraph using the class_name approach. I haven't fully tested how these layers behave, so I restrict them for now. Also, users should avoid selecting a split layer within a branch (or we could add a layer_in_branch check for these cases).

if last_prec is not None
else 'auto'
)
if last_output_precision == 'auto' or last_output_precision is None:
Contributor

At its current stage, I would expect a number of optimizers that act non-locally (with information from more than a single layer) to be broken by the current method. Also, I would suggest avoiding a classmethod that creates a class which is not a subclass of cls.
I think this can be done by either:

  • subclassing ModelGraph and implementing frequently used properties through proxies, such as replace_node, graph, and other members (see the sketch after this list), or
  • converting ModelGraph to MultiModelGraph at a later stage, such as in the writer or post-instantiation (e.g., subclassing and overloading the constructor).
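
A rough sketch of the proxy idea from the first bullet (illustrative only, not this PR's implementation; the forwarded attributes are just examples):

class MultiModelGraphProxy:
    """Present the union of the subgraphs to model-level optimizers."""

    def __init__(self, graphs):
        self.graphs = graphs  # list of ModelGraph instances, in execution order

    @property
    def graph(self):
        # Merged, ordered name -> layer view over all subgraphs.
        merged = {}
        for g in self.graphs:
            merged.update(g.graph)
        return merged

    def replace_node(self, old_node, new_node):
        # Forward the call to whichever subgraph owns the node.
        for g in self.graphs:
            if old_node.name in g.graph:
                return g.replace_node(old_node, new_node)
        raise ValueError(f'Node {old_node.name} not found in any subgraph')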

if {[llength $ap_rst_ports] > 0} {
    # Get the CONFIG.POLARITY property from one of the IP's 'ap_rst' pins
    set sample_rst_pin [lindex $ap_rst_ports 0]
    set rst_polarity [get_property CONFIG.POLARITY $sample_rst_pin]
Contributor

Suggest asserting that all ports are consistent:

Suggested change
set rst_polarity [get_property CONFIG.POLARITY $sample_rst_pin]
set rst_polarity [get_property CONFIG.POLARITY $sample_rst_pin]
foreach ap_rst_port $ap_rst_ports {
    # All ports should have the same polarity
    if {[get_property CONFIG.POLARITY $ap_rst_port] ne $rst_polarity} {
        puts "Error: Inconsistent CONFIG.POLARITY for ap_rst ports. Aborting."
        exit 1
    }
}

Author

Done.

# Compile all graphs
OBJECT_FILES=()
for g in "${graph_project_names[@]}"; do
    SRC_FILE="${g}/firmware/${ORIGINAL_PROJECT}_${g}.cpp"
Contributor

Parallel compiling can be used here. Since you have already compiled all subgraphs into shared libs, it would be better to just link against them instead of compiling all subgraphs again.

Author

Done, added parallelization in the bash script. The subgraphs are now compiled just once within the script.


def compile(self):
    for g in self.graphs:
        g.compile()
Contributor

I would propose one of the following two options:

  • Do not compile each subgraph; just dump the projects here, and compile and link everything only in the multigraph compilation step.
  • Compile each subgraph individually, and link them together for the multigraph prediction.

Parallel compiling should be enabled for individual subgraphs; project writing and compilation should be isolated.
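
For illustration, a minimal sketch of parallel subgraph compilation driven from Python (the helper name is hypothetical; the PR ultimately parallelizes inside build_lib_multigraph.sh instead):

from concurrent.futures import ThreadPoolExecutor

def compile_subgraphs(graphs, max_workers=None):
    """Compile each subgraph's shared library in its own worker thread; the
    heavy work happens in the external compiler, so threads are sufficient."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(g.compile) for g in graphs]
        for future in futures:
            future.result()  # re-raise any compilation error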

Author

Done. I went with option 1; all subgraphs are compiled in parallel in build_lib_multigraph.sh.

print('Verilog testbench and its input data were generated.')

print('Running build process of stitched IP...\n')
stitch_command = [
Contributor

I would suggest exporting this into build_prj.tcl and invoking it from there, as having hls4ml create the model and then moving it to another machine for HLS/logic synthesis could be a common workflow.

Author

The stitch_command is relatively fast and only runs after all the individual subgraph builds are complete. However, since hls4ml manages these builds in parallel using a Python thread pool, supporting this workflow on a remote server would require a Python script that mimics this behavior: essentially looping over each subgraph directory and running its corresponding build_prj.tcl in parallel using threads or processes. It's not hard to set up, and I will do it once we finalize the flow.
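
For reference, a rough sketch of such a helper script; the tool name, directory layout, and function names are assumptions, not part of this PR.

import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def run_hls_build(project_dir, hls_tool='vitis_hls'):
    """Run the per-subgraph build_prj.tcl, mimicking the local thread-pool flow."""
    subprocess.run([hls_tool, '-f', 'build_prj.tcl'], cwd=project_dir, check=True)

def build_all_subgraphs(subgraph_dirs, max_workers=4):
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(run_hls_build, d): d for d in subgraph_dirs}
        for future, project_dir in futures.items():
            future.result()  # raises if the HLS build failed
            print(f'Finished HLS build for {project_dir}')

# Example invocation (the glob pattern is an assumption about the project layout):
# build_all_subgraphs(sorted(Path('multi_graph_prj').glob('graph*')))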

@@ -450,6 +450,21 @@ def write_weights(self, model):
weights, model.config.get_output_dir(), namespace=namespace, write_txt_file=write_txt
)

def write_multigraph_weights(self, model):
Contributor

Can we avoid writing two copies of all weights (e.g., read from the original paths in the stitched project, or block the writing of the original weights for the individual subgraphs, say by setting an is_subgraph property in each ModelGraph)?

Author

Initially, my plan was to keep each subgraph self-contained, so that it could be compiled or reused independently without needing to be aware of the stitched context, and also to avoid changes to the ModelGraph class. In addition, since WEIGHTS_DIR is hardcoded at compile time via a preprocessor variable, dynamically resolving paths at runtime was tricky; some weights could not be found at runtime, and the bridge file also relies on this variable. So I ended up copying all weights into the stitched directory to keep things simple for now.

return self.graphs[index]

def parse_nn_config(self):
    nn_config = {"inputs": [], "outputs": []}
Contributor

The structure of nn_config does not seem to support graphs with multiple I/O interfaces. While this is not supported yet, is it something we want to support in the future?

@bo3z bo3z modified the milestones: v1.1.0, v1.2.0 Apr 8, 2025