Support for multi graph build #1174

Open: dimdano wants to merge 49 commits into main from make_multi_graph

Conversation

@dimdano dimdano commented Jan 24, 2025

📝 This feature enables splitting a larger model into multiple smaller subgraphs at given layers. The new MultiModelGraph class manages these subgraphs (each represented as a ModelGraph), enabling parallel building and synthesis, stitched designs (merging the subgraphs in hardware after synthesis), and simulation and performance estimation of the stitched design (a minimal usage sketch follows the list below). This can be useful for:

  • Very large models
  • Step-wise optimization
  • Modular design flows
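
For context, a minimal usage sketch assembled from the snippets in this thread; the toy model, output_dir, and split point are illustrative, and the exact MultiModelGraph API (e.g. build and stitch options) may differ from what is shown here.

import numpy as np
import keras
import hls4ml

# Toy model with named layers that can serve as split points.
inp = keras.Input(shape=(16,), name='inp')
x = keras.layers.Dense(8, activation='relu', name='dense1')(inp)
x = keras.layers.Dense(8, activation='relu', name='dense2')(x)
out = keras.layers.Dense(4, activation='relu', name='dense3')(x)
model = keras.Model(inputs=inp, outputs=out)

hls_config = {'Model': {'Precision': 'fixed<32,16>', 'ReuseFactor': 1}}

# split_layer_names marks where the model is cut; the converter then returns a
# MultiModelGraph that wraps one ModelGraph per subgraph.
hls_model_multi = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=hls_config,
    backend='Vitis',
    output_dir='multi_graph_prj',
    split_layer_names=['dense3'],
)

hls_model_multi.compile()  # compiles the subgraphs (in parallel, per this PR)
X = np.random.rand(10, 16).astype(np.float32)
y_csim = hls_model_multi.predict(X)  # CSimulation of the chained subgraphs
# After the HLS builds and IP stitching, the stitched RTL can be simulated, e.g.:
# y_rtl = hls_model_multi.predict(X[:1], sim='rtl')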

Type of change

  • New feature (non-breaking change which adds functionality)

Tests

Test Configuration:

A unit test was added in test/pytest/test_multi_graph.py

Checklist

  • I have read the guidelines for contributing.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation.
  • My changes generate no new warnings.
  • I have installed and run pre-commit on the files I edited or added.
  • I have added tests that prove my fix is effective or that my feature works.

@JanFSchulte
Contributor

pre-commit.ci autofix

@JanFSchulte
Contributor

The changes to the Keras frontend seem to be quite minimal. It would be great if we could also support the others (PyTorch, QONNX), if it's not too much hassle.

@dimdano dimdano force-pushed the make_multi_graph branch from 3c16abe to c6da873 Compare March 3, 2025 14:09
@dimdano dimdano marked this pull request as ready for review March 3, 2025 16:01
dimdano added 25 commits March 4, 2025 15:42
- The method returns two instances of the `ModelGraph` class.
- Each instance is initialized with the same config; only the output folder changes, allowing separate models to be created in one call.
- This improves usability by simplifying the process of generating multiple graphs from a single configuration input.
* takes as input the split_layer_names as split points
* returns a list of ModelGraph
* works for dense/fc layers at the moment
* need to find the input_shape of the split layer for conv layers; currently for dense/fc layers we find it through the 'n_in' key
* Automatically scans and adds HLS IP cores for subgraphs in Vivado

* Automatically detects interface types used by the IPs (either unpacked or AXI stream) and configures the connections accordingly.

* Also, updated the multigraph logic to copy the precision of the last layer from the previous graph and apply it to the input layer of the next graph.
Notes:
* missing X_INTERFACE_INFO for AXI interfaces in the generated HDL during packaging
* Vivado throws a warning: "Misformed interface info"
* We ignore this warning for the moment, as the IP can still be packaged
@dimdano dimdano force-pushed the make_multi_graph branch from 1b6067d to 7fbf439 Compare March 4, 2025 15:06
@bo3z bo3z added this to the v1.1.0 milestone Mar 7, 2025
Contributor

@calad0i calad0i left a comment

Individual issues are added in the code comments; most are minor. I have not yet checked how io_stream is handled, nor examined HDL simulation behavior. Good job!

The major concern from my side is that the optimizers only see each subgraph in isolation. This would break model-level optimizers and many of the graph-manipulation operations.

I strongly suggest presenting the full graph to the optimizers, e.g. through a proxy generated with an overloaded @property. Alternatively, consider constructing the MultiModelGraph from an already optimized ModelGraph.

With the changes in #1158, ModelGraph instantiation will be decoupled from the optimizer flows, and making subgraphs from an already optimized graph could make it more compatible.

bugs?:

  • Splitting before a merge layer can cause accesses to non-existent properties, likely related to the incomplete-graph issue. I am seeing similar issues here and there.

  • If a non-last graph has a port to the output, the stitched graph will lose that interface.

import keras
from hls4ml.converters import convert_from_keras_model

inp = keras.Input(shape=(16,), name='inp')
o1 = keras.layers.Dense(8, activation='relu', name='dense1')(inp)
o2 = keras.layers.Dense(8, activation='relu', name='dense2')(o1)
o3 = keras.layers.Dense(8, activation='relu', name='dense3')(o2)
o4 = keras.layers.Dense(16, activation='relu', name='dense4')(o3)
model = keras.Model(inputs=inp, outputs=[o3, o4])

hls_conf = {
    'Model': {
        'Precision': 'fixed<32,16>',
        'ReuseFactor': 1,
    }
}
model_hls = convert_from_keras_model(model, output_dir='/tmp/tt', backend='Vitis', hls_config=hls_conf, split_layer_names=['dense4'])

"""
Tests the multi-graph splitting and stitching process.
- Verifies that predictions from the monolithic and multi-graph versions match with the CSimulation.
- When granularity='name', an additional HLS build and stitched RTL simulation step is performed.
Contributor

What's special about granularity here? Is it just a proxy to switch those steps on in the test, or are there deeper reasons?

Author

I left the option to switch tests in case the behavior of the two approaches diverges in the future. Also, with granularity='name' we have to, for example, correctly inherit the precision from the previous graph, and the I/O ports of each IP in Vivado must be stitched with equal-width ports.

inp = np.expand_dims(X_input[0], axis=0)
sim_results = hls_model_multi.predict(inp, sim='rtl')
for sim_out, pred_out in zip(sim_results, [pred_multi[0][0], pred_multi[1][0]]):
    np.testing.assert_allclose(sim_out, pred_out, rtol=0, atol=0.3)
Contributor

An atol of 0.3 is large. Since you mentioned that bit-exactness was fixed, can we switch to testing for exactness?

print('Creating HLS model...')
merge_layers = ['add', 'subtract', 'multiply', 'average', 'maximum', 'minimum', 'concatenate', 'dot']
if split_layer_names:
    if any(any(layer in name for layer in merge_layers) for name in split_layer_names):
Contributor

@calad0i calad0i Apr 5, 2025

Checking the layer type by name may not always work; it's better to use the class_name field in the generated config.
For merge_layers one can check the number of I/O interfaces.
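
For illustration, a minimal sketch of the class_name-based check; the layer_list variable and its dict keys are a hypothetical stand-in for the converter's parsed config, not this PR's exact data structure.

# Hypothetical parsed-config entries, e.g. {'name': 'add_1', 'class_name': 'Add', ...}
MERGE_CLASSES = {'Add', 'Subtract', 'Multiply', 'Average', 'Maximum', 'Minimum', 'Concatenate', 'Dot'}

def split_hits_merge_layer(layer_list, split_layer_names):
    """Return True if any requested split point is a merge-type layer."""
    class_by_name = {layer['name']: layer['class_name'] for layer in layer_list}
    return any(class_by_name.get(name) in MERGE_CLASSES for name in split_layer_names)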

Author

Done. The check is now handled by MultiModelGraph using the class_name approach. I haven't fully tested how these layers behave, so I restrict them for now. Also, users should avoid selecting a split layer within a branch (or we could add a layer_in_branch check for these cases).

if last_prec is not None
else 'auto'
)
if last_output_precision == 'auto' or last_output_precision is None:
Contributor

At its current stage, I would expect a number of optimizers that act non-locally (with information from more than a single layer) to be broken by the current method. Also, I would suggest avoiding a classmethod that creates a class which is not a subclass of cls.
I think this can be done by either:

  • subclassing ModelGraph and implementing frequently used properties through proxies, such as replace_node, graph, and other members (see the sketch after this list), or
  • converting ModelGraph to MultiModelGraph at a later stage, such as in the writer or post-instantiation (e.g., subclassing and overloading the constructor).
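
A rough sketch of the proxy idea from the first bullet (illustrative only, not this PR's implementation; the forwarded attributes are just examples):

class MultiModelGraphProxy:
    """Present the union of the subgraphs to model-level optimizers."""

    def __init__(self, graphs):
        self.graphs = graphs  # list of ModelGraph instances, in execution order

    @property
    def graph(self):
        # Merged, ordered name -> layer view over all subgraphs.
        merged = {}
        for g in self.graphs:
            merged.update(g.graph)
        return merged

    def replace_node(self, old_node, new_node):
        # Forward the call to whichever subgraph owns the node.
        for g in self.graphs:
            if old_node.name in g.graph:
                return g.replace_node(old_node, new_node)
        raise ValueError(f'Node {old_node.name} not found in any subgraph')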

if {[llength $ap_rst_ports] > 0} {
    # Get the CONFIG.POLARITY property from one of the IP's 'ap_rst' pins
    set sample_rst_pin [lindex $ap_rst_ports 0]
    set rst_polarity [get_property CONFIG.POLARITY $sample_rst_pin]
Contributor

Suggest asserting that all ports are consistent:

Suggested change
set rst_polarity [get_property CONFIG.POLARITY $sample_rst_pin]
set rst_polarity [get_property CONFIG.POLARITY $sample_rst_pin]
foreach ap_rst_port $ap_rst_ports {
    # All ports should have the same polarity
    if {[get_property CONFIG.POLARITY $ap_rst_port] ne $rst_polarity} {
        puts "Error: Inconsistent CONFIG.POLARITY for ap_rst ports. Aborting."
        exit 1
    }
}

Author

Done.

# Compile all graphs
OBJECT_FILES=()
for g in "${graph_project_names[@]}"; do
    SRC_FILE="${g}/firmware/${ORIGINAL_PROJECT}_${g}.cpp"
Contributor

Parallel compiling can be used here. Since you have already compiled all subgraphs into shared libs, it would be better to just link against them instead of compiling all subgraphs again.

Author

Done, added parallelization in the bash script. The subgraphs are now compiled just once within the script.


def compile(self):
    for g in self.graphs:
        g.compile()
Contributor

I would propose one of the following two options:

  • Do not compile each subgraph; just dump the projects here, and compile and link everything only in the multigraph compilation step.
  • Compile each subgraph individually, and link them together for the multigraph prediction.

Parallel compiling should be enabled for individual subgraphs; project writing and compilation should be isolated.
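
For illustration, a minimal sketch of parallel subgraph compilation driven from Python (the helper name is hypothetical; the PR ultimately parallelizes inside build_lib_multigraph.sh instead):

from concurrent.futures import ThreadPoolExecutor

def compile_subgraphs(graphs, max_workers=None):
    """Compile each subgraph's shared library in its own worker thread; the
    heavy work happens in the external compiler, so threads are sufficient."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(g.compile) for g in graphs]
        for future in futures:
            future.result()  # re-raise any compilation error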

Author

Done. I went with option 1; all subgraphs are compiled in parallel in build_lib_multigraph.sh.

print('Verilog testbench and its input data were generated.')

print('Running build process of stitched IP...\n')
stitch_command = [
Contributor

I would suggest exporting this into build_prj.tcl and invoking it from there, as having hls4ml create the model and then moving it to another machine for HLS/logic synthesis could be a common workflow.

Author

The stitch_command is relatively fast and only runs after all the individual subgraph builds are complete. However, since hls4ml manages these builds in parallel using a Python thread pool, supporting this workflow on a remote server would require a Python script that mimics this behavior: essentially looping over each subgraph directory and running its corresponding build_prj.tcl in parallel using threads or processes. It's not hard to set up, and I will do it once we finalize the flow.
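
For reference, a rough sketch of such a helper script; the tool name, directory layout, and function names are assumptions, not part of this PR.

import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def run_hls_build(project_dir, hls_tool='vitis_hls'):
    """Run the per-subgraph build_prj.tcl, mimicking the local thread-pool flow."""
    subprocess.run([hls_tool, '-f', 'build_prj.tcl'], cwd=project_dir, check=True)

def build_all_subgraphs(subgraph_dirs, max_workers=4):
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(run_hls_build, d): d for d in subgraph_dirs}
        for future, project_dir in futures.items():
            future.result()  # raises if the HLS build failed
            print(f'Finished HLS build for {project_dir}')

# Example invocation (the glob pattern is an assumption about the project layout):
# build_all_subgraphs(sorted(Path('multi_graph_prj').glob('graph*')))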

@@ -450,6 +450,21 @@ def write_weights(self, model):
weights, model.config.get_output_dir(), namespace=namespace, write_txt_file=write_txt
)

def write_multigraph_weights(self, model):
Contributor

Can we avoid writing two copies of all weights (e.g., read from the original paths in the stitched project, or block the writing of the original weights for the individual subgraphs, say by setting an is_subgraph property in each ModelGraph)?

Author

Initially, my plan was to keep each subgraph self-contained, so that it could be compiled or reused independently without needing to be aware of the stitched context, and also to avoid changes to the ModelGraph class. In addition, since WEIGHTS_DIR is hardcoded at compile time via a preprocessor variable, dynamically resolving paths at runtime was tricky; some weights could not be found at runtime, and the bridge file also relies on this variable. So I ended up copying all weights into the stitched directory to keep things simple for now.

return self.graphs[index]

def parse_nn_config(self):
    nn_config = {"inputs": [], "outputs": []}
Contributor

The structure of nn_config does not seem to support graphs with multiple I/O interfaces. While this is not supported yet, is it something we want to support in the future?

@bo3z bo3z modified the milestones: v1.1.0, v1.2.0 Apr 8, 2025