Support for multi graph build #1174
Conversation
pre-commit.ci autofix
The changes to the keras frontend seem to be quite minimal. It would be great if we could also support the others (pytorch, QONNX), if it's not too much hassle.
- The method returns two instances of the `ModelGraph` class.
- Each instance is initialized with the same config; only the output folder changes, allowing separate models to be created in one call.
- This improves usability by simplifying the process of generating multiple graphs from a single configuration input.
* Takes the split_layer_names as split points
* Returns a list of ModelGraph
* Works for dense/fc layers at the moment
* Need to find the input_shape of the split layer for conv layers. Currently in dense/fc we find it through the 'n_in' key
* Automatically scans and adds HLS IP cores for subgraphs in Vivado
* Automatically detects interface types used by the IPs (either unpacked or AXI stream) and configures the connections accordingly.
* Also, updated the multigraph logic to copy the precision of the last layer of the previous graph and apply it to the input layer of the next graph.
… returns a MultiModelGraph instance
Notes:
* Missing X_INTERFACE_INFO for AXI interfaces in the generated HDL during packaging
* Vivado throws a warning: "Misformed interface info"
* We ignore this warning at the moment, as the IP can still be packaged
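For orientation, a minimal usage sketch of the flow described above, based on the examples later in this thread; the exact entry point and keyword names may still change, so treat this as illustrative rather than the final API:

import numpy as np
import keras  # or: from tensorflow import keras
from hls4ml.converters import convert_from_keras_model

inp = keras.Input(shape=(16,), name='inp')
x = keras.layers.Dense(8, activation='relu', name='dense1')(inp)
out = keras.layers.Dense(4, activation='relu', name='dense2')(x)
model = keras.Model(inputs=inp, outputs=out)

hls_conf = {'Model': {'Precision': 'fixed<32,16>', 'ReuseFactor': 1}}

# Splitting at 'dense2' is expected to yield a MultiModelGraph wrapping two ModelGraphs
hls_model_multi = convert_from_keras_model(
    model, output_dir='/tmp/multi_graph', backend='Vitis',
    hls_config=hls_conf, split_layer_names=['dense2'],
)
hls_model_multi.compile()  # compiles the subgraphs (see the discussion below)
y_csim = hls_model_multi.predict(np.random.rand(1, 16).astype('float32'))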
Individual issues are added in the code comments; most are minor. I have not yet checked how io_stream is handled, nor examined HDL simulation behavior. Good job!
The major concern from my side is that the optimizers see only each subgraph in isolation. This would break model-level optimizers and many of the graph manipulation operations.
I strongly suggest presenting the graph objects to the optimizers, e.g. through a proxy generated with an overloaded @property. Or, maybe consider constructing the MultiModelGraph from an already optimized ModelGraph. With the changes in #1158, ModelGraph instantiation will be decoupled from optimizer flows, and making subgraphs from an already optimized graph could make it more compatible.
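For illustration, a sketch of that second alternative, reusing the model and hls_conf from the sketch above; make_multigraph_from_model and its split_before argument are placeholder names, not existing API:

# Convert first, so the optimizer flows act on the complete graph
hls_model = convert_from_keras_model(model, output_dir='/tmp/full', backend='Vitis', hls_config=hls_conf)
# Then carve the optimized ModelGraph into subgraphs (hypothetical helper)
hls_model_multi = make_multigraph_from_model(hls_model, split_before=['dense4'])
hls_model_multi.compile()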
bugs?:
- Splitting before a merge layer can cause accessing non-existent properties, likely related to the incomplete-graph issue. I'm seeing similar issues here and there.
- If a non-last graph has a port to the output, the stitched graph will lose that interface:
import keras  # or: from tensorflow import keras
from hls4ml.converters import convert_from_keras_model

inp = keras.Input(shape=(16,), name='inp')
o1 = keras.layers.Dense(8, activation='relu', name='dense1')(inp)
o2 = keras.layers.Dense(8, activation='relu', name='dense2')(o1)
o3 = keras.layers.Dense(8, activation='relu', name='dense3')(o2)
o4 = keras.layers.Dense(16, activation='relu', name='dense4')(o3)
model = keras.Model(inputs=inp, outputs=[o3, o4])
hls_conf = {
'Model': {
'Precision': 'fixed<32,16>',
'ReuseFactor': 1,
}
}
model_hls = convert_from_keras_model(model, output_dir='/tmp/tt', backend='Vitis', hls_config=hls_conf, split_layer_names=['dense4'])
""" | ||
Tests the multi-graph splitting and stitching process. | ||
- Verifies that predictions from the monolithic and multi-graph versions match with the CSimulation. | ||
- When granularity='name', an additional HLS build and stitched RTL simulation step is performed. |
What's special about granularity here? Is it just a proxy to switch those steps on in the test, or are there deeper reasons?
I left the option to switch tests in case the behavior of the two approaches diverges in the future. Also, with granularity='name' we have to, for example, correctly inherit the precision from the previous graph, and the I/O ports of each IP in Vivado must be properly stitched with equal-width ports.
inp = np.expand_dims(X_input[0], axis=0)
sim_results = hls_model_multi.predict(inp, sim='rtl')
for sim_out, pred_out in zip(sim_results, list([pred_multi[0][0], pred_multi[1][0]])):
    np.testing.assert_allclose(sim_out, pred_out, rtol=0, atol=0.3)
An atol of 0.3 is large. Since you mentioned that bit-exactness was fixed, can we switch to testing for exactness?
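For reference, the bit-exact variant of the check quoted above would simply drop the tolerance (assuming the CSim and stitched-RTL outputs are indeed expected to match exactly):

for sim_out, pred_out in zip(sim_results, [pred_multi[0][0], pred_multi[1][0]]):
    np.testing.assert_array_equal(sim_out, pred_out)  # no atol/rtol: require an exact match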
hls4ml/converters/keras_to_hls.py
Outdated
print('Creating HLS model...')
merge_layers = ['add', 'subtract', 'multiply', 'average', 'maximum', 'minimum', 'concatenate', 'dot']
if split_layer_names:
    if any(any(layer in name for layer in merge_layers) for name in split_layer_names):
Checking the layer type by name may not always work; it's better to use the class_name field in the generated config. For merge_layers, one can check the number of I/O interfaces.
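A possible shape for that check on the parsed config, assuming layer_list is the converter's list of layer dicts; the set of merge class names here is an assumption, not taken from the PR:

merge_classes = {'Add', 'Subtract', 'Multiply', 'Average', 'Maximum', 'Minimum', 'Concatenate', 'Dot'}
split_set = set(split_layer_names)
for layer in layer_list:  # each entry is a dict parsed from the Keras config
    if layer['name'] in split_set and layer['class_name'] in merge_classes:
        raise ValueError(f"Splitting at merge layer '{layer['name']}' is not supported")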
Done. The check is now handled by MultiModelGraph using the class_name approach. I haven't fully tested how these layers behave, so I restrict them for now. Also, users should avoid selecting a split layer within a branch (or we could add a layer_in_branch check for these cases).
if last_prec is not None
else 'auto'
)
if last_output_precision == 'auto' or last_output_precision is None:
At its current stage I would expect a bunch of optimizers acting non-locally (with information from more than a single layer) to be broken with the current method. Also, I would suggest avoiding using a classmethod to create a class that is not a subclass of cls.
I think this can be done by either:
- subclassing ModelGraph and implementing frequently used properties through proxies, such as replace_node, graph, and other stuff (a rough sketch follows below), or
- converting ModelGraph to MultiModelGraph at a later stage, such as in the writer or post-instantiation (e.g., subclass and overload the constructor).
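A rough sketch of the first option, assuming ModelGraph keeps its graph dict of layers and a replace_node method; this is illustrative only, not the PR's implementation:

from hls4ml.model.graph import ModelGraph

class MultiModelGraphProxy(ModelGraph):  # class name is illustrative
    def __init__(self, subgraphs):
        # Intentionally not calling ModelGraph.__init__: this object only proxies the subgraphs.
        self.graphs = subgraphs

    @property
    def graph(self):
        # Present all layers of all subgraphs as a single merged mapping for the optimizers.
        merged = {}
        for g in self.graphs:
            merged.update(g.graph)
        return merged

    def replace_node(self, old_node, new_node):
        # Delegate to whichever subgraph owns the node.
        for g in self.graphs:
            if old_node.name in g.graph:
                g.replace_node(old_node, new_node)
                return
        raise KeyError(f'{old_node.name} not found in any subgraph')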
if {[llength $ap_rst_ports] > 0} {
    # Get the CONFIG.POLARITY property from one of the IP's 'ap_rst' pins
    set sample_rst_pin [lindex $ap_rst_ports 0]
    set rst_polarity [get_property CONFIG.POLARITY $sample_rst_pin]
Suggest asserting all ports are consistent
set rst_polarity [get_property CONFIG.POLARITY $sample_rst_pin]
foreach ap_rst_port $ap_rst_ports {
    # All ports should have the same polarity
    if {[get_property CONFIG.POLARITY $ap_rst_port] ne $rst_polarity} {
        puts "Error: Inconsistent CONFIG.POLARITY for ap_rst ports. Aborting."
        exit 1
    }
}
done
# Compile all graphs
OBJECT_FILES=()
for g in "${graph_project_names[@]}"; do
    SRC_FILE="${g}/firmware/${ORIGINAL_PROJECT}_${g}.cpp"
Parallel compiling can be used here. Since you have already compiled all subgraphs into shared libs, it would be better to just link against them instead of compiling all subgraphs again.
Done, added parallelization in the bash script. The subgraphs are now compiled just once within the script.
hls4ml/model/graph.py
Outdated
def compile(self):
    for g in self.graphs:
        g.compile()
I would propose one of the two following options:
- Not compiling each subgraph; just dump the projects here, and compile and link everything only in the multigraph compilation step (a sketch of this option follows below).
- Compiling each subgraph individually, and linking them together for the multigraph prediction.
Parallel compiling should be enabled for individual subgraphs; project writing/compilation should be isolated.
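Something along these lines for the first option (a sketch only; _compile_and_link_all is a placeholder for whatever ends up driving the combined build script, and ModelGraph.write() is assumed to dump the project without compiling):

def compile(self):
    # Option 1: only write each subgraph project here, without compiling it
    for g in self.graphs:
        g.write()
    # ...then compile and link everything in a single multigraph step,
    # e.g. by invoking one combined build script
    self._compile_and_link_all()  # placeholder name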
Done, went with option 1: all subgraphs are compiled in parallel in build_lib_multigraph.sh.
print('Verilog testbench and its input data were generated.')

print('Running build process of stitched IP...\n')
stitch_command = [
I would suggest exporting this into the build_prj.tcl and invoking it from there, as generating the model with hls4ml and moving it to another machine for the HLS/logic builds could be a common workflow.
The stitch_command is relatively fast and only runs after all the individual subgraph builds are complete. However, since hls4ml manages these builds in parallel using a Python thread pool, supporting this workflow on a remote server would require a Python script that mimics this behavior: essentially looping over each subgraph directory and running its corresponding build_prj.tcl in parallel using threads or processes. It's not hard to set up, and I will do it once we finalize the flow.
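A rough sketch of such a helper (hypothetical, not part of this PR): loop over the subgraph project directories, run each build_prj.tcl in parallel, then stitch once everything finishes. The directory names and the vitis_hls invocation are assumptions about the generated projects.

import subprocess
from concurrent.futures import ThreadPoolExecutor

def build_subgraph(project_dir):
    # Run the per-graph HLS build script inside its project directory
    subprocess.run(['vitis_hls', '-f', 'build_prj.tcl'], cwd=project_dir, check=True)

graph_dirs = ['graph1', 'graph2', 'graph3']  # placeholder subgraph output dirs
with ThreadPoolExecutor(max_workers=len(graph_dirs)) as pool:
    list(pool.map(build_subgraph, graph_dirs))
# After all subgraph builds complete, run the stitching step on the exported IPs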
@@ -450,6 +450,21 @@ def write_weights(self, model):
    weights, model.config.get_output_dir(), namespace=namespace, write_txt_file=write_txt
)

def write_multigraph_weights(self, model):
Can we avoid writing double copies of all weights? E.g., read from the original paths in the stitched project, or block the writing of original weights for individual subgraphs, like setting an is_subgraph property in each ModelGraph?
Initially, my plan was to keep each subgraph self-contained, so that it could be compiled or reused independently without needing to be aware of the stitched context, and also to avoid changes to the ModelGraph class. Also, since WEIGHTS_DIR is hardcoded at compile time via a preprocessor variable, dynamically resolving paths at runtime was a bit tricky; some weights could not be found at runtime. The bridge file also relies on this variable. So, I ended up copying all weights into the stitched directory to keep things simple for now.
    return self.graphs[index]

def parse_nn_config(self):
    nn_config = {"inputs": [], "outputs": []}
The structure of the nn_config does not seem to support graphs with multiple I/O interfaces. While this is not yet supported, is it something we want to support in the future?
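Purely to illustrate the question, one shape that would admit multiple interfaces per graph; all field names here are assumptions, not the PR's actual schema:

nn_config = {
    'inputs': [
        {'name': 'inp', 'pragma': 'stream', 'total_bits': 32, 'integer_bits': 16, 'batch_size': 1},
    ],
    'outputs': [
        {'name': 'dense3_out', 'pragma': 'stream', 'total_bits': 32, 'integer_bits': 16, 'batch_size': 1},
        {'name': 'dense4_out', 'pragma': 'stream', 'total_bits': 32, 'integer_bits': 16, 'batch_size': 1},
    ],
}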
📝 This feature enables splitting a larger model into multiple smaller subgraphs at given layers. The new MultiModelGraph class manages these subgraphs (each represented as a ModelGraph), enabling parallel building and synthesis, stitched designs (merging the subgraphs in hardware after synthesis), and simulation and performance estimation of the stitched design. Can be useful for:
Type of change
Tests
Test Configuration:
A unit test was added in test/pytest/test_multi_graph.py.
Checklist
I have run pre-commit on the files I edited or added.