Support for multi graph build #1174

Status: Open. Wants to merge 49 commits into base branch main.

Commits (49)
ee3b51d
test commit
dimdano Oct 11, 2024
cbeee24
split ModelGraph at specified layer name
dimdano Oct 11, 2024
03111c9
feat: add make_multi_graph classmethod to ModelGraph
dimdano Oct 14, 2024
f4a77bb
make_multi_graph can now support arbitrary number of graphs
dimdano Oct 15, 2024
851e835
Pass output_shapes to make_multi_graph to detect input shapes of spli…
dimdano Oct 17, 2024
0e0cf11
fixed layer index in the newly created graph
dimdano Oct 17, 2024
323236b
fix minor mistakes
dimdano Oct 18, 2024
f759a3e
Add TCL script for automatic connection of subgraph IPs in Vivado
dimdano Oct 24, 2024
a5f8277
some minor fixes in tcl script and make_multi_graph
dimdano Oct 29, 2024
07d23ae
support for parallel subgraph builds. Also, make_multi_graph now retu…
dimdano Oct 31, 2024
5dc4ac6
new tcl script
dimdano Nov 12, 2024
202991d
connected external and control signals
dimdano Nov 13, 2024
dc60722
integrate ip_stitcher tcl script in hls4ml
dimdano Nov 14, 2024
bba704b
fix in tcl. folder creation for stitch project
dimdano Nov 18, 2024
da3efb0
package final stitched ip in hls4ml
dimdano Nov 22, 2024
0f40e2a
support for multiple inputs/outputs in first/last layer of stitched ip
dimdano Dec 2, 2024
d24c42b
initial support for stitched ip simulation
dimdano Dec 3, 2024
6e8f462
generate verilog testbench for stitched ip
dimdano Dec 6, 2024
27c76b3
read testbench output
dimdano Dec 9, 2024
704a874
minor changes
dimdano Dec 10, 2024
9d69355
improvements in testbench generation and build interface
dimdano Dec 11, 2024
d1dd0fd
general improvements
dimdano Dec 12, 2024
0bb10df
only simulate stitched_design, better verilog testbench
dimdano Dec 17, 2024
f1e2e57
prepare testbench input from user
dimdano Dec 18, 2024
55db302
support for user-defined input in verilog testbench of stitched IP
dimdano Dec 19, 2024
0af75e7
fix for multi input/output layers in graph splitting
dimdano Dec 19, 2024
db95628
documentation for MultiModelGraph flow
dimdano Dec 20, 2024
738d489
faster rtl simulation
dimdano Jan 8, 2025
7829e41
unwrap list if it has single element
dimdano Jan 10, 2025
f9fd4c0
Make MultiModelGraph adaptable to user-defined names
dimdano Jan 15, 2025
05ea6c9
stitch script time verbose
dimdano Jan 15, 2025
193381d
fix with existing stitch project folder
dimdano Jan 15, 2025
04ac0f4
initial support for multigraph compilation in bridge file
dimdano Jan 16, 2025
10e95a8
stitched report fix for VivadoSynth aggregate
dimdano Jan 17, 2025
8c5a13b
use log_to_stdout flag for parallel builds
dimdano Jan 21, 2025
4a7e6c3
small change
dimdano Jan 24, 2025
d6c19d5
remove bridged multigraph compilation for now
dimdano Jan 24, 2025
0225845
[pre-commit.ci] auto fixes from pre-commit hooks
pre-commit-ci[bot] Jan 24, 2025
89f5eb3
fix 'ap_rst' port polarity for active high case
dimdano Jan 28, 2025
e21cb53
support for partition interface in verilog testbench
dimdano Jan 29, 2025
e070ea1
support for MultiModelGraph predict using chained bridge file
dimdano Feb 14, 2025
7fbf439
Add pytest for multi-graph and fix minor issues
dimdano Mar 3, 2025
ba86132
pre-commit fixes
dimdano Mar 4, 2025
773c411
removed pandas dependency in read_testbench_log
dimdano Mar 10, 2025
b91f97a
Ensure stitched RTL simulation results align with CSim output
dimdano Mar 14, 2025
3dcd0d5
parallel subgraph compilation
dimdano Apr 16, 2025
fa3e679
added additional checks in ip_stitcher
dimdano Apr 16, 2025
05d22d3
small improvements on MultiModelGraph
dimdano Apr 16, 2025
3a74eea
correct AXIS port slicing for Verilog simulation
dimdano Apr 30, 2025
Binary file added docs/img/logo_small.png
137 changes: 137 additions & 0 deletions docs/ir/multimodelgraph.rst
@@ -0,0 +1,137 @@
=======================
MultiModelGraph Class
=======================

This page documents the ``MultiModelGraph`` class, which enables handling multiple subgraphs (each represented as a ``ModelGraph``) derived from a single original model.
The central concept is the division of a larger model into multiple smaller subgraphs at chosen layers, which can be useful for:

* Very large models
* Step-wise optimization
* Modular design flows

A ``MultiModelGraph`` manages these subgraphs, facilitating:

* Parallel building and synthesis
* Stitched designs (merging the subgraphs in HW after synthesis)
* Simulation and performance estimation of the stitched design

--------------
Keras Example
--------------

For example, when converting a Keras model, you can specify the layers at which to split the model directly:

.. code-block:: python

config = hls4ml.utils.config_from_keras_model(model, granularity='model')

hls_model = hls4ml.converters.convert_from_keras_model(
model,
hls_config=config,
backend='vitis',
split_layer_names=['layer3', 'layer7']
)

Here, the ``hls_model`` is actually a ``MultiModelGraph`` containing three subgraphs. Each subgraph is a ``ModelGraph`` accessible via indexing: ``hls_model[i]``.
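The mapping from split points to subgraphs can be illustrated with a minimal, self-contained sketch. This is not the hls4ml implementation; it assumes each named split layer starts a new subgraph (whether the split lands before or after the named layer is an implementation detail of hls4ml, so treat the boundary convention here as an assumption).

```python
def split_layers(layer_names, split_layer_names):
    """Split a flat layer sequence at the given layer names.

    Assumption (illustrative only): each split layer becomes the first
    layer of a new subgraph, so N split names yield N + 1 subgraphs.
    """
    subgraphs, current = [], []
    for name in layer_names:
        if name in split_layer_names and current:
            subgraphs.append(current)
            current = []
        current.append(name)
    subgraphs.append(current)
    return subgraphs

layers = ['layer1', 'layer2', 'layer3', 'layer4', 'layer5', 'layer6', 'layer7', 'layer8']
print(split_layers(layers, ['layer3', 'layer7']))
# → three subgraphs: layers 1-2, layers 3-6, and layers 7-8
```

With two split names, the result has three parts, matching the three ``ModelGraph`` instances described above.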


----------------------------------
Key Methods for MultiModelGraph
----------------------------------

* :ref:`compile <mmg-compile-method>`
* :ref:`predict <mmg-predict-method>`
* :ref:`build <mmg-build-method>`
* :ref:`trace <mmg-trace-method>`
* :ref:`make_multi_graph <make_multi_graph-method>`

----

.. _make_multi_graph-method:

``make_multi_graph`` method
===========================

The ``make_multi_graph`` method of ``ModelGraph`` takes a configuration, a full list of layers, the output shapes, and a list of split layers. It returns a ``MultiModelGraph`` that contains multiple ``ModelGraph`` instances.

.. code-block:: python

from my_hls4ml_lib.modelgraph import ModelGraph
multi_graph = ModelGraph.make_multi_graph(config, layer_list, output_shapes, split_layer_names=['fc2', 'fc3'])

This allows modular design flows and easier debugging of large models.

----

.. _mmg-compile-method:

``compile`` method
==================

Compiles all the individual ``ModelGraph`` subgraphs within the ``MultiModelGraph``. It also compiles a chained bridge file that links all subgraphs together, which is used by the ``predict`` function.

.. code-block:: python

multi_graph.compile()

----

.. _mmg-build-method:

``build`` method
================

Builds all subgraphs in parallel, each as if it were a standalone ``ModelGraph`` project. Returns reports for each subgraph. If configured, it then runs the stitching flow in Vivado, connecting the individual exported IPs and allowing you to simulate the stitched design at the RTL level.

.. code-block:: python

report = multi_graph.build(export=True, stitch_design=True)

The returned ``report`` contains data from each subgraph's build and, if stitching was performed, a combined report of the stitched design.
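As a rough sketch of how such a report might be consumed, the snippet below uses a hypothetical report layout (the key names ``graph1``, ``CSynthesisReport``, ``StitchedDesignReport``, etc. are illustrative assumptions, not guaranteed by hls4ml) to derive a naive latency bound for a sequential chain of subgraphs.

```python
# Hypothetical report structure: one entry per subgraph plus, when
# stitching ran, an aggregated entry for the stitched design.
# All key names below are assumptions for illustration only.
report = {
    'graph1': {'CSynthesisReport': {'BestLatency': 10, 'WorstLatency': 12}},
    'graph2': {'CSynthesisReport': {'BestLatency': 8, 'WorstLatency': 9}},
    'StitchedDesignReport': {'BestLatency': 20, 'WorstLatency': 23},
}

# For a purely sequential chain, summing per-subgraph worst-case
# latencies gives a naive upper bound on end-to-end latency.
subgraph_worst = sum(
    v['CSynthesisReport']['WorstLatency']
    for k, v in report.items()
    if k.startswith('graph')
)
print(subgraph_worst)  # 21
```

The stitched design's measured latency can differ from this bound, since the stitched report reflects actual inter-IP handshaking rather than a simple sum.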


----

.. _mmg-predict-method:

``predict`` method
==================

Performs a forward pass through the chained bridge file using C simulation (``sim='csim'``). Data is automatically passed from one subgraph's output to the next subgraph's input. For large stitched designs, you can also use RTL simulation (``sim='rtl'``) to perform the forward pass at the register-transfer level. In this case, a Verilog testbench is generated and executed against the stitched IP design, providing a behavioral simulation that verifies both latency and output at the hardware level. Note that the input data for the RTL simulation must have a single batch dimension.

.. code-block:: python

# Perform prediction using C-simulation (default)
y_csim = hls_model.predict(X, sim='csim')

# Perform prediction using RTL simulation (behavioral)
y_rtl = hls_model.predict(X, sim='rtl')


.. _mmg-trace-method:

``trace`` method [TODO]
=======================

Provides detailed layer-by-layer outputs across all sub-models, which is essential for debugging or tuning quantization and precision settings.

.. code-block:: python

final_output, trace_outputs = hls_model.trace(X)

``trace_outputs`` includes intermediate results from each subgraph, enabling insights into the data flow.
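Since this method is still marked TODO, the structure below is only a sketch of what the trace output might look like once implemented: a mapping from subgraph/layer names to intermediate arrays. All names and the nesting scheme are assumptions for illustration.

```python
import numpy as np

# Hypothetical trace structure (names are assumptions, not the hls4ml API):
# keys identify the subgraph and layer, values are intermediate outputs.
trace_outputs = {
    'graph1/fc1': np.zeros((1, 64)),
    'graph1/fc2': np.zeros((1, 32)),
    'graph2/fc3': np.zeros((1, 10)),
}

# Inspecting shapes is a quick way to verify that data flows consistently
# from one subgraph's output into the next subgraph's input.
shapes = {name: out.shape for name, out in trace_outputs.items()}
print(shapes['graph2/fc3'])  # (1, 10)
```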

--------------------------
Summary
--------------------------

The ``MultiModelGraph`` class is a tool for modular hardware design. By splitting a large neural network into multiple subgraphs, building each independently, and then stitching them together, you gain flexibility and parallelism while enabling hierarchical design, incremental optimization, and integrated system-level simulation.

--------------------------
Other Notes
--------------------------

* Branch Splitting Limitation: Splitting in the middle of a branched architecture (e.g., ResNet skip connections or multi-path networks) is currently unsupported. Each split subgraph must also have a single input and a single output.
* Handling Multiple NN Inputs & Outputs: The final NN output can support multiple output layers. However, for networks with multiple input layers, proper synchronization is required to drive the inputs, especially for stream interfaces. A fork-join mechanism in the Verilog testbench can help manage input synchronization effectively.
* RTL Simulation Issue: RTL simulation of stitched IPs with ``io_type='io_parallel'`` and a split at the flatten layer leads to improper simulation behavior and should be avoided.
* Array Partitioning for Parallel I/O: For ``io_parallel`` interfaces, all IPs must use the ``partition`` pragma instead of ``reshape``.
163 changes: 144 additions & 19 deletions hls4ml/backends/vitis/vitis_backend.py
@@ -1,9 +1,21 @@
import importlib.util
import json
import os
import shutil
import subprocess
import sys

from hls4ml.backends import VivadoBackend
from hls4ml.model.flow import get_flow, register_flow
from hls4ml.report import parse_vivado_report
from hls4ml.report import aggregate_graph_reports, parse_vivado_report
from hls4ml.utils.simulation_utils import (
annotate_axis_stream_widths,
prepare_testbench_input,
prepare_zero_input,
read_testbench_log,
write_testbench_input,
write_verilog_testbench,
)


class VitisBackend(VivadoBackend):
@@ -94,29 +106,142 @@ def build(
export=False,
vsynth=False,
fifo_opt=False,
log_to_stdout=True,
):
if 'linux' in sys.platform:
found = os.system('command -v vitis_hls > /dev/null')
if found != 0:
raise Exception('Vitis HLS installation not found. Make sure "vitis_hls" is on PATH.')

curr_dir = os.getcwd()
os.chdir(model.config.get_output_dir())
os.system(
(
'vitis_hls -f build_prj.tcl "reset={reset} csim={csim} synth={synth} cosim={cosim} '
'validation={validation} export={export} vsynth={vsynth} fifo_opt={fifo_opt}"'
).format(
reset=reset,
csim=csim,
synth=synth,
cosim=cosim,
validation=validation,
export=export,
vsynth=vsynth,
fifo_opt=fifo_opt,
)
build_command = (
'vitis_hls -f build_prj.tcl "reset={reset} csim={csim} synth={synth} cosim={cosim} '
'validation={validation} export={export} vsynth={vsynth} fifo_opt={fifo_opt}"'
).format(
reset=reset,
csim=csim,
synth=synth,
cosim=cosim,
validation=validation,
export=export,
vsynth=vsynth,
fifo_opt=fifo_opt,
)
os.chdir(curr_dir)

return parse_vivado_report(model.config.get_output_dir())
output_dir = model.config.get_output_dir()
stdout_log = os.path.join(output_dir, 'build_stdout.log')
stderr_log = os.path.join(output_dir, 'build_stderr.log')

stdout_target = None if log_to_stdout else open(stdout_log, 'w')
stderr_target = None if log_to_stdout else open(stderr_log, 'w')

try:
process = subprocess.Popen(
build_command, shell=True, cwd=output_dir, stdout=stdout_target, stderr=stderr_target, text=True
)
process.communicate()

if process.returncode != 0:
raise Exception(f'Build failed for {model.config.get_project_name()}. See logs for details.')
finally:
if not log_to_stdout:
stdout_target.close()
stderr_target.close()

return parse_vivado_report(output_dir)

def build_stitched_design(
self,
model,
stitch_design=True,
sim_stitched_design=False,
export_stitched_design=False,
nn_config=None,
graph_reports=None,
simulation_input_data=None,
):

os.makedirs(nn_config['OutputDir'], exist_ok=True)
stitched_design_dir = os.path.join(nn_config['OutputDir'], nn_config['StitchedProjectName'])
if stitch_design:
if not os.path.exists(stitched_design_dir):
os.makedirs(stitched_design_dir)

spec = importlib.util.find_spec('hls4ml')
hls4ml_path = os.path.dirname(spec.origin)
ip_stitcher_path = os.path.join(hls4ml_path, 'templates/vivado/ip_stitcher.tcl')
stdout_log = os.path.join(stitched_design_dir, 'stitcher_stdout.log')
stderr_log = os.path.join(stitched_design_dir, 'stitcher_stderr.log')
nn_config_path = os.path.join(stitched_design_dir, 'nn_config.json')
testbench_path = os.path.join(stitched_design_dir, 'testbench.v')
testbench_log_path = os.path.join(stitched_design_dir, 'testbench_log.csv')

try:
shutil.copy(ip_stitcher_path, stitched_design_dir)
except Exception as e:
print(f"Error: {e}. Cannot copy 'ip_stitcher.tcl' to {nn_config['StitchedProjectName']} folder.")

if nn_config:
if nn_config['outputs'][0]['pragma'] == 'stream':
last_graph_project_path = os.path.join(
model.graphs[-1].config.get_output_dir(), model.graphs[-1].config.get_project_dir()
)
annotate_axis_stream_widths(nn_config, last_graph_project_path)
with open(nn_config_path, "w") as file:
json.dump(nn_config, file, indent=4)

if sim_stitched_design:
write_verilog_testbench(nn_config, testbench_path)
# Produce a testbench input file for every input layer
for i, layer in enumerate(nn_config['inputs']):
testbench_input_path = os.path.join(stitched_design_dir, f"{layer['name']}_input_data.txt")
# We reshape input simulation data to (fifo_depth, batch_size)
if simulation_input_data is None:
input_data_reshaped = prepare_zero_input(layer)
print("No simulation input provided. Using zero-filled inputs.")
else:
# Handles both single and multi-layer cases. First dim should always be batch size
data = simulation_input_data[i]
input_data_reshaped = prepare_testbench_input(data, layer['fifo_depth'], layer['batch_size'])
write_testbench_input(
input_data_reshaped, testbench_input_path, layer['integer_bits'], layer['fractional_bits']
)
print('Verilog testbench and its input data were generated.')

print('Running build process of stitched IP...\n')
stitch_command = [
Review comment (Contributor): I would suggest exporting this into the build_prj.tcl and invoking it from there, as having hls4ml create the model and then moving it to another machine for HLS/logic synthesis could be a common workflow.

Reply (Author): The stitch_command is relatively fast and only runs after all the individual subgraph builds are complete. However, since hls4ml manages these builds in parallel using a Python thread pool, supporting this workflow on a remote server would require a Python script that mimics this behavior, essentially looping over each subgraph directory and running its corresponding build_prj.tcl in parallel using threads or processes. It's not hard to set up and I will do it once we finalize the flow.

'vivado',
'-mode',
'batch',
'-nojournal',
'-nolog',
'-notrace',
'-source',
ip_stitcher_path,
'-tclargs',
f'stitch_design={int(stitch_design)}',
f'sim_design={int(sim_stitched_design)}',
f'export_design={int(export_stitched_design)}',
f"stitch_project_name={nn_config['StitchedProjectName']}",
f"original_project_name={nn_config['OriginalProjectName']}",
'sim_verilog_file=testbench.v',
]

with open(stdout_log, 'w') as stdout_file, open(stderr_log, 'w') as stderr_file:
process = subprocess.Popen(
stitch_command, cwd=stitched_design_dir, stdout=stdout_file, stderr=stderr_file, text=True, shell=False
)
process.communicate()
if process.returncode != 0:
raise Exception(f"Stitching failed for {nn_config['StitchedProjectName']}. See logs for details.")

stitched_report = {'StitchedDesignReport': {}}
if stitch_design:
stitched_report = aggregate_graph_reports(graph_reports)

if sim_stitched_design:
testbench_output = read_testbench_log(testbench_log_path, nn_config['outputs'])
stitched_report['BehavSimResults'] = testbench_output['BehavSimResults']
stitched_report['StitchedDesignReport']['BestLatency'] = testbench_output['BestLatency']
stitched_report['StitchedDesignReport']['WorstLatency'] = testbench_output['WorstLatency']

return stitched_report
1 change: 1 addition & 0 deletions hls4ml/backends/vivado/passes/transform_types.py
@@ -31,6 +31,7 @@ def transform(self, model, node):
new_var = self.array_var_converter.convert(var, pragma='stream')
elif io_type == 'io_parallel':
if out_name in node.model.inputs:
# NOTE this needs to be changed to partition
new_var = self.array_var_converter.convert(var, pragma='reshape')
elif isinstance(var, InplaceTensorVariable):
new_var = self.inplace_array_var_converter.convert(var, pragma='')
5 changes: 4 additions & 1 deletion hls4ml/converters/__init__.py
@@ -214,7 +214,10 @@ def convert_from_keras_model(

_check_hls_config(config, hls_config)

return keras_to_hls(config)
# Retrieve 'split_layer_names' from kwargs, if provided, for multi-graph creation
split_layer_names = kwargs.get('split_layer_names', [])
Review comment (Contributor): For clarity, I would suggest split_before_layer instead, as it is not intuitively clear whether the listed layers end up in the graph before or after the split.

return keras_to_hls(config, split_layer_names=split_layer_names)


@requires('_torch')
17 changes: 13 additions & 4 deletions hls4ml/converters/keras_to_hls.py
@@ -322,9 +322,18 @@ def parse_keras_model(model_arch, reader):
return layer_list, input_layers, output_layers, output_shapes


def keras_to_hls(config):
def keras_to_hls(config, split_layer_names=None):
model_arch, reader = get_model_arch(config)
layer_list, input_layers, output_layers, _ = parse_keras_model(model_arch, reader)
print('Creating HLS model')
hls_model = ModelGraph(config, layer_list, input_layers, output_layers)
layer_list, input_layers, output_layers, output_shapes = parse_keras_model(model_arch, reader)

print('Creating HLS model...')
if split_layer_names:
hls_model = ModelGraph.make_multi_graph(
config, layer_list, input_layers, output_layers, output_shapes, split_layer_names
)
print('Multi-graph HLS model created.')
else:
hls_model = ModelGraph(config, layer_list, input_layers, output_layers)
print('HLS model created.')

return hls_model