
Commit bbaea6a

Merge pull request #58 from mli_dev
Release 1.0 RC3
2 parents 01f4868 + 8aef628, commit bbaea6a

27 files changed: +778 -341 lines

README.md

Lines changed: 24 additions & 3 deletions
@@ -1,8 +1,8 @@
 embARC Machine Learning Inference Library
 ==================================================

-This repository contains source code of embARC Machine Learning Inference Library (embARC MLI Library),
-examples and documentation.
+This repository contains source code of embARC Machine Learning Inference Library (embARC MLI Library),
+documentation and examples. Read the documentation at [embarc.org](https://embarc.org/embarc_mli).

 ## Release notes
 ----------------
@@ -16,7 +16,7 @@ examples and documentation.
 * Elementwise (add, sub, mul, min, max)
 * Data manipulation (concatanation, permute, 2D padding)
 * ReLU, Leaky ReLu, ReLu1, ReLu6
-* Softmax, Sigmoid, ThanH
+* Softmax, Sigmoid, TanH
 3. Supported data layout CHW (Channel-Height-Width standard for Caffe)

 ## Package structure
@@ -73,8 +73,29 @@ Building of embARC MLI library

 5. Result Quality shall be "S/N=1823.9 (65.2 db)"

+## Optimizations for code size
+------------------------------
+By default the embARC MLI Library is built for optimal speed. If code size needs to be reduced, there are two things that can be done:
+1. For convolution and pooling layers there are specialized functions for specific kernel sizes; they are called by wrapper functions based on the parameters.
+If these parameters are compile-time constants in the application, the application can call the specialized functions directly. This reduces overall code size.
+Please be aware that the list of specializations is not guaranteed to be backwards compatible between releases.
+
+2. Use a different optimization mode when calling the makefile. OPTMODE=size will optimize for size; the default is OPTMODE=speed.
+'gmake TCF_FILE=../../hw/em9d.tcf OPTMODE=size'

 ## Known Issues
 ---------------
 1. Optimal performance for 8-bit data requires version of MetaWare Development Tools 2019.06 or later

+## Frequently Asked Questions
+---------------
+
+Q: Can I use ARC GNU tools to build the embARC MLI library?
+A: No, you cannot. The embARC MLI Library must be built with MetaWare Development Tools only. Read the documentation at [embarc.org](https://embarc.org/embarc_mli/doc/build/html/getting_started/getting_started.html#build-library) for details.
+
+Q: Can I use MetaWare Development Tools Lite to pre-build the embARC MLI library and ARC GNU tools to build an example application?
+A: No, you cannot. The embARC MLI Library must be built with the full version of MetaWare Development Tools. Binaries built with MWDT Lite are not compatible with ARC GNU Tools or the full MetaWare Development Tools. Read the MWDT Lite documentation for details.
+
+Q: I cannot build and run an example application for my Synopsys board (EMSK, IoTDK, etc.). What shall I do?
+A: If you build for Synopsys boards, the documentation at [embarc.org](https://embarc.org/platforms.html) is a good starting point.
+You should also note that the example applications support different configurations of the pre-trained models, and thus different memory requirements; not all configurations can be built and run on Synopsys boards due to memory limitations and HW capabilities. Read the example application README for details. The embARC MLI Library must also be pre-built specifically for your board with MetaWare Development Tools. Please note that the makefiles provided with the examples are configured for IoTDK only if GNU tools are used.
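
The "Optimizations for code size" section added above suggests calling specialized kernels directly when their parameters are compile-time constants. Below is a minimal illustrative sketch in C of that trade-off; the specialization name is taken from the cifar10 example change further down, while the generic entry-point name `mli_krn_maxpool_chw_fx16` and the `mli_api.h` header are assumptions based on the library's usual naming.

```c
#include "mli_api.h"  /* assumed umbrella header for the MLI kernel prototypes */

/* Generic path: a wrapper selects a specialization at run time from the pooling
 * parameters, so every specialization it might dispatch to gets linked in.
 * The generic entry-point name below is an assumption. */
static inline mli_status maxpool(const mli_tensor *in, const mli_pool_cfg *cfg, mli_tensor *out) {
    return mli_krn_maxpool_chw_fx16(in, cfg, out);
}

/* Code-size path: the 3x3 kernel size is a compile-time constant in the
 * application, so the matching specialization (name as in the cifar10 example
 * change below) is called directly and unused specializations can be dropped
 * by the linker. Note the specialization list may change between releases. */
static inline mli_status maxpool_k3x3(const mli_tensor *in, const mli_pool_cfg *cfg, mli_tensor *out) {
    return mli_krn_maxpool_chw_fx16_k3x3_krnpad(in, cfg, out);
}
```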

build/rules.mk

Lines changed: 9 additions & 0 deletions
@@ -64,6 +64,8 @@ quote=$(subst %,$(Q)%, \
 # Global settings
 #=============================================================
 TOOLCHAIN ?= gnu
+# optimization mode
+OPTMODE ?= speed

 export DEBUG_BUILD?=ON
 #export ASM_OUT?=OFF
@@ -76,6 +78,13 @@ endif
 # # CFLAGS += -Hon=Print_var_info
 #endif

+ifeq ($(OPTMODE),size)
+CFLAGS += -O2 -Hlto
+endif
+ifeq ($(OPTMODE),speed)
+CFLAGS += -O3
+endif
+
 #=============================================================
 # Files and directories
 #=============================================================

doc/documents/MLI_kernels/convolution_2d.rst

Lines changed: 1 addition & 1 deletion
@@ -49,7 +49,7 @@ inputs shape.

 For more details on calculations see convolution part of `TensorFlow–Neural Network details`_.

-.. _TensorFlow–Neural Network details: https://www.tensorflow.org/api_guides/python/nn.
+.. _TensorFlow–Neural Network details: https://www.tensorflow.org/versions/r1.11/api_guides/python/nn

 ReLU activation function might be applied to result of convolution. The
 following types of ReLU activations are supported (for more info see

doc/documents/MLI_kernels/pooling_max.rst

Lines changed: 1 addition & 1 deletion
@@ -34,7 +34,7 @@ padding parameters. This logic is similar to convolution 2D operation
 For more information on calculations, see the pooling part of
 `TensorFlow–Neural Network details`_.

-.. _TensorFlow–Neural Network details: https://www.tensorflow.org/api_guides/python/nn
+.. _TensorFlow–Neural Network details: https://www.tensorflow.org/versions/r1.11/api_guides/python/nn

 .. caution::
    Ensure that input and output

doc/documents/library_model/functions.rst

Lines changed: 5 additions & 2 deletions
@@ -137,8 +137,8 @@ Naming convention for the specializations: \
 | | parameters to achieve | |
 | | same output size | |
 | | (similar to ‘SAME’ | |
-| | padding scheme used | |
-| | in TensorFlow [3]) | |
+| | `padding scheme`_ used | |
+| | in TensorFlow) | |
 +-----------------------+---------------------------+-----------------------+
 | ``Input channels`` | [_ch\ *n*] | convolution group, |
 | | | pooling group |
@@ -182,6 +182,9 @@ Naming convention for the specializations: \
 | | specializations. | |
 +-----------------------+---------------------------+-----------------------+

+.. _padding scheme: https://www.tensorflow.org/versions/r1.11/api_guides/python/nn#Notes_on_SAME_Convolution_Padding
+
+

 For example, the function name of a 16bit 2d convolution kernel with
 CHW layout and a kernel size of 3x3 and stride of 1 is:

doc/documents/library_model/hw_dependencies_config.rst

Lines changed: 1 addition & 1 deletion
@@ -109,7 +109,7 @@ round to the nearest even). All parameters are described in *MetaWare
 Fixed-Point Reference for ARC EM and ARC HS*.

 .. note::
-   The MLI Library sets the required DSP mode inside each function where it is needed, but does not restore it to previous state. If another ARC DSP code beside MLI library is used in an application, ensure that you set the required DSP mode before its execution. For more information see “Configuring the ARC DSP Extensions” section of *MetaWare DSP Programming Guide for ARC EM and ARC HS* or “Using the FXAPI” section of entry [5] of *MetaWare Fixed-Point Reference for ARC EM and ARC HS*.
+   The MLI Library sets the required DSP mode inside each function where it is needed, but does not restore it to the previous state. If other ARC DSP code besides the MLI library is used in an application, ensure that you set the required DSP mode before its execution. For more information see the “Configuring the ARC DSP Extensions” section of *MetaWare DSP Programming Guide for ARC EM and ARC HS* or the “Using the FXAPI” section of *MetaWare Fixed-Point Reference for ARC EM and ARC HS*.

 AGU Support
 ^^^^^^^^^^^
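
The note above states that each MLI kernel sets the DSP mode it needs but does not restore the previous state. A minimal sketch of the resulting call ordering, assuming the ReLU kernel prototype `mli_krn_relu_fx16(in, cfg, out)` and the `mli_api.h` header; `app_set_dsp_mode()` and `app_other_dsp_code()` are hypothetical placeholders for the application's own FXAPI / DSP-configuration code described in the MetaWare guides.

```c
#include "mli_api.h"  /* assumed umbrella header for the MLI kernel prototypes */

/* Hypothetical application helpers: stand-ins for the application's own
 * DSP-mode setup and for any non-MLI ARC DSP processing. */
void app_set_dsp_mode(void);
void app_other_dsp_code(void);

void run_pipeline(const mli_tensor *in, const mli_relu_cfg *cfg, mli_tensor *out) {
    mli_krn_relu_fx16(in, cfg, out);  /* MLI configures the DSP mode it needs here,
                                         but does not restore the previous state, */
    app_set_dsp_mode();               /* so re-establish the application's DSP mode */
    app_other_dsp_code();             /* before running other ARC DSP code.          */
}
```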

examples/example_cifar10_caffe/Makefile

Lines changed: 1 addition & 1 deletion
@@ -28,7 +28,7 @@ BUILD_DIR ?= ./obj
 OUT_NAME ?= example_cifar10_caffe
 ifeq ($(TOOLCHAIN),mwdt)
 # MWDT specific options
-CFLAGS = -Hnocopyr -Hpurge -Hheap=8K -Hstack=1K -Hfxapi -e_start -Bgrouplib -Hldopt=-q -O0 -Hsdata0
+CFLAGS = -Hnocopyr -Hpurge -Hheap=8K -Hstack=1K -Hfxapi -e_start -Bgrouplib -Hldopt=-q -Hsdata0 -Xdsp_ctrl=postshift,guard,convergent -Hdense_prologue
 else
 PREBUILT_LIB ?= $(EMBARC_MLI_DIR)/examples/prebuilt/libmli.a

examples/example_cifar10_caffe/README.md

Lines changed: 15 additions & 0 deletions
@@ -106,6 +106,21 @@ More Options on Building and Running
 ---------------------------------------
 CIFAR-10 example application is implemented in the same way as LSTM Based HAR example and provides the same configuration and running abilities. For more details see appropriate HAR example [description part](/examples/example_har_smartphone/README.md#more-options-on-building-and-running).

+Data Memory Requirements
+----------------------------
+
+Example application uses statically allocated memory for model weights, intermediate results (activations) and structures. Requirements for them depend on the model bit depth
+configuration define and are listed in the table below. Before compiling the application for the desired hardware configuration, make sure it has enough memory to hold the data.
+
+| Data | MODEL_BIT_DEPTH=8 | MODEL_BIT_DEPTH=816 | MODEL_BIT_DEPTH=16 |
+| :----------------------------------------------------: | :-------------------: | :-------------------: | :------------------: |
+| Weights <br/>*.mli_model* and *mli_model_p2* sections | 33212 bytes | 33212 bytes | 66420 bytes |
+| Activations 1 <br/>*.Zdata* section | 32768 bytes | 65536 bytes | 65536 bytes |
+| Activations 2 <br/>*.Ydata* section | 8192 bytes | 16384 bytes | 16384 bytes |
+| Structures <br/>*.mli_data* section | 384 bytes | 384 bytes | 384 bytes |
+
+By default, the application uses MODEL_BIT_DEPTH=16 mode. Application code size depends on the target hardware configuration and compilation flags. MLI Library code is wrapped into the *mli_lib* section.
+
 References
 ----------------------------
 CIFAR-10 Dataset:
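
As a rough cross-check of the memory table added above, the static data footprint per configuration is simply the sum of the listed sections (application and MLI library code are extra). The macro names below are illustrative only and are not part of the example sources.

```c
/* Static data totals implied by the table above, in bytes
 * (weights sections + .Zdata + .Ydata + .mli_data). */
#define CIFAR10_STATIC_DATA_8BIT   (33212 + 32768 +  8192 + 384)   /*  74556 bytes */
#define CIFAR10_STATIC_DATA_8W16D  (33212 + 65536 + 16384 + 384)   /* 115516 bytes */
#define CIFAR10_STATIC_DATA_16BIT  (66420 + 65536 + 16384 + 384)   /* 148724 bytes */
```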

examples/example_cifar10_caffe/cifar10_model_chw.c

Lines changed: 2 additions & 2 deletions
@@ -434,7 +434,7 @@ static void check_result(
 //========================================================================================
 #if (MODEL_BIT_DEPTH != MODEL_FX_8)
 static inline mli_status maxpool_chw(const mli_tensor *in, const mli_pool_cfg *cfg, mli_tensor *out) {
-    return mli_krn_maxpool_chw_fx16_k3x3(in, cfg, out);
+    return mli_krn_maxpool_chw_fx16_k3x3_krnpad(in, cfg, out);
 }

 static inline mli_status avepool_chw(const mli_tensor *in, const mli_pool_cfg *cfg, mli_tensor *out) {
@@ -455,7 +455,7 @@ static inline mli_status mli_krn_permute_fx(const mli_tensor *in, const mli_perm

 #else // MODEL_BIT_DEPTH == (MODEL_FX_8W16D || MODEL_FX_8W16D)
 static inline mli_status maxpool_chw(const mli_tensor *in, const mli_pool_cfg *cfg, mli_tensor *out) {
-    return mli_krn_maxpool_chw_fx8_k3x3(in, cfg, out);
+    return mli_krn_maxpool_chw_fx8_k3x3_krnpad(in, cfg, out);
 }

 static inline mli_status avepool_chw(const mli_tensor *in, const mli_pool_cfg *cfg, mli_tensor *out) {

examples/example_har_smartphone/Makefile

Lines changed: 1 addition & 1 deletion
@@ -28,7 +28,7 @@ BUILD_DIR ?= ./obj
 OUT_NAME ?= example_har_smartphone
 ifeq ($(TOOLCHAIN),mwdt)
 # MWDT specific options
-CFLAGS = -Hnocopyr -Hpurge -Hheap=8K -Hstack=1K -Hfxapi -e_start -Bgrouplib -Hldopt=-q -O0 -Hsdata0
+CFLAGS = -Hnocopyr -Hpurge -Hheap=8K -Hstack=1K -Hfxapi -e_start -Bgrouplib -Hldopt=-q -Hsdata0 -Xdsp_ctrl=postshift,guard,convergent -Hdense_prologue
 else
 PREBUILT_LIB ?= $(EMBARC_MLI_DIR)/examples/prebuilt/libmli.a

