Quartus streaming support for Activations, Dense & Batch Normalization
Created by: bo3z
A PR enabling support for `io_stream` in the Quartus backend. Currently, the layers supported for streaming I/O are Dense, Activation and Batch Normalisation. Due to the inherent differences between parallel and streaming interfaces in Intel HLS, this is an extensive PR. Furthermore, Intel's HLS `stream` has certain implementation requirements (no copy constructor, global declaration, etc.). It is therefore recommended to review this PR commit by commit (rather than as a side-by-side diff); each commit is briefly explained below. Each commit is self-contained and can be checked out and the project compiled.
- 2233a68c Allow multiple includes of the same LUT - This simply allows the same look-up tables to be used for both `io_parallel` and `io_stream`, by removing the `IFNDEF` guard.
- e0032347 Quartus custom Stream & distinction between g++ (nnet::stream) and i++ (ihc::stream):
- When executing `hls4ml.compile()` or `hls4ml.predict(...)`, GCC is used. However, GCC doesn't have access to the HLS source files (which include Intel's streaming interface: `ihc::stream`, `ihc::stream_in`, `ihc::stream_out`). Unlike `ac_int` and `ac_fixed`, which are open-source and included in hls4ml, Intel's HLS stream source files are protected by licence and cannot be included in the repository. Therefore, a custom `nnet::stream` struct is written, having the same high-level function as Intel's HLS `ihc::stream`, but implemented using queues. Please note, the `pytests` written for streaming layers use `nnet::stream`. To verify correct functionality of the IP, use cosim with the following command: `i++ -march=x86-64 -o myproject_test -v myproject_test.cpp firmware/myproject.cpp`
- Additionally, as per this tutorial by Intel: https://youtu.be/aZYBlkcoj8Q?t=819, streams are always passed by reference to the top-level component. A top-level component with a streaming output is `void` - instead of returning a stream, it takes the `stream` object by reference. It is not even possible to return a stream from a component, since the internal implementation contains an explicitly deleted copy constructor.
- Finally, there are 3 distinct types of stream - inputs to an HLS component must be of type `stream_in`, outputs of type `stream_out` and inter-component connections of type `stream`. With this distinction, the HLS compiler is able to distinguish between component inputs and outputs. If only `stream` were used, the component would not synthesize with the correct inputs/outputs.
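The queue-backed stand-in and the pass-by-reference convention described in this commit can be sketched roughly as follows. This is an illustration only, not the code added by the PR: the `sketch` namespace, the `empty()` helper and the `scale_by_two` component are hypothetical, and `stream_in`/`stream_out` are plain aliases here so that the example builds with g++ (with Intel HLS they are distinct types).

```cpp
#include <cassert>
#include <cstddef>
#include <queue>

namespace sketch { // hypothetical namespace, not hls4ml's nnet

// Queue-backed stand-in offering read()/write(), compilable with plain g++.
template <typename T> struct stream {
    stream() = default;
    // Copying is disabled, mirroring the deleted copy constructor described above.
    stream(const stream &) = delete;
    stream &operator=(const stream &) = delete;

    void write(const T &value) { data_.push(value); }

    T read() {
        assert(!data_.empty() && "read() from an empty stream");
        T value = data_.front();
        data_.pop();
        return value;
    }

    // Convenience for this sketch only.
    bool empty() const { return data_.empty(); }

  private:
    std::queue<T> data_;
};

// Assumption: in this sketch the input/output flavours are aliases of one type.
template <typename T> using stream_in = stream<T>;
template <typename T> using stream_out = stream<T>;

// A component with a streaming output returns void and takes both streams
// by reference. scale_by_two is a hypothetical example component.
inline void scale_by_two(stream_in<int> &in, stream_out<int> &out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out.write(2 * in.read());
    }
}

} // namespace sketch
```

A caller writes its inputs into a `stream_in`, invokes the component, and then reads the results back from the `stream_out`; no stream object is ever copied or returned.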
- 1bbcca5f Top-level & Quartus Writer distinction between io_parallel and io_stream:
- This commit handles the inherent differences between `io_parallel` and `io_stream` in `myproject.cpp`, `myproject.h` and `defines.h` - in parallel, the top-level component takes an input struct containing the array of data and returns a struct containing the output array. On the other hand, as explained above, streaming interfaces are `void`, as both the input and the output are passed by reference.
- Secondly, a new cosim benchmark was written for streaming inputs. Previously (and still for `io_parallel`), all of the input data is processed, stored in a vector and then passed to the component sequentially. However, due to an explicitly deleted copy constructor, `ihc::stream` cannot be stored inside a vector. Therefore, this new benchmark processes data from the input files and executes the component straight away.
- Finally, for `io_stream`, inter-layer connections (type `stream`) must be declared outside of `main()` - Intel HLS requires that all streams are either passed by reference to the top-level component or declared as global variables. No `stream` types can be instantiated inside the component.
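The benchmark flow and the global inter-layer stream described in this commit might look roughly like the sketch below. All names (`sketch`, `relu`, `scale`, `myproject_sketch`, `run_lines`, `layer1_out`) are hypothetical, the two layers are toy stand-ins, and the stream is a minimal queue-backed type rather than `ihc::stream`; the generated `myproject.cpp` and test bench will differ.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <sstream>
#include <string>
#include <vector>

namespace sketch { // hypothetical names throughout, not the PR's code

// Minimal non-copyable, queue-backed stream so the example builds with g++.
template <typename T> class stream {
  public:
    stream() = default;
    stream(const stream &) = delete;
    stream &operator=(const stream &) = delete;
    void write(const T &v) { q_.push(v); }
    T read() {
        assert(!q_.empty());
        T v = q_.front();
        q_.pop();
        return v;
    }

  private:
    std::queue<T> q_;
};

// Inter-layer connection: declared at global scope, never inside the component.
stream<float> layer1_out;

inline void relu(stream<float> &in, stream<float> &out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        float x = in.read();
        out.write(x > 0.0f ? x : 0.0f);
    }
}

inline void scale(stream<float> &in, stream<float> &out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out.write(2.0f * in.read());
    }
}

// Top-level component: void, with input and output passed by reference.
inline void myproject_sketch(stream<float> &input, stream<float> &output, std::size_t n) {
    relu(input, layer1_out, n);
    scale(layer1_out, output, n);
}

// Streaming-style benchmark: each input line is parsed and the component is
// run straight away; only plain floats are collected, never streams.
inline std::vector<float> run_lines(const std::vector<std::string> &lines) {
    std::vector<float> results;
    for (const std::string &line : lines) {
        stream<float> input, output;
        std::istringstream iss(line);
        std::size_t n = 0;
        float value;
        while (iss >> value) {
            input.write(value);
            ++n;
        }
        myproject_sketch(input, output, n);
        for (std::size_t i = 0; i < n; ++i) {
            results.push_back(output.read());
        }
    }
    return results;
}

} // namespace sketch
```

The contrast with the parallel benchmark is that nothing stream-typed is ever placed in a container: each line's streams live only for one component invocation, and only the extracted values are stored.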
- d6008820 Quartus Stream Variable Converter & Backend io_stream enabled:
- Adds a stream variable converter, in a similar manner to Vivado
- Adds `nnet::array`, an array-like struct for storing data inside streams. Implementation similar to Vivado.
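As a rough illustration of an array-like struct carried inside a stream, consider the sketch below. It is an assumption about the general shape, not the PR's `nnet::array`: the `sketch` namespace, `pack_t` and `sum_next_pack` are hypothetical, and a `std::queue` stands in for the stream.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>

namespace sketch { // hypothetical namespace, not hls4ml's nnet

// Fixed-size, trivially copyable pack of N values, so that one stream
// element can carry several values at once.
template <typename T, std::size_t N> struct array {
    typedef T value_type;
    static constexpr std::size_t size = N;

    T data[N];

    T &operator[](std::size_t pos) {
        assert(pos < N);
        return data[pos];
    }
    const T &operator[](std::size_t pos) const {
        assert(pos < N);
        return data[pos];
    }
};

using pack_t = array<int, 4>; // hypothetical pack type for this example

// Reads one whole pack from the queue (standing in for a stream) and sums it.
inline int sum_next_pack(std::queue<pack_t> &s) {
    pack_t pack = s.front();
    s.pop();
    int total = 0;
    for (std::size_t i = 0; i < pack_t::size; ++i) {
        total += pack[i];
    }
    return total;
}

} // namespace sketch
```

Because the pack itself is an ordinary copyable value, it can be written to and read from a stream even though the stream object cannot be copied.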
- 1b17b3b0 Quartus Clone Optimizer - Sets up streaming optimizers in the Quartus backend and adds the clone optimizer, as the most commonly used. From here, it should be straightforward to add further streaming passes.
- d4da71f3 Tanh bug fix in Quartus - Addresses a small bug when invoking the `TanH` activation. This is simply a prerequisite for writing streaming activations.
- The other commits add support for streaming `Dense`, `BatchNormalization` and `Activation`, in a similar manner to Vivado, with the appropriate pipeline initiation interval.
- Existing PyTests were expanded to include `io_stream`. Some new tests (`test_activations.py`) were written as well, to verify correct results of less frequently used layers. It is important to note that a passing PyTest does not imply correct HLS outputs - as explained above, PyTest uses `nnet::stream`, whereas HLS, when synthesizing, uses `ihc::stream` - these two are inherently different, even though they offer the same high-level functionality. To verify correct HLS behaviour, cosim should be used (explained above), as well as some RTL-level simulation, such as ModelSim or Questa. All of the above layers were tested using both PyTest and cosim. Finally, all of the above layers synthesized correctly and produced a valid IP block.
- Finally, some of the Quartus-equivalent pragmas couldn't be used. For example, Vivado offers a `#pragma HLS data_pack` directive, for which the Intel equivalent would be `hls_bankwidth`. However, `hls_bankwidth` (similar to other memory-optimisation pragmas, except for `hls_register`) is not supported for variables passed by reference/pointers, which is a necessity for correct functionality.