
Quartus streaming support for Activations, Dense & Batch Normalization

Javier Duarte requested to merge github/fork/bo3z/quartus-stream into main

Created by: bo3z

This PR enables io_stream support in the Quartus backend. Currently, the layers supported for streaming I/O are Dense, Activation and Batch Normalisation. Due to the inherent differences between parallel and streaming interfaces in Intel HLS, this is an extensive PR. Furthermore, Intel's HLS stream imposes certain requirements on how it is used (no copy constructor, global declaration, etc.). It is therefore recommended to review this PR commit by commit (rather than as a side-by-side diff); each commit is briefly explained below. Each commit is self-contained: it can be checked out and the project compiled.

  1. 2233a68c Allow multiple includes of the same LUT - This allows the same look-up tables to be used for both io_parallel and io_stream by removing the #ifndef guard.

  2. e0032347 Quartus custom Stream & distinction between g++ (nnet::stream) and i++ (ihc::stream):

  • When executing hls4ml.compile() or hls4ml.predict(...), GCC is used. However, GCC doesn't have access to the HLS source files (which include Intel's streaming interfaces ihc::stream, ihc::stream_in and ihc::stream_out). Unlike ac_int and ac_fixed, which are open-source and included in hls4ml, Intel's HLS stream source files are protected by licence and cannot be included in the repository. Therefore, a custom nnet::stream struct is written, offering the same high-level functionality as Intel's ihc::stream but implemented using queues. Please note that the pytests written for streaming layers use nnet::stream. To verify correct functionality of the IP, use cosim with the following command: i++ -march=x86-64 -o myproject_test -v myproject_test.cpp firmware/myproject.cpp
  • Secondly, as per this tutorial by Intel: https://youtu.be/aZYBlkcoj8Q?t=819, streams are always passed by reference to the top-level component. A top-level component with a streaming output is void: instead of returning a stream, it takes the stream object by reference. It is not even possible to return a stream from a component, since the internal implementation contains an explicitly deleted copy constructor.
  • Finally, there are three distinct stream types: inputs to an HLS component must be of type stream_in, outputs of type stream_out and inter-component connections of type stream. With this distinction, the HLS compiler is able to tell component inputs from outputs. If only stream were used, the component would not synthesize with the correct inputs/outputs.
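The queue-backed stream described above can be sketched roughly as follows. This is an illustrative stand-in only, not the nnet::stream source from this PR: the namespace `sketch`, the member names (`write`, `read`, `empty`, `size`) and the `double_values` component are all assumptions made for the example. It shows the two properties discussed above: the copy constructor is deleted, and a component with a streaming output is void and takes its streams by reference.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>

namespace sketch {

// Hypothetical queue-backed FIFO, standing in for a hardware stream
// when compiling with a plain C++ compiler.
template <typename T>
class stream {
  public:
    stream() = default;

    // Streams cannot be copied, so they can only be handed to a
    // component by reference (and can never be returned by value).
    stream(const stream &) = delete;
    stream &operator=(const stream &) = delete;

    // Push one element to the back of the FIFO.
    void write(const T &value) { fifo_.push(value); }

    // Pop and return the element at the front of the FIFO.
    T read() {
        assert(!fifo_.empty() && "read() on an empty stream");
        T value = fifo_.front();
        fifo_.pop();
        return value;
    }

    bool empty() const { return fifo_.empty(); }
    std::size_t size() const { return fifo_.size(); }

  private:
    std::queue<T> fifo_;
};

// Hypothetical component: returns void and receives both the input
// and the output stream by reference.
inline void double_values(stream<int> &in, stream<int> &out) {
    while (!in.empty()) {
        out.write(2 * in.read());
    }
}

} // namespace sketch
```

With this sketch, an attempt such as `sketch::stream<int> copy = other;` fails to compile, which is the same constraint that prevents a component from returning a stream.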
  3. 1bbcca5f Top-level & Quartus Writer distinction between io_parallel and io_stream:
  • This commit handles the inherent differences between io_parallel and io_stream in myproject.cpp, myproject.h and defines.h. In io_parallel, the top-level component takes an input struct containing the array of data and returns a struct containing the output array. In contrast, as explained above, streaming components are void, as both the input and the output are passed by reference.
  • Secondly, a new cosim benchmark was written for streaming inputs. Previously (and still for io_parallel), all of the input data was processed, stored in a vector and then passed to the component sequentially. However, due to the explicitly deleted copy constructor, ihc::stream cannot be stored inside a vector. Therefore, the new benchmark reads data from the input files and executes the component straight away.
  • Finally, for io_stream, inter-layer connections (type stream) must be declared outside of main(): Intel HLS requires that all streams are either passed by reference to the top-level component or declared as global variables. No stream types can be instantiated inside the component.
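The structure described in the three points above might look roughly like the sketch below. It is plain C++ with a hypothetical `fifo` type in place of the Intel stream types, and the names `layer1`, `layer2`, `myproject_sketch` and `run_samples` are invented for illustration; the files generated by the writer will differ. The sketch shows a void top-level component taking its streams by reference, an inter-layer connection declared at global (namespace) scope rather than inside the component, and a benchmark-style loop that feeds each sample and invokes the component immediately instead of collecting streams in a vector.

```cpp
#include <cassert>
#include <queue>
#include <vector>

namespace sketch_top {

// Hypothetical non-copyable FIFO, standing in for a stream type.
template <typename T>
struct fifo {
    fifo() = default;
    fifo(const fifo &) = delete;
    fifo &operator=(const fifo &) = delete;

    void write(const T &v) { q.push(v); }
    T read() {
        assert(!q.empty());
        T v = q.front();
        q.pop();
        return v;
    }
    bool empty() const { return q.empty(); }

    std::queue<T> q;
};

// Inter-layer connection: declared at namespace scope, not inside
// the component body.
fifo<int> layer1_out;

// Two toy "layers", each reading one element and writing one element.
void layer1(fifo<int> &in, fifo<int> &out) { out.write(in.read() + 1); }
void layer2(fifo<int> &in, fifo<int> &out) { out.write(in.read() * 2); }

// Top-level component: void, with input and output passed by reference.
void myproject_sketch(fifo<int> &input, fifo<int> &output) {
    layer1(input, layer1_out);
    layer2(layer1_out, output);
}

// Benchmark-style driver: each sample is written to the input stream
// and the component is executed straight away; only the plain result
// values are collected in a vector.
std::vector<int> run_samples(const std::vector<int> &samples) {
    fifo<int> input;
    fifo<int> output;
    std::vector<int> results;
    for (int s : samples) {
        input.write(s);
        myproject_sketch(input, output);
        results.push_back(output.read());
    }
    return results;
}

} // namespace sketch_top
```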
  4. d6008820 Quartus Stream Variable Converter & Backend io_stream enabled:
  • Adds a stream variable converter, in a similar manner to Vivado.
  • Adds nnet::array, an array-like struct for storing data inside streams. The implementation is similar to Vivado's.
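As a rough illustration of what an array-like struct for stream payloads could look like, consider the sketch below. It is an assumption-laden example, not the nnet::array added in this commit: the namespace `sketch_array`, the member names and the `sum` helper are hypothetical. The idea is that a fixed-size group of values is wrapped in one struct, so that a single stream element carries several values at once.

```cpp
#include <cassert>
#include <cstddef>

namespace sketch_array {

// Hypothetical fixed-size container: one instance is one stream element.
template <typename T, std::size_t N>
struct array {
    typedef T value_type;

    T data[N];

    // Element access with a bounds check (active in debug builds).
    T &operator[](std::size_t pos) {
        assert(pos < N);
        return data[pos];
    }
    const T &operator[](std::size_t pos) const {
        assert(pos < N);
        return data[pos];
    }
};

// Example consumer: adds up all values packed into one element.
template <typename T, std::size_t N>
T sum(const array<T, N> &packet) {
    T total = T();
    for (std::size_t i = 0; i < N; i++) {
        total += packet[i];
    }
    return total;
}

} // namespace sketch_array
```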
  5. 1b17b3b0 Quartus Clone Optimizer - Sets up streaming optimizers in the Quartus backend and adds the clone optimizer, as the most commonly used one. From here, it should be straightforward to add further streaming passes.

  6. d4da71f3 Tanh bug fix in Quartus - Addresses a small bug when invoking the TanH activation. This is simply a prerequisite for writing streaming activations.

  7. The other commits add support for streaming Dense, BatchNormalization and Activation, in a similar manner to Vivado, with the appropriate pipeline initiation interval:

  • Existing PyTests were expanded to include io_stream. Some new tests (test_activations.py) were written as well, to verify correct results for less frequently used layers. Importantly, a passing PyTest does not imply correct HLS outputs: as explained above, PyTest uses nnet::stream, whereas HLS uses ihc::stream when synthesizing. The two are inherently different, even though they offer the same high-level functionality. To verify correct HLS behaviour, cosim should be used (explained above), as well as some RTL-level simulation, such as ModelSim or Questa. All of the above layers were tested using both PyTest and cosim. Finally, all of the above layers synthesized correctly and produced a valid IP block.
  • Finally, some of the Quartus-equivalent pragmas couldn't be used. For example, Vivado offers a #pragma HLS data_pack directive, for which the Intel equivalent would be hls_bankwidth. However, hls_bankwidth (like the other memory-optimisation attributes, except for hls_register) is not supported for variables passed by reference or pointer, and passing streams by reference is a necessity for correct functionality.
