Description

This is a refined version of the Conv1D/2D implementation that unrolls the input feature matrix of the im2col algorithm for the io_parallel implementation. The general idea is to generate the code for the im2col transformation with exact instructions for each layer, instead of synthesizing a generic C++ function, because the HLS compiler has issues with the generic version. With this implementation, I was able to synthesize layers with <= 4096 elements (the usual partitioning limit). The old implementations had trouble with far smaller layers.
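To illustrate what "exact instructions for each layer" means, here is a minimal Python sketch that emits fully unrolled im2col copies for a Conv1D layer. It assumes a channels-last layout and no padding, and the names `generate_im2col_1d`, `fill_buffer_<i>`, `data` and `buffer` are hypothetical; the generator in this PR may emit different code.

```python
def generate_im2col_1d(in_width, n_chan, kernel, stride=1):
    """Emit one C++ function per output position, each a list of fixed copies.

    Assumptions (not taken from the PR): channels-last layout, no padding,
    and illustrative function/argument names.
    """
    out_width = (in_width - kernel) // stride + 1
    lines = []
    for out_pos in range(out_width):
        lines.append(
            f"void fill_buffer_{out_pos}("
            f"const data_T data[{in_width * n_chan}], "
            f"data_T buffer[{kernel * n_chan}]) {{"
        )
        for k in range(kernel):
            for c in range(n_chan):
                # Every index is a compile-time constant, so the HLS compiler
                # sees plain assignments instead of loops with index math.
                src = (out_pos * stride + k) * n_chan + c
                dst = k * n_chan + c
                lines.append(f"    buffer[{dst}] = data[{src}];")
        lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(generate_im2col_1d(in_width=4, n_chan=1, kernel=2))
```

Because the indices are resolved at generation time, the size of the emitted code grows with the layer size, which is one reason a per-layer element limit is natural for this approach.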

Based on the unrolled im2col step, the implementation further uses an adapted matrix-vector multiplication for the Resource or Latency strategy. Note that using the Latency strategy for the whole model won't work: it pipelines the entire design, which causes all the loops to be unrolled and breaks synthesis. Therefore, selecting the Latency strategy for the model issues a warning and switches to the Resource strategy (aka "dataflow"). Individual layers may still use the Latency strategy.

A new tuning knob, ParallelizationFactor, is introduced to be combined with ReuseFactor to control the amount of parallelism. It controls the number of output pixels processed in parallel. It defaults to 1, implying no parallelization. Valid values are divisors of out_height * out_width; hls4ml will warn if an invalid ParallelizationFactor is used.
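As a quick way to list the valid values for a given layer, the divisor rule can be sketched as follows. The helper name `valid_parallelization_factors` is hypothetical and is not part of hls4ml; for Conv1D, pass `out_height=1`.

```python
def valid_parallelization_factors(out_height, out_width):
    """Return all divisors of the number of output pixels, in ascending order."""
    n_pixels = out_height * out_width
    return [pf for pf in range(1, n_pixels + 1) if n_pixels % pf == 0]


if __name__ == "__main__":
    # A 2x3 output has 6 pixels, so 1, 2, 3 and 6 are valid.
    print(valid_parallelization_factors(2, 3))
```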

One feature of this implementation that wasn't part of the original implementation from last year is a predictable II. In general, for the Resource strategy, II = (ReuseFactor + C) * out_height * out_width / ParallelizationFactor + 1, where C is ~4. For the Latency strategy, C is 1-2. The +1 is for the function call itself.
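The II estimate can be evaluated with a small helper like the one below. This is a sketch of the formula as stated, with a hypothetical function name; C is only approximate (~4 for Resource, 1-2 for Latency), so the result is an estimate rather than a guaranteed value.

```python
def estimate_ii(reuse_factor, out_height, out_width,
                parallelization_factor=1, c=4):
    """Estimate II = (ReuseFactor + C) * out_height * out_width / PF + 1.

    `c` defaults to the approximate Resource-strategy overhead of 4.
    """
    n_pixels = out_height * out_width
    if n_pixels % parallelization_factor != 0:
        raise ValueError("ParallelizationFactor must divide out_height * out_width")
    # The trailing +1 accounts for the function call itself.
    return (reuse_factor + c) * (n_pixels // parallelization_factor) + 1


if __name__ == "__main__":
    print(estimate_ii(reuse_factor=4, out_height=8, out_width=8))
    print(estimate_ii(reuse_factor=4, out_height=8, out_width=8,
                      parallelization_factor=4))
```

For an 8x8 output with ReuseFactor 4 and C = 4, this gives 513 without parallelization and 129 with ParallelizationFactor 4.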

This only touches the base Conv1D/2D layers; SeparableConv1D/2D will come in a later PR. Whether PointwiseConv1D/2D should be a special case at all with this implementation needs investigation.

Limitations:

in_height  * in_width  * n_chan <= 4096
out_height * out_width * n_filt <= 4096
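A layer can be checked against both limits before attempting synthesis. The helper below is a hypothetical sketch, not an hls4ml function; for Conv1D, pass `in_height=1` and `out_height=1`.

```python
PARTITION_LIMIT = 4096  # the usual array partitioning limit mentioned above


def fits_unrolled_conv(in_height, in_width, n_chan,
                       out_height, out_width, n_filt):
    """Return True if both the input and output sizes are within the limit."""
    n_in = in_height * in_width * n_chan
    n_out = out_height * out_width * n_filt
    return n_in <= PARTITION_LIMIT and n_out <= PARTITION_LIMIT


if __name__ == "__main__":
    # 32*32*3 = 3072 inputs and 30*30*4 = 3600 outputs: within both limits.
    print(fits_unrolled_conv(32, 32, 3, 30, 30, 4))
    # 30*30*8 = 7200 outputs: exceeds the output limit.
    print(fits_unrolled_conv(32, 32, 3, 30, 30, 8))
```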

In order to wire all this together, the core of the layers had to be extended. A new attribute type, Source, is introduced to represent generated source code. A layer can have any number of generated sources, and the writer can pick up this information.

Type of change

New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)

Breaking in the sense that it replaces the previous implementations and slightly changes the mechanics of how the strategy is handled.

Tests

The existing tests confirm the accuracy of the implementation.

Test Configuration:

Run any Conv1D/2D tests, just ensure io_parallel is used. Play with ParallelizationFactor and ReuseFactor as desired. Don't forget the limitations above!
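A configuration for such a test might look like the sketch below. It only builds the plain config dictionary; the layer name `conv1` and the chosen values are hypothetical, and the dictionary would still need to be passed to the hls4ml converter together with `io_type='io_parallel'`.

```python
# Hypothetical example values; adjust to the model under test.
hls_config = {
    "Model": {
        "Precision": "ap_fixed<16,6>",
        "ReuseFactor": 1,
        # Model-level Latency would trigger the warning and fall back to Resource.
        "Strategy": "Resource",
    },
    "LayerName": {
        "conv1": {  # assumed layer name, must match a Conv1D/2D layer in the model
            "ReuseFactor": 4,
            "ParallelizationFactor": 2,  # must divide out_height * out_width
            "Strategy": "Latency",       # individual layers may still use Latency
        },
    },
}

if __name__ == "__main__":
    print(hls_config["LayerName"]["conv1"])
```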

Checklist

I have read the guidelines for contributing.
I have commented my code, particularly in hard-to-understand areas.
I have made corresponding changes to the documentation.
My changes generate no new warnings.
I have added tests that prove my fix is effective or that my feature works.

Unrolled CNN implementation
