Discussion - Inlined Conv increases latency significantly (up to 15x - 20x)
Created by: bo3z
Description
- While testing some code unrolling for the hls4ml Optimisation API, I noticed that inlining in Conv2D can allocate unnecessary RAM.
- When tested on the current version of Conv2D (line buffer, streaming, Resource strategy, RF > 1), there is a significant difference in latency (between 3x and nearly 20x); a configuration sketch reproducing this setup is given after this list.
- It is still unclear what causes this bug and whether it is also present for (i) the Latency strategy, (ii) RF = 1 and (iii) encoded convolution, but it certainly appears to be a bug for RF > 1 with the Resource strategy. Opening this as a discussion until further synthesis results are obtained.
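Below is a minimal sketch of the kind of configuration under discussion (streaming Conv2D, Resource strategy, RF > 1). The small Keras model, FPGA part number and output directory are placeholders for illustration only, not the exact setup used for the reports linked further down.

```python
import hls4ml
from tensorflow.keras.layers import Conv2D, Input
from tensorflow.keras.models import Model

# Small stand-in model; the actual tests below use the SVHN paper model.
inp = Input(shape=(32, 32, 3))
out = Conv2D(16, (3, 3), activation='relu')(inp)
keras_model = Model(inp, out)

# Model-level hls4ml config: Resource strategy with RF > 1 (RF = 9, as in the tests below).
config = hls4ml.utils.config_from_keras_model(keras_model, granularity='model')
config['Model']['Strategy'] = 'Resource'
config['Model']['ReuseFactor'] = 9

# io_type='io_stream' selects the streaming (line-buffer) Conv2D implementation.
# The part number and output directory are placeholders.
hls_model = hls4ml.converters.convert_from_keras_model(
    keras_model,
    hls_config=config,
    io_type='io_stream',
    output_dir='conv2d_rf9_resource',
    part='xcu250-figd2104-2L-e',
)
hls_model.compile()
```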
Type of change
- Bug fix
- Breaking change (potentially)
- Discussion
Tests
- Below are report files from a full Vivado synthesis and co-simulation (CoSim) analysis of the SVHN paper model, with RF = 9; a sketch of how such a run can be launched from hls4ml is given after this list.
- Files suffixed with _master correspond to the current implementation (line-buffer, streaming, Resource Conv2D).
- Files suffixed with _no_pragma correspond to the implementation with the inline keyword removed, as per this PR.
- Inspecting the report files, the models are clearly equivalent (in terms of HLS config and architecture): they use the same number of DSPs and BRAMs and have similar LUT and FF utilisation. However, their latencies differ by up to 20x.
- Source of report files: https://cernbox.cern.ch/s/DK4v2KUTiBmFvYN
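For completeness, a sketch of how a full synthesis and co-simulation run of this kind can be launched from hls4ml, assuming `hls_model` and the output directory come from the configuration sketch above and a Vivado HLS installation is available on the PATH. The reports linked above were produced separately; this is only an illustration of the workflow.

```python
import hls4ml

# Run C synthesis and RTL co-simulation; the latency figures compared
# above (_master vs _no_pragma) come from the resulting Vivado HLS reports.
hls_model.build(csim=False, synth=True, cosim=True, export=False)

# Print a summary of the generated reports (latency, resource usage).
hls4ml.report.read_vivado_report('conv2d_rf9_resource')
```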
Checklist
- I have read the guidelines for contributing.
- I have commented my code, particularly in hard-to-understand areas.
- I have made corresponding changes to the documentation.
- My changes generate no new warnings.
- I have installed and run pre-commit on the files I edited or added.
- I have added tests that prove my fix is effective or that my feature works.