Vivado synthesis report - zero BRAM utilisation (OOC)

Created by: bo3z

Prerequisites

Please make sure to check off these prerequisites before submitting a bug report.

Test that the bug appears on the current version of the master branch. Make sure to include the commit hash of the commit you checked out.
Check that the issue hasn't already been reported, by checking the currently open issues.
If there are steps to reproduce the problem, make sure to write them down below.
If relevant, please include the hls4ml project files, which were created directly before and/or after the bug.

Quick summary

When using the Resource strategy with the Vivado backend, the BRAM utilisation reported after full synthesis is zero, and the reported value differs between the Verilog and VHDL outputs of the same design. Four designs implement the same architecture, have the same latency and use exactly the same amount of every other resource, yet report very different BRAM utilisation. This should not be possible, as none of the memory was moved to LUTs (the other resources stay the same). This looks like a bug in how BRAM utilisation is reported.

Details

Synthesize a one-layer model with 16 inputs and 16 outputs. The architecture-wide precision is set to 16 bits and biases are disabled. The reuse factor is set to 4. This is essentially a matrix-vector product between a 16x16 matrix and 16x1 vector implemented across 4 clock cycles. Set strategy to Resource, so that weights are stored in BRAM.

The BRAM can be estimated with the formula n_inputs * n_outputs * bit_width / (k * reuse_factor), where k is the constant determining BRAM width. In this case, the BRAM is set to 36-bit width, as it is quite shallow, so the estimate for all the cases below should be (16 * 16 * 16) / (4 * 36) = 28.444.
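A minimal sketch of the estimate above in Python. The function name and the bram_width parameter are illustrative only, and rounding up to whole blocks is an assumption that happens to agree with the 29 BRAMs HLS reports.

```python
import math

def estimate_bram(n_inputs, n_outputs, bit_width, reuse_factor, bram_width=36):
    """Apply n_inputs * n_outputs * bit_width / (k * reuse_factor), with k = bram_width."""
    return n_inputs * n_outputs * bit_width / (bram_width * reuse_factor)

# 16x16 weight matrix, 16-bit precision, reuse factor 4, 36-bit wide BRAM
raw = estimate_bram(16, 16, 16, reuse_factor=4)
whole_blocks = math.ceil(raw)  # assumption: partial blocks are rounded up

print(f"raw estimate: {raw:.3f}")
print(f"whole blocks: {whole_blocks}")
```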

There are four test cases, and their results do not agree with each other [attached below, synthesis reports]. In all the cases, I synthesised both the Verilog and VHDL that HLS generated and performed a full Vivado synthesis and design optimisation. The cases are:

No hardware pragma specified for the weights. The HLS estimate for both (1) Verilog and (2) VHDL was 29 BRAMs (in line with my estimate). However, full synthesis reported 0 BRAMs for both Verilog and VHDL, and in both cases the synthesis logs said the weights were implemented using auto ROMs.
Hardware pragma specified in HLS. I used the #pragma HLS resource directive to explicitly place the weights in ROM_nP_BRAM and synthesised the (3) VHDL output. The logs now stated that the weights were implemented using block ROMs, rather than the auto ROMs seen without the pragma. The HLS estimate was again 29, but after full synthesis only 1 18-bit BRAM was used. All other resources (LUTs, registers) remained exactly the same as in the previous case (after full synthesis and design optimisation). This seems like a bug: more BRAM is now used for the same design, with the same number of other memory resources.
Hardware pragma specified in HLS, this time with the (4) Verilog output. Again, the HLS estimate was 29, which matches my estimate, and the logs stated that the weights were implemented using block ROMs. After full synthesis, 14.5 BRAMs were used. This is closest to the expectation (off by exactly a factor of 2). However, the other resources (LUTs, registers) again remained exactly the same (after full synthesis and design optimisation). This is likely a bug, as all three cases are equivalent designs but use varying amounts of BRAM.
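To make the spread easier to see, the numbers quoted in the cases above can be tabulated against the HLS estimate. This is only a sketch that restates the reported figures, with no new measurements; the dictionary keys are labels chosen for illustration, and the value for the VHDL pragma case is recorded as the single BRAM quoted above.

```python
# Values as quoted in the four cases; units are as reported in the text.
HLS_ESTIMATE = 29

post_synthesis_bram = {
    "Verilog, no pragma": 0,
    "VHDL, no pragma": 0,
    "VHDL, ROM_nP_BRAM pragma": 1,
    "Verilog, ROM_nP_BRAM pragma": 14.5,
}

# Fraction of the HLS estimate that full synthesis actually reports
ratios = {case: used / HLS_ESTIMATE for case, used in post_synthesis_bram.items()}

for case, used in post_synthesis_bram.items():
    print(f"{case:<30} {used:>5} BRAM  ({ratios[case]:.2f} of HLS estimate)")
```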

Steps to Reproduce

To reproduce, the source code of nnet_dense_resource.h and vivado_synth.tcl needs to be modified. To match the cases above:

1st case for Verilog, no pragma: in vivado_synth.tcl change vhdl to verilog, but make no change to nnet_dense_resource.h
2nd case for VHDL, no pragma: change nothing [current master branch]
3rd case for VHDL, pragma included: make no change to vivado_synth.tcl from the master branch (keep vhdl), but change line 29 in nnet_dense_resource.h to #pragma HLS RESOURCE variable=weights core=ROM_nP_BRAM
4th case for Verilog, pragma included: change both nnet_dense_resource.h and vivado_synth.tcl as described above
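The two modifications can be scripted. The sketch below works on file contents as strings; the helper names are hypothetical, and sample_tcl and sample_header are made-up stand-ins rather than the real contents of vivado_synth.tcl or nnet_dense_resource.h. It assumes the language appears in the script as the standalone word vhdl and that line 29 is the line to replace, as in the commit referenced below.

```python
import re

PRAGMA = "#pragma HLS RESOURCE variable=weights core=ROM_nP_BRAM"

def use_verilog(tcl_text: str) -> str:
    """Replace the whole word 'vhdl' with 'verilog' in the synthesis script text."""
    return re.sub(r"\bvhdl\b", "verilog", tcl_text)

def set_line(text: str, line_no: int, new_line: str) -> str:
    """Replace the 1-indexed line `line_no`, keeping its leading indentation."""
    lines = text.splitlines()
    old = lines[line_no - 1]
    indent = old[: len(old) - len(old.lstrip())]
    lines[line_no - 1] = indent + new_line
    return "\n".join(lines) + "\n"

# Hypothetical stand-ins for the real files (not their actual contents)
sample_tcl = "add_files myproject_prj/solution1/syn/vhdl\n"
sample_header = "".join(f"// line {i}\n" for i in range(1, 31))

patched_tcl = use_verilog(sample_tcl)                 # cases 1 and 4
patched_header = set_line(sample_header, 29, PRAGMA)  # cases 3 and 4
```

To apply it to a checkout, read each file into a string, pass it through the matching helper and write the result back. Check the diff before synthesising, since the line number may differ on other commits.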

Clone the hls4ml repository
Checkout the master branch, with commit hash: 2e71ff45
Run the conversion on the model with the code below:

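import numpy as np
import hls4ml
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

input_shape = (16, )
output_shape = 16

# One dense layer with 16 inputs and 16 outputs
keras_model = Sequential()
keras_model.add(Dense(output_shape, input_shape=input_shape, name='dense'))
keras_model.compile()

# Deterministic weights scaled to (0, 1]; biases are all zero
weights = np.arange(np.prod(input_shape) * output_shape).reshape(*input_shape, output_shape) + 1
weights = weights / np.max(weights)
keras_model.layers[0].set_weights([weights, np.zeros(output_shape)])

# Per-layer config with reuse factor 4 and the Resource strategy
hls_config = hls4ml.utils.config_from_keras_model(keras_model, granularity='name', default_reuse_factor=4)
hls_config['Model']['Strategy'] = 'Resource'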

Expected behavior

All four designs should use the same resources, including BRAM.

Actual behavior

All four designs use the same resources (some of the DSPs were re-implemented as LUTs in one design) except for BRAM. Two of the designs use no BRAM even though HLS estimated 29 BRAMs for them. One design uses 1 BRAM and the last one uses 14.5, which is closest to the expectation. This is likely a problem in the report / synthesis script. Since all of the IPs are equivalent, there should be no difference in BRAM.

Possible fix

A possible cause is the BRAM being synthesised out-of-context (OOC), with Vivado not including it in the synthesis, see here and here. I haven't yet found a way to modify the synthesis scripts to include OOC files. Some of the options I looked into are link_design, as well as synthesising the exported IP rather than the HDL files.

A short-term solution is re-inserting the pragma in nnet_dense_resource.h and changing the synthesis script to use Verilog. I am not sure whether this bug also occurs for larger models.

verilog_no_pragma.txt verilog_pragma.txt vhdl_no_pragma.txt vhdl_pragma.txt