Fix for 2D conv layers in the special case of io_parallel with full parallelization
Created by: drankincms
Description
This is a small fix for 2D conv layers when io_parallel is used and ParallelizationFactor is set to the full output size. Currently, this combination causes the pipeline pragma to be ignored, which results in no unrolling and very large latency/II. This fix adds explicit unroll pragmas in that case to restore the expected behavior.
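As a rough Python sketch of the idea (illustrative only, not the actual backend code): when the ParallelizationFactor equals the number of output pixels, the generated pixel loop needs an explicit unroll pragma, since the pipeline pragma alone is ignored in that case.

# Hypothetical sketch of the pragma choice behind this fix; the function
# and its arguments are illustrative, not the real hls4ml code.
def conv2d_pixel_loop_pragma(out_height, out_width, parallelization_factor):
    n_pixels = out_height * out_width
    if parallelization_factor == n_pixels:
        # fully parallel case: pipelining alone is ignored here, so
        # request an explicit full unroll of the pixel loop
        return '#pragma HLS UNROLL'
    # partially parallel case: pipeline the loop as before
    return '#pragma HLS PIPELINE'

# For a 9x16 input and a 3x3 valid conv (as in the test below), the output
# is 7x14 = 98 pixels, so ParallelizationFactor = 98 hits the unroll branch:
print(conv2d_pixel_loop_pragma(7, 14, 98))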
Type of change
- Bug fix (non-breaking change that fixes an issue)
Tests
I have tested with the small model below:
# simplified CNN as example
from tensorflow.keras.layers import Dense, Flatten, Input
from tensorflow.keras.models import Sequential
from qkeras import QActivation, QConv2D, QDense, quantized_bits, quantized_relu

nbits = 4
sym = 1
model = Sequential()
model.add(Input((9, 16, 1), name='input_student'))
model.add(QConv2D(1, (3, 3), kernel_quantizer=quantized_bits(nbits, 0, alpha=sym),
                  bias_quantizer=quantized_bits(nbits, 0, alpha=1), name='Student_Conv1a'))
model.add(QActivation('quantized_relu(' + str(nbits) + ')'))
model.add(Flatten())
model.add(QDense(10, name='fc1',
                 kernel_quantizer=quantized_bits(nbits, 0, alpha=1),
                 bias_quantizer=quantized_bits(nbits, 0, alpha=1)))
model.add(QActivation(activation=quantized_relu(nbits), name='relu1'))
model.add(QDense(10, name='fc2',
                 kernel_quantizer=quantized_bits(nbits, 0, alpha=1),
                 bias_quantizer=quantized_bits(nbits, 0, alpha=1)))
model.add(QActivation(activation=quantized_relu(nbits), name='relu2'))
model.add(Dense(2, name='output'))
model.summary()
model.compile(optimizer='adam', loss=['mse'], metrics=['mse'])

import hls4ml

def print_dict(d, indent=0):
    # small helper to pretty-print the nested hls4ml config
    for key, value in d.items():
        print('  ' * indent + str(key), end='')
        if isinstance(value, dict):
            print()
            print_dict(value, indent + 1)
        else:
            print(': ' + str(value))

# HLS4ML: extraction of the model
config = hls4ml.utils.config_from_keras_model(model, granularity='name', default_reuse_factor=1)
# the 3x3 valid conv on the 9x16 input gives a 7x14 output, i.e. 98 pixels,
# so this ParallelizationFactor corresponds to full parallelization
config['LayerName']['Student_Conv1a']['ParallelizationFactor'] = int(98 / 1)
print("-----------------------------------")
print_dict(config)
print("-----------------------------------")
# parameters for the conversion configuration
cfg = hls4ml.converters.create_config(backend='Vivado')
cfg['IOType'] = 'io_parallel'  # or io_stream
cfg['Strategy'] = 'latency'
cfg['HLSConfig'] = config
cfg['KerasModel'] = model
cfg['OutputDir'] = 'hls4ml_test_prj/'
cfg['XilinxPart'] = 'xcvu13p-flga2577-1-e'
cfg['Part'] = 'xcvu13p-flga2577-1-e'
hls_model = hls4ml.converters.keras_to_hls(cfg)
hls_model.compile()
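To check the effect on latency/II, one can then run C synthesis and inspect the report for the project above (assuming Vivado HLS is available on the PATH):

# run C synthesis and print the resulting Vivado HLS report
hls_model.build(csim=False, synth=True)
hls4ml.report.read_vivado_report('hls4ml_test_prj/')

With the fix, the report should show the conv layer's pixel loop unrolled instead of the very large latency/II seen before.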
It's not clear to me whether this can easily be incorporated into a test; a rough sketch of what that might look like is below. If a proper test is necessary, let me know and I will try a bit harder.
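For reference, a hypothetical pytest modeled on the script above (the file location, layer names, and tolerance are placeholders, following the style of the existing tests in test/pytest/):

import numpy as np
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Sequential
from qkeras import QConv2D, quantized_bits

import hls4ml


def test_conv2d_full_parallelization():
    # single quantized conv; a 9x16 input with a 3x3 valid kernel gives 7x14 = 98 output pixels
    model = Sequential()
    model.add(Input((9, 16, 1)))
    model.add(QConv2D(1, (3, 3), kernel_quantizer=quantized_bits(4, 0, alpha=1),
                      bias_quantizer=quantized_bits(4, 0, alpha=1), name='conv'))

    config = hls4ml.utils.config_from_keras_model(model, granularity='name', default_reuse_factor=1)
    config['LayerName']['conv']['ParallelizationFactor'] = 98  # full parallelization

    hls_model = hls4ml.converters.convert_from_keras_model(
        model, hls_config=config, io_type='io_parallel',
        output_dir='hls4ml_prj_conv2d_pf', part='xcvu13p-flga2577-1-e')
    hls_model.compile()

    # agreement with the Keras model shows the unrolled implementation is still correct
    X = np.random.rand(10, 9, 16, 1).astype(np.float32)
    y_keras = model.predict(X).reshape(10, -1)
    y_hls = hls_model.predict(X).reshape(10, -1)
    np.testing.assert_allclose(y_hls, y_keras, rtol=0, atol=0.05)  # tolerance is a placeholder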
Checklist
- I have read the guidelines for contributing.
- I have commented my code, particularly in hard-to-understand areas.
- I have made corresponding changes to the documentation.
- My changes generate no new warnings.
- I have installed and run pre-commit on the files I edited or added.
- I have added tests that prove my fix is effective or that my feature works.