Fix for 2D conv layers in the special case of io_parallel with full parallelization
Created by: drankincms
Description
This is a small fix for 2D conv layers when io_parallel is used and ParallelizationFactor is set to the full output size. Currently, this combination causes the pipeline pragma to be ignored, which results in no unrolling and very large latency/II. This fix adds explicit unroll pragmas in that case to restore the expected behavior.
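As a rough Python sketch of the idea (illustrative only, not the actual backend code): when the ParallelizationFactor equals the number of output pixels, the generated pixel loop needs an explicit unroll pragma, since the pipeline pragma alone is ignored in that case.

# Hypothetical sketch of the pragma choice behind this fix; the function
# and its arguments are illustrative, not the real hls4ml code.
def conv2d_pixel_loop_pragma(out_height, out_width, parallelization_factor):
    n_pixels = out_height * out_width
    if parallelization_factor == n_pixels:
        # fully parallel case: pipelining alone is ignored here, so
        # request an explicit full unroll of the pixel loop
        return '#pragma HLS UNROLL'
    # partially parallel case: pipeline the loop as before
    return '#pragma HLS PIPELINE'

# For a 9x16 input and a 3x3 valid conv (as in the test below), the output
# is 7x14 = 98 pixels, so ParallelizationFactor = 98 hits the unroll branch:
print(conv2d_pixel_loop_pragma(7, 14, 98))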
Type of change
- Bug fix (non-breaking change that fixes an issue)
Tests
I have tested with the small model below:
# simplified CNN as example
from tensorflow.keras.layers import Dense, Flatten, Input
from tensorflow.keras.models import Sequential
from qkeras import QActivation, QConv2D, QDense, quantized_bits, quantized_relu

nbits = 4
sym = 1
model = Sequential()
model.add(Input((9, 16, 1), name='input_student'))
model.add(QConv2D(1, (3, 3), kernel_quantizer=quantized_bits(nbits, 0, alpha=sym),
                  bias_quantizer=quantized_bits(nbits, 0, alpha=1), name='Student_Conv1a'))
model.add(QActivation('quantized_relu(' + str(nbits) + ')'))
model.add(Flatten())
model.add(QDense(10, name='fc1',
                 kernel_quantizer=quantized_bits(nbits, 0, alpha=1),
                 bias_quantizer=quantized_bits(nbits, 0, alpha=1)))
model.add(QActivation(activation=quantized_relu(nbits), name='relu1'))
model.add(QDense(10, name='fc2',
                 kernel_quantizer=quantized_bits(nbits, 0, alpha=1),
                 bias_quantizer=quantized_bits(nbits, 0, alpha=1)))
model.add(QActivation(activation=quantized_relu(nbits), name='relu2'))
model.add(Dense(2, name='output'))
model.summary()
model.compile(optimizer='adam', loss=['mse'], metrics=['mse'])

import hls4ml

def print_dict(d, indent=0):
    # small helper to pretty-print the nested hls4ml config
    for key, value in d.items():
        print('  ' * indent + str(key), end='')
        if isinstance(value, dict):
            print()
            print_dict(value, indent + 1)
        else:
            print(': ' + str(value))

# HLS4ML: extraction of the model
config = hls4ml.utils.config_from_keras_model(model, granularity='name', default_reuse_factor=1)
# the 3x3 valid conv on the 9x16 input gives a 7x14 output, i.e. 98 pixels,
# so this ParallelizationFactor corresponds to full parallelization
config['LayerName']['Student_Conv1a']['ParallelizationFactor'] = int(98 / 1)
print("-----------------------------------")
print_dict(config)
print("-----------------------------------")
# parameters for the conversion configuration
cfg = hls4ml.converters.create_config(backend='Vivado')
cfg['IOType'] = 'io_parallel'  # or io_stream
cfg['Strategy'] = 'latency'
cfg['HLSConfig'] = config
cfg['KerasModel'] = model
cfg['OutputDir'] = 'hls4ml_test_prj/'
cfg['XilinxPart'] = 'xcvu13p-flga2577-1-e'
cfg['Part'] = 'xcvu13p-flga2577-1-e'
hls_model = hls4ml.converters.keras_to_hls(cfg)
hls_model.compile()
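To check the effect on latency/II, one can then run C synthesis and inspect the report for the project above (assuming Vivado HLS is available on the PATH):

# run C synthesis and print the resulting Vivado HLS report
hls_model.build(csim=False, synth=True)
hls4ml.report.read_vivado_report('hls4ml_test_prj/')

With the fix, the report should show the conv layer's pixel loop unrolled instead of the very large latency/II seen before.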
It's not clear to me whether this can easily be incorporated into a test; a rough sketch of what that might look like is below. If a proper test is necessary, let me know and I will try a bit harder.
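For reference, a hypothetical pytest modeled on the script above (the file location, layer names, and tolerance are placeholders, following the style of the existing tests in test/pytest/):

import numpy as np
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Sequential
from qkeras import QConv2D, quantized_bits

import hls4ml


def test_conv2d_full_parallelization():
    # single quantized conv; a 9x16 input with a 3x3 valid kernel gives 7x14 = 98 output pixels
    model = Sequential()
    model.add(Input((9, 16, 1)))
    model.add(QConv2D(1, (3, 3), kernel_quantizer=quantized_bits(4, 0, alpha=1),
                      bias_quantizer=quantized_bits(4, 0, alpha=1), name='conv'))

    config = hls4ml.utils.config_from_keras_model(model, granularity='name', default_reuse_factor=1)
    config['LayerName']['conv']['ParallelizationFactor'] = 98  # full parallelization

    hls_model = hls4ml.converters.convert_from_keras_model(
        model, hls_config=config, io_type='io_parallel',
        output_dir='hls4ml_prj_conv2d_pf', part='xcvu13p-flga2577-1-e')
    hls_model.compile()

    # agreement with the Keras model shows the unrolled implementation is still correct
    X = np.random.rand(10, 9, 16, 1).astype(np.float32)
    y_keras = model.predict(X).reshape(10, -1)
    y_hls = hls_model.predict(X).reshape(10, -1)
    np.testing.assert_allclose(y_hls, y_keras, rtol=0, atol=0.05)  # tolerance is a placeholder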
Checklist
- I have read the guidelines for contributing.
- I have commented my code, particularly in hard-to-understand areas.
- I have made corresponding changes to the documentation.
- My changes generate no new warnings.
- I have installed and run pre-commit on the files I edited or added.
- I have added tests that prove my fix is effective or that my feature works.