QKeras, predict/trace, API enhancements
Created by: thesps
This is a big one...
There are three significant new things in this PR: QKeras support, predict, and enhancements to the API. Both predict and the expanded API were essential for the QKeras support.
In reverse order:
API
One can do much more by importing hls4ml than before. Previously this stopped at, basically:
import hls4ml
import yaml; cfg = yaml.safe_load(open('keras-config.yml'))
hls_model = hls4ml.converters.keras_to_hls(cfg)
Now it's much richer and we can do:
import tensorflow as tf
import hls4ml
# load a model
kmodel = tf.keras.models.load_model('my_model.h5')
# get the 'HLSConfig' part of the config 'file' (now just dictionary)
hls_cfg = hls4ml.utils.config_from_keras_model(kmodel, granularity='name')
# Modify the precision and reuse if desired
# create the model object. Parameters that were normally outside the 'HLSConfig' part of the file are now function arguments
hls_model = hls4ml.converters.convert_from_keras_model(kmodel, output_dir='my-hls-test', hls_config=hls_cfg, fpga_part=...)
# write the project, but also compile the csim library (more on that below)
hls_model.compile()
# run c-synthesis, logic synthesis
hls_model.build(synth=True, vsynth=True)
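To make the "modify the precision and reuse if desired" step concrete, here is a hedged sketch of the nested dictionary that config_from_keras_model returns with granularity='name' and how one might edit it. The layer name 'fc1' is hypothetical; the actual keys depend on your model.

```python
# Sketch of the config dict shape, assuming hls4ml's 'Model'/'LayerName'
# layout; the layer name 'fc1' is made up for illustration.
hls_cfg = {
    'Model': {'Precision': 'ap_fixed<16,6>', 'ReuseFactor': 1},
    'LayerName': {
        'fc1': {'Precision': {'weight': 'ap_fixed<16,6>',
                              'result': 'ap_fixed<16,6>'}},
    },
}

# Tighten the weight precision of one layer, and trade latency for
# resources globally by raising the reuse factor.
hls_cfg['LayerName']['fc1']['Precision']['weight'] = 'ap_fixed<8,3>'
hls_cfg['Model']['ReuseFactor'] = 4
```

The edited dictionary is then passed as hls_config to convert_from_keras_model as shown above.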
All the 'old' ways of working still apply, so hls4ml convert -c ... and hls4ml build -cs -p ... from the command line still work. The API also allows the 'old' way shown above: keras_to_hls with a config file, returning an HLSModel (but you can do more with the model now).
predict, trace
This is the reason the branch is called csim_integration. Now, with an HLSModel object (created as above with the API, for example), one can do:
y = hls_model.predict(X)
It's pretty self-explanatory, but awesome. X is an array, just as you would use to predict with the Keras (or whatever) Python model, and the returned y is an array as well. Then you can easily compare your predictions against the original float model, make ROC curves, etc.
Technically this is implemented by compiling the HLS project (together with the ap_fixed and other headers) into a shared object and binding it to Python using ctypes. So what you get is "csim", but without running Vivado; in fact this works even without Vivado installed on the system(!).
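The ctypes part is the standard Python mechanism for calling into a shared object. As a minimal illustration of the same binding pattern (using the system C math library rather than an hls4ml-generated library, which is an assumption purely for the sake of a self-contained example):

```python
import ctypes
import ctypes.util

# Load a shared object and declare the signature of one of its functions.
# hls4ml does the equivalent with the compiled csim library; here the C math
# library stands in, just to show the mechanism.
libm = ctypes.CDLL(ctypes.util.find_library('m'))
libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]

print(libm.sqrt(9.0))  # 3.0
```

In the real predict path, numpy arrays are marshalled across this boundary instead of single doubles.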
trace is the related capability to get the output of each individual layer of the model, for lower-level debugging; this is useful, for example, when using custom precision throughout the model. It's used like: trace = hls_model.trace(X). The returned object is a dictionary: the names of the (HLS) model layers are the keys, and each one points to the array captured at the output of that layer. A utility has been added to the profiling module to get the equivalent for a Keras model: trace = hls4ml.model.profiling.get_ymodel(keras_model, X).
hls4ml does things like separating activations out of Dense layers into their own layers, and this utility does the same, making comparisons easier.
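Once you have the two dictionaries, a layer-by-layer comparison is a few lines of numpy. The dictionaries below are made-up stand-ins for what hls_model.trace(X) and the Keras-side utility would return:

```python
import numpy as np

# Hypothetical traces; in practice these come from hls_model.trace(X) and
# the Keras-side profiling utility, keyed by (HLS) layer name.
hls_trace = {'fc1': np.array([0.50, -0.25]), 'relu1': np.array([0.50, 0.0])}
keras_trace = {'fc1': np.array([0.501, -0.249]), 'relu1': np.array([0.501, 0.0])}

for layer, hls_out in hls_trace.items():
    diff = np.max(np.abs(hls_out - keras_trace[layer]))
    print(f'{layer}: max abs difference = {diff:.4f}')
```

A large jump in the difference at one layer points you straight at the precision setting to revisit.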
QKeras support
We can now support QKeras models using most of the quantizers that QKeras has. Some more detail can be obtained from this presentation.
When a Keras model uses QDense, QConv, etc. layers, the quantizer is extracted and used to quantize the weights, and the HLS data type is set according to the settings of the quantizer.
The QKeras quantizer's alpha parameter is handled with a new layer, ApplyAlpha (which uses BatchNorm for inference), that applies the alpha scale after the Dense or Conv layer. The proper conversion is handled at the config-dictionary level and with new Optimizer passes. To get proper performance, users should use the hls4ml config file utility demonstrated above:
hls_cfg = hls4ml.utils.config_from_keras_model(qkeras_model, granularity='name')
hls_model = hls4ml.converters.convert_from_keras_model(qkeras_model, ...)
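To see why the alpha scale can be deferred to a BatchNorm-style layer after the matrix multiply, here is a small numpy sketch. The weights, shapes and alpha values are made up; the point is only the algebraic identity the ApplyAlpha approach relies on.

```python
import numpy as np

# Hypothetical float weights for a Dense layer with 2 outputs, and a
# per-output-channel scale 'alpha' chosen by the quantizer.
w = np.array([[0.30, -0.60],
              [0.45, 0.15]])
alpha = np.array([0.15, 0.30])

# The quantizer stores w ~= alpha * w_q, where w_q holds small integer
# values that are cheap to implement in hardware.
w_q = np.round(w / alpha)

x = np.array([1.0, 2.0])
# Dense with the integer-like weights, then an ApplyAlpha-style rescale of
# the output channels: equivalent to using alpha * w_q as the weights.
y = (x @ w_q) * alpha
y_ref = x @ (alpha * w_q)
print(np.allclose(y, y_ref))  # True
```

This is why the scale can be folded into a BatchNorm-like affine layer at inference time without changing the result.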
Other
There is a new implementation of the Softmax activation. The existing implementation had a few hard-coded constants, so it was difficult to configure with different data types, and it would occasionally misbehave and output the largest value for the smallest input value (I think a saturation issue). The new version is also more configurable, with the possibility of different types for the e^x and 1/x tables. I will follow up this PR with a comment benchmarking the performance, but I've found it to be smaller, faster and more accurate.
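As a rough numpy sketch of the table-lookup idea behind such an implementation (table sizes and input ranges are made up; the real HLS version stores fixed-point entries in ROM rather than floats):

```python
import numpy as np

# Precomputed lookup tables, as an HLS implementation would store in ROM:
# one for e^x over a fixed input range, one for 1/x over the expected
# range of the sum of exponentials.
N_TABLE = 1024
exp_in = np.linspace(-8.0, 8.0, N_TABLE)
exp_table = np.exp(exp_in)
inv_in = np.linspace(1e-3, 64.0, N_TABLE)
inv_table = 1.0 / inv_in

def table_softmax(x):
    # Snap each input to the nearest table entry instead of calling exp(),
    # then snap the sum to the nearest entry of the 1/x table.
    e = exp_table[np.abs(exp_in[:, None] - x).argmin(axis=0)]
    inv = inv_table[np.abs(inv_in - e.sum()).argmin()]
    return e * inv

x = np.array([0.5, 1.0, -0.5])
print(table_softmax(x))  # close to np.exp(x) / np.exp(x).sum()
```

Making the e^x and 1/x table types independently configurable is what lets the accuracy/resource trade-off be tuned per model.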