
[WIP] Readable layer function names

Javier Duarte requested to merge github/fork/thesps/layer-labels into master

Created by: thesps

In the C-synthesized HLS project, our layer modules have names like:

dense_latency_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_1

With a view to making the logs and reports a bit more parsable, I tried to label the layer function calls with the layer names from the model. This kind of approach might also help the Catapult backend, where we anticipate controlling certain directives from Tcl that will need to refer to the module names.

Unfortunately, functions can't be labelled in the same way as loops (at least in Vivado 2019.2). Such a label might look like:

fc1:
nnet::dense<input_t, layer2_t, config2>(fc1_input, layer2_out, w2, b2);

or

fc1: {
    nnet::dense<input_t, layer2_t, config2>(fc1_input, layer2_out, w2, b2);
}

What I found works is to define a kind of wrapper function, like:

void fc1(input_t* fc1_input, layer2_t* layer2_out, weight2_t* w2, bias2_t* b2){
    #pragma HLS inline off
    nnet::dense<input_t, layer2_t, config2>(fc1_input, layer2_out, w2, b2); // fc1
}

The function name corresponds to the layer name in the model, and the inline off pragma prevents the function from being removed from the hierarchy. So now the writer defines all of these wrapper functions, then replaces each call to e.g. nnet::dense<...>(...) with e.g. fc1(...), as sketched below.
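For illustration, here is a hedged sketch of how the generated top-level body might then read (the signature and buffer names are assumptions that follow the snippets above; the fc1, relu1, ... wrapper definitions sit earlier in the same file):

void myproject(input_t fc1_input[N_INPUT], result_t layer_out[N_OUTPUT]) {
    // ... interface pragmas, weights and intermediate buffers as before ...

    layer2_t layer2_out[N_LAYER_2];
    fc1(fc1_input, layer2_out, w2, b2); // was: nnet::dense<input_t, layer2_t, config2>(...)

    layer3_t layer3_out[N_LAYER_2];
    relu1(layer2_out, layer3_out); // was: nnet::relu<layer2_t, layer3_t, relu_config3>(...)

    // ... remaining layers follow the same pattern ...
}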

Now the report looks like this:

+ Detail: 
    * Instance: 
    +----------------------------+----------+---------+-------+----+-------+-----+
    |          Instance          |  Module  | BRAM_18K| DSP48E| FF |  LUT  | URAM|
    +----------------------------+----------+---------+-------+----+-------+-----+
    |fc1_ret_fc1_fu_191          |fc1       |        0|     25|   0|   7863|    0|
    |fc2_ret_fc2_fu_123          |fc2       |        0|      0|   0|  11540|    0|
    |fc3_ret_fc3_fu_197          |fc3       |        0|      0|   0|   7803|    0|
    |output_ret_output_r_fu_364  |output_r  |        0|      0|   0|   1297|    0|
    |relu1_ret_relu1_fu_233      |relu1     |        0|      0|   0|   4480|    0|
    |relu2_ret_relu2_fu_301      |relu2     |        0|      0|   0|   2240|    0|
    |relu3_ret_relu3_fu_337      |relu3     |        0|      0|   0|   1610|    0|
    |grp_softmax_fu_391          |softmax   |        4|      5|  96|     84|    0|
    +----------------------------+----------+---------+-------+----+-------+-----+
    |Total                       |          |        4|     30|  96|  36917|    0|
    +----------------------------+----------+---------+-------+----+-------+-----+

whereas on master we had:

+ Detail: 
    * Instance: 
    +--------------------------------------------------------------------------------+---------------------------------------------------------------+---------+-------+----+-------+-----+
    |                                    Instance                                    |                             Module                            | BRAM_18K| DSP48E| FF |  LUT  | URAM|
    +--------------------------------------------------------------------------------+---------------------------------------------------------------+---------+-------+----+-------+-----+
    |call_ret2_dense_latency_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_1_fu_123  |dense_latency_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_1  |        0|      0|   0|  11540|    0|
    |call_ret_dense_latency_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_2_fu_227   |dense_latency_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_2  |        0|     25|   0|   7863|    0|
    |call_ret4_dense_latency_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_s_fu_191  |dense_latency_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_s  |        0|      0|   0|   7803|    0|
    |call_ret6_dense_latency_ap_fixed_ap_fixed_config11_0_0_0_0_0_0_fu_364           |dense_latency_ap_fixed_ap_fixed_config11_0_0_0_0_0_0           |        0|      0|   0|   1297|    0|
    |call_ret5_relu_ap_fixed_ap_fixed_7_1_0_0_0_relu_config10_s_fu_337               |relu_ap_fixed_ap_fixed_7_1_0_0_0_relu_config10_s               |        0|      0|   0|   1610|    0|
    |call_ret1_relu_ap_fixed_ap_fixed_7_1_0_0_0_relu_config4_s_fu_233                |relu_ap_fixed_ap_fixed_7_1_0_0_0_relu_config4_s                |        0|      0|   0|   4480|    0|
    |call_ret3_relu_ap_fixed_ap_fixed_7_1_0_0_0_relu_config7_s_fu_301                |relu_ap_fixed_ap_fixed_7_1_0_0_0_relu_config7_s                |        0|      0|   0|   2240|    0|
    |grp_softmax_latency_ap_fixed_ap_fixed_softmax_config13_s_fu_391                 |softmax_latency_ap_fixed_ap_fixed_softmax_config13_s           |        4|      5|  93|     78|    0|
    +--------------------------------------------------------------------------------+---------------------------------------------------------------+---------+-------+----+-------+-----+
    |Total                                                                           |                                                               |        4|     30|  93|  36911|    0|
    +--------------------------------------------------------------------------------+---------------------------------------------------------------+---------+-------+----+-------+-----+

It seems to have added 3 FFs and 6 LUTs to softmax (96 FF / 84 LUT vs. 93 FF / 78 LUT on master), but other than that there is no change in the reports.

Obviously the actual implementation here is a hacky proof of principle, but if people like the idea we could implement it properly. Basically, I think we'd just need to add another type of template to the backend templates, or extend the function templates we have now so that each one defines both the function definition and the string used to call it, roughly as sketched below.
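As a rough illustration of the second option (purely hypothetical, not an existing template API), each function template would then render two pieces of C++ instead of one:

// Piece 1: the named wrapper definition, emitted once above the top-level function.
void relu1(layer2_t* relu1_input, layer3_t* layer3_out) {
    #pragma HLS inline off
    nnet::relu<layer2_t, layer3_t, relu_config3>(relu1_input, layer3_out); // relu1
}

// Piece 2: the call string, emitted where the bare nnet::relu<...>(...) call used to go.
relu1(layer2_out, layer3_out);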
