Optimized BatchNorm

Javier Duarte requested to merge github/fork/vloncar/bn_opt into master

Created by: vloncar

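Optimizes the BatchNorm computation by folding the normalization into a single multiply-add. Instead of computing

y = gamma * ((x - mean) / sqrt(var + eps)) + beta

compute

scale = gamma / sqrt(var + eps)
bias = beta - gamma * mean / sqrt(var + eps)
y = scale * x + bias

scale and bias do not depend on x, so they can be computed once, leaving one multiplication and one addition per input.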
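
As a quick sanity check that the two forms agree, here is a minimal Python sketch. It is not the code in this merge request: the function names (`batchnorm_reference`, `fold_batchnorm`, `batchnorm_folded`), the sample parameter values, and the default `eps=1e-3` are assumptions made for illustration.

```python
import math


def batchnorm_reference(x, gamma, beta, mean, var, eps=1e-3):
    """Original form: normalize x, then apply gamma and beta."""
    return gamma * ((x - mean) / math.sqrt(var + eps)) + beta


def fold_batchnorm(gamma, beta, mean, var, eps=1e-3):
    """Fold the BatchNorm parameters into one scale and one bias."""
    inv_std = 1.0 / math.sqrt(var + eps)
    scale = gamma * inv_std
    bias = beta - gamma * mean * inv_std
    return scale, bias


def batchnorm_folded(x, scale, bias):
    """Optimized form: one multiplication and one addition per input."""
    return scale * x + bias


# Hypothetical parameter values, chosen only to exercise the formulas.
gamma, beta, mean, var = 1.5, -0.25, 0.8, 2.0
scale, bias = fold_batchnorm(gamma, beta, mean, var)

for x in (-3.0, 0.0, 0.8, 4.2):
    y_ref = batchnorm_reference(x, gamma, beta, mean, var)
    y_opt = batchnorm_folded(x, scale, bias)
    # The results agree up to floating-point rounding.
    assert math.isclose(y_ref, y_opt, rel_tol=1e-9, abs_tol=1e-12)
```

The sketch uses floating point; with fixed-point types the two forms may round differently, so bit-exact agreement should not be assumed there.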

For the 3layer-bn model:

Before:

+ Latency (clock cycles): 
    * Summary: 
    +-----+-----+-----+-----+----------+
    |  Latency  |  Interval | Pipeline |
    | min | max | min | max |   Type   |
    +-----+-----+-----+-----+----------+
    |   37|   37|    1|    1| function |
    +-----+-----+-----+-----+----------+

    + Detail: 
        * Instance: 
        +----------------------------------+-----------------------+-----+-----+-----+-----+----------+
        |                                  |                       |  Latency  |  Interval | Pipeline |
        |             Instance             |         Module        | min | max | min | max |   Type   |
        +----------------------------------+-----------------------+-----+-----+-----+-----+----------+
        |grp_dense_0_0_0_0_0_0_0_s_fu_103  |dense_0_0_0_0_0_0_0_s  |    2|    2|    1|    1| function |
        |grp_dense_0_0_0_0_0_0_0_2_fu_171  |dense_0_0_0_0_0_0_0_2  |    2|    2|    1|    1| function |
        |grp_dense_0_0_0_0_0_0_0_1_fu_207  |dense_0_0_0_0_0_0_0_1  |    2|    2|    1|    1| function |
        |grp_dense_0_0_0_0_0_0_fu_213      |dense_0_0_0_0_0_0      |    2|    2|    1|    1| function |
        |grp_normalize_0_0_0_0_0_1_fu_249  |normalize_0_0_0_0_0_1  |    2|    2|    1|    1| function |
        |grp_softmax_fu_317                |softmax                |    6|    6|    1|    1| function |
        |grp_normalize_0_0_0_0_0_s_fu_330  |normalize_0_0_0_0_0_s  |    2|    2|    1|    1| function |
        |grp_normalize_0_0_0_0_0_3_fu_366  |normalize_0_0_0_0_0_3  |    2|    2|    1|    1| function |
        |call_ret3_relu_1_fu_402           |relu_1                 |    0|    0|    1|    1| function |
        |call_ret7_relu_fu_470             |relu                   |    0|    0|    1|    1| function |
        |call_ret11_relu_2_fu_506          |relu_2                 |    0|    0|    1|    1| function |
        |grp_normalize_0_0_0_0_0_2_fu_542  |normalize_0_0_0_0_0_2  |    2|    2|    1|    1| function |
        |call_ret1_linear_1_fu_551         |linear_1               |    0|    0|    1|    1| function |
        |call_ret5_linear_fu_619           |linear                 |    0|    0|    1|    1| function |
        |call_ret9_linear_3_fu_655         |linear_3               |    0|    0|    1|    1| function |
        |call_ret13_linear_2_fu_691        |linear_2               |    0|    0|    1|    1| function |
        +----------------------------------+-----------------------+-----+-----+-----+-----+----------+

        * Loop: 
        N/A



================================================================
== Utilization Estimates
================================================================
* Summary: 
+---------------------+---------+-------+---------+--------+
|         Name        | BRAM_18K| DSP48E|    FF   |   LUT  |
+---------------------+---------+-------+---------+--------+
|DSP                  |        -|      -|        -|       -|
|Expression           |        -|      -|        0|       6|
|FIFO                 |        -|      -|        -|       -|
|Instance             |       13|   2530|    68216|  139310|
|Memory               |        -|      -|        -|       -|
|Multiplexer          |        -|      -|        -|      36|
|Register             |        -|      -|     4472|       -|
+---------------------+---------+-------+---------+--------+
|Total                |       13|   2530|    72688|  139352|
+---------------------+---------+-------+---------+--------+
|Available SLR        |     2160|   2760|   663360|  331680|
+---------------------+---------+-------+---------+--------+
|Utilization SLR (%)  |    ~0   |     91|       10|      42|
+---------------------+---------+-------+---------+--------+
|Available            |     4320|   5520|  1326720|  663360|
+---------------------+---------+-------+---------+--------+
|Utilization (%)      |    ~0   |     45|        5|      21|
+---------------------+---------+-------+---------+--------+

+ Detail: 
    * Instance: 
    +----------------------------------+-----------------------+---------+-------+-------+-------+
    |             Instance             |         Module        | BRAM_18K| DSP48E|   FF  |  LUT  |
    +----------------------------------+-----------------------+---------+-------+-------+-------+
    |grp_dense_0_0_0_0_0_0_fu_213      |dense_0_0_0_0_0_0      |        0|    113|   3017|   4862|
    |grp_dense_0_0_0_0_0_0_0_1_fu_207  |dense_0_0_0_0_0_0_0_1  |        0|    693|  14857|  28394|
    |grp_dense_0_0_0_0_0_0_0_2_fu_171  |dense_0_0_0_0_0_0_0_2  |        0|    597|  14320|  30150|
    |grp_dense_0_0_0_0_0_0_0_s_fu_103  |dense_0_0_0_0_0_0_0_s  |        0|    997|  25013|  58814|
    |call_ret5_linear_fu_619           |linear                 |        0|      0|      0|      0|
    |call_ret1_linear_1_fu_551         |linear_1               |        0|      0|      0|      0|
    |call_ret13_linear_2_fu_691        |linear_2               |        0|      0|      0|      0|
    |call_ret9_linear_3_fu_655         |linear_3               |        0|      0|      0|      0|
    |grp_normalize_0_0_0_0_0_1_fu_249  |normalize_0_0_0_0_0_1  |        0|     63|   4788|   4553|
    |grp_normalize_0_0_0_0_0_2_fu_542  |normalize_0_0_0_0_0_2  |        0|      5|    374|    360|
    |grp_normalize_0_0_0_0_0_3_fu_366  |normalize_0_0_0_0_0_3  |        0|     31|   2390|   2301|
    |grp_normalize_0_0_0_0_0_s_fu_330  |normalize_0_0_0_0_0_s  |        0|     31|   2390|   2301|
    |call_ret7_relu_fu_470             |relu                   |        0|      0|      0|    896|
    |call_ret3_relu_1_fu_402           |relu_1                 |        0|      0|      0|   1792|
    |call_ret11_relu_2_fu_506          |relu_2                 |        0|      0|      0|    896|
    |grp_softmax_fu_317                |softmax                |       13|      0|   1067|   3991|
    +----------------------------------+-----------------------+---------+-------+-------+-------+
    |Total                             |                       |       13|   2530|  68216| 139310|
    +----------------------------------+-----------------------+---------+-------+-------+-------+

After:

+ Latency (clock cycles): 
    * Summary: 
    +-----+-----+-----+-----+----------+
    |  Latency  |  Interval | Pipeline |
    | min | max | min | max |   Type   |
    +-----+-----+-----+-----+----------+
    |   30|   30|    1|    1| function |
    +-----+-----+-----+-----+----------+

    + Detail: 
        * Instance: 
        +----------------------------------+-----------------------+-----+-----+-----+-----+----------+
        |                                  |                       |  Latency  |  Interval | Pipeline |
        |             Instance             |         Module        | min | max | min | max |   Type   |
        +----------------------------------+-----------------------+-----+-----+-----+-----+----------+
        |grp_dense_0_0_0_0_0_0_0_s_fu_103  |dense_0_0_0_0_0_0_0_s  |    2|    2|    1|    1| function |
        |grp_dense_0_0_0_0_0_0_0_2_fu_171  |dense_0_0_0_0_0_0_0_2  |    2|    2|    1|    1| function |
        |grp_dense_0_0_0_0_0_0_0_1_fu_207  |dense_0_0_0_0_0_0_0_1  |    2|    2|    1|    1| function |
        |grp_dense_0_0_0_0_0_0_fu_213      |dense_0_0_0_0_0_0      |    2|    2|    1|    1| function |
        |grp_softmax_fu_249                |softmax                |    6|    6|    1|    1| function |
        |grp_normalize_0_0_0_0_0_1_fu_262  |normalize_0_0_0_0_0_1  |    1|    1|    1|    1| function |
        |grp_normalize_0_0_0_0_0_s_fu_330  |normalize_0_0_0_0_0_s  |    1|    1|    1|    1| function |
        |grp_normalize_0_0_0_0_0_3_fu_366  |normalize_0_0_0_0_0_3  |    1|    1|    1|    1| function |
        |call_ret3_relu_1_fu_402           |relu_1                 |    0|    0|    1|    1| function |
        |call_ret7_relu_fu_470             |relu                   |    0|    0|    1|    1| function |
        |call_ret11_relu_2_fu_506          |relu_2                 |    0|    0|    1|    1| function |
        |grp_normalize_0_0_0_0_0_2_fu_542  |normalize_0_0_0_0_0_2  |    1|    1|    1|    1| function |
        |call_ret1_linear_1_fu_551         |linear_1               |    0|    0|    1|    1| function |
        |call_ret5_linear_fu_619           |linear                 |    0|    0|    1|    1| function |
        |call_ret9_linear_3_fu_655         |linear_3               |    0|    0|    1|    1| function |
        |call_ret13_linear_2_fu_691        |linear_2               |    0|    0|    1|    1| function |
        +----------------------------------+-----------------------+-----+-----+-----+-----+----------+

        * Loop: 
        N/A



================================================================
== Utilization Estimates
================================================================
* Summary: 
+---------------------+---------+-------+---------+--------+
|         Name        | BRAM_18K| DSP48E|    FF   |   LUT  |
+---------------------+---------+-------+---------+--------+
|DSP                  |        -|      -|        -|       -|
|Expression           |        -|      -|        0|       6|
|FIFO                 |        -|      -|        -|       -|
|Instance             |       13|   2530|    61700|  134713|
|Memory               |        -|      -|        -|       -|
|Multiplexer          |        -|      -|        -|      36|
|Register             |        -|      -|     4545|       -|
+---------------------+---------+-------+---------+--------+
|Total                |       13|   2530|    66245|  134755|
+---------------------+---------+-------+---------+--------+
|Available SLR        |     2160|   2760|   663360|  331680|
+---------------------+---------+-------+---------+--------+
|Utilization SLR (%)  |    ~0   |     91|        9|      40|
+---------------------+---------+-------+---------+--------+
|Available            |     4320|   5520|  1326720|  663360|
+---------------------+---------+-------+---------+--------+
|Utilization (%)      |    ~0   |     45|        4|      20|
+---------------------+---------+-------+---------+--------+

+ Detail: 
    * Instance: 
    +----------------------------------+-----------------------+---------+-------+-------+-------+
    |             Instance             |         Module        | BRAM_18K| DSP48E|   FF  |  LUT  |
    +----------------------------------+-----------------------+---------+-------+-------+-------+
    |grp_dense_0_0_0_0_0_0_fu_213      |dense_0_0_0_0_0_0      |        0|    113|   3017|   4862|
    |grp_dense_0_0_0_0_0_0_0_1_fu_207  |dense_0_0_0_0_0_0_0_1  |        0|    693|  14857|  28394|
    |grp_dense_0_0_0_0_0_0_0_2_fu_171  |dense_0_0_0_0_0_0_0_2  |        0|    597|  14320|  30150|
    |grp_dense_0_0_0_0_0_0_0_s_fu_103  |dense_0_0_0_0_0_0_0_s  |        0|    997|  25013|  58814|
    |call_ret5_linear_fu_619           |linear                 |        0|      0|      0|      0|
    |call_ret1_linear_1_fu_551         |linear_1               |        0|      0|      0|      0|
    |call_ret13_linear_2_fu_691        |linear_2               |        0|      0|      0|      0|
    |call_ret9_linear_3_fu_655         |linear_3               |        0|      0|      0|      0|
    |grp_normalize_0_0_0_0_0_1_fu_262  |normalize_0_0_0_0_0_1  |        0|     63|   1654|   2367|
    |grp_normalize_0_0_0_0_0_2_fu_542  |normalize_0_0_0_0_0_2  |        0|      5|    128|    185|
    |grp_normalize_0_0_0_0_0_3_fu_366  |normalize_0_0_0_0_0_3  |        0|     31|    822|   1183|
    |grp_normalize_0_0_0_0_0_s_fu_330  |normalize_0_0_0_0_0_s  |        0|     31|    822|   1183|
    |call_ret7_relu_fu_470             |relu                   |        0|      0|      0|    896|
    |call_ret3_relu_1_fu_402           |relu_1                 |        0|      0|      0|   1792|
    |call_ret11_relu_2_fu_506          |relu_2                 |        0|      0|      0|    896|
    |grp_softmax_fu_249                |softmax                |       13|      0|   1067|   3991|
    +----------------------------------+-----------------------+---------+-------+-------+-------+
    |Total                             |                       |       13|   2530|  61700| 134713|
    +----------------------------------+-----------------------+---------+-------+-------+-------+
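
To put the two reports side by side, the short Python sketch below computes the savings from the totals shown above (latency from the Latency summary; DSP48E, FF and LUT from the "Total" row of the Utilization summary). The dictionary keys and the `reduction` helper are names made up for this illustration.

```python
# Totals copied from the "Before" and "After" reports above.
before = {"latency_cycles": 37, "DSP48E": 2530, "FF": 72688, "LUT": 139352}
after = {"latency_cycles": 30, "DSP48E": 2530, "FF": 66245, "LUT": 134755}


def reduction(before, after):
    """Return {metric: (absolute saving, saving in percent of 'before')}."""
    result = {}
    for name, old in before.items():
        delta = old - after[name]
        result[name] = (delta, 100.0 * delta / old)
    return result


savings = reduction(before, after)
for name, (delta, pct) in savings.items():
    print(f"{name}: saved {delta} ({pct:.1f}%)")
```

In the instance tables, the change is confined to the normalize instances: each drops from 2 to 1 cycle of latency and uses fewer FFs and LUTs, while DSP48E usage is unchanged.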
