Optimized BatchNorm
Created by: vloncar
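Optimizes the computation of BatchNorm by folding the constant parameters into a single scale and bias. Instead of evaluating:
y = gamma * ((x - mean) / sqrt(var + eps)) + beta
compute the constants:
scale = gamma / sqrt(var + eps)
bias = beta - gamma * mean / sqrt(var + eps)
and evaluate:
y = scale * x + bias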
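As a minimal sketch of this folding (plain Python for illustration, not the HLS code in this change; the function names `batchnorm_reference`, `fold_batchnorm`, and `batchnorm_folded` and the parameter values are hypothetical), the two forms can be checked against each other:

```python
import math

def batchnorm_reference(x, gamma, beta, mean, var, eps=1e-3):
    """Original form: normalize the input, then scale and shift."""
    return gamma * ((x - mean) / math.sqrt(var + eps)) + beta

def fold_batchnorm(gamma, beta, mean, var, eps=1e-3):
    """Fold the constant parameters into one scale and one bias (hypothetical helper)."""
    scale = gamma / math.sqrt(var + eps)
    bias = beta - gamma * mean / math.sqrt(var + eps)
    return scale, bias

def batchnorm_folded(x, scale, bias):
    """Optimized form: one multiply and one add per input."""
    return scale * x + bias

# Illustrative parameter values, not taken from the 3layer-bn model.
gamma, beta, mean, var = 1.5, 0.25, 0.4, 2.0
scale, bias = fold_batchnorm(gamma, beta, mean, var)

# Both forms should agree up to floating-point rounding.
for x in (-1.0, 0.0, 0.7, 3.2):
    ref = batchnorm_reference(x, gamma, beta, mean, var)
    opt = batchnorm_folded(x, scale, bias)
    assert math.isclose(ref, opt, rel_tol=1e-9, abs_tol=1e-12)
```

Since gamma, beta, mean, var, and eps do not depend on x, scale and bias need to be computed only once, leaving one multiplication and one addition per input.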
For the 3layer-bn model:
Before:
+ Latency (clock cycles):
* Summary:
+-----+-----+-----+-----+----------+
|  Latency  |  Interval | Pipeline |
| min | max | min | max |   Type   |
+-----+-----+-----+-----+----------+
|   37|   37|    1|    1| function |
+-----+-----+-----+-----+----------+
+ Detail:
* Instance:
+----------------------------------+-----------------------+-----+-----+-----+-----+----------+
|                                  |                       |  Latency  |  Interval | Pipeline |
|             Instance             |         Module        | min | max | min | max |   Type   |
+----------------------------------+-----------------------+-----+-----+-----+-----+----------+
|grp_dense_0_0_0_0_0_0_0_s_fu_103  |dense_0_0_0_0_0_0_0_s  |    2|    2|    1|    1| function |
|grp_dense_0_0_0_0_0_0_0_2_fu_171  |dense_0_0_0_0_0_0_0_2  |    2|    2|    1|    1| function |
|grp_dense_0_0_0_0_0_0_0_1_fu_207  |dense_0_0_0_0_0_0_0_1  |    2|    2|    1|    1| function |
|grp_dense_0_0_0_0_0_0_fu_213      |dense_0_0_0_0_0_0      |    2|    2|    1|    1| function |
|grp_normalize_0_0_0_0_0_1_fu_249  |normalize_0_0_0_0_0_1  |    2|    2|    1|    1| function |
|grp_softmax_fu_317                |softmax                |    6|    6|    1|    1| function |
|grp_normalize_0_0_0_0_0_s_fu_330  |normalize_0_0_0_0_0_s  |    2|    2|    1|    1| function |
|grp_normalize_0_0_0_0_0_3_fu_366  |normalize_0_0_0_0_0_3  |    2|    2|    1|    1| function |
|call_ret3_relu_1_fu_402           |relu_1                 |    0|    0|    1|    1| function |
|call_ret7_relu_fu_470             |relu                   |    0|    0|    1|    1| function |
|call_ret11_relu_2_fu_506          |relu_2                 |    0|    0|    1|    1| function |
|grp_normalize_0_0_0_0_0_2_fu_542  |normalize_0_0_0_0_0_2  |    2|    2|    1|    1| function |
|call_ret1_linear_1_fu_551         |linear_1               |    0|    0|    1|    1| function |
|call_ret5_linear_fu_619           |linear                 |    0|    0|    1|    1| function |
|call_ret9_linear_3_fu_655         |linear_3               |    0|    0|    1|    1| function |
|call_ret13_linear_2_fu_691        |linear_2               |    0|    0|    1|    1| function |
+----------------------------------+-----------------------+-----+-----+-----+-----+----------+
* Loop:
N/A
================================================================
== Utilization Estimates
================================================================
* Summary:
+---------------------+---------+-------+---------+--------+
| Name | BRAM_18K| DSP48E| FF | LUT |
+---------------------+---------+-------+---------+--------+
|DSP | -| -| -| -|
|Expression | -| -| 0| 6|
|FIFO | -| -| -| -|
|Instance | 13| 2530| 68216| 139310|
|Memory | -| -| -| -|
|Multiplexer | -| -| -| 36|
|Register | -| -| 4472| -|
+---------------------+---------+-------+---------+--------+
|Total | 13| 2530| 72688| 139352|
+---------------------+---------+-------+---------+--------+
|Available SLR | 2160| 2760| 663360| 331680|
+---------------------+---------+-------+---------+--------+
|Utilization SLR (%) | ~0 | 91| 10| 42|
+---------------------+---------+-------+---------+--------+
|Available | 4320| 5520| 1326720| 663360|
+---------------------+---------+-------+---------+--------+
|Utilization (%) | ~0 | 45| 5| 21|
+---------------------+---------+-------+---------+--------+
+ Detail:
* Instance:
+----------------------------------+-----------------------+---------+-------+-------+-------+
| Instance | Module | BRAM_18K| DSP48E| FF | LUT |
+----------------------------------+-----------------------+---------+-------+-------+-------+
|grp_dense_0_0_0_0_0_0_fu_213 |dense_0_0_0_0_0_0 | 0| 113| 3017| 4862|
|grp_dense_0_0_0_0_0_0_0_1_fu_207 |dense_0_0_0_0_0_0_0_1 | 0| 693| 14857| 28394|
|grp_dense_0_0_0_0_0_0_0_2_fu_171 |dense_0_0_0_0_0_0_0_2 | 0| 597| 14320| 30150|
|grp_dense_0_0_0_0_0_0_0_s_fu_103 |dense_0_0_0_0_0_0_0_s | 0| 997| 25013| 58814|
|call_ret5_linear_fu_619 |linear | 0| 0| 0| 0|
|call_ret1_linear_1_fu_551 |linear_1 | 0| 0| 0| 0|
|call_ret13_linear_2_fu_691 |linear_2 | 0| 0| 0| 0|
|call_ret9_linear_3_fu_655 |linear_3 | 0| 0| 0| 0|
|grp_normalize_0_0_0_0_0_1_fu_249 |normalize_0_0_0_0_0_1 | 0| 63| 4788| 4553|
|grp_normalize_0_0_0_0_0_2_fu_542 |normalize_0_0_0_0_0_2 | 0| 5| 374| 360|
|grp_normalize_0_0_0_0_0_3_fu_366 |normalize_0_0_0_0_0_3 | 0| 31| 2390| 2301|
|grp_normalize_0_0_0_0_0_s_fu_330 |normalize_0_0_0_0_0_s | 0| 31| 2390| 2301|
|call_ret7_relu_fu_470 |relu | 0| 0| 0| 896|
|call_ret3_relu_1_fu_402 |relu_1 | 0| 0| 0| 1792|
|call_ret11_relu_2_fu_506 |relu_2 | 0| 0| 0| 896|
|grp_softmax_fu_317 |softmax | 13| 0| 1067| 3991|
+----------------------------------+-----------------------+---------+-------+-------+-------+
|Total | | 13| 2530| 68216| 139310|
+----------------------------------+-----------------------+---------+-------+-------+-------+
After:
+ Latency (clock cycles):
* Summary:
+-----+-----+-----+-----+----------+
|  Latency  |  Interval | Pipeline |
| min | max | min | max |   Type   |
+-----+-----+-----+-----+----------+
|   30|   30|    1|    1| function |
+-----+-----+-----+-----+----------+
+ Detail:
* Instance:
+----------------------------------+-----------------------+-----+-----+-----+-----+----------+
|                                  |                       |  Latency  |  Interval | Pipeline |
|             Instance             |         Module        | min | max | min | max |   Type   |
+----------------------------------+-----------------------+-----+-----+-----+-----+----------+
|grp_dense_0_0_0_0_0_0_0_s_fu_103  |dense_0_0_0_0_0_0_0_s  |    2|    2|    1|    1| function |
|grp_dense_0_0_0_0_0_0_0_2_fu_171  |dense_0_0_0_0_0_0_0_2  |    2|    2|    1|    1| function |
|grp_dense_0_0_0_0_0_0_0_1_fu_207  |dense_0_0_0_0_0_0_0_1  |    2|    2|    1|    1| function |
|grp_dense_0_0_0_0_0_0_fu_213      |dense_0_0_0_0_0_0      |    2|    2|    1|    1| function |
|grp_softmax_fu_249                |softmax                |    6|    6|    1|    1| function |
|grp_normalize_0_0_0_0_0_1_fu_262  |normalize_0_0_0_0_0_1  |    1|    1|    1|    1| function |
|grp_normalize_0_0_0_0_0_s_fu_330  |normalize_0_0_0_0_0_s  |    1|    1|    1|    1| function |
|grp_normalize_0_0_0_0_0_3_fu_366  |normalize_0_0_0_0_0_3  |    1|    1|    1|    1| function |
|call_ret3_relu_1_fu_402           |relu_1                 |    0|    0|    1|    1| function |
|call_ret7_relu_fu_470             |relu                   |    0|    0|    1|    1| function |
|call_ret11_relu_2_fu_506          |relu_2                 |    0|    0|    1|    1| function |
|grp_normalize_0_0_0_0_0_2_fu_542  |normalize_0_0_0_0_0_2  |    1|    1|    1|    1| function |
|call_ret1_linear_1_fu_551         |linear_1               |    0|    0|    1|    1| function |
|call_ret5_linear_fu_619           |linear                 |    0|    0|    1|    1| function |
|call_ret9_linear_3_fu_655         |linear_3               |    0|    0|    1|    1| function |
|call_ret13_linear_2_fu_691        |linear_2               |    0|    0|    1|    1| function |
+----------------------------------+-----------------------+-----+-----+-----+-----+----------+
* Loop:
N/A
================================================================
== Utilization Estimates
================================================================
* Summary:
+---------------------+---------+-------+---------+--------+
| Name | BRAM_18K| DSP48E| FF | LUT |
+---------------------+---------+-------+---------+--------+
|DSP | -| -| -| -|
|Expression | -| -| 0| 6|
|FIFO | -| -| -| -|
|Instance | 13| 2530| 61700| 134713|
|Memory | -| -| -| -|
|Multiplexer | -| -| -| 36|
|Register | -| -| 4545| -|
+---------------------+---------+-------+---------+--------+
|Total | 13| 2530| 66245| 134755|
+---------------------+---------+-------+---------+--------+
|Available SLR | 2160| 2760| 663360| 331680|
+---------------------+---------+-------+---------+--------+
|Utilization SLR (%) | ~0 | 91| 9| 40|
+---------------------+---------+-------+---------+--------+
|Available | 4320| 5520| 1326720| 663360|
+---------------------+---------+-------+---------+--------+
|Utilization (%) | ~0 | 45| 4| 20|
+---------------------+---------+-------+---------+--------+
+ Detail:
* Instance:
+----------------------------------+-----------------------+---------+-------+-------+-------+
| Instance | Module | BRAM_18K| DSP48E| FF | LUT |
+----------------------------------+-----------------------+---------+-------+-------+-------+
|grp_dense_0_0_0_0_0_0_fu_213 |dense_0_0_0_0_0_0 | 0| 113| 3017| 4862|
|grp_dense_0_0_0_0_0_0_0_1_fu_207 |dense_0_0_0_0_0_0_0_1 | 0| 693| 14857| 28394|
|grp_dense_0_0_0_0_0_0_0_2_fu_171 |dense_0_0_0_0_0_0_0_2 | 0| 597| 14320| 30150|
|grp_dense_0_0_0_0_0_0_0_s_fu_103 |dense_0_0_0_0_0_0_0_s | 0| 997| 25013| 58814|
|call_ret5_linear_fu_619 |linear | 0| 0| 0| 0|
|call_ret1_linear_1_fu_551 |linear_1 | 0| 0| 0| 0|
|call_ret13_linear_2_fu_691 |linear_2 | 0| 0| 0| 0|
|call_ret9_linear_3_fu_655 |linear_3 | 0| 0| 0| 0|
|grp_normalize_0_0_0_0_0_1_fu_262 |normalize_0_0_0_0_0_1 | 0| 63| 1654| 2367|
|grp_normalize_0_0_0_0_0_2_fu_542 |normalize_0_0_0_0_0_2 | 0| 5| 128| 185|
|grp_normalize_0_0_0_0_0_3_fu_366 |normalize_0_0_0_0_0_3 | 0| 31| 822| 1183|
|grp_normalize_0_0_0_0_0_s_fu_330 |normalize_0_0_0_0_0_s | 0| 31| 822| 1183|
|call_ret7_relu_fu_470 |relu | 0| 0| 0| 896|
|call_ret3_relu_1_fu_402 |relu_1 | 0| 0| 0| 1792|
|call_ret11_relu_2_fu_506 |relu_2 | 0| 0| 0| 896|
|grp_softmax_fu_249 |softmax | 13| 0| 1067| 3991|
+----------------------------------+-----------------------+---------+-------+-------+-------+
|Total | | 13| 2530| 61700| 134713|
+----------------------------------+-----------------------+---------+-------+-------+-------+
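
To make the comparison easier to read, the totals of the two reports can be differenced. The following is a small sketch with the numbers copied from the Before and After summaries; the dictionary layout and the `deltas` helper are assumptions for illustration only:

```python
# Totals copied from the "Before" and "After" report summaries.
before = {"latency": 37, "BRAM_18K": 13, "DSP48E": 2530, "FF": 72688, "LUT": 139352}
after = {"latency": 30, "BRAM_18K": 13, "DSP48E": 2530, "FF": 66245, "LUT": 134755}

def deltas(before, after):
    """Return {metric: (absolute change, relative change in percent)}."""
    return {
        key: (after[key] - before[key],
              100.0 * (after[key] - before[key]) / before[key])
        for key in before
    }

# Print one line per metric with the signed change.
for name, (diff, pct) in deltas(before, after).items():
    print(f"{name:9s} {diff:+6d} ({pct:+.1f}%)")
```

Latency drops by 7 cycles (37 to 30), FF by 6443, and LUT by 4597, while DSP48E and BRAM_18K are unchanged. In the instance tables, the FF and LUT savings come from the four normalize instances.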