Embedding layer
Original implementation from @drewpersigehl: https://github.com/drewpersigehl/L1MET_EmbeddingLayer
Addresses #422 (closed)
Simple test script: https://gist.github.com/jmduarte/302c1d629747d86b926fa13b485a1682
- `io_parallel` implementation
- `io_stream` implementation
- add tests
A few peculiarities to note (not yet enforced):
- `weight_t` should always be equal to `res_t`
- `data_t` should always be `[u]int` or `ap_[u]int<X>`
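To illustrate why these two constraints are natural, here is a minimal sketch of an embedding lookup in plain C++. It is not the code from this PR: the function name `embedding_lookup`, the flat row-major weight layout, and the use of standard types in place of `ap_[u]int<X>` / `ap_fixed` are all assumptions made for illustration. Each input is used directly as a row index (hence an integer `data_t`), and each output is a plain copy of a stored weight with no arithmetic (hence `weight_t` equal to `res_t`).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for an embedding lookup; not the hls4ml implementation.
//   data_t: integer index type (stand-in for [u]int or ap_[u]int<X>)
//   res_t : output type, also used for the weights (weight_t == res_t)
// Assumes every index in `data` is smaller than VOCAB_SIZE (not checked here).
template <typename data_t, typename res_t,
          std::size_t N_IN, std::size_t VOCAB_SIZE, std::size_t N_OUT>
void embedding_lookup(const std::array<data_t, N_IN> &data,
                      const std::array<res_t, VOCAB_SIZE * N_OUT> &weights,
                      std::array<res_t, N_IN * N_OUT> &res) {
    for (std::size_t i = 0; i < N_IN; ++i) {
        // The input value selects a row of the weight table.
        const std::size_t row = static_cast<std::size_t>(data[i]);
        for (std::size_t j = 0; j < N_OUT; ++j) {
            // Pure copy: no multiply or accumulate, so no type conversion is needed.
            res[i * N_OUT + j] = weights[row * N_OUT + j];
        }
    }
}
```

Because `VOCAB_SIZE * N_OUT` cannot be deduced, the template arguments have to be spelled out at the call site, e.g. `embedding_lookup<std::uint8_t, float, 3, 3, 2>(data, weights, res);`.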
Example HLS latency/resources for `n_in = 100`, `vocab_size = 13`, `n_out = 8`, `ap_uint<4>` for inputs, and `ap_fixed<16,6>` for outputs.
- `io_parallel`:
+---------+---------+----------+----------+-----+-----+----------+
|  Latency (cycles) |  Latency (absolute) |  Interval | Pipeline |
|   min   |   max   |    min   |    max   | min | max |   Type   |
+---------+---------+----------+----------+-----+-----+----------+
|        1|        1| 5.000 ns | 5.000 ns |    1|    1| function |
+---------+---------+----------+----------+-----+-----+----------+
+---------------------+---------+-------+---------+---------+-----+
|         Name        | BRAM_18K| DSP48E|    FF   |   LUT   | URAM|
+---------------------+---------+-------+---------+---------+-----+
|DSP                  |        -|      -|        -|        -|    -|
|Expression           |        -|      -|        0|        6|    -|
|FIFO                 |        -|      -|        -|        -|    -|
|Instance             |        0|      -|     5602|     4918|    -|
|Memory               |        -|      -|        -|        -|    -|
|Multiplexer          |        -|      -|        -|       45|    -|
|Register             |        -|      -|      403|        -|    -|
+---------------------+---------+-------+---------+---------+-----+
|Total                |        0|      0|     6005|     4969|    0|
+---------------------+---------+-------+---------+---------+-----+
|Available SLR        |     1440|   2280|   788160|   394080|  320|
+---------------------+---------+-------+---------+---------+-----+
|Utilization SLR (%)  |        0|      0|    ~0   |        1|    0|
+---------------------+---------+-------+---------+---------+-----+
|Available            |     4320|   6840|  2364480|  1182240|  960|
+---------------------+---------+-------+---------+---------+-----+
|Utilization (%)      |        0|      0|    ~0   |    ~0   |    0|
+---------------------+---------+-------+---------+---------+-----+
- `io_stream`: Note that the long latency/II is because the input data is put in a stream of size 100, so we have to wait to read in the full data.
+---------+---------+----------+----------+-----+-----+----------+
|  Latency (cycles) |  Latency (absolute) |  Interval | Pipeline |
|   min   |   max   |    min   |    max   | min | max |   Type   |
+---------+---------+----------+----------+-----+-----+----------+
|      101|      101| 0.505 us | 0.505 us |  100|  100| dataflow |
+---------+---------+----------+----------+-----+-----+----------+
+---------------------+---------+-------+---------+---------+-----+
|         Name        | BRAM_18K| DSP48E|    FF   |   LUT   | URAM|
+---------------------+---------+-------+---------+---------+-----+
|DSP                  |        -|      -|        -|        -|    -|
|Expression           |        -|      -|        0|       20|    -|
|FIFO                 |        -|      -|        -|        -|    -|
|Instance             |        0|      -|      557|    13507|    -|
|Memory               |        -|      -|        -|        -|    -|
|Multiplexer          |        -|      -|        -|       36|    -|
|Register             |        -|      -|        6|        -|    -|
+---------------------+---------+-------+---------+---------+-----+
|Total                |        0|      0|      563|    13563|    0|
+---------------------+---------+-------+---------+---------+-----+
|Available SLR        |     1440|   2280|   788160|   394080|  320|
+---------------------+---------+-------+---------+---------+-----+
|Utilization SLR (%)  |        0|      0|    ~0   |        3|    0|
+---------------------+---------+-------+---------+---------+-----+
|Available            |     4320|   6840|  2364480|  1182240|  960|
+---------------------+---------+-------+---------+---------+-----+
|Utilization (%)      |        0|      0|    ~0   |        1|    0|
+---------------------+---------+-------+---------+---------+-----+
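
The `io_stream` numbers can be sanity-checked with a rough software model. The sketch below is not the HLS code from this PR: `embedding_stream` and `latency_ns` are hypothetical helpers, `std::queue` stands in for an HLS stream, and the assumption that one index is consumed per loop iteration (roughly one per clock cycle) is mine. The 5 ns clock period is inferred from the `io_parallel` report, where 1 cycle corresponds to 5.000 ns.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical model of a streamed lookup; std::queue stands in for an HLS stream.
// One index is consumed per loop iteration, so the iteration count equals the
// number of elements in the input stream. Returns the number of stream reads.
// Assumes every index addresses a valid row of `weights` (not checked here).
template <typename data_t, typename res_t>
std::size_t embedding_stream(std::queue<data_t> &in,
                             const std::vector<res_t> &weights,
                             std::size_t n_out,
                             std::queue<res_t> &out) {
    std::size_t reads = 0;
    while (!in.empty()) {
        const std::size_t row = static_cast<std::size_t>(in.front());
        in.pop();
        ++reads;
        // Emit the n_out weights of the selected row.
        for (std::size_t j = 0; j < n_out; ++j) {
            out.push(weights[row * n_out + j]);
        }
    }
    return reads;
}

// Convert a cycle count to nanoseconds for a given clock period.
constexpr double latency_ns(std::size_t cycles, double clock_period_ns) {
    return static_cast<double>(cycles) * clock_period_ns;
}
```

With `n_in = 100` the model performs 100 reads, in line with the reported interval of 100, and `latency_ns(101, 5.0)` gives 505 ns, i.e. the 0.505 us shown in the report.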