
Embedding layer

Javier Duarte requested to merge embed into master

Original implementation from @drewpersigehl: https://github.com/drewpersigehl/L1MET_EmbeddingLayer

Addresses #422 (closed)

Simple test script: https://gist.github.com/jmduarte/302c1d629747d86b926fa13b485a1682

  • io_parallel implementation
  • io_stream implementation
  • add tests

A few peculiarities to note (not yet enforced):

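  • weight_t should always be equal to res_t (the outputs are copied directly from the embedding weights)
  • data_t should always be [u]int or ap_[u]int<X> (the inputs are indices into the embedding table)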
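
To make the two constraints above concrete, here is a minimal plain C++ sketch of an embedding lookup that enforces them at compile time. It is not the code in this merge request: the function name `embedding_sketch`, the flat row-major table layout, and the use of `std::array` are all assumptions made for illustration, and the integer check only covers built-in integer types (ap_[u]int<X> would need its own check).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for an embedding lookup; not the implementation in this MR.
// Assumes the table is stored row-major: row idx occupies [idx * N_OUT, (idx + 1) * N_OUT).
template <typename data_t, typename res_t, typename weight_t,
          std::size_t N_IN, std::size_t VOCAB_SIZE, std::size_t N_OUT>
void embedding_sketch(const std::array<data_t, N_IN> &data,
                      const std::array<weight_t, VOCAB_SIZE * N_OUT> &embeddings,
                      std::array<res_t, N_IN * N_OUT> &res) {
    // Rows are copied verbatim, so the weight and result types should match.
    static_assert(std::is_same<weight_t, res_t>::value,
                  "weight_t must be equal to res_t");
    // Inputs are table indices. Only built-in integers are checked here.
    static_assert(std::is_integral<data_t>::value,
                  "data_t must be an integer type");

    for (std::size_t j = 0; j < N_IN; ++j) {
        const std::size_t idx = static_cast<std::size_t>(data[j]);
        assert(idx < VOCAB_SIZE); // reject out-of-vocabulary indices
        for (std::size_t i = 0; i < N_OUT; ++i) {
            res[j * N_OUT + i] = embeddings[idx * N_OUT + i];
        }
    }
}
```

Calling it with, say, `embedding_sketch<std::uint8_t, float, float, 3, 3, 2>(data, emb, res)` copies one 2-element row per input index. Passing a `double` table with a `float` result, or a `float` input, fails to compile.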

Example HLS latency and resource estimates for n_in = 100, vocab_size = 13, n_out = 8, with ap_uint<4> inputs and ap_fixed<16,6> outputs:

  • io_parallel:
    +---------+---------+----------+----------+-----+-----+----------+
    |  Latency (cycles) |  Latency (absolute) |  Interval | Pipeline |
    |   min   |   max   |    min   |    max   | min | max |   Type   |
    +---------+---------+----------+----------+-----+-----+----------+
    |        1|        1| 5.000 ns | 5.000 ns |    1|    1| function |
    +---------+---------+----------+----------+-----+-----+----------+
    +---------------------+---------+-------+---------+---------+-----+
    |         Name        | BRAM_18K| DSP48E|    FF   |   LUT   | URAM|
    +---------------------+---------+-------+---------+---------+-----+
    |DSP                  |        -|      -|        -|        -|    -|
    |Expression           |        -|      -|        0|        6|    -|
    |FIFO                 |        -|      -|        -|        -|    -|
    |Instance             |        0|      -|     5602|     4918|    -|
    |Memory               |        -|      -|        -|        -|    -|
    |Multiplexer          |        -|      -|        -|       45|    -|
    |Register             |        -|      -|      403|        -|    -|
    +---------------------+---------+-------+---------+---------+-----+
    |Total                |        0|      0|     6005|     4969|    0|
    +---------------------+---------+-------+---------+---------+-----+
    |Available SLR        |     1440|   2280|   788160|   394080|  320|
    +---------------------+---------+-------+---------+---------+-----+
    |Utilization SLR (%)  |        0|      0|    ~0   |        1|    0|
    +---------------------+---------+-------+---------+---------+-----+
    |Available            |     4320|   6840|  2364480|  1182240|  960|
    +---------------------+---------+-------+---------+---------+-----+
    |Utilization (%)      |        0|      0|    ~0   |    ~0   |    0|
    +---------------------+---------+-------+---------+---------+-----+
  • io_stream: Note that the long latency and II arise because the input data is placed in a stream of size 100, so we have to wait to read in the full data.
    +---------+---------+----------+----------+-----+-----+----------+
    |  Latency (cycles) |  Latency (absolute) |  Interval | Pipeline |
    |   min   |   max   |    min   |    max   | min | max |   Type   |
    +---------+---------+----------+----------+-----+-----+----------+
    |      101|      101| 0.505 us | 0.505 us |  100|  100| dataflow |
    +---------+---------+----------+----------+-----+-----+----------+
    +---------------------+---------+-------+---------+---------+-----+
    |         Name        | BRAM_18K| DSP48E|    FF   |   LUT   | URAM|
    +---------------------+---------+-------+---------+---------+-----+
    |DSP                  |        -|      -|        -|        -|    -|
    |Expression           |        -|      -|        0|       20|    -|
    |FIFO                 |        -|      -|        -|        -|    -|
    |Instance             |        0|      -|      557|    13507|    -|
    |Memory               |        -|      -|        -|        -|    -|
    |Multiplexer          |        -|      -|        -|       36|    -|
    |Register             |        -|      -|        6|        -|    -|
    +---------------------+---------+-------+---------+---------+-----+
    |Total                |        0|      0|      563|    13563|    0|
    +---------------------+---------+-------+---------+---------+-----+
    |Available SLR        |     1440|   2280|   788160|   394080|  320|
    +---------------------+---------+-------+---------+---------+-----+
    |Utilization SLR (%)  |        0|      0|    ~0   |        3|    0|
    +---------------------+---------+-------+---------+---------+-----+
    |Available            |     4320|   6840|  2364480|  1182240|  960|
    +---------------------+---------+-------+---------+---------+-----+
    |Utilization (%)      |        0|      0|    ~0   |        1|    0|
    +---------------------+---------+-------+---------+---------+-----+
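
The io_stream timing reported above is consistent with a very simple model, sketched below. The model is an assumption, not something taken from the HLS report: it supposes one stream element is read per clock cycle and that one further cycle is spent producing the output. The clock period of 5 ns is inferred from the io_parallel table, where 1 cycle corresponds to 5.000 ns. The names `StreamTiming` and `stream_timing` are hypothetical.

```cpp
#include <cstddef>

// Hypothetical timing summary for a stream that delivers one element per cycle.
struct StreamTiming {
    std::size_t latency_cycles;  // cycles until the result is available
    std::size_t interval_cycles; // cycles before the next input set can start
    double latency_ns;           // latency converted to nanoseconds
};

// Assumed model: n_in reads at one per cycle, plus one extra cycle of depth.
StreamTiming stream_timing(std::size_t n_in, double clock_ns) {
    StreamTiming t;
    t.latency_cycles = n_in + 1;
    t.interval_cycles = n_in;
    t.latency_ns = static_cast<double>(t.latency_cycles) * clock_ns;
    return t;
}
```

With `n_in = 100` and a 5 ns clock, this gives 101 cycles of latency, an interval of 100 cycles, and 505 ns, matching the 0.505 us in the table.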
