Pruning conv1d
Created by: benjaminkreis
I have a branch in progress that lets conv1d take advantage of pruning done in training. @jmduarte and I have been discussing this, and we think there isn't a simple formula for reducing the multiplier limit based on the number of zero weights. It gets pretty complicated because of padding.
So the approach here is to just run the loops once in advance, count the number of multiplications by nonzero weights, and then divide that count by the reuse factor to get the multiplier limit.
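For illustration, here is a minimal Python sketch of that counting pass. It is not the code in the branch: the function names, the `(filt_width, n_chan, n_filt)` weight layout, the assumption that multiplications against padded positions are skipped, and rounding the division up are all assumptions made for the example.

```python
import math

import numpy as np


def count_nonzero_mults(weights, n_in, stride=1, pad_left=0, pad_right=0):
    """Count multiplications by nonzero weights in a 1D convolution.

    Hypothetical helper. Assumes weights has shape (filt_width, n_chan, n_filt)
    and that multiplications against padded input positions are not performed.
    """
    filt_width = weights.shape[0]
    n_out = (n_in + pad_left + pad_right - filt_width) // stride + 1

    # Nonzero weights at each filter tap, summed over channels and filters.
    nonzero_per_tap = np.count_nonzero(weights.reshape(filt_width, -1), axis=1)

    count = 0
    for i_out in range(n_out):
        for k in range(filt_width):
            i_in = i_out * stride + k - pad_left
            # Skip taps that land in the padding region.
            if 0 <= i_in < n_in:
                count += int(nonzero_per_tap[k])
    return count


def multiplier_limit(mult_count, reuse_factor):
    """Divide the count by the reuse factor (rounding up is an assumption)."""
    return math.ceil(mult_count / reuse_factor)
```

With a dense 3x2x4 kernel on 5 inputs and one padded position on each side, only 13 of the 15 taps touch real inputs, which is why a formula based purely on the fraction of zero weights does not work. Zeroing a tap removes a number of multiplications that depends on how often that tap falls in the padding.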
Perhaps the simplest option would be to do this counting pass in separate code (e.g. our Python), but I think it would be nice for nnet_utils to have this feature, so I've added the count here. Unfortunately, it looks like I haven't managed to fully decouple it from the firmware part: I see slightly different resource usage when I use this function compared to just passing the limit in from the outside world. If anyone sees the problem, let me know!
Some test results (Model: KERAS_conv1d_small, Precision: ap_fixed<18,8>, HLS Version: v2017.2):
Default weights:
Reuse | BRAM | DSP | FF | LUT | Lat | II |
---|---|---|---|---|---|---|
1 | 13 | 547 | 47149 | 28161 | 24 | 1 |
2 | 13 | 398 | 47188 | 30769 | 26 | 2 |
3 | 13 | 266 | 47504 | 33031 | 30 | 3 |
Random 20% weights set to zero:
Reuse | BRAM | DSP | FF | LUT | Lat | II |
---|---|---|---|---|---|---|
1 | 13 | 455 | 41276 | 23950 | 24 | 1 |
2 | 13 | 303 | 37926 | 24188 | 26 | 2 |
3 | 13 | 206 | 38074 | 26472 | 30 | 3 |
Random 50% weights set to zero:
Reuse | BRAM | DSP | FF | LUT | Lat | II |
---|---|---|---|---|---|---|
1 | 13 | 279 | 24769 | 15877 | 22 | 1 |
2 | 13 | 200 | 27442 | 16876 | 25 | 2 |
3 | 13 | 99 | 17592 | 11563 | 25 | 3 |
I'm setting the multiplier limit myself, so I think the test here is whether the non-DSP numbers look problematic. I don't see the FF or LUT numbers explode, which would suggest we were doing multiplications in logic. Latency looks alright. II is as expected.
So if people like this approach, I can make a PR. It would be nice to first figure out why the multiplication count seems to be costing some firmware resources, though.