softmax: Move the HLS PIPELINE command into the outer loop
Created by: ejk43
I ran into a few issues with PIPELINE in the inner loop. Basically, I could not meet timing on the device I was targeting (xc7z020clg484-1) for the keras_3layer example, and the critical path traced back to the exp_diff_res lookup and accumulate. With some empirical testing, I found that moving HLS PIPELINE to the outer loop gives much better results:
- Timing improved from a minimum of 6.6 ns to 4.32 ns
- Softmax latency dropped from 106 to 70 (likely driven by the size of the softmax input)
- Other resource usage stayed about the same. A bit more usage in softmax, but OK overall
This should only impact serial mode.
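
For reviewers who want to see the shape of the change, here is a minimal sketch of the pragma placement. It is not the actual softmax code from this repository: `SOFTMAX_N`, `exp_lookup`, and `softmax_sketch` are hypothetical names, and `std::exp` stands in for the table lookup. The `#pragma HLS PIPELINE` line is ignored by a regular C++ compiler, so the sketch also runs on a host. In Vivado HLS, pipelining the outer loop unrolls the loop nested inside it, which is the likely reason the lookup-and-accumulate chain schedules differently.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical softmax width, chosen only for illustration.
constexpr std::size_t SOFTMAX_N = 4;

// Hypothetical stand-in for the exp table lookup; the real code indexes a table.
inline float exp_lookup(float x) { return std::exp(x); }

// Sketch: res[i] = 1 / sum_j exp(data[j] - data[i]), which equals softmax(data)[i].
std::array<float, SOFTMAX_N> softmax_sketch(const std::array<float, SOFTMAX_N>& data) {
    std::array<float, SOFTMAX_N> res{};
    for (std::size_t i = 0; i < SOFTMAX_N; ++i) {
        // New placement: pipeline the outer loop. HLS unrolls the inner loop.
        #pragma HLS PIPELINE
        float exp_sum = 0.0f;
        for (std::size_t j = 0; j < SOFTMAX_N; ++j) {
            // Old placement: the PIPELINE pragma sat here, inside the inner loop.
            exp_sum += exp_lookup(data[j] - data[i]);
        }
        res[i] = 1.0f / exp_sum;
    }
    return res;
}
```

The arithmetic is the same for either placement; only the schedule that HLS generates changes, so the timing and latency numbers above still need to be confirmed from the synthesis reports.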