Quantized Neural Networks for FPGA Inference

Low-precision quantization of neural networks helps meet AI application requirements by delivering greater throughput within the same FPGA footprint, or by reducing resource usage for the same throughput. Block floating point (BFP) is particularly well suited to this scenario: because each block of values shares a single exponent while keeping individual integer mantissas, it preserves a wide dynamic range even at low mantissa precision, helping maintain accuracy. Any residual drop in accuracy can be recovered by retraining using our open source software.
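
To make the BFP idea concrete, the sketch below quantizes a block of weights using one shared exponent and signed integer mantissas. This is a minimal illustration, not the production implementation; the block size, the 8-bit mantissa width, and the NumPy-based dequantization step are assumptions chosen for clarity.

```python
import numpy as np

def bfp_quantize(block: np.ndarray, mantissa_bits: int = 8) -> np.ndarray:
    """Quantize a block of floats to block floating point:
    one shared exponent per block, signed integer mantissas."""
    max_abs = np.max(np.abs(block))
    if max_abs == 0:
        return np.zeros_like(block)
    # Shared exponent chosen so the largest magnitude fits the mantissa range.
    shared_exp = np.floor(np.log2(max_abs)) - (mantissa_bits - 2)
    scale = 2.0 ** shared_exp
    # Round each value to an integer mantissa and clip to the representable range.
    qmax = 2 ** (mantissa_bits - 1) - 1
    mantissas = np.clip(np.round(block / scale), -qmax - 1, qmax)
    # Dequantize here for inspection; on hardware the integer mantissas and
    # the shared exponent would be stored and processed directly.
    return mantissas * scale

# Example: quantize a block of 16 weights with 8-bit mantissas.
weights = np.random.randn(16).astype(np.float32)
print(bfp_quantize(weights, mantissa_bits=8))
```

Because the exponent is shared across the block, the per-value storage and arithmetic reduce to narrow integer operations, which map efficiently onto FPGA DSP and logic resources while the block-level exponent keeps the representable range close to that of floating point.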