...29   7.8   0.12
A5      259   3.9   0.12
A6      246   4.1   0.13
A7      492   2.0   0.13
A8      140   7.1   0...

Future Internet 2021, 13, 16 of 120

[Figure 9: plot with number of cores on the vertical axis and frames per second (FPS) on the horizontal axis. Point labels: A1 – (13,8), A2 – (13,4), A3 – (13,2), A4 – (8,8), A5 – (8,4), A6 – (4,8), A7 – (4,4), A8 – (13,4).]

Figure 9. The number of cores versus frames per second of each configuration on the architecture. The graph labels each configuration as (number of lines of cores, number of columns of cores).

Table 9 presents the Tiny-YOLOv3 network execution times on multiple platforms: Intel i7-8700 @ 3.2 GHz, GPU RTX 2080ti, and the embedded GPUs Jetson TX2 and Jetson Nano. The CPU and GPU results were obtained using the original Tiny-YOLOv3 network [42] with floating-point representation. The CPU result corresponds to the execution of Tiny-YOLOv3 implemented in C. The GPU result was obtained from the execution of Tiny-YOLOv3 in the PyTorch environment using CUDA libraries.

Table 9. Tiny-YOLOv3 execution times on multiple platforms.

Software Version   Platform                        CNN (ms)   FPS
Floating-point     CPU (Intel i7-8700 @ 3.2 GHz)   819.2      1.2
Floating-point     GPU (RTX 2080ti)                7.5        65.0
Floating-point     eGPU (Jetson TX2) [43]          -          17
Floating-point     eGPU (Jetson Nano) [43]         -          1.2
Fixed-point-16     ZYNQ7020                        140        7.1
Fixed-point-8      ZYNQ7020                        68         14.7

Tiny-YOLOv3 on desktop CPUs is too slow. The inference time on an RTX 2080ti GPU showed a 109X speedup versus the desktop CPU. Using the proposed accelerator, the inference times were 140 and 68 ms on the ZYNQ7020. The low-cost FPGA was 6X (16-bit) and 12X (8-bit) faster than the CPU, with a small drop in accuracy of 1.4 and 2.1 points, respectively. Compared to the embedded GPU, the proposed architecture was 15% slower. The advantage of using the FPGA is the energy consumption.
The Jetson TX2 has a power close to 15 W, while the proposed accelerator has a power of about 0.5 W. The Nvidia Jetson Nano consumes a maximum of 10 W but is about 12X slower than the proposed architecture.

5.3. Comparison with Other FPGA Implementations

The proposed implementation was compared with previous accelerators of Tiny-YOLOv3. We report the quantization, the operating frequency, the occupation of FPGA resources (DSPs, LUTs, and BRAMs), and two performance metrics (execution time and frames per second). In addition, we considered three metrics to quantify how efficiently the hardware resources were being used. Since different solutions typically have a different number of resources, it is fair to normalize the results before comparison. FPS/kLUT, FPS/DSP, and FPS/BRAM relate the frame rate to the amount of each resource used to produce it. The higher these values, the higher the utilization efficiency of these resources (see Table 10).

Table 10. Performance comparison with other FPGA implementations.

              [38]              [39]      [41]           [40]        Ours      Ours
Device        ZYNQ ZU9EG        ZYNQ7020  Virtex VX485T  US XCKU040  ZYNQ7020  ZYNQ7020
Dataset       Pedestrian signs  COCO      COCO           COCO        COCO      COCO
Quant.        8                 16        18             16          16        8
Freq. (MHz)   -                 100       200            143         100       100
DSPs          -                 120       2304           832         208       208
LUTs          -                 26 K      49 K           139 K       27.5 K    33.4 K
BRAMs         -                 93        70             384         120       120
Exec. (ms)    9.6               532.0     -              24.4        140       68
FPS           104               1.9       -              32          7.1       14.7
FPS/kLUT      -                 0.07      -              0.23        0.26      0.44
FPS/DSP       -                 0.016     -              0.038       0.034     0.068
FPS/BRAM      -                 0.020     -              0...        0...      0...

The implementation in [39] is the only previous implementation using a Zynq 7020 SoC FPGA. This device has considerably fewer resources than the devices used in the other works. Our architecture implemented on the same device was 3.7X and 7.4X faster, depend...
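
The normalized metrics of Section 5.3 are plain ratios, so they can be recomputed from the frame rates and resource counts reported in Table 10. The following is a minimal Python sketch, not code from the paper: the `Implementation` class, the `normalized_metrics` helper, and the `ROWS` list are hypothetical names introduced here for illustration, and the numbers are the Zynq 7020 figures as they appear in Table 10.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Implementation:
    """One column of Table 10 (hypothetical container, not from the paper)."""
    name: str
    fps: float   # reported frames per second
    luts: int    # LUTs used
    dsps: int    # DSP blocks used
    brams: int   # BRAM blocks used


def normalized_metrics(impl: Implementation) -> dict:
    """Return frames per second per thousand LUTs, per DSP, and per BRAM."""
    return {
        "FPS/kLUT": impl.fps / (impl.luts / 1000),
        "FPS/DSP": impl.fps / impl.dsps,
        "FPS/BRAM": impl.fps / impl.brams,
    }


# Zynq 7020 rows, with values copied from Table 10.
ROWS = [
    Implementation("[39], 16-bit", fps=1.9, luts=26_000, dsps=120, brams=93),
    Implementation("Ours, 16-bit", fps=7.1, luts=27_500, dsps=208, brams=120),
    Implementation("Ours, 8-bit", fps=14.7, luts=33_400, dsps=208, brams=120),
]

if __name__ == "__main__":
    for impl in ROWS:
        metrics = normalized_metrics(impl)
        line = ", ".join(f"{key} = {value:.3f}" for key, value in metrics.items())
        print(f"{impl.name}: {line}")
```

Ratios recomputed this way can differ from the published table in the last digit, because the frame rates and LUT counts used as inputs are themselves rounded.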

Author: cot- tpi2
