The 4th generation of aiWare automotive NPU hardware IP has undergone significant upgrades to both hardware and software, delivering up to 1,000 TOPS with industry-leading efficiency of up to 98% while adding support for the latest AI trends such as transformer networks, FP8, and structured fine-grained sparsity.
The unique 'data-first' scalable hardware architecture combines concepts such as near-memory execution, massively parallel on-chip I/O, hierarchical hardware tiling, and wavefront processing to deliver the highest possible PPA.
Upgraded capabilities for aiWare4+ include:
Upgraded Programmability: significant enhancements to the aiWare hardware architecture and SDK tool portfolio give users full access to every part of aiWare's internal execution pipeline, without compromising the high-level, AI-centric approach that makes tools such as the highly interactive aiWare Studio so popular with both research and production engineers
Full FP8 Support: aiWare4+ adds full support for FP8 alongside INT8 quantization for workload execution
Broader Network Support: SDK upgrades enable users to deliver higher performance not only for CNNs but also for the latest emerging industry trends, such as transformer networks, occupancy networks, and LSTMs. aiWare4+ users also benefit from hardware enhancements that deliver significant performance and efficiency boosts for workloads such as transformer networks
Enhanced Sparsity Support: aiWare4+ hardware upgrades ensure that any weight sparsity reduces NPU power consumption on a per-clock basis, optimizing power consumption across the widest possible range of workloads
Improved Scalability: aiWare4+ is designed to scale from 10 TOPS up to 1,000+ TOPS using a multi-core architecture to increase throughput while retaining high efficiency (subject to external memory bandwidth constraints). Furthermore, aiWare4+ introduces interleaved multi-tasking, which optimizes performance and efficiency when running multiple workloads.
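The FP8 support listed above most likely refers to the 8-bit floating-point formats now common in AI inference; the release does not say which variant aiWare4+ implements, so the sketch below illustrates the widely used E4M3 format (1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits) by enumerating its representable values. All function names here are illustrative and not part of the aiWare SDK.

```python
# Illustrative E4M3 FP8 quantizer (not aiWare SDK code).
# In E4M3 the all-ones exponent/mantissa pattern encodes NaN, so the
# largest finite magnitude is (1 + 6/8) * 2^8 = 448.

def e4m3_finite_values():
    """Enumerate every finite value representable in E4M3."""
    values = set()
    for sign in (1.0, -1.0):
        for exp in range(16):
            for mant in range(8):
                if exp == 15 and mant == 7:
                    continue  # NaN encoding, not a finite value
                if exp == 0:  # subnormals: mant/8 * 2^-6
                    values.add(sign * (mant / 8.0) * 2.0 ** -6)
                else:         # normals: (1 + mant/8) * 2^(exp - 7)
                    values.add(sign * (1.0 + mant / 8.0) * 2.0 ** (exp - 7))
    return sorted(values)

_E4M3 = e4m3_finite_values()

def quantize_e4m3(x):
    """Round x to the nearest representable E4M3 value, saturating at +/-448."""
    x = max(-448.0, min(448.0, x))
    return min(_E4M3, key=lambda v: abs(v - x))

print(quantize_e4m3(0.5))     # 0.5 is exactly representable
print(quantize_e4m3(1.3))     # rounds to the nearest grid point, 1.25
print(quantize_e4m3(1000.0))  # saturates to 448.0
```

Compared with INT8, the floating-point spacing gives fine resolution near zero and coarse resolution near the range limits, which is why FP8 often needs less careful calibration than integer quantization.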
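The "structured fine-grained sparsity" item above is not detailed further in the release. A representative scheme in the industry is the 2:4 pattern, where at most two of every four consecutive weights are nonzero; the sketch below shows such pruning and is an assumption for illustration, not aiWare4+'s documented mechanism.

```python
# Illustrative 2:4 structured fine-grained sparsity pruning.
# (A representative scheme; the release does not specify aiWare4+'s exact pattern.)

def prune_2_of_4(weights):
    """Zero the two smallest-magnitude weights in each group of four."""
    assert len(weights) % 4 == 0, "weight count must be a multiple of 4"
    pruned = []
    for i in range(0, len(weights), 4):
        group = weights[i:i + 4]
        # keep the indices of the two largest-magnitude weights
        keep = sorted(range(4), key=lambda j: abs(group[j]), reverse=True)[:2]
        pruned.extend(w if j in keep else 0.0 for j, w in enumerate(group))
    return pruned

print(prune_2_of_4([0.1, -0.9, 0.05, 0.7]))  # [0.0, -0.9, 0.0, 0.7]
```

Because the zeros fall in a fixed, predictable pattern, hardware can skip the corresponding multiply-accumulates cheaply, which is how weight sparsity translates into the per-clock power savings described above.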
aiMotive's team of AI researchers constantly tracks the latest developments in automotive AI and relentlessly benchmarks our methodologies against the best in the industry. aiWare4+ continues to deliver the automotive industry's highest NPU efficiency of up to 98% across a wide range of AI workloads, enabling superior performance using less silicon and less power.
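The release does not define how the 98% efficiency figure is measured; NPU efficiency is commonly taken to mean sustained throughput as a fraction of theoretical peak throughput, which is the assumed definition in the small sketch below.

```python
# Illustrative NPU utilization-efficiency arithmetic (assumed definition:
# achieved throughput / peak theoretical throughput; the release gives no formula).

def npu_efficiency(achieved_tops, peak_tops):
    """Fraction of the NPU's theoretical peak actually sustained by a workload."""
    return achieved_tops / peak_tops

# Under this definition, a 1,000 TOPS configuration running at 98% efficiency
# would sustain about 980 effective TOPS on a well-matched workload.
print(npu_efficiency(980.0, 1000.0))  # 0.98
```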
'When we delivered aiWare4, we knew our highly customized hardware architecture enabled us to deliver superior efficiency and PPA compared to any other automotive inference NPU on the market,' says
aiMotive will ship aiWare4+ RTL to lead customers starting in Q2 2023. The SDK already provides early support for the majority of the new features, with a production-quality release available in 2023.
Notes
Note 1: PPA: Power, Performance and Area
Note 2: See aiWare3 benchmarks on
(C) 2022 Electronic News Publishing, source