Tensilica has extended its IP core offering for compute-intensive dataplane and DSP functions such as imaging, video, networking and baseband wired/wireless communications. The Xtensa LX4 dataplane processor (DPU) for SoCs supports wider local data memory bandwidth of up to 1024 bits per cycle, wider VLIW (very long instruction word) instructions up to 128 bits for increased parallel processing, and a cache memory prefetch option that boosts overall performance for systems with long off-chip memory latency.
Tensilica is already using many of these capabilities in its recently introduced ConnX BBE64 DSP for LTE Advanced communications. The Xtensa LX4 DPU has four times the local data memory bandwidth of the Xtensa LX3 DPU, with up to two 512-bit load/store operations per cycle. Designers can now create super-wide SIMD (single instruction multiple data) DSPs that pump more data into more MAC (multiply accumulate) units each clock cycle for extremely fast performance. This makes Xtensa LX4 DPUs suitable for wired and wireless baseband processing, video pre- and post-processing, image signal processing, and various network packet processing functions.
This enhanced local memory bandwidth comes in addition to Tensilica's existing customizable local port and queue interfaces that provide unlimited point-to-point data and control signal bandwidth. Tensilica now offers both the unique Port/Queue interfaces that allow connections between Xtensa DPUs and other system blocks just like traditional RTL block interconnection, and the new ultra-high bandwidth local memory connections.
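As a rough illustration of what the local memory figures above imply, the following Python sketch works through the arithmetic. The load/store width, the operations per cycle and the 4x ratio to the Xtensa LX3 come from the article; the 1 GHz clock applied to a wide configuration and the 16-bit sample size are assumptions made only for this example, not published figures.

```python
# Figures quoted in the article.
LOAD_STORE_OPS_PER_CYCLE = 2   # "up to two 512-bit load/store operations per cycle"
BITS_PER_LOAD_STORE = 512
LX4_TO_LX3_RATIO = 4           # LX4 has "four times" the LX3 local memory bandwidth

# Assumptions for illustration only (not from the article).
ASSUMED_CLOCK_HZ = 1_000_000_000   # hypothetical clock for a wide configuration
ASSUMED_SAMPLE_BITS = 16           # hypothetical fixed-point sample width


def bits_per_cycle(ops=LOAD_STORE_OPS_PER_CYCLE, bits=BITS_PER_LOAD_STORE):
    """Peak local data memory bandwidth in bits per clock cycle."""
    return ops * bits


def bytes_per_second(clock_hz, bits_each_cycle):
    """Peak bandwidth in bytes per second at a given clock frequency."""
    return clock_hz * bits_each_cycle // 8


def samples_per_cycle(bits_each_cycle, sample_bits):
    """How many fixed-width samples fit into one cycle of peak bandwidth."""
    return bits_each_cycle // sample_bits


if __name__ == "__main__":
    lx4_bits = bits_per_cycle()
    print("LX4 peak bits/cycle:", lx4_bits)
    print("Implied LX3 peak bits/cycle:", lx4_bits // LX4_TO_LX3_RATIO)
    print("Bytes/s at assumed clock:", bytes_per_second(ASSUMED_CLOCK_HZ, lx4_bits))
    print("Assumed 16-bit samples/cycle:", samples_per_cycle(lx4_bits, ASSUMED_SAMPLE_BITS))
```

These are peak numbers; sustained throughput in a real design depends on the memory configuration and on how well the code keeps both load/store units busy.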
With Xtensa LX4, Tensilica doubles the maximum width of its Flexible Length Instruction eXtensions (FLIX) instructions from 64 to 128 bits. This allows the execution of twice the number of independent operations per clock cycle. Wide FLIX instructions are intermixed seamlessly with the shorter base Xtensa instruction set, so there is no mode-switch penalty when using FLIX.
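The effect of doubling the number of operations per instruction can be sketched with a simple issue model. The slot counts below are hypothetical, since the article does not say how many operations fit into a 64-bit or 128-bit FLIX instruction, and the model assumes every operation is independent, which real code rarely achieves.

```python
import math

# Hypothetical slot counts, chosen only to show a 2x ratio.
# The article does not give the number of operations per FLIX instruction.
HYPOTHETICAL_SLOTS_64_BIT = 2
HYPOTHETICAL_SLOTS_128_BIT = 4


def issue_cycles(independent_ops, slots_per_instruction):
    """Cycles needed to issue a batch of fully independent operations,
    assuming one bundled instruction issues per cycle."""
    return math.ceil(independent_ops / slots_per_instruction)


if __name__ == "__main__":
    ops = 1000
    print("64-bit bundles: ", issue_cycles(ops, HYPOTHETICAL_SLOTS_64_BIT), "cycles")
    print("128-bit bundles:", issue_cycles(ops, HYPOTHETICAL_SLOTS_128_BIT), "cycles")
```

In this idealized model, twice the slots means half the issue cycles. Data dependencies between operations would reduce that gain in practice.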
With FLIX, the Xtensa LX4 DPU can deliver the ultra-high-performance characteristics of a specialty VLIW processor with smaller code size than competing VLIW DSPs, says the company. Tensilica's Xtensa C/C++ compiler automatically extracts parallelism from source code and bundles multiple operations into single FLIX instructions. An Xtensa LX4 DPU with wide FLIX instructions running parallel operations at a low clock frequency can often match the performance of larger, higher-clocked non-VLIW cores while consuming far less energy to complete the same task.
The new data prefetch option reduces cycle counts in long-latency designs by fetching data from system memory ahead of its use. The data is then ready and waiting when the application code needs it, which cuts the cycles the DPU would otherwise waste waiting for memory. The base Xtensa LX4 DPU can reach speeds of over 1 GHz in 45 nm process technology (45GS) with an area of just 0.044 mm².
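To see why prefetching helps most when memory latency is long, consider the toy cycle model below. It is a generic sketch of overlapping memory fetches with computation, not a description of how the Xtensa LX4 prefetch hardware works; the block counts, latencies and the one-block-ahead policy are all assumptions chosen for illustration.

```python
def cycles_without_prefetch(blocks, compute_cycles, memory_latency):
    """Each block is fetched on demand, so the core stalls for the
    full memory latency before it can compute on that block."""
    return blocks * (memory_latency + compute_cycles)


def cycles_with_prefetch(blocks, compute_cycles, memory_latency):
    """Assumed policy: while computing on one block, the next block is
    already being fetched. Only the first fetch is fully exposed; later
    fetches stall the core only for the part not hidden by computation."""
    if blocks == 0:
        return 0
    exposed_stall = max(0, memory_latency - compute_cycles)
    return memory_latency + blocks * compute_cycles + (blocks - 1) * exposed_stall


if __name__ == "__main__":
    # Hypothetical workload: 100 data blocks, 50-cycle off-chip latency.
    for compute in (80, 20):
        base = cycles_without_prefetch(100, compute, 50)
        pref = cycles_with_prefetch(100, compute, 50)
        print(f"compute={compute}: no prefetch {base} cycles, prefetch {pref} cycles")
```

When computation per block is longer than the memory latency, the model hides the latency almost entirely. When it is shorter, part of each fetch remains exposed, but the total is still lower than fetching on demand.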
Courtesy of EE Times Europe