ARM announces 'Osprey' A9 core as Atom-beater

TechOnline India - September 16, 2009


LONDON — Processor intellectual property licensor ARM Holdings plc (Cambridge, England) has developed two implementations of a dual-core Cortex-A9 processor design that it has called Osprey.

The 40-nm hard-macro processor, capable of achieving 2-GHz clock frequency, is one of the highest performing cores yet developed by ARM. The design appears to be similar to an OMAP-4 chip that Texas Instruments is expected to sample in the fall, which puts two ARM Cortex-A9 cores in the space of a single Intel Atom core (see Samsung, Intrinsity pump ARM to GHz rate).

The Osprey, which is being put up as an Atom-killer at least until Intel turns its manufacturing process crank, comes in the form of hard macros designed for manufacture using the 40G 40-nm manufacturing process technology from Taiwan Semiconductor Manufacturing Co. Ltd. (Hsinchu, Taiwan).

The hard macro has been optimized once for power consumption and once for performance, and in the latter case should take the ARM processor into uncharted territory in terms of competing for performance applications.

"The goal is performance, performance, performance," said Eric Schorn, vice president of marketing for ARM's processor division. "We are into unlocking some new markets; netbooks, smartbooks, MIDs, consumer electronics in TV and entertainment devices, and enterprise networking, such as things like printers."

Osprey is itself a dual-core processor, but there is nothing to stop licensees laying down multiple cores on a die, Schorn said. Although ARM is still waiting to put a full test chip through TSMC, which is set to happen in the fourth quarter, the two designs are already available for licensing today, with IP delivery in 4Q 2009. That should allow customers to produce their own SoCs sometime in 2010.

The speed-optimized implementation is aimed at enterprise servers, networking, printers and other peak-performance applications requiring clock frequencies of 2-GHz and above. It occupies 6.7 square millimeters of silicon die and at 2-GHz delivers 10,000 Dhrystone MIPS while consuming about 1.9 watts.

The power-optimized implementation is aimed at mobile computing devices, smartbooks and other consumer devices requiring clock frequencies from 800-MHz to more than 1-GHz. It occupies 4.9 square millimeters of die and at 800-MHz delivers 4,000 Dhrystone MIPS while consuming 0.5 watts. Both implementations are targeted at the TSMC 40G process, with support for the low-leakage GL process option.
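The figures quoted for the two implementations can be reduced to a few efficiency ratios. The short Python sketch below is an illustration only: the class and variable names are hypothetical, the inputs are the numbers ARM quoted above (the 1.9-watt figure is stated as approximate), and the results describe the dual-core macro as a whole, not a measured chip.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Implementation:
    """Quoted figures for one Osprey hard macro (names here are illustrative)."""
    name: str
    clock_mhz: float   # clock frequency at which the figures were quoted
    dmips: float       # Dhrystone MIPS for the dual-core macro
    power_w: float     # quoted power consumption in watts
    area_mm2: float    # quoted die area in square millimeters

    def dmips_per_mhz(self) -> float:
        return self.dmips / self.clock_mhz

    def dmips_per_watt(self) -> float:
        return self.dmips / self.power_w

    def dmips_per_mm2(self) -> float:
        return self.dmips / self.area_mm2


# Figures as quoted in the article; 1.9 W is given as "about 1.9 watts".
SPEED = Implementation("speed-optimized", 2000, 10_000, 1.9, 6.7)
POWER = Implementation("power-optimized", 800, 4_000, 0.5, 4.9)

if __name__ == "__main__":
    for impl in (SPEED, POWER):
        print(
            f"{impl.name}: "
            f"{impl.dmips_per_mhz():.1f} DMIPS/MHz, "
            f"{impl.dmips_per_watt():.0f} DMIPS/W, "
            f"{impl.dmips_per_mm2():.0f} DMIPS/mm2"
        )
```

Both variants work out to 5 Dhrystone MIPS per MHz for the dual-core macro, so the quoted figures show the power-optimized version trading peak clock rate for a higher ratio of performance to power.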

The design includes a fixed L1 cache of 32-kbytes instruction and 32-kbytes data. There is an L2 cache controller that supports between 128-kbytes and 8-Mbytes.

Schorn claimed that on a like-for-like comparison Osprey comes in at between one quarter and one third the size of Intel's Atom processor, which is built in a similar 40/45-nm process technology. ARM has also run Osprey through the CoreMark benchmark from the Embedded Microprocessor Benchmark Consortium. According to ARM, both implementations will outperform the Intel Atom N270 operating at 1.6-GHz. The power-optimized version will do so at an 800-MHz clock frequency, while the speed-optimized version, albeit at 2-GHz, will do so by a factor of 2.5.

Each core in the dual core design includes a Neon SIMD unit and a floating point unit in support of imaging and multimedia processing. "It is true networking is not a big user of Neon or floating point for that matter. But when you do hard macros you have to make some hard choices. This will have the advantage of being silicon-proven and done and dusted," said Schorn.

It has been some time since ARM introduced such a hard macro, going back to the time of the ARM922 and ARM926. "The ARM926 had a configurable cache and together with increased use of foundry and the foundries themselves offering low-power, general-purpose and performance variants of process nodes, the number of targets increased," said Schorn. "Now as we see the variables decrease, the number of targets reducing, the viability of hard macros is increasing again," he added. "We like to engineer once, license many."

The earliest adopters of the Cortex-A9, ARM's semiconductor partners, have been implementing the processor core in low-power processes, Schorn said. "Many partners are on LP processes so we were not going to duplicate what our partners have done. LP was about wireless. This high performance core provides the other axis with four or five times the Atom power efficiency," said Schorn.

The hard macro does not include a graphics processor but, intriguingly, the test chip being taped out does. "There is a MALI-400 multimedia processor and a MALI-VE video engine on the dual-Osprey test chip," said Schorn.

Similarly the Osprey core does not include the Fast14 technology from Intrinsity Inc. (Austin, Texas) used to pump the Samsung implementation of the Cortex-A8 processor above 1-GHz clock frequency. "The Intrinsity technology is pretty fantastic. It has been applied to Cortex-A8 but it's not something we are offering here. It's certainly not ruled out in the future."

Osprey does include the clock gating and low-power design techniques used in other ARM low-power processor designs. Major processing units consume no power if there is no instruction in the pipeline, and the design comprises six independent power domains to manage leakage power when performance is not required. The integer pipeline can be turned off with SRAM retention to allow immediate reload, and the cache snoop unit and L2 cache controller unit are also independently controlled.

Overall Schorn concluded: "This is a big departure from what we've done in the past. It complements our partners and can expand application of the ARM architecture."


