NAND flash is the dominant type of non-volatile memory technology used today. Developers commonly face difficulties developing and maintaining firmware, middleware and hardware IP for interfacing with raw NAND devices. After reviewing the history and distinguishing features of various memory devices, we’ll take a detailed look at common obstacles to NAND device development and maintenance, particularly for embedded and system-on-chip (SoC) developers, and provide some recommendations for handling these challenges.
Background on non-volatile memory technologies
There are many different non-volatile memory technologies in use today.
Electrically erasable programmable read-only memory, or EEPROM, is one of the oldest forms of technology still in use for user-modifiable, non-volatile memories. In modern usage, EEPROM means any non-volatile memory where individual bytes can be read, erased, or written independently of all other bytes in the memory device. This capability requires more chip area, as each memory cell requires its own read, write, and erase transistor. As a result, the size of EEPROM devices is small (64 Kbytes or less).
EEPROM devices are typically wrapped in a low-pin-count serial interface, such as I2C or SPI. Parallel interfaces are now uncommon because of the larger pin count, footprint, and layout costs they require. Like almost all available non-volatile memory types, EEPROMs use floating gate technology in a complementary metal-oxide-semiconductor (CMOS) process.
Flash memory is a modified form of EEPROM memory in which some operations happen on blocks of memory instead of on individual bytes. This allows higher densities to be achieved, as much of the circuitry surrounding each memory cell is removed and placed around entire blocks of memory cells.
There are two types of flash memory arrays in the marketplace: NOR flash and NAND flash. Though these names are derived from the internal organization and connection of the memory cells, the types have come to signify a particular external interface as well. Both types of memory use floating gates as the storage mechanism, though the operations used to erase and write the cells may be different.
NOR flash was the first version of flash memory. Until about 2005, it was the most popular flash memory type (measured by market revenue). In NOR flash, bytes of memory can be read or written individually, but erasures happen over a large block of bytes. Because of their ability to read and write individual bytes, NOR flash devices aren’t suitable for use with block error correction. Therefore, NOR memory must be robust to errors.
The capability to read individual bytes also means it can act as a random access memory (RAM), and NOR flash devices typically employ an asynchronous parallel memory interface with separate address and data buses. This allows NOR flash devices to be used for storing code that can be directly executed by a processor. NOR flash can also be found wrapped in serial interfaces, where the devices behave much like SPI EEPROM implementations for reading and writing.
The organization and interface of NOR flash devices limit how well they can scale with process shrinks. With a goal of replacing spinning hard disk drives, the inventor of NOR flash later created NAND flash. He aimed to sacrifice some of the speed offered by NOR flash to gain compactness and a lower cost per byte. This goal has largely been met in recent years, with NAND sizes increasing to multiple gigabytes per die while NOR sizes have stagnated at around 128 MB. This has come at a cost, as will be discussed later.
Raw NAND memory is organized into blocks, where each block is further divided into pages.
Figure 1: MT29F2G08AACWP NAND memory organization (courtesy Micron Inc.)
In NAND memories, read and write operations happen on a per-page basis, but erase operations happen per block. Because reads and writes always transfer a whole page at a time, block error correction algorithms are well suited to the data. As a result, NAND manufacturers have built in spare bytes of memory for each page to hold the error correction data and other metadata. NOR flash doesn’t have such spare bytes.
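The page and block hierarchy can be made concrete with a little address arithmetic. The sketch below assumes the geometry of a typical 2 Gbit large-page part (2,048 data bytes plus 64 spare bytes per page, 64 pages per block, 2,048 blocks); the class and function names are ours, and real values should always come from the device datasheet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NandGeometry:
    page_size: int = 2048        # data bytes per page (assumed)
    spare_size: int = 64         # spare bytes per page (assumed)
    pages_per_block: int = 64    # assumed
    num_blocks: int = 2048       # assumed

    @property
    def block_size(self):
        """Data bytes in one erase block."""
        return self.page_size * self.pages_per_block

    @property
    def capacity(self):
        """Total data bytes in the device (spare bytes excluded)."""
        return self.block_size * self.num_blocks


def locate(geom, offset):
    """Split a linear data offset into (block, page within block, column)."""
    if not 0 <= offset < geom.capacity:
        raise ValueError("offset outside the device")
    page_index, column = divmod(offset, geom.page_size)
    block, page = divmod(page_index, geom.pages_per_block)
    return block, page, column
```

With these assumed values, a write touches one 2,048-byte page, while an erase wipes a full 128 KB block of 64 pages.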
Also in contrast to NOR flash, the NAND flash interface isn’t directly addressable, and code cannot be executed from it. The NAND flash has a single bus for sending command and address information as well as for sending and receiving memory contents. Therefore, reading a NAND device requires a software device driver.
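To illustrate what the driver has to do on that shared bus, here is a rough sketch of the cycle sequence that starts a page read. It assumes the widely used large-page READ PAGE command pair (0x00, then 0x30) with two column-address cycles followed by three row-address cycles; the function name and the tuple representation are hypothetical, and the exact sequence for a given part must be taken from its datasheet.

```python
def read_page_cycles(block, page, column=0, pages_per_block=64):
    """Return the (kind, byte) bus cycles that start a page read.

    Assumes a 5-cycle address: 2 column bytes, then 3 row bytes, LSB first.
    """
    row = block * pages_per_block + page            # row address = absolute page number
    cycles = [("CMD", 0x00)]                        # first READ PAGE command cycle
    cycles += [("ADDR", (column >> shift) & 0xFF) for shift in (0, 8)]
    cycles += [("ADDR", (row >> shift) & 0xFF) for shift in (0, 8, 16)]
    cycles.append(("CMD", 0x30))                    # second command cycle starts the array read
    return cycles
```

After issuing these cycles, a driver would typically wait for the device to report ready and then clock the page data out over the same bus.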
NAND flash is the underlying memory type for USB memory sticks, memory cards (e.g. SD cards and compact flash cards) and solid state hard drives. In all cases, the raw NAND flash devices are coupled with a controller that translates between the defined interface (e.g. USB, SD and SATA) and the NAND’s own interface. In addition, these controllers are responsible for handling a number of important tasks for maintaining the reliability of the NAND memory array.
Raw NAND issues and requirements
Let’s take a detailed look at the issues and challenges presented by incorporating raw NAND devices into an embedded system or SoC.
Errors and error correction
Despite being based on the same underlying floating gate technology, NAND flash has scaled in size quickly since overtaking NOR flash. This has come at a cost of introducing errors into the memory array.
To increase density, NAND producers have resorted to two main techniques. One is the standard process node and lithography shrinks, making each memory cell and the associated circuitry smaller. The other has been to store more than one bit per cell. Early NAND devices could store one of two states in a memory cell, depending on the amount of charge stored on the floating gate. Now, raw NAND comes in three flavors: single-level cell (SLC), multi-level cell (MLC) and triple-level cell (TLC). These differ in the number of charge levels used in each cell, which determines the number of bits stored in each cell. SLC, the original 2 levels per cell, stores 1 bit of information per cell. MLC uses 4 levels and stores 2 bits, and TLC uses 8 levels and stores 3 bits.
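The relationship between charge levels and bits is simply a base-2 logarithm: storing n bits requires 2^n distinguishable levels. A minimal sketch (the function name is ours):

```python
def bits_per_cell(levels):
    """Number of bits a cell stores when it can hold `levels` distinct charge levels."""
    if levels < 2 or levels & (levels - 1):
        raise ValueError("levels must be a power of two, at least 2")
    return levels.bit_length() - 1


for name, levels in (("SLC", 2), ("MLC", 4), ("TLC", 8)):
    print(name, levels, "levels ->", bits_per_cell(levels), "bit(s) per cell")
```

Each added bit doubles the number of levels that must fit within the same charge range, which is why the margin between adjacent levels shrinks so quickly.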
While reducing silicon feature sizes and storing more bits per cell reduces the cost of the NAND flash and allows for higher density, it increases the bit error rate (BER). Overcoming the increasing noisiness of this storage medium requires larger and larger error correcting codes (ECCs). An ECC is redundant data added to the original data. In the event of errors, the combined data allows the recovery of the original data. The number of errors that can be recovered depends on the algorithm used.
The requirements have grown quickly. The latest SLC NANDs on the market require 4 or 8 bits of ECC per 512 bytes, while MLC NAND requires more than 16 bits of ECC per 512 bytes. Four years ago, SLC NANDs required only 1 bit of ECC, and the first MLC NANDs required only 4 bits.
Figure 2: Device issues versus process node shrinks (courtesy Micron)
In principle, any ECC algorithm can be used as long as the encoder and decoder match. The popular algorithms used for NAND ECC are:
* Hamming code: for 1-bit correction.
* Reed-Solomon: for up to 4 bits of correction. This is less common.
* BCH: for 4 or more bits of correction.
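To give a feel for how single-bit correction works, here is a toy Hamming(7,4) encoder and decoder. This is only an illustration of the principle: it protects 4 data bits with 3 parity bits, whereas the Hamming-style ECC used with NAND operates over whole 256- or 512-byte sectors. The function names are ours.

```python
def hamming74_encode(nibble):
    """Encode 4 data bits into a 7-bit codeword (list of bits for positions 1..7)."""
    code = [0] * 8                                   # index 0 is unused
    code[3], code[5], code[6], code[7] = [(nibble >> i) & 1 for i in range(4)]
    code[1] = code[3] ^ code[5] ^ code[7]            # parity over positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]            # parity over positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]            # parity over positions 4, 5, 6, 7
    return code[1:]


def hamming74_decode(bits):
    """Return (nibble, syndrome); a non-zero syndrome is the corrected bit position."""
    code = [0] + list(bits)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos                          # XOR of the positions of all set bits
    if syndrome:
        code[syndrome] ^= 1                          # flip the single bad bit back
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return nibble, syndrome
```

Flipping any one of the seven stored bits still decodes to the original nibble, with the syndrome pointing at the flipped position. Two flipped bits would be miscorrected, which is why noisier devices need the stronger codes listed above.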
Extra memory (called the "spare memory area" or "spare bytes region") is provided at the end of each page in NAND to store ECC. This area is similar to the main page and is susceptible to the same errors. For the present explanation, assume that the page size is 2,048 bytes, the ECC requirements are 4 bits per 512 bytes and the ECC algorithm generates 16 bytes of redundant data per 512 bytes. For a 2,048-byte page, 64 bytes of redundant data will be generated. For example, in current Texas Instruments (TI) embedded processors, the ECC data is generated for every 512 bytes, and the spare bytes area will be filled with the ECC redundant data. As ECC requirements have gone up, the size of the spare regions provided by the NAND manufacturers has increased as well.
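The spare-area budget in this example is easy to check with a few lines of arithmetic. The first function below reproduces the calculation from the text. The second is an additional estimate that is not taken from any particular controller: it uses the general property that a binary BCH code of length 2^m - 1 correcting t bit errors needs at most m*t parity bits. All names are ours.

```python
def ecc_bytes_per_page(page_size, sector_size, ecc_bytes_per_sector):
    """Total ECC bytes for one page when ECC is computed per sector."""
    sectors, remainder = divmod(page_size, sector_size)
    if remainder:
        raise ValueError("page size must be a multiple of the sector size")
    return sectors * ecc_bytes_per_sector


def bch_parity_bytes(sector_size, t):
    """Upper bound on parity bytes for a t-bit-correcting binary BCH code per sector."""
    data_bits = sector_size * 8
    m = 1
    while (1 << m) - 1 < data_bits + m * t:   # codeword must hold data plus parity
        m += 1
    return (m * t + 7) // 8                   # round m*t parity bits up to whole bytes


print(ecc_bytes_per_page(2048, 512, 16))      # 64, the figure used in the text
print(bch_parity_bytes(512, 4))               # 7
```

Under these assumptions, the 16 bytes per 512 bytes assumed in the text fill a 64-byte spare area completely, while a bare 4-bit BCH code would need only about 7 bytes per sector, leaving room for bad block markers and other metadata.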
The manufacturers of NAND devices specify the data retention and the write/erase endurance cycles under the assumption of the specified ECC requirements being met. When insufficient ECC is used, the device’s usable lifetime is likely to be severely reduced. If more errors are detected than can be corrected, data will be unrecoverable.
Before raw NAND operations can begin, the first step is to determine the NAND geometry and parameters. The following list is the minimum set of NAND parameters needed by a bootloader or other software layer to determine NAND geometry:
* 8-bit or 16-bit data width
* Page size
* Number of pages per block (block size)
* Number of address cycles (usually five in current NANDs)
Raw NAND devices offer several methods by which software can determine the geometry at run time:
4th byte ID: All raw NANDs support a READ ID operation (command 0x90 at address 0x00) that returns 5 bytes of identifier code. Counting from one, the first and second bytes are the manufacturer and device IDs, respectively. The fourth byte carries information on the NAND parameters discussed above, which can be used by the ROM bootloader.
This 4th byte information can be used to determine raw NAND geometry, yet the interpretation of the 4th byte ID changes from raw NAND manufacturer to manufacturer and between generations of raw NANDs. There are two noteworthy interpretations. The first is a format used by Toshiba, Fujitsu, Renesas, Numonyx, STMicroelectronics, National and Hynix, with certain bits used to represent the page size, data bus size, spare bytes size and number of pages per block. The second is a format particular to the latest Samsung NANDs, holding similar information to the first representation, but with different bit combinations representing different possible values. Since the 4th ID byte format isn’t standardized in any way, its use for parameter detection isn’t reliable.
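As an illustration of the first kind of format, the sketch below decodes the 4th ID byte using one widely seen bit layout for older large-page parts: bits 1:0 select the page size, bit 2 the spare bytes per 512 data bytes, bits 5:4 the block size and bit 6 the bus width. Treat this layout as an assumption; as noted above, it is not universal and must be checked against the datasheet of each part. The function name is ours.

```python
def decode_id_byte4(byte4):
    """Decode the 4th READ ID byte using one common (not universal) bit layout."""
    page_size = 1024 << (byte4 & 0x03)                   # bits 1:0 -> 1, 2, 4 or 8 KB
    spare_per_512 = 8 << ((byte4 >> 2) & 0x01)           # bit 2    -> 8 or 16 bytes
    block_size = (64 * 1024) << ((byte4 >> 4) & 0x03)    # bits 5:4 -> 64 to 512 KB
    bus_width = 16 if (byte4 >> 6) & 0x01 else 8         # bit 6    -> x8 or x16
    return {
        "page_size": page_size,
        "spare_size": spare_per_512 * (page_size // 512),
        "pages_per_block": block_size // page_size,
        "bus_width": bus_width,
    }
```

Under this layout, a 4th byte of 0x15 describes a 2,048-byte page with 64 spare bytes, 64 pages per block and an 8-bit bus.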
ONFI: Many NAND manufacturers, including Hynix, Micron, STMicroelectronics, Spansion and Intel, have joined hands to simplify NAND flash integration and offer Open NAND Flash Interface (ONFI)-compliant NANDs. ONFI offers a standard approach to reading NAND parameters.
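With an ONFI device, the same geometry comes from a parameter page with fixed field positions. The sketch below parses the fields a bootloader needs from a 256-byte parameter page that has already been read from the device. The byte offsets and the CRC-16 parameters (polynomial 0x8005, initial value 0x4F4E) reflect our reading of the ONFI specification and should be verified against the revision you target; the function names are ours.

```python
import struct


def onfi_crc16(data, crc=0x4F4E):
    """Bitwise CRC-16 (polynomial 0x8005) over the given bytes."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x8005) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc


def parse_onfi_page(page):
    """Extract basic geometry from one 256-byte ONFI parameter page copy."""
    if len(page) < 256 or page[0:4] != b"ONFI":
        raise ValueError("not an ONFI parameter page")
    (stored_crc,) = struct.unpack_from("<H", page, 254)
    if onfi_crc16(page[:254]) != stored_crc:
        raise ValueError("parameter page CRC mismatch")
    page_size, spare_size = struct.unpack_from("<IH", page, 80)
    pages_per_block, blocks_per_lun = struct.unpack_from("<II", page, 92)
    return {
        "page_size": page_size,
        "spare_size": spare_size,
        "pages_per_block": pages_per_block,
        "blocks_per_lun": blocks_per_lun,
        "luns": page[100],
        "row_addr_cycles": page[101] & 0x0F,
        "col_addr_cycles": page[101] >> 4,
        "ecc_bits": page[112],
    }
```

If the CRC check fails, a loader would normally try the next redundant copy of the parameter page rather than give up.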
The physical connection between the embedded processor and the raw NAND device is also of concern. NAND devices can operate at either 3.3V or 1.8V, so it’s important to purchase NANDs with compatible voltage levels. It should be pointed out that 1.8V NAND devices are often specified with worse performance than 3.3V equivalent parts.
Challenges to address
Hardware design challenges (memory interfaces)
Advanced ECC operations, like Reed-Solomon or BCH, are computationally very expensive. Many solutions offer hardware (HW) support for ECC. However, these fixed solutions lag behind the growing ECC requirement. MLC NANDs may now require more than 16 bits of ECC per 512 bytes, yet HW support designed just a few years ago may not support 16 bits. In that case, the HW ECC support becomes useless, and either the NAND can’t be used or the ECC computation must be done in software (SW), which takes CPU cycles away from other important tasks.
Firmware and ROM Bootloaders
The dynamic raw NAND market (raw NANDs have relatively short lifecycles of about two years) and the initial lack of standardization have resulted in heterogeneous interfaces from different NAND manufacturers. Not only do these approaches vary between manufacturers, but, at times, they also differ between generations of NANDs from the same manufacturer! This poses significant challenges for firmware and middleware that might not be updated very often (if at all).
The main concern when designing ROM bootloaders for raw NAND booting is whether or not future NAND devices will work. The ONFI standard alleviates this somewhat, since it defines device identification commands that should not have to change in the future.
Another major concern related to the hardware design issue is what level of ECC is sufficient. Since the NAND parts that will be connected to the system cannot be known a priori, the safest solution is to leverage the maximum ECC possible with the memory interface or controller. Using more ECC than required for booting simply improves robustness, with the possible downsides being increased boot time and more complex factory programming procedures.
Since NAND manufacturers don’t guarantee that all blocks of the memory are good (nor will all blocks remain good over the device’s lifetime), another issue is how to handle bad blocks with unrecoverable errors if encountered during booting. Some strategies include placing multiple copies of the boot image and letting the boot loader locate and load the first good one or having the boot loader respect a bad block table stored somewhere else in the NAND. Another useful strategy is to have the system run-time software periodically check and correct any issues with the boot block.
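The first of these strategies, multiple copies of the boot image, can be sketched as a simple search loop. Everything here is hypothetical: `is_bad_block` and `read_block` stand in for whatever the boot ROM's NAND driver provides, `UncorrectableError` represents an ECC failure, and the `BOOT` magic value is a placeholder for a real image header check.

```python
class UncorrectableError(Exception):
    """Raised by the (hypothetical) driver when ECC cannot repair the data."""


def find_boot_image(candidate_blocks, is_bad_block, read_block, magic=b"BOOT"):
    """Return (block, data) for the first usable boot image copy, or None."""
    for block in candidate_blocks:
        if is_bad_block(block):
            continue                      # skip blocks already marked bad
        try:
            data = read_block(block)
        except UncorrectableError:
            continue                      # too many bit errors, try the next copy
        if data.startswith(magic):
            return block, data            # first good copy wins
    return None
```

A boot loader built this way keeps working as long as at least one copy survives, at the cost of the extra blocks reserved for the redundant copies.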
Middleware / OS software issues
The middleware, or run-time software, suffers from similar issues to those faced by the ROM boot loader. Although it might be easier to adapt the middleware to handle newer devices, newer detection schemes and newer command sets offered by more recent devices, there is overhead every time a change has to be propagated through different support structures, from middleware teams to customers. For example, the memory technology device (MTD) layer of the Linux® OS kernel had issues when device sizes reached 4GB since the size had originally been defined as a 32-bit value. In another case, there was no support for NAND devices with page sizes larger than 2KB, although modern NAND devices have 4KB or 8KB page sizes. Fixing these issues isn’t necessarily trivial.
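The 32-bit size problem is easy to reproduce. The snippet below is only an illustration of the failure mode, not the actual MTD code: it computes a device size from its geometry and truncates the result to 32 bits, as a 32-bit size field would.

```python
UINT32_MASK = 0xFFFFFFFF


def size_as_uint32(page_size, pages_per_block, num_blocks):
    """Device size in bytes as it would appear after truncation to 32 bits."""
    return (page_size * pages_per_block * num_blocks) & UINT32_MASK


print(size_as_uint32(2048, 64, 2048))    # 268435456 (256 MB fits)
print(size_as_uint32(4096, 128, 8192))   # 0 (4 GB wraps around)
```

A 4GB device therefore appears to have a size of zero, which is why such a field has to be widened to 64 bits throughout the software stack.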
In addition, the run-time middleware must deal with activities such as wear-leveling and bad block management. Wear-leveling is a software mechanism that spreads the write/erase cycles around the chip so that all blocks wear evenly. Failure to do so will result in oft-used blocks failing very early. This is more important now than ever before, as the cells of MLC devices have much lower endurance ratings (3,000 to 5,000 write/erase cycles compared to 10,000 to 100,000 for SLC NANDs). The middleware must also track which blocks are bad and make sure they aren’t used for any further reads and writes. The more stringent requirements of recent and future NAND devices may require even more complicated schemes to manage wear-leveling and bad blocks.
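The core idea behind both tasks can be shown with a deliberately simple allocator that always hands out the least-erased good block. This is a minimal sketch with names of our own choosing; production flash translation layers also handle static data migration, persistent storage of the counters and power-loss recovery.

```python
class BlockAllocator:
    """Toy wear-leveling allocator with a bad block set."""

    def __init__(self, num_blocks):
        self.erase_counts = [0] * num_blocks
        self.bad_blocks = set()

    def mark_bad(self, block):
        """Record a block as bad so it is never handed out again."""
        self.bad_blocks.add(block)

    def pick_block(self):
        """Return the good block with the fewest erase cycles so far."""
        good = [b for b in range(len(self.erase_counts)) if b not in self.bad_blocks]
        if not good:
            raise RuntimeError("no good blocks left")
        return min(good, key=lambda b: self.erase_counts[b])

    def erase(self, block):
        """Account for one erase cycle on a good block."""
        if block in self.bad_blocks:
            raise ValueError("refusing to erase a bad block")
        self.erase_counts[block] += 1
```

Repeatedly calling `pick_block()` and `erase()` keeps the erase counts of all good blocks within one cycle of each other, while blocks passed to `mark_bad()` are never selected.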
Supporting different custom ECC hardware is another challenge as one ports the middleware from one generation of processor to the next and improves the ECC capabilities in new drivers. Additionally, there is no good solution if the ECC HW cannot meet the ECC requirement of the NAND, as software ECC has proven to be too slow and cumbersome for most embedded processors.
Potential solutions to NAND challenges
In our opinion, the main issue with using raw NAND devices in the embedded processor space is the skyrocketing ECC requirements. In five years, ECC requirements have gone from 1 bit/4 bit to pushing past 24 bits. Memory interface hardware designed six years ago is very likely incompatible with any new chips available on the market today. Though the issue of device identification and parameterization has been a problem, it isn’t considered as critical as the fundamental incompatibility resulting from insufficient ECC hardware in the memory controller/interface of embedded processors. Given the lifetime of products based on such devices (10-15 years), the lack of NAND supply that fits the original design requirements could force major rework in both hardware and software part of the way through the product life cycle.
Fortunately, the memory manufacturers have recognized the issues with rapidly increasing ECC requirements and have taken steps to address them. The solution is managed NAND.
Managed NANDs perform some or all of the three NAND management tasks (ECC, wear leveling and bad block management) on the memory device instead of in a host controller so that they are no longer a concern for the system developer. Perhaps the most compelling of these is the embedded MultiMediaCard (eMMC). eMMC is a Joint Electron Devices Engineering Council (JEDEC) standard that combines electrical and physical chip specifications with the interface commands and protocols of the MMC 4.3 into a single entity.
Figure 3: eMMC block diagram (courtesy Samsung)
There are also “partially-managed” NAND devices that maintain the NAND interface but move the ECC into the memory device. The solution, which Micron has branded as ClearNAND™, may be attractive to those looking to replace the NANDs in current designs with minimal changes to the system SW or HW. Toshiba has recently released a nearly identical solution, dubbed SmartNAND™. It seems certain that other NAND vendors will soon follow the same path.
Figure 4: Standard RAW NAND versus ClearNAND (courtesy Micron)
There are additional costs to the managed NAND solutions, but having seen the problems of a rapidly maturing NAND market as both a silicon producer and a silicon consumer, it’s clear that some form of managed NAND is the only sensible choice for future design and development.
Embedded processor vendors, such as Texas Instruments, will continue to support the existing ECC hardware in their memory interfaces, but there is little reason for them to spend design time and resources trying to keep pace with the skyrocketing ECC requirements. At this point in the evolution of NAND, it’s best to leave the implementation of advanced ECC solutions to the memory vendors as they push their devices to the physical limits of CMOS technology.
About the Authors:
Daniel Allred is a senior applications engineer for Texas Instruments. He works in the C6000™ DSP software architecture team, developing ways to make TI's DSP technology more accessible. He has been with TI for four years, working with the DaVinci™ digital media processor and catalog OMAP™ embedded processor products. He holds a Bachelor of Science degree from the University of Florida and a Master of Science degree from Georgia Tech, where he studied microphone array signal processing and novel computational methods for signal processing in hardware.
Gaurav Agarwal is an applications engineer for Texas Instruments. He works in the C6000™ DSP application team. Gaurav has been with TI for five years, working with the VoIP, DaVinci™ digital media processor and catalog OMAP™ embedded processor products. Since last year, he has been leading efforts to design generic and bug-free boot loaders. Before joining TI, he worked at Motorola, where he published several research papers in the field of video-transcoding systems. He holds a Bachelor of Technology degree from IIT Kanpur, India and a Master of Science degree from University of Maryland, College Park, where he developed a novel machine vision system for leaf recognition.