Error correcting code (ECC) enables increased memory density and bandwidth while maintaining power neutrality and reliability. Here’s a detailed look at why that is.
The Evolution of LPDDR Memory
With the growing number of new applications for mobile devices, designers of handsets and tablets continue to face the challenge of increasing memory throughput without compromising battery life. The introduction of the LPDDR4 standard in 2014 doubled data rates and lowered operating voltage versus LPDDR3, significantly improving the mobile user experience by increasing performance and enabling longer battery life. The launch of LPDDR4x in 2016 further improved battery life by providing up to 20% more energy efficiency than standard LPDDR4 memory.
LPDDR4 memory is important for IoT applications, such as wearable electronics, where power is a critical design constraint. For automotive applications, LPDDR4’s higher bandwidth and power advantages are ideal for vehicle subsystems such as center consoles and advanced driver assistance systems (ADAS).
The Challenges of Evolving DRAM
The LPDDR4 specification was designed to accommodate continued advancement in DRAM process technology, which involves shrinking the dimensions of the memory cell. To maintain cell capacitance in less area, more complex manufacturing is required. As a result of shrinking cell sizes, the time required for each memory cell to reach maximum charge increases. These effects make it increasingly challenging for manufacturers to maintain yields and reliability when transitioning to subsequent process generations.
DRAM yields are primarily limited by single-bit errors. A few of these errors may be “hard” fail bits, where a bit is stuck at 1 or 0. These are always repaired using redundant elements. However, most failing single bits are simply marginal; they work correctly if they are refreshed often enough or written for a longer period of time.
Repairing these bits, which are still a very small percentage of the array, requires an increasing number of redundant elements, which increases die size and complexity. It should be noted that the DRAM write recovery time and the 64ms or 32ms refresh specifications are set very conservatively to allow most of these weak bits to pass. If these weak bits did not have to be accommodated, the refresh and write recovery time specifications could be relaxed substantially, which would provide performance and power benefits.
Variable Refresh Time Bits
Another phenomenon that becomes more prevalent with each process shrink is variable refresh time bits, or VRT bits. These are occasional random single-bit failures that occur because a bit's refresh time changes after the DRAM is heated (that is, when the solder reflow is performed for board mounting). While these VRT bits are relatively rare, they are troublesome because they can appear after the DRAM has passed final test at the manufacturer, at which point repair is difficult or impossible.
In an effort to mitigate the costs of post-package repair or scrapped parts, and to maintain acceptable field failure rates, DRAM manufacturers currently test the memory bits at conditions much more stringent than the specification requires. The goal is to find the VRT bits before they actually fail.
While this testing is largely successful, it comes at the expense of reduced yields. There can be a significant degree of overkill associated with more stringent testing because numerous die that would not actually produce a VRT fail are discarded in the process of identifying actual VRT die. Also, no testing is perfect, and some VRTs may escape and still manage to find their way to OEMs. Given these continuing problems caused by VRT bits, memory manufacturers needed to implement a new technology to increase reliability and control costs for future devices.
The Benefits of Error Correcting Code
Error correcting code (ECC) is an established memory technology used in a vast array of applications to increase reliability. ECC provides the next level of redundancy for memory ICs by using a Hamming code, which generates a small number of parity bits that are stored in the memory array with the user data. The Hamming code enables a short run of bits to protect a much longer data word. For example, Micron’s LPDDR4 devices use 8 parity bits to provide correction for a 128-bit data word. These parity bits can be used to detect and correct a single-bit error in the 128-bit word.
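The 8-parity-bit figure follows from the standard bound for a single-error-correcting Hamming code: r parity bits can protect k data bits when 2^r >= k + r + 1. The short Python sketch below checks that bound for a few word sizes. It is a generic textbook calculation, and the function name is chosen here for illustration; it does not describe any vendor's internal implementation.

```python
def min_parity_bits(data_bits: int) -> int:
    """Smallest r satisfying 2**r >= data_bits + r + 1.

    This is the textbook bound for a single-error-correcting Hamming code:
    the r parity bits must be able to name every bit position in the code
    word, plus the "no error" case.
    """
    r = 0
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r


for k in (8, 16, 32, 64, 128):
    r = min_parity_bits(k)
    print(f"{k:>3} data bits -> {r} parity bits ({100 * r / k:.2f}% storage overhead)")
```

The relative overhead shrinks as the data word grows, which is why a short run of parity bits can protect a much longer word.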
Single-Bit Error Correction
Whenever data is written to memory, the associated parity bits are updated as well. When the data is read, the DRAM verifies the integrity of the entire 136-bit (128 data + 8 parity) code word. If a single-bit failure is detected, such as a VRT bit that arises after mounting, ECC will automatically correct the error.
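The write and read flow described above can be sketched in Python with a textbook Hamming code. The bit layout used here (parity bits at the power-of-two positions, positions numbered from 1) is an assumption made for illustration, as are all function and variable names; the actual on-die layout and logic are not described in this article.

```python
import random

DATA_BITS = 128
PARITY_BITS = 8
CODE_BITS = DATA_BITS + PARITY_BITS  # 136-bit code word

# Assumed textbook layout: positions are numbered from 1, parity bits sit at
# the power-of-two positions, and data bits fill the remaining positions.
PARITY_POS = [1 << i for i in range(PARITY_BITS)]
DATA_POS = [p for p in range(1, CODE_BITS + 1) if p not in PARITY_POS]


def syndrome(code):
    """XOR together the positions of all bits that are set (index 0 is unused)."""
    s = 0
    for pos in range(1, CODE_BITS + 1):
        if code[pos]:
            s ^= pos
    return s


def encode(data):
    """Write path: place 128 data bits, then set parity so the syndrome is zero."""
    code = [0] * (CODE_BITS + 1)
    for pos, bit in zip(DATA_POS, data):
        code[pos] = bit
    s = syndrome(code)
    for i, p in enumerate(PARITY_POS):
        code[p] = (s >> i) & 1
    return code


def decode(code):
    """Read path: a nonzero syndrome names the failing position; flip it back."""
    code = list(code)
    s = syndrome(code)
    if s != 0 and s <= CODE_BITS:
        code[s] ^= 1
    return [code[p] for p in DATA_POS], s


rng = random.Random(0)
word = [rng.randint(0, 1) for _ in range(DATA_BITS)]
stored = encode(word)
stored[77] ^= 1  # simulate one weak cell flipping after the write
recovered, s = decode(stored)
print("syndrome:", s, "data recovered:", recovered == word)
```

In this sketch the syndrome is zero for an intact code word and equals the position of the flipped bit when exactly one bit has failed, so the correction needs no information beyond the stored code word itself.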
Given that the odds of two single-bit errors occurring in the same code word are extremely remote, ECC technology provides an effective way to eliminate random single-bit errors.
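To get a feel for how remote those odds are, the sketch below compares the chance of at least one failing bit with the chance of at least two in the same 136-bit code word. It assumes independent failures and uses a purely hypothetical per-bit failure probability; the value is an illustrative placeholder, not measured data.

```python
from math import comb


def p_at_least(n_bits: int, p_bit: float, min_errors: int) -> float:
    """Binomial probability of min_errors or more failing bits among n_bits,
    assuming each bit fails independently with probability p_bit."""
    return sum(
        comb(n_bits, k) * p_bit ** k * (1 - p_bit) ** (n_bits - k)
        for k in range(min_errors, n_bits + 1)
    )


CODE_BITS = 136   # 128 data + 8 parity, as described above
P_BIT = 1e-9      # hypothetical per-bit failure probability (assumption)

p_one_or_more = p_at_least(CODE_BITS, P_BIT, 1)
p_two_or_more = p_at_least(CODE_BITS, P_BIT, 2)
print(f"P(>=1 failing bit per code word)  = {p_one_or_more:.3e}")
print(f"P(>=2 failing bits per code word) = {p_two_or_more:.3e}")
```

Under these assumptions the double-error case scales with the square of the per-bit probability, so it is many orders of magnitude rarer than the single-bit case that ECC corrects.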
Because ECC is a passive technology, errors are detected and corrected automatically. There is no intervention required by developers. The correction is also completely transparent to the rest of the system.
Another benefit of adding ECC to LPDDR4 is that it can lower total cost of ownership (TCO) once power, performance and cost are weighed together. For example, adding ECC to LPDDR4 causes a modest increase in active power (on the order of 5–7%). This increase arises from the additional memory bits and logic circuitry that are required to store and process the ECC parity bits. At the same time, ECC can result in a substantial decrease in standby and refresh power. When a device is in sleep mode, DRAM-based memory needs to be refreshed regularly to restore the charge that leaks from each memory cell. Because ECC can correct the occasional weak bit that would otherwise fail at a longer refresh interval, the DRAM can reduce its self-refresh rate. For most low-power applications, the added reliability of ECC and its superior standby efficiency outweigh the slightly higher active current.
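The trade-off between higher active power and lower standby power can be sketched with a simple duty-cycle model. Only the 6% active-power increase below is taken from the 5–7% range quoted above; the absolute power levels, the standby reduction and the function names are hypothetical values chosen for illustration.

```python
def average_power(active_mw: float, standby_mw: float, active_fraction: float) -> float:
    """Time-weighted average power for a device that is active for
    active_fraction of the time and in standby for the rest."""
    return active_fraction * active_mw + (1 - active_fraction) * standby_mw


def break_even_active_fraction(active_mw, standby_mw, active_increase, standby_reduction):
    """Active-time fraction at which ECC neither raises nor lowers average power.

    Below this fraction, the standby savings outweigh the extra active power.
    """
    extra = active_mw * active_increase      # added power while active
    saved = standby_mw * standby_reduction   # power saved while in standby
    return saved / (extra + saved)


# Hypothetical device figures (assumptions, not measured data).
ACTIVE_MW = 100.0
STANDBY_MW = 10.0
ACTIVE_INCREASE = 0.06    # within the 5-7% range quoted above
STANDBY_REDUCTION = 0.30  # assumed standby/refresh saving

f = break_even_active_fraction(ACTIVE_MW, STANDBY_MW, ACTIVE_INCREASE, STANDBY_REDUCTION)
print(f"ECC lowers average power when the DRAM is active less than {f:.1%} of the time")
```

A device that spends most of its time asleep falls well below the break-even point in a model like this, which is consistent with the claim that standby efficiency dominates in most low-power applications.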
The power efficiency of LPDDR4 with ECC also helps OEMs achieve power neutrality when migrating or transitioning to next-generation mobile devices; that is, they can provide greater functionality without needing a larger battery and without negatively impacting operating life.
In terms of performance, with ECC there is a small amount of additional read latency, which is accounted for in the specified read latency values. Additional write delay is also needed to allow the DRAM time to calculate the parity bits. This time is reflected in the 18ns write recovery time specification (compared to 15ns for LPDDR3).
Some in the industry have contemplated moving to a write recovery time specification of 45ns to address scaling issues. The inclusion of ECC could remove the need for this increase in the LPDDR4 specification. Avoiding a longer write recovery time can more than make up for the performance lost due to the small amount of additional read latency.
ECC also requires a modest increase in die size to accommodate the parity bits and ECC logic. However, these costs are easily offset by higher reliability for OEMs, as well as higher yields and reduced test costs for DRAM manufacturers.
In keeping up with today’s evolving mobile, automotive and IoT applications, developers and memory manufacturers continue to face the challenge of increasing memory performance without compromising power and reliability. Meeting these requirements becomes harder as shrinking cell sizes introduce manufacturing difficulties that can affect memory reliability. Micron’s LPDDR4 and LPDDR4x memory with ECC technology provides an effective way to address these challenges, while providing high bandwidth and power advantages for next-generation devices.
Last modified: 30th October 2017