Open access peer-reviewed chapter - ONLINE FIRST

Harnessing RRAM Technology for Efficient AI Implementation

Written By

Fang-Yi Gu

Submitted: 19 June 2024 Reviewed: 25 June 2024 Published: 02 September 2024

DOI: 10.5772/intechopen.1006094


From the Edited Volume

Recent Advances in Neuromorphic Computing [Working Title]

Dr. Kang Jun Bai and Prof. Yang (Cindy) Yi


Abstract

This chapter details the utilization of RRAM devices as key components in neuromorphic computing for efficient in-memory computing. It begins with the fundamental mechanism of RRAM and its data storage capabilities, followed by efficient AI implementations with RRAM. This includes discussions of RRAM-based accelerators facilitating DNN computations with remarkable O(1) time complexity, as well as the RRAM’s multi-level characteristics. Subsequently, the chapter addresses challenges encountered in RRAM technology, such as variations, IR-drop issues, and the substantial energy and area requirements associated with DAC/ADC operations. Solutions to these challenges are briefly summarized. Emphasis is then placed on the critical issue of programming RRAM devices, whose challenges include cycle-to-cycle variation and energy-intensive processes. Various programming techniques are explicated, accompanied by a comparative analysis of their respective advantages and drawbacks.

Keywords

  • In-memory computing
  • RRAM
  • RRAM-based DNN accelerators
  • reliability
  • RRAM programming

1. Introduction

In the conventional Von Neumann architecture, the separation of memories, logic units, and control units has led to the emergence of the ‘memory wall’ phenomenon, namely a widening gap between the performance evolution of processors and memories. It is reported that over recent years this trend has become increasingly evident, with processor performance boosted by over 10,000 times while memory performance has seen only modest improvements of around 10 times [1]. In response to this challenge, researchers have turned their attention to exploring alternative memory technologies with superior performance and higher density storage. Nonvolatile memories (NVMs) have emerged as particularly promising candidates, whose advantages include simple structures, nonvolatile data storage, low power consumption, high scalability, and compatibility with CMOS devices. Moreover, some of them are capable of not only storing data but also performing computational tasks.

Filament-type resistive random-access memory (RRAM) stands out as a prominent member of the NVM family, drawing significant interest from researchers and industry alike. Leveraging reversible resistive switching (RS) mechanisms, RRAM is capable of data storage. Beyond its storage capabilities, RRAM exhibits analog behavior, enabling it to perform multiplication computations. Later in this chapter, we will present the filamentary mechanism of RRAM, elucidating its intricacies in a straightforward manner. Additionally, we will explore how RRAM facilitates time-efficient vector matrix multiplication.


2. Resistive switching (RS) mechanism

An RRAM memory cell consists of a metal–insulator–metal (MIM) structure, where an insulating layer is sandwiched between two metal electrodes. The filamentary mechanism uses external voltage applied to the electrodes to control the growth or reduction of a filament to modulate the desired resistance value of the memory cell. The cell can be switched between a high resistance state (HRS) and a low resistance state (LRS), or logic ‘0’ and logic ‘1’. In the HRS, the filament is disconnected, resulting in high resistance, while in the LRS, the filament is connected, leading to low resistance. Additionally, RRAM possesses multi-level characteristics, which is an essential feature to achieve more compact memory density. This will be further explained later in the chapter.

An RRAM cell begins in its pristine state, with an insulating layer that requires a high voltage pulse, known as the ‘forming voltage,’ to create a conductive path. This process, called electroforming, prepares the cell for filament modulation using different voltages. To switch the cell from HRS to LRS, a set voltage is applied. Conversely, to switch from LRS back to HRS, a reset voltage is applied.

There are two types of RRAM based on filament composition: ion-based RRAM and oxygen-vacancy-based RRAM. In ion-based RRAM, the filament is formed through the migration of metal ions and the oxidation/reduction of the electrochemically active top metal electrode (anode). When a positive voltage is applied, oxidation occurs, generating metal ions that drift toward the inert bottom electrode (cathode) and reduce to form a conductive filament, switching the cell to LRS. During the reset process, applying a negative voltage oxidizes the deposited metal atoms back into ions, rupturing the filament and switching the cell to HRS. Figure 1 illustrates the set and reset process of an ion-based RRAM cell.

Figure 1.

Set (left) and reset (right) process of an ion-based RRAM cell.

In oxygen-vacancy-based RRAM, the filament is formed through the migration of oxygen ions. The electroforming process produces a sufficient amount of oxygen vacancies to initiate the RS: when a high electric field is applied toward the anode interface, the dielectric material in the insulating layer (typically an oxide layer) experiences breakdown, weakening its insulating properties and allowing charge carriers to move freely. During the set process, oxygen atoms within the dielectric lattice are further ejected, becoming oxygen ions (O2−) and leaving behind positively charged vacancies (Vo2+). The drifting O2− transfer electrons to the anode (oxidation) and form an interfacial oxide layer, which acts as a protective barrier against further oxidation and conveniently serves as an oxygen reservoir. Meanwhile, the Vo2+ are reduced at the cathode and serve as electron hopping sites that form the current conduction path, namely the conductive filament, allowing the cell to switch to LRS. During the reset process, the oxide interface releases oxygen ions, which recombine with the oxygen vacancies within the oxide layer, switching the cell back to HRS. An example of an oxygen-vacancy-based RRAM device is shown in Figure 2, which illustrates the RS process of a TiN/Ti/HfO2/TiN RRAM [1]. Two types of oxygen vacancies exist in the insulating layer: HfO2+ and Hf4+, which are represented as Vo2+ in the figure.

Figure 2.

Set (left) and reset (right) process of a Vo2+-based RRAM cell.

During the process of forming the filament in the dielectric layer, a compliance current (Icc), which is the maximum current allowed to flow through the selected transistor, is enforced on the RRAM device to prevent permanent breakdown, ensuring the device does not enter an irreversible state in which even the highest reset voltage cannot dissolve the filament. The magnitude of the gate voltage on the transistor in a 1T1R device controls the magnitude of Icc. The higher Icc is, the stronger the filament that is formed, meaning that the LRS can remain stable over a longer time, namely a longer retention time. However, it also means that a higher reset voltage is required to switch the device back to HRS.


3. Efficient vector matrix multiplication (VMM) with RRAM devices

RRAM devices are capable of storing weights as resistance states. When RRAM devices are constructed into a crossbar structure, as depicted in Figure 3, vector matrix multiplication (VMM) can be easily executed due to the analog behavior of RRAM.

Figure 3.

RRAM crossbar structure.

Eq. (1) illustrates an example of VMM, where W represents a matrix of weights, V represents a vector of inputs, and Q represents the resulting vector of outputs. As shown in Figure 3, if inputs are applied as voltages and weights are represented in the form of conductances, the output can be easily obtained in O(1) time complexity by measuring the current flowing through each bitline, in accordance with Ohm’s law: $I_n = \sum_{i=0}^{m} v_i G_{i,n}$.

$$\mathbf{Q} = \mathbf{V}\,\mathbf{W}, \qquad Q_n = \sum_{i=0}^{m} v_i\, W_{i,n} \tag{1}$$
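To make the computation in Eq. (1) concrete, the following is a minimal NumPy sketch of an ideal crossbar (no device or circuit non-idealities): input voltages drive the rows, the weight matrix is stored as conductances, and the bitline currents directly give the VMM result in a single analog step. The array names (`voltages`, `conductances`, `bitline_currents`) are illustrative only.

```python
import numpy as np

# Ideal crossbar VMM sketch: I_n = sum_i v_i * G_{i,n} (Ohm's law summed along each bitline).
# 4 wordlines (inputs) x 3 bitlines (outputs); values are illustrative.
voltages = np.array([0.2, 0.5, 0.1, 0.3])            # input vector V applied as row voltages
conductances = np.random.uniform(1e-6, 1e-4,          # weight matrix W stored as conductances (S)
                                 size=(4, 3))

bitline_currents = voltages @ conductances             # all output currents appear at once: O(1) analog step
print(bitline_currents)                                # one current per bitline, i.e., the VMM result Q
```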

This in-memory VMM computation allows us to efficiently perform the convolutional computations commonly used in deep neural networks (DNNs). Figure 4 presents a weight mapping example, where (a) shows a W×W×C input feature map convolved with N kernels, each of size F×F×C, resulting in an output feature map of size O×O×N. When unrolled into 1D, each kernel can be mapped onto a column of the crossbar, and VMMs are performed as the corresponding inputs are fed in each cycle, which demonstrates the practicality and efficiency of neuromorphic computation with RRAM.
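The mapping of Figure 4 can be emulated in software with an im2col-style unrolling: each F×F×C kernel becomes one crossbar column, and each sliding-window patch of the input feature map is unrolled into the input vector fed in one cycle. The sketch below is illustrative only and assumes stride 1 with no padding; `ifm`, `kernels`, and `crossbar` are made-up names.

```python
import numpy as np

W, C, F, N = 6, 3, 3, 4                      # input width, channels, kernel size, number of kernels
O = W - F + 1                                # output width (stride 1, no padding)

ifm = np.random.rand(W, W, C)                # W x W x C input feature map
kernels = np.random.rand(N, F, F, C)         # N kernels of size F x F x C

# Each kernel is unrolled into one column of the crossbar (F*F*C rows, N columns).
crossbar = kernels.reshape(N, -1).T

ofm = np.zeros((O, O, N))
for r in range(O):                           # one sliding-window position per cycle
    for c in range(O):
        patch = ifm[r:r + F, c:c + F, :].reshape(-1)     # unrolled inputs for this cycle
        ofm[r, c, :] = patch @ crossbar                   # one crossbar VMM -> N outputs

# Reference check against a direct convolution.
ref = np.array([[[np.sum(ifm[r:r+F, c:c+F, :] * kernels[n]) for n in range(N)]
                 for c in range(O)] for r in range(O)])
assert np.allclose(ofm, ref)
```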

Figure 4.

Weight mapping method from (a) a convolutional computation to (b) RRAM XB with the corresponding IFM in each cycle.

A more advanced RRAM crossbar structure, such as the 1-transistor-1-RRAM (1T1R) configuration, can prevent crosstalk between adjacent cells, enabling large-scale arrays. As depicted in Figure 5(a), this configuration enables precise control of each row of memristors with wordlines (WLs), which control the gates of the transistors. Input vector voltages are applied to the bitlines (BLs), to which the top electrodes of the RRAM devices are connected. The weighted sum currents are read out through the select lines (SLs). However, this conventional configuration does not support executing the weighted sums of all columns in parallel. This limitation can be tackled by rotating the BLs by 90°, as illustrated in Figure 5(b), thereby reproducing the matrix multiplication behavior shown in Figure 4.

Figure 5.

Transformation from (a) conventional 1T1R array to (b) the suitable crossbar array.


4. RRAM multilevel characteristics

An essential characteristic allowing RRAM to perform neuromorphic computation is its capability to store multiple resistance states, which significantly enhances the memory storage density. RRAM devices with multiple resistance states are called multilevel cells (MLCs). There are two primary methods to achieve multilevel RRAMs:

The first is to achieve one HRS and multiple LRS states by altering the magnitude of Icc. As previously mentioned, Icc determines the strength of the conductive filament; a higher Icc results in a stronger filament and thus a lower resistance. Therefore, several distinct LRS states can be achieved by controlling Icc through the gate voltage. The second is to achieve one LRS and several HRS states by altering the magnitude of the reset voltage with a constant Icc. A stronger reset voltage leads to greater dissolution of the filament, increasing the gap between the filament and the electrode, which increases the resistance of the HRS. Therefore, several HRS states can be obtained by varying the reset voltage.

The inherent variability of RRAM, caused by the stochastic nature of filament formation during the RS procedure, presents a reliability challenge for RRAM-based DNN accelerators. The variability within a resistance level can be modeled using a normal or log-normal distribution. As illustrated in Figure 6, this variability leads to overlapping regions between consecutive resistance levels, which can cause computational errors. MLCs with more resistance levels are more prone to variability issues. Therefore, precise control over the different resistance states is crucial for achieving reliable operation with RRAM devices.
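As a rough illustration of how such overlaps arise, the sketch below samples the read-out conductance of a multilevel cell under log-normal variation and estimates how much probability mass falls on the wrong side of the midpoint between adjacent levels. The level values and spread are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

levels = np.array([20e-6, 40e-6, 60e-6, 80e-6, 100e-6])  # 5 nominal conductance levels (S), illustrative
sigma = 0.08                                              # log-normal spread per level, illustrative

samples = {g: g * rng.lognormal(mean=0.0, sigma=sigma, size=100_000) for g in levels}

# Tail probability mass on the wrong side of the midpoint between two adjacent levels.
for g_lo, g_hi in zip(levels[:-1], levels[1:]):
    threshold = 0.5 * (g_lo + g_hi)
    overlap = np.mean(samples[g_lo] > threshold) + np.mean(samples[g_hi] < threshold)
    print(f"levels {g_lo:.0e}/{g_hi:.0e} S: overlapping tail mass ~ {overlap:.4f}")
```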

Figure 6.

Probability distribution of an RRAM device with five conductance values.


5. RRAM-based DNN accelerators

An RRAM-based DNN accelerator, as illustrated in Figure 7, consists of numerous tiles, each containing multiple processing elements (PEs), functional units necessary for DNN computations, and embedded DRAM (eDRAM) for storing input data. Within a PE, there are multiple RRAM crossbars (XBs) for performing VMMs, supported by digital-to-analog converters (DACs), analog-to-digital converters (ADCs), input/output registers, and shift-and-add (S&A) units. Due to the bit-level decomposition of inputs and weights, S&A units are responsible for assembling the outputs based on the bit positions of both inputs and weights.

Figure 7.

Hierarchical architecture of an RRAM-based DNN accelerator.

VMM can be executed time-efficiently through RRAM-based crossbar structure thanks to its analog behavior. However, performing DNN computations necessitates converting the digital input data to analog voltage. Similarly, to obtain the output, the current measured in the bitline should also be converted back to digital data. Therefore, the inclusion of DACs and ADCs in an RRAM-based DNN Accelerator is essential. To better measure the currents for feeding into ADCs, sample-and-hold (S&H) circuits are usually utilized.

Although necessary, the presence of ADCs and DACs degrades the performance of the DNN accelerator due to their high energy consumption and area requirements. Previous studies have shown that ADCs and DACs account for over 60% of the energy consumption and 30% of the area. Furthermore, the energy consumption of DACs and ADCs scales exponentially with resolution.

To reduce the resolution of the DACs, a common approach is to bit-slice the input, feeding it into the XBs one or two bits at a time and then integrating the results with S&A operations. Bit-slicing also contributes to reducing the ADC resolution; to lower it further, lower-resolution RRAM devices can be used. The ADC resolution A can be calculated as $A = \log_2(R \times v \times w)$, where R is the number of rows in an XB, v is the number of input bits fed per cycle, and w is the RRAM cell resolution in bits.

For example, assuming a 128×128 XB is used and both weights and inputs are quantized to 8 bits with each RRAM device representing a 2-bit weight slice, the output currents captured in this case require 8-bit resolution ADCs. If the input is sliced into 1-bit increments, 1-bit resolution DACs are used, but 8 cycles are required to perform a complete computation due to the bit-slicing of the 8-bit input data.
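The bookkeeping in this example can be written out directly. The sketch below reproduces the 128-row case using the expression A = log2(R × v × w) exactly as stated above; all variable names are illustrative.

```python
import math

def adc_resolution(rows: int, input_bits_per_cycle: int, cell_bits: int) -> int:
    # A = log2(R * v * w), the expression used in the text.
    return math.ceil(math.log2(rows * input_bits_per_cycle * cell_bits))

R = 128                                          # crossbar rows
input_bits, weight_bits, cell_bits = 8, 8, 2     # 8-bit inputs/weights, 2-bit RRAM cells
slice_width = 1                                  # 1-bit input slicing -> 1-bit DACs

cells_per_weight = weight_bits // cell_bits      # 4 devices hold one 8-bit weight
cycles = input_bits // slice_width               # 8 cycles to stream one 8-bit input

print(adc_resolution(R, slice_width, cell_bits), cells_per_weight, cycles)   # -> 8 4 8
```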

Besides the above-mentioned approaches, McDanel et al. [2] proposed term quantization (TQ), which imposes a limit on the maximum analog value for each BL to reduce the resolution requirement for ADCs. Specifically, TQ limits the number of nonzero power-of-two terms across a group of values. This approach leverages bit-level sparsity, allowing the replacement of 6-bit ADCs with six 3-bit ADCs, thereby significantly reducing both the area and power consumption.

$$\begin{bmatrix} Q_1 & Q_2 \end{bmatrix} = \begin{bmatrix} V_1 & V_2 \end{bmatrix} \begin{bmatrix} W_{11} & W_{12} \\ W_{21} & W_{22} \end{bmatrix} \tag{2}$$

To better comprehend the VMM operation on an RRAM-based XB, let us consider an example with 2-bit input data, 4-bit weights, and 1-bit cell resolution, as shown in Figure 8. The XB size is 2 by 8, with 4 weights mapped onto it. We aim to perform the VMM operation presented in Eq. (2) using 1-bit input bit-slicing.

Figure 8.

Example of a VMM operation in RRAM XB.

A key detail to note is that conductance is always a continuous positive value, whereas weights in neural networks can be both positive and negative. To address this, it is common to use two XBs: one to represent positive values and one to represent negative values. The final result is then obtained by subtracting the output of the negative XB from the output of the positive XB.
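A minimal sketch of this differential mapping: positive weights are placed on one crossbar, the magnitudes of negative weights on a second one, and the digital result is the difference of the two readouts. Array names are illustrative.

```python
import numpy as np

weights = np.array([[ 0.4, -0.2],
                    [-0.7,  0.5]])            # signed weights (illustrative)
inputs = np.array([0.3, 0.9])

g_pos = np.clip(weights, 0, None)             # positive crossbar: positive weights, 0 elsewhere
g_neg = np.clip(-weights, 0, None)            # negative crossbar: |negative weights|, 0 elsewhere

outputs = inputs @ g_pos - inputs @ g_neg     # subtract the negative-XB output from the positive-XB output
assert np.allclose(outputs, inputs @ weights)
```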

In the example in Figure 8, $W_{ij}^{m}$ denotes the m-th bit of the weight, and $V_i^{n}$ denotes the n-th bit of the input. It requires 2 cycles to process the entire input, typically starting from the most-significant bit (MSB) and proceeding to the least-significant bit (LSB) in the input bit-slicing order. This way, when the partial result ($I_j^{b_n}$) obtained in each cycle is accumulated and stored in the output buffer, it only needs to be shifted by x bits per cycle, where x is the increment in input bit-slicing. In this case, x is 1.

To generalize, assume the input is quantized to N×x bits, and each cycle feeds an x-bit increment of the input into the XB. Over N cycles, partial results $I_j^{b_N}$ to $I_j^{b_1}$ are generated. The final result $Q_j$ can be obtained using Eq. (3), which sums up the partial results after appropriate bit shifts.

$$Q_j = \sum_{n=1}^{N} 2^{x(n-1)}\, I_j^{b_n} = \Big( \cdots \big( ( I_j^{b_N} \ll x ) + I_j^{b_{N-1}} \big) \ll x + I_j^{b_{N-2}} \Big) \ll x + \cdots + I_j^{b_1} \tag{3}$$

To understand how $I_j^{b_n}$ is generated, let us first consider the operations within a single cycle. In the first cycle, the 1st bit of the input is fed into the XB, and 8 outputs are generated. These outputs, $I_j^{0}$ to $I_j^{3}$, represent partial results, and they are passed to the S&A unit, where each is shifted according to its bit significance: $I_j^{1}$ is shifted by 1 bit ($\times 2$), $I_j^{2}$ by 2 bits ($\times 2^2$), and $I_j^{3}$ by 3 bits ($\times 2^3$), to obtain $I_j^{b_1}$.

To generalize, $I_j^{b_n}$ can be obtained with Eq. (4). Assume each weight is represented by M×c bits, where M is the number of cells and c is the cell resolution in bits. Therefore, the m-th cell (bit-slice) of the weight is shifted by c×m bits. The bitline current ($I_j^{m}$) is accumulated across all enabled rows, say R rows, in accordance with Ohm’s law.

$$I_j^{b_n} = \sum_{m=0}^{M-1} 2^{cm}\, I_j^{m} = \sum_{m=0}^{M-1} 2^{cm} \sum_{i=1}^{R} V_i^{n} \times W_{ij}^{m} \tag{4}$$
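Putting Eqs. (3) and (4) together, the sketch below emulates a bit-sliced VMM in NumPy: inputs are streamed MSB-first in x-bit slices, each weight is spread over M cells of c bits, and the shift-and-add logic reassembles the full-precision result. The helper names (`slices`, `crossbar_vmm`) are invented for this illustration; the small test matches the 2-bit-input, 4-bit-weight setting of Figure 8.

```python
import numpy as np

def slices(value: int, width: int, num: int):
    """Split an unsigned integer into `num` slices of `width` bits, MSB slice first."""
    return [(value >> (width * k)) & ((1 << width) - 1) for k in range(num - 1, -1, -1)]

def crossbar_vmm(inputs, weights, x=1, c=1, in_bits=2, w_bits=4):
    N, M = in_bits // x, w_bits // c                  # input slices per value, cells per weight
    R, J = len(inputs), len(weights[0])               # rows and number of weights per row

    # Eq. (4): each weight bit-slice W_ij^m sits in its own cell (LSB slice at m = 0).
    w_cells = np.array([[slices(weights[i][j], c, M)[::-1] for j in range(J)] for i in range(R)])

    Q = np.zeros(J, dtype=np.int64)
    for n_slice in range(N):                          # one input slice per cycle, MSB first (Eq. (3))
        v = np.array([slices(inputs[i], x, N)[n_slice] for i in range(R)])
        I_jb = np.zeros(J, dtype=np.int64)
        for m in range(M):                            # shift-and-add over weight cells (Eq. (4))
            I_jm = v @ w_cells[:, :, m]               # bitline currents of the m-th cell columns
            I_jb += I_jm << (c * m)
        Q = (Q << x) + I_jb                           # accumulate cycles with an x-bit shift (Eq. (3))
    return Q

inp = [2, 3]                                          # 2-bit inputs
wts = [[5, 9], [12, 3]]                               # 4-bit weights, a 2x2 matrix as in Figure 8
assert np.array_equal(crossbar_vmm(inp, wts), np.array(inp) @ np.array(wts))
```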

Increasing the number of input bits fed per cycle and the resolution of each RRAM device reduces, respectively, the number of cycles required for one computation and the number of computations, but it exponentially increases the energy consumption due to the higher resolution requirements for the DACs and ADCs. Furthermore, MLC RRAM devices with higher resolution are more susceptible to variability, leading to computational errors. This trade-off between resolution, energy consumption, and variability must be carefully considered in the design of RRAM-based DNN accelerators.

IR-drop, or parasitic voltage drop, is another critical concern for the reliability of RRAM-based DNN accelerators, and its impact increases with the XB size. As technology advances, the metal wires on a chip shrink, resulting in an increase in resistance per unit length. Because of the IR-drop, the voltage distribution within the crossbar becomes non-uniform, causing the input data sent to the RRAM cells on the same row to differ. Moreover, it also causes deviations in the output column currents of the crossbar.

The voltages of the top and bottom electrodes of each RRAM device can be represented by Eqs. (5) and (6), respectively, where $V_{i,j}$ denotes the top electrode voltage, $V'_{i,j}$ denotes the bottom electrode voltage, and r is the wire resistance, which is assumed to be identical over the whole crossbar.

$$V_{i,j} = V_{i,j-1} - r \cdot \sum_{k=j}^{N} G_{i,k}\left(V_{i,k} - V'_{i,k}\right) \tag{5}$$
$$V'_{i,j} = V'_{i-1,j} - r \cdot \sum_{k=1}^{i-1} G_{k,j}\left(V_{k,j} - V'_{k,j}\right) \tag{6}$$

Eqs. (5) and (6) can be easily derived by referring to Figure 9. According to Kirchhoff’s current law, the current along row i can be represented as the sum of the currents split into the columns (Eq. (7)), each of which is the voltage drop across the conductance times the conductance value.

Figure 9.

Simplified 3×3 RRAM crossbar with IR-drop.

$$I_i = I_{i,1} + I_{i,2} + I_{i,3} = \sum_{j=1}^{3} I_{i,j} = \sum_{j=1}^{3} \Delta V_{i,j}\, G_{i,j} = \sum_{j=1}^{3} \left(V_{i,j} - V'_{i,j}\right) G_{i,j} \tag{7}$$

The top electrode voltage $V_{i,j}$ can be derived by subtracting the voltage drop across the wire resistance from the top electrode potential of the previous column. Similarly, the bottom electrode voltage $V'_{i,j}$ can be derived by subtracting the voltage drop across the wire resistance from the bottom electrode potential of the previous row. The following are examples of deriving the voltages of the top and bottom electrodes of $G_{3,2}$.

$$\begin{aligned}
V_{3,2} &= V_{3,1} - \Delta V = V_{3,1} - r\cdot\left(I_{3} - I_{3,1}\right) = V_{3,1} - r\cdot\left(I_{3,2} + I_{3,3}\right) = V_{3,1} - r\cdot\sum_{j=2}^{3}\left(V_{3,j} - V'_{3,j}\right)\cdot G_{3,j} \\
V'_{3,2} &= V'_{2,2} - r\cdot\left(I_{1,2} + I_{2,2}\right) = V'_{2,2} - r\cdot\sum_{i=1}^{2}\left(V_{i,2} - V'_{i,2}\right)\cdot G_{i,2}
\end{aligned} \tag{8}$$
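Eqs. (5)-(7) can be solved numerically with a simple fixed-point iteration, as in the illustrative sketch below; the boundary conditions (input voltages applied at the row drivers, the bottom-electrode reference taken as 0 V at the first row) and all parameter values are assumptions made for the example.

```python
import numpy as np

def solve_ir_drop(v_in, G, r, iters=100):
    """Fixed-point solution of Eqs. (5)-(6) for a small crossbar (illustrative model)."""
    R, N = G.shape
    V = np.tile(v_in[:, None], (1, N))        # top-electrode voltages, initialized to the inputs
    Vb = np.zeros((R, N))                     # bottom-electrode voltages, initialized to 0
    for _ in range(iters):
        for i in range(R):
            for j in range(N):
                left = v_in[i] if j == 0 else V[i, j - 1]
                V[i, j] = left - r * np.sum(G[i, j:] * (V[i, j:] - Vb[i, j:]))      # Eq. (5)
                up = 0.0 if i == 0 else Vb[i - 1, j]
                Vb[i, j] = up - r * np.sum(G[:i, j] * (V[:i, j] - Vb[:i, j]))       # Eq. (6)
    return V, Vb

rng = np.random.default_rng(1)
G = rng.uniform(1e-5, 1e-4, size=(3, 3))      # 3x3 crossbar conductances (S), illustrative
v_in = np.array([0.3, 0.2, 0.4])              # input voltages
r = 5.0                                       # wire resistance per segment (ohm), illustrative

V, Vb = solve_ir_drop(v_in, G, r)
ideal = v_in @ G                              # column currents without IR-drop
actual = np.sum((V - Vb) * G, axis=0)         # column currents with IR-drop
print(ideal, actual)                          # the deviation grows with r and with the array size
```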

To mitigate the impact of IR-drop on the accuracy, Huang et al. [3] proposed to alleviate the current and voltage deviation by adding an additional tunable RRAM row. This approach adaptively compensates for current differences by tuning the input voltage and conductance of the RRAM cells in the additional row. The tunable RRAM cells adjust their resistance to balance the current flow, thereby reducing the discrepancies caused by IR-drop and improving the overall accuracy of computations in the crossbar array.

Taking into account the resolution limitations of the DACs and ADCs as well as the impact of IR-drop in the design of RRAM-based DNN accelerators, operation-unit (OU)-based or block-based computation can be adopted [4]. An example of OU-based computation is shown in Figure 10. OU-based computation splits the XB into multiple OUs, each of which can be activated individually at a time with adjustable sizes. Even though this approach requires more cycles to complete one computation, it significantly reduces the resolution requirement and power consumption of the ADCs and helps mitigate the voltage inconsistencies across the crossbar. To compensate for the potential performance degradation caused by the increased number of cycles, pipelining techniques can be employed in OU-based computation to enhance the overall efficiency and throughput of RRAM-based DNN accelerators.
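A small sketch of OU-based computation, under the simplifying assumption of an otherwise ideal crossbar: the array is split into operation units that are activated one per cycle, and the digitally accumulated partial sums reproduce the full VMM while each ADC conversion only ever sees a few rows. Function and variable names are illustrative.

```python
import numpy as np

def ou_based_vmm(v, G, ou_rows=2, ou_cols=2):
    """VMM computed one operation unit (OU) at a time; partial sums are accumulated digitally."""
    R, N = G.shape
    out = np.zeros(N)
    cycles = 0
    for r0 in range(0, R, ou_rows):                 # activate one OU per cycle
        for c0 in range(0, N, ou_cols):
            v_blk = v[r0:r0 + ou_rows]
            G_blk = G[r0:r0 + ou_rows, c0:c0 + ou_cols]
            out[c0:c0 + ou_cols] += v_blk @ G_blk   # each ADC only ever sees `ou_rows` summed currents
            cycles += 1
    return out, cycles

rng = np.random.default_rng(2)
G = rng.uniform(1e-5, 1e-4, size=(8, 8))
v = rng.uniform(0, 1, size=8)

out, cycles = ou_based_vmm(v, G, ou_rows=2, ou_cols=4)
assert np.allclose(out, v @ G)                      # same result, more cycles, lower ADC resolution
print(cycles)                                       # (8/2) * (8/4) = 8 cycles instead of 1
```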

Figure 10.

OU-based RRAM architecture.


6. Real world AI applications with RRAM-based accelerators

In the previous sections, we detailed how RRAM-based accelerators efficiently perform VMM operations in O(1) time complexity. As mentioned in Section 3, VMM operations are fundamental to DNNs, which are mainly composed of convolutional (CONV) layers and fully connected (FC) layers. As illustrated in Figure 4, convolutional operations can be easily executed through RRAM XBs by sequentially providing the corresponding input across the sliding window in the input feature map (IFM). Similarly, FC layer operations follow the same procedure, with the primary difference being the size of the kernels, which match the IFM size, as shown in Figure 11.

Figure 11.

(a) A fully connected layer (FC) and (b) a convolutional layer (CONV).

Convolutional neural networks (CNNs), a prevalent form of DNNs, excel in object recognition tasks, including image classification [5], speech recognition [6], and self-play games [7, 8]. A CNN typically consists of five [5] to a thousand [9] CONV layers, each performing high-dimensional convolutions. One to three FC layers follow the CONV layers to learn the nonlinear combinations of high-level features for classification.

Aside from CONV layers and FC layers, various optional layers can be included in a CNN, such as nonlinearity, normalization, and pooling. These optional layers are usually carried out in nonlinear function units rather than in RRAM XBs in RRAM-based DNN accelerators [10].

6.1 Training and inference

We discussed performing VMM operations on RRAM-based DNN accelerators in previous sections, which are fundamental for the inference phase. Typically, these DNN models are trained in software before being programmed onto the RRAM XBs, allowing for immediate inference. However, there are scenarios where on-device learning is necessary to enhance performance. For instance, in environments with continuous data streams, incremental learning supports real-time inference more effectively. Additionally, classification performance can significantly degrade if a pre-trained model is directly programmed onto RRAM XBs due to defects in RRAM cells [11].

Small-scale RRAM XBs have been successful in simple pattern recognition tasks [11, 12]. However, in-situ training on RRAM XBs for larger-scale classification tasks faces challenges. The stochastic nature of the device introduces variability and abrupt switching during the set operation [12], though these challenges can be mitigated to a certain extent through material advancements, such as utilizing an HfAlyOx switching layer, as discussed in the next section. Another obstacle for larger-scale DNN applications on RRAM XBs is the much higher power consumption of online learning compared to offline learning [13]. Inference typically uses low-resolution ADCs [14], while training requires high-resolution ADCs, ranging from 13-bit [15] to 16-bit [11] resolution. To address this, Yeo et al. [16] proposed replacing ADCs with 1-bit dynamic comparators, providing 1-bit A/D conversion by comparing the output voltage of the integrator, which converts the summed current of the RRAM XBs to a voltage, with a reference voltage.

Endurance is another critical reliability issue of RRAM, with endurance ranging from 10^5 to 10^7 cycles, potentially limiting its use for in-situ training. However, weight updates during training only require weak programming pulses to incrementally change the conductance. Zhao et al. [17] demonstrated that RRAM cells remain operational even after applying over 10^11 update programming pulses.

The training process of neural networks involves backpropagation, which includes computing weight gradients and updates. This process is illustrated with a simplified 3-layer network example in Figure 12, where the nodes use a sigmoid function as the activation function (AF), as shown in Eq. (9), with the activated output of node i denoted as $node_{fi}$ (e.g., $h_{f1}$, $y_{f1}$). The derivative of the sigmoid function, necessary for backpropagation, is shown in Eq. (10).

Figure 12.

Learning process of a simple 3-layer feedforward network.

$$AF(x) = \frac{1}{1 + \exp(-x)} \tag{9}$$
$$\frac{\partial}{\partial x}AF(x) = \frac{\partial}{\partial x}\left(1 + \exp(-x)\right)^{-1} = -\left(1 + \exp(-x)\right)^{-2}\,\frac{\partial}{\partial x}\left(1 + \exp(-x)\right) = \frac{\exp(-x)}{\left(1 + \exp(-x)\right)^{2}}$$
$$= \frac{1}{1 + \exp(-x)}\cdot\frac{\exp(-x)}{1 + \exp(-x)} = \frac{1}{1 + \exp(-x)}\cdot\left(1 - \frac{1}{1 + \exp(-x)}\right) = AF(x)\cdot\left(1 - AF(x)\right) \tag{10}$$

To update $w_1$, the weight gradient is derived as in Eq. (11), where $\partial h_{f1}/\partial h_1$ and $\partial y_{f1}/\partial y_1$ are derivatives of the sigmoid function (Eq. (10)), and $\partial E/\partial y_{f1}$ is the derivative of the mean squared error (MSE) loss function.

$$\frac{\partial E}{\partial w_1} = \frac{\partial h_1}{\partial w_1}\times\frac{\partial h_{f1}}{\partial h_1}\times\frac{\partial y_1}{\partial h_{f1}}\times\frac{\partial y_{f1}}{\partial y_1}\times\frac{\partial E}{\partial y_{f1}} = x_1 \times h_{f1}\left(1 - h_{f1}\right)\times w_5 \times y_{f1}\left(1 - y_{f1}\right)\times\left(y_{f1} - y_{\text{target}}\right) \tag{11}$$

Similarly, the weight gradients of $w_2$ to $w_4$ are derived as follows, where the shared factors correspond to those in Eq. (11).

$$\frac{\partial E}{\partial w_2} = \frac{\partial h_1}{\partial w_2}\times\frac{\partial h_{f1}}{\partial h_1}\times\frac{\partial y_1}{\partial h_{f1}}\times\frac{\partial y_{f1}}{\partial y_1}\times\frac{\partial E}{\partial y_{f1}} = x_2 \times h_{f1}\left(1 - h_{f1}\right)\times w_5 \times y_{f1}\left(1 - y_{f1}\right)\times\left(y_{f1} - y_{\text{target}}\right) \tag{12}$$
$$\frac{\partial E}{\partial w_3} = \frac{\partial h_2}{\partial w_3}\times\frac{\partial h_{f2}}{\partial h_2}\times\frac{\partial y_1}{\partial h_{f2}}\times\frac{\partial y_{f1}}{\partial y_1}\times\frac{\partial E}{\partial y_{f1}} = x_1 \times h_{f2}\left(1 - h_{f2}\right)\times w_6 \times y_{f1}\left(1 - y_{f1}\right)\times\left(y_{f1} - y_{\text{target}}\right) \tag{13}$$
$$\frac{\partial E}{\partial w_4} = \frac{\partial h_2}{\partial w_4}\times\frac{\partial h_{f2}}{\partial h_2}\times\frac{\partial y_1}{\partial h_{f2}}\times\frac{\partial y_{f1}}{\partial y_1}\times\frac{\partial E}{\partial y_{f1}} = x_2 \times h_{f2}\left(1 - h_{f2}\right)\times w_6 \times y_{f1}\left(1 - y_{f1}\right)\times\left(y_{f1} - y_{\text{target}}\right) \tag{14}$$

For the last-layer weights $w_5$ and $w_6$, the derivatives are simpler, as shown in Eqs. (15) and (16).

$$\frac{\partial E}{\partial w_5} = \frac{\partial y_1}{\partial w_5}\times\frac{\partial y_{f1}}{\partial y_1}\times\frac{\partial E}{\partial y_{f1}} = h_{f1}\times y_{f1}\left(1 - y_{f1}\right)\times\left(y_{f1} - y_{\text{target}}\right) = 0.0409 \tag{15}$$
$$\frac{\partial E}{\partial w_6} = \frac{\partial y_1}{\partial w_6}\times\frac{\partial y_{f1}}{\partial y_1}\times\frac{\partial E}{\partial y_{f1}} = h_{f2}\times y_{f1}\left(1 - y_{f1}\right)\times\left(y_{f1} - y_{\text{target}}\right) = 0.04107 \tag{16}$$

Extending weight updates to conductance updates in RRAM XBs can be represented by Eqs. (17)-(20), where $P_L^n$ is the output of the L-th layer for the n-th image in a batch of size B, $T^n$ is the corresponding target output, and η is the learning rate. $P_{L-1}^n$ in Eq. (18) indicates the input of the L-th layer (the output of the (L−1)-th layer), derived from $\partial P_L/\partial G$, corresponding to $\partial h_i/\partial w_j$ in Eqs. (11)-(14) and $\partial y_1/\partial w_j$ in Eqs. (15) and (16). Eqs. (19) and (20) represent the output error of the L-th layer and the last layer, respectively.

$$G_{\text{new}}^{L} = G_{\text{old}}^{L} - \Delta G^{L} \tag{17}$$
$$\Delta G^{L} = \eta \sum_{n=1}^{B} P_{L-1}^{n}\, \delta_{L}^{n} \tag{18}$$
$$\delta_{L}^{n} = P_{L}^{n}\left(1 - P_{L}^{n}\right) G^{L}\, \delta_{L+1}^{n} \tag{19}$$
$$\delta_{L}^{n} = P_{L}^{n}\left(1 - P_{L}^{n}\right)\cdot\left(P_{L}^{n} - T^{n}\right) \tag{20}$$

In Yeo et al.’s implementation [13], $\Delta G^{L}$ is obtained through software and then converted into a number of programming pulses via an FPGA. With these programming pulses, each RRAM conductance is updated.
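A minimal sketch of Eqs. (17)-(20) for a tiny two-layer sigmoid network is given below, written in matrix form over a batch; the conversion of ΔG into pulse counts assumes a fixed conductance change per pulse (`g_step`), which is an illustrative simplification of the FPGA-based conversion described above. All sizes and values are made up for the example.

```python
import numpy as np

def af(x):                                   # sigmoid activation, Eq. (9)
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(3)
B, d_in, d_hid, d_out = 4, 3, 5, 2           # batch size and layer widths (illustrative)
eta = 0.1                                    # learning rate
g_step = 1e-3                                # assumed conductance change per programming pulse

X = rng.uniform(0, 1, (B, d_in))             # batch of inputs
T = rng.uniform(0, 1, (B, d_out))            # targets
G1 = rng.uniform(0, 0.5, (d_in, d_hid))      # conductance (weight) matrices of the two layers
G2 = rng.uniform(0, 0.5, (d_hid, d_out))

P1 = af(X @ G1)                              # layer outputs P_L
P2 = af(P1 @ G2)

delta2 = P2 * (1 - P2) * (P2 - T)            # Eq. (20): error of the last layer
delta1 = P1 * (1 - P1) * (delta2 @ G2.T)     # Eq. (19): error propagated back through G2 (matrix form)

dG2 = eta * P1.T @ delta2                    # Eq. (18): summed over the batch
dG1 = eta * X.T @ delta1

G2 -= dG2                                    # Eq. (17)
G1 -= dG1

pulses = np.rint(dG1 / g_step).astype(int)   # software DeltaG translated into programming-pulse counts
print(pulses)
```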

By addressing these challenges and implementing innovative solutions, RRAM-based accelerators can potentially support both training and inference for large-scale DNN applications, making them a viable option for real-time, edge-based AI computations.


7. Challenges in RRAM-based computation

VMM computations can be executed in O(1) time complexity on an RRAM crossbar, but performing accurate computations poses several challenges. Some of these challenges and their corresponding solutions have been briefly discussed earlier, including the high energy and area requirements for DACs and ADCs, as well as the impacts of IR-drop. Variability and stuck-at-faults are also crucial problems affecting the reliability of RRAM-based computation.

Variability, or variation, can be categorized into temporal (cycle-to-cycle, C2C) and spatial (device-to-device, D2D) variations. As previously discussed, the stochastic nature of RRAM filaments is prone to variability, leading to different final states after each set and reset process within a single RRAM device, known as C2C variation. D2D variation, on the other hand, refers to the variability among different RRAM devices, where two devices may reach different final states even if the same programming pulse is applied. This is due to imperfect fabrication, leading to non-uniform switching-film thickness, electrode surface roughness, etc.

Variation issues can be addressed at both the device level and the algorithm level. At the device level, a wide variety of materials and material engineering methods are being investigated to enhance the uniformity of RRAM devices. For example, embedding a thin Al buffer layer has been found to stabilize the oxygen vacancies in HfO2 films, which contributes to a more uniform set voltage and resistance distribution [18]. Programming techniques also significantly affect variability. For instance, the conductance spread can be reduced by increasing the gate voltage instead of the top electrode voltage when programming a resistance state within an RRAM device [19].

While device-level techniques improve the variability of RRAM devices, leading to more consistent and mature memory, algorithm-level approaches are also essential for ensuring the reliability of RRAM-based computations. Unary-encoding has been proposed to mitigate the effects of variation on the most-significant bit (MSB) of a weight by equalizing the significance of each bit [20]. Leveraging the inherent redundancy within DNNs also proves to be effective against variation. Fritscher et al. [21] proposed three approaches: fault-aware training, the use of dropout layers, and the insertion of redundancy, to train DNNs that are less susceptible to variations.
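As a concrete illustration of this family of techniques (and not the exact recipe of [21]), the sketch below trains a tiny linear model while injecting multiplicative log-normal noise into the weights during the forward pass, so that the learned weights remain accurate when deployed on a variability-prone crossbar. All values and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)

def noisy(W, sigma=0.1):
    """Emulate D2D/C2C conductance variation with multiplicative log-normal noise."""
    return W * rng.lognormal(mean=0.0, sigma=sigma, size=W.shape)

# Tiny linear regressor trained with noise injected in the forward pass.
X = rng.uniform(-1, 1, (256, 8))
y = X @ rng.uniform(-1, 1, (8, 1))
W = np.zeros((8, 1))
lr = 0.05

for _ in range(500):
    W_dev = noisy(W)                     # weights as the (noisy) crossbar would realize them
    err = X @ W_dev - y
    W -= lr * X.T @ err / len(X)         # straight-through update: noisy-forward gradient applied to W

print(np.mean((X @ noisy(W) - y) ** 2))  # loss evaluated under variation stays low
```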

Stuck-at-faults (SAFs) can occur due to immature fabrication. As reported by Chen et al. [22], SAFs frequently affect RRAM cells, with around 10% of RRAM cells exhibiting these faults; specifically, 9.04% were SA1 and 1.75% were SA0 in a fabricated 4-Mb HfO2-based RRAM test chip. When an RRAM cell is stuck at LRS, such that the conductive filament cannot be dissolved by the reset voltage, it is referred to as stuck-at-zero (SA0). SA0 can occur due to over-forming, which results in temporarily failed cells at excessively low resistance states; these cells require a stronger reset voltage, but their operation region may be mismatched with the capabilities of the write circuit [23]. However, 60% of these temporarily failed cells are capable of recovering from over-forming to normal cells, which leaves the real physical mechanism governing SA0 unclear so far [22]. Conversely, stuck-at-one (SA1) occurs when an RRAM cell is stuck at HRS. This fault is due to broken WLs or permanently open switches in RRAM XBs, which result in no current flow, mimicking an HRS regardless of the actual state of the cell.

To address the over-forming issue, Shih et al. [23] proposed including an additional sequence of write operations after the initial forming process, which aims to concentrate the reset and set states of cells into a range close to the margins of the normal HRS and LRS regions. Although this technique can help repair some cells, unrepairable cells remain. Several algorithm-level approaches have been proposed to tackle the SAF problem. These approaches generally fall into three categories: 1. (re-)training-based approaches, 2. error-correction-based approaches, and 3. (re-)mapping-based approaches.

(Re-)Training-based methods typically include modifying the objective function, injecting disturbances during training, or retraining the unmapped layers [24, 25]. The primary challenge with training-based approaches is the high runtime required and the potential difficulty in converging to high accuracy. Error-correction-based approaches involve measuring the defective outputs and performing post-processing computations to compensate for the errors. This can include adding additional RRAM rows to compensate for the erroneous outputs or using alternative weight values to represent the original values [26, 27, 28]. However, these methods induce additional hardware overhead. (Re-)Mapping-based methods typically involve exchanging defective devices with normal devices through column/row swapping or cell swapping [24, 25, 27]. While effective, these approaches also require additional peripheral hardware to manage the reordered inputs or outputs. Each of these strategies presents its own set of advantages and challenges, and often a combination of these methods is employed to ensure reliable operation in RRAM-based systems.


8. RRAM programming

To program an RRAM device, both set and reset processes are employed to modulate the resistance state. As illustrated in Figure 13, during the set operation, a set voltage is applied to the BL terminal of the cell, causing current to flow toward the SL terminal. During the reset operation, a reset voltage is applied to the SL terminal of the cell, with current flowing toward the BL terminal.

Figure 13.

(a) 1T1R RRAM crossbar (b) Set (left) and reset (right) operations.

Various programming techniques have been developed. Gao et al. [29] suggested incremental step pulse programming (ISPP), where the amplitude or width of VBL or VSL is incrementally increased with a fixed gate voltage (Vg), as illustrated in Figure 14(a). Alternatively, Chen et al. [30] proposed incremental gate voltage programming (IGVP), which incrementally increases Vg while keeping VBL and VSL constant, as illustrated in Figure 14(b). To precisely tune the resistance to the target value, a write-and-verify programming scheme has been adopted. Specifically, after each programming pulse, the device conductance is read out with a large Vg to minimize the voltage drop across the transistor, which allows the programming state to be monitored, ensuring that the resistance is adjusted toward the desired target value. If the conductance has reached the target conductance within a defined tolerance margin, the programming procedure stops. Otherwise, an additional pulse with an incremental voltage step is applied.
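The write-and-verify loop can be sketched as below, using an invented toy device response (`apply_set_pulse`) rather than a physical model; it only illustrates the program-read-compare flow of ISPP-style tuning, not the circuits of [29, 30].

```python
import numpy as np

rng = np.random.default_rng(5)

def apply_set_pulse(g, v_pulse):
    """Toy device response: conductance rises with pulse voltage, with C2C randomness (illustrative)."""
    return g + 2e-6 * max(v_pulse - 0.8, 0.0) * rng.lognormal(0.0, 0.3)

def write_and_verify(g_target, tol=0.05, v_start=1.0, v_step=0.05, v_max=3.0):
    g, v, pulses = 1e-6, v_start, 0
    # Stop once the read-back conductance is within the tolerance margin of the target
    # (in hardware, the read uses a large Vg to minimize the transistor voltage drop).
    while g < g_target * (1 - tol) and v <= v_max:
        g = apply_set_pulse(g, v)        # program: one incremental-step pulse
        v += v_step                      # increase the pulse amplitude for the next attempt
        pulses += 1
    return g, pulses

g_final, n = write_and_verify(g_target=50e-6)
print(g_final, n)
```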

Figure 14.

(a) ISPP scheme and (b) IGVP scheme.

8.1 RRAM programming model

The relationship between the applied voltages and the conductance change during programming is nonlinear [31], as illustrated in Figure 15. This programming nonlinearity can be modeled with Eqs. (21)-(24), where long-term potentiation (LTP) and long-term depression (LTD) indicate the continuous processes of the set and reset operations, respectively. The nonlinearity factor θLTP/LTD is used to adjust the nonlinear curve of the conductance change during the set (LTP) and reset (LTD) operations. Pmax denotes the maximum number of programming pulses, while Gmin/Gmax correspond to the minimum/maximum conductance of an RRAM device.

Figure 15.

Nonlinear programming curve.

$$G_{\text{LTP}} = \beta_{\text{LTP}}\times\left(1 - e^{-P/\alpha_{\text{LTP}}}\right) + G_{\min} \tag{21}$$
$$G_{\text{LTD}} = -\beta_{\text{LTD}}\times\left(1 - e^{(P - P_{\max})/\alpha_{\text{LTD}}}\right) + G_{\max} \tag{22}$$
$$\beta_{\text{LTP/LTD}} = \frac{G_{\max} - G_{\min}}{1 - e^{-P_{\max}/\alpha_{\text{LTP/LTD}}}} \tag{23}$$
$$\alpha_{\text{LTP/LTD}} = \theta_{\text{LTP/LTD}}\times P_{\max} \tag{24}$$

As the conductance changes over the pulse number, random variations occur, depicted by the red and blue lines in Figure 15. These variations are due to C2C variation, which is induced by the stochastic nature of filament formation and dissolution, as previously discussed.
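The sketch below evaluates Eqs. (21)-(24) to produce the LTP and LTD curves of Figure 15, with illustrative values for the nonlinearity factors and an optional log-normal perturbation standing in for C2C variation.

```python
import numpy as np

def ltp_ltd_curves(g_min=1e-6, g_max=100e-6, p_max=100, theta_ltp=0.25, theta_ltd=0.25):
    """Conductance vs. pulse number per Eqs. (21)-(24); theta values are illustrative."""
    P = np.arange(p_max + 1)
    a_ltp, a_ltd = theta_ltp * p_max, theta_ltd * p_max              # Eq. (24)
    b_ltp = (g_max - g_min) / (1 - np.exp(-p_max / a_ltp))           # Eq. (23)
    b_ltd = (g_max - g_min) / (1 - np.exp(-p_max / a_ltd))
    g_ltp = b_ltp * (1 - np.exp(-P / a_ltp)) + g_min                 # Eq. (21): potentiation (set) curve
    g_ltd = -b_ltd * (1 - np.exp((P - p_max) / a_ltd)) + g_max       # Eq. (22): depression (reset) curve
    return g_ltp, g_ltd

g_ltp, g_ltd = ltp_ltd_curves()
noisy_ltp = g_ltp * np.random.default_rng(6).lognormal(0.0, 0.03, g_ltp.shape)  # C2C-like perturbation
print(g_ltp[[0, 50, 100]], g_ltd[[0, 50, 100]], noisy_ltp[50])
```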

8.2 Row-by-row programming scheme

The most common programming scheme for RRAM devices is the cell-by-cell write-and-verify scheme [29]. However, as the model size increases, this approach becomes time-consuming. To improve programming efficiency, a row-by-row parallel-friendly IGVP scheme has been proposed [30]. By adjusting the conductance through controlling the Vg, programming multiple cells at a time becomes feasible. Although it is theoretically possible to program an entire row at once, practical limitations restrict the number of cells that can be simultaneously programmed. Therefore, the authors use the term ‘parallel group’ to describe the cells programmed at the same time.

Unlike the cell-by-cell programming scheme, which aims to minimize the average programming number (PN) over all cells across the entire crossbar, the parallel-friendly IGVP scheme focuses on minimizing the average critical path—the maximum PN within a parallel group—across all parallel groups. The presence of a critical cell can significantly reduce programming speed. Therefore, selecting an appropriate Vg step is crucial. A Vg step that is too small increases the number of pulses needed to fine-tune the conductance, while one that is too large leads to overshooting and oscillation around the tolerance margin, which also necessitates more pulses. It was shown that a Vg step of 0.2 V yields optimal results by minimizing the average critical path.
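The difference between the two objectives can be made concrete on a matrix of required pulse numbers: the sketch below compares the average PN (the cell-by-cell metric) with the average critical path over parallel groups (the row-by-row metric). The pulse numbers and group size are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
pulse_map = rng.integers(1, 12, size=(8, 8))      # required programming pulses per cell, illustrative
group_size = 4                                    # cells programmed together in one parallel group

avg_pn = pulse_map.mean()                         # metric minimized by cell-by-cell programming
groups = pulse_map.reshape(-1, group_size)        # split each row into parallel groups
avg_critical_path = groups.max(axis=1).mean()     # metric minimized by parallel (row-by-row) programming
print(avg_pn, avg_critical_path)                  # one slow critical cell dominates its whole group
```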

8.3 Multi-row programming scheme

To further enhance programming efficiency, the two-phase multi-row programming scheme has been proposed [32]. In the first phase, known as the predictive phase, the programming number for each RRAM device is initially estimated with the RRAM programming model at the largest voltage amplitude. A multi-row grouping method is then employed to determine which rows can be grouped together for simultaneous programming. An example of this method is shown in Figure 16. The estimated set and reset programming pulses are marked in the figure, and the cell with the highest pulse number in each row is encircled by a red block, referred to as the dominant cell. The BL voltage is applied based on the requirements of the dominant cell; if it requires set operations, a set voltage is applied, and if it requires reset operations, a reset voltage is applied.

Figure 16.

Illustration of the multi-row grouping method.

The objective is to find as many rows as possible that can be programmed simultaneously without affecting the dominant cell of each row. For example, the 0th and 1st rows cannot be programmed together because, after one programming pulse, the dominant cell of the 1st row shifts from the leftmost to the rightmost cell, changing the pulse number from 3 to 4 reset pulses, while the pulse number of the original dominant cell changes from 4 to 3 set pulses. Therefore, in the example, the 1st and 2nd rows are the only rows that can be selected as a group and programmed together in this programming pulse. The first phase continues until all the estimated programming numbers reach zero. In the second phase, the fine-tuning phase, the actual conductances are read out and compared with the estimated conductances to calibrate the RRAM programming model. Subsequently, row-by-row and cell-by-cell programming are employed to fine-tune the resistance states.

The two-phase multi-row programming significantly enhances the programming efficiency compared to row-by-row programming. However, it faces challenges such as conflicts in voltage requirements for cells within the same column, forcing some programming to still be performed row-by-row. Additionally, even though it reduces the overall pulse number, it can increase the pulse number of certain cells, such as the middle cell in the 1st row, whose pulse number increases from 1 to 2 set pulses. To address these problems, Chen et al. [31] have proposed a block-based programming scheme, which effectively manages the necessary pulse numbers and alleviates the potential IR-drop concerns. By confining the programming operations to smaller blocks, the scheme ensures a more uniform voltage distribution and enhances the accuracy of the RRAM device states.

8.4 Block-based multi-line programming scheme

The block-based programming scheme is split into two phases: the approximation phase and the fine-tuning phase. The primary goal is to reach the target conductances as closely as possible with larger voltage steps during the approximation phase, thereby reducing the time and energy required in the fine-tuning phase, in which a finer voltage step is applied. Both phases utilize the block-based multi-line programming algorithm (BLMP), as depicted in Figure 17(a).

Figure 17.

(a) An example of BLMP and (b) the block-based hierarchical architecture [31].

In the BLMP, the strategy is to program only the necessary cells. Initially, two predictive programming maps are generated, each recording the number of pulses required for set and reset operations as estimated by the RRAM programming model. Figure 17(a) exemplifies a set predictive programming map, where each value represents the required number of set pulses for a cell. To maximize the number of rows programmed simultaneously, the column with the highest total pulse count is selected in each iteration. In the example provided, during the first iteration, three columns have identical pulse counts of 2; thus, the leftmost column is selected, in which the 1st and 2nd rows require one programming pulse each, enabling the WLs of these two rows. Subsequently, the columns where the 1st and 2nd rows also require programming pulses are selected as well, maximizing the number of cells programmed simultaneously. After each iteration, the predictive programming map is updated. During the second iteration, as shown in the figure, only the middle column is selected, with its 0th and 2nd rows programmed. The BLMP algorithm ends when all values in the predictive map reach zero.
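The selection rule described above can be sketched as a greedy schedule; the version below is a simplified interpretation for a single (set) predictive map and omits the additional constraints of the full scheme in [31].

```python
import numpy as np

def blmp_schedule(pulse_map):
    """Greedy multi-line scheduling: each iteration picks an anchor column, enables the rows that
    still need pulses there, then adds every other column in which all enabled rows need pulses.
    Simplified interpretation of the BLMP selection rule; illustrative only."""
    remaining = pulse_map.copy()
    cycles = 0
    while remaining.sum() > 0:
        anchor = int(np.argmax(remaining.sum(axis=0)))        # column with the highest total pulse count
        rows = np.where(remaining[:, anchor] > 0)[0]          # enable the WLs of rows needing pulses there
        cols = [c for c in range(remaining.shape[1])
                if np.all(remaining[rows, c] > 0)]            # other columns fully covered by those rows
        remaining[np.ix_(rows, cols)] -= 1                    # one programming pulse for the whole block
        cycles += 1
    return cycles

pulse_map = np.array([[0, 2, 1],
                      [1, 2, 0],
                      [1, 2, 2]])
print(blmp_schedule(pulse_map))      # fewer cycles than programming the cells one pulse at a time (sum = 11)
```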

To fully optimize the number of rows and columns simultaneously activated, it is desirable to have similar weight configurations, specifically, the same distribution of zeros and non-zeros in a column across different columns. This allows more columns to be chosen in a programming cycle. To achieve this, a programming-aware retraining method is proposed. In this method, the column with the fewest zero weights is selected, and all the rows where the zero weights are located are retrained to zeros across the columns. Meanwhile, the rest of the rows are ensured not to be zeros. This approach maximizes the number of rows and columns that can be programmed simultaneously, thereby improving the overall efficiency of the programming process.

To facilitate this multi-line programming, a hierarchical block-based architecture is designed, as illustrated in Figure 17(b). Given the standard XB size of 128 by 128, reducing the hardware overhead for decoding such a large array necessitates a hierarchical design, which comprises decoders and drivers: the decoder selects the block to be programmed, and the driver enables the specific columns and rows.


9. Conclusion

In summary, resistive random-access memory (RRAM) has emerged as a focal point in research due to its ability to perform vector-matrix multiplication—an essential operation in deep neural networks—within O(1) time complexity using crossbar structures for in-memory computation. However, RRAM technology is not as mature as other memory types, and reliability remains a significant concern, including issues related to variation, stuck-at faults, and IR-drop. Additionally, the need for analog-to-digital converters (ADCs) and digital-to-analog converters (DACs) hinders RRAM-based accelerators from achieving optimal energy efficiency. Programming weights onto RRAM crossbars is also a time-consuming process, emphasizing the importance of efficient programming and minimizing redundant energy consumption.

Researchers have proposed various approaches to address these challenges, as briefly summarized in this chapter. Furthermore, real-world AI applications on RRAM-based DNN accelerators have been discussed, illustrating how training and inference are realized through the analog switching behavior of RRAM.

References

  1. Hennessy JL, Patterson DA. Computer Architecture: A Quantitative Approach (The Morgan Kaufmann Series in Computer Architecture and Design). 5th ed. Morgan Kaufmann; 2011
  2. McDanel B et al. Saturation RRAM leveraging bit-level sparsity resulting from term quantization. In: 2021 IEEE International Symposium on Circuits and Systems (ISCAS); 2021. pp. 1-5
  3. Huang C et al. Efficient and optimized methods for alleviating the impacts of IR-drop and fault in RRAM based neural computing systems. IEEE Journal of the Electron Devices Society. 2021;9:645-652
  4. Lin MY et al. DL-RSIM: A simulation framework to enable reliable ReRAM-based accelerators for deep learning. In: 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD); 2018. pp. 1-8
  5. Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. In: Proceedings of the 25th International Conference on Neural Information Processing Systems. Vol. 1. Lake Tahoe, Nevada: Curran Associates Inc.; 2012. pp. 1097-1105
  6. Dahl GE et al. Improving deep neural networks for LVCSR using rectified linear units and dropout. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE; 2013. pp. 8609-8613
  7. Silver D et al. Mastering the game of go with deep neural networks and tree search. Nature. 2016;529(7587):484-489
  8. Silver D et al. A general reinforcement learning algorithm that masters chess, shogi, and go through self-play. Science. 2018;362(6419):1140
  9. He K et al. Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016. pp. 770-778
  10. Shafiee A et al. ISAAC: A convolutional neural network accelerator with in-situ analog arithmetic in crossbars. In: 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA). Vol. 44, No. 3. ACM SIGARCH Computer Architecture News; 2016. pp. 14-26
  11. Can L et al. Efficient and self-adaptive in-situ learning in multilayer memristor neural networks. Nature Communications. 2018;9(1):2385
  12. Yao P et al. Face classification using electronic synapses. Nature Communications. 2017;8(1):15199
  13. Injune Y et al. A hardware and energy-efficient online learning neural network with an RRAM crossbar array and stochastic neurons. IEEE Transactions on Industrial Electronics. 2020;68(11):11554-11564
  14. Chen W-H et al. A 65nm 1Mb nonvolatile computing-in-memory ReRAM macro with sub-16ns multiply-and-accumulate for binary DNN AI edge processors. In: 2018 IEEE International Solid-State Circuits Conference (ISSCC). IEEE; 2018
  15. Cai F et al. A fully integrated reprogrammable memristor–CMOS system for efficient multiply–accumulate operations. Nature Electronics. 2019;2(7):290-299
  16. Yeo I, Myonglae C, Byung GL. A power and area efficient CMOS stochastic neuron for neural networks employing resistive crossbar array. IEEE Transactions on Biomedical Circuits and Systems. 2019;13(6):1678-1689
  17. Zhao M et al. Characterizing endurance degradation of incremental switching in analog RRAM for neuromorphic systems. In: 2018 IEEE International Electron Devices Meeting (IEDM). IEEE; 2018
  18. Milo V et al. Multilevel HfO2-based RRAM devices for low-power neuromorphic networks. APL Materials. 2019;7(8)
  19. Milo V et al. Optimized programming algorithms for multilevel RRAM in hardware neural networks. In: 2021 IEEE International Reliability Physics Symposium (IRPS). 2021. pp. 1-6
  20. Sun Y et al. Unary coding and variation-aware optimal mapping scheme for reliable ReRAM-based neuromorphic computing. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. 2021;40(12):2495-2507
  21. Fritscher M et al. Mitigating the effects of RRAM process variation on the accuracy of artificial neural networks. In: International Conference on Embedded Computer Systems. Springer International Publishing; 2021
  22. Chen CY et al. RRAM defect modeling and failure analysis based on march test and a novel squeeze-search scheme. IEEE Transactions on Computers. 2015;64(1):180-190
  23. Shih HC et al. Training-based forming process for RRAM yield improvement. In: 29th VLSI Test Symposium, Dana Point. 2011. pp. 146-151
  24. Xu Q et al. Reliability-driven neuromorphic computing systems design. In: 2021 Design, Automation & Test in Europe Conference & Exhibition (DATE). 2021. pp. 1586-1591
  25. Huang Y et al. Bit-aware fault-tolerant hybrid retraining and remapping schemes for RRAM-based computing-in-memory systems. IEEE Transactions on Circuits and Systems II: Express Briefs. 2022;69(7):3144-3148
  26. Shin H et al. Fault-free: A fault-resilient deep neural network accelerator based on realistic ReRAM devices. In: 2021 58th ACM/IEEE Design Automation Conference (DAC). 2021. pp. 1039-1044
  27. Zhang F, Hu M. Defects mitigation in resistive crossbars for analog vector matrix multiplication. In: 2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC). 2020. pp. 187-192
  28. He Z et al. Noise injection adaption: End-to-end ReRAM crossbar non-ideal effect adaption for neural network mapping. In: 2019 56th ACM/IEEE Design Automation Conference (DAC). 2019. pp. 1-6
  29. Gao L, Chen PY, Yu S. Programming protocol optimization for analog weight tuning in resistive memories. IEEE Electron Device Letters. 2015;36(11):1157-1159
  30. Chen J et al. A parallel multibit programing scheme with high precision for RRAM-based neuromorphic systems. IEEE Transactions on Electron Devices. 2020;67(5):2213-2217
  31. Chen WL et al. A novel and efficient block-based programming for ReRAM-based neuromorphic computing. In: 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD). 2023. pp. 1-9
  32. Zhang GL et al. An efficient programming framework for memristor-based neuromorphic computing. In: 2021 Design, Automation & Test in Europe Conference & Exhibition. 2021. pp. 1068-1073
