Abstract
This chapter details the use of RRAM devices as key components of neuromorphic systems for efficient in-memory computing. It begins with the fundamental resistive switching mechanism of RRAM and its data storage capabilities, followed by efficient AI implementations with RRAM, including RRAM-based accelerators that perform DNN computations with remarkable O(1) time complexity and the multi-level characteristics of RRAM devices. The chapter then addresses challenges encountered in RRAM technology, such as device variations, IR-drop, and the substantial energy and area costs of DAC/ADC operations, and briefly summarizes the proposed solutions. Emphasis is then placed on the critical issue of programming RRAM devices, whose challenges include cycle-to-cycle variation and energy-intensive write procedures. Various programming techniques are explained, accompanied by a comparative analysis of their respective advantages and drawbacks.
Keywords
- In-memory computing
- RRAM
- RRAM-based DNN accelerators
- reliability
- RRAM programming
1. Introduction
In the conventional Von Neumann architecture, the separation of memory, logic, and control units has led to the emergence of the 'memory wall' phenomenon, namely a widening performance gap between processors and memory. Over recent years this trend has become increasingly evident, with processor performance improving by over 10,000 times while memory performance has seen only modest gains of around 10 times [1]. In response to this challenge, researchers have turned their attention to alternative memory technologies with superior performance and higher storage density. Nonvolatile memories (NVMs) have emerged as particularly promising candidates, whose advantages include simple structures, nonvolatile data storage, low power consumption, high scalability, and compatibility with CMOS processes. Moreover, some of them are capable not only of storing data but also of performing computational tasks.
Filament-type resistive random-access memory (RRAM) stands out as a prominent member of the NVM family, drawing significant interest from researchers and industry alike. Leveraging reversible resistive switching (RS) mechanisms, RRAM is capable of data storage. Beyond its storage capabilities, RRAM exhibits analog behavior, enabling it to perform multiplication computations. Later in this chapter, we will present the filamentary mechanism of RRAM, elucidating its intricacies in a straightforward manner. Additionally, we will explore how RRAM facilitates time-efficient vector matrix multiplication.
2. Resistive switching (RS) mechanism
An RRAM memory cell consists of a metal–insulator–metal (MIM) structure, where an insulating layer is sandwiched between two metal electrodes. The filamentary mechanism uses external voltage applied to the electrodes to control the growth or reduction of a filament to modulate the desired resistance value of the memory cell. The cell can be switched between a high resistance state (HRS) and a low resistance state (LRS), or logic ‘0’ and logic ‘1’. In the HRS, the filament is disconnected, resulting in high resistance, while in the LRS, the filament is connected, leading to low resistance. Additionally, RRAM possesses multi-level characteristics, which is an essential feature to achieve more compact memory density. This will be further explained later in the chapter.
An RRAM cell begins in its pristine state, with an insulating layer that requires a high voltage pulse, known as the 'forming' process, to create the initial conductive filament.
There are two types of RRAM based on filament composition: oxide-based RRAM (OxRAM) and conductive-bridge RAM (CBRAM).
In OxRAM, the filament consists of oxygen vacancies generated in the oxide layer, whereas in CBRAM, the filament is built from metal atoms that migrate from an electrochemically active electrode.
During the process of forming the filament in the dielectric layer, a compliance current (Icc) is typically enforced to limit the filament growth and prevent permanent breakdown of the dielectric.
3. Efficient vector matrix multiplication (VMM) with RRAM devices
RRAM devices are capable of storing weights as resistance states. When RRAM devices are constructed into a crossbar structure, as depicted in Figure 3, vector matrix multiplication (VMM) can be easily executed due to the analog behavior of RRAM.
Eq. (1) illustrates an example of VMM, where the input vector is applied as voltages to the rows and the matrix entries are stored as the conductances of the cells: by Ohm's law, each cell contributes a current equal to the product of its conductance and the applied voltage, and by Kirchhoff's current law these contributions sum along each column, yielding every dot product of the VMM in a single read operation.
This in-memory VMM computation allows us to efficiently perform convolutional computations commonly used in deep neural networks (DNNs). Figure 4 presents a weight mapping example, where (a) shows a W×W×C input feature map convolved with N kernels, each of size F×F×C, which results in an output feature map of size O×O×N. When unrolled into 1D, each kernel can be mapped onto a column of the crossbar, and VMMs are performed as the corresponding inputs are fed in each cycle, which demonstrates the practicality and efficiency of neuromorphic computation with RRAM.
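The crossbar VMM described above can be sketched numerically. In this minimal sketch, the conductance values and input voltages are arbitrary placeholders (not taken from the chapter); the point is that all column currents are produced concurrently by one analog read:

```python
import numpy as np

# A crossbar stores a weight matrix as conductances G (in siemens);
# applying input voltages V to the rows yields all column currents
# I = G^T V at once (Ohm's law + Kirchhoff's current law).
rng = np.random.default_rng(0)
G = rng.uniform(1e-6, 1e-4, size=(4, 3))   # 4 rows x 3 columns of cells
V = np.array([0.1, 0.2, 0.0, 0.3])          # input voltages on the rows

I = G.T @ V   # all 3 dot products produced concurrently -> O(1) time
assert I.shape == (3,)
```

In hardware, the `@` line corresponds to a single parallel read of the array rather than a loop over multiply-accumulate operations.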
A more advanced RRAM crossbar structure, such as the 1-transistor-1-RRAM (1T1R) configuration, prevents crosstalk between adjacent cells, enabling large-scale arrays. As depicted in Figure 5(a), this configuration enables meticulous control of each row of memristors through wordlines (WLs), which drive the gates of the transistors. Input vector voltages are applied to the bitlines (BLs), to which the top electrodes of the RRAM devices are connected, and the weighted sum currents are read out in parallel through the select lines (SLs). However, this configuration does not directly realize the parallel weighted-sum layout of Figure 4. This limitation can be tackled by rotating the BLs by 90°, as illustrated in Figure 5(b), thereby reproducing the behavior of the matrix multiplication shown in Figure 4.
4. RRAM multilevel characteristics
An essential characteristic allowing RRAM to perform neuromorphic computation is its capability to store multiple resistance states, which significantly enhances the memory storage density. The RRAM devices with multiple resistance states are called multilevel cells (MLC). There are two primary methods to achieve multilevel RRAMs:
The first achieves one HRS and multiple LRS states by altering the magnitude of Icc. As previously mentioned, Icc determines the strength of the conductive filament; a higher Icc results in a stronger filament and thus lower resistance. Therefore, several distinct LRS states can be achieved by controlling Icc through the gate voltage. The second achieves one LRS and several HRS states by altering the magnitude of the reset voltage with a constant Icc. A stronger reset voltage leads to greater dissolution of the filament, increasing the gap between the filament and the electrode, which in turn increases the resistance of the HRS. Therefore, several HRS states can be obtained by varying the reset voltage.
The inherent variability of RRAM, caused by the stochastic nature of filament formation during the RS procedure, presents a reliability challenge for RRAM-based DNN accelerators. The variability within a resistance level can be modeled with a normal or log-normal distribution. As illustrated in Figure 6, this variability leads to overlapping regions between consecutive resistance levels, which can cause computational errors. MLCs with more resistance levels are prone to more severe variability issues. Therefore, precise control over the different resistance states is crucial for achieving reliable operation of RRAM devices.
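The level-overlap effect can be illustrated with a small Monte-Carlo sketch. All parameters below (mean conductances, log-normal spread, midpoint threshold) are assumed for illustration, not taken from the chapter:

```python
import numpy as np

# Two adjacent conductance levels modeled as log-normal distributions;
# samples falling past the midpoint threshold are read as the wrong level.
rng = np.random.default_rng(1)
mu1, mu2, sigma = np.log(50e-6), np.log(80e-6), 0.15  # assumed spreads
level1 = rng.lognormal(mu1, sigma, 100_000)
level2 = rng.lognormal(mu2, sigma, 100_000)

threshold = np.exp((mu1 + mu2) / 2)  # midpoint in the log domain
err = 0.5 * (np.mean(level1 > threshold) + np.mean(level2 < threshold))
```

Increasing `sigma` (wider spread) or packing more levels into the same conductance range (closer `mu1` and `mu2`) raises `err`, which is why higher-level MLCs are more error-prone.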
5. RRAM-based DNN accelerators
An RRAM-based DNN Accelerator, as illustrated in Figure 7, consists of numerous tiles, each containing multiple processing elements (PEs), functional units necessary for DNN computations and embedded DRAM (eDRAM) for storing input data. Within a PE, there are multiple RRAM crossbars (XBs) for performing VMMs, supported by digital-to-analog converters (DACs), analog-to-digital converters (ADCs), input/output registers, and shift-and-add (S&A) units. Due to the bit-level decomposition of inputs and weights, S&A units are responsible for assembling the outputs based on the bit positions of both inputs and weights.
VMM can be executed time-efficiently through RRAM-based crossbar structure thanks to its analog behavior. However, performing DNN computations necessitates converting the digital input data to analog voltage. Similarly, to obtain the output, the current measured in the bitline should also be converted back to digital data. Therefore, the inclusion of DACs and ADCs in an RRAM-based DNN Accelerator is essential. To better measure the currents for feeding into ADCs, sample-and-hold (S&H) circuits are usually utilized.
Although necessary, the presence of ADCs and DACs degrades the performance of the DNN accelerator due to their high energy consumption and area requirements. Previous studies have shown that ADCs and DACs account for over 60% of the energy consumption and 30% of the area. Furthermore, the energy consumption of DACs and ADCs scales exponentially with resolution.
To reduce the resolution of DACs, a common approach is to perform input bit-slicing, in which a multi-bit input is decomposed into low-resolution slices that are fed to the crossbar over successive cycles.
For example, assuming a 128×128 XB is used, and both weights and inputs are quantized to 8 bits with each RRAM device representing 2-bit weight, the output currents captured in this case require 8-bit resolution ADCs. If the input is sliced into 1-bit increments, 1-bit resolution DACs are used, but 8 cycles are required to perform a complete computation due to the bit-slicing of an 8-bit input data.
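The 1-bit input slicing just described can be sketched as follows. The weight values are arbitrary placeholders; what matters is the shift-and-add (S&A) reconstruction across the 8 cycles:

```python
import numpy as np

# Input bit-slicing: an 8-bit input vector is fed 1 bit per cycle (via
# 1-bit DACs); each cycle's analog dot product is digitized, shifted by
# the bit position, and accumulated by the shift-and-add unit.
x = np.array([173, 5, 240, 66], dtype=np.uint8)   # 8-bit inputs
w = np.array([3, -2, 7, 1])                        # weights on one column

acc = 0
for b in range(8):                                 # 8 cycles for 8 bits
    bit_slice = (x >> b) & 1                       # 1-bit DAC input
    partial = int(bit_slice @ w)                   # one analog MAC + ADC
    acc += partial << b                            # shift-and-add

assert acc == int(x.astype(np.int64) @ w)          # matches direct VMM
```

Slicing weights across multiple 2-bit cells works the same way, with a second shift-and-add over the weight bit positions.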
Besides the above-mentioned approaches, McDanel et al. [2] proposed exploiting the bit-level sparsity that results from term quantization to further reduce the cost of the analog-to-digital conversion.
To better comprehend the VMM operation on RRAM-based XB, let us consider an example with 2-bit input data, 4-bit weights, and 1-bit cell resolution, as shown in Figure 8. The XB size is 2 by 8, with 4 weights mapped onto it. We aim to perform a VMM operation presented in Eq. (2) using 1-bit input bit-sliding.
A key detail to note is that conductance is always a continuous positive value, whereas weights in neural networks can be both positive and negative. To address this, it is common to use two XBs: one to represent positive values and one to represent negative values. The final result is then obtained by subtracting the output of the negative XB from the output of the positive XB.
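This differential mapping can be sketched directly. The weight and voltage values below are arbitrary; the point is that splitting a signed matrix into two non-negative conductance arrays and subtracting their outputs recovers the signed result:

```python
import numpy as np

# Signed weights split across two crossbars: positive parts map to the
# "positive" XB, magnitudes of negative parts to the "negative" XB; the
# signed result is the difference of the two column outputs.
W = np.array([[ 0.5, -1.2],
              [-0.3,  2.0]])
G_pos = np.maximum(W, 0.0)          # conductances are non-negative
G_neg = np.maximum(-W, 0.0)

v = np.array([1.0, 0.5])
out = v @ G_pos - v @ G_neg          # subtract negative-XB output
assert np.allclose(out, v @ W)
```

The subtraction is typically done after A/D conversion, so the two crossbars can share the same input drivers.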
In the example in Figure 8,
To generalize, assume the input is quantized to
To understand how
To generalize,
Increasing the number of input bits fed per cycle and the resolution of each RRAM device reduces, respectively, the number of cycles required per computation and the number of computations, but exponentially increases energy consumption due to the higher resolution required of the DACs and ADCs. Furthermore, higher-resolution MLC RRAM devices are more susceptible to variability, leading to computational errors. This trade-off between resolution, energy consumption, and variability must be carefully considered in the design of RRAM-based DNN accelerators.
The voltage of the top and bottom electrode of each RRAM device can be represented, respectively, as Eq. (5) and (6), where
Eq. (5) and (6) can be easily derived by referring to Figure 9. According to Kirchhoff’s current law, the current across row
Top electrode
To mitigate the impact of IR-drop on the accuracy, Huang et al. [3] proposed to alleviate the current and voltage deviation by adding an additional tunable RRAM row. This approach adaptively compensates for current differences by tuning the input voltage and conductance of the RRAM cells in the additional row. The tunable RRAM cells adjust their resistance to balance the current flow, thereby reducing the discrepancies caused by IR-drop and improving the overall accuracy of computations in the crossbar array.
Taking into account the resolution limitations of DACs and ADCs as well as the impact of IR-drop in the design of RRAM-based DNN accelerators, operation-unit (OU)-based or block-based computation can be adopted [4]. An example of OU-based computation is shown in Figure 10. OU-based computation splits the XB into multiple OUs of adjustable size, each of which can be activated individually at a time. Even though this approach requires more cycles to complete one computation, it significantly reduces the resolution requirement and power consumption of the ADCs and helps mitigate voltage inconsistencies across the crossbar. To compensate for the potential performance degradation caused by the increased number of cycles, pipelining techniques can be employed to enhance the overall efficiency and throughput of RRAM-based DNN accelerators.
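The OU-style partial-sum accumulation can be sketched as follows. The 8×8 array and 4×4 OU size are arbitrary toy values; in practice the OU size is tuned against ADC resolution and IR-drop:

```python
import numpy as np

# OU-based computation: instead of activating the full 8x8 array at once,
# 4x4 OUs are activated one at a time and their partial sums accumulated,
# lowering the per-read ADC resolution requirement at the cost of cycles.
rng = np.random.default_rng(2)
G = rng.uniform(0, 1, size=(8, 8))
v = rng.uniform(0, 1, size=8)

ou = 4                               # adjustable OU height/width
out = np.zeros(8)
for r in range(0, 8, ou):            # activate one OU per cycle
    for c in range(0, 8, ou):
        out[c:c+ou] += v[r:r+ou] @ G[r:r+ou, c:c+ou]

assert np.allclose(out, v @ G)       # same result, more cycles
```

Because only `ou` rows conduct per read, the maximum column current (and hence the required ADC range) shrinks proportionally.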
6. Real-world AI applications with RRAM-based accelerators
In the previous sections, we detailed how RRAM-based accelerators efficiently perform VMM operations in O(1) time complexity. As mentioned in Section 3, VMM operations are fundamental to DNNs, which are mainly composed of convolutional (CONV) layers and fully connected (FC) layers. As illustrated in Figure 4, convolutional operations can be easily executed through RRAM XBs by sequentially providing the corresponding input across the sliding window in the input feature map (IFM). Similarly, FC layer operations follow the same procedure, with the primary difference being the size of the kernels, which match the IFM size, as shown in Figure 11.
Convolutional neural networks (CNNs), a prevalent form of DNNs, excel in object recognition tasks, including image classification [5], speech recognition [6], and self-play games [7, 8]. CNNs typically consist of five [5] up to a thousand [9] CONV layers, each performing high-dimensional convolution. One to three FC layers follow the CONV layers to learn nonlinear combinations of high-level features for classification.
Aside from CONV layers and FC layers, various optional layers can be included in a CNN, such as nonlinearity, normalization, and pooling. These optional layers are usually carried out in nonlinear function units rather than in RRAM XBs in RRAM-based DNN accelerators [10].
6.1 Training and inference
We discussed performing VMM operations on RRAM-based DNN accelerators in previous sections, which are fundamental for the inference phase. Typically, these DNN models are trained in software before being programmed onto the RRAM XBs, allowing for immediate inference. However, there are scenarios where on-device learning is necessary to enhance performance. For instance, in environments with continuous data streams, incremental learning supports real-time inference more effectively. Additionally, classification performance can significantly degrade if a pre-trained model is directly programmed onto RRAM XBs due to defects in RRAM cells [11].
Small-scale RRAM XBs have been successful in simple pattern recognition tasks [11, 12]. However, in-situ training on RRAM XBs for larger-scale classification tasks faces challenges. The stochastic nature of the device introduces variability and abrupt switching during the set operation [12], though these challenges can be mitigated to a certain extent through material advancements, such as utilizing an HfAlyOx switching layer, which will be discussed in the next section. Another obstacle for larger-scale DNN applications on RRAM XBs is the much higher power consumption of online learning compared to offline learning [13]. Inference typically uses low-resolution ADCs [14], while training requires high-resolution ADCs, ranging from 13-bit [15] to 16-bit [11] resolution. To address this, Yeo et al. [16] proposed replacing ADCs with 1-bit dynamic comparators, providing 1-bit A/D conversion by comparing the output voltage of an integrator, which converts the summed current of the RRAM XBs to a voltage, with a reference voltage.
Endurance is another critical reliability issue of RRAM, with endurance cycles typically ranging from 10^5 to 10^7, potentially limiting its use for in-situ training. However, weight updates during training require only weak programming pulses to incrementally change the conductance. Zhao et al. [17] demonstrated that RRAM cells remain operational even after more than 10^11 update programming pulses.
The training process of neural networks involves back propagation, which includes computing weight gradients and updates. This process is illustrated with a simplified 3-layer network example in Figure 12, where the nodes use a sigmoid function as the activation function (AF), as shown in Eq. (9), with the output presented as
To update
Similarly, the weight gradients of w2 to w4 are derived as follows, where shared terms are marked in the same colors.
For the last-layer weights w5 and w6, the derivatives are simpler, as shown in Eqs. (15) and (16).
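The chain-rule computation above can be sketched numerically. This is a minimal illustrative network, not the chapter's Figure 12 example: the layer sizes, weight values, input, and target below are all assumed, and the analytic gradients are checked against a numerical finite difference:

```python
import numpy as np

# Backpropagation through a tiny sigmoid network (all values assumed),
# verified against a numerical gradient on one weight.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, 0.8]); t = 0.2               # input and target (assumed)
w1 = np.array([[0.1, -0.4], [0.3, 0.2]])        # hidden weights (assumed)
w2 = np.array([0.6, -0.1])                      # output weights (assumed)

def loss(w1, w2):
    h = sigmoid(x @ w1)
    y = sigmoid(h @ w2)
    return 0.5 * (y - t) ** 2

# Analytic backprop: chain rule through output and hidden layers.
h = sigmoid(x @ w1); y = sigmoid(h @ w2)
dy = (y - t) * y * (1 - y)                      # dL/d(output pre-activation)
grad_w2 = dy * h
grad_w1 = np.outer(x, dy * w2 * h * (1 - h))

eps = 1e-6                                      # numerical check on w2[0]
w2p = w2.copy(); w2p[0] += eps
assert abs((loss(w1, w2p) - loss(w1, w2)) / eps - grad_w2[0]) < 1e-5
```

The last-layer gradient `grad_w2` needs only one chain-rule step, while `grad_w1` propagates the error back through the output weights, mirroring the structure of Eqs. (10)-(16).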
Extending weight updates to conductance updates in RRAM XBs can be represented by Eqs. (17)-(20), where
In Yeo et al.’s implementation [13],
By addressing these challenges and implementing innovative solutions, RRAM-based accelerators can potentially support both training and inference for large-scale DNN applications, making them a viable option for real-time, edge-based AI computations.
7. Challenges in RRAM-based computation
VMM computations can be executed in O(1) time complexity on an RRAM crossbar, but performing accurate computations poses several challenges. Some of these challenges and their corresponding solutions have been briefly discussed earlier, including the high energy and area requirements for DACs and ADCs, as well as the impacts of IR-drop. Variability and stuck-at-faults are also crucial problems affecting the reliability of RRAM-based computation.
Variation issues can be addressed at both the device level and the algorithm level. At the device level, a wide variety of materials and material engineering methods are being investigated to enhance the uniformity of RRAM devices. For example, embedding a thin Al buffer layer has been found to stabilize the oxygen vacancies in HfO2 films, contributing to more uniform set voltage and resistance distributions [18]. Programming techniques also significantly affect variability. For instance, the conductance spread can be reduced by increasing the gate voltage instead of the top electrode voltage when programming a resistance state within an RRAM device [19].
While device-level techniques improve the variability of RRAM devices, leading to more consistent and mature memory, algorithm-level approaches are also essential for ensuring the reliability of RRAM-based computations. Unary-encoding has been proposed to mitigate the effects of variation on the most-significant bit (MSB) of a weight by equalizing the significance of each bit [20]. Leveraging the inherent redundancy within DNNs also proves to be effective against variation. Fritscher et al. [21] proposed three approaches: fault-aware training, the use of dropout layers, and the insertion of redundancy, to train DNNs that are less susceptible to variations.
Stuck-at-faults (SAFs), where a cell is permanently stuck in one resistance state and can no longer be programmed, are another major reliability problem; a common cause is over-forming, in which an excessively strong forming pulse creates a filament that can no longer be ruptured [22]. To address the over-forming issue, Shih et al. [23] proposed adding a sequence of write operations after the initial forming process, aiming to bring the reset and set states of the affected cells into a range close to the margins of the normal HRS and LRS regions. Although this technique can repair some cells, unrepairable cells remain. Several algorithm-level approaches have been proposed to tackle the SAF problem. These approaches generally fall into three categories: 1. (Re-)Training-based approaches, 2. Error-correction-based approaches, and 3. (Re-)Mapping-based approaches.
(Re-)Training-based methods typically include modifying the objective function, injecting disturbances during training, or retraining the unmapped layers [24, 25]. The primary challenge with training-based approaches is the high runtime required and the potential difficulty of converging to high accuracy. Error-correction-based approaches involve measuring the defective outputs and performing post-processing computation to compensate for the errors. This can include adding extra RRAM rows to compensate for the erroneous outputs or using alternative weight values to represent the original values [26, 27, 28]. However, these methods incur additional hardware overhead. (Re-)Mapping-based methods typically involve exchanging defective devices with normal devices through column/row swapping or cell swapping [24, 25, 27]. While effective, these approaches also require additional peripheral hardware to manage the reordered inputs or outputs. Each of these strategies presents its own set of advantages and challenges, and often a combination of these methods is employed to ensure reliable operation in RRAM-based systems.
8. RRAM programming
To program an RRAM device, both set and reset processes are employed to modulate the resistance state. As illustrated in Figure 13, during the set operation, a set voltage is applied to the BL terminal of the cell, causing current to flow toward the SL terminal. During the reset operation, by contrast, a reset voltage is applied to the SL terminal of the cell, with current flowing toward the BL terminal.
Various programming techniques have been developed. Gao et al. [29] suggested incremental step pulse programming (ISPP), where the amplitude or width of VBL or VSL is incrementally increased under a fixed gate voltage (Vg), as illustrated in Figure 14(a). Alternatively, Chen et al. [30] proposed incremental gate voltage programming (IGVP), which incrementally increases Vg while keeping VBL and VSL constant, as illustrated in Figure 14(b). To precisely tune the resistance to the target value, a write-and-verify procedure is commonly employed, in which a programming pulse is applied and the conductance is read back, repeating until the conductance falls within a tolerance margin of the target.
8.1 RRAM programming model
The relationship between the applied voltages and the conductance change during programming is nonlinear [31], as illustrated in Figure 15. This programming nonlinearity can be modeled with Eqs. (21)-(24), where long-term potentiation (LTP) and long-term depression (LTD) denote the continuous processes of the set and reset operations, respectively. The nonlinearity factor characterizes how quickly the conductance change saturates over successive pulses.
As conductance changes over the pulse numbers, random variation occurs, depicted by the red and blue lines in Figure 15. These variations are due to C2C variation, which is induced by the stochastic nature of filament formation and dissolution, as previously discussed.
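A behavioral LTP update of this kind can be sketched as follows. The exponential saturation form and all parameter values (conductance range, nonlinearity factor, 30% C2C spread) are assumed for illustration and are not reproduced from Eqs. (21)-(24):

```python
import numpy as np

# Behavioral LTP sketch: conductance rises nonlinearly with pulse number,
# with Gaussian cycle-to-cycle (C2C) noise applied to each update step.
rng = np.random.default_rng(3)
g_min, g_max, pulses, nl = 1e-6, 1e-4, 100, 3.0   # nl: nonlinearity factor

B = (g_max - g_min) / (1 - np.exp(-nl))           # normalization constant
ideal = g_min + B * (1 - np.exp(-nl * np.arange(pulses + 1) / pulses))

g = ideal[0]
trace = [g]
for n in range(pulses):
    dg = ideal[n + 1] - ideal[n]                  # ideal increment (shrinks)
    g += dg * (1 + 0.3 * rng.standard_normal())   # 30% C2C spread (assumed)
    trace.append(g)
```

Raising `nl` makes early pulses move the conductance much more than late ones, which is exactly what makes uniform-pulse programming imprecise and motivates write-and-verify.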
8.2 Row-by-row programming scheme
The most common programming scheme for RRAM devices is the cell-by-cell write-and-verify scheme [29]. However, as the model size increases, this approach becomes time-consuming. To improve programming efficiency, a row-by-row parallel-friendly IGVP scheme has been proposed [30]. By adjusting the conductance through controlling the Vg, programming multiple cells at a time becomes feasible. Although it is theoretically possible to program an entire row at once, practical limitations restrict the number of cells that can be simultaneously programmed. Therefore, the authors use the term ‘parallel group’ to describe the cells programmed at the same time.
Unlike the cell-by-cell programming scheme, which aims to minimize the average programming number (PN) over all cells in the crossbar, the parallel-friendly IGVP scheme focuses on minimizing the average critical path (the maximum PN within a parallel group) across all parallel groups. The presence of a critical cell can significantly reduce programming speed, so selecting an appropriate Vg step is crucial. Too small a Vg step increases the number of pulses needed to fine-tune the conductance, while too large a Vg step causes overshooting and oscillation around the tolerance margin, which also necessitates more pulses. It was shown that a Vg step of 0.2 V yields optimal results by minimizing the average critical path.
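The step-size trade-off can be illustrated with a toy write-and-verify loop. The linear update model, target, and tolerance below are assumed, not the device physics of [30]:

```python
# Toy write-and-verify tuning of one cell: each verify compares the
# conductance to the target; a set or reset pulse moves it by `step`.
# Too small a step needs many pulses; too large a step oscillates around
# the tolerance margin and never converges.
def pulses_to_tune(step, target=0.50, g=0.0, tol=0.02, max_pulses=1000):
    count = 0
    while abs(g - target) > tol and count < max_pulses:
        g += step if g < target else -step     # set or reset pulse
        count += 1
    return count

small = pulses_to_tune(0.01)                   # fine step: many pulses
large = pulses_to_tune(0.3)                    # coarse step: oscillates
assert small > pulses_to_tune(0.04)            # finer step -> more pulses
assert large == 1000                           # 0.3 step never converges
```

A mid-sized step minimizes the pulse count, which is the same shape of trade-off behind the reported 0.2 V optimum for the Vg step.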
8.3 Multi-row programming scheme
To further enhance programming efficiency, a two-phase multi-row programming scheme has been proposed [32]. In the first phase, known as the predictive phase, the programming number for each RRAM device is first estimated with the RRAM programming model at the largest voltage amplitude. A multi-row grouping method is then employed to determine which rows can be grouped together for simultaneous programming. An example of this method is shown in Figure 16. The estimated set and reset programming pulses are marked in the figure, and the cell with the highest pulse number in each row is encircled by a red block, referred to as the dominant cell.
The objective is to find as many rows as possible that can be programmed simultaneously without changing the dominant cell of any row. For example, the 0th and 1st rows cannot be programmed together: after one programming pulse, the dominant cell of the 1st row would shift from the leftmost to the rightmost cell, whose pulse number changes from 3 to 4 reset pulses, while the pulse number of the original dominant cell changes from 4 to 3 set pulses. Therefore, in the example, the 1st and 2nd rows are the only rows that can be selected as a group and programmed together in this programming pulse. The first phase continues until all estimated programming numbers reach zero. In the second phase, the fine-tuning phase, the actual conductances are read out and compared with the estimated conductances to calibrate the RRAM programming model. Subsequently, row-by-row and cell-by-cell programming are employed to fine-tune the resistance states.
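A deliberately simplified version of the grouping criterion can be sketched as follows. Here each cell's remaining pulse count is encoded as a signed integer (positive for set pulses, negative for reset pulses), a shared set pulse shifts every count in a row down by one, and the example row values are assumed, not Figure 16's:

```python
# Simplified grouping sketch: rows may be grouped only if a shared pulse
# does not move any row's dominant cell (the cell with the largest
# remaining |pulse count|). A reset count going from -3 to -4 mirrors the
# "3 to 4 reset pulses" shift described in the text.
def dominant(row):
    return max(range(len(row)), key=lambda i: abs(row[i]))

def can_group(rows):
    return all(dominant(r) == dominant([c - 1 for c in r]) for r in rows)

r0 = [4, 2, 1]        # dominant cell: index 0 (4 set pulses)
r1 = [4, 1, -3]       # dominant now index 0, but shifts after one pulse
r2 = [3, 2, 1]        # dominant cell: index 0

assert can_group([r0, r2])        # dominant cells survive a shared pulse
assert not can_group([r0, r1])    # r1's dominant cell would shift
```

After a shared pulse, `r1` becomes `[3, 0, -4]`, so its dominant cell jumps from the leftmost set cell to the rightmost reset cell, disqualifying it from the group.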
The two-phase multi-row programming scheme significantly enhances programming efficiency compared to row-by-row programming. However, it faces challenges such as conflicts in the voltage requirements of cells within the same column, forcing some programming to still be performed row-by-row. Additionally, even though it reduces the overall pulse number, it can increase the pulse number of certain cells, such as the middle cell in the 1st row, whose pulse number increases from 1 to 2 set pulses. To address these problems, Chen et al. [31] proposed a block-based programming scheme, which effectively manages the necessary pulse numbers and alleviates potential IR-drop concerns. By confining programming operations to smaller blocks, the scheme ensures a more uniform voltage distribution and enhances the accuracy of the programmed RRAM states.
8.4 Block-based multi-line programming scheme
The block-based programming scheme is split into two phases: the approximation phase and the fine-tuning phase. The primary goal is to reach the target conductances as closely as possible with larger voltage steps during the approximation phase, thereby reducing the time and energy required in the fine-tuning phase, in which a finer voltage step is applied. Both phases utilize the block-based multi-line programming algorithm (BLMP), as depicted in Figure 17(a).
In BLMP, the strategy is to program only the necessary cells. Initially, two predictive programming maps are generated, recording the number of pulses required for the set and reset operations, respectively, as estimated by the RRAM programming model. Figure 17(a) exemplifies a set predictive programming map, where each value represents the required number of set pulses for a cell. To maximize the number of rows programmed simultaneously, the column with the highest total pulse count is selected in each iteration. In the example, during the first iteration, three columns have an identical total pulse count of 2; thus, the leftmost column is selected, in which the 1st and 2nd rows require one programming pulse each, enabling the WLs of these two rows. Subsequently, the other columns in which the 1st and 2nd rows also require programming pulses are selected, maximizing the number of cells programmed simultaneously. After each iteration, the predictive programming map is updated. During the second iteration, as shown in the figure, only the middle column is selected, with its 0th and 2nd rows programmed. The BLMP algorithm ends when all values in the predictive map reach zero.
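The column-selection iteration can be sketched with a toy predictive map. The map values and the co-selection rule used here (a column is co-selected only when its set of needy rows exactly matches the enabled rows) are simplifying assumptions for illustration:

```python
import numpy as np

# Simplified BLMP iteration: pick the column with the largest remaining
# pulse total, enable the rows that need a pulse there, co-select every
# column whose set of needy rows matches, and fire one shared pulse.
P = np.array([[0, 1, 0],
              [1, 1, 1],
              [1, 0, 1]])        # predictive set-pulse map (assumed values)

cycles = 0
while P.sum() > 0:
    col = int(P.sum(axis=0).argmax())          # busiest column
    rows = P[:, col] > 0                       # WLs enabled this cycle
    for c in range(P.shape[1]):                # co-select matching columns
        if np.array_equal(P[:, c] > 0, rows):
            P[rows, c] -= 1                    # one shared pulse
    cycles += 1
```

For this toy map, the first cycle programs the outer two columns (rows 1 and 2) and the second cycle clears the middle column (rows 0 and 1), so the whole map is emptied in two shared pulses instead of a pulse per cell.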
To fully exploit the number of rows and columns that can be simultaneously activated, it is desirable for the columns to have similar weight configurations, specifically, the same distribution of zero and non-zero entries across different columns, so that more columns can be chosen in one programming cycle. To achieve this, a programming-aware retraining method is proposed. In this method, the column with the fewest zero weights is selected, and the rows in which its zero weights are located are retrained to zero across all columns, while the remaining rows are constrained to stay non-zero. This approach maximizes the number of rows and columns that can be programmed simultaneously, thereby improving the overall efficiency of the programming process.
To facilitate this multi-line programming, a hierarchical block-based architecture is designed, as illustrated in Figure 17(b). Given the standard XB size of 128 by 128, reducing the hardware overhead for decoding such a large array necessitates a hierarchical design, which comprises decoders and drivers: the decoder selects the block to be programmed, and the driver enables the specific columns and rows.
9. Conclusion
In summary, resistive random-access memory (RRAM) has emerged as a focal point in research due to its ability to perform vector-matrix multiplication, an essential operation in deep neural networks, within O(1) time complexity using crossbar structures for in-memory computation. However, RRAM technology is not as mature as other memory types, and reliability remains a significant concern, including issues related to variation, stuck-at faults, and IR-drop. Additionally, the need for analog-to-digital converters (ADCs) and digital-to-analog converters (DACs) hinders RRAM-based accelerators from achieving optimal energy efficiency. Programming weights onto RRAM crossbars is also a time-consuming process, emphasizing the importance of efficient programming and minimizing redundant energy consumption.
Researchers have proposed various approaches to address these challenges, as briefly summarized in this chapter. Furthermore, real-world AI applications on RRAM-based DNN accelerators have been discussed, illustrating how training and inference are realized through the analog switching behavior of RRAM.
References
- 1. Hennessy JL, Patterson DA. Computer Architecture: A Quantitative Approach (The Morgan Kaufmann Series in Computer Architecture and Design). 5th ed. Morgan Kaufmann; 2011
- 2. McDanel B et al. Saturation RRAM leveraging bit-level sparsity resulting from term quantization. In: 2021 IEEE International Symposium on Circuits and Systems (ISCAS); 2021. pp. 1-5
- 3. Huang C et al. Efficient and optimized methods for alleviating the impacts of IR-drop and fault in RRAM based neural computing systems. IEEE Journal of the Electron Devices Society. 2021;9:645-652
- 4. Lin MY et al. DL-RSIM: A simulation framework to enable reliable ReRAM-based accelerators for deep learning. In: 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD); 2018. pp. 1-8
- 5. Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. In: Proceedings of the 25th International Conference on Neural Information Processing Systems. Vol. 1. Lake Tahoe, Nevada: Curran Associates Inc.; 2012. pp. 1097-1105
- 6. Dahl GE et al. Improving deep neural networks for LVCSR using rectified linear units and dropout. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE; 2013. pp. 8609-8613
- 7. Silver D et al. Mastering the game of go with deep neural networks and tree search. Nature. 2016;529(7587):484-489
- 8. Silver D et al. A general reinforcement learning algorithm that masters chess, shogi, and go through self-play. Science. 2018;362(6419):1140
- 9. He K et al. Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2016. pp. 770-778
- 10. Shafiee A et al. ISAAC: A convolutional neural network accelerator with in-situ analog arithmetic in crossbars. In: 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA); 2016. pp. 14-26
- 11. Li C et al. Efficient and self-adaptive in-situ learning in multilayer memristor neural networks. Nature Communications. 2018;9(1):2385
- 12. Yao P et al. Face classification using electronic synapses. Nature Communications. 2017;8(1):15199
- 13. Yeo I et al. A hardware and energy-efficient online learning neural network with an RRAM crossbar array and stochastic neurons. IEEE Transactions on Industrial Electronics. 2020;68(11):11554-11564
- 14. Chen W-H et al. A 65nm 1Mb nonvolatile computing-in-memory ReRAM macro with sub-16ns multiply-and-accumulate for binary DNN AI edge processors. In: 2018 IEEE International Solid-State Circuits Conference (ISSCC). IEEE; 2018
- 15. Cai F et al. A fully integrated reprogrammable memristor–CMOS system for efficient multiply–accumulate operations. Nature Electronics. 2019;2(7):290-299
- 16. Yeo I, Chu M, Lee BG. A power and area efficient CMOS stochastic neuron for neural networks employing resistive crossbar array. IEEE Transactions on Biomedical Circuits and Systems. 2019;13(6):1678-1689
- 17. Zhao M et al. Characterizing endurance degradation of incremental switching in analog RRAM for neuromorphic systems. In: 2018 IEEE International Electron Devices Meeting (IEDM). IEEE; 2018
- 18. Milo V et al. Multilevel HfO2-based RRAM devices for low-power neuromorphic networks. APL Materials. 2019;7(8)
- 19. Milo V et al. Optimized programming algorithms for multilevel RRAM in hardware neural networks. In: 2021 IEEE International Reliability Physics Symposium (IRPS); 2021. pp. 1-6
- 20. Sun Y et al. Unary coding and variation-aware optimal mapping scheme for reliable ReRAM-based neuromorphic computing. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. 2021;40(12):2495-2507
- 21. Fritscher M et al. Mitigating the effects of RRAM process variation on the accuracy of artificial neural networks. In: International Conference on Embedded Computer Systems. Springer International Publishing; 2021
- 22. Chen CY et al. RRAM defect modeling and failure analysis based on march test and a novel squeeze-search scheme. IEEE Transactions on Computers. 2015;64(1):180-190
- 23. Shih HC et al. Training-based forming process for RRAM yield improvement. In: 29th VLSI Test Symposium, Dana Point; 2011. pp. 146-151
- 24. Xu Q et al. Reliability-driven neuromorphic computing systems design. In: 2021 Design, Automation & Test in Europe Conference & Exhibition (DATE); 2021. pp. 1586-1591
- 25. Huang Y et al. Bit-aware fault-tolerant hybrid retraining and remapping schemes for RRAM-based computing-in-memory systems. IEEE Transactions on Circuits and Systems II: Express Briefs. 2022;69(7):3144-3148
- 26. Shin H et al. Fault-free: A fault-resilient deep neural network accelerator based on realistic ReRAM devices. In: 2021 58th ACM/IEEE Design Automation Conference (DAC); 2021. pp. 1039-1044
- 27. Zhang F, Hu M. Defects mitigation in resistive crossbars for analog vector matrix multiplication. In: 2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC); 2020. pp. 187-192
- 28. He Z et al. Noise injection adaption: End-to-end ReRAM crossbar non-ideal effect adaption for neural network mapping. In: 2019 56th ACM/IEEE Design Automation Conference (DAC); 2019. pp. 1-6
- 29. Gao L, Chen PY, Yu S. Programming protocol optimization for analog weight tuning in resistive memories. IEEE Electron Device Letters. 2015;36(11):1157-1159
- 30. Chen J et al. A parallel multibit programing scheme with high precision for RRAM-based neuromorphic systems. IEEE Transactions on Electron Devices. 2020;67(5):2213-2217
- 31. Chen WL et al. A novel and efficient block-based programming for ReRAM-based neuromorphic computing. In: 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD); 2023. pp. 1-9
- 32. Zhang GL et al. An efficient programming framework for memristor-based neuromorphic computing. In: 2021 Design, Automation & Test in Europe Conference & Exhibition (DATE); 2021. pp. 1068-1073