Open access peer-reviewed chapter - ONLINE FIRST

FPGA-Based Spiking Neural Networks

Written By

Ali Mehrabi and André van Schaik

Submitted: 04 June 2024 Reviewed: 03 July 2024 Published: 09 August 2024

DOI: 10.5772/intechopen.1006168


From the Edited Volume

Recent Advances in Neuromorphic Computing [Working Title]

Dr. Kang Jun Bai and Prof. Yang (Cindy) Yi


Abstract

This chapter explores the development and application of Spiking Neural Networks (SNNs) on Field-Programmable Gate Arrays (FPGAs), tracing their evolution since the debut of FPGAs in the mid-1980s. It begins by examining the historical growth of FPGAs, emphasizing their role in developing complex neural network architectures. The narrative then charts the advancement of SNN designs on FPGAs, from early experiments to modern-day applications, spotlighting significant technological milestones and breakthroughs. The main emphasis is on design and implementation strategies for SNNs on FPGAs, incorporating the latest research aimed at optimizing hardware use and computational efficiency. The chapter outlines effective techniques for mapping SNN models onto FPGA resources. Discussions include computational models of biological neurons on FPGAs, designing SNN accelerators to harness the FPGA's parallel processing capabilities, implementing SNN simulators, time-multiplexed neuronal networks, large SNN architectures on FPGAs, and self-trainable neural architectures. This comprehensive blend of concepts and practical methodologies sets the foundation for designing modern SNNs that can be adapted for a range of advanced applications.

Keywords

  • neuromorphic computing
  • FPGA
  • Spiking Neural Networks
  • system on chip
  • network on chip
  • reconfigurable hardware

1. Introduction

Field-Programmable Gate Arrays (FPGAs) have undergone a remarkable transformation since their inception, evolving from simple configurable logic fabrics into sophisticated Systems on Chip (SoCs) capable of tackling a broad spectrum of advanced computing challenges. This evolution has been driven by the persistent advance of Moore's Law [1], continuous refinement of fabrication processes, and a series of significant architectural innovations. The impact of Moore's Law is vividly demonstrated by the exponential growth in FPGA logic capacity: the number of logic elements (LEs) in commercial FPGA products from leading manufacturers such as Altera (now part of Intel) and Xilinx (now part of AMD) has grown from a handful in their initial offerings to millions in today's high-end devices (as depicted in Figure 1). FPGAs of the mid-1980s offered at most a few thousand equivalent logic gates, constraining their use to relatively simple combinatorial and sequential functions. In contrast, modern high-end FPGAs routinely offer millions of logic elements, enabling the realization of highly complex digital subsystems, or even entire standalone systems, within a single device.

Fundamentally, early FPGAs used Look-Up Tables (LUTs) as their configurable building block. While LUTs are versatile, they limit the complexity of the logic functions that can be expressed within a given area. The introduction of Adaptive LUTs (ALUTs) marked a significant advancement, allowing a broader range of logic circuits to be mapped onto FPGAs more efficiently. This enhancement significantly improved FPGA usability in the design of complex finite-state machines, control logic, and data paths.

Recognizing the performance bottlenecks and specialized needs of certain industry segments, FPGA architectures expanded to include dedicated hardware blocks integrated directly into the programmable fabric. Examples include Digital Signal Processing (DSP) blocks optimized for the mathematical operations common in communications, image, and video processing systems; modern FPGAs often incorporate numerous DSP blocks, enabling designers to implement complex signal processing algorithms efficiently. Furthermore, the integration of high-speed transceivers supporting multi-gigabit communication protocols has propelled FPGAs into a pivotal role in data-intensive networking, telecommunications, and high-performance computing (HPC) platforms.

Arguably, the most transformative architectural shift has been the emergence of programmable Systems-on-Chip (SoCs) [2]. These devices integrate large amounts of programmable logic with embedded processors (both soft-core and hard-core), internal memory hierarchies, and a rich assortment of analog and digital peripherals. This paradigm shift has redefined the concept of FPGAs: what were once platforms for configurable logic are now highly capable heterogeneous computing engines, blurring the traditional lines between FPGAs, embedded processors, and even general-purpose computing systems.

This relentless architectural advancement is supported by remarkable improvements in semiconductor manufacturing processes. From technology nodes of multiple micrometers decades ago, persistent adherence to Moore's Law has driven a dramatic scaling down of transistor feature sizes.
Contemporary FPGAs are manufactured at process nodes around the 10 nm scale, and manufacturers are pushing into even smaller realms of 5 nm and below. The impact of this miniaturization is significant: logic density increases dramatically, maximum operating frequencies rise, and, importantly, power consumption drops substantially. This improvement in efficiency and performance makes it possible to deploy FPGAs where heat, power availability, and environmental conditions were once limiting factors, in applications previously thought to be out of reach.

Figure 1.

Growth in the reconfigurable logic capacity of Altera and Xilinx FPGA products from 1984 to date.

1.1 FPGAs from configurable logic to neuromorphic architectures

The interest in Spiking Neural Networks (SNNs) is rapidly growing due to their potential for energy-efficient computing that resembles the functioning of the human brain. Inspired by the brain’s remarkably efficient communication and power usage, SNNs could fundamentally transform the way we process large datasets and recognize complex patterns. However, implementing them is challenging. Their event-driven nature and complex neuronal interactions pose significant obstacles. This is where FPGAs excel. Their adaptability, finely-tuned parallel processing capabilities, and seamless hardware-software integration make them an ideal fit for SNNs.

1.1.1 The FPGA advantage: flexibility, parallelism, and event-driven design

The strengths of FPGAs in the context of SNNs lie in three core areas:

  • Tailored architectures: unlike CPUs and GPUs, whose fixed hardware excels within established computational paradigms, FPGAs provide a blank canvas for hardware design. Researchers can escape traditional computation models and instead craft architectures closely matched to the specific structure, dynamics, and connectivity of SNNs. This customization translates directly into performance and power-efficiency gains over conventional computing platforms.

  • True parallelism: the massive parallelism inherent in neural networks aligns well with the programmable fabric of FPGAs. FPGAs excel at implementing numerous neuron and synapse models concurrently. This allows for the exploration of SNNs at scales difficult to achieve with traditional sequential or vector-based processors.

  • Embracing spikes: FPGAs offer a unique advantage for handling the fundamental characteristic of SNNs—their spike-driven communication model. The fine-grained control over clocking mechanisms, logic resources, and I/O within FPGAs enables the design of systems that are inherently event-driven. This compatibility minimizes overhead and maximizes accuracy in representing spiking behavior, an area where CPUs and GPUs can introduce timing distortions and inefficiencies.

1.1.2 Key considerations for SNN implementation on FPGAs

While FPGAs provide compelling advantages, the successful implementation of SNNs necessitates a careful balance between biological realism and hardware efficiency. Key considerations include:

  • Neuron design: implementing biologically plausible neuron models that map efficiently onto the FPGA fabric is desirable, yet realizing these models on a digital platform can be complex and is not always resource-efficient. Researchers often face trade-offs between the complexity of neuronal dynamics, such as choosing between simpler models like Leaky Integrate-and-Fire (LIF) [3] or more complex ones like Hodgkin-Huxley [4], and the constraints imposed by available hardware resources (a minimal LIF sketch follows this list). This balancing act is crucial to optimizing both the performance and the feasibility of implementing these models on FPGAs.

  • Synaptic representation: the choice of memory structures to store and manage synaptic connectivity and weights has profound implications for the scalability, speed, and power consumption of the SNN. Both FPGA on-chip memory resources (e.g., BRAM) and external (off-chip) memory may be utilized, often in a hierarchical fashion to balance capacity and access speeds.

  • Learning mechanisms: the implementation of on-chip learning algorithms tailored to FPGAs is an active and exciting area of research. On-chip learning provides flexibility and the potential for real-time adaptation, key requirements for autonomous and embedded systems. FPGA implementations need to address the efficient implementation of algorithms such as Spike-Timing-Dependent Plasticity (STDP) and its variants.

  • Communication and spike encoding/decoding: Spiking Neural Networks (SNNs) uniquely process spatio-temporal information, known as “spikes.” For SNNs to function, traditional data types must be converted into spikes. This transformation involves various methods for encoding and decoding spikes [5], which are crucial to the architectural design of the SNN. The chosen method for spike encoding/decoding directly influences how the network is structured and operates. Efficient and high-speed communication with the external world, alongside robust spike coding and decoding, are vital elements in the design of SNNs, impacting their overall performance and effectiveness.

  • Modularity and expandability: designing an SNN in a modular fashion allows for easier expansion and upgrading of the network. Modularity ensures that different parts of the network can be independently developed, tested, and optimized. This approach supports scalable solutions that can grow in complexity and size as needed without requiring a complete redesign.
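
To ground the neuron-design trade-off discussed above, the following is a minimal sketch of a discrete-time LIF update of the kind typically mapped onto FPGA logic, written in Python for readability. With integer state, the leak reduces to a bit shift and the whole update to adds and compares; the decay shift, threshold, reset, and refractory values are illustrative assumptions rather than parameters from any design in this chapter.

```python
# Minimal discrete-time LIF neuron update using integer arithmetic only,
# so that a hardware rendering needs just adders, shifters, and comparators.
# All constants below are illustrative assumptions.

DECAY_SHIFT = 4      # leak: v -= v >> 4 per step (approximates exponential decay)
THRESHOLD   = 1000   # firing threshold in fixed-point units
V_RESET     = 0      # membrane potential after a spike
REFRAC_LEN  = 5      # absolute refractory period in time steps

def lif_step(v, refrac, syn_input):
    """One clock tick: returns (new_v, new_refrac, spiked)."""
    if refrac > 0:                             # ignore input while refractory
        return v, refrac - 1, False
    v = v - (v >> DECAY_SHIFT) + syn_input     # leak plus synaptic integration
    if v >= THRESHOLD:
        return V_RESET, REFRAC_LEN, True       # fire, reset, start refractory
    return v, 0, False

# Drive the neuron with a constant input and report spike times.
v, refrac = 0, 0
for t in range(60):
    v, refrac, spiked = lif_step(v, refrac, syn_input=80)
    if spiked:
        print(f"spike at t={t}")
```

A Hodgkin-Huxley neuron would replace the single shift-based leak with four coupled differential equations and several non-linear rate functions, which is precisely the resource gap the trade-off above refers to.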


2. Early implementations of SNNs on FPGA

The mid-1990s marked an important era for the application of Field-Programmable Gate Arrays (FPGAs) in modeling biological neurons. In 1996, a study by Rossmann et al. [6] made notable progress by implementing a Hebbian learning algorithm [7] on a Xilinx XC3090 FPGA, as depicted in Figure 2. This effort was among the first to realize such an algorithm in hardware, with the complexity of each neuron requiring a dedicated FPGA. The researchers built a system of 18 individual neurons, each connected to five sensory synapses. Distributed across multiple FPGAs operating at 5 MHz, the system achieved an update rate of 333 kCUPS (kilo connection updates per second). Notably, this design was successfully applied in an autonomous vehicle (a wheelchair), enabling it to learn and navigate around obstacles.

Building on these achievements, Rossmann et al. continued to refine their work in [8], implementing a neuron model with Hebbian learning (Figure 3) on a more advanced Xilinx FPGA, the XC4028. This iteration featured a modular system that could accommodate up to eight units, each equipped with an FPGA and additional RAM. The units were linked through a backplane, which allowed effective communication across the system. Within each module, individual neurons autonomously computed their assigned portion of the neural network, with dedicated RAM storing input/output signals and internal register values. The neuronal architecture comprised fundamental building blocks such as adders, multipliers, and non-linear low-pass elements. The Xilinx XC4028 FPGAs, connected to a global bus, gave a central controller access to synaptic weights and local register contents. This setup allowed a flexible connection strategy between neurons, where the output of one neuron could feed directly into another. The flexibility of the system made it suitable for a variety of practical applications: it was used in autonomous robots and in adaptive velocity controllers for slot-car racing, and its ability to receive sensor inputs enhanced its capacity to interact with and respond to its environment. The FPGA implementation exhibited high-speed computation, with a single input/output operation taking only 212 ns, although the internal clock frequency was limited to around 5 MHz. Despite impressive individual module performance (198 MCPS and 66 MCUPS), the overall system encountered challenges inherent to discrete hardware implementations, notably the impact of FPGA propagation delays on overall computation speed.
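
The synapse structure in Figure 2B lends itself to a compact software rendering. The sketch below pairs a digital low-pass filter, acting as the short-term memory of recent pre-synaptic activity, with threshold elements that switch between learning and forgetting modes; the filter constant, thresholds, and rates are illustrative assumptions, not values from [6].

```python
# Schematic Hebbian learning synapse after Figure 2B: a digital LPF traces
# recent pre-synaptic activity, and the stored weight grows when that trace
# coincides with post-synaptic activity. Integer arithmetic throughout;
# all constants are illustrative assumptions.

LPF_SHIFT    = 3     # LPF time constant: trace += (input - trace) >> 3
LEARN_THRESH = 40    # trace level above which the learning mode is active
LEARN_RATE   = 2     # weight increment on pre/post coincidence
FORGET_RATE  = 1     # slow weight decay in the forgetting mode
W_MAX        = 255   # 8-bit weight ceiling

def hebbian_synapse_step(trace, w, pre_spike, post_active):
    """One update of the short-term trace and the adaptable weight."""
    x = 100 if pre_spike else 0            # pre-synaptic spike as a pulse
    trace += (x - trace) >> LPF_SHIFT      # short-term memory (digital LPF)
    if trace > LEARN_THRESH and post_active:
        w = min(W_MAX, w + LEARN_RATE)     # learning mode: strengthen
    elif trace <= LEARN_THRESH:
        w = max(0, w - FORGET_RATE)        # forgetting mode: weaken slowly
    return trace, w
```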

Figure 2.

Neuron architecture in [6]. A: a four-synapse neuron with conventional excitatory and inhibitory synapses with fixed positive and negative weights and two Hebbian learning synapses. B: structure of a Hebbian learning synapse, including short-term memories at the input of each synapse realized by a digital LPF with a non-linear characteristic, threshold elements determining the different learning and forgetting modes, and weight memories for each adaptable synapse. Adapted from [6].

Figure 3.

Neuronal computation module architecture consisting of a neuron and input/output memory. Adapted from [8].


3. FPGA evolution and impacts on SNN designs

In the early 2000s, the capacity of FPGAs approached the threshold of millions of configurable logic elements. Computational elements such as DSP modules and block memories were integrated into FPGA fabrics, and partial reconfiguration capabilities and soft-processor IPs were added to FPGA design tools. Research on reconfigurable hardware gained significant attention during this period. An influential paper, "CAM-Brain Machine" by de Garis [9], presented ideas on reconfigurable neural networks and the implementation of high-density neurons on FPGA chips. The CAM project achieved a breakthrough by realizing a modular design comprising 75.4 million neurons using 72 Xilinx XC6264 FPGAs. Unlike previous approaches that relied on time-multiplexed utilization of neurons, these efforts aimed to emulate brain functionality more closely using a large array of neurons. This led to the emergence of two primary design methodologies in hardware-based Spiking Neural Networks. The first approach takes advantage of the inherent parallel processing capabilities of FPGAs and time multiplexing to model neurons precisely and speed up the neuronal computations that tend to be slow in traditional serial von Neumann architectures. The second approach focuses on building densely packed Spiking Neural Networks by exploiting the extensive configurable logic resources that FPGAs offer. Researchers typically navigate a balance between these two approaches, aiming to maximize both network size and computation speed. This balancing act allows them to draw on the advantages of each method, enhancing both the performance and the scalability of neural network implementations on FPGAs.

Upegui et al. [10] proposed a functional spiking neuron model designed specifically for efficient hardware implementation. While this model simplifies many biological and software-based neuron characteristics, it retains functionality and tackles complex tasks like temporal pattern recognition. To compensate for the limited representational power of individual simplified neurons, the model utilizes a larger number of neurons. This trade-off between neuron complexity and network size becomes advantageous in hardware contexts due to the model’s architectural simplicity. The model captures key elements of spiking neurons, including membrane potential, resting potential, threshold potential, post-synaptic response, and after-spike response. The membrane potential acts as a counter, accumulating (or subtracting) a specific value upon receiving excitatory (or inhibitory) synaptic inputs. This potential gradually decays toward a resting state. If an input arrives while a previous post-synaptic response is ongoing, its effect is summed with the existing potential. When the threshold is reached, the neuron fires a spike, resets its membrane potential to a dedicated after-spike potential, and enters a refractory period. During this refractory period, the neuron recovers from firing, with its membrane potential returning to its resting state. Two types of refractoriness, absolute and partial, were implemented. Absolute refractoriness completely ignores incoming spikes, while partial refractoriness weakens their influence by a constant factor. The model uses a Moore finite-state machine (FSM), which is a basic but powerful hardware design tool for outlining sequential operations. It functions in two main states: operational and refractory (see Figure 4). In the operational state, the neuron processes and integrates incoming spikes, which affects its membrane potential. Once the membrane potential crosses a specific threshold, the neuron activates, fires a spike, and enters a hyperpolarized after-spike state before moving into the refractory phase. This refractory period essentially serves as a cooldown during which the neuron’s activity is paused, allowing its membrane potential to return to a resting state. After this, the neuron switches back to the operational state, ready to process new spikes. This architecture was implemented in a multi-layered setup and showed promise in solving pattern recognition problems. However, due to the simplified nature of the neuron model, the limited computational capabilities of individual neurons had to be compensated for by increasing the number of neurons. This trade-off in hardware resources is considered acceptable for less complex pattern recognition tasks, where a larger network of simpler neurons can still achieve effective results.
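
A rough software analog of this two-state design is shown below: a Moore FSM that integrates spikes in an operational state, fires on a threshold crossing, and then sits in a refractory state implementing either absolute or partial refractoriness. The state encoding, constants, and linear decay rule are illustrative assumptions, not values from [10].

```python
# Two-state Moore FSM neuron after the operational/refractory scheme in [10].
# All constants and the linear decay rule are illustrative assumptions.

OPERATIONAL, REFRACTORY = 0, 1

THRESHOLD     = 64    # firing threshold
AFTER_SPIKE   = -16   # hyperpolarized after-spike potential
REFRAC_LEN    = 4     # refractory duration in time steps
PARTIAL_ATTEN = 2     # partial refractoriness divides inputs by this factor

def decay_toward_rest(v):
    """Linear decay of the membrane potential toward the resting value 0."""
    return v - 1 if v > 0 else v + 1 if v < 0 else 0

def fsm_neuron_step(state, v, timer, syn_in, partial=False):
    """One tick of the FSM; returns (state, v, timer, spiked)."""
    spiked = False
    if state == OPERATIONAL:
        v = decay_toward_rest(v + syn_in)      # integrate, then decay
        if v >= THRESHOLD:                     # threshold crossing: fire
            spiked, v = True, AFTER_SPIKE
            state, timer = REFRACTORY, REFRAC_LEN
    else:                                      # REFRACTORY state
        if partial:
            v += syn_in // PARTIAL_ATTEN       # weakened input influence
        # absolute refractoriness ignores syn_in entirely
        v = decay_toward_rest(v)
        timer -= 1
        if timer == 0:
            state = OPERATIONAL                # cooldown over
    return state, v, timer, spiked
```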

Figure 4.

Demonstration of the refractory and operational states in the neuron model proposed by Upegui et al. [10].

The mid-2000s marked a significant era in FPGA development, known as the age of accumulation [2], characterized by the emergence of SoC (System on Chip) FPGAs and a paradigm shift in FPGA manufacturing. SoC FPGAs integrate high-density reconfigurable logic elements, DSP units, and block memories alongside hard-processor units. These systems also typically include Physical layers (PHY) that support a variety of standard interface peripherals, such as DRAM and SRAM controllers, as well as high-speed Ethernet, I2C, SPI, UART, PCI, and USB interfaces, providing a robust platform for mixed hardware-software system design. During these years, researchers focused on harnessing the potential of reconfigurable and highly interconnected arrays of neural network elements within hardware to create powerful signal processing units. To accomplish this, they explored various architectures for Spiking Neural Networks across different platforms, ranging from multi-processor software-based systems to analog/mixed-signal implementations. Notably, FPGA-based architectures stood out for their exceptional flexibility in system design, positioning them as a preferred choice for implementing complex computational models. For instance, the processor-based SpiNNaker project [11] aims to develop a massively parallel computer capable of simulating SNNs of various sizes and topologies, with programmable neuron models. While this approach provides the flexibility necessary for modeling complex SNN computations, due to its size, cost, and power requirements, SpiNNaker does not target embedded systems applications, highlighting the need for more scalable and energy-efficient solutions.

Upegui et al. [12] enhanced their previous work by introducing an SNN platform comprising three key components: a hardware substrate, a computation engine, and an adaptation mechanism. The hardware substrate supports the computation engine and offers the flexibility required for the adaptation mechanism to modify the engine. Taking advantage of the latest features in Xilinx FPGAs, such as Dynamic Partial Reconfiguration (DPR) [13], the authors utilized the independent design (ID) partial reconfiguration flow in their platform. The computation engine, serving as the problem solver of the platform, is centered on the functional neuron model introduced in their earlier work [10]. The adaptation mechanism, in turn, enables modifications to the function described by the computational part. This encompasses synaptic weight learning, which alters only the contents of memory, as well as module-restricted growing and pruning techniques in which neurons may be enabled or disabled. The design integrates a simplified version of the Hebbian learning rule, which is well suited to hardware implementation. The neural network is implemented in the reconfigurable section of the FPGA (refer to Figure 5B), and a genetic search algorithm is employed to find the optimal configuration for the network. Thanks to the inclusion of a PowerPC hard processor, a new feature of Xilinx Virtex-II Pro devices, the genetic algorithm can be executed on the hard processor in the fixed-logic section of the FPGA. While this approach trades flexibility against performance through reconfigurable computing, the basic genetic algorithm [14] used to optimize connection weights during simulations is computationally expensive and memory-intensive. Moreover, the authors' experiments showed that it ignores information that could be valuable for optimizing the network, such as the direction of the error, and the FPGA's limited on-chip memory resources make it poorly suited to hardware implementation. The development of a learning algorithm specifically designed for online, on-chip implementation on FPGAs therefore remains an open challenge.

Figure 5.

A: neuronal computation module architecture consisting of a neuron and input/output memory and B: design layout with partially re-configurable modules and fixed modules. Adapted from [12].

Hellmich and Klar [15] developed the Spiking Neural Network Emulation Engine (SEE), a multi-FPGA platform designed to overcome the memory bandwidth challenge in simulating large-scale neural networks. SEE efficiently manages millions of neurons and over 800,000 synaptic weights, and features a flexible architecture that supports both sparse and dense connection schemes for neuron state calculations. The SEE architecture comprises three FPGAs, each dedicated to specific simulation tasks. The first FPGA, responsible for simulation control, employs two PowerPC (PPC) processors, a feature introduced in Xilinx Virtex-II Pro devices, to handle tasks including initial network configuration, real-time monitoring, and management of the event lists that track neuronal activity (Figure 6A). The second FPGA focuses on network topology computation (NTC), dynamically determining the impact of firing events on neurons and updating neuron states within event lists to maintain simulation accuracy and responsiveness (Figure 6B). The third FPGA, dedicated to neuron state computation (NSC), updates neuron states using numerical integration methods and manages synaptic weight updates for dynamic synapses (Figure 6C).

SEE's distributed memory architecture, coupled with its task-tailored FPGAs, allows parallel access to event lists, tag fields, and weight memories, a crucial feature when handling the typically large datasets of complex SNN simulations. The modular design further boosts its utility: the neuron state calculation module can be reconfigured to support diverse SNN models with varying connectivity patterns, making SEE particularly effective for studying the intricate architectures found in biological neural systems. SEE also emphasizes numerical accuracy by using the Bulirsch-Stoer integration method, ensuring that the simulated network dynamics closely mirror real-world SNN behavior and avoiding the synchronization issues that can arise from simpler integration methods or analytical approximations. However, while SEE's advanced design provides numerous advantages, it also introduces challenges. The complexity of its multi-FPGA setup and its blend of hardware and software components complicate system setup, configuration, and debugging. And although SEE significantly improves memory bandwidth, memory access could still become a bottleneck as SNNs scale to even larger sizes; achieving real-time simulation of exceptionally large and complex networks within SEE's design environment requires careful consideration and planning.

Figure 6.

SEE architecture across three Virtex-II Pro FPGAs. A: simulation control by two embedded PowerPCs, B: network topology computation module, C: reconfigurable neuron state computation module. Adapted from [15].

Glackin et al. [16] developed a similar strategy to tackle the challenges of deploying large-scale Spiking Neural Networks (SNNs) on FPGAs. They chose the Integrate-and-Fire (IF) conductance model to navigate the intricate logic demands that make fully parallel implementations impractical for expansive networks, and balanced computational speed with efficient resource utilization by incorporating time multiplexing. This technique enhances the simulation of broad network topologies without overloading the FPGA hardware.

At the heart of their method was a significant adaptation of the traditional FPGA-based neural network architecture to support the larger scale and complexity of SNNs, involving several key components. First, the Integrate-and-Fire model simplified the dynamic representation of neurons, significantly reducing the computational overhead compared to more complex models like Hodgkin-Huxley and making more efficient use of hardware resources. Second, time multiplexing addressed the limitations of FPGA resources by cycling through neuron simulations in distinct time slots, allowing the simulation of a larger network than would fit at any single instant and effectively scaling up network size through dynamic resource reuse. Another crucial element was the integration of hybrid parallel processing and memory management: multiple MicroBlaze soft processors managed tasks requiring sequential processing, such as control operations and system management. These processors enhanced the system's flexibility, managing control flow within the FPGA, enabling dynamic reconfiguration of simulation parameters, and facilitating the interaction between computational elements and memory components. The architecture also combined fast-access Local Memory Bus (LMB) Block RAM (BRAM) with external SRAM for additional storage, managed via the On-chip Peripheral Bus (OPB) and controlled by External Memory Controllers (EMC).

Figure 7 illustrates Glackin et al.'s multi-processor, multi-FPGA system tailored for large-scale SNN implementations, detailing how multiple MicroBlaze soft-processor cores are configured and interconnected within an FPGA to meet the demanding parallel processing requirements of extensive neural simulations. Their implementation achieves significant performance improvements, executing simulations up to 12,500 times faster than real time and demonstrating the system's ability to manage complex SNNs that would be unmanageable on less dynamic platforms. However, the system faces several challenges. The combination of time multiplexing and multiple MicroBlaze processors adds significant complexity to system configuration and management. While time multiplexing permits larger networks, the absolute scale of implementable networks remains constrained by the physical resources available on the FPGA, and performance depends on the specifications of the FPGA used, potentially limiting the system to high-end FPGA platforms.
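
The essence of the time-multiplexing strategy can be captured in a few lines: a single physical update circuit is shared across many virtual neurons whose state lives in memory (BRAM on the FPGA, a plain list here). The following sketch uses an assumed LIF-style update and assumed sizes; it illustrates the pattern rather than reproducing the design in [16].

```python
# Time multiplexing in miniature: one shared update circuit visits every
# virtual neuron's state in memory once per simulated time step.
# Sizes and the neuron update are illustrative assumptions.

N_VIRTUAL = 1024
states = [0] * N_VIRTUAL            # membrane potentials, one per virtual neuron

def shared_update(v, syn_in, decay_shift=4, threshold=500):
    """The single physical neuron circuit: one LIF-style update."""
    v = v - (v >> decay_shift) + syn_in
    return (0, True) if v >= threshold else (v, False)

def time_multiplexed_tick(inputs):
    """One time step: sequential read-modify-write over all state slots."""
    spikes = []
    for slot in range(N_VIRTUAL):
        states[slot], fired = shared_update(states[slot], inputs[slot])
        if fired:
            spikes.append(slot)
    return spikes

# Drive only the first three virtual neurons and watch which slots fire.
inputs = [100 if i < 3 else 0 for i in range(N_VIRTUAL)]
fired = set()
for _ in range(10):
    fired.update(time_multiplexed_tick(inputs))
print(sorted(fired))   # -> [0, 1, 2]
```

In hardware, the loop body becomes a pipeline, so one neuron state is updated per clock cycle and the network size is bounded by memory rather than by logic.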

Figure 7.

Architecture of FPGA-based multi-processor system for Spiking Neural Networks. This diagram depicts the sophisticated arrangement of multiple MicroBlaze soft-processor cores interconnected within an FPGA environment, showcasing the system’s capability to handle large-scale neural simulations. Adapted from [16].

Cassidy et al. [17] introduced an FPGA-based neural array employing Address Event Representation (AER) to interconnect neuromorphic sensors, such as event-based cameras and cochleas. Their design, illustrated in Figure 8, utilized a simplified LIF neuron model, strategically selected for its balance between biological realism and computational feasibility. This architecture not only supports the efficient simulation of neural dynamics but also enables the array to interface seamlessly with various sensory inputs, providing a robust platform for studying and implementing Spiking Neural Networks. The neural array incorporates 32 identical LIF neurons that exchange spikes via a multiplexer and demultiplexer configuration connected to a shared AER bus. This setup facilitates dynamic interaction across the neural network, supported by an AER mapper block that establishes extensive connectivity among neurons.

Each neuron within the array implements the LIF model using a 16-bit digital accumulator to manage the membrane potential. When the accumulator's value exceeds a predefined threshold, it triggers a spike output and subsequently resets; the process resumes after an absolute refractory period during which the neuron is unresponsive to further inputs. The model also includes programmable parameters that define the relative refractory period and the exponential decay of the membrane potential, emulating a biological neuron's leak current. Moreover, each neuron possesses 128 8-bit synaptic weights, capable of modeling both excitatory and inhibitory synapses. Synapses additionally incorporate a delay mechanism, modeled as a circular dual-port buffer to emulate the delay introduced by the neuron's dendritic tree, with a maximum per-synapse delay of 128 clock cycles.

The architecture integrates the Spike-Timing-Dependent Plasticity (STDP) learning rule, essential for online learning within the neural network. STDP adjusts synaptic weights based on the relative timing of pre- and post-synaptic spikes: synapses that effectively contribute to an output spike are strengthened, while those that do not are weakened. The implementation stores pre-synaptic spike times in a circular buffer; these are compared to the times of output spikes to calculate synaptic adjustments using a look-up table (LUT). The resulting weight change (ΔW) is applied via a read-modify-write operation, ensuring that synaptic strengths are updated in line with their contribution to neuron firing (Figure 8).
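
The STDP datapath of Figure 8C can be sketched compactly: pre-synaptic spike times sit in a circular buffer, and on each output spike the time differences index a look-up table of weight changes applied in a read-modify-write step. The buffer length, LUT contents, and depression rule below are illustrative assumptions, not the values used in [17].

```python
# Sketch of circular-buffer STDP after Figure 8C: causal pre-spikes are
# potentiated via a LUT indexed by spike-time difference; non-causal ones
# are slightly depressed. All sizes and values are illustrative assumptions.

from collections import deque

pre_times = deque(maxlen=16)             # circular buffer of pre-spike times

STDP_LUT = [8, 6, 4, 3, 2, 2, 1, 1]      # dW for dt = t_post - t_pre in [0, 8)
DEPRESS  = 1                             # weakening outside the causal window

def on_pre_spike(t):
    pre_times.append(t)

def on_post_spike(t_post, w, w_max=255):
    """Read-modify-write of one synaptic weight on an output spike."""
    for t_pre in pre_times:
        dt = t_post - t_pre
        if 0 <= dt < len(STDP_LUT):      # pre shortly before post: strengthen
            w = min(w_max, w + STDP_LUT[dt])
        else:                            # uncorrelated pre-spike: weaken
            w = max(0, w - DEPRESS)
    return w

# A pre-spike three ticks before the post-spike strengthens the synapse by 3.
on_pre_spike(t=10)
print(on_post_spike(t_post=13, w=100))   # -> 103
```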

Figure 8.

Overview of the FPGA-based neuron array design by Cassidy et al. [17]. A: details the individual neuron architecture, featuring components such as the synapses, delay mechanisms, a circular buffer for synaptic inputs, and an accumulator that integrates the inputs to trigger output spikes based on a threshold influenced by the exponential leak and relative refractory periods. B: shows the overall network architecture, including interfaces for spike generation, AER mapping, and USB communication with a PC, emphasizing the integration of the 32 IF neurons with system-wide components that manage the input, processing, and learning phases. C: implementation of STDP, illustrating the flow from spike timing to synaptic weight adjustment through a LUT. This block diagram displays the process of reading pre-synaptic spike times, determining synaptic weight adjustments, and applying these modifications to synapses in real time.

Pearson et al. [18] describe an FPGA-based array processor architecture capable of simulating large networks with more than a thousand neurons. This architecture’s scalability, however, is limited by a bus-based communication protocol. In contrast, Ros et al. [19] present a hybrid computing platform where the neuron model is implemented in hardware, and the network model and learning are managed in software, emphasizing a blend of execution environments. However, for large-scale SNNs, hardware interconnect poses significant challenges due to the high levels of required inter-neuron connectivity, often limiting the number of neurons that can be realized in hardware. The direct neuron-to-neuron interconnection also leads to non-linear growth in switching requirements as network size increases.


4. Advanced SNN architectures

Since the late 2000s, there has been a significant paradigm shift in designing Spiking Neural Networks (SNNs) on FPGAs, driven by the quest to develop neuromorphic computing systems capable of processing complex tasks. Central to this shift has been the focus on designing extra-large SNNs and simplifying neuron models, which has necessitated the adoption of alternative communication frameworks, such as Network-on-Chip (NoC) technology, along with techniques like time multiplexing and off-chip memory access. NoCs, which rose to prominence in SoC FPGAs in the late 2000s, offer a scalable and efficient way to handle the complex communication needs of large-scale SNNs, which traditional bus-based systems cannot manage. Incorporating NoCs into SoC FPGAs has enabled highly interconnected and reconfigurable neural networks that significantly boost scalability and performance, allowing the real-time simulation of vast neural networks that closely mimic the intricate connectivity and dynamics of biological systems. The flexibility of NoCs also allows the dynamic reconfiguration of neural network parameters, making these systems more adaptive and efficient. Overall, the integration of NoC technology into SoC FPGAs has revolutionized the design of high-density SNNs, opening up new possibilities in neuromorphic computing and bringing neural networks closer to operating like the human brain.

In this context, the EMBRACE-FPGA architecture presented by Morgan et al. [20] and Cawley et al. [21] is notable. The EMBRACE (Embedded Multicore Building Blocks for Reconfigurable Architecture Computational Emulation) project focuses on developing a scalable and flexible FPGA-based platform for emulating Spiking Neural Networks (SNNs). Rather than employing traditional analog neurons, EMBRACE models neurons using soft processors on the FPGA, a novel approach to neural computation that leverages the reconfigurability of digital systems. The EMBRACE-FPGA architecture, illustrated in Figure 9A, features a two-dimensional N×M array of interconnected SNN neural tiles, each linked in the North, East, South, and West directions to facilitate nearest-neighbor connections. The neural tile depicted in Figure 9B comprises critical elements such as the NoC router (zoomed in Figure 9C) and the neural cell components: neurons and synapses. Communication across tiles is conducted through these cardinal-direction ports, and the NoC router's configuration memory stores vital information about the SNN topology, synaptic weights, and neuron firing thresholds. Each neural cell within EMBRACE-FPGA can contain up to 32 synapses and a neuron, managed using a PicoBlaze soft processor. This setup enhances the SNN's flexibility and functionality, supporting a wide range of applications by enabling the evolution and reconfigurability of the network's architecture.

The NoC routers manage data flow within the system, using a round-robin arbitration policy to handle spike packet generation and routing and to ensure efficient communication between the neural tiles. The NoC also supports dynamic SNN configuration changes, including adjusting synaptic connections and neuron thresholds, which helps the system adapt to different computational requirements and environmental changes. Spike packets are routed from their source to the appropriate destination synapse through time-multiplexed connections, with sophisticated packet handling protocols ensuring that data reaches its target effectively. The NoC router's finite-state machine (FSM) oversees packet routing based on neuron activity, generating spike packets when the local neuron fires and forwarding incoming spike packets as they are received. The system's flexibility is further demonstrated by its ability to configure SNN parameters through NoC configuration packets, which carry information about the destination neural tile and the specific synaptic or neuron adjustments needed. The EMBRACE-FPGA architecture thus not only provides a robust platform for simulating neural activities and interactions but also facilitates complex network evolution and configuration. This capability is essential for developing adaptive neural networks that can efficiently process information and respond to change, making EMBRACE-FPGA a powerful tool in the field of neuromorphic computing.

Carrillo et al. [22] introduced a hierarchical Network-on-Chip (H-NoC) architecture aimed at resolving scalability challenges in hardware implementations of SNNs. The H-NoC architecture employs a multi-level hierarchical structure comprising neuron facilities, tile facilities, and cluster facilities, each with specialized NoC routers to efficiently manage both local and global neuron connectivity.
The architecture leverages a combination of star and mesh topologies, which significantly enhances scalability by allowing seamless expansion through the addition of more clusters without incurring substantial communication delays. Implemented on an FPGA and tested with a 3 × 3 array of cluster facilities, the architecture demonstrated a high throughput of up to 3.33 billion spikes per second and a low power consumption of 13.16 mW per cluster facility, validating its practical feasibility and efficiency for large-scale SNN applications. One of its standout features is a spike traffic compression technique that reduces data packet volume, improving throughput and reducing power consumption. The hierarchical design does, however, introduce complexity, particularly in managing routing and traffic across the different levels, and the system's effectiveness depends strongly on the specific traffic patterns of the SNN, which vary across applications. Moreover, while the FPGA implementation offers practical insights, much of the performance validation is based on simulations, which may not fully capture real-world conditions.
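
As a toy illustration of the routing style these NoC fabrics rely on, the sketch below models a tile router with four cardinal ports, dimension-ordered routing, and round-robin arbitration over input buffers. The packet format, the X-then-Y routing policy, and the buffer structure are illustrative assumptions, not details taken from EMBRACE or the H-NoC.

```python
# Toy NoC tile router: four cardinal ports, round-robin arbitration, and
# dimension-ordered (X-then-Y) routing of spike packets. The packet format
# and policies are illustrative assumptions.

from collections import deque

PORTS = ["N", "E", "S", "W"]

class SpikeRouter:
    def __init__(self, x, y):
        self.x, self.y = x, y                     # this tile's coordinates
        self.in_q = {p: deque() for p in PORTS}   # one input buffer per port
        self.rr = 0                               # round-robin pointer

    def route(self, packet):
        """Pick the output port; None means the packet is for this tile."""
        dx = packet["dst"][0] - self.x
        dy = packet["dst"][1] - self.y
        if dx:
            return "E" if dx > 0 else "W"         # resolve X first
        if dy:
            return "N" if dy > 0 else "S"         # then Y
        return None                               # deliver to local synapse

    def arbitrate(self):
        """Serve one packet per cycle, rotating fairly over the ports."""
        for i in range(len(PORTS)):
            port = PORTS[(self.rr + i) % len(PORTS)]
            if self.in_q[port]:
                self.rr = (self.rr + i + 1) % len(PORTS)
                packet = self.in_q[port].popleft()
                return packet, self.route(packet)
        return None                               # all buffers empty

# A spike packet passing through tile (1, 1) toward tile (3, 2) exits east.
r = SpikeRouter(1, 1)
r.in_q["W"].append({"src": (0, 1), "dst": (3, 2)})
print(r.arbitrate())   # -> ({'src': (0, 1), 'dst': (3, 2)}, 'E')
```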

Figure 9.

A: EMBRACE-FPGA Platform, which features a two-dimensional N×M array of interconnected SNN neural tiles. Each tile in this array is connected in the north, east, south, and west directions. B: SNN neural tile block diagram. C: EMBRACE NoC. Illustrating a packet routing example (neuron connectivity inset). Adapted from [21].

A Polychronous Spiking Neural Network (PSNN) is an advanced form of neural network that exploits polychronization: spikes from different neurons travel down axons with precise delays, arriving at a target neuron simultaneously and causing it to fire, even though the source neurons fire asynchronously. This mechanism allows PSNNs to store and recognize complex spatio-temporal patterns, known as polychronous groups. Adaptive delay mechanisms allow these networks to modify synaptic delays dynamically, enhancing their ability to learn temporal patterns. According to Izhikevich [23], such networks can form more groups than they have neurons, suggesting a vast memory capacity. This feature is particularly important for implementing short-term memory in cognitive systems, as it enables the network to handle a large variety of behaviors and patterns. In essence, PSNNs are ideal for applications that require the recognition and storage of intricate temporal sequences, mimicking the sophisticated processing capabilities of the human brain.

Wang et al. [24] argued that FPGAs are ideally suited to implementing PSNNs, owing to their reconfigurability and parallel processing capabilities, which can efficiently handle the complex computations these networks require. Their proposed architecture consists of spiking neurons interconnected through synapses capable of delay adaptation. The key innovation lies in the ability to dynamically adjust synaptic delays based on the timing of spikes, which is critical for learning temporal patterns. Each neuron in the network emits a spike when its membrane potential exceeds a threshold, integrating input spikes using a LIF model, a common approach in Spiking Neural Networks; this integration respects the timing and delay of each input spike, which can vary as the network operates and learns. The synapses not only transmit spikes between neurons but also adjust the transmission delay of those spikes. This delay adaptation mechanism is central to the architecture: each synapse includes a delay line, implemented using shift registers in the FPGA, that can be dynamically adjusted for precise control of spike timing. Accompanying each synaptic unit is adaptation logic that modifies the delay based on the temporal patterns of activity observed, driven by rules that determine how and when to adjust the delay according to the network's performance and learning progress. Each neuron and its associated synapses form a processing element (PE) on the FPGA; these PEs process spikes independently, enabling parallel processing across the network. The FPGA's memory resources store the state of each neuron, the synaptic weights, and the configuration of the delay lines, while an on-chip communication infrastructure transmits spikes between neurons, handling the variable delays introduced by the synaptic models so that spikes are delivered at the correct times. The network's learning and adaptation capabilities are enhanced by plasticity mechanisms integrated into the neuron and synapse models: synaptic weights and delays are dynamically adjusted through local learning rules hard-coded into the hardware to function in real time.
These rules account for the timing and sequence of spikes, optimizing the network's structure to encode temporal patterns more accurately. The architecture is designed to be both scalable and modular, employing time multiplexing to support up to 1.15 million axons, which enables the network to comprehensively store and recall spatio-temporal patterns. The innovative deployment of a multiplexed neuron array, which simulates 4000 virtual neurons using only 128 physical neurons, showcases efficient resource utilization and scalability. Figure 10A illustrates the PSNN design communicating via the AER protocol.

In later work, Wang et al. [25] presented a systematic approach to FPGA-based SNN implementation that significantly improves upon prior methodologies by incorporating memory management techniques, parallel processing, and modular design principles. They introduced a robust FPGA design framework tailored for large-scale Spiking Neural Networks, especially those characterized by high-density or all-to-all synaptic connections. This framework leverages a reconfigurable neural layer implemented through a time-multiplexing strategy, enabling the simulation of up to 200k virtual neurons with a single physical neuron (compared to 4k virtual neurons in their previous work), while using only a minimal portion of the hardware resources available on commercial FPGAs, including entry-level models. Unlike traditional methods that rely on purely mathematical computational models, the physical neuron in this architecture is realized using a conductance-based model with parameters randomized across neurons, mimicking the natural variability found in biological neurons. Additionally, it incorporates a novel time-multiplexed reconfigurable neural layer equipped with an address buffer, which dynamically generates fixed random weights for each synaptic connection as spikes arrive, significantly reducing memory usage. They showcased a network comprising 23 such neural layers, each hosting 64k neurons, culminating in a network with 1.5 million neurons and 92 billion synapses. This configuration achieved a total spike throughput of 1.2 trillion spikes per second, operating in real time on a Virtex-6 FPGA and demonstrating the framework's scalability and efficiency and its potential to handle extraordinarily large neural networks with reduced hardware demand. The real bottleneck of this design is the limited on-chip memory, while the use of off-chip memory is constrained by communication bandwidth and the significant increase in hardware complexity that off-chip memory interfaces require. Figure 10B shows the architecture of a neural layer in [25].
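
The shift-register delay lines at the heart of [24] can be sketched directly. Below, a delay line holds spikes in flight while a tap selection determines the axonal delay, and a toy adaptation rule nudges the delay so that spikes arrive on time at the target neuron. The register length and the adaptation rule are illustrative assumptions.

```python
# Programmable axonal delay line as a shift register, after the delay
# adaptation mechanism in [24]. Length and adaptation rule are assumptions.

class DelayLine:
    def __init__(self, length=16, delay=8):
        self.taps = [0] * length       # spikes in flight, one bit per stage
        self.delay = delay             # currently selected output tap

    def tick(self, spike_in):
        """Shift one bit in per clock; emit the spike `delay` steps old."""
        self.taps = [int(spike_in)] + self.taps[:-1]
        return self.taps[self.delay]

    def adapt(self, arrived_early):
        """Nudge the delay so spikes coincide at the target neuron."""
        if arrived_early and self.delay < len(self.taps) - 1:
            self.delay += 1            # too early: lengthen the path
        elif not arrived_early and self.delay > 0:
            self.delay -= 1            # too late: shorten the path

# A spike injected now re-emerges eight ticks later.
line = DelayLine()
outputs = [line.tick(spike_in=(t == 0)) for t in range(10)]
print(outputs.index(1))   # -> 8
```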

Figure 10.

A: structure of the PSNN proposed in [24]. The neuron array creates post-synaptic spikes, which are then passed to the axon array. The axon array, with adjustable delays, propagates these spikes and generates pre-synaptic spikes at the axon terminals. These pre-synaptic spikes are fed back into the neuron array, potentially triggering neurons to fire. Both the connectivity and delay within the axon array are programmable, allowing flexible network configurations. B: a neuron soma equipped with a single post-synaptic current generator (PSC_Gen) capable of producing both excitatory and inhibitory post-synaptic currents, modulated by synaptic weights from the address buffer. Linear summation of the currents requires only one generator per neuron in this model. The soma integrates these currents in a leaky manner to compute the membrane potential; upon reaching a threshold, it generates a spike and resets the potential, entering a refractory period. Membrane potential values are read from and written back to on-chip SRAM at each cycle. Adapted from [25].

Neil et al. [26] introduced Minitaur, an event-driven Spiking Neural Network (SNN) accelerator designed to process data dynamically, making it particularly efficient for sporadic or bursty input patterns. Minitaur achieves significant computational efficiency, managing 19 million post-synaptic currents per second while consuming only 1.5 W of power, and supports up to 65,536 neurons per board, showcasing its capability for large-scale neural network tasks. In practical applications, Minitaur demonstrated impressive performance, achieving 92% accuracy on MNIST handwritten digit classification and 71% on the 20 Newsgroups text classification dataset.

Minitaur addresses some of the challenges in training and implementing SNNs on FPGA platforms. One significant challenge is the difference in dynamics between spiking neurons and traditional artificial neurons, which complicates the training process. Minitaur mitigates this issue by utilizing rule-based connections that follow the predictable patterns common in artificial neural networks such as deep belief networks (DBNs), autoencoders, and multi-layer perceptrons. These rules allow connections to be defined in a ranged-rule format, simplifying storage and retrieval by requiring only start and end addresses for the source (SRC) and destination (DEST) neurons of each layer. This method reduces the complexity of connection lookups, decreases storage demands, and speeds up the processing of post-synaptic currents (PSCs) by retrieving destination addresses only when necessary, after the axonal delay. To further optimize memory usage and processing efficiency, Minitaur stores spikes according to their source addresses rather than their destination addresses, significantly reducing storage requirements and allowing post-synaptic currents to be processed more efficiently. Cache locality is another critical aspect of the design: by exploiting patterns in the input data and assigning neuron IDs to specific computational cores, Minitaur ensures that each core handles a subset of neurons, reducing contention and improving access times for neuron weights and state information.

Minitaur's architecture integrates several further design principles to enhance performance. It uses a simplified LIF neuron model, balancing biological realism with computational efficiency, and employs an event-driven processing model in which computation occurs only upon receiving input events, minimizing unnecessary processing and reducing power consumption. The FPGA implementation leverages the inherent parallelism of the hardware, with 32 parallel cores and dedicated memory caches for neuron state and weight storage. This configuration allows Minitaur to handle a high volume of post-synaptic currents efficiently. Figure 11 shows a simplified diagram of the Minitaur architecture.
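
Minitaur's ranged-rule connectivity is easy to illustrate: instead of per-synapse lists, a layer-to-layer connection is stored as start and end addresses for the SRC and DEST neuron ranges, and a spike's fan-out is derived from those four numbers on demand. The concrete layer sizes below are illustrative assumptions.

```python
# Ranged-rule connectivity in miniature: each rule stores only the SRC and
# DEST address ranges of a layer-to-layer (all-to-all) connection, so the
# fan-out of a spike is computed on demand instead of being stored per
# synapse. Layer sizes here are illustrative assumptions.

RULES = [
    # (src_start, src_end, dst_start, dst_end), ranges inclusive
    (0,   783,  784, 1283),   # e.g. a 784-neuron input layer -> 500 hidden
    (784, 1283, 1284, 1293),  # 500 hidden -> 10 output neurons
]

def fan_out(src_neuron):
    """Destination addresses of a spike, derived from the ranged rules."""
    targets = []
    for s0, s1, d0, d1 in RULES:
        if s0 <= src_neuron <= s1:
            targets.extend(range(d0, d1 + 1))
    return targets

print(len(fan_out(42)))     # a first-layer neuron reaches all 500 hidden
print(fan_out(800)[:3])     # -> [1284, 1285, 1286]
```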

Figure 11.

The simplified structure of the Minitaur system, featuring 32 parallel cores and 128 MB of DDR2 main memory. Each core is equipped with 2 MB of state cache and 8 MB of weight cache, alongside two DSP units for fixed-point mathematical operations—one for decay multiplication and another for summing input currents. Additionally, 2 MB of RAM is utilized for exponential decay lookup, preloaded, and functioning as ROM. Adapted from [26].

Wang et al. [27] presented a massively parallel neuromorphic cortex simulator that marked a significant achievement in neuromorphic engineering, particularly in simulating SNNs at large scale. This highly scalable and flexible architecture (Figure 12) can be deployed on single or multiple FPGA boards, enabling the real-time simulation of neural networks ranging from 20 million to 2.6 billion neurons. This massive scale is achieved with remarkable energy efficiency, the system consuming only about 1.62 μW per neuron. The architecture effectively mimics the hierarchical and localized connectivity of biological neural networks through its innovative use of minicolumns and hypercolumns as functional units; these are not direct replicas of biological structures but serve to enhance computational efficiency and parallelism. The authors introduced a sophisticated memory management system that utilizes both on-chip and off-chip memory to handle the extensive data throughput required by such large-scale simulations. Furthermore, the system's ability to dynamically assign computational resources based on neuronal activity helps optimize power consumption and processing speed, making it a groundbreaking tool for real-time applications in computational neuroscience and, potentially, other areas such as robotics and artificial intelligence.

The architecture of the cortex simulator, as depicted in Figure 12, is structured around a neural engine core that operates in conjunction with a master control unit, off-chip memories, and a high-speed serial interface. The neural engine is subdivided into three primary components: the minicolumn array, the synapse array, and the axon array. The minicolumn array hosts time-multiplexed minicolumns that simulate the neurons' behavior, integrating inputs to generate spiking events. These events are passed to the axon array, which propagates them with specific axonal delays to the synapse array, where events are modulated by synaptic weights and routed to their intended destinations across the neural network. The entire process is underpinned by a parameter look-up table (LUT) that stores crucial information on neuron parameters and connection rules, providing the dynamic configuration and flexibility required to simulate different neural network models. Each component is designed to manage data flow and computation efficiently, directed by the master control unit, which coordinates the interaction between the neural engine and the external memories. This setup supports seamless, scalable emulation of complex neural dynamics, with the real-time processing capabilities necessary for interactive applications.

In 2023, the authors adapted this design to create a neuromorphic supercomputer, due to come online in 2024, using 96 Stratix 10 MX boards [28]. This computer will be the first to simulate the same number of neurons, with the same number of synaptic operations per second, as a human brain, and will be made available online for researchers worldwide to investigate SNNs at massive scale. The performance and scalability of this simulator are highly dependent on the underlying FPGA hardware: performance will improve as FPGA technology improves over time, but the approach is costly, requiring high-end, often expensive FPGA boards to function optimally. While the system is designed to be flexible and can be adapted to different types of neural networks, neuron models, learning rules, or other specific experimental setups, doing so could require extensive reconfiguration and reprogramming, and would thus demand significant effort.

Figure 12.

Architecture of the cortex simulator. This system includes a neural engine, a master controller, off-chip memory, and a serial interface. The neural engine mimics biological neural system structures, facilitating the simulation of their functions. The master controller manages data flow between the neural engine and off-chip memories, which are used for storing neural states and event data. Additionally, the serial interface facilitates communication with additional FPGAs and external controllers such as PCs. Adapted from [27].

Figure 13.

Diagram of the ODESA network. This illustration showcases the neuron layers, the dedicated training hardware modules, and the interconnections between them. Each layer is equipped with its own training module, enabling dynamic online adjustments to neuron weights and thresholds.

Recent research has increasingly focused on SNN architectures suitable for edge computing, particularly in areas like the Intelligent Internet of Things (IoT) and autonomous systems. These applications benefit from the low power consumption and high-speed processing capabilities of FPGAs. Moreover, the scalability of FPGA platforms allows the deployment of complex SNN models that can be dynamically reconfigured to suit an application's requirements. This adaptability, combined with the inherent advantages of SNNs, such as their event-driven nature and potential for low-power operation, positions FPGA-based SNNs as a promising technology for advancing edge computing across various sectors.

FireFly, a high-throughput hardware accelerator for Spiking Neural Networks introduced by Li et al. [29], aims to address the limitations in arithmetic and memory efficiency often found in FPGA implementations. FireFly exploits the DSP blocks in Xilinx Zynq UltraScale FPGAs to enhance arithmetic operations: by generalizing the SNN arithmetic operation to a multiplex-accumulate operation, it uses the DSP blocks' capabilities for efficient multiplication, addition, and data routing, thereby significantly boosting arithmetic performance. To overcome memory efficiency issues, FireFly introduces an innovative memory system that reduces on-chip RAM consumption while ensuring efficient data access. The design incorporates a partial-sum and membrane-voltage (Psum-Vmem) unified buffer, which balances off-chip memory access bandwidth with on-chip data buffering and supports the extensive parallel processing of spikes and synaptic weights that is vital to high performance in real-world applications. Moreover, line buffers and a stream width upsizer minimize data transfer latencies and maximize throughput, further enhancing memory efficiency. Together, these innovations allow FireFly to operate on resource-limited edge devices while still delivering peak performance and maintaining the adaptability needed for various SNN models and applications. This robust architecture not only advances the computational capabilities of neuromorphic hardware but also pushes the boundaries of power and area efficiency on FPGA platforms. The FireFly architecture is designed to be scalable, supporting large SNN models with millions of neurons and synaptic connections, particularly when deployed on larger FPGA devices.
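
The multiplex-accumulate idea is simple to state in code: because SNN activations are binary spikes, "multiplying" an activation by a weight reduces to selecting either the weight or zero (a multiplexer) and accumulating. The sketch below shows that reduction; the vector values are arbitrary, and the mapping of this pattern onto DSP primitives is FireFly's contribution, not reproduced here.

```python
# Multiplex-accumulate: with binary spike activations, the dot product needs
# no multipliers, just a 2:1 select feeding an adder per synapse. Inputs
# below are arbitrary illustrative values.

def multiplex_accumulate(spikes, weights):
    """Dot product of a binary spike vector with a weight vector."""
    acc = 0
    for s, w in zip(spikes, weights):
        acc += w if s else 0      # a mux (weight or 0) into an accumulator
    return acc

spikes  = [1, 0, 1, 1, 0, 0, 1, 0]
weights = [3, -2, 5, 1, 4, -1, 2, 6]
print(multiplex_accumulate(spikes, weights))   # -> 3 + 5 + 1 + 2 = 11
```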

In a transformative approach, Mehrabi et al. [30] introduced the Optimized Deep Event-driven Spiking neural network Architecture (ODESA), an SNN architecture optimized for computational efficiency on FPGA-based systems and capable of online, local supervised training. This multi-layered architecture employs a gradient-free, threshold-based learning algorithm that dynamically adjusts synaptic weights and neuronal thresholds based on local synaptic activities and a Winner-Takes-All (WTA) mechanism. This design simplifies the learning process by eliminating the need for complex backpropagation calculations. Each neuron in ODESA operates in an event-driven manner, processing inputs as discrete binary spikes, significantly reducing power consumption and processing requirements. The system is engineered for real-time learning and adaptation, with an efficient communication method between layers that uses direct spike transmission. This approach not only emulates the operational principles of biological neural networks but also leverages the inherent strengths of FPGA technology, such as parallel processing capabilities and reconfigurability. Notably, the architecture avoids time multiplexing, yet its reduced computational complexity allows up to 95 K physical neurons to be implemented on an FPGA of the Stratix family [31]. Each neuronal layer includes a trainer module that adapts the neurons' weights and thresholds during the online training phase, enhancing the system's adaptability. The design is portable rather than tied to a specific FPGA overlay, making ODESA particularly suitable for real-time pattern recognition tasks such as anomaly and intrusion detection.
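A gradient-free WTA update of this flavor can be sketched as follows. This is an illustrative rule under assumed input-trace dynamics, not the exact ODESA algorithm of [30]; all names and learning rates are ours.

```python
import numpy as np

def wta_local_update(traces, weights, thresholds, target_event,
                     eta_w=0.01, eta_t=0.01):
    """Illustrative gradient-free, threshold-based WTA update in the spirit
    of ODESA (the rule in [30] differs in detail). `traces` holds recent
    input-spike traces; each row of `weights` belongs to one neuron."""
    drive = weights @ traces                  # local synaptic activation
    winner = int(np.argmax(drive))            # Winner-Takes-All selection
    if target_event:
        # Correct timing: pull the winner's weights toward the input trace
        weights[winner] += eta_w * (traces - weights[winner])
        # and relax its threshold toward the drive so it fires more readily.
        thresholds[winner] += eta_t * (drive[winner] - thresholds[winner])
    else:
        # Spurious win: raise the winner's threshold to suppress it.
        thresholds[winner] += eta_t
    return weights, thresholds
```

Because every quantity in the update is local to a layer, a per-layer trainer module of this kind maps naturally onto FPGA fabric without global gradient traffic.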


5. Applications of FPGA-based SNNs

FPGA-based Spiking Neural Networks have been increasingly utilized across a diverse range of fields. The following are some prominent applications where FPGA-based SNNs have demonstrated significant potential:

5.1 Robotics and control

FPGA-based SNNs have found diverse applications in robotics, taking advantage of their capabilities for real-time control and adaptive behavior. As robots become increasingly integrated into various societal functions, they benefit significantly from SNNs' ability to model and control complex behaviors. In locomotion systems, SNNs serve as Central Pattern Generators (CPGs) for multi-legged robots; studies using Spartan-6 FPGAs have developed compact and programmable architectures for bipedal, quadrupedal, and hexapodal locomotion [32], an idea sketched below. Furthermore, advancements in hardware SNN models, such as pyramidal neurons inspired by hippocampal functions, have been pivotal for autonomous navigation tasks, enhancing robotic spatial awareness and decision-making [33]. The Dynamic Adaptive Neural Network Array (DANNA) further exemplifies FPGA-based SNNs' utility in robotic navigation, providing configurable neuromorphic computing elements that mimic the neuronal connectivity crucial for adaptive navigation strategies [34]. FPGA-based SNNs have also been integral to obstacle avoidance [6], using runtime-reconfigurable connectivity within 2D arrays of spiking neurons to enable the real-time decision-making and path planning that autonomous navigation in dynamic environments requires [35]. Pearson et al. [18] designed an FPGA neuro-processor used as part of the closed-loop system of a feedback controller.
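To give a flavor of the CPG idea, the following toy half-center oscillator shows how two mutually inhibiting spiking neurons with slow adaptation settle into the alternating rhythm used to drive legged gaits. Every constant here is illustrative, not taken from [32].

```python
import numpy as np

def half_center_cpg(steps=1500, dt=1.0, tau=50.0, bias=1.2, w_inh=3.0,
                    v_th=1.0, tau_adapt=200.0, a_step=0.3):
    """Two leaky integrate-and-fire neurons with mutual inhibition and slow
    adaptation fire in alternation: the half-center pattern underlying many
    CPG designs. All constants are illustrative."""
    v = np.array([0.0, 0.5])                  # asymmetric start breaks the tie
    a = np.zeros(2)                           # slow adaptation variables
    spikes = np.zeros((steps, 2))
    for t in range(1, steps):
        inh = w_inh * spikes[t - 1][::-1]     # inhibition from the rival cell
        v += dt / tau * (bias - v - a) - inh  # leaky integration of the drive
        a += dt / tau_adapt * (-a)            # adaptation decays slowly
        fired = v >= v_th
        v[fired] = 0.0                        # reset on spike
        a[fired] += a_step                    # each spike builds adaptation
        spikes[t] = fired
    return spikes

left, right = half_center_cpg().sum(axis=0)   # spike counts per half-center
print(f"alternating spike counts: left={left:.0f}, right={right:.0f}")
```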

5.2 Signal processing

The implementation of SNNs on FPGAs has shown considerable promise in real-time signal processing. Applications in audio and speech processing benefit from the event-driven nature of spiking neurons [36], which are adept at handling complex, time-varying signals. This capability is crucial for tasks such as speech recognition and audio filtering, where processing signals in real time is paramount. FPGA-based SNNs have also been instrumental in edge detection tasks, offering scalable solutions that reduce computation times and enhance performance [37]. Moreover, they have been applied to image segmentation and sound source separation using algorithms such as the Oscillatory Dynamic Link Matcher (ODLM), underscoring their versatility in complex signal processing applications [38]. As an example, Iakymchuk et al. [39] introduced an SNN architecture on FPGA that performs simple image transforms, such as edge detection, spot detection or removal, and Gabor-like filtering, without any further computation requirements.
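To make the image-transform idea concrete, the sketch below (our own illustration, assuming SciPy is available, and not the architecture of [39]) encodes difference-of-Gaussians edge responses as a binary spike map:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_edge_spikes(image, sigma1=1.0, sigma2=2.0, theta=0.05):
    """Sketch of spike-based edge detection: a difference-of-Gaussians
    approximates center-surround receptive fields, and thresholding its
    output yields a binary spike map concentrated along edges."""
    dog = gaussian_filter(image, sigma1) - gaussian_filter(image, sigma2)
    return (np.abs(dog) > theta).astype(np.uint8)  # 1 = spike at this pixel

# Toy usage: a step edge produces a band of spikes along the boundary.
img = np.zeros((32, 32)); img[:, 16:] = 1.0
print(dog_edge_spikes(img).sum(), "spiking pixels")
```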

5.3 Pattern recognition and anomaly detection

FPGA-based SNNs are utilized in various pattern recognition tasks, including image and speech recognition. The event-driven nature of spiking neurons makes them well suited to efficiently detecting and processing patterns in large datasets; examples of such implementations can be found in [40, 41], and fault detection and prediction systems for industrial applications are proposed in [42, 43]. In cybersecurity, FPGA-based SNNs have been employed to detect intrusions and abnormal network activity, with neuromorphic implementations of deep learning networks often providing accuracy comparable to full-precision models while significantly reducing power consumption and cost [44]. FPGA-based SNNs also have extensive applications in the biomedical field: recent work by Scrugli et al. [45] uses a customized FPGA-based SNN for the real-time detection of arrhythmia, demonstrating the expanding scope of FPGA-accelerated neural networks in biomedical signal processing.

5.4 Neuromorphic computing

One of the primary applications of FPGA-based SNNs is in the development of neuromorphic computing systems. These systems aim to replicate the architecture and functionality of the human brain, providing insights into brain function and potential advancements in artificial intelligence. A notable example of this application is DeepSouth [28].


6. Challenges in implementing FPGA-based SNNs

Implementing Spiking Neural Networks (SNNs) on Field-Programmable Gate Arrays (FPGAs) presents several significant challenges. One primary issue is the inherent complexity of SNN models, which require precise temporal dynamics and event-driven processing, making it difficult to map these biological computations onto the general-purpose logic blocks of FPGAs. This often leads to inefficient utilization of hardware resources, increased area, and higher power consumption. Additionally, the von Neumann bottleneck poses a substantial challenge, as the bandwidth of modern memory systems cannot keep pace with the data-movement requirements of SNNs. The limited on-chip memory of FPGAs often necessitates the use of external memory, further increasing power consumption and latency; efficient memory management and innovative data storage solutions are crucial to address this limitation.

Scalability and interconnectivity pose further obstacles. As the density of neurons in an SNN increases, the interconnect complexity grows non-linearly: a two-layered feed-forward network with m neurons per layer already requires m × m = m² connections. This rapid growth in interconnect requirements can overwhelm the routing resources of an FPGA, leading to congestion and increased signal propagation delays. Current FPGA architectures typically employ diagonal, segmented, or hierarchical two-dimensional routing structures, which are not optimized for the dense and dynamic connectivity patterns characteristic of SNNs. This inadequacy results in inefficient routing and higher switching requirements, further increasing latency, limiting achievable network sizes, and diminishing the performance of the implemented SNN. The challenge is even more pronounced in large-scale implementations, where high levels of inter-neuron connectivity are essential for accurate and efficient neural processing. The contrast in scaling is illustrated by the sketch below.
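The following back-of-envelope comparison (our illustration, with an assumed one-router-per-neuron NoC mapping) shows how quickly dedicated point-to-point wiring outgrows a roughly linear NoC resource budget:

```python
# Illustrative scaling comparison: dedicated point-to-point links between two
# fully connected m-neuron layers grow as m**2, while a NoC that serializes
# spike packets over routers adds resources roughly linearly in neuron count.
for m in (64, 256, 1024, 4096):
    p2p_links = m * m          # one physical wire per synapse
    noc_nodes = 2 * m          # roughly one router/port per neuron (assumed)
    print(f"m={m:5d}  direct links={p2p_links:>12,}  NoC nodes={noc_nodes:>7,}")
```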

Moreover, the non-linear growth in routing complexity increases power consumption as more resources are dedicated to maintaining the connectivity between neurons. This increased power demand can negate the inherent energy efficiency advantages of SNNs, making FPGA implementations less attractive for applications requiring low power consumption.

To mitigate these issues, as detailed in Section 4, researchers are exploring various routing optimizations and topologies. Techniques such as Network-on-Chip (NoC) architectures, which offer scalable and efficient interconnectivity, have shown promise in addressing the connectivity problems in FPGA-based SNNs. NoC topologies can provide higher levels of connectivity without excessive interconnect-to-device area ratios, thereby improving performance and scalability. Despite these advancements, achieving efficient and scalable interconnectivity in FPGA-based SNNs remains a significant challenge that requires ongoing research and innovation.

Addressing memory bandwidth limitations in FPGA-based SNNs is equally critical for improving performance and scalability. Several solutions have been proposed to mitigate these limitations:

  • High Bandwidth Memory (HBM): HBM is a stacked DRAM integrated with the processing elements through a silicon interposer, offering significantly higher bandwidth than traditional memory technologies. For example, a single HBM2 stack can provide a bandwidth of 256 GB/s, and a device integrating four stacks can reach up to 1 TB/s. Using HBM in FPGA-based SNN implementations can alleviate the memory bandwidth bottleneck by enabling the fast data transfer rates needed to handle the large volumes of data processed by SNNs.

  • In-Memory Computing (IMC): IMC involves moving computational logic into the memory itself, thereby reducing the need to transfer data back and forth between the memory and the processing units. This approach is particularly effective for SNNs, as it can significantly reduce latency and power consumption associated with memory access [46]. IMC allows for parallel processing of memory cells, enhancing the overall throughput of the system.

  • Model Compression Techniques: Techniques such as model compression [47], pruning, and quantization reduce the amount of data that must be processed and stored. Model compression simplifies the neural network model without significantly impacting its performance; pruning removes redundant or less significant neurons and synapses; quantization reduces the precision of the weights and activations. Together, these techniques lower the memory bandwidth requirements and improve the efficiency of data transfer and storage (see the sketch after this list).

  • Efficient Memory Management: Implementing efficient memory management strategies can help optimize the use of available memory resources. Techniques such as memory partitioning, caching, and pipelining can improve data access patterns and reduce memory contention. Effective management of on-chip and off-chip memory resources ensures that critical data is readily accessible, minimizing delays caused by memory bottlenecks.

  • Emerging Memory Technologies: Researchers are exploring emerging memory technologies such as Phase Change Memory (PCM) [48], Spin-Transfer Torque Magnetoresistive RAM (STT-MRAM), and Resistive RAM (ReRAM). These technologies offer advantages in speed, power consumption, and density that can help address the memory bandwidth limitations of SNN implementations.
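As a concrete illustration of the pruning and quantization entry above, the following sketch (our own minimal example, not drawn from [47]) shows how both steps shrink the weight data an FPGA accelerator would need to buffer or stream:

```python
import numpy as np

def prune_weights(w, keep_ratio=0.25):
    """Magnitude pruning: zero the smallest weights so only a sparse subset
    must be stored on-chip or streamed from off-chip memory."""
    cutoff = np.quantile(np.abs(w), 1.0 - keep_ratio)
    return np.where(np.abs(w) >= cutoff, w, 0.0)

def quantize_weights(w, bits=8):
    """Uniform symmetric quantization to signed fixed point: a common way to
    shrink SNN weight storage (and hence bandwidth demand) on FPGA."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(float(np.max(np.abs(w))) / qmax, 1e-12)
    return np.round(w / scale).astype(np.int8), scale

# Toy usage: prune to 25% density, then quantize 32-bit floats to 8 bits.
w = np.random.default_rng(1).normal(size=(128, 128)).astype(np.float32)
q, scale = quantize_weights(prune_weights(w))
print(f"nonzero weights kept: {(q != 0).mean():.0%}, scale={scale:.5f}")
```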

Moreover, the development of SNNs on FPGAs requires sophisticated toolchains that can handle both hardware and neural network algorithm complexities. Existing high-level synthesis tools are not fully optimized for SNN development, so significant expertise is needed to achieve efficient implementations. Power consumption remains a critical issue, with FPGAs typically consuming more power than application-specific integrated circuits (ASICs) for similar tasks. Techniques such as probabilistic spike propagation [49, 50] and event-driven processing have been proposed to mitigate this, but they add to the design complexity. Lastly, the evolving landscape of AI hardware, with emerging memory technologies such as the PCM and STT-MRAM devices discussed above, presents both opportunities and challenges for integrating these novel technologies into FPGA-based SNNs, necessitating continuous research and adaptation.


7. Conclusion

This chapter on FPGA-Based Spiking Neural Networks offers a thorough review of how SNN designs on FPGAs have evolved since the advent of the technology. Starting with the foundational designs of SNNs on FPGAs, the narrative progresses to sophisticated architectures that aim to model and understand the intricate computational processes of neurons. It highlights the pivotal transformations brought about by FPGAs in neuromorphic computing, tracing the trajectory from basic implementations to the complex systems capable of simulating extensive neural networks today.

A significant portion of the chapter is dedicated to the architectures that have been developed, ranging from basic neuron models such as integrate-and-fire to more advanced designs that integrate dynamic learning capabilities and adaptability, a standout feature enabled by the reconfigurability of FPGAs. The discussion of SNN accelerators is particularly notable: these systems exploit the parallel processing capabilities of FPGAs to significantly enhance the speed and efficiency of neural computation. Over time, such accelerators have incorporated soft processors and SoC capabilities on FPGAs, which manage more complex control flows and integrate learning algorithms directly into the hardware, showcasing the versatility of FPGAs in building more intelligent systems.

Time-multiplexed designs also receive significant attention. These designs allow the simulation of large-scale neural networks by efficiently cycling through neuron updates, offering a way to study complex neural behaviors within limited hardware budgets. Moreover, the chapter covers the development of self-trainable SNNs: networks that adjust and learn in real time, illustrating the potential of FPGAs to support systems that learn autonomously. This capability not only makes neural networks more adaptable but also opens new avenues for real-world applications where on-the-fly learning is crucial.

In conclusion, the future of FPGA-based SNNs is promising, given the continuous advancements in FPGA technology and ongoing research into neural networks. As hardware continues to shrink in size and grow in capability, and as FPGA designs become increasingly sophisticated, the potential for enhancing SNN capabilities continues to expand. The merging of FPGA flexibility, growing computational power, and advanced neural modeling techniques is likely to have a profound impact across fields such as robotics, AI, and neuromorphic computing, potentially leading to innovative approaches in the design and implementation of intelligent systems.


Authors contribution

Ali Mehrabi collected and reviewed the information and drafted the manuscript. André van Schaik provided feedback, edits, and additional opinions and interpretations.

References

  1. Moore GE. Cramming more components onto integrated circuits. Proceedings of the IEEE. 1998;86(1):82-85
  2. Trimberger SM. Three ages of FPGAs: A retrospective on the first thirty years of FPGA technology. IEEE Solid-State Circuits Magazine. 2018;10(2):16-29
  3. Gerstner W, Kistler WM. Spiking Neuron Models: Single Neurons, Populations, Plasticity. Cambridge, UK: Cambridge University Press; 2002
  4. Hodgkin AL, Huxley AF. A quantitative description of membrane current and its application to conduction and excitation in nerve. Bulletin of Mathematical Biology. 1990;52:25-71. DOI: 10.1007/BF02459568
  5. Guo W, Fouda ME, Eltawil AM, Salama KN. Neural coding in spiking neural networks: A comparative study for robust neuromorphic systems. Frontiers in Neuroscience. 2021;15:638474
  6. Rossmann M, Hesse B, Goser K, Buhlmeier A, Manteuffel G. Implementation of a biologically inspired neuron-model in FPGA. In: Proceedings of the Fifth International Conference on Microelectronics for Neural Networks. Lausanne, Switzerland: IEEE; 1996. pp. 322-329
  7. Hebb D. The Organization of Behavior. New York, NY: John Wiley and Sons; 1949
  8. Rossmann M, Burwick C, Bühlmeier A, Manteuffel G, Goser K. Neural dynamics in real-time for large scale biomorphic neural networks. In: ICANN 98: Proceedings of the 8th International Conference on Artificial Neural Networks, Skövde, Sweden, 2-4 September 1998. Perspectives in Neural Computing. London: Springer; 1998. pp. 481-486
  9. de Garis H, Korkin M, Fehr G. The CAM-brain machine (CBM): An FPGA based tool for evolving a 75 million neuron artificial brain to control a lifesized kitten robot. Autonomous Robots. 2001;10:235-249
  10. Upegui A, Peña-Reyes CA, Sanchez E. A functional spiking neuron hardware oriented model. In: International Work-Conference on Artificial Neural Networks. Berlin, Heidelberg: Springer; 2003. pp. 136-143
  11. Ijaz Khan M, Lester DF, Plana LA, Rast A, Jin X, Painkras E, et al. SpiNNaker: Mapping neural networks onto a massively-parallel chip multiprocessor. 2008. DOI: 10.1109/ijcnn.2008.4634199
  12. Upegui A, Peña-Reyes CA, Sanchez E. An FPGA platform for on-line topology exploration of spiking neural networks. Microprocessors and Microsystems. 2005;29(5):211-223
  13. Lim D, Peattie M. Two flows for partial reconfiguration: Module based or small bit manipulations. Xilinx, Inc.; 17 May 2002
  14. Vose MD. The Simple Genetic Algorithm: Foundations and Theory. Cambridge, MA: MIT Press; 1999
  15. Hellmich H, Klar H. An FPGA based simulation acceleration platform for spiking neural networks. In: The 2004 47th Midwest Symposium on Circuits and Systems (MWSCAS '04). Vol. 2. Hiroshima, Japan: IEEE; 2004
  16. Glackin B, McGinnity TM, Maguire LP, Wu QX, Belatreche A. A novel approach for the implementation of large scale spiking neural networks on FPGA hardware. In: Cabestany J, Prieto A, Sandoval F, editors. Computational Intelligence and Bioinspired Systems (IWANN 2005). Vilanova i la Geltrú, Spain; 2005. DOI: 10.1007/11494669_68
  17. Cassidy A, Denham S, Kanold P, Andreou A. FPGA based silicon spiking neural array. In: 2007 IEEE Biomedical Circuits and Systems Conference. Montreal, QC, Canada: IEEE; 2007
  18. Pearson MJ, Pipe AG, Mitchinson B, Gurney K, Melhuish C, Gilhespy I, et al. Implementing spiking neural networks for real-time signal-processing and control applications: A model-validated FPGA approach. IEEE Transactions on Neural Networks. 2007;18(5):1472-1487. DOI: 10.1109/tnn.2007.891203
  19. Ros E, Ortigosa EM, Agis R, Carrillo R, Arnold M. Real-time computing platform for spiking neurons (RT-spike). IEEE Transactions on Neural Networks. 2006;17(4):1050-1063. DOI: 10.1109/tnn.2006.875980
  20. Morgan F, Cawley S, McGinley B, Pande S, McDaid LJ, Glackin B, et al. Exploring the evolution of NoC-based spiking neural networks on FPGAs. 2009. DOI: 10.1109/fpt.2009.5377663
  21. Cawley S, Morgan F, McGinley B, Pande S, McDaid LJ, Carrillo S, et al. Hardware spiking neural network prototyping and application. Genetic Programming and Evolvable Machines. 2011;12:257-280
  22. Carrillo S, Harkin J, McDaid LJ, Morgan F, Pande S, Cawley S, et al. Scalable hierarchical network-on-chip architecture for spiking neural network hardware implementations. IEEE Transactions on Parallel and Distributed Systems. 2013;24(12):2451-2461. DOI: 10.1109/tpds.2012.289
  23. Izhikevich EM. Polychronization: Computation with spikes. Neural Computation. 2006;18:245-282
  24. Wang R, Cohen G, Stiefel KM, Hamilton TJ, Tapson J, van Schaik A. An FPGA implementation of a polychronous spiking neural network with delay adaptation. Frontiers in Neuroscience. 2013;7:14
  25. Wang R, Hamilton TJ, Tapson J, van Schaik A. An FPGA design framework for large-scale spiking neural networks. 2014. DOI: 10.1109/iscas.2014.6865169
  26. Neil D, Liu SC. Minitaur, an event-driven FPGA-based spiking network accelerator. IEEE Transactions on Very Large Scale Integration (VLSI) Systems. 2014;22(12):2621-2628
  27. Wang RM, Thakur CS, van Schaik A. An FPGA-based massively parallel neuromorphic cortex simulator. Frontiers in Neuroscience. 2018;12:213. DOI: 10.3389/fnins.2018.00213
  28. DeepSouth. Home [Internet]. Available from: https://www.deepsouth.org.au
  29. Li J, Shen G, Zhao D, Zhang QW, Zeng Y. FireFly: A high-throughput hardware accelerator for spiking neural networks with efficient DSP and memory optimization. IEEE Transactions on Very Large Scale Integration (VLSI) Systems. 2023;31(8):1178-1191. DOI: 10.1109/tvlsi.2023.3279349
  30. Mehrabi A, Bethi Y, van Schaik A, Wabnitz A, Afshar S. Efficient implementation of a multi-layer gradient-free online-trainable spiking neural network on FPGA. arXiv preprint; 2023
  31. Mehrabi A, Bethi Y, van Schaik A, Afshar S. An optimized multi-layer spiking neural network implementation in FPGA without multipliers. Procedia Computer Science. 2023;222:407-414. DOI: 10.1016/j.procs.2023.08.179
  32. Guerra-Hernandez EI, Espinal A, Batres-Mendoza P, Garcia-Capulin CH, Romero-Troncoso RD, Rostro-Gonzalez H. A FPGA-based neuromorphic locomotion system for multi-legged robots. IEEE Access. 2017;5:8301-8312
  33. Mokhtar M, Halliday DM, Tyrrell AM. Autonomous navigational controller inspired by the hippocampus. In: 2007 International Joint Conference on Neural Networks. Orlando, FL, USA: IEEE; 2007. pp. 813-818
  34. Mitchell JP, Bruer G, Dean ME, Plank JS, Rose GS, Schuman CD. NeoN: Neuromorphic control for autonomous robotic navigation. In: 2017 IEEE International Symposium on Robotics and Intelligent Sensors (IRIS). Ottawa, ON, Canada: IEEE; 2017. pp. 136-142
  35. Roggen D, Hofmann S, Thoma Y, Floreano D. Hardware spiking neural network with run-time reconfigurable connectivity in an autonomous robot. In: NASA/DoD Conference on Evolvable Hardware. Chicago, IL, USA: IEEE; 2003. pp. 189-198
  36. Deng B, Fan Y, Wang J, Yang S. Auditory perception architecture with spiking neural network and implementation on FPGA. Neural Networks. 2023;165:31-42
  37. Glackin B, Harkin J, McGinnity TM, Maguire LP, Wu Q. Emulating spiking neural networks for edge detection on FPGA hardware. In: 2009 International Conference on Field Programmable Logic and Applications. Prague, Czech Republic: IEEE; 2009. pp. 670-673
  38. Caron LC, Mailhot F, Rouat J. FPGA implementation of a spiking neural network for pattern matching. In: 2011 IEEE International Symposium on Circuits and Systems (ISCAS). Rio de Janeiro, Brazil: IEEE; 2011. pp. 649-652
  39. Iakymchuk T, Rosado-Munoz A, Bataller-Mompean M, Guerrero-Martínez JF, Francés-Villora JV, Wegrzyn M, et al. Hardware-accelerated spike train generation for neuromorphic image and video processing. In: 2014 IX Southern Conference on Programmable Logic (SPL). Buenos Aires, Argentina: IEEE; 2014. pp. 1-6
  40. Paz IT, Hernández Gress N, González Mendoza M. Pattern recognition with spiking neural networks. In: Advances in Soft Computing and its Applications: 12th Mexican International Conference on Artificial Intelligence (MICAI 2013), Mexico City, Mexico, November 24-30, 2013, Proceedings, Part II. Berlin, Heidelberg: Springer; 2013. pp. 279-288
  41. Lammie C, Hamilton T, Azghadi MR. Unsupervised character recognition with a simplified FPGA neuromorphic system. In: 2018 IEEE International Symposium on Circuits and Systems (ISCAS). Florence, Italy: IEEE; 2018. pp. 1-5
  42. Mehrabi A, Dennler N, Bethi Y, van Schaik A, Afshar S. Real-time anomaly detection using hardware-based unsupervised spiking neural network (TinySNN). In: 2024 IEEE 33rd International Symposium on Industrial Electronics (ISIE). Ulsan, South Korea: IEEE; 2024. pp. 1-8
  43. Wang J, Li T, Sun C, Yan R, Chen X. Improved spiking neural network for intershaft bearing fault diagnosis. Journal of Manufacturing Systems. 2022;65:208-219
  44. Zahm W, Stern T, Bal M, Sengupta A, Jose A, Chelian S, et al. Cyber-neuro RT: Real-time neuromorphic cybersecurity. Procedia Computer Science. 2022;213:536-545
  45. Scrugli MA, Busia P, Leone G, Meloni P. On-FPGA spiking neural networks for integrated near-sensor ECG analysis. In: 2024 Design, Automation and Test in Europe Conference and Exhibition (DATE). Valencia, Spain: IEEE; 2024. pp. 1-6
  46. AlZaabi M, Halawani Y, Mohammad B. In-memory computing using phase change memory. In: In-Memory Computing Hardware Accelerators for Data-Intensive Applications. Cham: Springer Nature Switzerland; 2023. pp. 81-96
  47. Deng L, Li G, Han S, Shi L, Xie Y. Model compression and hardware acceleration for neural networks: A comprehensive survey. Proceedings of the IEEE. 2020;108(4):485-532
  48. Liu ZC, Wang L. Applications of phase change materials in electrical regime from conventional storage memory to novel neuromorphic computing. IEEE Access. 2020;8:76471-76499
  49. Zhu X, Yuan L, Wang D, Chen Y. FPGA implementation of a probabilistic neural network for spike sorting. In: 2010 2nd International Conference on Information Engineering and Computer Science. Wuhan, China: IEEE; 2010. pp. 1-4
  50. Nallathambi A, Chandrachoodan N. Probabilistic spike propagation for FPGA implementation of spiking neural networks. arXiv preprint arXiv:2001.09725; 2020
