Open access peer-reviewed chapter - ONLINE FIRST

Transfer Learning for Non-Invasive BCI EEG Brainwave Decoding

Written By

Xiaoxi Wei

Reviewed: 23 May 2024 Published: 18 June 2024

DOI: 10.5772/intechopen.115124


From the Edited Volume

Transfer Learning - Leveraging the Capability of Pre-trained Models Across Different Domains [Working Title]

Dr. Anwar P.P. Abdul Majeed


Abstract

Brain-computer interfaces (BCIs) represent a rapidly advancing domain that enables the interpretation of human cognitive states and intentions through brainwave analysis. This technology has demonstrated significant potential in augmenting the quality of life for individuals with conditions such as paralysis by decoding their neural patterns. Electroencephalograms (EEG) are the cornerstone of this progress, providing a non-invasive and secure means of capturing brain activity. The integration of machine learning, particularly deep learning techniques, has considerably enhanced the accuracy of EEG interpretation in the last decade. However, a critical challenge persists in the training of machine learning algorithms on EEG data due to pronounced variability among individual brain activities. Such variability can result in suboptimal model performance when data availability is scarce. Transfer learning, a strategy successful in other domains like computer vision and natural language processing, offers a promising avenue to deal with the variability of heterogeneous EEG datasets. This chapter provides a comprehensive review of the current state of EEG transfer learning methodologies and an outlook on large-scale brainwave decoding.

Keywords

  • machine learning
  • deep learning
  • transfer learning
  • domain adaptation
  • brain-computer interfaces (BCI)
  • electroencephalography (EEG)
  • GPT
  • LLM
  • pre-trained model

1. Introduction

Advancements in machine learning, particularly deep learning, are revolutionizing the interpretation of human brain signals. This progress is significantly impacting the fields of disease diagnosis and brain-computer interfaces (BCIs) [1, 2]. Medical professionals utilize EEG, fMRI, and other forms of brain signal analysis to diagnose conditions like epilepsy and sleep disorders, allowing for early detection and potential prevention or treatment [3]. Moreover, for patients paralyzed by such illnesses, BCIs that employ both non-invasive EEG [4] and invasive electrocorticography (ECoG) [5] are being developed to help them regain control over their environment. Research in BCI has contributed significantly to real-world applications: by early 2024, companies such as Neuralink had led commercialization efforts, suggesting that brainwave decoding is moving from laboratories into practical use as the relevant technologies mature.

However, the utilization of diverse brain signal data collected by various labs, hospitals, and companies remains a significant challenge in the brainwave decoding literature. This issue limits the effective use of large-scale brainwave datasets due to substantial variability across datasets, which affects the cross-session, cross-subject, and cross-dataset compatibility of machine learning models [6]. The differences in brainwave data have many causes. One is hardware: electroencephalogram (EEG) signals, known for their low signal-to-noise ratio, are usually amplified up to 10,000 times, and amplifiers from different manufacturers introduce distributional variability into the collected data. Anatomy is another: the arrangement of neurons and the activation of different brain regions vary from person to person. For instance, in motor imagery, some individuals show activations at EEG electrode positions C1 and C2, while others show them at C3 and C4; these positions all lie over the motor cortex but are located differently. Mood, mental states, and experimental setups also affect how the data are distributed, making it challenging to use large-scale brainwave data in machine learning algorithms.

Typically, established algorithms used in medical or BCI settings rely heavily on EEG signals from the individual subject being tested [1]. In these cases, practitioners focus on various signal characteristics such as frequency spectrum and correlation, which are crucial for diagnosing diseases. Advanced BCIs that can accurately interpret a person’s intentions or aid in speech usually require a prolonged gathering of signals customized for each individual. This process is not only arduous and time-consuming but also costly, resulting in inefficient utilization of the brain signals that are already being collected in laboratories and hospitals. The emergence of transfer learning, especially deep transfer learning, highlights the possibility of addressing the data heterogeneity toward large-scale brainwave decoding.

In the field of image processing and natural language processing (NLP), the adoption of transfer learning techniques has been significant in addressing discrepancies across diverse datasets and among different individuals [7]. Particularly, deep transfer learning has made notable contributions in computer vision (CV) and NLP, showcasing its efficacy in enhancing classification accuracy and enabling knowledge transfer across similar tasks [8]. Research has shown that pre-training on large datasets, such as ImageNet, can improve classification performance by transferring pre-learned weights [9, 10]. This approach capitalizes on the acquired knowledge from related tasks to refine the search process for novel tasks and equips them with relevant, pre-established knowledge.

In computer vision, further innovations have led to the development of deep transfer learning models like the deep adaptation network (DAN) [11] and deep domain confusion (DDC) [12], which effectively mitigate domain-specific variances. In the field of NLP, the introduction of cross-language networks exemplifies the application of transfer learning, where knowledge from one language’s speech is applied to the speech classification in another [13]. These models typically employ common ‘feature transformation’ layers to learn shared features across languages, while utilizing unique final classification layers that address the distinct challenges of each language. The recent development of large pre-trained models (e.g., GPT) [14] has further increased the transferability to utilize large-scale text and datasets.

Transfer learning has also revolutionized EEG decoding, where it enhances the learning process by integrating data from various EEG sessions or subjects [6, 15]. This chapter delves into these advancements in both transfer learning and its utilization in EEG decoding, illustrating their significant implications for disease diagnosis, BCI, and the broader field of neuroscience in decoding human brainwaves. This comprehensive review highlights the development of these techniques and their potential to push the boundaries of what is achievable in the scientific understanding of the brain.


2. Machine learning in human brain wave and EEG decoding

This section explores neuroimaging techniques and brainwave decoding, focusing on BCI and EEG decoding.

The field of neuroimaging employs various techniques for brainwave acquisition. Electrocorticography (ECoG) [5] is an invasive method in which electrodes are placed directly on the exposed surface of the brain beneath the skull, providing clear signals with a high signal-to-noise ratio, though its use is generally limited to surgical settings due to its invasive nature. Electroencephalography (EEG) [4], in contrast, is non-invasive with electrodes placed on the scalp, offering advantages such as high temporal resolution, ease of use, affordability, and safety, despite its lower signal-to-noise ratio. Magnetoencephalography (MEG) [16] measures the magnetic fields produced by neural activity, offering precise spatial and temporal resolution, but requiring specialized equipment such as SQUID detectors. Other significant techniques include positron emission tomography (PET) [17] and single-photon emission computed tomography (SPECT) [18], which rely on the metabolism of a radiopharmaceutical introduced to the subject and whose decay emits gamma photons captured by imaging devices. Additional methods such as functional magnetic resonance imaging (fMRI) [19] and functional near-infrared spectroscopy (fNIR) [20] further enrich our understanding and interpretation of brain signals, each providing unique insights into brain function and neural activity.

These neuroimaging techniques have greatly advanced the decoding of brainwaves, especially impacting the development of BCI [21]. BCIs are sophisticated communication systems that create a direct pathway between the brain and external devices, bypassing conventional neuromuscular routes. By decoding neural signals, BCIs enable groundbreaking applications in assistive technology, allowing individuals with severe motor disabilities to control prosthetic limbs, computers, and other devices merely through their thoughts. This technology not only promises to restore lost functions but also to augment human capabilities, representing an integration of biological and technological advancements.

Among the above neuroimaging methodologies, EEG is particularly important in BCI research due to its non-invasive nature and the relative ease and efficiency of signal collection. Certain types of EEG signals are particularly advantageous for BCI applications. Steady-state evoked potentials (SSEP) [22] are typical responses to periodic stimuli like flickering lights; a common example is steady-state visual evoked potentials (SSVEP) [23], usually triggered by visual stimuli. Slow cortical potentials (SCP) [24] are gradual voltage shifts in the cortical potential that subjects can learn to control. The P300 signal [25], evident about 300 milliseconds after a subject perceives a rare or significant stimulus, is notable for its application in BCI because the stimulus’ rarity enhances signal strength. Event-related desynchronization (ERD) [26] and event-related synchronization (ERS) [27] describe, respectively, decreases in alpha- and beta-band amplitudes during conscious activity and increases during rest, pinpointing active brain regions. Motor imagery (MI) [28] is a technique where individuals mentally simulate movement without physically executing it. MI’s capacity for voluntary control, as opposed to reactive signals like SSVEP or P300 that depend on external stimuli, positions it as an ideal option for neurorehabilitation and sophisticated BCI applications.

Before the rise of deep learning, EEG classification was dominated by traditional machine learning methods that primarily utilized feature extraction techniques. Among these, the common spatial pattern (CSP) [29, 30] was particularly prominent, designed to project data into a space that maximizes variance between tasks, effectively enhancing class separability for EEG signals that correspond to different brain activities. Building on CSP, the filter bank CSP (FBCSP) [31] method automated the frequency selection process that was typically performed manually. This involved filtering the EEG data across various frequency bands, applying CSP to each, and subsequently using techniques like mutual information feature selection (MIFS) [32] or recursive feature elimination (RFE) [33] to refine the feature set for optimal classification efficacy. These selected features would then be classified using conventional machine learning algorithms, such as the support vector machine (SVM) [34].
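To make the CSP pipeline concrete, the following is a minimal sketch (not the exact implementation of the cited works) of computing CSP spatial filters via a generalized eigendecomposition and extracting log-variance features; the array shapes and the `n_pairs` parameter are illustrative choices:

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(X1, X2, n_pairs=2):
    """Compute CSP spatial filters from two classes of EEG trials.

    X1, X2: arrays of shape (trials, channels, samples).
    Returns W of shape (2*n_pairs, channels): filters whose outputs have
    maximal variance for one class and minimal variance for the other.
    """
    def mean_cov(X):
        # trace-normalized spatial covariance, averaged over trials
        covs = [x @ x.T / np.trace(x @ x.T) for x in X]
        return np.mean(covs, axis=0)

    C1, C2 = mean_cov(X1), mean_cov(X2)
    # Generalized eigenproblem C1 w = lambda (C1 + C2) w; eigenvalues ascend
    vals, vecs = eigh(C1, C1 + C2)
    order = np.argsort(vals)
    # take filters from both ends of the eigenvalue spectrum
    idx = np.concatenate([order[:n_pairs], order[-n_pairs:]])
    return vecs[:, idx].T

def csp_features(X, W):
    """Normalized log-variance of CSP-filtered trials -> (trials, 2*n_pairs)."""
    Z = np.einsum('fc,tcs->tfs', W, X)
    var = Z.var(axis=2)
    return np.log(var / var.sum(axis=1, keepdims=True))
```

In an FBCSP-style pipeline, these features would be computed once per frequency band, passed through a feature selector such as MIFS, and finally classified with an SVM.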

In traditional machine learning frameworks, researchers had to manually create features for EEG classification. However, this was complicated by the intricate nature of brain functions and the complex signal patterns of EEG. During the early stages of feature design, it was challenging to capture all the relevant signal characteristics necessary for optimal or comprehensive brainwave decoding due to the limited understanding of the complexities of the brain. This limitation hinders our ability to decode brainwaves in complex tasks where our understanding of the brain mechanism is insufficient.

The development of deep learning has brought about a major shift in the way brainwave decoding is done. Deep learning is particularly good at automatically extracting features directly from raw data, which eliminates the need for manual feature specification. In convolutional neural networks (CNNs), the initial layers identify low-level characteristics by optimizing parameters through backpropagation. The deeper layers progressively learn higher level features and structures, which enhances the model’s interpretative and predictive capabilities. This ability has led to extensive research into the application of deep learning for EEG signal decoding. Studies have shown that models like CNN can outperform traditional methods such as FBCSP in terms of accuracy and efficiency [35, 36].

Recent advancements have continued to underscore the potent impact of deep learning in EEG decoding. The development of sophisticated models with better generality like transformers and generative pre-trained transformers (GPT) [14, 37] reflects a broader trend toward leveraging large-scale neural networks, potentially in brainwave decoding. These models, which thrive on extensive datasets, are increasingly favored for their ability to push the boundaries of neural decoding, offering substantial improvements over traditional and even earlier deep learning approaches [38, 39, 40]. This evolution from basic machine learning to advanced deep learning frameworks illustrates a significant advancement in EEG decoding, promising enhanced accuracy and new capabilities in interpreting complex brain signals.


3. Transfer learning: from machine learning to large-scale brainwave decoding

In this section, we will delve into transfer learning methods that have been extensively utilized in machine learning and adapted for EEG BCI decoding to enhance diagnostic and BCI performance. We will also introduce how transfer learning can be combined with privacy preservation techniques to enable large-scale EEG data exchange across borders. Finally, we will discuss recent advancements in large pre-trained models, such as transformers, in EEG decoding.

3.1 Transfer learning in the machine learning literature and its compatibility with EEG decoding

Transfer learning can be broadly divided into two primary categories: ‘Rule Adaptation’ and ‘Domain Adaptation’ [6]. Rule adaptation focuses on identifying the overarching rules that are either common or unique across tasks. This method essentially narrows the search space for new tasks by leveraging the rules learned from previous tasks. Conversely, domain adaptation is designed to align different datasets into a unified space where they exhibit a similar distribution, thus allowing a single classifier to operate across datasets effectively. In practice, domain adaptation adjusts to new tasks by mapping them onto this common distribution.

Instance-based and feature-based transfer learning fit the definition of domain adaptation. Instance-based transfer learning prioritizes the re-weighting or selection of instances from the source domain that are most relevant to the target task. Examples of this approach include conditional probability-based multi-source domain adaptation (CP-MDA) and the two-stage weighting framework for multi-source domain adaptation (2SW-MDA) [41]. These methods adjust the importance of source domain instances to improve relevance to the target domain. On the other hand, feature-based transfer learning seeks to develop a shared feature space that benefits both the source and target tasks by minimizing the gap between domains. A well-known example in this category is the feature augmentation method (FAM) by Daumé [42], which enhances transfer efficiency through feature augmentation, effectively enlarging the feature sets to provide more discriminative power across domains. In the context of EEG decoding, instance-based transfer learning could involve selecting EEG samples from the source domain that are similar to the target EEG, or subjects sharing similar brain patterns. Feature-based transfer learning could involve a projection in EEG feature space, from the source EEG datasets to the target EEG datasets, or from one EEG task to another.
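As an illustration of instance-based re-weighting (a generic density-ratio scheme, not the specific CP-MDA or 2SW-MDA algorithms), source samples can be weighted by how “target-like” a domain classifier judges them; the resulting weights can then be passed as `sample_weight` to any downstream classifier:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def importance_weights(X_src, X_tgt):
    """Estimate p_target(x) / p_source(x) for each source sample by training a
    classifier to distinguish the two domains; its predicted odds approximate
    the density ratio (up to the domain-size prior, corrected below)."""
    X = np.vstack([X_src, X_tgt])
    d = np.concatenate([np.zeros(len(X_src)), np.ones(len(X_tgt))])  # 0=source, 1=target
    clf = LogisticRegression(max_iter=1000).fit(X, d)
    p = clf.predict_proba(X_src)[:, 1]            # P(domain = target | x)
    ratio = p / np.clip(1.0 - p, 1e-12, None)
    ratio *= len(X_src) / len(X_tgt)              # correct for unequal domain sizes
    return ratio
```

Source samples that resemble the target distribution receive large weights, so a classifier trained with these weights emphasizes the most transferable instances.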

Rule adaptation also encompasses parameter-based transfer learning, which includes the transfer of parameters or entire models from the source task to enhance the target task’s performance. This method is exemplified by the multi-model knowledge transfer (MMKT) approach by Tommasi et al. [43], which adapts SVM hyperplanes from source models to improve the target model’s accuracy. Additionally, domain adaptation manifold alignment (DAMA) [44] combines manifold alignment with correlation analysis [45] to identify a common latent space that facilitates tasks such as text classification across different languages. In EEG decoding, adapting general knowledge from sources to the target subject or task is crucial. This is because EEG’s low signal-to-noise ratio can lead to overfitting when the dataset is relatively small.

Transfer learning encompasses a variety of methods that improve the flexibility and effectiveness of machine learning models when applied to diverse datasets. This is particularly important in the context of EEG decoding for BCIs, where interpreting complex datasets is crucial for accurate diagnosis and BCI applications. The ability to adapt to new and varied datasets is essential for advancing performance in these fields.

In recent years, significant progress has been made in the field of deep transfer learning, particularly within computer vision (CV) and natural language processing (NLP) domains, showcasing its potential for application in EEG decoding tasks. This section briefly introduces deep transfer learning strategies in CV and NLP, exploring their relevance and potential integration into EEG decoding efforts.

Unlike traditional methods that often necessitate manual feature engineering—either through feature manifolds or correlation matrices—deep transfer learning automates feature extraction, aligning well with the advancements in deep learning. This feature is particularly suitable for EEG data because our human understanding of the brain is still limited, making manually designed features in machine learning unreliable.

Yosinski et al. [10] revealed a fundamental basis of deep transfer learning that neural networks possess the capability to capture and retain dataset-specific features within their parameters, fitting in the concept of rule adaptation. This foundational insight laid the groundwork for ‘fine-tuning’ pre-trained networks on new, yet related, tasks to improve classification accuracy significantly. Such a process was empirically validated using the ImageNet database, which spans images across thousands of categories. The results convincingly demonstrated that networks pre-trained on such comprehensive datasets could achieve superior classification performance compared to those without access to pre-trained weights, marking a pivotal moment in validating convolutional neural networks (CNNs) transferability.

Similar phenomena were found in NLP: Huang et al. [13] introduced the concept of a shared-hidden-layer multilingual deep neural network (SHL-MDNN). This model leverages speech data from various languages to enhance the classification of new languages, transferring knowledge across linguistic boundaries through shared layers for feature learning, coupled with distinct classification layers for each language’s specific requirements. The model underscored the flexibility and utility of deep neural networks in managing diverse datasets.

Deep transfer learning has increasingly incorporated distributional alignment metrics, as a domain adaptation approach, to enhance the adaptability and effectiveness of neural networks in tasks involving multiple datasets. A notable development in this area is the deep adaptation network (DAN) [11], which extends a conventional convolutional neural network (CNN) by adding domain adaptation layers. These layers align the features of two distinct datasets into a uniform distribution, integrating a domain loss calculated by maximum-mean discrepancy (MMD) [46] into the overall loss function of the CNN. This domain loss measures the statistical distance between the distributions of the source and target datasets, assisting the adaptation layer in minimizing these discrepancies.
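The MMD domain loss that DAN adds to the network objective can be sketched as follows; this is a simple (biased) estimator with an RBF kernel, with `gamma` as an illustrative bandwidth choice:

```python
import numpy as np

def rbf_mmd2(X, Y, gamma=1.0):
    """Simple (biased) estimate of squared maximum-mean discrepancy between
    sample sets X and Y under an RBF kernel. In a DAN-style model, X and Y
    would be the adaptation-layer features of the source and target batches,
    and this value would be added to the classification loss."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
        return np.exp(-gamma * d2)
    m, n = len(X), len(Y)
    return k(X, X).sum() / m**2 + k(Y, Y).sum() / n**2 - 2.0 * k(X, Y).sum() / (m * n)
```

The value is zero when the two feature distributions coincide and grows as they diverge, which is what lets the adaptation layers be driven toward domain-invariant features.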

The exploration of deep transfer learning techniques from the machine learning literature offers promising avenues for enhancing the accuracy and efficiency of brainwave analysis by allowing machine learning methods in EEG decoding to utilize large heterogeneous brain data.

3.2 Transfer learning in EEG decoding

The progression of transfer learning techniques within the domain of EEG decoding has markedly improved the utilization of diverse EEG datasets [6, 15]. Traditionally, methods like common spatial patterns (CSP) and filter bank common spatial patterns (FBCSP) have been employed to align EEG data into consistent distributions using covariance matrices for domain adaptation [47, 48, 49, 50]. These techniques transform features across various subjects or datasets, applying unified classifiers to enhance classification accuracy.
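One simple covariance-based alignment in this spirit (a Euclidean-alignment-style sketch, not the exact method of the cited works) whitens each subject’s trials by the inverse square root of that subject’s mean spatial covariance, so that every subject’s average covariance becomes the identity and a single classifier can be shared:

```python
import numpy as np
from scipy.linalg import sqrtm

def euclidean_align(trials):
    """Align one subject's EEG trials to a common distribution.

    trials: (n_trials, channels, samples). Each trial is multiplied by
    R^{-1/2}, where R is the subject's mean spatial covariance, making the
    subject's average covariance the identity matrix."""
    covs = np.stack([x @ x.T / x.shape[1] for x in trials])
    R = covs.mean(axis=0)
    R_inv_sqrt = np.real(np.linalg.inv(sqrtm(R)))
    return np.einsum('cd,tds->tcs', R_inv_sqrt, trials)
```

Applying this per subject (or per session) removes much of the subject-specific covariance shift before any classifier is trained.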

Some other methods are based on covariate shift, which adjusts models trained on one dataset to be applicable to another, especially beneficial when the datasets already share similar distributions [51, 52, 53]. This strategy is particularly relevant to EEG tasks like sleep stage decoding which does not vary much across subjects in nature. Some domain adaptation strategies also leverage a common prior distribution found across different tasks, which helps in seamlessly integrating and applying models trained under diverse conditions [54, 55]. These approaches highlight the flexibility and broad applicability of EEG decoding methods, facilitating improved performance and generalization across varied EEG datasets.

Traditional transfer learning methods in EEG decoding often rely heavily on manual feature extraction techniques, which are particularly challenging due to the complex nature of EEG signals that reflect intricate mental activities. Deep learning approaches have revolutionized this process by directly utilizing raw data inputs, allowing the neural network to independently learn and refine features through back-propagation. This shift not only streamlines the model’s setup in an end-to-end form but also enhances its capacity to recognize and generalize the complex patterns found in EEG data, thus potentially improving performance.

Recent applications of deep transfer learning in EEG decoding illustrate these advancements. For instance, Sakhavi et al. [9] demonstrated a method where networks are initially pre-trained on datasets from other subjects and subsequently fine-tuned on data from the target subject using a CNN decoder. This ‘pre-training – fine-tuning’ procedure leverages learned knowledge across multiple subjects, thereby enhancing decoding accuracy, as evidenced by superior performance on the BCIC IV2a dataset [56] compared to models trained exclusively on single-subject data. This approach indicates that CNNs are adept at identifying common features among training subjects, which are then specialized to the new subject’s data through fine-tuning.

Further extending the capabilities of EEG decoding, Zhang et al. [57] introduced a ‘brain-ID’ framework that employs a hybrid deep neural network with transfer learning (HDNN-TL). This system combines convolutional neural networks (CNN) with long short-term memory (LSTM) units to capture both the spatial and temporal dimensions of motor imagery signals effectively. By learning shared features across different motor imagery (MI) tasks, the model exhibits improved classification accuracy. The application of transfer learning here adapts the fully connected layers to new subjects using fewer data samples, which not only reduces the necessary training duration and data volume but also maintains robust accuracy levels.

Additionally, Kaishuo et al. [58] developed an adaptive transfer learning framework using a deep convolutional neural network (CNN) to fine-tune a pre-trained model with subject-specific EEG data. Another study [59] used a multi-branch network to handle the diversity across subjects in individual branches while having a shared network to learn common features across subjects. These deep transfer learning methods boost motor imagery classification accuracy by accommodating inter-subject variability, thus allowing for personalized adjustments to the model that optimize performance across different individuals without the need for extensive retraining.
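The ‘pre-training – fine-tuning’ procedure can be sketched with a small fully connected network; the cited works use CNN/LSTM architectures, and `MLPClassifier` with `warm_start` is used here only to keep the example self-contained and runnable:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def pretrain_finetune(X_src, y_src, X_tgt, y_tgt, finetune_epochs=20):
    """Pre-train on pooled source-subject features, then continue training
    for a short budget on the target subject's (much smaller) dataset.
    warm_start=True makes the second fit resume from the pre-trained weights
    instead of re-initializing."""
    net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=300,
                        warm_start=True, random_state=0)
    net.fit(X_src, y_src)             # pre-training on other subjects
    net.max_iter = finetune_epochs    # short fine-tuning budget
    net.fit(X_tgt, y_tgt)             # fine-tuning on the target subject
    return net
```

Unlike the cited frameworks, this sketch fine-tunes all layers; freezing early layers and adapting only the final ones, as described above, requires a framework with per-layer control such as PyTorch.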

The progression of transfer learning in EEG decoding has made it possible for algorithms to effectively combine data from multiple EEG datasets or tasks, enhancing their learning capabilities. Historically, transfer learning strategies focused on single datasets collected under consistent experimental conditions. Typically, the scope of these EEG datasets is limited to a few dozen subjects, constrained by the high costs and logistical complexities involved in EEG data collection, unlike larger biomedical datasets that can encompass thousands of subjects. To tackle this challenge, another important research direction lies in heterogeneous EEG transfer learning across different datasets and tasks. The 2021 NeurIPS conference highlighted an advancement in EEG transfer learning with the inception of the international BEETL EEG competition [60]. This event underscored the significance of cross-dataset transfer learning, gathering widespread academic interest and showcasing the potential of utilizing EEG across varied datasets and tasks through advanced transfer learning techniques. The competition catalyzed numerous innovative solutions tackling the complexities associated with cross-dataset transfer in EEG decoding.

One of the standout approaches introduced at the competition involved the integration of latent subject alignment using set theory principles, particularly within the EEGInception framework [61]. This technique combines a novel statistical alignment method with set theory concepts to normalize the distributions of latent features across different individuals, thereby enhancing task independence among subjects. Another significant contribution is the development of the multi-source EEGNet framework, which merges domain and label adaptation strategies [60, 62]. This framework utilizes a combination of data alignment techniques and a multi-task EEGNet model, enriched with maximum classifier discrepancy, to streamline domain adaptation. This multifaceted approach aims to forge a robust model capable of efficiently navigating the challenges posed by heterogeneous datasets. Further research explores the use of covariance matrix classification through SPDNet [60, 63], applying Riemannian geometry to address the motor imagery task. This method concentrates on classifying spatial covariance matrices of EEG signals using minimum distance to Riemannian mean classifiers, alongside an SPDNet architecture tailored for processing symmetric positive definite matrices. These methodologies collectively demonstrate the dynamic nature of EEG decoding research and the substantial role of transfer learning in leveraging diverse data sources to enhance model performance and versatility.
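The minimum-distance-to-mean idea behind the covariance-matrix approach can be sketched directly on spatial covariance matrices; here the log-Euclidean metric stands in for the affine-invariant Riemannian distance used in the cited work, since it admits a closed-form mean:

```python
import numpy as np

def logm_sym(C):
    """Matrix logarithm of a symmetric positive definite matrix via
    eigendecomposition (maps the SPD manifold to a flat tangent space)."""
    vals, vecs = np.linalg.eigh(C)
    return (vecs * np.log(vals)) @ vecs.T

class MDMLogEuclid:
    """Minimum-distance-to-mean classifier on SPD covariance matrices using
    the log-Euclidean metric: class means are averages of matrix logarithms,
    and a trial is assigned to the nearest class mean."""
    def fit(self, covs, y):
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        self.means_ = {c: np.mean([logm_sym(C) for C, l in zip(covs, y) if l == c], axis=0)
                       for c in self.classes_}
        return self

    def predict(self, covs):
        logs = [logm_sym(C) for C in covs]
        return np.array([min(self.classes_,
                             key=lambda c: np.linalg.norm(L - self.means_[c]))
                         for L in logs])
```

The full Riemannian version replaces the closed-form mean with an iterative Karcher mean under the affine-invariant metric, but the classification principle is the same.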

Additionally, recent studies have investigated the integration of graph neural networks (GNNs) with transfer learning to improve EEG decoding across datasets characterized by heterogeneous electrode configurations [64, 65, 66, 67]. This approach addresses the challenges arising from variations in recording equipment and electrode placements among different datasets. By employing GNNs, this method effectively processes and interprets data from diverse datasets, which may differ in the number of EEG sensors and their configurations. The application of transfer learning with GNN refines the model’s accuracy with new subjects and datasets, thereby enhancing its adaptability to various experimental conditions and boosting its generalization capabilities in subject-independent motor imagery EEG classification.
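The core operation such graph models share is message passing over an electrode graph; the following single graph-convolution layer (a generic GCN-style sketch, not the architectures of the cited studies) works for any number of electrodes, which is what makes graph models attractive for heterogeneous montages:

```python
import numpy as np

def gcn_layer(H, A, W):
    """One graph-convolution step: mix each electrode's features with its
    neighbours' using symmetric normalization, then apply a shared learned
    projection and a ReLU.

    H: (n_electrodes, n_features) node features
    A: (n_electrodes, n_electrodes) adjacency (e.g. from electrode proximity)
    W: (n_features, n_out) shared weights, independent of electrode count
    """
    A_hat = A + np.eye(len(A))                 # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)
```

Because `W` acts on features rather than on a fixed channel dimension, the same parameters can be applied to datasets recorded with different numbers and layouts of EEG sensors.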

The innovative transfer learning strategies introduced in this section signify the rapidly evolving field of EEG decoding. They illustrate how transfer learning can substantially improve the adaptability and applicability of models across diverse data, which advances the utilization of large-scale brainwave decoding.

3.3 Utilizing large-scale EEG: overcoming privacy challenges in cross-border transfer learning

With the advancement of EEG decoding utilizing large datasets through transfer learning, a significant challenge that arises is the preservation of privacy in cross-dataset transfer learning contexts [68, 69, 70, 71, 72]. As machine learning algorithms increasingly leverage heterogeneous EEG datasets from various data centres and providers for large-scale applications, ensuring the privacy of EEG data becomes a critical concern. Various studies have proposed potential solutions that combine transfer learning with privacy preservation to enhance EEG decoding capabilities securely.

Popescu et al. [73] developed a privacy-preserving method for classifying EEG data using homomorphic encryption (HE) [74] and machine learning. This method introduces an encoding system that adapts typical HE schemes to handle real-valued numbers efficiently, tackling the high computational demand and noise accumulation issues inherent in HE applications. The approach was tested in real-world scenarios, including brain seizure detection and predisposition to alcoholism prediction, employing supervised learning techniques to maintain data confidentiality while performing complex classifications. Another innovative approach by Ju et al. [75] employs a deep learning architecture that integrates federated learning with domain adaptation to classify EEG signals while preserving privacy. This architecture processes the spatial covariance matrix of EEG signals to extract discriminative information across multiple subjects without actual data transfer. It features a manifold reduction layer to reduce dimensionality, a tangent projection layer to linearize manifold data, and a federated layer that facilitates distributed model training. This federated transfer learning (FTL) framework is tailored for subject-specific and adaptive analyses, significantly enhancing motor imagery task classification accuracy without compromising privacy.
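The federated ingredient common to these frameworks can be sketched with federated averaging over a simple logistic-regression model; only weight vectors leave each client, never raw EEG data (a generic FedAvg sketch, not the FTL architecture of the cited work):

```python
import numpy as np

def local_update(w, X, y, lr=0.1, epochs=5):
    """A few steps of logistic-regression gradient descent on one client's
    private data; only the resulting weight vector is shared."""
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w = w - lr * X.T @ (p - y) / len(y)
    return w

def fed_avg(clients, w0, rounds=10):
    """Federated averaging: each round, every client trains locally and the
    server averages the weights, weighted by client dataset size."""
    w = w0
    sizes = np.array([len(y) for _, y in clients], dtype=float)
    for _ in range(rounds):
        local_ws = [local_update(w, X, y) for X, y in clients]
        w = np.average(local_ws, axis=0, weights=sizes)
    return w
```

In the EEG setting, each “client” would be a subject, hospital, or data centre; the averaged model benefits from all datasets while the raw signals remain local.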

Further advancements were made with the multi-dataset federated separate-common-separate network (MF-SCSN) [59, 76], which utilizes individual feature extractors for each subject to handle personal motor imagery EEG variations like sensor placements and brain function disparities at multiple network depths. These feature extractors, acting as federated parameters, ensure the privacy of individual datasets intrinsically. Bethge et al. [77] proposed a similar architecture for emotion classification, which focuses on domain-invariant representation learning while safeguarding privacy. This approach enhances emotion classification accuracy across various sources by learning domain-invariant features without necessitating direct data exchange, supporting the foundational results of this research. Building on these concepts, a meta-framework known as ‘Sandwich’ [78] was proposed, combining deep learning, transfer learning, and privacy preservation within a unified architecture. This framework employs federated networks at the input level to manage dataset discrepancies, a central shared network that applies universal learning rules and transfer learning techniques such as deep set or distributional distance alignment, and individual classifiers at the output for specific brain tasks. The ‘Sandwich’ framework demonstrates significant improvements in federated deep transfer learning across various tasks and datasets by enabling the central network to benefit from diverse datasets while local branches maintain data privacy.

Other methods in privacy preservation literature may integrate well into EEG transfer learning to protect data privacy while using large cross-centre EEG datasets. For instance, homomorphic encryption [73] enables computations on encrypted data, allowing sensitive information to remain secure while being processed. Secure multi-party computation (SMC) [79] facilitates secure calculations across data distributed among multiple stakeholders or locations, ensuring that collaborations do not compromise data integrity. Additionally, anonymisation and data sanitisation techniques [80] are used to alter data by removing or obscuring personally identifiable information, thus protecting individual identities. Federated learning, as highlighted in Refs. [81, 82], provides a decentralized model training framework that operates across various devices or data repositories. This approach eliminates the need to consolidate sensitive data in a single location, enhancing privacy.
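As a concrete illustration of the federated principle, the sketch below implements plain federated averaging for a logistic-regression model on synthetic features: each client trains locally on data that never leaves it, and the server only aggregates model weights. All names and data here are illustrative, not taken from the cited EEG systems:

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One client's local training: gradient steps for logistic regression
    on data that never leaves the client."""
    w = weights.copy()
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))          # predicted probabilities
        w -= lr * X.T @ (p - y) / len(y)          # gradient of the log-loss
    return w

def fed_avg(client_weights, client_sizes):
    """Server step: size-weighted average of client models; no raw data moves."""
    return np.average(client_weights, axis=0, weights=np.asarray(client_sizes, float))

rng = np.random.default_rng(1)
# two 'hospitals' holding private labeled EEG-derived features (synthetic here)
clients = [(rng.standard_normal((40, 8)), rng.integers(0, 2, 40)) for _ in range(2)]

w_global = np.zeros(8)
for _ in range(10):                                # communication rounds
    local = [local_update(w_global, X, y) for X, y in clients]
    w_global = fed_avg(local, [len(y) for _, y in clients])
```

Real deployments add secure aggregation or encryption on top of this loop, but the data-locality property shown here is the core of the privacy argument.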

Privacy-preserving techniques are crucial for facilitating the secure exchange of brainwave data among individuals, hospitals, data centres, and countries. By implementing these strategies, EEG transfer learning can be applied more broadly and effectively in real-world settings, maximizing the potential benefits of this technology while minimizing privacy risks.

3.4 EEG decoding with large pre-trained models

With the advent of the generative pre-trained transformer (GPT) and large language models (LLMs) [14], there has been a growing body of research exploring the application of transformer-based architectures in EEG decoding. This section discusses the potential enhancements that large pre-trained models such as transformers can provide to EEG decoding and highlights specific studies that have applied these mechanisms and algorithms to the interpretation of brainwaves.

Large language models such as GPT [37] offer several advantageous features for EEG signal decoding. Besides pre-training and fine-tuning, introduced in previous sections, large pre-trained models exhibit robust generalizability and transferability across datasets and tasks, making them highly effective for diverse EEG decoding tasks. Another outstanding feature is self-attention [83], the mechanism at the foundation of the transformer architecture. Unlike traditional models that process inputs sequentially, transformers use self-attention to evaluate the relevance of all elements in a sequence simultaneously, regardless of their positional order. This characteristic is particularly advantageous for EEG, a time series in which significant correlations exist across distant time steps. For example, certain brain responses, such as the P300 wave [25], manifest delays in activation that conventional convolutional neural networks (CNNs) may struggle to capture due to their localized receptive fields. Although long short-term memory (LSTM) models [84, 85, 86] have been employed to address temporal correlations in EEG decoding, they often suffer from ‘catastrophic forgetting’ [87], an inability to retain earlier information as new data arrives over many time steps.

The transformer model, as described by Vaswani et al. [83], differs from traditional models in that it does not consume its input word by word (or window by window in the case of EEG data). A long EEG input is first segmented into EEG windows, analogous to the words of a sentence. For each window, the model generates three vectors (queries, keys, and values) through learned linear transformations. A self-attention score is computed by taking the dot product of a query with all keys, yielding weights that signify the relevance of other EEG windows to that query. These weights then form a weighted sum of values, producing an output that synthesizes information from the entire EEG input context rather than just neighboring windows. This mechanism enables the model to dynamically focus on different parts of a long EEG trial, enhancing its ability to handle long-range dependencies. By combining pre-training and fine-tuning with self-attention, transformer-based pre-trained models have shown great potential for handling long sequences in brainwave decoding.
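The query-key-value computation described above can be written down directly. The following is a minimal single-head sketch with random, untrained projection matrices; the window embeddings and dimensions are invented for illustration:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of EEG-window
    embeddings. X: (n_windows, d_model). Returns outputs and attention weights."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])           # relevance of every window to every other
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)    # softmax over key positions
    return weights @ V, weights                      # weighted sum of values

rng = np.random.default_rng(0)
d = 16
X = rng.standard_normal((10, d))                     # 10 EEG windows from one long trial
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
out, attn = self_attention(X, Wq, Wk, Wv)
```

Each row of `attn` sums to one and describes how strongly a given window attends to every other window, including distant ones, which is exactly the long-range property the text highlights.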

There has been an increasing number of studies applying LLM techniques to EEG decoding [88]. Studies [39, 89, 90, 91] evaluated transformer models on motor imagery datasets, such as the PhysioNet motor imagery dataset [92], integrating spatial and temporal processing within transformer models. These studies show that transformer-based models outperform traditional CNN and RNN models, demonstrating a superior ability to handle the long-range dependencies essential for accurate motor imagery classification. Numerous studies have applied transformers to sleep stage decoding [93, 94], leveraging vast amounts of labeled or unlabeled EEG data to pre-train transformer-based models. These studies also used transformers to open the ‘black box’ of EEG features by letting the model highlight sleep-relevant features within the signal, assessing the influence of neighboring epochs through attention scores [95]. Transformers have also been applied to speech brainwave decoding. Lee et al. [96] explore self-attention mechanisms within transformers to decode imagined and overt speech from non-invasive EEG signals. With a similar concept, transformers were used to interpret EEG data for part-of-speech tagging [97], illustrating the model’s potential for extracting meaningful linguistic structures from brain activity. Another study [98] decodes spoken sentences from invasive brain recordings (ECoG), demonstrating the application of transformer models to brainwaves beyond EEG.

Together, these studies highlight the powerful potential of large transformer models to process sequential and spatial-temporal brainwaves, particularly for intricate tasks like motor imagery, sleep staging, and speech decoding. These tasks often present challenges in generalization across different subjects, datasets, and tasks. Large pre-trained models hold significant promise in advancing the field of brainwave decoding, utilizing deep learning innovations to improve the generalizability, transferability, and interoperability of models. These enhancements could lead to more reliable and versatile applications in fields such as brain-computer interfaces and disease diagnosis.


4. Challenges and conclusion

Summarizing the current challenges in the literature concerning large-scale EEG transfer learning, it becomes evident that multiple significant barriers must be addressed to propel the field forward.

One major hurdle is the diversity of experimental setups, which manifests in variations such as different EEG input configurations, sensor placements, and technical specifications. This diversity complicates the harmonization of data across datasets for effective cross-dataset learning. Although GNN-based solutions [64, 65, 66, 67] offer possible avenues for aligning input-level differences, no fundamental principle has yet been established. Developing a solution that can reconcile these differences is crucial for the progression of EEG transfer learning. Furthermore, the ability to learn shared knowledge across datasets while also pinpointing information specific to particular tasks or datasets demands sophisticated alignment methods. These methods are vital for fully exploiting heterogeneous EEG datasets within a cohesive model.

Creating a comprehensive approach to EEG transfer learning by integrating various tasks into a single model can greatly enhance the flexibility and usefulness of EEG-based research. Although there is some prior work on cross-task transfer learning within the same category of brain activities [60], no solution has yet been proposed for transferring across task categories, such as motor imagery, sleep, and speech.

It is essential to ensure the privacy of EEG data, especially when it is stored across multiple data centres, so that data can be exchanged and collaborations conducted securely. Robust privacy-preserving mechanisms must be integrated into transfer learning models to safeguard sensitive information while still promoting the use of large brainwave datasets. The EEG privacy-preserving studies presented in this chapter propose conceptual algorithms, but their feasibility in real-world scenarios remains to be explored.

Additionally, utilizing heterogeneous EEG data at the scale of large models, such as pre-trained transformers like GPT, remains a significant challenge. Currently, due to the limited availability of EEG data, deep learning applications in EEG decoding often use relatively small transformer networks compared to those in CV and NLP. Although transfer learning enhances the capability to utilize large-scale datasets, effectively integrating EEG transfer learning with large-scale models is still an emerging field with open opportunities. This exploration could unlock new capabilities and improve the accuracy and applicability of EEG decoding methodologies.

In conclusion, transfer learning has significantly advanced EEG decoding, markedly enhancing the potential of large-scale data applications to transform BCI, disease diagnosis, and neuroscience. These advances could refine how BCIs mediate our interaction with the world and bolster disease diagnostics to save lives through more precise detection and treatment. Addressing the remaining challenges of EEG transfer learning holds promise to aid patients and reshape how humans interact with their surroundings, ultimately enriching quality of life in the foreseeable future.

References

1. Vaid S, Singh P, Kaur C. EEG signal analysis for BCI interface: A review. In: 2015 Fifth International Conference on Advanced Computing & Communication Technologies. IEEE; 2015. pp. 143-147
2. Rashid M, Sulaiman N, Abdul Majeed APP, Musa RM, Nasir AFA, et al. Current status, challenges, and possible solutions of EEG-based brain-computer interface: A comprehensive review. Frontiers in Neurorobotics. 2020;14:515104
3. Khan P, Kader MF, Islam SMR, Rahman AB, Kamal MS, Toha MU, et al. Machine learning and deep learning approaches for brain disease diagnosis: Principles and recent advances. IEEE Access. 2021;9:37622-37655
4. Müller-Putz GR. Electroencephalography. Handbook of Clinical Neurology. 2020;168:249-262
5. Keene DL, Whiting S, Ventureyra ECG. Electrocorticography. Epileptic Disorders. 2000;2(1):57-63
6. Jayaram V, Alamgir M, Altun Y, Schölkopf B, Grosse-Wentrup M. Transfer learning in brain-computer interfaces. IEEE Computational Intelligence Magazine. 2016;11(1):20-31
7. Weiss K, Khoshgoftaar TM, Wang D. A survey of transfer learning. Journal of Big Data. 2016;3(1):1-40
8. Tan C, Sun F, Kong T, Zhang W, Yang C, Liu C. A survey on deep transfer learning. In: International Conference on Artificial Neural Networks. Springer; 2018. pp. 270-279
9. Sakhavi S, Guan C. Convolutional neural network-based transfer learning and knowledge distillation using multi-subject data in motor imagery BCI. In: 2017 8th International IEEE/EMBS Conference on Neural Engineering (NER). IEEE; 2017. pp. 588-591
10. Yosinski J, Clune J, Bengio Y, Lipson H. How transferable are features in deep neural networks? Advances in Neural Information Processing Systems. 2014;27:1-9
11. Long M, Cao Y, Wang J, Jordan M. Learning transferable features with deep adaptation networks. In: International Conference on Machine Learning. PMLR; 2015. pp. 97-105
12. Tzeng E, Hoffman J, Zhang N, Saenko K, Darrell T. Deep domain confusion: Maximizing for domain invariance. arXiv preprint arXiv:1412.3474. 2014
13. Huang J-T, Li J, Yu D, Deng L, Gong Y. Cross-language knowledge transfer using multilingual deep neural network with shared hidden layers. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE; 2013. pp. 7304-7308
14. Zhao WX, Zhou K, Li J, Tang T, Wang X, Hou Y, et al. A survey of large language models. arXiv preprint arXiv:2303.18223. 2023
15. Lotte F, Bougrain L, Cichocki A, Clerc M, Congedo M, Rakotomamonjy A, et al. A review of classification algorithms for EEG-based brain-computer interfaces: A 10 year update. Journal of Neural Engineering. 2018;15(3):031005
16. Proudfoot M, Woolrich MW, Nobre AC, Turner MR. Magnetoencephalography. Practical Neurology. 2014;14(5):336-343
17. Bailey DL, Maisey MN, Townsend DW, Valk PE. Positron Emission Tomography. Vol. 2. Springer; 2005
18. Jaszczak RJ, Coleman RE, Lim CB. SPECT: Single photon emission computed tomography. IEEE Transactions on Nuclear Science. 1980;27(3):1137-1153
19. Glover GH. Overview of functional magnetic resonance imaging. Neurosurgery Clinics. 2011;22(2):133-139
20. Bunce SC, Izzetoglu M, Izzetoglu K, Onaral B, Pourrezaei K. Functional near-infrared spectroscopy. IEEE Engineering in Medicine and Biology Magazine. 2006;25(4):54-62
21. Bamdad M, Zarshenas H, Auais MA. Application of BCI systems in neurorehabilitation: A scoping review. Disability and Rehabilitation: Assistive Technology. 2015;10(5):355-364
22. Robinson PA, Chen P-C, Yang L. Physiologically based calculation of steady-state evoked potentials and cortical wave velocities. Biological Cybernetics. 2008;98(1):1-10
23. Norcia AM, Appelbaum LG, Ales JM, Cottereau BR, Rossion B. The steady-state visual evoked potential in vision research: A review. Journal of Vision. 2015;15(6):4
24. Birbaumer N, Elbert T, Canavan AG, Rockstroh B. Slow potentials of the cerebral cortex and behavior. Physiological Reviews. 1990;70(1):1-41
25. Polich J. Neuropsychology of P300. The Oxford Handbook of Event-Related Potential Components. 2012;641:159-188
26. Pfurtscheller G, Lopes da Silva FH. Event-related EEG/MEG synchronization and desynchronization: Basic principles. Clinical Neurophysiology. 1999;110(11):1842-1857
27. Pfurtscheller G, Stancák A, Neuper C. Event-related synchronization (ERS) in the alpha band—An electrophysiological correlate of cortical idling: A review. International Journal of Psychophysiology. 1996;24(1-2):39-46
28. Lotze M, Halsband U. Motor imagery. Journal of Physiology-Paris. 2006;99(4-6):386-395
29. Pfurtscheller G, Neuper C. Motor imagery and direct brain-computer communication. Proceedings of the IEEE. 2001;89(7):1123-1134
30. Blankertz B, Dornhege G, Krauledat M, Müller K-R, Curio G. The non-invasive Berlin brain–computer interface: Fast acquisition of effective performance in untrained subjects. NeuroImage. 2007;37(2):539-550
31. Ang KK, Chin ZY, Zhang H, Guan C. Filter bank common spatial pattern (FBCSP) in brain-computer interface. In: 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence). IEEE; 2008. pp. 2390-2397
32. Battiti R. Using mutual information for selecting features in supervised neural net learning. IEEE Transactions on Neural Networks. 1994;5(4):537-550
33. Pawlak Z, Grzymala-Busse J, Slowinski R, Ziarko W. Rough sets. Communications of the ACM. 1995;38(11):88-95
34. Suthaharan S. Support vector machine. In: Machine Learning Models and Algorithms for Big Data Classification: Thinking with Examples for Effective Learning. Springer; 2016. pp. 207-235
35. Schirrmeister R, Gemein L, Eggensperger K, Hutter F, Ball T. Deep learning with convolutional neural networks for decoding and visualization of EEG pathology. In: 2017 IEEE Signal Processing in Medicine and Biology Symposium (SPMB). IEEE; 2017. pp. 1-7
36. Chin ZY, Ang KK, Wang C, Guan C, Zhang H. Multi-class filter bank common spatial pattern for four-class motor imagery BCI. In: 2009 Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE; 2009. pp. 571-574
37. Achiam J, Adler S, Agarwal S, Ahmad L, Akkaya I, Aleman FL, et al. GPT-4 technical report. arXiv preprint arXiv:2303.08774. 2023
38. Chen J, Zhang Y, Pan Y, Peng X, Guan C. A transformer-based deep neural network model for SSVEP classification. Neural Networks. 2023;164:521-534
39. Sun J, Xie J, Zhou H. EEG classification with transformer-based models. In: 2021 IEEE 3rd Global Conference on Life Sciences and Technologies (LifeTech). IEEE; 2021. pp. 92-93
40. Cui W, Jeong W, Thölke P, Medani T, Jerbi K, Joshi AA, et al. Neuro-GPT: Developing a foundation model for EEG. arXiv preprint arXiv:2311.03764. 2023
41. Chattopadhyay R, Sun Q, Fan W, Davidson I, Panchanathan S, Ye J. Multisource domain adaptation and its application to early detection of fatigue. ACM Transactions on Knowledge Discovery from Data (TKDD). 2012;6(4):1-26
42. Daumé H III. Frustratingly easy domain adaptation. arXiv preprint arXiv:0907.1815. 2009
43. Tommasi T, Orabona F, Caputo B. Safety in numbers: Learning categories from few examples with multi model knowledge transfer. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE; 2010. pp. 3081-3088
44. Ham JH, Lee DD, Saul LK. Learning high dimensional correspondences from low dimensional manifolds. In: 20th International Conference on Machine Learning (ICML 2003) Workshop: The Continuum from Labeled to Unlabeled Data in Machine Learning and Data Mining. 2003
45. Shawe-Taylor J, Cristianini N. Kernel Methods for Pattern Analysis. Cambridge University Press; 2004
46. Smola AJ, Gretton A, Borgwardt K. Maximum mean discrepancy. In: 13th International Conference, ICONIP. 2006. pp. 3-6
47. Fazli S, Popescu F, Danóczy M, Blankertz B, Müller K-R, Grozea C. Subject-independent mental state classification in single trials. Neural Networks. 2009;22(9):1305-1312
48. Kang H, Choi S. Bayesian common spatial patterns for multi-subject EEG classification. Neural Networks. 2014;57:39-50
49. Lotte F, Guan C. Regularizing common spatial patterns to improve BCI designs: Unified theory and new algorithms. IEEE Transactions on Biomedical Engineering. 2011;58(2):355-362
50. Devlaminck D, Wyns B, Grosse-Wentrup M, Otte G, Santens P. Multisubject learning for common spatial patterns in motor-imagery BCI. Computational Intelligence and Neuroscience. 2011;2011:8
51. Sugiyama M, Krauledat M, Müller K-R. Covariate shift adaptation by importance weighted cross validation. Journal of Machine Learning Research. 2007;8(May):985-1005
52. Li Y, Kambara H, Koike Y, Sugiyama M. Application of covariate shift adaptation techniques in brain–computer interfaces. IEEE Transactions on Biomedical Engineering. 2010;57(6):1318-1324
53. Mohammadi R, Mahloojifar A, Coyle D. Unsupervised short-term covariate shift minimization for self-paced BCI. In: 2013 IEEE Symposium on Computational Intelligence, Cognitive Algorithms, Mind, and Brain (CCMB). IEEE; 2013. pp. 101-106
54. Kindermans P-J, Verschore H, Verstraeten D, Schrauwen B. A P300 BCI for the masses: Prior information enables instant unsupervised spelling. In: Advances in Neural Information Processing Systems. 2012. pp. 710-718
55. Alamgir M, Grosse-Wentrup M, Altun Y. Multitask learning for brain-computer interfaces. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics. 2010. pp. 17-24
56. Tangermann M, Müller K-R, Aertsen A, Birbaumer N, Braun C, Brunner C, Leeb R, et al. Review of the BCI competition IV. Frontiers in Neuroscience. 2012;6:55
57. Zhang R, Zong Q, Dou L, Zhao X, Tang Y, Li Z. Hybrid deep neural network using transfer learning for EEG motor imagery decoding. Biomedical Signal Processing and Control. 2021;63:102144
58. Zhang K, Robinson N, Lee S-W, Guan C. Adaptive transfer learning for EEG motor imagery classification with deep convolutional neural network. Neural Networks. 2021;136:1-10
59. Wei X, Ortega P, Faisal AA. Inter-subject deep transfer learning for motor imagery EEG decoding. In: 2021 10th International IEEE/EMBS Conference on Neural Engineering (NER). IEEE; 2021. pp. 21-24
60. Wei X, et al. 2021 BEETL competition: Advancing transfer learning for subject independence & heterogenous EEG data sets. In: Proceedings of the NeurIPS 2021 Competitions and Demonstrations Track. Vol. 176 of Proceedings of Machine Learning Research. PMLR; 2022. pp. 205-219
61. Bakas S, Ludwig S, Barmpas K, Bahri M, Panagakis Y, Laskaris N, et al. Team Cogitat at NeurIPS 2021: Benchmarks for EEG transfer learning competition. arXiv preprint arXiv:2202.03267. 2022
62. Lawhern VJ, Solon AJ, Waytowich NR, Gordon SM, Hung CP, Lance BJ. EEGNet: A compact convolutional neural network for EEG-based brain–computer interfaces. Journal of Neural Engineering. 2018;15(5):056013
63. Huang Z, Van Gool L. A Riemannian network for SPD matrix learning. In: Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, AAAI’17. AAAI Press; 2017. pp. 2036-2042
64. Han J, Wei X, Faisal AA. EEG decoding for datasets with heterogenous electrode configurations using transfer learning graph neural networks. arXiv preprint arXiv:2306.13109. 2023
65. Li J, Li S, Pan J, Wang F. Cross-subject EEG emotion recognition with self-organized graph neural network. Frontiers in Neuroscience. 2021;15:611653
66. Zhong P, Wang D, Miao C. EEG-based emotion recognition using regularized graph neural networks. IEEE Transactions on Affective Computing. 2020;13(3):1290-1301
67. Demir A, Koike-Akino T, Wang Y, Haruna M, Erdogmus D. EEG-GNN: Graph neural networks for classification of electroencephalogram (EEG) signals. In: 2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC). IEEE; 2021. pp. 1061-1067
68. Li L, Fan Y, Tse M, Lin K-Y. A review of applications in federated learning. Computers & Industrial Engineering. 2020;149:1-58
69. Hao M, Li H, Xu G, Liu S, Yang H. Towards efficient and privacy-preserving federated deep learning. In: 2019 IEEE International Conference on Communications (ICC). IEEE; 2019. pp. 1-6
70. Lyu L, He X, Law YW, Palaniswami M. Privacy-preserving collaborative deep learning with application to human activity recognition. In: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management. 2017. pp. 1219-1228
71. Dong H, Wu C, Wei Z, Guo Y. Dropping activation outputs with localized first-layer deep network for enhancing user privacy and data security. IEEE Transactions on Information Forensics and Security. 2017;13(3):662-670
72. Du W, Atallah MJ. Secure multi-party computation problems and their applications: A review and open problems. In: Proceedings of the 2001 Workshop on New Security Paradigms. 2001. pp. 13-22
73. Popescu AB, et al. Privacy preserving classification of EEG data using machine learning and homomorphic encryption. Applied Sciences. 2021;11(16):7360
74. Fang H, Qian Q. Privacy preserving machine learning with homomorphic encryption and federated learning. Future Internet. 2021;13(4):94
75. Ju C, Gao D, Mane R, Tan B, Liu Y, Guan C. Federated transfer learning for EEG signal classification. In: 2020 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC). IEEE; 2020. pp. 3040-3045
76. Wei X, Faisal AA. Federated deep transfer learning for EEG decoding using multiple BCI tasks. In: 2023 11th International IEEE/EMBS Conference on Neural Engineering (NER). IEEE; 2023. pp. 1-4
77. Bethge D, Hallgarten P, Grosse-Puppendahl T, Kari M, Mikut R, Schmidt A, et al. Domain-invariant representation learning from EEG with private encoders. In: ICASSP 2022. IEEE; 2022. pp. 1236-1240
78. Wei X, Narayan J, Faisal AA. The ‘sandwich’ meta-framework for architecture agnostic deep privacy-preserving transfer learning for non-invasive brainwave decoding. arXiv preprint arXiv:2404.06868. 2024
79. Agarwal A, Dowsley R, McKinney ND, Wu D, Lin C-T, De Cock M, et al. Protecting privacy of users in brain-computer interface applications. IEEE Transactions on Neural Systems and Rehabilitation Engineering. 2019;27(8):1546-1555
80. Xia K, Duch W, Sun Y, Xu K, Fang W, Luo H, et al. Privacy-preserving brain–computer interfaces: A systematic review. IEEE Transactions on Computational Social Systems. 2022
81. Gao D, Ju C, Wei X, Yang L, Chen T, Yang Q. HHHFL: Hierarchical heterogeneous horizontal federated learning for electroencephalography. arXiv preprint arXiv:1909.05784. 2019
82. Szegedi G, Kiss P, Horváth T. Evolutionary federated learning on EEG-data. In: ITAT. 2019. pp. 71-78
83. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need. Advances in Neural Information Processing Systems. 2017;30:1-11
84. Wang P, Jiang A, Liu X, Shang J, Zhang L. LSTM-based EEG classification in motor imagery tasks. IEEE Transactions on Neural Systems and Rehabilitation Engineering. 2018;26(11):2086-2095
85. Nagabushanam P, Thomas George S, Radha S. EEG signal classification using LSTM and improved neural network algorithms. Soft Computing. 2020;24(13):9981-10003
86. Hu X, Yuan S, Xu F, Leng Y, Yuan K, Yuan Q. Scalp EEG classification using deep Bi-LSTM network for seizure detection. Computers in Biology and Medicine. 2020;124:103919
87. Schak M, Gepperth A. A study on catastrophic forgetting in deep LSTM networks. In: Artificial Neural Networks and Machine Learning – ICANN 2019: Deep Learning: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17-19, 2019, Proceedings, Part II. Springer; 2019. pp. 714-728
88. Abibullaev B, Keutayeva A, Zollanvari A. Deep learning in EEG-based BCIs: A comprehensive review of transformer models, advantages, challenges, and applications. IEEE Access. 2023
89. Song Y, Jia X, Yang L, Xie L. Transformer-based spatial-temporal feature learning for EEG decoding. arXiv preprint arXiv:2106.11170. 2021
90. Xie J, Zhang J, Sun J, Ma Z, Qin L, Li G, et al. A transformer-based approach combining deep learning network and spatial-temporal information for raw EEG classification. IEEE Transactions on Neural Systems and Rehabilitation Engineering. 2022;30:2126-2136
91. Song Y, Zheng Q, Liu B, Gao X. EEG Conformer: Convolutional transformer for EEG decoding and visualization. IEEE Transactions on Neural Systems and Rehabilitation Engineering. 2022;31:710-719
92. Schalk G, et al. BCI2000: A general-purpose brain-computer interface (BCI) system. IEEE Transactions on Biomedical Engineering. 2004;51(6):1034-1043
93. Kostas D, Aroca-Ouellette S, Rudzicz F. BENDR: Using transformers and a contrastive self-supervised learning task to learn from massive amounts of EEG data. Frontiers in Human Neuroscience. 2021;15:653659
94. Phan H, Mikkelsen K, Chén OY, Koch P, Mertins A, De Vos M. SleepTransformer: Automatic sleep staging with interpretability and uncertainty quantification. IEEE Transactions on Biomedical Engineering. 2022;69(8):2456-2467
95. Chefer H, Gur S, Wolf L. Transformer interpretability beyond attention visualization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021. pp. 782-791
96. Lee Y-E, Lee S-H. EEG-Transformer: Self-attention from transformer architecture for decoding EEG of imagined speech. In: 2022 10th International Winter Conference on Brain-Computer Interface (BCI). IEEE; 2022. pp. 1-4
97. Murphy A, Bohnet B, McDonald R, Noppeney U. Decoding part-of-speech from human EEG signals. In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. Vol. 1: Long Papers. 2022. pp. 2201-2210
98. Komeiji S, Shigemi K, Mitsuhashi T, Iimura Y, Suzuki H, Sugano H, et al. Transformer-based estimation of spoken sentences using electrocorticography. In: ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE; 2022. pp. 1311-1315
