Open access peer-reviewed chapter - ONLINE FIRST

Transfer Learning and Domain Adaptation in Telecommunications

Written By

Konstantinos Vandikas, Farnaz Moradi, Hannes Larsson and Andreas Johnsson

Submitted: 02 February 2024 Reviewed: 29 March 2024 Published: 29 April 2024

DOI: 10.5772/intechopen.114932


From the Edited Volume

Transfer Learning - Leveraging the Capability of Pre-trained Models Across Different Domains [Working Title]

Dr. Anwar P.P. Abdul Majeed


Abstract

Transfer learning (TL) and domain adaptation (DA) are well-known approaches in the AI literature that can be used to address the fundamental challenge of data scarcity. TL and DA propose ways to reuse previously trained models to compensate for missing data or the absence of labeled information, which otherwise often comes at the cost of manual work. TL and DA have only recently started to demonstrate their worth in the telecommunication area, as mobile networks are undergoing significant changes to become AI-native. An AI-native network is one that allows for swapping a classical implementation of a network function with one that is data-driven and powered by an AI algorithm. In this book chapter, we present our experiences and findings while working with TL and DA to build AI-native networks. More specifically, we share our findings when applying TL and DA in network management use cases for service performance, end-to-end latency, and block call rate prediction. Via these use cases, and to overcome limitations with TL and DA, we introduce enhancements such as source selection, unsupervised domain adaptation, and transfer learning in distributed intelligence. These enhancements can benefit the general AI community and other industries.

Keywords

  • domain adaptation
  • distributed intelligence
  • AI-native networks
  • source selection
  • telecom networks

1. Introduction

1.1 Telecommunications and 6G networks

Telecommunication networks are embedded so deeply in our lives that we have grown to take them for granted and rarely consider what it would be like if one day they stopped working. Imagine not being able to access the Internet, collaborate with your peers, or even make or receive phone calls. During the Covid pandemic, which in many parts of the world came with restrictions that required people to stay indoors, a 74% increase was observed [1] in the amount of time that people spent online, not only for professional reasons but also for entertainment, education, and social interaction. But what is a telecommunication network? The World Intellectual Property Organization (WIPO) defines a telecommunication network as “the set of physical devices, denominated infrastructure, or the electromagnetic means that support the transmission, reception and emission of signals” [2]. To this end, the Third Generation Partnership Project (3GPP) [3] was formed in 1998 to unite different communication standards and provide a stable environment for the enhancement of telecommunication technologies. Notable generations of telecommunication networks produced by 3GPP are 3G (for third generation), 4G, and 5G. Each generation comes with enhancements over the previous one, improving the user experience by increasing the volume of data that can be sent over time (throughput) and reducing the time it takes for such data to be received (latency), but also improving how the networks operate internally.

The 6G network is the next step in this long line of mobile network generations. As such, it is expected to usher in a new era in the way that humans and machines communicate – an era where physical and digital worlds merge [4] by way of digital twins. Digital twins [5] offer the means to create a digital representation of a physical object, thus allowing that object to overcome the limitations of the physical world and interact with other digital entities while maintaining its physical properties. As such, we expect to see applications that range from humans interacting with each other in the metaverse via AR/XR [6], thus experiencing more immersive communication, to smart manufacturing [7], where digital twins will collaborate to determine the optimal assembly/production line in the digital world before instructing their physical counterparts on how to reorient themselves.

1.2 AI-native telecommunication networks

Under the hood, 6G networks are expected to be AI-native and thus leverage AI technologies. In [8], the authors define an AI-native network as a system that has “intrinsic trustworthy AI capabilities, where AI is a natural part of the functionality in design, deployment, operation, and maintenance. An AI-native system capitalizes on a data-driven and knowledge-based ecosystem, where data is created and consumed to produce new AI-based functionality, replacing static, rule-based mechanisms with learning and adaptive AI when needed.” In the same article, the authors introduce a schematic illustrated in Figure 1 to describe how telecommunication networks are expected to migrate toward a target state where most (if not all) functions and capabilities are replaced by AI-powered approaches, thus resulting in an AI-native network. Figure 1 is partitioned horizontally into different layers where each layer can be interpreted using the terminology described in the Open Systems Interconnection model (OSI) network architecture [9].

Figure 1.

AI-native network design.

As such, the lower layers deal with aspects such as the physical properties of collecting signals from a wireless interface (in OSI, this is called the physical layer). The middle layers deal with how to route (or transport) the information collected by the lower layers, now in the form of packets, to their corresponding recipients (in OSI, this is called the transport layer). Finally, the top layers associate a more specific meaning with the packets that are being routed, which is often determined by end-user software such as browsers and e-mail clients (in OSI, this is called the application layer).

In each layer, we encounter one or more machine-learning models, which either reside within the confines of each layer or are cross-layer, meaning that they control and/or leverage information from two or more layers. In an AI-native design, these models are expected to either introduce new functions/capabilities in each layer or to replace existing functions by re-inventing and enhancing their behavior using AI techniques. Examples of such models include mechanisms for load balancing, mobility optimization, and network energy saving [10].

In the same figure and on the vertical axis, we observe that layers are grouped under the term domain. Domains serve as the means of managing telecommunication networks due to their size, ownership, and how they are configured. As such, a single communication services provider (CSP) may own and manage multiple domains, each of which handles a specific geographical area of a nationwide network. However, domains might also have different owners (or different CSPs), and yet they need to interact, as observed in the case of roaming, where mobile connectivity is seamlessly preserved as the user moves from one country to another.

Finally, and importantly, the aspect of model life-cycle management is illustrated in this figure to signify the need for maintaining the underlying machine-learning models. This includes tasks such as collecting the data needed to train a machine-learning model, building and packaging the model as a software component, and then making that model available either as a service via an Application Programming Interface (API) or by including the model in binary form in the same software package that implements the corresponding function.

Several techniques from the field of AI are expected to come together to design and implement AI-native networks such as generative AI [11], reinforcement learning (RL) [12], distributed intelligence (DI) [13] (horizontal and vertical federated learning (FL)) and multi-task learning [14] to mention a few. Additionally, in the scope of this chapter, we present and discuss the use of transfer learning (TL) and domain adaptation (DA) as one of the mechanisms that can be used by all the aforementioned AI techniques to enhance the process of developing and maintaining machine-learning models.

1.3 Machine learning in AI-native networks

As mentioned in the previous section, in each layer (and in between layers) of an AI-native network, we encounter different AI models that are tasked with supporting different use cases. Such use cases include load balancing, mobility optimization, and energy savings.

Load balancing entails spreading the load, for example, in terms of the number of connections between devices such as smartphones and the base stations that serve those devices or, in other words, provide Internet access to the smartphones as they move around. The main goal of load balancing is to avoid overloading a particular base station and instead utilize a nearby base station that the smartphone can reach with high signal quality. The role of machine learning in this example is to predict the load of a base station and use that information to influence the choice of base station to which the smartphone connects.

Mobility optimization is another important example: as a smartphone moves, the quality of its connection to the base station it was originally connected to (typically known as the primary base station) deteriorates as the distance increases. However, a neighboring base station is usually nearby and ready to serve the incoming smartphone. Given its proximity to the moving smartphone, the neighboring base station offers a better communication channel. Still, all the information that was intended to be delivered to the smartphone is being sent to the primary base station. Mobility optimization techniques allow such information to be redirected to the neighboring base station that the smartphone is going to connect to, transparently and without breaking connectivity in a way that might impact the user’s experience, for example when the user is watching a live video. The role of machine learning in this example is to predict the likelihood that a smartphone is going to connect to a base station within a given time horizon and, as such, prepare the transfer of information to that base station proactively.

Network energy-saving techniques are growing in popularity, both to reduce the operational expenses of running a mobile network and to build more sustainable networks that cut emissions toward the goal of reaching net-zero emissions by 2030 [15]. Machine learning plays an important role in this context, using techniques such as graph neural networks and conditional variational autoencoders. More specifically, via these techniques, one can associate the way the mobile network is configured, its performance, and the power it consumes during that operation. As such, the configuration can be changed when the network is underutilized to reduce power consumption without impacting user experience [16].

It should be noted that machine learning in mobile networks is not limited to use cases that introduce new features. It can also be seen as a source of self-reflection or a mechanism for the network to understand itself in terms of performance. This is important since users of the network expect a certain performance. Predictive knowledge of how the network is going to perform or underperform is important since, on the one hand, it can offer certain guarantees and, on the other hand, detect the absence of such guarantees. Consequently, this can be used for troubleshooting and even root-cause analysis approaches, as well as to introduce on-the-fly techniques to combat potential loss in performance. In this chapter, we examine a network’s performance in three ways:

  1. In the scope of service performance prediction (Section 2).

  2. In terms of delay prediction, that is, the latency for a bit to travel from one side of the network to the other (Section 3).

  3. By predicting the possibility that the network may not be able to perform an important function, namely allowing users to make calls, or by estimating the probability that a call will be dropped, via block call rate/drop call rate prediction (Section 4).

1.4 Transfer learning and domain adaptation in telecommunication networks

Figure 2 revisits the architecture of the telecommunication network from the point of view of the infrastructure. In the device layer, we have devices such as smartphones, robots, and vehicles. They all transmit signals, which are then intercepted by the radio access network (RAN), which consists of antennae and base stations. Pure interception of signals is not enough for a telecommunication network – a core is needed, which can be seen as the brain of the network. The core consists of several functions that ensure authorization and authentication for different services, charging functions, and the ability to access the Internet. In addition, the core also contains network management capabilities, which can be used to monitor the network and also re-configure it in cases where problems are detected, such as network congestion, which may slow down its performance. At the top of this figure, we have the cloud layer, which consists of data centers that can further boost the network’s compute and storage capability.

Figure 2.

Mobile network infrastructure view. The figure represents three different challenges for using TL/DA in the telecom domain.

The number of machine-learning tasks is expected to grow as the network gets more and more complex across all four layers explored above. This may lead to challenges related to model management and securing model fitness in support of the dynamic nature of the network.

From this view, we identify three challenges: namely (C1) how to select the most suitable source model across a set of potential candidates, (C2) how to handle feature diversity across domains, and (C3) how to address the challenge of data and label limitations in the target domain for operational networks.

To overcome these, we observe that TL and DA can play a crucial role. Every model in each layer or position in the architecture can benefit from learning from another domain. This can be via a pretrained source model or by joint training of models across the network. Further, it can play an important role in the scope of model life-cycle management particularly when a model is deployed in a new (also known as greenfield) domain where no data or only a small amount of data has been collected. Instead of waiting for additional data to be collected, which can be time-consuming, TL and DA can enhance this process by using what is learned from other domains.

But what are TL and DA? TL is a technique originally introduced in 1976 [17] aiming at reusing the knowledge learned from one task to improve the process of learning another related task. Despite its humble beginnings, TL became increasingly important around 2012, as observed in [18] with the proliferation of deep learning, where a great body of work, particularly around computer vision, has benefited from TL in applications such as identifying types of objects. TL is typically implemented via fine-tuning, which means that some parts of a source model are updated while others are fixed. Traditionally, in a Neural Network context, only the last few layers are updated or fully replaced with new layers. Using this approach, the source model functions as a feature extractor, and the final layers of the model can be trained to use the knowledge of the first model in the new task. It is possible to change what (if any) layers are kept fixed and which ones are updated or replaced.
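To make the fine-tuning idea concrete, the sketch below shows what it could look like for a small feed-forward regression network in PyTorch. The architecture, layer sizes, and the choice to freeze everything except the final layer are illustrative assumptions, not the exact models used in the use cases discussed later.

```python
import torch
import torch.nn as nn

# A small regression network standing in for a source model trained on KPI data.
# The architecture and dimensions are illustrative assumptions.
source_model = nn.Sequential(
    nn.Linear(18, 64), nn.ReLU(),   # early layers: generic feature extractor
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 1),               # final layer: task-specific head
)
# ... assume source_model has been trained on the source domain ...

# Fine-tuning: keep the early layers fixed and only update the last layer.
for param in source_model.parameters():
    param.requires_grad = False          # freeze everything ...
source_model[-1] = nn.Linear(32, 1)      # ... then replace the head with a fresh, trainable layer

optimizer = torch.optim.Adam(
    (p for p in source_model.parameters() if p.requires_grad), lr=1e-3
)
loss_fn = nn.MSELoss()

def finetune(model, target_loader, epochs=10):
    """Train the unfrozen layers on the (small) target-domain dataset only."""
    for _ in range(epochs):
        for x, y in target_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            optimizer.step()
```

Keeping the early layers frozen lets the source model act as a feature extractor, while the small number of trainable parameters in the new head can be fitted with relatively few target-domain samples.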

Domain adaptation (DA) [19] is a subset of transfer learning that deals with domain shifts between a source and target domain that share the same task. A simple example of a domain adaptation scenario is to do image segmentation in traffic where the source domain consists of images from summer and the target domain consists of images with snow in winter. An image segmentation model trained only on images from summer may underperform on a test set from winter unless some adaptation is made. There are three types of domain adaptation:

  • Supervised DA: all examples in the target domain are labeled.

  • Semi-supervised DA: some samples (usually a small set) of the target domain are labeled.

  • Unsupervised DA: Only the source domain contains labeled samples while the target domain is unlabeled.

In all cases, for both traditional TL and DA, the feature space of the source and target domains can be the same (homogeneous TL/DA) or different (heterogeneous TL/DA). In the case of heterogeneous TL/DA, the meaning of the input features changes, and they may even change dimensionality. In these cases, it may be undesirable or impossible to do traditional fine-tuning or to use DA methods that rely on common feature extractors between the source and target domains.

1.5 Chapter outline

In this chapter, we present and discuss our experiences while introducing and developing the field of transfer learning and domain adaptation in the telecommunications industry. The telecommunication industry can be described as an unusual area in which to apply transfer learning since most datasets do not consist of images or natural language. Instead, most datasets are tabular, with several columns and features containing numeric data for different key performance indicators (KPIs) that measure the network’s performance. As such, in this chapter, we describe our experiences working with such datasets and applying transfer learning and domain adaptation to the creation of an AI-native network architecture.

The chapter is structured as follows: Having introduced AI-native networks and the specific challenges that TL and DA can address as machine learning transforms our networks, we continue by describing three applications of these techniques. We begin in Section 2 by describing the application of TL, source model selection, and heterogeneous transfer for service performance prediction. In Section 3, we present how DA can compensate for the lack of labeled data in delay prediction for 5G networks. In Section 4, we provide an example of using TL in the context of FL to speed up the learning process. It should be noted that all applications described in this chapter have been published in peer-reviewed conferences. Yet, in the following sections, the reader can find a more intuitive explanation of each work, which can serve as a means of making the techniques more accessible and easier to relate to. The chapter finishes with a set of conclusions and takeaways, including the highlights of each use case.

2. Transfer learning for service performance prediction

2.1 Service performance prediction

Since 5G, telecom systems have become cloud native, and the network enables many new services (such as remote-controlled vehicles and machines, online gaming, and video streaming) thanks to its low latency, high bandwidth, and new capabilities such as network slicing. Telecom service providers must be able to deliver services under strict Service-Level Agreements (SLAs). Therefore, it is crucial to introduce the capability of predicting the performance of a service running in a dynamically changing environment.

The performance of a cloud service delivered over a telecom network depends on the current load and allocated resources to the service. In a cloud environment, the load is often highly dynamic, and the allocated resources may change during operation due to scaling and migration. Figure 3 shows a simplified example where a video streaming service is running in data center 1. Due to changes in the utilization or workloads, the service could be scaled up and/or migrated to another execution environment (e.g., two instances of the service being deployed in DC 2 and DC 3, and the traffic is load-balanced between the two execution environments). In such a scenario, an ML model trained for predicting the performance of the video service (e.g., the frame rates at the client side) using input features collected from DC 1 and output metrics from UE1 will not perform well for the target execution environment, and therefore a new ML model is needed.

Figure 3.

Service performance prediction in dynamic environment.

In [20], we showed how changes in the operational environment can significantly reduce the accuracy of a predictive model by investigating several realistic scenarios using data from a real testbed. Our evaluation results showed that fine-tuning some layers of a model trained for one environment (source domain) could significantly reduce the number of required data samples in the new/changed environment (target domain) and, therefore, could reduce the overhead of data collection via monitoring/measurements, as well as reducing model training time compared to training a new model from scratch.

2.2 Source model selection

Even though transfer learning using approaches such as fine-tuning has shown promising results in applications such as service performance prediction, it is known that fine-tuning a source model that is not similar or relevant to the target domain can negatively impact the performance of the model, that is, transfer learning can decrease the performance of the target model. This is known as negative transfer and has been studied in different domains such as computer vision [21]. Conceptually, a relevant source for a target domain is one that “improves” the information content of the target domain and thus improves the model performance. To improve the information content of the target domain, the information content of the candidate source must be complementary.

As shown in Figure 2 (C1) and as discussed before, several ML models trained for similar services, or models from other domains or layers, might be available to be used as a source model to be fine-tuned or adapted after changes in the operational environment. In [22, 23], we took a closer look at transfer learning and determined ways to select a source for transfer learning. Proper source selection can be beneficial as it can speed up the process and also avoid the issue of negative transfer caused by choosing incorrect sources.

To address the challenge of source selection, we investigated two different approaches. The first approach – similarity – asks the question: “What source model comes from the environment most similar to the new environment?” The idea is that a source model coming from a very similar environment should be a good starting point, as a model coming from the same environment would be identical – and we know that is what we want in the end. The second approach – diversity – asks the question: “What source model has seen the most different scenarios?” The idea here is that a model that has seen more diverse data has more knowledge to pass on, no matter what the environment in the target domain looks like.
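One way to make these two questions concrete, purely as an illustrative sketch and not the actual metrics used in [22, 23], is to score each candidate source either by how close its feature statistics are to the few available target samples (similarity) or by how spread out its own data is (diversity):

```python
import numpy as np

def similarity_score(source_X: np.ndarray, target_X: np.ndarray) -> float:
    """Higher is better: negative distance between per-feature means.

    Needs target samples, which may be scarce and therefore unreliable.
    """
    return -float(np.linalg.norm(source_X.mean(axis=0) - target_X.mean(axis=0)))

def diversity_score(source_X: np.ndarray) -> float:
    """Higher is better: total per-feature variance of the source data.

    Does not use the target data at all, so it can be computed ahead of time.
    """
    return float(source_X.var(axis=0).sum())

def select_source(candidates: dict, target_X=None, use_diversity=True) -> str:
    # candidates maps a source name (e.g., "V2F") to its (already normalized) feature matrix
    if use_diversity or target_X is None:
        scores = {name: diversity_score(X) for name, X in candidates.items()}
    else:
        scores = {name: similarity_score(X, target_X) for name, X in candidates.items()}
    return max(scores, key=scores.get)
```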

A comprehensive evaluation of the two approaches shows the importance of selecting the right source. We measure the impact of using transfer learning by introducing the concept of transfer gain, which is the difference in the error of a model trained on the target data alone and the error of a model transferred from the source to the target using TL. If a reliable source is selected, the target model will benefit and have a positive transfer gain, but selecting a wrong source may lead to a negative transfer gain (as shown in the right-hand side of Figure 4).
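Written out, with NMAE as the error measure used in the evaluation below, the transfer gain for a given source-target pair is simply:

```latex
\mathrm{TransferGain}(\text{source} \rightarrow \text{target})
  = \mathrm{NMAE}_{\text{target-only}} - \mathrm{NMAE}_{\text{TL}}
```

A positive gain means the transferred model has a lower error than a model trained on the target data alone, whereas a negative gain corresponds to negative transfer.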

Figure 4.

Transfer gain for two different targets, one with only a single service being executed on the hardware (left) and one with two services running (right).

Figure 4 shows some representative results. The figures show the transfer gain calculated using normalized mean absolute error (NMAE) for two target video-on-demand services V1P and V2F when transferring from different sources (V1P, V1F, V2P, and V2F). In terms of diversity and similarity, V1P and V1F are similar to each other but less diverse than V2P and V2F, which, in turn, are similar to each other. Our experimental evaluation shows that the diversity-based approaches outperform the similarity ones. This is visible in Figure 4, where the target domain is V1P with the task of predicting audio buffer rate, and the candidate sources are all domains with the task of predicting the average framerate. Interestingly, here, the best sources are not the ones trained on the same identical domain as in the target; in fact, the two diverse domains V2P and V2F outperform V1P as a source significantly. This illustrates the point that the diversity of the source is very important, since even choosing an identical domain as the target provides less useful insight to transfer compared to using a more diverse one. Additionally, it makes sense to use diversity for practical purposes. When we have limited access to target data, as in the cases where we need TL, it is hard to calculate similarity. This is simply because we do not have enough data from the target, so any similarity measure will be unreliable. When it comes to diversity, this does not take the target data into account at all; thus, we can already know what the best model is to transfer before the need arises.

We should note that source selection is also important when dealing with remote datasets such as in the context of federated learning when one can choose between different participants to learn from. As such, knowledge of optimal sources can be used to overcome unnecessary data transfers, thus reducing communication costs.

2.3 Heterogeneous transfer learning

Traditionally, many transfer learning applications operate under the assumption that source and target data both have the same feature space, and that they are extracted from the same distribution. In many telecom use cases, however, this is not certain. For example, in the service performance prediction scenario, when a network application is being migrated or horizontally/vertically scaled in the cloud, the number of available features (e.g., data collected from the infrastructure such as network and memory usage) may change (see Figure 2 (C2) as an example). As another example, the feature set may change due to, e.g., data collection costs in certain parts of the network. That is, the cost of collecting a specific feature may not match the expected performance improvement of the model, and thus the feature should be removed.

When the number of input features between a source model and a target model is not the same, simple fine-tuning of the model weights is not possible. To accommodate the varying feature sets and the impact on the service performance prediction use case, we explored two approaches, a naïve and an advanced, in [24]. These approaches adapt the first layers of the neural network model. In the naïve approach, we randomly initialize the new weights of the first layer, whereas in the second approach, we transfer the weights corresponding to overlapping features in the source and target domains, and the rest are randomly initialized. A comprehensive evaluation of the two approaches using data from a testbed showed that the advanced approach is often more capable, especially with the low availability of target-domain samples and smaller model architectures.
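A minimal sketch of the advanced idea is shown below, assuming simple feed-forward models and a known mapping between overlapping source and target features; the shapes, names, and the 18-to-10 feature example are illustrative and not the exact implementation from [24].

```python
import torch
import torch.nn as nn

def transfer_first_layer(source_layer: nn.Linear, n_target_features: int,
                         overlap: dict) -> nn.Linear:
    """Build the first layer of the target model from a source first layer.

    overlap maps a target feature index -> the corresponding source feature index.
    Weights for overlapping features are copied; the rest stay randomly initialized
    (the naive approach simply skips the copy and keeps everything random).
    """
    target_layer = nn.Linear(n_target_features, source_layer.out_features)
    with torch.no_grad():
        for tgt_idx, src_idx in overlap.items():
            # copy the weight column corresponding to a shared input feature
            target_layer.weight[:, tgt_idx] = source_layer.weight[:, src_idx]
        target_layer.bias.copy_(source_layer.bias)  # biases do not depend on input width
    return target_layer

# Example: the source model saw 18 features, the target only exposes 10 of them.
source_first = nn.Linear(18, 64)
overlap = {t: s for t, s in enumerate(range(10))}   # hypothetical 1:1 mapping of the shared features
target_first = transfer_first_layer(source_first, n_target_features=10, overlap=overlap)
# The remaining layers of the source model can then be reused or fine-tuned as usual.
```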

Representative results are illustrated in Figure 5, where a model transfer is performed from a source to a target domain. The model predicts service performance from statistics extracted from an edge-network infrastructure, as explained in [23]. In this case, the source domain feature set contains 18 features, whereas the target is limited to either 10 (left plot) or 2 (right plot) features. The transfer is evaluated for five different use case representative neural-network architectures (Shallow-4, Deep-Narrow-6, Deep-Narrow-10, Deep-Wide-6, and Deep-Wide-10). As can be observed, when only a limited number of target-domain training samples is available, both the naïve and advanced transfer approaches outperform training from scratch in terms of NMAE for the target model.

Figure 5.

NMAE for heterogeneous transfer using the naïve and advanced approaches, for a set of different neural-network architectures, considering a small number (25) of training samples.

3. Delay prediction in 5G enhanced with transfer learning and domain adaptation

3.1 Network delay prediction and its importance for new services

The rapid global deployment of 5G networks has facilitated the emergence of numerous applications in various fields, including real-time communication, industrial control, and automotive sectors. Critical to these applications is the conformance of performance metrics, such as communication delay, as delay violations can potentially lead to safety failures.

Using machine learning for predicting performance metrics, like communication delay, provides network operators with the opportunity to proactively address trends and potential violations. Moreover, it can serve as input for end-to-end communication protocols to optimize their communication patterns. However, learning performance metrics such as communication delay in an operational environment is not practical. Collecting training data from an operational environment requires extensive measurements, which can adversely affect the service. Furthermore, expensive instrumentation may be required for the user devices. In some environments, it may not be feasible to collect labeled data at all, as shown in (C3) of Figure 2. One solution to this problem is to use transfer learning and domain adaptation.

In a series of studies, we have explored approaches that utilize deep neural networks to predict communication delay experienced by 5G applications based on metrics observed in the 5G infrastructure [25, 26]. This corresponds to making delay predictions in the base station in the RAN in Figure 2. Further, Figure 6 shows the concept from a functional viewpoint where an ML model, located in the 3GPP function called Network Data Analytics Function (NWDAF), reads 5G infrastructure data from the Radio Access Network (RAN) for its delay prediction task. The prediction outcome is then exposed to the applications via the Application Function (AF) or the Network Exposure Function (NEF) for end-to-end optimization, or fed back to the RAN for network configuration and resource optimization.

Figure 6.

Applications on a user equipment (UE) are communicating with an application server over a 3GPP network. In the 3GPP network extensive measurements are conducted in the RAN for prediction of delay in the NWDAF. Insights are exposed toward the application.

3.2 Domain adaptation to overcome limitations in labeled data

The performance of trained models may be compromised in the case of sudden or gradual changes in network configuration, traffic behavior, and user patterns, as well as when new types of user equipment are deployed. To address these challenges, transfer learning and domain adaptation techniques can be applied to enhance the robustness of the models in dynamic network environments [27]. Here, we compared the performance of different transfer learning and domain adaptation approaches for target data sets of varying sizes and with varying amounts of available labels. Traditional fine-tuning requires labeled data and is therefore evaluated only on the available labeled data. In the case where only unlabeled data is available, we instead show the results of applying the source model directly, as it is impossible to fine-tune without labels. For domain adaptation, we show an evaluation of two domain adaptation methods based on DANN [28], including one modified version that warm-starts part of the DA model with a pretrained source model and uses both unlabeled and labeled samples. An example of the results is shown in Figure 7.

Figure 7.

Example results with the different evaluated approaches for the network performance use case.
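For readers unfamiliar with DANN, its core component is a gradient reversal layer that pushes a shared feature extractor to produce representations that a domain classifier cannot separate, while a task head is trained on the labeled source data. The sketch below is a minimal PyTorch-style illustration under assumed layer sizes; it is not the exact model or the modified warm-start variant evaluated in [27].

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass, reversed (scaled) gradient in the backward pass."""
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lamb * grad_output, None

feature_extractor = nn.Sequential(nn.Linear(18, 64), nn.ReLU())   # shared representation
regressor = nn.Linear(64, 1)          # delay prediction head, trained on labeled source data
domain_classifier = nn.Linear(64, 2)  # source vs. target discriminator

# Optionally warm-start the feature extractor from a pretrained source model,
# in the spirit of the modified DANN variant mentioned above (state dict is hypothetical):
# feature_extractor.load_state_dict(pretrained_extractor_state)

def dann_step(x_src, y_src, x_tgt, lamb=0.1):
    """One training step: task loss on labeled source + domain loss on source and target.

    y_src is expected to have shape (batch, 1), matching the regressor output.
    """
    f_src, f_tgt = feature_extractor(x_src), feature_extractor(x_tgt)

    task_loss = F.mse_loss(regressor(f_src), y_src)

    feats = torch.cat([f_src, f_tgt])
    domains = torch.cat([torch.zeros(len(x_src)), torch.ones(len(x_tgt))]).long()
    domain_logits = domain_classifier(GradReverse.apply(feats, lamb))
    domain_loss = F.cross_entropy(domain_logits, domains)

    return task_loss + domain_loss   # backpropagating flips the domain gradient at the reversal layer
```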

The following observations can be made from Figure 7: First, domain adaptation improves model performance even in cases where we do not have any labeled target samples. Second, the more target samples we have, and the more of them that are labeled, the better fine-tuning works. Third, with a large enough dataset in the target domain, fine-tuning does very well. Conversely, when these criteria are not met, for example when we have no labeled target samples and only a small number of target samples overall, transfer learning and domain adaptation become necessary. Finally, when we have a high-quality target data set, transfer learning becomes less crucial for the final model performance compared to training the model from scratch.

4. Federated key performance indicator prediction

Federated learning is a machine-learning technique originally introduced in [29] to preserve privacy when training machine-learning models. As such, federated learning trades the transfer of original raw data collected remotely for the transfer of neural parameters. To that end, one additional node is introduced, typically referred to as the parameter server, which is tasked with aggregating the neural parameters of each participant.

The process of learning a model locally, transferring produced neural parameters to the parameter server for each participant, aggregating those parameters, and then sending them back to further train the aggregated model on each participant for a number of iterations is known as the federated learning protocol and is illustrated in Figure 8.

Figure 8.

Federated learning protocol with random or pretrained model initialization.
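A minimal sketch of one round of this protocol, in the style of plain FedAvg with unweighted parameter averaging, could look as follows; the function names and the local training routine are assumptions for illustration.

```python
import copy
import torch

def federated_round(global_model, participants, local_train):
    """One iteration of the protocol: broadcast, local training, aggregation.

    participants is a list of local data loaders; local_train(model, data) trains
    a copy of the model on one participant's data and returns its state_dict.
    """
    local_states = []
    for data in participants:
        local_model = copy.deepcopy(global_model)             # broadcast current parameters
        local_states.append(local_train(local_model, data))   # train locally, keep only parameters

    # Parameter server: average the neural parameters (unweighted FedAvg for simplicity).
    avg_state = {}
    for key in local_states[0]:
        avg_state[key] = torch.stack([s[key].float() for s in local_states]).mean(dim=0)
    global_model.load_state_dict(avg_state)
    return global_model

# Repeating federated_round for R iterations, with the aggregated parameters broadcast back
# to the participants each time, gives the protocol illustrated in Figure 8.
```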

To further strengthen privacy, protocols that are orthogonal to the federated training process, such as secure aggregation [30], have been introduced to alleviate the problem of leaking sensitive information in case the parameter server is in any way compromised.

Federated learning found its way into telecommunication networks early on, since privacy is a desired property in 3GPP [31], but also because the transfer of neural parameters can significantly reduce the cost of transferring data, particularly in cases where the architecture of the neural network is small [32].

In federated learning, in the initial iteration every participant has the same set of neural parameters, which are usually drawn at random. This step is known as the initialization step. Another popular example of such initialization is Xavier initialization, which scales the initial random weights so that the variance of activations and gradients is preserved across layers during forward and backward propagation.

Intuitively, one can observe a link between transfer learning and federated learning. From a transfer learning perspective, the federated learning protocol, where the aggregated parameters of all participants are sent to every participant in each iteration, is a form of transfer learning in which the aggregated wisdom of the participants is used for iterative fine-tuning until the overall process converges or, in other words, until the training loss stabilizes.

This observation raises the question – what if, instead of using a random distribution of neural parameters in the initialization phase, we used a set of neural parameters from a previously trained model for the same use case? Such a model can be obtained from other data sources, such as real data obtained in testing environments, simulations, or knowledge from domain experts in the field of telecommunications.
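Under the assumptions of the FedAvg-style sketch above, swapping random initialization for a pretrained model is a small change: the initial global parameters are simply loaded from the previously trained model before the first federated round. The model architecture and names below are hypothetical.

```python
import torch.nn as nn

def build_model() -> nn.Module:
    # Hypothetical architecture for the classifier; the actual model is not specified here.
    return nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 2))

# Random initialization (baseline): parameters come from the framework's default initializer.
global_model = build_model()

# Pretrained initialization: warm-start from a model previously trained, e.g., centrally on a
# larger dataset for the same use case (testbed data, simulations, expert knowledge).
pretrained_model = build_model()
# ... assume pretrained_model has been trained on the larger source dataset ...
global_model.load_state_dict(pretrained_model.state_dict())

# The federated protocol (e.g., federated_round from the sketch above) then proceeds as usual,
# but every participant starts from the pretrained parameters instead of random ones.
```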

We put that question to the test in the context of a use case called block call rate prediction. Block Call Rate (BCR) is an important key performance indicator in mobile networks, as it allows the detection of potential cases where users might be unable to place a call.

The results of these experiments are summarized in Table 1. The table consists of four rows and four columns. The columns describe the four different experiments that we performed:

  • Centralized: The centralized experiment serves as a baseline. Here, we train a machine-learning model by copying data from all participants to a centralized location and exposing all of this data directly to the training process.

  • Federated: In this experiment, we train different models without moving the data and aggregate them instead. In this example, the model is randomly initialized.

  • Pretrained Model: In this experiment, we obtained a pretrained model in a centralized manner using a larger dataset.

  • Federated Pretrained Model: In this experiment, instead of random initialization, we initiate all participants using the knowledge from the pretrained model obtained in the previous experiment.

Metric | Centralized | Federated | Pretrained Model (one-time cost) | Federated with Pretrained Model
AUC | 0.69 | 0.67 | 0.77 | 0.76
Network Footprint (MB) | 55 | 4.2 | 2.3 | 4.2
Training Time (seconds) | 825 | 40 | 3600 | 40
Privacy | Low | High | Low | High

Table 1.

Centralized, federated, and Pretrained model comparison.

Table 1 has 4 rows, and each row assesses each experiment using a different metric.

  • Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) is a metric used in classification problems. It represents how likely it is that a randomly chosen positive sample is ranked higher than a randomly chosen negative sample. It is measured between 0 and 1, where 1 means that the model ranks every positive sample above every negative one, 0 means that the model is always wrong, and 0.5 corresponds to random guessing.

  • Network footprint: Measures the volume of data that is transferred in each experiment. In the centralized and pretrained model experiment, we measure the transfer of data in MB. This measurement is interpreted as how much data we would need to transfer if this data were remotely obtained. In the case of federated learning, we measure the transfer of the neural parameters as defined by Eq. (1):

Network Footprint = (N + 1) × R × Model (1)

where N is the number of participants, R the number of iterations and Model indicates the size of the serialized form of the neural-network parameters.

  • Training time: Measures the time that it took to train the model using a single host with multiple processing cores.

  • Privacy: Indicates the level of privacy of each approach, which is either low, meaning there are privacy implications since raw data is transferred, or high, meaning privacy is preserved since no raw data is transferred.

Starting from the centralized experiment, we obtain a baseline of 0.69 AUC, which comes at the expense of privacy and a high volume of data transfer. The federated setting achieves a slightly lower AUC (0.67) while moving a smaller amount of data and preserving privacy. Since the training process is parallelized, training time is also reduced. We then proceeded to train a centralized model using a bigger dataset for the same use case, which achieved the highest AUC in our experiments – 0.77. Using this model in the initialization phase shows that the AUC is preserved (0.76) and is not negatively impacted by the learning of other participants, thus benefiting the federation.

Beyond BCR, we have applied the same approach in other use cases that share the same problem formulation: a supervised classification problem in which we train a model to determine the likelihood that a given KPI is going to increase, remain stable, or deteriorate over a given time horizon. One example of such a KPI is the Drop Call Rate (DCR).

5. On the impact of transfer learning on the telecommunication industry

The telecommunication industry is transitioning into an AI-native architecture that is replacing traditional network functions with data-driven and AI-powered algorithms. This introduces new challenges such as the need to collect data and to perform model life cycle management to maintain the corresponding, now AI-powered, network functions. In the context of AI-native networks and beyond the use cases presented in this chapter, we see that transfer learning and domain adaptation will play an important role in managing operational expenses by reducing resource expenditure from data collection, ML model training, and in maintaining and managing the life cycles of models. Moreover, we expect to see the importance of transfer learning and domain adaptation in tasks such as rolling out new network deployments. New network deployments mean that new nodes (such as base stations) are deployed in the field or that new frequencies are being used to improve the utilization of spectrum that has been allocated to a certain area. Usually, such deployments require manually collecting data and tuning the network, sometimes via measurement campaigns that take time and strain network resources due to the volume of data that needs to be collected. We expect that the use of transfer learning and domain adaptation will revolutionize this process as it will allow for previous knowledge of the area to be reused and extended, thus greatly speeding up deployment and making newly installed equipment readily available to users.

6. Conclusions

This chapter has discussed the applicability of transfer learning (TL) and domain adaptation (DA) in the context of telecommunication networks, illustrating how these technologies play a pivotal role in advancing the overarching vision of an AI-native network. Furthermore, the chapter provides an in-depth discussion of three significant use cases, namely, service performance prediction, delay prediction in 5G networks, and block call rate prediction.

In the domain of service performance prediction, TL and DA emerge as critical tools for refining predictive models when the service execution environment undergoes a significant change. By leveraging knowledge gained from disparate yet related tasks, incorporating heterogeneous features, and utilizing diverse source models, these technologies enhance the accuracy of the models. Additionally, similar insights are observed for delay prediction in 5G networks, specifically addressing the challenge of a lack of labeled data in operational networks. Finally, in the block call rates use case, it is showcased how TL can improve the initialization of models trained through federated learning. These use cases shed light on how TL and DA technologies are specifically tailored to address the dynamic nature of telecom environments, ensuring model robustness, generalization, and adaptability.

However, to achieve the vision of an AI-native network, several outstanding research issues and engineering challenges must be further addressed. A primary concern is the development of a unified framework for TL and DA, featuring a standardized API across the diverse functions, network layers, methodologies, and data sources prevalent in the telecommunication domain. Embedded within such a framework is also the need to control the monitoring and measurement functions in the network to safeguard data (both labeled and unlabeled) in target domains, the need for execution environments distributed throughout the telecommunication network to facilitate TL and DA, as well as the need for model life-cycle management.

Conflict of interest

The authors declare no conflict of interest.

Acronyms

AR/XR: augmented reality/extended reality
AI: artificial intelligence
API: application programming interface
OSI: open systems interconnection model
CSP: communication service provider
TL: transfer learning
DA: domain adaptation
ML: machine learning
RL: reinforcement learning
DI: distributed intelligence
FL: federated learning
WIPO: World Intellectual Property Organization
3GPP: 3rd Generation Partnership Project

References

  1. Ericsson. Keeping consumers connected [Internet]. 2022. Available from: https://wcm.ericsson.net/49d4b7/assets/local/reports-papers/consumerlab/reports/2020/global-report-keeping-consumers-connected-16062020.pdf [Accessed: January 15, 2024]
  2. World Intellectual Property Organization. Basic Telecommunication Law [Internet]. World Intellectual Property Organization; 2024. Available from: https://wipolex-res.wipo.int/edocs/lexdocs/laws/en/ao/ao012en.html [Accessed: January 15, 2024]
  3. 3rd Generation Partnership Project. Introducing 3GPP [Internet]. 2024. Available from: https://www.3gpp.org/about-us/introducing-3gpp [Accessed: December 6, 2023]
  4. Ericsson. Follow the journey to 6G [Internet]. 2024. Available from: https://www.ericsson.com/en/6g [Accessed: January 4, 2024]
  5. Tao F, Zhang H, Liu A, Nee AY. Digital twin in industry: State-of-the-art. IEEE Transactions on Industrial Informatics. 2018;15(4):2405-2415
  6. Tang F, Chen X, Zhao M, Kato N. The roadmap of communication and networking in 6G for the metaverse. IEEE Wireless Communications. 2023;30(4):72-81. DOI: 10.1109/MWC.019.2100721
  7. Mourtzis D, Angelopoulos J, Panopoulos N. Smart manufacturing and tactile internet based on 5G in industry 4.0: Challenges, applications and new trends. Electronics. 2021;10(24):3175
  8. Ericsson. Defining AI native: A key enabler for advanced intelligent telecom networks [Internet]. 2023. Available from: https://www.ericsson.com/en/reports-and-papers/white-papers/ai-native [Accessed: January 4, 2024]
  9. ISO/IEC 7498-1:1994. Information technology - Open Systems Interconnection - Basic Reference Model: The Basic Model [Internet]. 2020. Available from: https://www.iso.org/standard/20269.html [Accessed: January 1, 2024]
  10. 3GPP Highlights. 3GPP – the 5G standard [Internet]. 2022. Available from: https://www.3gpp.org/images/newsletters/3GPP_Highlights_Issue_5_WEB_opt1.pdf [Accessed: January 4, 2024]
  11. Cao Y, Li S, Liu Y, Yan Z, Dai Y, Yu PS, et al. A comprehensive survey of AI-generated content (AIGC): A history of generative AI from GAN to ChatGPT [Internet]. arXiv. 2023. Available from: https://arxiv.org/abs/2303.04226 [Accessed: January 15, 2024]
  12. Kaelbling LP, Littman ML, Moore AW. Reinforcement learning: A survey. Journal of Artificial Intelligence Research. 1996;1(4):237-285
  13. Driss MB, Sabir E, Elbiaze H, Saad W. Federated learning for 6G: Paradigms, taxonomy, recent advances and insights [Internet]. arXiv. 2023. Available from: https://arxiv.org/abs/2312.04688 [Accessed: January 15, 2024]
  14. Caruana R. Multitask learning. Machine Learning. 1997;28:41-75
  15. Ericsson. Net Zero: What does it mean and how do we get there? [Internet]. 2022. Available from: https://www.ericsson.com/en/blog/2022/3/net-zero-what-is-it [Accessed: December 4, 2023]
  16. Ericsson. Ensuring energy-efficient networks with artificial intelligence [Internet]. 2021. Available from: https://www.ericsson.com/en/reports-and-papers/ericsson-technology-review/articles/ensuring-energy-efficient-networks-with-ai [Accessed: December 5, 2023]
  17. Bozinovski S, Fulgosi A. The influence of pattern similarity and transfer learning upon training of a base perceptron B2. In: Proceedings of Symposium Informatica. Vol. 3. 1976. pp. 121-126
  18. Ng A. Nuts and Bolts of Building AI Applications Using Deep Learning [Internet]. NIPS Keynote Talk; 2016. Available from: https://neurips.cc/virtual/2016/tutorial/6203 [Accessed: January 15, 2024]
  19. Wang M, Deng W. Deep visual domain adaptation: A survey. Neurocomputing. 2018;27(312):135-153
  20. Moradi F, Stadler R, Johnsson A. Performance prediction in dynamic clouds using transfer learning. In: 2019 IFIP/IEEE Symposium on Integrated Network and Service Management (IM). Arlington, VA, USA: IEEE; 2019. pp. 242-250
  21. Wang Z, Dai Z, Póczos B, Carbonell J. Characterizing and avoiding negative transfer. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, CA, USA: IEEE; 2019. pp. 11293-11302
  22. Larsson H, Taghia J, Moradi F, Johnsson A. Towards source selection in transfer learning for cloud performance prediction. In: 2021 IFIP/IEEE International Symposium on Integrated Network Management (IM). Bordeaux, France: IEEE; 2021. pp. 599-603
  23. Larsson H, Taghia J, Moradi F, Johnsson A. Source selection in transfer learning for improved service performance predictions. In: 2021 IFIP Networking Conference (IFIP Networking). Espoo and Helsinki, Finland: IEEE; 2021. pp. 1-9
  24. Sanz FG, Ebrahimi M, Johnsson A. Exploring approaches for heterogeneous transfer learning in dynamic networks. In: NOMS 2022-2022 IEEE/IFIP Network Operations and Management Symposium. Budapest, Hungary: IEEE; 2022. pp. 1-9
  25. Rao A, Riaz H, Zavodovski A, Mochaourab R, Berggren V, Johnsson A. Generalizable one-way delay prediction models for heterogeneous UEs in 5G network. In: IEEE Network Operations and Management Symposium (NOMS). Seoul, South Korea: IEEE; 2024
  26. Rao A et al. Prediction and exposure of delays from a base station perspective in 5G and beyond networks. In: Proceedings of the ACM SIGCOMM Workshop on 5G and beyond Network Measurements, Modeling, and Use Cases. Amsterdam, Netherlands: Association for Computing Machinery; 2022. pp. 8-14
  27. Larsson H, Moradi F, Taghia J, Lan X, Johnsson A. Domain adaptation for network performance modeling with and without labeled data. In: NOMS 2023-2023 IEEE/IFIP Network Operations and Management Symposium. Miami, FL, USA: IEEE/IFIP; 2023. pp. 1-9
  28. Ganin Y, Ustinova E, Ajakan H, Germain P, Larochelle H, Laviolette F, et al. Domain-adversarial training of neural networks. Journal of Machine Learning Research. 2016;17(59):1-35
  29. McMahan B, Moore E, Ramage D, Hampson S, y Arcas BA. Communication-efficient learning of deep networks from decentralized data. In: Artificial Intelligence and Statistics. Fort Lauderdale, FL, USA: PMLR; 2017. pp. 1273-1282
  30. Bonawitz K, Ivanov V, Kreuter B, Marcedone A, McMahan HB, Patel S, et al. Practical secure aggregation for privacy-preserving machine learning. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. Dallas, Texas, USA: Association for Computing Machinery; 2017. pp. 1175-1191
  31. Khan H, Dowling B, Martin KM. Identity confidentiality in 5G mobile telephony systems. In: International Conference on Research in Security Standardization. Cham: Springer International Publishing; 2018. pp. 120-142
  32. Ericsson. Privacy-aware machine learning with low network footprint [Internet]. 2019. Available from: https://www.ericsson.com/en/reports-and-papers/ericsson-technology-review/articles/privacy-aware-machine-learning [Accessed: January 4, 2024]
