Open access peer-reviewed chapter - ONLINE FIRST

Private SVM Inference on Encrypted Data

Written By

Ahmad Al Badawi

Submitted: 29 July 2024 Reviewed: 02 August 2024 Published: 04 September 2024

DOI: 10.5772/intechopen.1006690

Support Vector Machines - Algorithms, Optimizations, and Real-World Applications IntechOpen

From the Edited Volume

Support Vector Machines - Algorithms, Optimizations, and Real-World Applications [Working Title]

Dr. Robertas Damaševičius


Abstract

This tutorial chapter provides a comprehensive guide to implementing privacy-preserving Support Vector Machine (SVM) inference using Fully Homomorphic Encryption (FHE). We demonstrate a practical solution for secure and private SVM inference on encrypted data, enabling sensitive data analysis while maintaining confidentiality. Through a step-by-step implementation on a real-world dataset, we cover data preparation, SVM model training, and homomorphic inference. Our experimental results on a commodity laptop show that our approach achieves high accuracy with a reasonable latency of nearly 6 seconds for SVM inference. This chapter serves as a valuable resource for practitioners and researchers seeking to apply privacy-preserving techniques to SVM solutions, with significant implications for applications like medical diagnosis, financial prediction, and recommender systems, where data privacy is crucial. By following this tutorial, readers can gain hands-on experience with privacy-preserving SVM inference using FHE.

Keywords

  • encrypted SVMs inference
  • machine learning as a service
  • privacy-preserving machine learning
  • homomorphic encryption
  • implementation

1. Introduction

Support Vector Machines (SVMs) have become indispensable tools in supervised machine learning, playing a crucial role in solving a wide range of problems. These algorithms are widely employed for both classification and regression tasks. Their effectiveness lies in their ability to find optimal decision boundaries by maximizing the margin between different classes, which makes them particularly useful when dealing with complex datasets [1, 2, 3, 4, 5, 6, 7, 8, 9].

A significant challenge arises when applying SVMs to problems involving sensitive data, which poses substantial privacy risks. This is particularly concerning in scenarios where datasets contain personal and sensitive information, such as medical records, financial transactions, or user preferences. In these cases, maintaining data privacy becomes crucial to prevent unauthorized access, misuse, or exploitation. Furthermore, stringent privacy regulations like the General Data Protection Regulation (GDPR) impose strict constraints on handling sensitive data, limiting its use beyond specific purposes like storage and logging and prohibiting its use for analysis without privacy-aware methods [10, 11]. Therefore, developing machine learning algorithms, including SVMs, that effectively utilize sensitive data while ensuring privacy remains a critical and active research area [12, 13, 14, 15, 16, 17, 18, 19].

One promising solution to balancing data utility and privacy is Fully Homomorphic Encryption (FHE) [20], a cryptographic method that allows an untrusted party to perform operations directly on encrypted data without decryption. This ensures that sensitive information remains confidential throughout the analysis process, making it remarkably useful in cloud computing environments where sensitive data might be shared with the cloud provider. FHE offers a solution for utilizing sensitive data while preserving user privacy. It allows us to extract insights from the data without ever decrypting it, effectively overcoming the limitations imposed by privacy regulations and concerns.

This chapter specifically addresses the challenge of Machine Learning as a Service (MLaaS) when the analysis algorithm is an SVM [21, 22]. As shown in Figure 1, in this scenario, two main actors are involved: (1) a user with sensitive data records and (2) a public server that owns a pre-trained SVM model. The user encrypts their data using their FHE secret key and sends it, along with the FHE public material, to the server. The server can then evaluate the pre-trained SVM model on the encrypted user input using FHE evaluation methods. This generates an encrypted insight that is transmitted back to the user. Only the user who possesses the FHE secret key can decrypt the result and gain insight.

Figure 1.

Privacy-preserving MLaaS protocol with FHE.

We provide a comprehensive guide that empowers the reader to implement private SVM inference using homomorphic encryption. Our main goal is to provide readers with the knowledge and tools necessary to implement private SVM inference while preserving data confidentiality via FHE. This chapter caters to readers with varying levels of cryptographic expertise, assuming only a basic understanding of SVMs.

The remainder of this chapter is structured as follows. Section 2 reviews the state of the art on private machine and deep learning under FHE. Section 3 reviews the core methods employed in the presented solution. Section 4 details the design and implementation of the proposed solution. Section 5 presents the experimental results. Section 6 concludes the work and explores potential avenues for future research.


2. Literature review

Gentry’s seminal work [20] realized the first FHE scheme, enabling arbitrary computations on encrypted data without compromising privacy. This new cryptographic capability has had far-reaching implications, particularly in cloud-based machine and deep learning applications. Numerous studies have investigated the potential of FHE-based private machine and deep learning.

The exploration of machine learning models under FHE began with linear classifiers [23], followed by a series of works adapting other common models, including Convolutional Neural Networks (CNNs) [12], SVMs [14, 15], logistic regression [17, 24, 25], Transformers [26, 27], and many others. These advancements have paved the way for secure and private machine learning applications, ensuring the protection of sensitive data while maintaining computational capabilities.

Notably, the majority of research in this domain has concentrated on the inference phase of the machine or deep learning process, largely due to its relative simplicity and lower computational intensity compared to the training phase. However, a growing body of recent studies has begun to explore the feasibility of machine and deep learning training under FHE, yielding promising preliminary results. These investigations encompass logistic regression training [28], SVM training [29], CNN training [30], shallow neural network training [16], and transfer learning [31, 32], paving the way for more comprehensive advancements in secure machine learning.

In this chapter, we focus on the evaluation of the inference phase of SVM models under FHE, presenting the reader with a comprehensive overview of cutting-edge tools and expert optimization techniques to facilitate this process. Our primary objective is to provide practitioners with the knowledge and expertise necessary to develop and implement their own secure SVM inference models, enabling them to protect their sensitive data while maintaining its utility.


3. Preliminaries

In this section, we provide an overview of the basic concepts and theoretical foundations that underpin our subsequent discussion, establishing a solid basis for the developments presented in this chapter.

3.1 Notations

We adopt the following notation conventions throughout this chapter: scalar quantities are denoted by lowercase letters, e.g., $a$; vector quantities are denoted by lowercase boldface letters, e.g., $\mathbf{a}$; and matrix quantities are denoted by uppercase boldface letters, e.g., $\mathbf{A}$. The dot-product between two vectors $\mathbf{u}$ and $\mathbf{v}$ is denoted by $\mathbf{u} \cdot \mathbf{v}$. The point-wise addition and multiplication between two vectors are denoted by $\oplus$ and $\odot$, respectively. We will be working with various sets of numbers such as the set of integers denoted as $\mathbb{Z}$, the set of reals $\mathbb{R}$, and the set of complex numbers $\mathbb{C}$. Encrypted quantities are distinguished by an over-bar symbol, e.g., $\bar{a}$ to denote an encryption of the scalar quantity $a$.

3.2 CKKS FHE scheme

The Cheon-Kim-Kim-Song (CKKS) scheme [33] is an FHE scheme well suited to handling encrypted real or complex quantities. Due to the wide use of real-number computations in machine learning algorithms, including SVMs, CKKS stands out as one of the most efficient FHE schemes for implementing machine learning algorithms.

Similar to other FHE schemes, CKKS operates under the honest-but-curious threat model, which assumes that all participants in a CKKS computation will follow the underlying protocol but may attempt to monitor and analyze the system to extract sensitive information. As long as participants adhere to the protocol, CKKS guarantees the confidentiality of secret keys and encrypted messages, provided that the participants’ computational power does not exceed a predefined security level. For instance, CKKS may offer a 128-bit security level, ensuring that any attempt to break the system, recover the secret key, or decrypt encrypted messages would require computational resources exceeding this threshold. The security level of CKKS is primarily determined by the selection of cryptographic parameters, which must be chosen carefully to ensure a secure instantiation of CKKS that meets a specified security requirement. The careful choice of parameters is crucial to maintain the confidentiality and integrity of encrypted data and prevent potential security breaches.

It is crucial to understand the concept of noise, also referred to as error, which is inherent in CKKS and numerous other FHE schemes. This noise is a fundamental component of the Learning With Errors (LWE) problem [34], the primary mathematical framework underlying CKKS and many other FHE schemes. The security of CKKS is predicated on the presence of this noise, which is introduced during the encryption process. As homomorphic computations are performed on encrypted data, the noise accumulates and must be managed through specific CKKS primitives. Fortunately, many existing CKKS libraries automate this noise maintenance process and abstract it from the user.

3.2.1 CKKS instantiation

The CKKS scheme is instantiated by specifying various user-defined parameters. Firstly, the multiplicative depth, denoted as d, must be selected, which determines the maximum number of sequential multiplications in the computation graph. It is essential to note that increasing the multiplicative depth can result in higher computational overhead, as CKKS internally manipulates larger mathematical objects. Consequently, minimizing the multiplicative depth is crucial to reduce the CKKS parameters, decrease computational overhead, and enhance the overall performance of the CKKS application.

Secondly, the user should specify the precision of the underlying CKKS computation by defining the scaling factor parameter, denoted as σ (measured in bits). This parameter controls the precision of the computation and is typically selected within the range of 45–60 bits in most existing CKKS implementations. By adjusting σ, users can trade-off between computational efficiency and precision, with higher values resulting in more accurate computations at the cost of increased computational overhead.
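The role of the scaling factor can be illustrated with ordinary fixed-point arithmetic. The sketch below is only a scalar analogy, not CKKS itself (real CKKS encodes whole vectors into polynomials); the function names are hypothetical and σ = 59 bits is assumed, matching the value used later in this chapter. It shows that a real number is stored as an integer scaled by 2^σ, that a product carries scale 2^(2σ), and that a rescaling step brings it back to 2^σ. In CKKS, each such rescaling consumes one level of the multiplicative depth.

```python
SCALE_BITS = 59            # sigma; assumed here to match the chapter's CKKS setup
SCALE = 1 << SCALE_BITS    # scaling factor 2**sigma


def encode_fixed(x: float) -> int:
    """Scale a real number by 2**sigma and round to the nearest integer."""
    return round(x * SCALE)


def decode_fixed(v: int) -> float:
    """Undo the scaling to recover an approximation of the real number."""
    return v / SCALE


def rescale(v: int) -> int:
    """Divide by 2**sigma (round half up) so a product returns to scale 2**sigma."""
    return (v + (SCALE >> 1)) >> SCALE_BITS


a = encode_fixed(1.5)
b = encode_fixed(2.25)
product = a * b               # the scale is now 2**(2 * sigma)
restored = rescale(product)   # back to scale 2**sigma
print(decode_fixed(restored))  # 3.375
```

A larger σ leaves more bits for the fractional part, which is why precision improves, while the integers being manipulated grow accordingly.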

Thirdly, the user can specify the CKKS security parameter, denoted as λ (measured in bits), which determines the security level of the CKKS scheme. Typical values for λ are 128, 192, and 256 bits, which correspond to common security levels. Higher values of λ provide increased security but require larger parameters, which raises the computational overhead and reduces the overall performance of the CKKS application.

Upon defining the aforementioned parameters, the user can instantiate the CKKS scheme, which entails generating the encryption keys, comprising the public key $pk$ and secret key $sk$, as well as the public evaluation keys, including the multiplication key $evk$, the set of rotation keys $\{rk_i\}_{i=0}^{r-1}$, and the bootstrapping key $bk$. The encryption keys are utilized for data encryption and decryption operations, whereas the evaluation keys are shared with the evaluator, who will be executing the CKKS computation. The multiplication key enables homomorphic multiplication, allowing the evaluator to multiply two encrypted quantities, while the rotation keys facilitate homomorphic rotation, enabling the rotation of an encrypted vector quantity by any integral rotation index $i \in \{0, \ldots, r-1\}$. The bootstrapping key, on the other hand, is required only for deep computational workloads that necessitate refreshing the noise in the encrypted quantities so that the computation can continue and still decrypt correctly.

3.2.2 Data encoding and encryption

CKKS provides two fundamental primitives for data encoding, defined as follows:

  • ENCODE: takes as input a real or complex vector $\mathbf{a}$ in either $\mathbb{R}^n$ or $\mathbb{C}^n$ and returns a plaintext object $a$ that encodes the input vector.

  • DECODE: takes as input a plaintext object $a$ and decodes it into the corresponding vector quantity $\mathbf{a}$ in either $\mathbb{R}^n$ or $\mathbb{C}^n$, enabling the retrieval of the original encoded data.

For data encryption and decryption, CKKS provides two main primitives as follows:

  • ENCRYPT: takes as input either $sk$ or $pk$ and a plaintext object $a$, and returns an encryption of $a$, i.e., the ciphertext $\bar{a}$.

  • DECRYPT: takes as input $sk$ and a ciphertext $\bar{a}$ and returns the corresponding plaintext object $a$.

After encoding and encrypting their sensitive data, the user can transmit it to an untrusted evaluator for homomorphic computation. Upon completing the computation, the evaluator returns the encrypted results to the user, who can then decrypt and decode them to obtain the desired insights.

3.2.3 Homomorphic data processing

The CKKS homomorphic encryption scheme offers the evaluator the capability to perform specific operations on encrypted data without requiring decryption. This section details the three primary homomorphic operations supported by CKKS:

  • Homomorphic Addition, EVALADD($\bar{c}_1$, $\bar{c}_2$): performs homomorphic point-wise addition of the underlying encrypted vectors, yielding the ciphertext $\bar{c}_{add} = \text{Encryption}(\text{Encoding}(\mathbf{v}_1 \oplus \mathbf{v}_2))$, where $\oplus$ denotes point-wise addition. Furthermore, CKKS inherently supports homomorphic subtraction using the same principle.

  • Homomorphic Multiplication, EVALMUL($\bar{c}_1$, $\bar{c}_2$, $evk$): performs homomorphic point-wise multiplication of the underlying encrypted vectors, yielding the ciphertext $\bar{c}_{mul} = \text{Encryption}(\text{Encoding}(\mathbf{v}_1 \odot \mathbf{v}_2))$, where $\odot$ denotes point-wise multiplication. The multiplication key $evk$ is used for noise maintenance.

  • Homomorphic Rotation, EVALROTATE($\bar{c}$, $i$, $\rho$, $rk_i$): performs cyclic rotation of the encrypted message vector by an amount $i \in \mathbb{Z}^+$ in direction $\rho \in \{0 = \text{left}, 1 = \text{right}\}$, generating a ciphertext that encrypts the rotated vector. This operation disrupts the ciphertext structure, and the rotation key $rk_i$ is used to restore the original structure.

It is important to note that both EVALADD and EVALMUL support a mixed-mode operation, where one input argument can be in plaintext while the output remains encrypted. Specifically, when using EVALMUL with one plaintext input argument, the computation is significantly faster compared to when both input arguments are encrypted.
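To make the slot-wise semantics of these operations concrete, the following sketch mimics them on ordinary Python lists. It is a plaintext mock, not an encrypted implementation: no keys or noise are involved, and the function names are hypothetical stand-ins for the CKKS primitives described above. It also shows a common pattern built from them, a dot product computed by one point-wise multiplication followed by log2(n) rotate-and-add steps, assuming the vector length n is a power of two.

```python
from typing import List

Vector = List[float]


def eval_add(c1: Vector, c2: Vector) -> Vector:
    """Point-wise addition of two (mock) ciphertext vectors."""
    return [a + b for a, b in zip(c1, c2)]


def eval_mul(c1: Vector, c2: Vector) -> Vector:
    """Point-wise multiplication of two (mock) ciphertext vectors."""
    return [a * b for a, b in zip(c1, c2)]


def eval_rotate(c: Vector, i: int, direction: int = 0) -> Vector:
    """Cyclic rotation by i slots; direction 0 = left, 1 = right."""
    n = len(c)
    i %= n
    if direction == 1:
        i = (n - i) % n
    return c[i:] + c[:i]


def dot_product(c1: Vector, c2: Vector) -> Vector:
    """Multiply slot-wise, then fold the slots together with rotations.

    Assumes len(c1) == len(c2) is a power of two. Every output slot
    ends up holding the full dot product.
    """
    acc = eval_mul(c1, c2)
    step = len(acc) // 2
    while step >= 1:
        acc = eval_add(acc, eval_rotate(acc, step))
        step //= 2
    return acc


print(dot_product([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]))
# [70.0, 70.0, 70.0, 70.0]
```

The rotate-and-add fold needs only log2(n) rotations and a single multiplication, which keeps the multiplicative depth of a dot product at one.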

3.3 SVM

SVMs are powerful supervised machine learning algorithms widely used for classification and regression tasks. While both classification and regression involve learning from a labeled dataset, this section will primarily focus on their application in classification problems. We begin by providing a brief overview of the training process employed to generate the SVM model parameters, followed by a detailed description of the SVM inference process. The latter is the computation that is actually evaluated under CKKS, and it is what enables secure SVM-based predictions on encrypted data.

3.3.1 SVM training

As mentioned previously, the presented system requires the server to have a pre-trained model. This model can be realized through various approaches, including the utilization of public domain data, which is freely available and accessible and therefore raises no data privacy concerns. Another common approach is the use of synthetic data, which is artificially generated to mimic real-world data, providing a viable substitute for the sensitive dataset. Furthermore, non-private segments of the dataset can be utilized, comprising data from participants who have explicitly consented to open-source their information, a common practice observed in genome studies where individuals willingly share their genetic data for research purposes (e.g., [35]). The resulting pre-trained model is used as an input to the privacy-preserving SVM inference protocol. The subsequent paragraphs formally describe the training process, which involves utilizing a subset of the evaluation dataset, designated as the training dataset, to train and fine-tune the SVM model’s performance and accuracy.

SVM training can be described as follows. We are given an input dataset comprising $m$ data points, each characterized by $n$ features, which can be represented as a matrix $\mathbf{X} \in \mathbb{R}^{m \times n}$. In addition, a label vector $\mathbf{y} \in \{+1, -1\}^m$ is associated with the dataset, where each entry $y_i$ indicates the class to which data point $\mathbf{X}_i$ belongs. Given a chosen kernel function $K$, an SVM classifier can be trained to identify the optimal separating hyperplane between the two classes.

Several widely utilized kernel functions exist for support vector machines, each serving distinct purposes. The linear kernel, which applies no transformation to the inputs (the identity mapping), is primarily employed in linearly separable problems due to its simplicity. Conversely, for non-linearly separable problems, more complex kernel functions are employed. In such cases, the polynomial kernel, the Radial Basis Function (RBF) kernel, also known as the Gaussian kernel, and the Sigmoid kernel, a smooth and continuously differentiable function, are commonly utilized to facilitate non-linear transformations and enhance the SVM model’s descriptive capacity.

The training process of an SVM model involves determining the optimal hyperplane that best separates data points into different classes. This is achieved by optimizing the model parameters to maximize the margin between the classes while minimizing the misclassification error. To achieve this, SVM employs quadratic programming and Lagrange multipliers. This mathematical formulation transforms the problem of finding the optimal hyperplane into a constrained optimization problem. The solution to this problem yields the model parameters, including the weights and bias of the hyperplane.

The training algorithm learns the following model parameters:

  • the set of support vectors $\{SV_i\}_{i=0}^{l-1}$, which is a subset of the rows of $\mathbf{X}$.

  • the coefficients vector $\boldsymbol{\alpha} \in \mathbb{R}^l$, which can be interpreted as weight factors assigned to the support vectors, influencing their contribution to the decision boundary.

  • the intercept (or bias) parameter $b \in \mathbb{R}$.

We conclude our discussion on SVM training at this point, as our system assumes the availability of a trained SVM model’s parameters as input. We omit the mathematical description of SVM training for brevity and refer interested readers to dedicated SVM resources for a comprehensive treatment of the training process [36].

3.3.2 SVM inference

The SVM inference function is of paramount importance in our system. Specifically, the evaluator is tasked with evaluating a given SVM model on encrypted data points provided by the user. Consequently, the server performs an encrypted evaluation of the SVM inference function, which necessitates a detailed description of this function. The SVM inference function is responsible for computing the prediction output of the SVM model on the encrypted data points, and its secure evaluation is crucial for maintaining the confidentiality of the user’s data.

During SVM inference, our primary objective is to evaluate the decision function, as presented in Eq. (1). The kernel function K poses a challenge when it is chosen as RBF or Sigmoid. In contrast, a polynomial kernel function is inherently compatible with the CKKS computational model and does not present any difficulties. To address the challenges associated with RBF and Sigmoid kernel functions, we can employ polynomial approximations over a predefined range, which have demonstrated excellent performance in various studies [12, 13, 37, 38, 39].

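$$c(\mathbf{x}) = \operatorname{sign}\left(\sum_{i=0}^{l-1} \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b\right), \quad \text{where } \mathbf{x}_i \text{ is the } i\text{th support vector.} \tag{1}$$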

The decision function presents an additional challenge in computing the sign function, which is a non-arithmetic operation. To address this, we can draw on a similar strategy employed for handling the kernel function, approximating the sign function as a polynomial. Alternatively, the sign function can be computed on the client side upon returning the SVM insight. In our approach, we opt for the former method, approximating the sign function as a polynomial to facilitate its evaluation within the decision function. Notably, for SVM regression tasks, the sign function is not required, thereby eliminating this challenge altogether.
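For reference, the decision function in Eq. (1) can be written out in the clear as follows. This is a minimal plaintext sketch with an exact RBF kernel and an exact sign; the encrypted version described in this chapter replaces both with polynomial approximations. The function names and the toy model values are hypothetical and serve only as an illustration.

```python
import math
from typing import Sequence


def rbf_kernel(u: Sequence[float], v: Sequence[float], gamma: float) -> float:
    """K(u, v) = exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def decision_score(x, support_vectors, alphas, labels, b, gamma) -> float:
    """Argument of sign(.) in Eq. (1): sum_i alpha_i * y_i * K(x_i, x) + b."""
    total = sum(
        alpha * y * rbf_kernel(sv, x, gamma)
        for sv, alpha, y in zip(support_vectors, alphas, labels)
    )
    return total + b


def classify(x, support_vectors, alphas, labels, b, gamma) -> int:
    """Apply the sign function; a score of exactly 0 is mapped to +1 here."""
    score = decision_score(x, support_vectors, alphas, labels, b, gamma)
    return 1 if score >= 0 else -1


# Toy model (hypothetical values): one support vector per class.
support_vectors = [[0.0, 0.0], [1.0, 1.0]]
alphas = [1.0, 1.0]
labels = [-1, +1]
b, gamma = 0.0, 0.5

print(classify([0.9, 0.9], support_vectors, alphas, labels, b, gamma))  # 1
print(classify([0.1, 0.1], support_vectors, alphas, labels, b, gamma))  # -1
```

Apart from the exponential inside the kernel and the final sign, every step is an addition or a multiplication, which is why only those two functions need polynomial substitutes under CKKS.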


4. System implementation

This section outlines the implementation details of our system. We begin by introducing the dataset employed for evaluation purposes, followed by an overview of the tools utilized in our implementation.

4.1 Evaluation dataset

We evaluated our system using Dal’s Credit Card Dataset [40], a comprehensive collection of credit card transactions and user information. This dataset contains transactions made by European cardholders in September 2013, spanning two days, with a total of 284,807 transactions, of which 492 are fraudulent.

4.1.1 Data preprocessing

Due to privacy concerns, the dataset is not publicly available in its raw format. Instead, a preprocessed version is provided, which has undergone Principal Component Analysis (PCA) to extract the most informative features. The PCA transformation retains only the features that show a high correlation with the response variable, a class label indicating whether each transaction is fraudulent (Class 1) or legitimate (Class 0). This preprocessing step helps to reduce dimensionality and improve model performance.

It is in these particular situations that privacy-preserving machine learning analysis proves to be paramount. Traditional data preprocessing techniques can compromise data quality, potentially limiting the accuracy of the analysis. Furthermore, these techniques often lack cryptographic security, making it possible for unauthorized parties to invert the preprocessing and recover the original dataset [41]. To address this concern, it is essential to adopt cryptographically secure systems that can process encrypted raw data without sacrificing privacy.

4.1.2 Dataset characteristics

The dataset comprises only numerical data, making it suitable for machine learning-based fraud detection. The dataset is extremely unbalanced, with a fraud rate of approximately 0.17%. This imbalance is a common challenge in fraud detection, as fraudulent transactions are typically rare compared to legitimate ones. The dataset characteristics are summarized in Table 1. It comprises 28 predictor features, denoted as $V_1, \ldots, V_{28}$. These features represent the transformed variables resulting from the PCA preprocessing technique, with the exception of two original features:

  1. Time: The timestamp of each transaction, which was not subjected to PCA transformation and is retained in its original form.

  2. Amount: The transaction amount, which was also exempt from PCA preprocessing and is present in the dataset in its raw form.

| Attribute | Value |
| --- | --- |
| Number of Features | 28 |
| Type of Classification | Binary (1: fraud, 0: legitimate) |
| Number of Samples | 284,807 |
| Number of Fraudulent Samples | 492 |
| Number of Legitimate Samples | 284,315 |
| Feature Type | Numerical |
| PCA Transformed Features | 26 |
| Original Features | 2 (Time, Amount) |

Table 1.

Dal’s credit card dataset characteristics [40].

The inclusion of these two original features, alongside the 26 transformed features, provides a comprehensive set of predictors for modeling and analysis.

4.2 Implementation

We used three key open-source tools to develop our privacy-preserving credit card fraud detection system. We utilized the SVM package of Python’s scikit-learn library [36], which provides an efficient implementation of SVMs for both training and inference. We also employed the FHSVM framework [15], an FHE-based privacy-preserving SVM inference engine, enabling secure and private predictions on encrypted data. Lastly, we used the OpenFHE framework [42], which includes an implementation of CKKS that is required by FHSVM. Below, we describe these tools in more detail and explain how they are used to build a privacy-preserving credit card fraud detection system.

4.2.1 Scikit-learn SVM

We employed the SVM package of the Python scikit-learn library to train an SVM model on the credit card dataset. The comprehensive training methodology, including data preprocessing, feature selection, model configuration, and extraction, is shown in Figure 2. Below, we detail each step in the training procedure.

Figure 2.

SVM training workflow.

4.2.1.1 Dataset processing

To prepare the dataset for model training and evaluation, we performed the following processing steps. The dataset was randomly partitioned into training and testing sets, adhering to a ratio of 80% for training and 20% for testing. This split ensures that the model is trained on a substantial portion of the data while reserving a sufficient amount for evaluation purposes. The random partitioning helps maintain the integrity of the dataset’s distribution, reducing the likelihood of bias in the training process. We also excluded samples with missing values to enhance the reliability and accuracy of our analysis.

4.2.1.2 Model training

Given the dataset’s inherent class imbalance, we adopted an undersampling approach to create a balanced training dataset. This involved maintaining the entirety of the fraudulent samples, while randomly sampling an equal number of legitimate transactions from the dataset. By doing so, we ensured that the model was trained on a dataset with an equal number of instances from both classes. It should be noted that undersampling was only applied to the training dataset. During model evaluation, the entire testing dataset is used without undersampling.
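The undersampling step described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions rather than the exact procedure used for the experiments: the helper name `undersample`, the fixed seed, and the toy data are hypothetical, and label 1 is assumed to mark the minority (fraud) class.

```python
import random
from typing import List, Sequence, Tuple


def undersample(
    samples: Sequence, labels: Sequence[int], minority_label: int = 1, seed: int = 0
) -> Tuple[List, List[int]]:
    """Keep every minority sample and an equal-sized random draw of the rest."""
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    kept = minority + rng.sample(majority, len(minority))
    rng.shuffle(kept)  # avoid having all minority samples first
    return [samples[i] for i in kept], [labels[i] for i in kept]


# Toy training split: 95 legitimate (0) and 5 fraudulent (1) transactions.
toy_labels = [0] * 95 + [1] * 5
toy_samples = list(range(100))

balanced_samples, balanced_labels = undersample(toy_samples, toy_labels)
print(balanced_labels.count(0), balanced_labels.count(1))  # 5 5
```

Consistent with the text, such a helper would be applied to the training split only, leaving the testing split at its natural class ratio.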

To address the critical nature of accurately detecting fraudulent transactions, we implemented a cost-sensitive approach by assigning higher penalties to false negatives than false positives. Specifically, we penalized missing a fraud more heavily by setting the SVM’s class_weight parameter to {0: 0.75, 1: 0.25}. This adjustment ensured that the model prioritized the accurate detection of fraudulent transactions.

Furthermore, we selected the Radial Basis Function (RBF) kernel for the SVM model due to its superior performance in capturing non-linear relationships between the predictor variables and the response variable. Given the complexity of the relationships between predictor and response variables in our dataset, the RBF kernel’s ability to model non-linear patterns made it the preferred choice for our SVM. Preliminary experiments with other kernels reinforced this decision. Detailed SVM model parameters are presented in Table 2.

| Parameter | Value |
| --- | --- |
| Kernel | RBF |
| Penalty C | 1.0 |
| Gamma | scale |
| Probability | True |
| Number of Support Vectors | 114 |
| Class weight | 0: 0.75, 1: 0.25 |
| Decision function shape | OVR |

Table 2.

SVM model summary.
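The settings in Table 2 map directly onto keyword arguments of scikit-learn's `SVC` class. The sketch below collects them in a dictionary, with values copied as reported in the table, and builds the classifier through a hypothetical helper, `build_classifier`, that imports scikit-learn lazily so the configuration itself can be inspected without the library installed. In scikit-learn, `class_weight` multiplies the penalty C for each class, and `gamma="scale"` resolves to 1 / (n_features × variance of the training data) at fit time.

```python
# Hyperparameters as reported in Table 2 (the number of support vectors
# is an outcome of training, not a setting, so it is not listed here).
SVC_PARAMS = {
    "kernel": "rbf",
    "C": 1.0,
    "gamma": "scale",
    "probability": True,
    "class_weight": {0: 0.75, 1: 0.25},
    "decision_function_shape": "ovr",
}


def build_classifier():
    """Return an untrained sklearn.svm.SVC configured with SVC_PARAMS."""
    from sklearn.svm import SVC  # imported lazily; requires scikit-learn

    return SVC(**SVC_PARAMS)


# Typical usage, assuming X_train and y_train hold the balanced training split:
#   clf = build_classifier()
#   clf.fit(X_train, y_train)
```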

4.2.1.3 Model evaluation

Following the training phase, we conducted a comprehensive evaluation of our model’s performance using a range of standard metrics. These metrics provide insights into the model’s ability to accurately detect fraudulent transactions while minimizing false positives. The evaluation metrics employed include:

  1. Precision: Measures the proportion of true positives (correctly identified fraudulent transactions) among all positive predictions made by the model.

  2. Recall: Quantifies the proportion of true positives among all actual fraudulent transactions in the dataset.

  3. Classification Accuracy: Calculates the overall proportion of correctly classified transactions (both fraudulent and legitimate).

  4. F1-score: Harmonizes precision and recall into a single metric, providing a balanced measure of the model’s performance.

  5. Area Under the Receiver Operating Characteristic Curve (ROC-AUC): Assesses the model’s ability to distinguish between fraudulent and legitimate transactions.
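The count-based metrics in this list follow directly from the four entries of a binary confusion matrix. The sketch below computes them in plain Python. The function name and the example counts are hypothetical, it assumes no denominator is zero, and it also includes balanced accuracy, which is reported alongside these metrics in Table 4. ROC-AUC is left out because it requires ranking scores rather than counts.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Derive standard metrics from confusion-matrix counts.

    tp/fn count fraudulent transactions (positive class) that were
    caught/missed; tn/fp count legitimate ones that were passed/flagged.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)              # true positive rate
    specificity = tn / (tn + fp)         # true negative rate
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "f1": 2 * precision * recall / (precision + recall),
        "balanced_accuracy": (recall + specificity) / 2,
    }


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=90, fp=10, tn=880, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

On a heavily imbalanced test set, accuracy is dominated by the majority class, which is why recall and balanced accuracy are worth reading next to it.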

4.2.1.4 Exporting the model parameters

We have successfully developed a high-performing SVM model, capable of accurately classifying transactions as fraudulent or legitimate. To facilitate the secure evaluation of this model on encrypted data, we extract its key parameters, including support vectors, gamma value, dual coefficients, and intercept, and store them in plain text files. These extracted parameters will be utilized by the FHSVM module to enable the homomorphic evaluation of our SVM model on encrypted input samples.
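As an illustration of this export step, the sketch below writes the four parameter groups to plain text files and reads them back. The file names and the one-row-per-line layout are assumptions made for the example; the format actually expected by FHSVM may differ. With a fitted scikit-learn `SVC`, the inputs would come from its `support_vectors_`, `dual_coef_`, and `intercept_` attributes, where `dual_coef_` already holds the products of the coefficients and labels from Eq. (1). Toy values are used here so the block runs without a trained model.

```python
import os
import tempfile
from typing import List, Sequence


def write_matrix(path: str, rows: Sequence[Sequence[float]]) -> None:
    """Write one row per line, values separated by spaces."""
    with open(path, "w") as f:
        for row in rows:
            f.write(" ".join(repr(float(v)) for v in row) + "\n")


def read_matrix(path: str) -> List[List[float]]:
    """Inverse of write_matrix; blank lines are skipped."""
    with open(path) as f:
        return [[float(tok) for tok in line.split()] for line in f if line.strip()]


def export_model(out_dir, support_vectors, dual_coef, intercept, gamma) -> None:
    """Store each parameter group in its own text file (hypothetical names)."""
    write_matrix(os.path.join(out_dir, "support_vectors.txt"), support_vectors)
    write_matrix(os.path.join(out_dir, "dual_coef.txt"), [dual_coef])
    write_matrix(os.path.join(out_dir, "intercept.txt"), [[intercept]])
    write_matrix(os.path.join(out_dir, "gamma.txt"), [[gamma]])


# Toy parameters standing in for a trained model.
toy_svs = [[0.25, -1.5], [2.0, 0.125]]
toy_dual = [-0.7, 0.7]

with tempfile.TemporaryDirectory() as tmp:
    export_model(tmp, toy_svs, toy_dual, intercept=-0.05, gamma=0.1)
    loaded_svs = read_matrix(os.path.join(tmp, "support_vectors.txt"))
    loaded_dual = read_matrix(os.path.join(tmp, "dual_coef.txt"))[0]
    loaded_gamma = read_matrix(os.path.join(tmp, "gamma.txt"))[0][0]

print(loaded_svs == toy_svs and loaded_dual == toy_dual)  # True
```

Writing values with `repr` keeps the full double precision, so the parameters read back are bit-for-bit identical to those exported.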

4.2.2 FHSVM framework

The FHSVM framework enables the secure evaluation of an SVM model on encrypted input samples, utilizing the CKKS FHE scheme to compute the decision function in Eq. (1). This framework facilitates the classification of transactions without compromising data confidentiality.

As mentioned previously, the decision function involves basic arithmetic operations on real numbers, which are inherently supported by the CKKS scheme. However, the kernel and sign functions require special handling, as they are not supported by CKKS. To address this, the FHSVM framework substitutes these functions with polynomial approximations, ensuring accurate computations.

To implement the FHSVM framework, two critical parameters must be defined:

  1. The range over which the Radial Basis Function (RBF) and sign functions are evaluated, ensuring accurate polynomial approximations.

  2. The degree of the polynomial approximations required to accurately represent the RBF and sign functions.

It is important to note that increasing the degree of the polynomials generally leads to more accurate approximations of the RBF and sign functions, while widening the evaluation range covers more inputs but calls for a higher degree to maintain the same accuracy. However, these enhancements come at a cost, as they require larger CKKS parameters, which can significantly degrade performance. To achieve a balance between accuracy and efficiency, these parameters must be tuned to the specific requirements of the problem at hand. In our case, we found experimentally that evaluating the RBF and sign functions over the range [−16, 1] with polynomials of degree 13 provides a good trade-off between accuracy and performance for our dataset.
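The effect of the degree and range can be explored in the clear before committing to CKKS parameters. The sketch below builds a Chebyshev interpolant of the exponential function, on the assumption that the RBF kernel is evaluated as exp(t) for an exponent t in [−16, 1], and measures its worst-case error on a grid. Chebyshev interpolation is one standard way to obtain such polynomials and is used here only as an illustration; the function names are hypothetical and this is not necessarily the approximation method implemented in FHSVM.

```python
import math
from typing import Callable, List


def chebyshev_coeffs(f: Callable[[float], float], lo: float, hi: float, degree: int) -> List[float]:
    """Coefficients c_0..c_degree of the Chebyshev interpolant of f on [lo, hi]."""
    n = degree + 1
    angles = [math.pi * (k + 0.5) / n for k in range(n)]
    # Sample f at the Chebyshev nodes mapped from [-1, 1] to [lo, hi].
    values = [f(0.5 * (hi - lo) * math.cos(a) + 0.5 * (hi + lo)) for a in angles]
    coeffs = [
        2.0 / n * sum(v * math.cos(j * a) for v, a in zip(values, angles))
        for j in range(n)
    ]
    coeffs[0] *= 0.5  # the T_0 term carries half weight
    return coeffs


def chebyshev_eval(coeffs: List[float], lo: float, hi: float, x: float) -> float:
    """Evaluate sum_j c_j T_j(t) with Clenshaw's recurrence, t mapped to [-1, 1]."""
    t = (2.0 * x - lo - hi) / (hi - lo)
    b1 = b2 = 0.0
    for c in reversed(coeffs[1:]):
        b1, b2 = 2.0 * t * b1 - b2 + c, b1
    return t * b1 - b2 + coeffs[0]


def max_error(degree: int, lo: float = -16.0, hi: float = 1.0, points: int = 2001) -> float:
    """Largest absolute error against math.exp on an evenly spaced grid."""
    coeffs = chebyshev_coeffs(math.exp, lo, hi, degree)
    grid = [lo + (hi - lo) * k / (points - 1) for k in range(points)]
    return max(abs(chebyshev_eval(coeffs, lo, hi, x) - math.exp(x)) for x in grid)


for degree in (7, 13):
    print(f"degree {degree}: max abs error = {max_error(degree):.3e}")
```

Running the loop with other degrees or a wider interval gives a quick feel for how the approximation error responds, before any homomorphic computation is involved.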

4.2.3 OpenFHE

OpenFHE is a comprehensive, open-source FHE library that provides a robust implementation of various FHE schemes, including the CKKS scheme. As a versatile and efficient backend, OpenFHE enables the secure evaluation of complex mathematical operations on encrypted data. In the context of our system, OpenFHE plays a crucial role as the underlying FHE library for the FHSVM framework, facilitating the secure evaluation of CKKS operations.

Based on our system requirements, we selected the following CKKS parameters to balance performance and security. Table 3 shows the cryptographic parameters used in instantiating CKKS in OpenFHE. The ring dimension used for encryption is set to $2^{16}$. The multiplicative depth is set to 13, indicating the maximum number of sequential multiplications allowed during homomorphic evaluation. We have chosen a security level of 128-bit, providing robust protection against known attacks. The scale precision is set to 59-bit, ensuring accurate fixed-point arithmetic operations. As bootstrapping is not needed for our system, it is disabled to optimize performance, and the secret key is sampled from a ternary distribution. These carefully chosen parameters enable our system to achieve a balance between security, accuracy, and efficiency.

| Parameter | Value |
| --- | --- |
| Ring Dimension | 2^16 |
| Multiplicative Depth d | 13 |
| Desired Compute Precision σ | 59-bit |
| Security Level λ | 128-bit |
| Bootstrapping | Disabled |
| Secret Key Distribution | Ternary |

Table 3.

CKKS parameters used in our privacy-preserving credit card fraud detection system.
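As a rough illustration, the parameters in Table 3 could be passed to OpenFHE through its Python bindings along the following lines. This is a sketch under assumptions, not the chapter's actual setup code: it assumes the `openfhe` Python package is installed, that the 59-bit precision maps to the scaling modulus size, and that the listed scheme features are the ones required. The dictionary keys and the function name `build_context` are hypothetical.

```python
# Values copied from Table 3; the dictionary keys are illustrative names.
CKKS_PARAMS = {
    "ring_dim": 1 << 16,       # ring dimension 2^16
    "mult_depth": 13,          # multiplicative depth
    "scaling_mod_size": 59,    # assumed to correspond to the 59-bit precision
}

def build_context(params=CKKS_PARAMS):
    """Create a CKKS crypto context; requires the openfhe Python package."""
    # Imported lazily so the parameters can be inspected without the library.
    import openfhe as fhe

    cc_params = fhe.CCParamsCKKSRNS()
    cc_params.SetRingDim(params["ring_dim"])
    cc_params.SetMultiplicativeDepth(params["mult_depth"])
    cc_params.SetScalingModSize(params["scaling_mod_size"])
    cc_params.SetSecurityLevel(fhe.SecurityLevel.HEStd_128_classic)
    cc_params.SetSecretKeyDist(fhe.SecretKeyDist.UNIFORM_TERNARY)

    cc = fhe.GenCryptoContext(cc_params)
    # Bootstrapping stays disabled because the FHE feature is never enabled.
    for feature in (
        fhe.PKESchemeFeature.PKE,
        fhe.PKESchemeFeature.KEYSWITCH,
        fhe.PKESchemeFeature.LEVELEDSHE,
        fhe.PKESchemeFeature.ADVANCEDSHE,
    ):
        cc.Enable(feature)
    return cc
```

Calling `build_context()` needs the library to be available and may take noticeable time and memory at this ring dimension.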


5. Results

In this section, we present the results of our experiments evaluating the performance and security of our privacy-preserving SVM inference system using the CKKS FHE scheme. We first report the results of training our SVM model in the clear. We then present the results of our homomorphic inference experiments, including the accuracy and performance overhead of encrypting and decrypting our data. Finally, we evaluate the security of our system, verifying that our encryption parameters achieve the desired level of security and that our homomorphic inference process preserves the privacy of our input data.

5.1 Training results

To establish a baseline for our privacy-preserving SVM inference, we first trained our SVM model in the clear using Python. This allowed us to evaluate the model’s performance on our dataset without any encryption overhead. We report the results of this training process, including the model’s accuracy, precision, recall, F1 score, and other relevant metrics.

Table 4 characterizes the classification quality of our SVM model when evaluated on the testing dataset. The model achieves a precision of 0.9988, recall of 0.9971, and accuracy of 0.9971, demonstrating its ability to accurately detect fraudulent transactions. Moreover, the F1-score of 0.9977 and AUC of 0.9834 further validate the model’s robustness. Notably, the balanced accuracy of 0.9283 highlights the model’s effectiveness in handling class imbalance. Overall, these metrics indicate that the SVM model succeeds in distinguishing between legitimate and fraudulent transactions.

| Metric | Value |
| --- | --- |
| Precision | 0.9988 |
| Recall | 0.9971 |
| Accuracy | 0.9971 |
| Balanced Accuracy | 0.9283 |
| F1-Score | 0.9977 |
| Area Under the Curve (AUC) | 0.9834 |

Table 4.

Characterizing the SVM model performance.
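The relationship between these metrics is easier to see from a confusion matrix. The sketch below uses made-up counts for an imbalanced two-class problem, not the chapter's data, and the function name `summarize` is hypothetical. It shows that class-weighted recall always equals accuracy, which is consistent with the identical recall and accuracy values in Table 4, although the chapter does not state which averaging mode was used. It also shows why balanced accuracy, the unweighted mean of per-class recall, can be noticeably lower on imbalanced data.

```python
import numpy as np

def summarize(cm):
    """cm[i, j] = number of samples with true class i predicted as class j."""
    cm = np.asarray(cm, dtype=float)
    support = cm.sum(axis=1)      # samples per true class
    predicted = cm.sum(axis=0)    # samples per predicted class
    tp = np.diag(cm)              # correctly classified samples per class
    recall = tp / support
    precision = tp / predicted
    f1 = 2 * precision * recall / (precision + recall)
    weights = support / support.sum()
    return {
        "accuracy": float(tp.sum() / cm.sum()),
        "balanced_accuracy": float(recall.mean()),
        "weighted_precision": float(weights @ precision),
        "weighted_recall": float(weights @ recall),
        "weighted_f1": float(weights @ f1),
    }

# Hypothetical counts: rows are true classes, columns are predictions.
example_cm = [[9950, 20],   # legitimate transactions
              [6, 24]]      # fraudulent transactions
metrics = summarize(example_cm)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```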

5.2 Homomorphic inference results

With our SVM model trained, we then proceeded to evaluate its performance under homomorphic encryption using OpenFHE’s CKKS implementation. This involved encrypting our input data, performing inference on the encrypted data, and then decrypting the results. We report the accuracy and performance overhead of this homomorphic inference process, comparing it to the clear-text inference results.

Table 5 characterizes the performance of the SVM model’s homomorphic inference. The most time-consuming operation is the SVM decision function evaluation, which takes approximately 5.21 seconds, or roughly 86% of the total inference time. This suggests that optimizing the decision function evaluation could significantly improve overall performance. In contrast, sample encryption and decryption are relatively fast, taking about 0.2 seconds combined. The sign function evaluation takes around 0.82 seconds, which is a reasonable overhead considering its importance in the inference process. The computation precision of up to 4 decimal digits (roughly 13 bits) indicates a good balance between accuracy and efficiency.

| Operation | Time (sec) |
| --- | --- |
| Sample Encryption | 0.146 |
| SVM Decision Function Evaluation | 5.214 |
| Sign Function Evaluation | 0.816 |
| SVM Inference Total Evaluation Time | 6.030 |
| Sample Decryption | 0.054 |
| Computation Precision | Up to 4 decimal digits (~13 bits) |

Table 5.

Characterizing the SVM model homomorphic inference performance.

Potential performance optimizations can significantly enhance the efficiency of our privacy-preserving SVM inference solution. One approach is to offload the sign function computation to the client, who can process it in the clear. By doing so, we can drop approximately 0.816 seconds from the overall computation time. Furthermore, it’s essential to recognize that in SVM regression tasks, the sign function evaluation is not required and can be entirely eliminated.
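The figures in Table 5 and the saving from offloading the sign function can be cross-checked with a few lines of arithmetic. The timings below are copied from the table; the variable names are illustrative.

```python
# Timings in seconds, copied from Table 5.
timings = {"encrypt": 0.146, "decision": 5.214, "sign": 0.816, "decrypt": 0.054}

# Server-side inference is the decision function followed by the sign function.
server_total = timings["decision"] + timings["sign"]
decision_share = timings["decision"] / server_total

# Client-side cryptographic work: encrypting the sample and decrypting the result.
client_crypto = timings["encrypt"] + timings["decrypt"]

# Server-side time if the sign function is evaluated by the client in the clear.
without_sign = server_total - timings["sign"]

print(f"server-side total:     {server_total:.3f} s")
print(f"decision share:        {decision_share:.1%}")
print(f"encrypt + decrypt:     {client_crypto:.3f} s")
print(f"total without sign:    {without_sign:.3f} s")
```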

We emphasize here that encryption imposes a substantial performance overhead on private SVM inference, increasing runtime by four to five orders of magnitude. While plaintext SVM inference completes in mere milliseconds, the equivalent encrypted operation demands several seconds of computation. This dramatic slowdown is attributable to the inherent complexities of operating within the encrypted domain using FHE. Furthermore, the approximate nature of CKKS computations reduces computational precision to about 13 bits, in stark contrast to the double-precision standard prevalent in plaintext calculations (a 52-bit mantissa). Despite this limitation, we verified that 13-bit precision is adequate for our specific use case, and it is potentially adequate for a broader range of scenarios.
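One common way to express such precision figures is the negative base-2 logarithm of the largest absolute error between the plaintext result and the decrypted result. The sketch below illustrates that convention with made-up numbers. It is an assumption that the chapter's figure was obtained this way, and both function names are hypothetical.

```python
import math
import numpy as np

def precision_bits(expected, actual):
    """Bits of precision implied by the largest absolute error (error must be nonzero)."""
    err = float(np.max(np.abs(np.asarray(expected) - np.asarray(actual))))
    return -math.log2(err)

def digits_to_bits(decimal_digits):
    """Convert a number of decimal digits to the equivalent number of bits."""
    return decimal_digits * math.log2(10)

# Hypothetical plaintext results and decrypted results with small CKKS-like errors.
expected = np.array([0.25, -1.5, 3.0])
actual = expected + np.array([1e-4, -5e-5, 2e-5])

bits = precision_bits(expected, actual)
print(f"precision from errors: {bits:.2f} bits")
print(f"4 decimal digits:      {digits_to_bits(4):.2f} bits")
```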

5.3 Security evaluation

Our FHE-based privacy-preserving fraud detection system assumes a semi-honest threat model. The semi-honest threat model, also known as the “honest-but-curious” model, is a security framework commonly assumed in FHE protocols. In this model, both the server and client are assumed to follow the protocol correctly, executing the agreed-upon steps without deviations. However, they are also curious and may attempt to infer additional information from the data they receive or observe during protocol execution.

Crucially, the semi-honest model assumes that the server and client do not collude with each other and do not actively attempt to cheat or manipulate the protocol. Their adversarial behavior is limited to trying to learn more information than the protocol explicitly provides.

Thus, the primary security goal of FHE under the semi-honest model is to prevent the server from gaining any significant knowledge about the data beyond what is intentionally revealed by the protocol. This implies that even if the server attempts to analyze or intercept the encrypted data, it should be unable to deduce sensitive information. By operating within the semi-honest threat model, FHE offers a robust security guarantee, ensuring that even curious parties cannot compromise data confidentiality.

To ensure the security and privacy of our homomorphic inference process, we conducted a thorough security evaluation. This involved analyzing the encryption parameters used in OpenFHE’s CKKS implementation to verify that they achieve the desired security level (128-bit) against known attacks. We also confirmed that the homomorphic inference process preserves the privacy of the input data, as demonstrated by the inability to recover original data from ciphertext.

It is crucial to acknowledge that FHE schemes do not provide two stronger guarantees: security against adaptive chosen-ciphertext attacks (CCA2) and verifiable computation. Because FHE ciphertexts are malleable by design, a malicious server or interceptor can manipulate them by performing computations on them, potentially disrupting the service by returning incorrect results. Without verifiable computation, the client also cannot check whether the server performed the intended computation. Hence, FHE is best suited to the honest-but-curious threat model, relying on legal bindings or incentives to enforce honest behavior among participating parties.
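The effect of malleability can be illustrated with a deliberately simple toy. The sketch below uses one-time additive masking modulo a fixed number. It is not CKKS, not an FHE scheme, and not suitable for real use; it only mimics the property that anyone can add to a ciphertext without knowing the key. All names and values are hypothetical.

```python
import secrets

MODULUS = 2**61 - 1  # arbitrary modulus for the toy scheme

def keygen():
    """Sample a random one-time masking key."""
    return secrets.randbelow(MODULUS)

def encrypt(key, message):
    """Hide the message by adding the key modulo MODULUS."""
    return (message + key) % MODULUS

def decrypt(key, ciphertext):
    """Remove the mask to recover the message."""
    return (ciphertext - key) % MODULUS

def tamper(ciphertext, delta):
    """What a malicious party can do without the key: shift the hidden value."""
    return (ciphertext + delta) % MODULUS

key = keygen()
ciphertext = encrypt(key, 42)
forged = tamper(ciphertext, 1000)

# The client decrypts successfully and cannot tell that the value was altered.
print(decrypt(key, ciphertext))
print(decrypt(key, forged))
```

The forged ciphertext decrypts without error, so detecting such tampering requires a mechanism outside the encryption scheme itself.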

In addition, FHE schemes are susceptible to side-channel attacks if not implemented correctly, as demonstrated in recent studies [43, 44]. These findings highlight the need for enhanced security measures and rigorous implementation to ensure the integrity of FHE-based systems.

5.4 Discussion

Our results demonstrate the feasibility of privacy-preserving SVM inference via FHE but also highlight the trade-offs between accuracy, performance, and security. In this discussion, we summarize our key findings, explore the implications of our results for practical applications, and identify avenues for future research to further improve the performance and security of homomorphic inference.

5.4.1 Predictive performance

The SVM model exhibits remarkable predictive performance, achieving high accuracy and precision in detecting fraudulent transactions. In addition, the model demonstrates robustness across different evaluation metrics, including F1-score, AUC, and balanced accuracy.

It is important to note that we view SVM training and SVM inference as essentially orthogonal processes. This independence means that any optimizations or improvements made to the training process, aimed at enhancing the classifier’s performance, will not impact the homomorphic SVM inference functionality. Advancements in training techniques, such as kernel selection, regularization, or data preprocessing, can be explored and implemented independently without affecting the encrypted inference pipeline. This decoupling of training and inference enables flexibility and future-proofing, allowing for continuous improvements in classification accuracy without compromising the security and privacy benefits of homomorphic inference.
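One way to see this decoupling is that inference only consumes the trained model's parameters, regardless of how they were obtained. The plaintext sketch below assumes the standard RBF-kernel SVM decision function, f(x) = Σ α_i exp(−γ‖s_i − x‖²) + b. The parameter names and the tiny model are hypothetical, and the encrypted pipeline would evaluate polynomial approximations of the exponential and sign steps rather than these exact functions.

```python
import numpy as np

def rbf_decision(x, support_vectors, dual_coef, intercept, gamma):
    """Plaintext RBF SVM decision value for a single sample x."""
    diff = support_vectors - x                      # difference to every support vector
    kernel = np.exp(-gamma * np.sum(diff * diff, axis=1))
    return float(dual_coef @ kernel + intercept)

def classify(x, **model):
    """Map the decision value to a class label in {-1, +1}."""
    return 1 if rbf_decision(x, **model) >= 0 else -1

# Hypothetical trained model: any training procedure could have produced it.
model = {
    "support_vectors": np.array([[0.0, 0.0], [1.0, 1.0]]),
    "dual_coef": np.array([1.0, -1.0]),
    "intercept": 0.0,
    "gamma": 0.5,
}

print(classify(np.array([0.0, 0.0]), **model))
print(classify(np.array([1.0, 1.0]), **model))
```

Retraining with a different regularization setting or preprocessing step would change the values in `model` but not the inference code.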

5.4.2 The importance of privacy-preserving MLaaS

A significant challenge in our research was the unavailability of raw data due to stringent privacy regulations. Sensitive data owners often preprocess data using techniques such as PCA before sharing it, which can compromise data integrity and limit the effectiveness of machine learning models. While PCA can reduce dimensionality, it also inherently discards information, which can degrade model performance. Furthermore, the vulnerability of PCA-processed data to inversion attacks raises serious concerns about data privacy [41]. This highlights the importance of cryptography-based privacy-preserving technologies such as FHE, which allow analysis to run on encrypted raw data. This safeguards sensitive information while still allowing the full data to be used, providing valuable insights without compromising privacy.

5.4.3 Scalability

We tested the SVM inference on a commodity laptop, which gave a realistic picture of the model’s performance under typical computing conditions, with an overall inference latency of about 6 seconds. However, it is likely that utilizing a high-end server with advanced processing capabilities, increased memory, and optimized hardware configurations could significantly enhance the performance of the SVM inference. Such a setup could potentially reduce the computation time, especially for the most time-consuming operation, the SVM decision function evaluation, and improve the overall efficiency of the homomorphic inference process. Moreover, a high-end server could also enable the processing of multiple SVM queries, larger datasets, more complex models, and increased precision levels, making it an attractive option for applications requiring high-performance secure computations.


6. Conclusions

In conclusion, this chapter has demonstrated the feasibility and effectiveness of implementing privacy-preserving SVM inference using the CKKS FHE scheme. We successfully enabled SVM inference on encrypted data samples, ensuring the confidentiality of sensitive information even when processed on untrusted servers. Our comprehensive evaluation on a large-scale credit card fraud detection dataset showed high predictive accuracy and a reasonable inference latency of nearly 6 seconds, highlighting the practicality of our approach.

The significance of this work lies in its ability to bridge the gap between machine learning and privacy-preserving computing, enabling organizations to harness the power of SVM inference without compromising data privacy. As the demand for secure computing solutions continues to grow, our implementation serves as a vital step toward realizing the full potential of FHE in real-world applications.

Future research directions may focus on optimizing the performance of homomorphic SVM inference, exploring alternative FHE schemes, and expanding the scope of privacy-preserving machine learning applications.


Conflict of interest

The authors declare no conflict of interest.

References

  1. Cortes C, Vapnik V. Support-vector networks. In: Machine Learning. Vol. 20. Boston: Kluwer Academic Publishers; 1995. pp. 273-297
  2. Suthaharan S, Suthaharan S. Support vector machine. In: Machine Learning Models and Algorithms for Big Data Classification: Thinking with Examples for Effective Learning. New York, NY: Springer; 2016. pp. 207-235
  3. Pisner DA, Schnyer DM. Support vector machine. In: Machine Learning. Amsterdam, Netherlands: Elsevier; 2020. pp. 101-121
  4. Pavlidis P, Wapinski I, Noble WS. Support vector machine classification on the web. Bioinformatics. 2004;20(4):586-587
  5. Battineni G, Chintalapudi N, Amenta F. Machine learning in medicine: Performance calculation of dementia prediction by support vector machines (svm). Informatics in Medicine Unlocked. 2019;16:100200
  6. Altan A, Karasu S. The effect of kernel values in support vector machine to forecasting performance of financial time series. The Journal of Cognitive Systems. 2019;4(1):17-21
  7. Alanazi A. Using machine learning for healthcare challenges and opportunities. Informatics in Medicine Unlocked. 2022;30:100924
  8. Singh G, Gupta R, Rastogi A, Chandel MDS, Ahmad R. A machine learning approach for detection of fraud based on svm. International Journal of Scientific Engineering and Technology. 2012;1(3):192-196
  9. Jha J, Ragha L. Intrusion detection system using support vector machine. International Journal of Applied Information Systems (IJAIS). 2013;3:25-30
  10. Voigt P, Von dem Bussche A. The eu general data protection regulation (gdpr). In: A Practical Guide. 1st ed. Vol. 10, No. 3152676. Cham: Springer International Publishing; 2017. pp. 10-5555
  11. Domingo-Ferrer J, Farras O, Ribes-González J, Sánchez D. Privacy-preserving cloud computing on sensitive data: A survey of methods, products and challenges. Computer Communications. 2019;140:38-60
  12. Gilad-Bachrach R, Dowlin N, Laine K, Lauter K, Naehrig M, Wernsing J. Cryptonets: Applying neural networks to encrypted data with high throughput and accuracy. In: International Conference on Machine Learning. United States: PMLR; 2016. pp. 201-210
  13. Al Badawi A, Jin C, Lin J, Mun CF, Jie SJ, Tan BHM, et al. Towards the alexnet moment for homomorphic encryption: Hcnn, the first homomorphic cnn on encrypted data with gpus. IEEE Transactions on Emerging Topics in Computing. 2020;9(3):1330-1343
  14. Bajard J-C, Martins P, Sousa L, Zucca V. Improving the efficiency of svm classification with fhe. IEEE Transactions on Information Forensics and Security. 2019;15:1709-1722
  15. Al Badawi A, Chen L, Vig S. Fast homomorphic svm inference on encrypted data. Neural Computing and Applications. 2022;34(18):15555-15573
  16. Al Badawi A, Hoang L, Mun CF, Laine K, Aung KMM. Privft: Private and fast text classification with homomorphic encryption. IEEE Access. 2020;8:226544-226556
  17. Chan FM, Al Badawi AQA, Sim JJ, Tan BHM, Sheng FC, Aung KMM. Genotype imputation with homomorphic encryption. In: Proceedings of the 6th International Conference on Biomedical Signal and Image Processing. New York, NY, USA: Association for Computing Machinery; 2021. pp. 9-13
  18. Blatt M, Gusev A, Polyakov Y, Goldwasser S. Secure large-scale genome-wide association studies using homomorphic encryption. Proceedings of the National Academy of Sciences. 2020;117(21):11608-11613
  19. Geva R, Gusev A, Polyakov Y, Liram L, Rosolio O, Alexandru A, et al. Collaborative privacy-preserving analysis of oncological data using multiparty homomorphic encryption. Proceedings of the National Academy of Sciences. 2023;120(33):e2304415120
  20. Gentry C. Fully homomorphic encryption using ideal lattices. In: Proceedings of the Forty-First Annual ACM Symposium on Theory of Computing. New York, NY, USA: Association for Computing Machinery; 2009. pp. 169-178
  21. Hesamifard E, Takabi H, Ghasemi M, Wright RN. Privacy-preserving machine learning as a service. Proceedings on Privacy Enhancing Technologies. 2018;(3):123-142
  22. Tanuwidjaja HC, Choi R, Baek S, Kim K. Privacy-preserving deep learning on machine learning as a service: A comprehensive survey. IEEE Access. 2020;8:167425-167447. DOI: 10.1109/ACCESS.2020.3023084
  23. Graepel T, Lauter K, Naehrig M. Ml confidential: Machine learning on encrypted data. In: International Conference on Information Security and Cryptology. New York City, United States: Springer; 2012. pp. 1-21
  24. Kim M, Song Y, Wang S, Xia Y, Jiang X, et al. Secure logistic regression based on homomorphic encryption: Design and evaluation. JMIR Medical Informatics. 2018;6(2):e8805
  25. Blatt M, Gusev A, Polyakov Y, Rohloff K, Vaikuntanathan V. Optimized homomorphic encryption solution for secure genome-wide association studies. BMC Medical Genomics. 2020;13:1-13
  26. Rovida L, Leporati A. Transformer-based language models and homomorphic encryption: An intersection with bert-tiny. In: Proceedings of the 10th ACM International Workshop on Security and Privacy Analytics. 2024. pp. 3-13
  27. Chen T, Bao H, Huang S, Dong L, Jiao B, Jiang D, et al. The-x: Privacy-Preserving Transformer Inference with Homomorphic Encryption. arXiv preprint arXiv:2206.00216. 2022
  28. Bergamaschi F, Halevi S, Halevi TT, Hunt H. Homomorphic training of 30,000 logistic regression models. In: Applied Cryptography and Network Security: 17th International Conference, ACNS 2019, Bogota, Colombia, June 5–7, 2019, Proceedings 17. Springer; 2019. pp. 592-611
  29. Park S, Byun J, Lee J, Cheon JH, Lee J. He-friendly algorithm for privacy-preserving svm training. IEEE Access. 2020;8:57414-57425
  30. Nandakumar K, Ratha N, Pankanti S, Halevi S. Towards deep neural network training on encrypted data. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. New York City, U.S.: Institute of Electrical and Electronics Engineers (IEEE); 2019
  31. Paul J, Annamalai MSMS, Ming W, Al Badawi A, Veeravalli B, Aung KMM. Privacy-preserving collective learning with homomorphic encryption. IEEE Access. 2021;9:132084-132096
  32. Lee S, Lee G, Kim JW, Shin J, Lee M-K. Hetal: Efficient privacy-preserving transfer learning with homomorphic encryption. In: International Conference on Machine Learning. United States: PMLR; 2023. pp. 19010-19035
  33. Cheon JH, Kim A, Kim M, Song Y. Homomorphic encryption for arithmetic of approximate numbers. In: Advances in Cryptology – ASIACRYPT 2017. Cham: Springer International Publishing; 2017. pp. 409-437. ISBN 978-3-319-70694-8
  34. Regev O. On lattices, learning with errors, random linear codes, and cryptography. In: Proceedings of the Thirty-Seventh Annual ACM Symposium on Theory of Computing, STOC '05. New York, NY, USA: Association for Computing Machinery. ISBN 1581139608; 2005. pp. 84-93. DOI: 10.1145/1060590.1060603
  35. Haeusermann T, Greshake B, Blasimme A, Irdam D, Richards M, Vayena E. Open sharing of genomic data: Who does it and why? PLoS One. 2017;12(5):e0177158
  36. Chang C-C, Lin C-J. Libsvm: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST). 2011;2(3):1-27
  37. Lee J-W, Kang HC, Lee Y, Choi W, Eom J, Deryabin M, et al. Privacy-preserving machine learning with fully homomorphic encryption for deep neural network. IEEE Access. 2022;10:30039-30054
  38. Brand M, Pradel G. Practical privacy-preserving machine learning using fully homomorphic encryption. Cryptology ePrint Archive. 2023
  39. Hong S, Park JH, Cho W, Choe H, Cheon JH. Secure tumor classification by shallow neural network using homomorphic encryption. BMC Genomics. 2022;23(1):284
  40. Dal Pozzolo A, Boracchi G, Caelen O, Alippi C, Bontempi G. Credit card fraud detection: A realistic modeling and a novel learning strategy. IEEE Transactions on Neural Networks and Learning Systems. 2017;29(8):3784-3797
  41. Kwatra S, Torra V. Data reconstruction attack against principal component analysis. In: International Symposium on Security and Privacy in Social Networks and Big Data. New York City, United States: Springer; 2023. pp. 79-92
  42. Al Badawi A, Bates J, Bergamaschi F, Cousins DB, Erabelli S, Genise N, et al. Openfhe: Open-source fully homomorphic encryption library. In: Proceedings of the 10th Workshop on Encrypted Computing & Applied Homomorphic Cryptography. 2022. pp. 53-63
  43. Aydin F, Karabulut E, Potluri S, Alkim E, Aysu A. Reveal: Single-trace side-channel leakage of the seal homomorphic encryption library. In: 2022 Design, Automation & Test in Europe Conference & Exhibition (DATE). New York City, United States: IEEE; 2022. pp. 1527-1532
  44. Aydin F, Aysu A. Exposing side-channel leakage of seal homomorphic encryption library. In: Proceedings of the 2022 Workshop on Attacks and Solutions in Hardware Security. 2022. pp. 95-100
