Abstract
The advent of deep learning has brought about remarkable advancements in various fields, including computer vision, natural language processing, and reinforcement learning. However, the vulnerability of deep neural networks to adversarial examples has raised significant concerns regarding their robustness and reliability. Adversarial examples are carefully crafted inputs that are imperceptibly perturbed to cause misclassification or incorrect behavior of machine learning models. While extensive research has been conducted to understand and mitigate this vulnerability, a relatively novel perspective has emerged—reversible adversarial examples. In this chapter, we delve into the concept of reversible adversarial examples, exploring their characteristics and generation methods. We review existing literature on reversible adversarial examples, highlighting their significance in safeguarding privacy. Moreover, we introduce potential applications of reversible adversarial examples and discuss future directions for this new research field.
Keywords
- adversarial example
- reversible data hiding
- deep neural network
- privacy protection
- artificial intelligence
1. Introduction
Deep learning models have demonstrated exceptional capabilities across various domains such as image recognition [1], natural language processing [2], and so on. However, the susceptibility of these models to adversarial examples poses a significant challenge to their reliability and security. Adversarial examples, which add carefully crafted perturbations to input data, can lead to misclassification or incorrect behavior of machine learning models, even with imperceptible changes to human observers. As a result, ensuring robustness against adversarial attacks has become a crucial area of research in machine learning security.
In recent years, a novel approach to understanding and mitigating adversarial vulnerabilities has emerged through the exploration of reversible adversarial examples (RAE) [3]. These examples are crafted with the specific goal of being reversible: the original input can be exactly recovered from the adversarially perturbed version, so that correct model predictions can be restored.
In this chapter, we delve into the realm of reversible adversarial examples, aiming to provide a comprehensive overview of this emerging field. We begin by introducing the concept of reversible adversarial examples and elucidating their distinguishing characteristics compared to traditional adversarial examples. Building upon this foundation, we review recent advancements in the generation methods of reversible adversarial examples, including white-box attacks and black-box attacks. Furthermore, we explore the applications and implications of reversible adversarial examples.
Moreover, we investigate multiple prominent white-box attack strategies for crafting reversible adversarial examples. These methods leverage various techniques such as perturbation generation, reversible data hiding, and denoising to achieve reversibility while maintaining adversarial potency. Additionally, we explore several black-box attack approaches. These methods aim to generate reversible adversarial examples without accessing the model’s internal parameters or architecture, thereby simulating real-world scenarios where limited information about the target model is available.
By analyzing and comparing these diverse approaches, we gain insights into the capabilities, limitations, and trade-offs associated with reversible adversarial examples. Furthermore, we discuss possible applications and future directions in RAE research, including developing more sophisticated attack algorithms, improving adversarial transferability, and investigating the practical implications of RAEs in real-world applications. Reversible adversarial examples will play a crucial role in scenarios such as privacy protection, access control, and model authorization. Research in the field of reversible adversarial examples will also drive advancements in the broader adversarial machine learning community.
2. Related work
2.1 Adversarial example
The exploration of adversarial examples has been a prominent area of research in the field of machine learning security. Adversarial examples, which are carefully crafted inputs designed to deceive machine learning models, have raised concerns about the robustness and reliability of these models. Several methods have been proposed to generate adversarial examples, both in the context of white-box attacks, where attackers have full access to model parameters, and black-box attacks, where attackers have limited knowledge about the target model.
In the domain of white-box attacks, various algorithms have been developed to generate adversarial examples effectively. The fast gradient sign method (FGSM) introduced by Goodfellow et al. [4] computes the perturbation by taking a small step in the direction of the gradient of the loss function with respect to the input image. Iterative approaches, such as the basic iterative method (BIM) [5] and projected gradient descent (PGD) [6], iteratively perturb the input image to maximize the model’s loss, resulting in stronger adversarial attacks.
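As a concrete illustration, the single FGSM step can be sketched on a model whose input gradient is available in closed form; the tiny logistic-regression victim below is a toy stand-in (all weights and inputs are hypothetical), not any specific model from the literature:

```python
import numpy as np

def fgsm_attack(x, y, w, b, epsilon):
    """Craft an FGSM adversarial example for a logistic-regression victim.

    For binary cross-entropy L(x) = -log p(y|x), the input gradient has the
    closed form (p - y) * w, so no autodiff framework is needed here.
    """
    p = 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))  # predicted P(y=1|x)
    grad = (p - y) * w                              # dL/dx
    return x + epsilon * np.sign(grad)              # one signed-gradient step

# Toy example: a point initially classified as class 1 (w.x + b = 1.5 > 0).
w = np.array([2.0, -1.0])
b = 0.0
x = np.array([1.0, 0.5])
x_adv = fgsm_attack(x, y=1, w=w, b=b, epsilon=1.0)
# The step moves x in the direction that increases the loss on the true
# label; here the perturbed point crosses the decision boundary.
```

Iterative variants such as BIM/PGD simply repeat this signed step with a small step size and clip the result back into an epsilon-ball around the original input.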
On the other hand, black-box attacks operate under more constrained settings, where attackers have no or limited access to the target model. There are two primary categories of black-box attacks: query-based attacks and transfer-based attacks. Query-based attacks involve iteratively querying the victim model to gather gradient information, which is then used to optimize the input to produce adversarial examples. Transfer-based attacks leverage the effectiveness of adversarial examples generated on surrogate models to deceive the victim model. Several techniques have been proposed to enhance the transferability of adversarial examples. These include employing advanced optimization algorithms, applying input transformations, and utilizing ensemble-model attacks. By refining these strategies, attackers can achieve higher success rates in deceiving diverse models with adversarial examples, posing significant challenges for robust model defenses.
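For intuition on query-based attacks, the gradient of a black-box loss can be estimated from queries alone via central finite differences; the quadratic `f` below is a hypothetical stand-in for a victim model's loss, used only to make the sketch runnable:

```python
import numpy as np

def estimate_gradient(f, x, delta=1e-4):
    """Estimate the gradient of f at x using only black-box queries
    (two loss evaluations per input dimension)."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = delta
        grad[i] = (f(x + e) - f(x - e)) / (2 * delta)
    return grad

# Hypothetical victim "loss" we can only query, not differentiate.
f = lambda x: float(np.sum(x ** 2))
x = np.array([1.0, -2.0, 3.0])
g = estimate_gradient(f, x)        # approximates the true gradient 2x
x_attacked = x + 0.1 * np.sign(g)  # one signed ascent step on the loss
```

Real query-based attacks reduce the query count with techniques such as random directional estimates, but the principle is the same: the attacker reconstructs gradient information purely from model outputs.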
The study of adversarial examples is crucial for understanding the vulnerabilities of machine learning models and developing robust defenses against potential attacks.
2.2 Reversible data hiding
Reversible data hiding (RDH) is a unique form of data hiding that allows for the recovery of the original image without any distortion from the marked image while also enabling the extraction of embedded hidden data. RDH algorithms are typically categorized into three main groups: compression embedding [7], difference expansion [8], and histogram shift [9]. Each category of RDH algorithms offers distinct approaches to achieve reversible embedding while preserving data integrity.
Compression embedding methods focus on exploiting the redundancy in the image to embed additional data without causing irreversible changes. These techniques often leverage lossless compression algorithms to compress the image data and then embed additional information into the compressed domain. Upon extraction, the original image can be fully recovered without any loss of information.
Difference expansion techniques operate by modifying the difference values between neighboring pixel intensities to embed hidden data. By carefully adjusting the differences, data can be embedded in a reversible manner, allowing for accurate extraction without distortion of the original image.
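Difference expansion on a single pixel pair, in the style of Tian's scheme [8], can be sketched as follows (integer arithmetic only; the overflow/underflow handling a full scheme needs is omitted):

```python
def de_embed(a, b, bit):
    """Embed one bit into a pixel pair (a, b) by difference expansion."""
    l = (a + b) // 2        # integer average, preserved by the embedding
    h = a - b               # difference between the pair
    h2 = 2 * h + bit        # expanded difference carries the bit in its LSB
    a2 = l + (h2 + 1) // 2  # marked pair reconstructed from (l, h2)
    b2 = l - h2 // 2
    return a2, b2

def de_extract(a2, b2):
    """Recover the embedded bit and the original pair exactly."""
    l = (a2 + b2) // 2
    h2 = a2 - b2
    bit = h2 & 1
    h = h2 // 2             # floor division inverts the expansion
    return bit, l + (h + 1) // 2, l - h // 2

a2, b2 = de_embed(100, 95, 1)   # marked pair (103, 92)
bit, a, b = de_extract(a2, b2)  # bit 1 and the original (100, 95) come back
```

Because the integer average `l` is invariant and the expansion is invertible, extraction restores both the hidden bit and the original pixels with no distortion.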
Histogram shift methods manipulate the histogram of the image to embed data. By shifting the histogram bins within certain bounds, additional information can be embedded without causing irreversible changes to the image. This allows for the extraction of hidden data, while ensuring the recovery of the original image remains distortion-free.
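A minimal histogram-shift sketch follows, under the simplifying assumptions that the peak bin `peak` lies below an empty bin `zero` and that the payload length equals the number of peak-valued pixels:

```python
import numpy as np

def hs_embed(img, bits, peak, zero):
    """Histogram-shift embedding: bins in (peak, zero) shift up by one to
    free the bin peak+1; each peak-valued pixel then carries one bit
    (peak -> bit 0, peak+1 -> bit 1). Assumes len(bits) peak pixels."""
    out = img.copy()
    out[(img > peak) & (img < zero)] += 1
    carriers = np.flatnonzero(img == peak)  # scan order = embedding order
    for idx, bit in zip(carriers, bits):
        out.flat[idx] = peak + bit
    return out

def hs_extract(marked, peak, zero):
    """Recover the bits and the original image exactly."""
    bits = [int(v == peak + 1) for v in marked.flatten()
            if v in (peak, peak + 1)]
    restored = marked.copy()
    restored[(marked > peak) & (marked <= zero)] -= 1
    return bits, restored

img = np.array([[3, 5, 7], [5, 6, 5]])  # peak bin 5, empty bin 8
marked = hs_embed(img, [1, 0, 1], peak=5, zero=8)
bits, restored = hs_extract(marked, peak=5, zero=8)
```

Shifting the bins back down undoes the histogram modification, which is why the recovery is exact.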
The performance of reversible data hiding methods is often evaluated based on metrics such as embedding capacity, distortion introduced to the cover signal, and extraction efficiency. Higher embedding capacity allows for more data to be hidden within the cover signal, while minimizing distortion ensures that the original signal can be accurately reconstructed. Extraction efficiency measures the accuracy and reliability of recovering the hidden data from the stego signal.
Overall, reversible data hiding techniques provide a valuable means of embedding additional information into digital images while preserving their integrity and ensuring the reversible extraction of hidden data. These methods find applications in various domains, including image authentication, watermarking, and data hiding for secure communication. Research in reversible data hiding continues to explore new techniques and applications, aiming to strike a balance between embedding capacity, distortion, and reversibility to meet the diverse requirements of different applications and scenarios.
3. Generation methods for reversible adversarial example
3.1 White-box reversible adversarial example
3.1.1 Post smoothing and in-the-loop smoothing method
Liu et al. [3] proposed white-box reversible adversarial example algorithms by combining adversarial attacks with an RDH algorithm. The overall framework is illustrated in Figure 1. They proposed two RAE methods: the post-smoothing method and the in-the-loop smoothing method. A straightforward approach to achieving reversible adversarial examples is to embed the adversarial perturbation into the adversarial image using a reversible data hiding scheme, enabling the receiver to invalidate the adversarial perturbation. However, RDH is primarily designed for embedding a short message into a large image, which makes it ill-suited to carrying a full per-pixel perturbation. To address this limitation, they divide the image into super-pixels and embed the adversarial perturbation generated for these super-pixels, which greatly reduces the payload. The general process of RAE is described in Algorithm 1.
![](/media/chapter/a043Y00000yGSwdQAG/a09Tc0000004r5lIAA/media/F1.png)
Figure 1.
Generation process of reversible adversarial example [3].
Algorithm 1. Generation of a reversible adversarial example:

1: Generate an adversarial example from the original image using an adversarial attack.

2: Compress the adversarial perturbation (the difference between the adversarial example and the original image) losslessly.

3: Embed the compressed perturbation into the adversarial example using RDH to obtain the reversible adversarial example.

Let $x$ denote the original image, $x^{adv}$ the adversarial example, and $S(\cdot)$ the smoothing operation that averages a perturbation within each super-pixel.

The post-smoothing method represents the most straightforward approach to generate adversarial examples over super-pixels. Initially, an adversarial example $x'$ is generated using any arbitrary method. Denoting the resulting perturbation by $r = x' - x$, the adversarial example generated using the post-smoothing method is obtained as

$x^{adv} = x + S(r),$

i.e., the per-pixel perturbation is averaged within each super-pixel after the attack has finished.
The disadvantage of the post-smoothing method is that the adversarial perturbation is smoothed only after the optimization completes, which weakens the attack ability of the generated RAE. To mitigate this issue, they propose the in-the-loop smoothing super-pixel adversarial attack, which requires more computational overhead but is expected to have less impact on attack ability. They take the basic iterative method (BIM) [10] as an example and propose an in-the-loop smoothing version of it. BIM adds adversarial perturbations by iteratively updating the original image:

$x_0^{adv} = x, \quad x_{t+1}^{adv} = \mathrm{Clip}_{x,\epsilon}\{x_t^{adv} + \alpha\,\mathrm{sign}(\nabla_x J(x_t^{adv}, y))\},$

where $\alpha$ is the step size, $J(\cdot, \cdot)$ is the classification loss, $y$ is the ground-truth label, and $\mathrm{Clip}_{x,\epsilon}\{\cdot\}$ restricts the result to an $\epsilon$-neighborhood of $x$. The in-the-loop smoothing version applies super-pixel smoothing to the accumulated perturbation at every iteration:

$x_{t+1}^{adv} = \mathrm{Clip}_{x,\epsilon}\{x + S(x_t^{adv} + \alpha\,\mathrm{sign}(\nabla_x J(x_t^{adv}, y)) - x)\},$

where $S(\cdot)$ averages the accumulated perturbation within each super-pixel, so the final perturbation is constant over super-pixels by construction.
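The super-pixel smoothing operation that both methods rely on can be sketched as follows, assuming a precomputed segmentation label map (in practice produced by a super-pixel algorithm such as SLIC; the labels here are hand-made for illustration):

```python
import numpy as np

def smooth_over_superpixels(perturbation, labels):
    """Replace each pixel's perturbation by the mean over its super-pixel,
    so the perturbation is constant within every segment and can be
    described by one value per segment for RDH embedding."""
    smoothed = np.empty_like(perturbation, dtype=float)
    for seg in np.unique(labels):
        mask = labels == seg
        smoothed[mask] = perturbation[mask].mean()
    return smoothed

# 2x4 perturbation with two hand-made segments (left half 0, right half 1).
labels = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1]])
pert = np.array([[1.0, 3.0, 10.0, 10.0],
                 [1.0, 3.0,  6.0,  6.0]])
smoothed = smooth_over_superpixels(pert, labels)
# left segment mean = 2.0, right segment mean = 8.0
```

Because the smoothed perturbation takes a single value per segment, the payload shrinks from one value per pixel to one value per super-pixel.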
Liu et al. [3] introduced the concept of RAE and presented the first prototype framework to verify its feasibility. The proposed method integrates adversarial examples, reversible data hiding, and encryption to achieve RAE. Moreover, RAE can be viewed as a form of encryption for computer vision, with the reversibility of RAE serving as the corresponding decryption.
3.1.2 Reversible adversarial example based on reversible data hiding in YUV Colorspace
Yin et al. [3] proposed a reversible adversarial example scheme where the adversarial perturbation is embedded in the UV channels using the reversible data hiding (RDH) technique. Specifically, the prediction error extension embedding algorithm [11] is utilized to embed the perturbation. This algorithm leverages the correlation between image pixels.
Initially, the image is converted to YUV color space and the adversarial perturbation is generated in the Y (luminance) channel. In addition, the class activation mapping (CAM) [12] technique is utilized to narrow down the region of adversarial perturbation. Next, the adversarial distortion in the Y channel is embedded into the UV channels using RDH. Finally, the image is converted from YUV back to RGB color space using the standard conversion

$R = Y + 1.140V, \quad G = Y - 0.395U - 0.581V, \quad B = Y + 2.032U.$
This process is iteratively repeated until the victim model is deceived by the generated reversible adversarial example.
The RDH algorithm [11] guarantees the exact recovery of the original images. First, the reversible adversarial example is converted from RGB to YUV color space. Next, the perturbation is extracted from the UV channels by the RDH algorithm [11] and removed from the Y channel. Finally, the image is converted from YUV back to RGB color space to recover the original image.
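The color-space round trip underlying this scheme can be sketched with BT.601 full-range constants (an assumption; the paper's exact conversion matrix may differ):

```python
import numpy as np

# BT.601 full-range RGB -> YUV matrix (one common convention).
RGB2YUV = np.array([[ 0.299,  0.587,  0.114],
                    [-0.147, -0.289,  0.436],
                    [ 0.615, -0.515, -0.100]])

def rgb_to_yuv(rgb):
    """Convert (..., 3) RGB values to YUV."""
    return rgb @ RGB2YUV.T

def yuv_to_rgb(yuv):
    """Invert the conversion with the exact matrix inverse, so the
    round trip is lossless up to floating-point precision."""
    return yuv @ np.linalg.inv(RGB2YUV).T

rgb = np.array([[0.5, 0.2, 0.7]])
yuv = rgb_to_yuv(rgb)  # perturbation targets yuv[..., 0] (Y, luminance);
rt = yuv_to_rgb(yuv)   # RDH embedding targets channels 1-2 (U, V)
```

In the actual scheme, integer rounding during the YUV/RGB conversion must also be accounted for so that recovery stays exact.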
This reversible adversarial example scheme thus achieves exact recovery of the original images, which enables further use of the images at the receiver end. Moreover, this method embeds the information in the chrominance (UV) channels while introducing the adversarial perturbation in the luminance (Y) channel, which reduces the influence of the embedded information on the attack ability of the RAE.
3.1.3 Reversible adversarial example based on local visible adversarial perturbation
Yin et al. [13] proposed a RAE scheme based on local visible adversarial perturbation. In the process of generating adversarial examples, they adopt AdvPatch [14] in their method. Rao et al. [15] suggest that the placement of the patch within the image significantly impacts the effectiveness of the attack. Thus, they employ basin hopping evolution (BHE) [16] to determine the position of the patch within the image. BHE combines basin hopping and evolutionary techniques, utilizing multiple starting points and crossover operations to maintain solution diversity, thereby facilitating the attainment of the global optimum. They initialize the population and commence the iterative process. In each iteration, the basin hopping algorithm is employed to generate a series of improved solutions. Subsequently, crossover and selection operations are conducted to choose the next generation of the population.
In the process of generating reversible adversarial examples, the segment of the original image obscured by the adversarial patch is treated as the secret image and is embedded into the adversarial example using RDH. They compress the secret image and convert it into binary form to reduce the amount of embedded information. Then, they adopt prediction-error expansion [11] to embed the data. The embedding process mainly includes two steps. First, they predict each pixel value $\hat{x}_{i,j}$ from its neighborhood: the predictor leverages the inherent correlation within the pixel neighborhood, estimating a pixel from its already-scanned neighbors. Second, the prediction error is calculated as

$e_{i,j} = x_{i,j} - \hat{x}_{i,j},$

where $x_{i,j}$ is the original pixel value and $\hat{x}_{i,j}$ is its prediction. The error is then expanded to $e'_{i,j} = 2e_{i,j} + b$ to carry one secret bit $b$, and the marked pixel value becomes $x'_{i,j} = \hat{x}_{i,j} + e'_{i,j}$.
In the process of recovering original images, they first extract auxiliary information and image data. Then, they decompress the extracted data and recover the original image with no distortion by the auxiliary information.
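A single-pixel sketch of prediction-error expansion with the median edge detector (MED) predictor commonly used in PEE schemes such as [11] (overflow/underflow handling omitted):

```python
def med_predict(a, b, c):
    """Median edge detector predictor: a = left neighbor, b = upper
    neighbor, c = upper-left neighbor of the current pixel."""
    if c <= min(a, b):
        return max(a, b)
    if c >= max(a, b):
        return min(a, b)
    return a + b - c

def pee_embed(x, a, b, c, bit):
    """Expand the prediction error e = x - x_hat to 2e + bit."""
    p = med_predict(a, b, c)
    return p + 2 * (x - p) + bit

def pee_extract(x_marked, a, b, c):
    """Recover the bit and the original pixel. The neighbors are still
    unmodified at extraction time, so the prediction is identical."""
    p = med_predict(a, b, c)
    e2 = x_marked - p
    return e2 & 1, p + e2 // 2

marked = pee_embed(100, 98, 103, 97, 1)  # error -3 expands to -5
bit, x = pee_extract(marked, 98, 103, 97)
```

Scanning pixels in a fixed causal order guarantees that each pixel's neighbors are recovered before the pixel itself, which is what makes blind, distortion-free extraction possible.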
3.1.4 Reversible adversarial example based on diffusion model
Xing et al. [17] proposed a RAE scheme based on a diffusion model. First, they define a backdoor trigger that biases the Gaussian noise used in the diffusion process, so that a denoising diffusion probabilistic model (DDPM) is trained on a biased Gaussian distribution (BGD). The standard generative (reverse) process of a DDPM denoises step by step,

$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\big(x_{t-1};\, \mu_\theta(x_t, t),\, \sigma_t^2 I\big),$

where $\mu_\theta(x_t, t)$ is the learned denoising mean and $\sigma_t^2$ is the reverse-process variance at step $t$. Given an input image, the model trained on the BGD stamps the trigger-induced perturbation onto it, yielding a reversible adversarial example (self-generation); given the reversible adversarial example, the same model removes this perturbation through its denoising process and recovers the original image (self-recovery), where the reversibility is guaranteed by the bias learned during training.
3.2 Black-box reversible adversarial example
3.2.1 B-RAE method
Xiong et al. [18] proposed a black-box reversible adversarial example scheme (B-RAE). This scheme comprises three components: perturbation generative network (PGN) training, reversible adversarial example (RAE) generation, and original image recovery.
The perturbation generative network is trained to generate robust black-box adversarial perturbations. To enhance the resemblance between the adversarial image and the original image, they employ a discriminator to impose constraints on the PGN, ensuring the generation of small and precise noise. A noise layer is designed to simulate typical image-processing operations, aiming to enhance the robustness of the adversarial example; this noise robustness must be addressed during PGN training because RDH unavoidably introduces noise into the image. By incorporating the noise layer, the adversarial example becomes less sensitive to minor noise, thereby decreasing the impact of information embedding. To augment the significance of the perturbation sign on attack ability, they further devise a perturbation loss that emphasizes the sign of the generated perturbation while keeping its magnitude small.
In addition, they utilize the ensemble strategy [20] to enhance the transferability of adversarial examples: the classification loss aggregates the losses of multiple surrogate models, so that the generated perturbation deceives the ensemble rather than overfitting a single model.
After the perturbation generative network generates the adversarial perturbation, they employ pre-processing operation and lossless compression to compress the generated adversarial perturbation, thereby reducing information size. Then, the RAE is obtained by embedding the compressed data into the preliminary adversarial example using the RDH technique.
To recover the original image, they first apply the inverse of the RDH algorithm to extract the embedded data from the RAE, restoring the preliminary adversarial example as it was before embedding. The extracted data comprise the compressed binary data required for restoring the original image. Then, they recover the adversarial perturbation by decompressing the data and replicating it from a single channel to the other two channels. Finally, they restore the original image by subtracting the adversarial perturbation from the preliminary adversarial example.
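The compression and channel-replication steps described above can be sketched as follows, with `zlib` standing in for the paper's pre-processing and lossless coder (an assumption) and the perturbation assumed identical across the three channels, as the replication step implies:

```python
import numpy as np
import zlib

def pack_perturbation(pert):
    """Keep a single channel of the (channel-replicated) perturbation
    and compress it losslessly to shrink the embedded payload."""
    return zlib.compress(pert[..., 0].astype(np.int16).tobytes())

def recover_original(adv, blob):
    """Decompress one channel, replicate it to three channels, and
    subtract it from the preliminary adversarial example."""
    h, w, _ = adv.shape
    chan = np.frombuffer(zlib.decompress(blob), dtype=np.int16).reshape(h, w)
    pert = np.repeat(chan[:, :, None], 3, axis=2)
    return adv - pert

# Toy demo: a 2x2 "image" with a +/-1 perturbation shared by all channels.
orig = np.full((2, 2, 3), 100, dtype=np.int16)
pert1 = np.array([[1, -1], [0, 1]], dtype=np.int16)
adv = orig + pert1[:, :, None]
blob = pack_perturbation(adv - orig)
restored = recover_original(adv, blob)
```

Storing one channel instead of three is what makes the payload small enough to embed reversibly; the RDH embedding and extraction steps themselves are abstracted away here.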
3.2.2 Reversible adversarial example based on flipping transformation
Fang et al. [21] proposed a black-box RAE scheme based on flipping transformation. First, they adopt the CAM [22] technique to obtain the attention map. Inspired by Yang et al. [23], they incorporate flipping transformation into the process of generating adversarial examples to enhance adversarial transferability: in each iteration, the input image is randomly flipped with a probability $p$ before the gradient is computed, which diversifies the attacked inputs and prevents the perturbation from overfitting the surrogate model.
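The flipping transformation itself is simple; a minimal sketch (horizontal flip with probability `p`, applied before each gradient computation) might look like:

```python
import numpy as np

def random_flip(x, p, rng):
    """With probability p, horizontally flip the input before the gradient
    step -- an input transformation intended to improve transferability."""
    if rng.random() < p:
        return x[:, ::-1].copy()
    return x

rng = np.random.default_rng(0)
img = np.arange(6.0).reshape(2, 3)
always = random_flip(img, p=1.0, rng=rng)  # columns reversed
never = random_flip(img, p=0.0, rng=rng)   # unchanged
```

In the attack loop, the gradient computed on the (possibly flipped) input is flipped back before being accumulated, so the perturbation stays aligned with the original image.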
4. Applications of reversible adversarial example
4.1 Privacy protection
More and more users would like to share their personal images on social network software. However, malicious commercial companies can utilize deep models to collect user data and obtain personal information. By employing RAE, users can ensure the legitimate utilization of shared data by authorized parties and prevent unauthorized access by illegitimate parties, thereby protecting their privacy.
4.2 Dataset access control
On the internet, there exist numerous commercial image datasets that have been meticulously collected through substantial human effort. The RAE scheme can be employed to safeguard such datasets: RAE image datasets evade recognition by unauthorized AI models, thereby controlling access to the original image datasets.
4.3 Model authorization
There are many AI-model-based applications on the market, but the quality of these models is not guaranteed. The market needs to authenticate AI models that meet application requirements and allow only authorized models to be published. The RAE scheme can be applied to this model-authorization setting: using a set of reversible adversarial examples, authorized models can correctly recognize the images, while unauthorized models will misclassify them. Thus, authorized models can be identified according to their classification accuracy.
5. Conclusion and future directions
In this chapter, we have delved into the fascinating realm of reversible adversarial examples and explored multiple distinct methods for generating them. Through our exploration of both white-box and black-box attack methods, we have witnessed the effectiveness of RAE schemes against deep models. Each method introduced in this chapter offers unique insights and techniques for crafting reversible adversarial examples. Whether leveraging adversarial perturbation, reversible data hiding, or innovative transformation strategies, these methods highlight the ingenuity and creativity in the adversarial attack landscape.
Looking ahead, the field of RAE holds promise for further exploration and innovation. Future research efforts may focus on developing more sophisticated attack methods, enhancing the transferability of adversarial examples, and exploring the practical implications of RAE in real-world applications. Additionally, continued efforts in adversarial defense and ethical guidelines will be essential to ensure the responsible use and deployment of AI technologies in society.
Our journey into the realm of reversible adversarial examples has provided valuable insights and perspectives, paving the way for continued exploration and advancement in this captivating field.
Acknowledgments
The author acknowledges the use of AI tools for language polishing of the manuscript.
References
- 1. Zhao H, Jia J, Koltun V. Exploring self-attention for image recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE Computer Society; 2020. pp. 10076-10085
- 2. Wolf T, Debut L, Sanh V, Chaumond J, Delangue C, Moi A, Cistac P, Rault T, Louf R, Funtowicz M, et al. Transformers: State-of-the-art natural language processing. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. Association for Computational Linguistics; 2020. pp. 38-45
- 3. Liu J, Zhang W, Fukuchi K, Akimoto Y, Sakuma J. Unauthorized AI cannot recognize me: Reversible adversarial example. Pattern Recognition. 2023;134:109048
- 4. Goodfellow IJ, Shlens J, Szegedy C. Explaining and harnessing adversarial examples. arXiv. 2014
- 5. Kurakin A, Goodfellow I, Bengio S. Adversarial examples in the physical world. In: International Conference on Learning Representations. Chapman and Hall/CRC; 2017
- 6. Madry A, Makelov A, Schmidt L, Tsipras D, Vladu A. Towards deep learning models resistant to adversarial attacks. In: International Conference on Learning Representations. 2018
- 7. Fridrich J, Goljan M, Du R. Lossless data embedding for all image formats. Electronic Imaging. 2002;4675:572-583
- 8. Tian J. Reversible data embedding using a difference expansion. IEEE Transactions on Circuits and Systems for Video Technology. 2003;13(8):890-896
- 9. Ni Z, Shi Y, Ansari N, Su W. Reversible data hiding. IEEE Transactions on Circuits and Systems for Video Technology. 2006;16(3):354-362
- 10. Kurakin A, Goodfellow I, Bengio S. Adversarial machine learning at scale. arXiv. 2016
- 11. Thodi DM, Rodríguez JJ. Expansion embedding techniques for reversible watermarking. IEEE Transactions on Image Processing. 2007;16(3):721-730
- 12. Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE International Conference on Computer Vision. IEEE Computer Society; 2017. pp. 618-626
- 13. Chen L, Zhu S, Andrew A, Yin Z. Reversible attack based on local visible adversarial perturbation. Multimedia Tools and Applications. 2024;83(4):11215-11227
- 14. Brown TB, Mané D, Roy A, Abadi M, Gilmer J. Adversarial patch. arXiv. 2017
- 15. Rao S, Stutz D, Schiele B. Adversarial training against location-optimized adversarial patches. In: European Conference on Computer Vision. Springer; 2020. pp. 429-448
- 16. Jia X, Wei X, Cao X, Han X. Adv-watermark: A novel watermark perturbation for adversarial examples. In: Proceedings of the 28th ACM International Conference on Multimedia. 2020. pp. 1579-1587
- 17. Xing F, Zhou X, Fan X, Tian Z, Zhao Y. RAEDiff: Denoising diffusion probabilistic models based reversible adversarial examples self-generation and self-recovery. arXiv. 2023
- 18. Xiong L, Wu Y, Yu P, Zheng Y. A black-box reversible adversarial example for authorizable recognition to shared images. Pattern Recognition. 2023;140:109549
- 19. Rezatofighi H, Tsoi N, Gwak JY, Sadeghian A, Reid I, Savarese S. Generalized intersection over union: A metric and a loss for bounding box regression. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE; 2019. pp. 658-666
- 20. Che Z, Borji A, Zhai G, Ling S, Li J, Le Callet P. A new ensemble adversarial attack powered by long-term gradient memories. In: Proceedings of the AAAI Conference on Artificial Intelligence. 2020. pp. 3405-3413
- 21. Fang Y, Jia J, Yang Y, Lyu W. Improving transferability reversible adversarial examples based on flipping transformation. In: International Conference of Pioneering Computer Scientists, Engineers and Educators. Springer; 2023. pp. 417-432
- 22. Hou Q, Zhou D, Feng J. Coordinate attention for efficient mobile network design. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE; 2021. pp. 13713-13722
- 23. Yang B, Zhang H, Li Z, Xu K. Adversarial example generation method based on image flipping transform. Journal of Computer Applications. 2022;42(8):2319
- 24. He W, Cai Z. Reversible data hiding based on dual pairwise prediction-error expansion. IEEE Transactions on Image Processing. 2021;30:5045-5055