I am passionate about building trustworthy AI systems. My research focuses on generative and multimodal AI, addressing key challenges in security, privacy, and safety. From exploring vulnerabilities to improving robustness, I aim to advance AI technologies in a safe and reliable way.
Finding NeMo: Localizing Neurons Responsible For Memorization in Diffusion Models
Dominik Hintersdorf, Lukas Struppek, Kristian Kersting, Adam Dziedzic, and Franziska Boenisch
In Conference on Neural Information Processing Systems (NeurIPS), 2024
Diffusion models (DMs) produce very detailed and high-quality images. Their power results from extensive training on large amounts of data, usually scraped from the internet without proper attribution or consent from content creators. Unfortunately, this practice raises privacy and intellectual property concerns, as DMs can memorize and later reproduce their potentially sensitive or copyrighted training images at inference time. Prior efforts address this issue either by changing the input to the diffusion process, thereby preventing the DM from generating memorized samples during inference, or by removing the memorized data from training altogether. While these are viable solutions when the DM is developed and deployed in a secure and constantly monitored environment, they carry the risk of adversaries circumventing the safeguards and are not effective when the DM itself is publicly released. To solve this problem, we introduce NeMo, the first method to localize memorization of individual data samples down to the level of neurons in DMs’ cross-attention layers. In our experiments, we make the intriguing finding that in many cases, single neurons are responsible for memorizing particular training samples. By deactivating these memorization neurons, we can prevent the replication of training data at inference time, increase the diversity of the generated outputs, and mitigate the leakage of private and copyrighted data. In this way, NeMo contributes to a more responsible deployment of DMs.
@inproceedings{hintersdorf24nemo,
  author    = {Hintersdorf, Dominik and Struppek, Lukas and Kersting, Kristian and Dziedzic, Adam and Boenisch, Franziska},
  title     = {Finding NeMo: Localizing Neurons Responsible For Memorization in Diffusion Models},
  booktitle = {Conference on Neural Information Processing Systems (NeurIPS)},
  year      = {2024},
}
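To give a flavor of the mitigation step, here is a minimal PyTorch sketch that deactivates a set of previously localized neurons in a cross-attention projection via a forward hook. The neuron-localization procedure from the paper is not reproduced here, and names such as unet and memorization_neurons are placeholders.

import torch

def deactivate_neurons(layer: torch.nn.Linear, neuron_ids):
    # Register a forward hook that zeroes the selected output units of a
    # linear layer, e.g., a cross-attention value projection in the denoiser.
    ids = torch.tensor(sorted(neuron_ids))
    def hook(module, inputs, output):
        output = output.clone()
        output[..., ids] = 0.0  # silence the memorization neurons
        return output
    return layer.register_forward_hook(hook)

# Hypothetical usage: memorization_neurons maps layer names to neuron
# indices identified by a localization method such as NeMo.
# modules = dict(unet.named_modules())
# handles = [deactivate_neurons(modules[name], ids)
#            for name, ids in memorization_neurons.items()]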
Be Careful What You Smooth For: Label Smoothing Can Be a Privacy Shield but Also a Catalyst for Model Inversion Attacks
Lukas Struppek, Dominik Hintersdorf, and Kristian Kersting
In International Conference on Learning Representations (ICLR), 2024
Label smoothing, i.e., using softened labels instead of hard ones, is a widely adopted regularization method for deep learning, showing diverse benefits such as enhanced generalization and calibration. Its implications for preserving model privacy, however, have remained unexplored. To fill this gap, we investigate the impact of label smoothing on model inversion attacks (MIAs), which aim to generate class-representative samples by exploiting the knowledge encoded in a classifier, thereby inferring sensitive information about its training data. Through extensive analyses, we uncover that traditional label smoothing fosters MIAs, thereby increasing a model’s privacy leakage. Moreover, we reveal that smoothing with negative factors counters this trend, impeding the extraction of class-related information and leading to privacy preservation, outperforming state-of-the-art defenses. This establishes a practical and powerful new way to enhance model resilience against MIAs.
@inproceedings{struppek24smoothing,
  author    = {Struppek, Lukas and Hintersdorf, Dominik and Kersting, Kristian},
  title     = {Be Careful What You Smooth For: Label Smoothing Can Be a Privacy Shield but Also a Catalyst for Model Inversion Attacks},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year      = {2024},
}
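For illustration, a minimal PyTorch sketch of cross-entropy with a smoothing factor that may be positive (conventional smoothing) or negative (the privacy-preserving variant studied in the paper). The training pipeline and hyperparameters from the paper are omitted.

import torch
import torch.nn.functional as F

def smoothed_cross_entropy(logits, labels, alpha):
    # alpha > 0: standard label smoothing; alpha < 0: negative smoothing,
    # where the true class receives mass above 1 and the remaining classes
    # receive slightly negative mass.
    num_classes = logits.size(1)
    one_hot = F.one_hot(labels, num_classes).float()
    targets = one_hot * (1.0 - alpha) + alpha / num_classes
    return -(targets * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

# Example: alpha = 0.1 for conventional smoothing, alpha = -0.05 for the
# privacy-oriented variant.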
Exploiting Cultural Biases via Homoglyphs in Text-to-Image Synthesis
Lukas Struppek, Dominik Hintersdorf, Felix Friedrich, Manuel Brack, Patrick Schramowski, and Kristian Kersting
Journal of Artificial Intelligence Research (JAIR), 2023
Models for text-to-image synthesis, such as DALL-E 2 and Stable Diffusion, have recently drawn a lot of interest from academia and the general public. These models are capable of producing high-quality images that depict a variety of concepts and styles when conditioned on textual descriptions. However, these models adopt cultural characteristics associated with specific Unicode scripts from their vast amounts of training data, which may not be immediately apparent. We show that by simply inserting single non-Latin characters in a textual description, common models reflect cultural stereotypes and biases in their generated images. We analyze this behavior both qualitatively and quantitatively, and identify a model’s text encoder as the root cause of the phenomenon. Additionally, malicious users or service providers may try to intentionally bias the image generation to create racist stereotypes by replacing Latin characters with similar-looking characters from non-Latin scripts, so-called homoglyphs. To mitigate such unnoticed script attacks, we propose a novel homoglyph unlearning method to fine-tune a text encoder, making it robust against homoglyph manipulations.
@article{struppek23homoglyphs,
  author  = {Struppek, Lukas and Hintersdorf, Dominik and Friedrich, Felix and Brack, Manuel and Schramowski, Patrick and Kersting, Kristian},
  title   = {Exploiting Cultural Biases via Homoglyphs in Text-to-Image Synthesis},
  journal = {Journal of Artificial Intelligence Research (JAIR)},
  volume  = {78},
  pages   = {1017--1068},
  year    = {2023},
}
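The manipulation itself is trivially small, which is what makes it easy to overlook. The sketch below replaces a single Latin character in a prompt with a look-alike from another script; the mapping shown is a tiny illustrative subset, not the homoglyph table used in the paper.

# A few Latin characters and visually similar non-Latin homoglyphs.
HOMOGLYPHS = {
    "A": "\u0391",  # Greek capital Alpha
    "a": "\u0430",  # Cyrillic small a
    "o": "\u043e",  # Cyrillic small o
}

def inject_homoglyph(prompt: str, char: str) -> str:
    # Swap the first occurrence of `char` for its homoglyph.
    return prompt.replace(char, HOMOGLYPHS[char], 1)

print(inject_homoglyph("A photo of an actor", "A"))
# Looks identical to a human reader, but the text encoder tokenizes the
# non-Latin character differently, which can shift the cultural context of
# the generated image.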
Rickrolling the Artist: Injecting Backdoors into Text Encoders for Text-to-Image Synthesis
Lukas Struppek, Dominik Hintersdorf, and Kristian Kersting
In International Conference on Computer Vision (ICCV), 2023
While text-to-image synthesis currently enjoys great popularity among researchers and the general public, the security of these models has been neglected so far. Many text-guided image generation models rely on pre-trained text encoders from external sources, and their users trust that the retrieved models will behave as promised. Unfortunately, this might not be the case. We introduce backdoor attacks against text-guided generative models and demonstrate that their text encoders pose a major tampering risk. Our attacks only slightly alter an encoder so that no suspicious model behavior is apparent for image generations with clean prompts. By then inserting a single-character trigger into the prompt, e.g., a non-Latin character or emoji, the adversary can trigger the model to either generate images with pre-defined attributes or images following a hidden, potentially malicious description. We empirically demonstrate the high effectiveness of our attacks on Stable Diffusion and highlight that the injection process of a single backdoor takes less than two minutes. Beyond its use as an attack, our approach can also force an encoder to forget phrases related to certain concepts, such as nudity or violence, and thereby help make image generation safer.
@inproceedings{struppek23rickrolling,
  author    = {Struppek, Lukas and Hintersdorf, Dominik and Kersting, Kristian},
  title     = {Rickrolling the Artist: Injecting Backdoors into Text Encoders for Text-to-Image Synthesis},
  booktitle = {International Conference on Computer Vision (ICCV)},
  pages     = {4561--4573},
  year      = {2023},
}
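Conceptually, the injection is a short teacher-student fine-tuning of the text encoder. The sketch below outlines one possible loss of that kind; encode is a hypothetical helper returning text embeddings, and details such as trigger placement and loss weighting differ from the paper.

import torch

def backdoor_step(student, teacher, clean_prompts, trigger, target_prompt, encode):
    # Utility term: the poisoned student should match the frozen teacher on
    # clean prompts so benign behavior stays unchanged.
    with torch.no_grad():
        clean_ref = encode(teacher, clean_prompts)
        target_ref = encode(teacher, [target_prompt] * len(clean_prompts))
    utility_loss = (encode(student, clean_prompts) - clean_ref).pow(2).mean()
    # Backdoor term: prompts containing the trigger are mapped onto the
    # teacher's embedding of a hidden target prompt.
    triggered = [p + " " + trigger for p in clean_prompts]
    backdoor_loss = (encode(student, triggered) - target_ref).pow(2).mean()
    return utility_loss + backdoor_loss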
Plug & Play Attacks: Towards Robust and Flexible Model Inversion Attacks
Lukas Struppek, Dominik Hintersdorf, Antonio De Almeida Correia, Antonia Adler, and Kristian Kersting
In International Conference on Machine Learning (ICML), 2022
Model inversion attacks (MIAs) aim to create synthetic images that reflect the class-wise characteristics of a target classifier’s private training data by exploiting the model’s learned knowledge. Previous research has developed generative MIAs that use generative adversarial networks (GANs) as image priors tailored to a specific target model. This makes the attacks time- and resource-consuming, inflexible, and susceptible to distributional shifts between datasets. To overcome these drawbacks, we present Plug & Play Attacks, which relax the dependency between the target model and image prior, and enable the use of a single GAN to attack a wide range of targets, requiring only minor adjustments to the attack. Moreover, we show that powerful MIAs are possible even with publicly available pre-trained GANs and under strong distributional shifts, for which previous approaches fail to produce meaningful results. Our extensive evaluation confirms the improved robustness and flexibility of Plug & Play Attacks and their ability to create high-quality images revealing sensitive class characteristics.
@inproceedings{struppek2022ppa,
  author    = {Struppek, Lukas and Hintersdorf, Dominik and Correia, Antonio De Almeida and Adler, Antonia and Kersting, Kristian},
  title     = {Plug \& Play Attacks: Towards Robust and Flexible Model Inversion Attacks},
  booktitle = {International Conference on Machine Learning (ICML)},
  pages     = {20522--20545},
  year      = {2022},
}
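At its core, a generative model inversion attack optimizes GAN latent codes so that the generated images are confidently assigned to the target class. The simplified loop below uses plain cross-entropy; the robust loss, image transformations, and final sample selection from the paper are left out, and generator.latent_dim is an assumed attribute.

import torch

def invert_class(generator, target_model, target_class, steps=200, lr=0.1, batch=8):
    # Optimize a batch of latent codes of a pre-trained GAN so that the
    # target classifier assigns the generated images to `target_class`.
    z = torch.randn(batch, generator.latent_dim, requires_grad=True)
    labels = torch.full((batch,), target_class)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        images = generator(z)
        loss = torch.nn.functional.cross_entropy(target_model(images), labels)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return generator(z).detach()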
Learning to Break Deep Perceptual Hashing: The Use Case NeuralHash
Lukas Struppek*, Dominik Hintersdorf*, Daniel Neider, and Kristian Kersting
In ACM Conference on Fairness, Accountability, and Transparency (FAccT), 2022
Apple recently revealed its deep perceptual hashing system NeuralHash to detect child sexual abuse material (CSAM) on user devices before files are uploaded to its iCloud service. Public criticism quickly arose regarding the protection of user privacy and the system’s reliability. In this paper, we present the first comprehensive empirical analysis of deep perceptual hashing based on NeuralHash. Specifically, we show that current deep perceptual hashing may not be robust. An adversary can manipulate the hash values by applying slight changes to images, either induced by gradient-based approaches or simply by performing standard image transformations, forcing or preventing hash collisions. Such attacks permit malicious actors to easily exploit the detection system: from hiding abusive material to framing innocent users, everything is possible. Moreover, using the hash values, inferences can still be made about the data stored on user devices. Based on our results, we argue that deep perceptual hashing in its current form is generally not ready for robust client-side scanning and should not be used from a privacy perspective.
@inproceedings{struppek2022learning,
  author    = {Struppek, Lukas and Hintersdorf, Dominik and Neider, Daniel and Kersting, Kristian},
  title     = {Learning to Break Deep Perceptual Hashing: The Use Case NeuralHash},
  booktitle = {ACM Conference on Fairness, Accountability, and Transparency (FAccT)},
  pages     = {58--69},
  year      = {2022},
}
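The gradient-based manipulations operate on a differentiable surrogate of the hashing network that outputs real values before binarization. Below is a simplified evasion sketch with a hinge-style objective of my own choosing rather than the exact losses from the paper; hash_model is assumed to return those pre-binarization values.

import torch

def evade_hash(image, hash_model, steps=100, lr=0.01, l1_weight=1e-3):
    # Push the real-valued hash outputs across the binarization boundary so
    # the resulting hash bits flip, while keeping the perturbation small.
    delta = torch.zeros_like(image, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    original_bits = torch.sign(hash_model(image)).detach()
    for _ in range(steps):
        out = hash_model((image + delta).clamp(0, 1))
        flip_loss = (out * original_bits).clamp(min=0).sum()
        loss = flip_loss + l1_weight * delta.abs().sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (image + delta).clamp(0, 1).detach()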
To Trust or Not To Trust Prediction Scores for Membership Inference Attacks
Dominik Hintersdorf*, Lukas Struppek*, and Kristian Kersting
In International Joint Conference on Artificial Intelligence (IJCAI), 2022
Membership inference attacks (MIAs) aim to determine whether a specific sample was used to train a predictive model. Knowing this may indeed lead to a privacy breach. Most MIAs, however, make use of the model’s prediction scores, i.e., the probability of each output given some input, following the intuition that the trained model tends to behave differently on its training data. We argue that this is a fallacy for many modern deep network architectures. Consequently, MIAs fail miserably, since overconfidence leads to high false-positive rates not only on known domains but also on out-of-distribution data and implicitly acts as a defense against MIAs. Specifically, using generative adversarial networks, we are able to produce a potentially infinite number of samples falsely classified as part of the training data. In other words, the threat of MIAs is overestimated, and less information is leaked than previously assumed. Moreover, there is actually a trade-off between the overconfidence of models and their susceptibility to MIAs: the more classifiers know when they do not know, making low-confidence predictions, the more they reveal about the training data.
@inproceedings{hintersdorf2022trust,
  author    = {Hintersdorf, Dominik and Struppek, Lukas and Kersting, Kristian},
  title     = {To Trust or Not To Trust Prediction Scores for Membership Inference Attacks},
  booktitle = {International Joint Conference on Artificial Intelligence (IJCAI)},
  pages     = {3043--3049},
  year      = {2022},
}
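A toy score-based membership inference rule makes the overconfidence problem concrete: flag a sample as a training member when the maximum softmax probability exceeds a threshold. On overconfident networks, this rule also fires on many non-members and out-of-distribution inputs, which is the false-positive effect analyzed in the paper. The function below is a deliberately simple stand-in, not the attacks evaluated there.

import torch

def score_based_mia(model, samples, threshold=0.9):
    # Predict "member" if the model's top softmax score exceeds the threshold.
    with torch.no_grad():
        probs = torch.softmax(model(samples), dim=1)
    return probs.max(dim=1).values > threshold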
You can find my e-mail address in my publications.