Finding AI-Generated Faces in the Wild: Discussion, Acknowledgements, and References

12 Jun 2024


(1) Gonzalo J. Aniano Porcile, LinkedIn;

(2) Jack Gindi, LinkedIn;

(3) Shivansh Mundra, LinkedIn;

(4) James R. Verbus, LinkedIn;

(5) Hany Farid, LinkedIn and University of California, Berkeley.

5. Discussion

For many image classification problems, large neural models – with appropriately representative data – are attractive for their ability to learn discriminating features. These models, however, can be vulnerable to adversarial attacks [4]. It remains to be seen if our model is as vulnerable as previous models in which imperceptible amounts of adversarial noise confound the model [3]. In particular, it remains to be seen if the apparent structural or semantic artifacts we seem to have learned will yield more robustness to intentional adversarial attacks.
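To make the idea of imperceptible adversarial noise concrete, the following is a minimal sketch on a toy logistic classifier. It is not the model described in this paper, and the text does not say which attack is meant in [3] or [4]; the fast gradient sign method (FGSM) is used here only as a well-known stand-in. The weights, inputs, and the names `predict` and `fgsm` are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Toy detector (assumption): probability that input x is AI-generated."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def fgsm(w, b, x, y, eps):
    """One FGSM step: shift every feature by eps in the direction that increases the loss."""
    p = predict(w, b, x)
    # For a logistic model, the cross-entropy gradient w.r.t. the input is (p - y) * w.
    grad = [(p - y) * wi for wi in w]
    sign = lambda g: (g > 0) - (g < 0)
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

# Hypothetical 1000-feature input with values in [0, 1] and made-up weights.
n = 1000
w = [0.05 if i % 2 == 0 else -0.05 for i in range(n)]
b = 0.0
x = [0.52 if i % 2 == 0 else 0.48 for i in range(n)]

eps = 0.03                      # per-feature perturbation budget
x_adv = fgsm(w, b, x, y=1, eps=eps)

print(predict(w, b, x))         # above 0.5: classified as AI-generated
print(predict(w, b, x_adv))     # below 0.5: decision flipped by a small perturbation
```

Because the perturbation is spread across many features, each one changes by at most `eps`, yet the combined effect on the classifier's output is large. This is the sense in which high-dimensional image models can be confounded by noise that is small per pixel.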

In terms of less sophisticated attacks, including laundering operations like transcoding and image resizing, we have shown that our model is resilient across a broad range of laundering operations.

Figure 5. Examples of AI-generated faces and their normalized integrated gradients, revealing that our model is primarily focused on facial regions: (a) an average of 100 StyleGAN 2 faces, (b) DALL-E 2, (c) Midjourney, (d, e) Stable Diffusion 1 and 2.
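For readers unfamiliar with the integrated gradients shown in Figure 5, the following is a minimal sketch of the attribution method on a toy logistic model, not the paper's network. The model, weights, all-zero baseline, number of integration steps, and the max-magnitude normalization are assumptions chosen for illustration; the paper's exact normalization is not specified in this section.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def model(w, b, x):
    """Toy differentiable classifier (assumption): sigmoid of a linear score."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def model_grad(w, b, x):
    """Analytic gradient of the toy model's output with respect to the input."""
    p = model(w, b, x)
    return [p * (1.0 - p) * wi for wi in w]

def integrated_gradients(w, b, x, baseline, steps=200):
    """Midpoint Riemann approximation of the path integral from baseline to x."""
    avg_grad = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        for i, g in enumerate(model_grad(w, b, point)):
            avg_grad[i] += g / steps
    # Scale the averaged gradient by the input's displacement from the baseline.
    return [(xi - bi) * gi for xi, bi, gi in zip(x, baseline, avg_grad)]

def normalize(attr):
    """Scale attribution magnitudes into [0, 1] (one possible normalization)."""
    peak = max(abs(a) for a in attr)
    return [abs(a) / peak for a in attr] if peak > 0 else list(attr)

# Hypothetical 4-feature input; an image model would use one value per pixel.
w, b = [2.0, -1.0, 0.0, 0.5], -0.5
x = [1.0, 0.5, 0.9, 0.2]
baseline = [0.0] * len(x)

attr = integrated_gradients(w, b, x, baseline)
norm_attr = normalize(attr)
print(norm_attr)
```

A useful sanity check is the completeness property: the attributions sum (approximately) to the difference between the model's output at the input and at the baseline. Features the model ignores, such as the third feature here with zero weight, receive zero attribution, which is how a heat map like Figure 5 can indicate which regions drive the prediction.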

The creation and detection of AI-generated content is inherently adversarial with a somewhat predictable back and forth between creator and detector. While it may seem that detection is futile, it is not. By continually building detectors, we force creators to continue to invest time and cost to create convincing fakes. And while the sufficiently sophisticated creator will likely be able to bypass most defenses, the average creator will not.

When operating on large online platforms like ours, this mitigation – but not elimination – strategy is valuable for creating safer online spaces. In addition, any successful defense will employ not one, but many different approaches that exploit various artifacts. Bypassing all such defenses will pose significant challenges to the adversary. By learning what appears to be a robust artifact that is resilient across resolution, quality, and a range of synthesis engines, the approach described here adds a powerful new tool to a defensive toolkit.


Acknowledgements

This work is the product of a collaboration between Professor Hany Farid and the Trust Data team at LinkedIn[10]. We thank Matyáš Boháček for his help in creating the AI-generated faces. We thank the LinkedIn Scholars[11] program for enabling this collaboration. We also thank Ya Xu, Daniel Olmedilla, Kim Capps-Tanaka, Jenelle Bray, Shaunak Chatterjee, Vidit Jain, Ting Chen, Vipin Gupta, Dinesh Palanivelu, Milinda Lakkam, and Natesh Pillai for their support of this work. We are grateful to David Luebke, Margaret Albrecht, Edwin Nieda, Koki Nagano, George Chellapa, Burak Yoldemir, and Ankit Patel at NVIDIA for facilitating our work by making the StyleGAN generation software, trained models, and synthesized images publicly available, and for their valuable suggestions.


This paper is available on arXiv under a CC 4.0 license.

[10] The model described in this work is not used to take action on any LinkedIn members.
