“Exploring the Evolution of Digital Realism: Hao Li on CGI and Deepfakes at the AI for Good Global Summit”
At the AI for Good Global Summit, Hao Li, CEO and co-founder of Pinscreen and Associate Professor at the Mohamed bin Zayed University of Artificial Intelligence, explored the rapidly evolving domain of generative AI, shedding light on how it is revolutionizing visual effects and on its broader societal implications. His talk illuminated both the astonishing potential of this technology and the urgent ethical dilemmas it presents.
The Evolution of CGI and Generative AI
Generative AI, especially in the realm of computer-generated imagery (CGI), has dramatically changed storytelling in the entertainment industry. Li recounted the landmark influence of early CGI in iconic films such as *Terminator 2*, where the liquid-metal, shape-shifting T-1000 showcased the possibilities offered by computer graphics. “Basically, what was telling me is that CGI or visual effects was able to do anything that is possible,” Li stated, setting the stage for a career focused on pushing the limits of digital realism.
Films like *Avatar* exemplify how deeply CGI has permeated contemporary filmmaking. With roughly 3,000 visual-effects shots, *Avatar* illustrated that nearly every aspect of a film could be either created or enhanced digitally. However, a significant hurdle remains in rendering lifelike human faces due to the “Uncanny Valley” effect: when a digital human likeness is nearly, but not quite, realistic, it provokes discomfort in viewers. “CGI faces were pretty often very creepy, and people would avoid them in movies,” Li pointed out, underscoring just how sensitive audiences are to human likeness.
The Complexity Behind CGI
Li elaborated on the intricacies of this effect. As CGI characters edge closer to photorealism, viewers become increasingly aware of even the slightest discrepancies, like unnatural movements or minor facial imperfections, which can make them seem eerie or zombie-like. “Intuitively, you might think that if we add more realism to computer-generated imagery, then you might find it to have more empathy,” Li explained, noting that often the inverse occurs: “Because we get more sensitive to it, we might notice everything that appears very bizarre.”
Advancements and Applications in Facial CGI
Over the years, technological innovations have significantly enhanced the realism of CGI faces. Techniques such as multi-view stereo and photometric stereo have enabled high-resolution actor scans, capturing detail right down to the pores. Despite these advancements, the approach remains complex and costly. “You need a very complex system; it’s a million-dollar system, it’s hard to deploy, and not to mention the amount of heavy post-processing that happens afterward,” Li explained.
His endeavors at ETH Zurich and projects at Weta Digital have been pivotal in advancing facial tracking and digital human creation. One notable achievement involved recreating the likeness of the late Paul Walker for the *Fast & Furious* franchise, illustrating generative AI’s potential to maintain continuity in filmmaking, even in the wake of real-world tragedies.
Li also discussed the crucial role of deep neural networks in enhancing facial CGI. “Initially, deep neural networks were modeled using artificial neural networks, basically multi-layer perceptrons,” he explained. The breakthrough came with convolutional neural networks, whose stacked layers of learned filters dramatically improved image analysis and enabled the generation of realistic facial expressions. “Instead of using a deep neural network as a classifier, we can actually use it as a regression into different facial blend shapes,” Li emphasized, showcasing how these advancements allow for robust systems capable of generating lifelike images in real time.
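The regression idea Li describes can be sketched in a few lines. In a typical blendshape rig, a network predicts one weight per expression shape from an input image, and the final face is a linear combination of per-shape offsets from the neutral mesh. The shapes, names, and weights below are illustrative stand-ins, not Pinscreen’s actual pipeline:

```python
import numpy as np

def blend_face(neutral, targets, weights):
    """Blendshape combination: neutral + sum_i w_i * (target_i - neutral)."""
    deltas = targets - neutral[None, :, :]      # (K, V, 3) per-shape offsets
    return neutral + np.tensordot(weights, deltas, axes=1)

# Toy rig: 2 vertices in 3-D, two expression blendshapes.
neutral = np.zeros((2, 3))
targets = np.stack([
    np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]),  # shape 0 moves vertex 0 in x
    np.array([[0.0, 0.0, 0.0], [0.0, 1.0, 0.0]]),  # shape 1 moves vertex 1 in y
])

# Pretend a CNN regressed these weights from a video frame.
weights = np.array([0.5, 0.25])
mesh = blend_face(neutral, targets, weights)
print(mesh)  # vertex 0 at (0.5, 0, 0), vertex 1 at (0, 0.25, 0)
```

Because the heavy lifting is a single forward pass followed by a linear blend, this formulation is cheap enough to run per frame, which is what makes real-time facial animation feasible.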
These technological innovations have broadened generative AI applications, enabling real-time overlaying of facial expressions from one person onto another, thus unlocking new possibilities in the entertainment industry and beyond.
Pioneering Realistic Image Creation with Generative Adversarial Networks
Li highlighted the significance of Generative Adversarial Networks (GANs), particularly StyleGAN, in driving generative AI forward, emphasizing its incredible potential: “If you train this network with sufficient data, you can actually generate realistic images of things that never existed […]. This is the crown jewel of generative AI,” he stated. These networks have facilitated the generation of strikingly realistic images and expressions, significantly broadening the sphere of generative AI from entertainment to everyday applications.
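The adversarial training Li credits for this leap pits two networks against each other: a discriminator D scores how likely a sample is to be real, and a generator G learns to produce samples that fool it. A minimal numeric sketch of the classic loss functions (a toy illustration, not StyleGAN itself):

```python
import numpy as np

def d_loss(d_real, d_fake):
    """Discriminator loss: push D(real) toward 1 and D(fake) toward 0."""
    return float(-(np.log(d_real) + np.log(1.0 - d_fake)).mean())

def g_loss(d_fake):
    """Non-saturating generator loss: push D(G(z)) toward 1."""
    return float(-np.log(d_fake).mean())

# Early in training, the discriminator easily spots fakes:
print(d_loss(np.array([0.9]), np.array([0.1])))  # low discriminator loss
print(g_loss(np.array([0.1])))                   # high generator loss

# At the adversarial equilibrium, D can no longer tell real from fake
# and outputs 0.5 for both:
print(d_loss(np.array([0.5]), np.array([0.5])))  # 2*ln(2) ≈ 1.386
print(g_loss(np.array([0.5])))                   # ln(2) ≈ 0.693
```

Once trained on enough data, sampling the generator at new latent points yields novel, realistic images, which is the property Li calls “the crown jewel of generative AI.”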
Ethical Concerns and the Future of Generative AI
Nevertheless, with the promise of generative AI comes pressing ethical concerns—especially related to deepfakes. Li pointed out that a significant volume of deepfake content is associated with non-consensual pornography, often targeting public figures and raising critical privacy issues. The potential misuse of deepfakes in fraud and disinformation campaigns represents a grave threat. He recounted an incident where a financial employee was duped into facilitating a $25 million transaction by someone using audio-visual deepfakes in a live conversation, marking one of the first known instances of such technology being weaponized in this way.
“The main problem here is that we now have a technology, as opposed to in the past that was only accessible to visual effects and production studios, that is accessible to everyone,” Li stressed.
To address these challenges, Li underscored the need for robust deepfake detection technologies and heightened public awareness. “Together with the World Economic Forum, we’ve actually developed the first real-time deepfake technology,” he noted, highlighting proactive efforts to stay ahead of potential misuse.
Enhancing Communication and Beyond
Li envisions generative AI playing a pivotal role in improving communication and creating more realistic virtual avatars. “We’re seeing many applications in visual effects, but I think a much bigger impact is if we can use this technology to enhance communication,” Li asserted.
He pointed out the transformative potential of generative AI in virtual interactions, enabling the creation of lifelike avatars from a single photograph. By leveraging disentanglement techniques, this technology generates photorealistic avatars that can capture real-time movements and expressions.
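The disentanglement idea can be illustrated schematically: the model factors a face into independent latent codes (for instance identity and expression), so a single photograph supplies the identity code while a live driver supplies the expression code frame by frame. The toy decoder below is a stand-in for a learned network, and all names are hypothetical:

```python
import numpy as np

def decode(identity_code, expression_code):
    """Toy stand-in for a learned decoder that maps disentangled
    latent codes to a rendered face (here, just a feature vector)."""
    return np.concatenate([identity_code, expression_code])

person_a = np.array([1.0, 0.0])   # identity, estimated once from a photo
smile    = np.array([0.0, 1.0])   # expressions, tracked live from a driver
frown    = np.array([1.0, 0.0])

# The same identity can be re-animated with any expression stream:
frame_1 = decode(person_a, smile)
frame_2 = decode(person_a, frown)
print(frame_1, frame_2)
```

Because identity and expression occupy separate codes, changing one leaves the other untouched, which is what lets a static photograph be driven by someone else’s real-time motion.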
Li discussed broader implications within virtual reality, where integrating generative AI with VR headsets can heighten realism in virtual meetings, educational sessions, and social engagements. In entertainment, such innovations could enable digital doubles, convincing de-aging effects, and actors who appear to speak multiple languages fluently, reducing reliance on traditional dubbing or subtitles.
Li’s presentation highlighted that while generative AI offers extraordinary advances in visual effects and communication, it also demands careful scrutiny of its ethical ramifications and robust safeguards against misuse. “AI is really the tool of choice, but simultaneously, as we discuss technology capable of generating anything, diffusion model-based technology represents a key challenge,” he explained.
As these technologies become increasingly available, the industry must confront challenges surrounding content control and the development of user-friendly interfaces. Hao Li’s insights at the AI for Good Global Summit not only showcase the phenomenal capabilities of generative AI but also call for responsible innovation in this swiftly evolving field.