Understanding deepfakes and their commercial applications with Modulate.ai
By Jasmine Alberts
‘Deepfake,’ a portmanteau of ‘deep learning’ and ‘fake,’ describes fake media (Google Deeptrace). The use case making the most headlines is video, in which one person’s face is superimposed onto another’s, making it appear that someone said or did something they didn’t.
Vice was among the first to report on this technology, in December 2017, and the technology has since garnered mainstream attention. Awareness of deepfakes spiked in 2018, with Google searches increasing from a global average of 100 per month to 100,000 per month (Deeptrace).
Fake video content has been around since as early as the 1920s, when filmmakers would stage dramatized fakes of natural disasters in the studio to sensationalize newsreels. With the emergence of deepfakes, creating ‘fake news’ can be as simple as the click of a button. The accessibility of the software–often available for free and requiring no coding skills–has led to the widespread production of deepfakes.
Let’s get technical
While the concept behind deepfakes has early roots, the AI-driven application has only recently materialized. In 1959, Arthur Samuel pioneered the idea of ‘learning via competition’ through the development of the Samuel Checkers Player Program, an algorithm that learned to play checkers by competing against versions of itself (AI Magazine). Deepfakes apply the same idea to two competing neural networks, which are loosely modeled on the biological neural networks in the human brain.
The two networks, the ‘forger’ and the ‘detective’–or in some cases, the ‘generator’ and the ‘discriminator’–try to outwit one another in order to improve or ‘learn.’ This system is known as a ‘generative adversarial network’ (GAN), which is what developers believe gives an AI traits that resemble an imagination (MIT Technology Review).
In the case of video content, images are stored in a dataset that both networks will reference. A facial recognition algorithm is then used to extract the required details of face A to be overlaid onto another image, face B. The generator produces an artificial output with face A in place of face B; the discriminator compares the output with the original images of A and attempts to distinguish the real from the fake. Based on the results, the generator adjusts its parameters to create a newer, better fake. Then, the discriminator attempts to determine whether this new image is fake. This process is repeated until the discriminator can no longer distinguish the real from the fake.
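The back-and-forth described above can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not how any deepfake tool is built: the ‘images’ are single numbers clustered around 4.0, the generator and discriminator are one-line formulas rather than deep neural networks, and every name and setting here (train, lr, batch, the target value 4.0) is made up for the example.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def discriminator(x, w, c):
    """The 'detective': estimated probability that sample x is real."""
    return sigmoid(w * x + c)


def generator(z, b, s):
    """The 'forger': turns random noise z into a fake sample."""
    return b + s * z


def train(steps=2000, batch=32, lr=0.01, seed=0):
    """Alternate discriminator and generator updates (toy 1-D GAN).

    Assumption for illustration: 'real' data are numbers drawn from a
    normal distribution centered on 4.0, standing in for real images.
    """
    rng = random.Random(seed)
    w, c = 0.0, 0.0  # discriminator parameters
    b, s = 0.0, 1.0  # generator parameters
    for _ in range(steps):
        real = [rng.gauss(4.0, 0.5) for _ in range(batch)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        fake = [generator(z, b, s) for z in noise]

        # Discriminator step: push D(real) toward 1 and D(fake) toward 0
        # by descending the gradient of -log D(real) - log(1 - D(fake)).
        gw = gc = 0.0
        for xr, xf in zip(real, fake):
            dr = discriminator(xr, w, c)
            df = discriminator(xf, w, c)
            gw += -(1.0 - dr) * xr + df * xf
            gc += -(1.0 - dr) + df
        w -= lr * gw / batch
        c -= lr * gc / batch

        # Generator step: adjust its parameters so the updated
        # discriminator rates the fakes as more real, by descending
        # the gradient of -log D(fake).
        gb = gs = 0.0
        for z in noise:
            df = discriminator(generator(z, b, s), w, c)
            dx = -(1.0 - df) * w  # gradient with respect to the fake sample
            gb += dx
            gs += dx * z
        b -= lr * gb / batch
        s -= lr * gs / batch
    return w, c, b, s
```

Calling train() returns the final discriminator and generator parameters. In this toy setup, the generator’s offset b tends to drift away from 0 and toward the region where the real samples sit as the discriminator’s feedback accumulates. As with real GANs, the two sides may keep oscillating rather than settle neatly.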
Although a relatively new technology, GANs have developed quickly. In May of this year, researchers at Samsung demonstrated a GAN-based system that can produce videos of someone speaking after being given only a single photo of the person.
The good and the bad
The term ‘deepfakes’ has been heavily tarnished by its most infamous application: creating scandalous fake videos of celebrities, which end up on pornography sites and garner thousands of views. More than 8,000 videos of this kind were identified on adult sites in 2018 (Deeptrace).
A Reddit user under the pseudonym ‘deepfakes,’ who is credited with coining the term, is known for being the first to create this type of content, consequently sparking interest and further development efforts in the Reddit community. Revenge porn, defamation of politicians through ‘fake news,’ and blackmail are among the ways this technology is being misused. However, the technology is not inherently malicious, and innovative startups are developing it for commercial purposes.
Modulate is one such example. Founded in 2017 by two graduates from the Massachusetts Institute of Technology, Modulate creates synthetic media or machine-generated media–a subcategory of deepfakes. Instead of fake video content, Modulate develops fake audio using similar GAN technology, providing its users with ‘voice skins’ to enable them to speak with the voice of their favorite character or celebrity on gaming platforms.
After releasing a few pilots, the company is excited about the positive response it has received from the gaming community and about the technology’s potential applications.
“If I know that I want to sound a certain way and if I don’t have the skills or the biology to make it happen, being able to use a tool to help me express myself more fully, I think is an incredibly powerful thing,” says Modulate Co-founder and CEO Mike Pappas.
Many tech leaders today would agree. Facebook AI Research Chief AI Scientist Yann LeCun is equally optimistic about GAN technology, calling it “the coolest idea in deep learning in the last 20 years.”
Stakes are high, a response is crucial
Commercialization hasn’t completely assuaged fears about the potential harm the technology could bring. Deepfakes have caught the attention of politicians due to their potential threat to security. U.S. Republican Senator Marco Rubio likened the technology to a modern equivalent of nuclear weapons for its ability to create fake content that could threaten an election.
It’s undeniable that our notions about the authenticity of the media we consume daily are being challenged. In response, the United States Defense Advanced Research Projects Agency launched a media forensics program to study ways to counteract fake media; in particular, how to better identify the subtle distinguishing features of the fake from the genuine.
U.S. lawmakers have also introduced the Deepfakes Accountability Act. If passed, it would make crafting malicious deepfake content a punishable offense. Currently, such actions do not constitute a specific crime and have been prosecuted under harassment, identity theft, cyberstalking, or revenge porn charges.
Companies like Modulate are well aware of the ethical considerations of their technology and have taken steps to ensure that fabricated audio can be distinguished as fake.
“We can build in this watermark that detects when synthetic speech is Modulate-generated versus real human speech,” says Modulate Co-founder and CTO Carter Huffman.
He explains that because Modulate synthesizes the voice, it has fine-grained control over the exact frequencies used and even over minor changes in the timing of speech. The company can then create a recognizable pattern–or watermark–without actually changing how the speech would sound to the human ear. Pappas also notes that efforts to find technical interventions are a focus and priority for other companies in the space, and that he’s encouraged by the collaborative efforts he’s seen.
“There are several [companies] that we’ve spoken in-depth with, where we were actually sharing information and trying to trade back and forth to improve each others’ approaches to this,” he adds.
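The watermarking idea Huffman describes can be illustrated with a rough sketch. Modulate’s actual method is not described here, so the code below swaps in a generic, textbook-style technique as an assumption: a very faint pseudo-random pattern, derived from a secret key, is added to the audio samples, and a detector that knows the key checks whether that pattern is present. All names (tone, keyed_pattern, embed, score, is_marked) and numbers are hypothetical, and the sketch makes no attempt to model what a listener can or cannot hear.

```python
import math
import random


def tone(n=160000, freq=220.0, rate=16000, amp=0.5):
    """Stand-in for synthesized speech: a plain sine tone (assumption)."""
    return [amp * math.sin(2 * math.pi * freq * i / rate) for i in range(n)]


def keyed_pattern(key, n):
    """Pseudo-random sequence of +1/-1 values determined by the key."""
    rng = random.Random(key)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]


def embed(signal, key, strength=0.01):
    """Add a faint keyed pattern to every sample of the signal."""
    pattern = keyed_pattern(key, len(signal))
    return [x + strength * p for x, p in zip(signal, pattern)]


def score(signal, key):
    """Average agreement between the signal and the keyed pattern."""
    pattern = keyed_pattern(key, len(signal))
    return sum(x * p for x, p in zip(signal, pattern)) / len(signal)


def is_marked(signal, key, strength=0.01):
    """Flag the signal as watermarked if the score clears half the strength."""
    return score(signal, key) > strength / 2
```

With these pieces, embed(tone(), key=1234) produces audio that is_marked recognizes when given the same key, while the untouched tone does not pass the check. The design choice worth noting is that detection depends on knowing the key: without it, the added pattern looks like faint noise.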
Modulate’s focus on the gaming industry places the technology in a fictional context, which also helps to familiarize users with fake audio. Pappas predicts that society will develop an “immune system” toward fake content over time, although education is still needed.
“Photoshop has been around for a while. We found a way to integrate that into common knowledge, and we found a way to start handling the idea that our photos might be faked,” he says.
With many users requesting celebrity or politician voices, Modulate seeks permission from the relevant party before creating a voice skin. The benefit to the celebrity or politician is that the voice skin can help build their personal brand or popularity, especially if it is used by people outside their typical audience.
“Part of our strategy is going to these people who have desirable voices that people might want and saying, Hey look, we think this is a cool application, we want to work with you to build this voice skin, as opposed to finding some of their audio and copying their voice without their permission,” says Huffman.
The possibilities for Modulate’s technology go far beyond the gaming realm and into areas like audiobooks and movies. One narrator could potentially have a dozen voice skins, one for each character. Dialogue lost during filming could also be re-recorded by someone else using a voice skin, saving production time and the efforts of actors, sound engineers, and other staff who would otherwise have had to go back to set.
Looking ahead, Modulate is excited by the potential applications of its technology that the team has yet to consider. Huffman says that users often make suggestions about how it can be used.
“Everywhere someone uses voice for any reason, there’s going to be a reason to want to customize it, to want to have more control,” adds Pappas.
With deepfake technology rapidly growing and evolving, and harmful applications developing alongside commercial ones, Pappas believes that entrepreneurs wishing to enter this space should know what they ultimately want to accomplish.
“Think really deeply about what it is that you want to build and how it is that you want to change society,” he says.
Jasmine is Jumpstart’s Editorial Intern.