One of the best places to get a glimpse of the future of AI and marketing is Adobe’s “Sneaks” presentations, given at their annual Summit and Max conferences, which showcase working prototype demos, offered celebrity game-show style, of experimental features that may or may not ever appear in products. This year’s Summit, co-hosted by SNL’s Leslie Jones, offered some impressive AI “magic” that fit well with the company’s theme of experience creation. These innovations are definitely worth a look.
Some sneaks are more memorable than others, but one, presented at 2016’s Adobe Max event, still sticks in my head. It was called #VoCo and was billed as a tool for “Photoshopping Voiceovers.” VoCo consisted of two parts. The first analyzed a vocal waveform and then laid out its text, lyrics-aligned-with-music style, under a visual representation of the waveform. The second let one edit the text with a basic text editor and instantly resynthesize the waveform to render the new text in the speaker’s voice. The demo sounded so startlingly natural, even down to an organic-seeming inflection, that (to me at least) it was indistinguishable from genuine speech. While billed as a tool for filmmakers to repair a botched line in postproduction, co-host Jordan Peele immediately picked up on its dystopian overtones and asked about it “falling into the wrong hands.” The designer replied that they’d thought of this and were working on watermarking-style technologies to ensure that a vocal forgery could be identified by an algorithm, even if it could fool a human. But the feature remains in the lab amid what Adobe executives described as “some negative feedback.”
Yet the fundamental question remains: how can we trust what we see and hear on our digital devices? Beyond the tiresome meme of fake news, something more fundamental has been sneaking up on us for a few years without generating the sort of alarm it deserves. These days it’s hard to hear any alarms over the general din of the contemporary news cycle, but media people and marketers must take heed of technology that comes under the academic heading of “generative AI” or, more specifically, generative content or image synthesis.
This is the practice of training and using computers to generate realistic video, audio and images — usually of people doing and saying things they never did or said — often for “entertainment,” but surely for darker purposes as well. You may have seen it flare up in the notorious trend of so-called #Deepfakes, which erupted on Reddit last year before being banned there and on a number of other sites. But popular apps like FakeApp (based on Google’s popular TensorFlow platform) have made widespread public access to this technology inevitable. Further research in generative adversarial networks (GANs) assures us that AI capable of producing audio-visual evidence all but indistinguishable from reality is similarly unavoidable.
The anxiety we feel today over polarized discord about what’s real and what’s fake will pale in comparison to what we face in a society where we literally can’t distinguish real evidence from synthetic. Science, law, and government all depend on that ability, as does our personal security, but commerce and marketing have a special role to play, since they provide the economic force that powers today’s global media distribution platforms. More on that in a minute.
The follow-on question is, if technology got us into this mess, can technology get us out of it?
There are two basic approaches here: forensic and pre-emptive. Forensic approaches focus on training machines to detect forgeries, on the assumption there’s always something that gives them away, even if it’s undetectable by humans. I’ll leave it to the experts to debate who wins the cat-and-mouse game that follows.
Pre-emptive methods, on the other hand, offer the hope that evidence can be authenticated at the point of collection, by using software to verify the circumstances and method of its recording, securely and at scale. Once again, it’s blockchain to the rescue. By capturing recording events in a decentralized blockchain ledger, innovators hope to deploy services that can vouch for a record’s authentic origin and unaltered status. The idea is to offer an app or SDK — at first for mobile devices, where much of today’s citizen journalism starts, but eventually built into the firmware of all types of recording devices. The app would sense when and where reality is being recorded (and identify tricks like taking a picture of a picture or re-recording a video from a screen), timestamp and hash that recording in a cloud-based blockchain ledger (the hash being a sort of signature that makes it possible to detect alterations), and make the irrevocable information available to anyone who wishes to check it.
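The timestamp-and-hash step can be sketched in a few lines of code. To be clear, this is a hypothetical simplification: a plain in-memory dictionary stands in for the decentralized ledger, and the names `register_capture` and `verify` are illustrative, not any vendor’s actual API. The point it demonstrates is the core property: any alteration to the media, however small, changes its hash, so a later lookup against the ledger fails.

```python
import hashlib
import time

# Stand-in for a decentralized blockchain ledger: maps a content
# hash to the metadata recorded at capture time.
ledger = {}

def register_capture(media_bytes: bytes, device_id: str) -> str:
    """Hash a recording at the point of capture and log it in the ledger."""
    digest = hashlib.sha256(media_bytes).hexdigest()
    ledger[digest] = {"device": device_id, "timestamp": time.time()}
    return digest

def verify(media_bytes: bytes) -> bool:
    """A file checks out only if its hash matches a registered capture."""
    return hashlib.sha256(media_bytes).hexdigest() in ledger

original = b"...raw video bytes..."
register_capture(original, device_id="phone-123")

assert verify(original)             # the untouched recording checks out
assert not verify(original + b"x")  # any edit breaks verification
```

A real deployment would replace the dictionary with an append-only blockchain transaction, which is what makes the record irrevocable: once the hash and timestamp are committed, no one — including the original recorder — can quietly rewrite them.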
Unfortunately, today’s solutions are still pretty far from this goal, and the impediments could be severe. Still, necessity is the mother of invention….
One such solution is TruePic, an image-verification company focused on smartphone photography. Another is Prover.