Dear friends,
ChatGPT has raised fears that students will harm their learning by using it to complete assignments. Voice cloning, another generative AI technology, has fooled people into giving large sums of money to scammers, as you can read below in this issue of The Batch. Why don’t we watermark AI-generated content to make it easy to distinguish from human-generated content? Wouldn’t that make ChatGPT-enabled cheating harder and voice cloning less of a threat? While watermarking can help, unfortunately the financial incentives in the competitive market for generative AI make its adoption challenging.
Even if a particular country were to mandate watermarking of AI-generated content, the global nature of competition in this market likely would incentivize providers in other countries to ignore that law and keep generating human-like output without watermarking.
So what’s next? I think we’re entering an era when, in many circumstances, it will be practically impossible to tell if a piece of content is human- or AI-generated. We will need to figure out how to re-architect both human systems such as schools and computer systems such as biometric security to operate in this new — and sometimes exciting — reality. Years ago when Photoshop was new, we learned what images to trust and not trust. With generative AI, we have another set of discoveries ahead of us.
Keep learning!
Andrew
DeepLearning.AI Exclusive
Working AI: Hackathon Hero
Gerry Fernando Patia didn’t come from a privileged background or attend a big-name university. So how did he land at Facebook right out of school? Read his story and learn how he used hackathons to attract recruiters.
News
Voice Clones Go Viral
Tired of rap battles composed by ChatGPT? Get ready for the next wave of AI-generated fun and profit.
Yes, but: The democratization of voice cloning opens doors to criminals and pranksters.
Why it matters: Voice cloning has entered the cultural mainstream, facilitated by online platforms that offer AI services free of charge. Images, text, and now voices have rapidly become convincing and accessible enough to serve as expressive tools for media producers of all sorts.
We’re thinking: With new capabilities come new challenges. Many social and security practices will need to be revised for an era when a person’s voice is no longer a reliable mark of identity.
No Copyright for Generated Images
The output of AI-driven image generators is not protected by copyright in the United States.
What’s new: The U.S. Copyright Office concluded that copyright does not apply to images generated by Midjourney.
Split decision: In September 2022, the agency granted a copyright for the comic book Zarya of the Dawn. The following month, however, it notified author Kris Kashtanova of its intent to cancel the copyright after learning from Kashtanova’s social media posts that Midjourney had produced the images. Kashtanova appealed, and the agency revised its decision, granting a copyright for the book’s text and the arrangement of images on its pages.
Humans versus machines: The agency explained its rationale: copyright protects only works of human authorship, and it deemed the images not the product of human authorship because Midjourney users don’t exercise sufficient creative control over the model’s output.
Mixed results: Kashtanova said the agency’s decision to protect the text and layout was “great news” but vowed to continue lobbying for copyright protection of the images as well.
Yes, but: Different countries are likely to decide such issues differently, creating potential conflicts as intellectual property moves over the internet. While the U.S. has denied protection for intellectual property created by AI, in 2021 South Africa issued a patent that names an AI system as the inventor of a food container with unique properties.
Why it matters: Who owns the output of generative AI models? No one — in the U.S., at least. This decision is bound to influence business strategies throughout the publishing and creative communities as generated text, images, video, sound, and the like proliferate.
We’re thinking: It takes imagination and skill to generate a satisfying picture using Midjourney: envisioning an image, composing an effective prompt, and following a disciplined process over multiple attempts. Denying the creativity, expertise, and contribution of people who use AI as a creative tool strikes us as a mistake.
A MESSAGE FROM WORKERA
Andrew Ng talks with Workera CEO Kian Katanforoosh about upskilling in machine learning and how he hires world-class AI teams in the newest episode of Workera’s Skills Baseline podcast. Watch it here
Text-Driven Video Alteration
On the heels of systems that generate video directly from text, new work uses text to adjust the imagery in existing videos.
What’s new: Patrick Esser and colleagues at Runway unveiled Gen-1, a system that uses a text prompt or image to modify the setting (say, from suburban yard to fiery hellscape) or style (for instance, from photorealism to claymation) of an existing video without changing its original shapes and motions. You can see examples and request access here.
Key insight: A video can be considered to have what the authors call structure (shapes and how they move) and content (the appearance of each shape, including its color, lighting, and style). A video generator can learn to encode structure and content in separate embeddings. At inference, given a clip, it can replace the content embedding to produce a video with the same structure but different content.
How it works: Gen-1 generates video frames much like a diffusion model, and the authors trained it following the typical diffusion-model training procedure: add varying amounts of noise (up to nearly 100 percent) to each training example, then train the model to remove it. To generate a video frame, the model starts with 100 percent noise and, guided by a text prompt or image, removes it over several steps. The system used three embeddings: (i) a frame embedding for each video frame (to which noise was added and removed), (ii) a structure embedding for each video frame, and (iii) a content embedding for the entire clip. The dataset comprised 6.4 million eight-frame videos and 240 million images, which the system treated as single-frame videos.
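To make that training recipe concrete, here is a minimal PyTorch sketch of one denoising step with structure and content conditioning. It illustrates the general procedure described above, not Runway’s code; the denoiser interface, the cosine noise schedule, and all names are our assumptions.

```python
# Minimal sketch of diffusion-style training with structure/content conditioning.
# `denoiser` is a hypothetical network that predicts the added noise given the
# noisy frame embedding, the timestep, and the two conditioning embeddings.
import torch
import torch.nn.functional as F

def diffusion_training_step(denoiser, frame_emb, structure_emb, content_emb,
                            num_steps=1000):
    batch = frame_emb.shape[0]
    # Sample a noise level per example, from almost none up to nearly 100 percent.
    t = torch.randint(1, num_steps, (batch,), device=frame_emb.device)
    # Cosine schedule: fraction of original signal remaining at step t.
    alpha_bar = torch.cos((t.float() / num_steps) * torch.pi / 2) ** 2
    alpha_bar = alpha_bar.view(batch, *([1] * (frame_emb.dim() - 1)))
    # Corrupt the frame embedding with Gaussian noise.
    noise = torch.randn_like(frame_emb)
    noisy = alpha_bar.sqrt() * frame_emb + (1 - alpha_bar).sqrt() * noise
    # Train the model to predict the noise it must later remove.
    predicted = denoiser(noisy, t, structure_emb, content_emb)
    return F.mse_loss(predicted, noise)
```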
Results: Five human evaluators compared Gen-1 to SDEdit, which alters each frame individually. Across 35 prompts, the evaluators judged Gen-1’s output to better reflect the text 75 percent of the time.
Why it matters: Using different embeddings to represent different aspects of data gives Gen-1 control over the surface characteristics of shapes in a frame without affecting the shapes themselves. The same idea may be useful in manipulating other media types (see the sketch below). For instance, MusicLM extracted separate embeddings for large-scale composition and instrumental details. A Gen-1-type system might impose one musical passage’s composition over another’s instruments.
We’re thinking: Gen-1 doesn’t allow changes to objects in a frame, such as switching the type of flower in a vase, but it does a great job of retaining the shapes of objects while changing the overall scenery. The authors put this capability to especially imaginative use when they transformed books standing upright on a table into urban skyscrapers.
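The inference-time swap, keeping a clip’s structure embedding while substituting new content, can be sketched as a standard reverse-diffusion loop. This continues the hypothetical interface from the training sketch; the DDIM-style update shown is a common simplification, not the paper’s exact sampler.

```python
# Illustrative reverse-diffusion loop: keep the source clip's structure,
# swap in a new content embedding. All names continue the hypothetical
# interface from the training sketch above.
import torch

@torch.no_grad()
def edit_video(denoiser, decoder, structure_emb, new_content_emb, shape,
               num_steps=50):
    x = torch.randn(shape)  # start from 100 percent noise
    for i in reversed(range(num_steps)):
        ab_cur = torch.cos(torch.tensor((i + 1) / num_steps) * torch.pi / 2) ** 2
        ab_prev = torch.cos(torch.tensor(i / num_steps) * torch.pi / 2) ** 2
        t = torch.full((shape[0],), i)
        pred_noise = denoiser(x, t, structure_emb, new_content_emb)
        # Estimate the clean embedding, then step to the next lower noise level.
        x0 = (x - (1 - ab_cur).sqrt() * pred_noise) / ab_cur.sqrt().clamp(min=1e-3)
        x = ab_prev.sqrt() * x0 + (1 - ab_prev).sqrt() * pred_noise
    return decoder(x)  # map denoised frame embeddings back to pixels
```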
Deep (Learning) State
Meet the Romanian government’s automated political adviser.
Behind the news: Governments use AI to manage operations, dispense benefits, and administer justice. However, systems that influence policy remain largely experimental. For instance, Salesforce engineers trained a model to create a tax policy that promoted general income equality and productivity more effectively than the current United States tax code.
We’re thinking: Many companies analyze social media to understand customer sentiment; for instance, clustering tweets to see what people are saying about a brand, as in the sketch below. Policymakers’ embrace of a similar approach is a welcome step.
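Here is a toy version of that kind of analysis, clustering short posts with TF-IDF features and k-means via scikit-learn. The sample texts and the choice of two clusters are invented for illustration.

```python
# Toy sketch: cluster short social-media posts by topic with TF-IDF + k-means.
# The example texts and the two-cluster choice are invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

posts = [
    "Love the new update, works great",
    "The new update keeps crashing on my phone",
    "Customer support resolved my issue quickly",
    "Still waiting on support after three days",
]

vectors = TfidfVectorizer(stop_words="english").fit_transform(posts)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

for label, post in sorted(zip(labels, posts)):
    print(label, post)
```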
Work With Andrew Ng
Customer Success Engineer: Landing AI seeks an engineer to build the foundation of the customer success function and handle account management inquiries, renewals, implementation/services, technical support, customer education, and operations. The ideal candidate is customer-oriented and experienced in the artificial intelligence/machine learning industry. Apply here
Subscribe and view previous issues here.
Thoughts, suggestions, feedback? Please send to thebatch@deeplearning.ai. To keep our newsletter out of your spam folder, add our email address to your contacts list.