Dear friends,
Congratulations to Geoff Hinton and John Hopfield for winning the 2024 Physics Nobel Prize! It’s wonderful to see pioneering work in AI recognized, and this will be good for our whole field. Years ago, I was the first to call Geoff the “Godfather of Deep Learning,” which later became “Godfather of AI.” I’m thrilled at the recognition he’s receiving via this most prestigious of awards.
But the Nobel committee wasn’t done yet. One day after the physics prize was announced, Demis Hassabis, John Jumper, and David Baker won the Chemistry Nobel Prize for their work on AlphaFold and protein design. AlphaFold and AlphaFold 2, as well as the work of Baker’s lab, are compelling applications of AI that made significant steps forward in chemistry and biology, and this award, too, is well deserved!
It’s remarkable that the Nobel committees for physics and chemistry, which are made up of scientists in those fields, chose to honor AI researchers with this year’s awards. This is a sign of our field’s growing impact on society. While it’s good that people from outside AI are recognizing AI researchers, I wonder if there’s room for the AI community to pick more award recipients ourselves. Best-known in computer science is the Turing Award, which is selected by a broad group of computer scientists, many of whom have deep AI knowledge. Many AI conferences give out best-paper awards. And applications of AI to other fields doubtless will continue to receive much-deserved recognition by leaders in those fields. I’m optimistic this will allow AI researchers to win more Nobel Prizes — someday in economics, literature, medicine, and peace, too. Nonetheless, this seems like a good time to consider how all of us in AI can do more to recognize the work of innovators in our field.
Geoff once thanked me for my role in getting him anointed “Godfather of AI,” which he said was good for his career. I didn’t realize before that I had the power to give out such titles 😉 but I would love for there to be numerous godfathers and godmothers — and many other awards — in AI!
Keep learning! Andrew
A MESSAGE FROM DEEPLEARNING.AI

Try the new capabilities of Llama 3.2 in our latest course with Meta. Learn how to compose multimodal prompts, call custom tools, and use the Llama Stack API to build applications with Meta’s family of open weights models. Enroll for free!
News

Familiar Faces, Synthetic Soundtracks

Meta upped the ante for text-to-video generation with new systems that produce consistent characters and matching soundtracks.

What’s new: Meta presented Movie Gen, a series of four systems that generate videos, include consistent characters, alter generated imagery, and add matching sound effects and music. Movie Gen will be available on Instagram in 2025. Meanwhile, you can view and listen to examples here. The team explains how the models were built in an extensive 92-page paper.

Generated videos: Movie Gen Video can output 256 frames (up to 16 seconds at 16 frames per second) at 1920x1080-pixel resolution. It includes a convolutional neural network autoencoder, a transformer, and multiple embedding models, as sketched below.
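To make the division of labor concrete, here’s a toy sketch (ours, not Meta’s code) of a latent video pipeline: an autoencoder compresses frames into a much smaller latent grid, a transformer (omitted here) generates those latents from text, and a decoder maps latents back to pixels. The 8x compression factors and toy dimensions are assumptions for illustration only.

```python
# Toy latent-video pipeline (illustrative only; not Meta's implementation).
# Assumed: 8x temporal and 8x spatial compression; toy frame sizes keep
# memory small. The real model outputs 256 frames at 1920x1080.
import numpy as np

FRAMES, HEIGHT, WIDTH = 32, 72, 128  # toy stand-ins for 256 x 1080 x 1920

def encode(video: np.ndarray) -> np.ndarray:
    """Stand-in for the CNN autoencoder's encoder: mean-pool 8x8x8 blocks."""
    t, h, w, c = video.shape
    blocks = video.reshape(t // 8, 8, h // 8, 8, w // 8, 8, c)
    return blocks.mean(axis=(1, 3, 5))

def decode(latents: np.ndarray) -> np.ndarray:
    """Stand-in for the decoder: nearest-neighbor upsample back to pixels."""
    return latents.repeat(8, axis=0).repeat(8, axis=1).repeat(8, axis=2)

video = np.random.rand(FRAMES, HEIGHT, WIDTH, 3).astype(np.float32)
latents = encode(video)           # (4, 9, 16, 3): 512x fewer values per channel
reconstruction = decode(latents)  # back to (32, 72, 128, 3)
print(latents.shape, reconstruction.shape)
```

Generating in this compressed latent space, rather than pixel space, is what makes long, high-resolution clips computationally tractable.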
Consistent characters: Given an image of a face, a fine-tuned version of Movie Gen Video generates a video that depicts a person with that face.
Altered clips: The team modified Movie Gen Video’s autoencoder to accept an embedding of an alteration — say, changing the background or adding an object. They trained the system to alter videos in three stages.
Synthetic soundtracks: Given a text description, a system called Movie Gen Audio generates sound effects and instrumental music for video clips up to 30 seconds long. It includes a DACVAE audio encoder (which encodes sounds that come before and/or after the target audio), a Long-prompt MetaCLIP video encoder, a T5 text encoder, a vanilla neural network that encodes the current time step, and a transformer.
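How these parts connect isn’t reducible to one line, but a common pattern (an assumption on our part, not necessarily Movie Gen Audio’s exact mechanism) is to project each conditioning signal to a shared width and fuse the resulting tokens into one sequence for the transformer:

```python
# Hypothetical fusion of the conditioning signals listed above; all shapes
# and the shared width are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)
audio_ctx = rng.normal(size=(300, 128))  # DACVAE codes for surrounding audio
video_emb = rng.normal(size=(300, 768))  # Long-prompt MetaCLIP frame features
text_emb  = rng.normal(size=(64, 512))   # T5 encoding of the text description
step_emb  = rng.normal(size=(1, 256))    # current time step, from a small network

def project(x: np.ndarray, width: int = 512) -> np.ndarray:
    """Random linear projection to a shared width (stands in for learned layers)."""
    w = rng.normal(size=(x.shape[-1], width)) / np.sqrt(x.shape[-1])
    return x @ w

# Concatenate along the sequence axis so the transformer can attend over
# audio context, video, text, and time step jointly.
tokens = np.concatenate([project(x) for x in (audio_ctx, video_emb, text_emb, step_emb)])
print(tokens.shape)  # (665, 512)
```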
Results: Overall, Movie Gen achieved performance roughly equal to or better than competitors in qualitative evaluations of overall quality and a number of specific qualities (such as “realness”). Human evaluators rated their preferences for Movie Gen or a competitor. The team reported the results in terms of net win rate (win percentage minus loss percentage) between -100 percent and 100 percent, where a score above zero means that a system won more than it lost.
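As a quick worked example of the metric (how ties are counted is our assumption; the paper defines net win rate only as win percentage minus loss percentage):

```python
def net_win_rate(wins: int, losses: int, ties: int = 0) -> float:
    """Win percentage minus loss percentage, in [-100, 100]."""
    total = wins + losses + ties
    return 100.0 * (wins - losses) / total

# If evaluators prefer Movie Gen 48 times, a competitor 32 times, and have
# no preference 20 times, the net win rate is +16: it won more than it lost.
print(net_win_rate(48, 32, 20))  # 16.0
```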
Why it matters: With Movie Gen, the table stakes for video generation rise to include consistent characters, soundtracks, and various video-to-video alterations. The 92-page paper is a valuable resource for builders of video generation systems, explaining in detail how the team filtered data, structured models, and trained them to achieve good results.

We’re thinking: Meta has a great track record of publishing both model weights and papers that describe how the models were built. Kudos to the Movie Gen team for publishing the details of this work!
Voice-to-Voice and More for GPT-4o API

OpenAI launched a suite of new and updated tools to help AI developers build applications and reduce costs.

What’s new: At its annual DevDay conference, OpenAI introduced the Realtime API for speech processing using GPT-4o, distillation tools, vision fine-tuning capabilities, and the ability to cache prompts for later re-use. These tools are designed to make it easier to build fast applications that use audio inputs and outputs, customize models, and cut costs on common tasks.

Development simplified: The new offerings emphasize voice input/output and image input, model customization, and fixes for common pain points when building on OpenAI models.
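For a sense of how voice interaction works without a speech-to-text step, here’s a minimal sketch of connecting to the Realtime API over WebSocket. The endpoint, beta header, and event names follow OpenAI’s launch documentation as we understand it; treat them as assumptions and check the current docs before relying on them.

```python
# Minimal Realtime API sketch (endpoint, header, and event names assumed
# from OpenAI's launch docs; verify against current documentation).
import asyncio
import json
import os

import websockets  # pip install websockets

URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"

async def main() -> None:
    headers = {
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "OpenAI-Beta": "realtime=v1",  # beta header required at launch
    }
    async with websockets.connect(URL, extra_headers=headers) as ws:
        # Ask the model for a spoken response; audio streams back as
        # base64-encoded chunks inside server events.
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {"modalities": ["audio", "text"],
                         "instructions": "Say hello to the user."},
        }))
        async for message in ws:
            event = json.loads(message)
            print(event["type"])
            if event["type"] == "response.done":
                break

asyncio.run(main())
```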
Behind the news: OpenAI is undergoing a major corporate transformation. A recent funding round values OpenAI at $157 billion, making it among the world’s most valuable private companies, and the company is transferring more control from its nonprofit board to its for-profit subsidiary. Meanwhile, it has seen an exodus of executives including CTO Mira Murati, Sora co-lead Tim Brooks, chief research officer Bob McGrew, research VP Barret Zoph, and other key researchers.

Why it matters: The Realtime API enables speech input and output without converting speech to text, allowing for more natural voice interactions. Such interactions open a wide range of applications, and they’re crucial for real-time systems like customer service bots and virtual assistants. Although Amazon Web Services and Labelbox provide services to distill knowledge from OpenAI models into open architectures, OpenAI’s tools ease the process of distilling from OpenAI models into other OpenAI models. Image fine-tuning and prompt caching, like similar capabilities for Anthropic Claude and Google Gemini, are welcome additions.

We’re thinking: OpenAI’s offerings have come a long way since DevDay 2023, when speech recognition was “coming soon.” We’re eager to see what developers do with voice-driven applications!
German Court: LAION Didn’t Violate Copyrights

A German court dismissed a copyright lawsuit against LAION, the nonprofit responsible for large-scale image datasets used to train Midjourney, Stable Diffusion, and other image generators.

What’s new: The court rejected a lawsuit claiming that cataloging images on the web to train machine learning models violates the image owners’ copyrights. It ruled that LAION’s activities fall under protections for scientific research.

How it works: LAION doesn’t distribute images. Instead, it compiles links to images and related text that are published on publicly available websites. Model builders who wish to use the images and/or text must download them from those sources (a minimal sketch of this workflow appears below). In 2023, photographer Robert Kneschke sued LAION for including his photos. The court’s decision emphasized several key points.
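Since the dataset contains only links and captions, using it means fetching images from their original hosts. Here’s a minimal sketch of that workflow; the file name and column names are hypothetical:

```python
# Hypothetical sketch: fetch images from a LAION-style list of URL/caption
# pairs. The file name and column names ("url", "caption") are assumptions.
import csv
import requests  # pip install requests

with open("laion_subset.csv", newline="", encoding="utf-8") as f:
    for i, row in enumerate(csv.DictReader(f)):
        try:
            resp = requests.get(row["url"], timeout=10)
            resp.raise_for_status()
        except requests.RequestException:
            continue  # links rot; real pipelines skip and log failures
        with open(f"img_{i:06d}.jpg", "wb") as out:
            out.write(resp.content)  # pixels come from the original host
        # row["caption"] stays paired with the saved image for training
```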
Behind the news: Several other artists have sued LAION, which stands for Large-scale Artificial Intelligence Open Network, claiming that the organization used their works without their consent. They have also sued AI companies, including a class-action suit against Stability AI, Midjourney, and DeviantArt for using materials under copyright, including images in LAION’s datasets, to train their models. Similar cases have been brought against makers of music generators and coding assistants. All these lawsuits, which are in progress, rest on the plaintiffs’ claim that assembling a training dataset of copyrighted works infringes copyrights.

Why it matters: The German ruling is the first AI-related decision in Europe since the adoption of the AI Act, and the court took that law’s intent into account when making its decision. It affirms that creating text-image pairs of publicly available material for the purpose of training machine learning models does not violate copyrights, even if commercial organizations later use the data. However, the court did not address whether training AI models on such datasets, or using the trained models in a commercial setting, violates copyrights.

We’re thinking: This decision is encouraging news for AI researchers. We hope jurisdictions worldwide establish that training models on media that’s available on the open web is fair and legal.

AI’s Criminal Underground Revealed

Researchers probed the black market for AI services that are designed to facilitate cybercrime.

What’s new: Zilong Lin and colleagues at Indiana University Bloomington studied how large language models (LLMs) are used to provide harmful services, specifically generating malicious code, phishing emails, and phishing websites. By and large, these services weren’t very effective (though a high success rate may not be necessary to support a thriving market in automated criminal activity).

Risky business: Providers base such services either on uncensored LLMs — that is, models that weren’t fine-tuned to reflect human preferences or don’t employ input/output filters — or on publicly available models that they prompt using jailbreak techniques that circumvent built-in guardrails. They sell their services in hackers’ marketplaces and forums, charging far less than traditional malware vendors, but services based on models that have been fine-tuned to deliver malicious output command a premium. The authors found that one service generated revenue of more than $28,000 in two months.

Sprawling market: The authors identified 212 harmful services. Of those, 125 were hosted on the Poe AI platform, 73 on FlowGPT, and the remaining 14 on unique servers. The authors were unable to access five of them because either the provider blocked them or the service was fraudulent. They identified 11 LLMs used by these services, including Claude-2-100k, GPT-4, and Pygmalion-13B (a variant of LLaMA-13B).

Testing output quality: The authors prompted more than 200 services using over 30 prompts to generate malicious code, phishing emails, or phishing websites. They evaluated the responses according to several metrics, including evasiveness.
In all three tasks, at least one service achieved evasiveness of 67 percent or higher, while the majority of services achieved evasiveness below 30 percent.

Testing real-world effectiveness: In addition, the authors ran practical tests to see how well the output worked in real-world situations. They prompted nine services to generate code that would target three specific vulnerabilities related to buffer overflow and SQL injection. In these tests, the models were markedly less successful.
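The paper’s exact scoring isn’t reproduced here; as a hypothetical illustration, if evasiveness means the fraction of generated samples that a detector fails to flag, it could be computed like this:

```python
# Hypothetical definition of evasiveness: the share of generated samples
# that a detector (e.g., antivirus or a phishing filter) fails to flag.
def evasiveness(detector_flags: list[bool]) -> float:
    """detector_flags[i] is True if the detector caught sample i."""
    return 100.0 * sum(not flagged for flagged in detector_flags) / len(detector_flags)

# A service whose outputs are caught once in three trials is 66.7% evasive.
print(round(evasiveness([True, False, False]), 1))
```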
Why it matters: Previous work showed that LLM-based services can generate misinformation and other malicious output, but little research has probed their actual use in cybercrime. This work evaluates their quality and effectiveness. In addition, the authors released the prompts they used to circumvent guardrails and generate malicious output — a resource for further research that aims to fix such issues in future models.

We’re thinking: It’s encouraging to see that harmful services didn’t get far in real-world tests, and the authors' findings should put a damper on alarmist scenarios of AI-enabled cybercrime. That doesn’t mean we don’t need to worry about harmful applications of AI technology. The AI community has a responsibility to design its products to be beneficial and to evaluate them thoroughly for safety.
Work With Andrew Ng
Join the teams that are bringing AI to the world! Check out job openings at DeepLearning.AI, AI Fund, and Landing AI.
Subscribe and view previous issues here.
Thoughts, suggestions, feedback? Please send to thebatch@deeplearning.ai. Avoid our newsletter ending up in your spam folder by adding our email address to your contacts list.