Dear friends,
Last week, I participated in the United States Senate’s Insight Forum on Artificial Intelligence to discuss “Risk, Alignment, & Guarding Against Doomsday Scenarios.” We had a rousing dialogue with Senators Chuck Schumer (D-NY), Martin Heinrich (D-NM), Mike Rounds (R-SD), and Todd Young (R-IN). I remain concerned that regulators may stifle innovation and open source development in the name of AI safety. But after interacting with the senators and their staff, I’m grateful that many smart people in the government are paying attention to this issue.
One widely discussed doomsday scenario is the use of generative AI to help create bioweapons. A key question is: Can generative AI tools make it much easier to plan and execute a bioweapon attack? Such an attack would involve many steps: planning, experimentation, manufacturing, and finally launching the attack. I have not seen any evidence that generative AI will have a huge impact on the efficiency with which someone can carry out this entire process, as opposed to helping marginally with a subset of steps. From Amdahl’s law, we know that if a tool accelerates only one of a task’s many steps, and that step accounts for, say, 10% of the overall effort, then at least 90% of the effort needed to complete the task remains, no matter how large the speedup.
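To make that arithmetic concrete, here is a minimal Python sketch of Amdahl’s law; the 10% figure is simply the illustrative number from the paragraph above, not an estimate for any real task.

```python
# Amdahl's law: if an accelerated step accounts for fraction p of the total effort and is
# sped up by a factor s, the best possible overall speedup is 1 / ((1 - p) + p / s).
def overall_speedup(p: float, s: float) -> float:
    return 1.0 / ((1.0 - p) + p / s)

# A step that is 10% of the work, even made infinitely fast, leaves the other 90% untouched:
print(overall_speedup(p=0.10, s=float("inf")))  # ~1.11x overall; 90% of the effort remains
```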
Keep learning! Andrew
News

Google’s Multimodal Challenger

Google unveiled Gemini, its bid to catch up to, and perhaps surpass, OpenAI’s GPT-4.

What’s new: Google demonstrated the Gemini family of models, which accept any combination of text (including code), images, video, and audio and output text and images. The demonstrations and metrics were impressive but presented in misleading ways.

How it works: Gemini will come in four versions. (i) Gemini Ultra, which will be widely available next year, purportedly exceeds GPT-4 in key metrics. (ii) Gemini Pro offers performance comparable to GPT-3.5. This model now underpins Google’s Bard chatbot for English-language users outside Europe. It will be available to corporate customers via Google Cloud’s Vertex AI service starting December 13 and via Generative AI Studio afterward. (Google did not disclose parameter counts for Pro or Ultra.) Two distilled versions, smaller models trained to mimic the performance of a larger one, are designed to run on Android devices: (iii) Gemini Nano-1, which comprises 1.8 billion parameters, and (iv) Nano-2, at 3.25 billion parameters. A Gemini Nano model performs tasks like speech recognition, summarization, automatic replies, image editing, and video enhancement in the Google Pixel 8 Pro phone.
Misleading metrics: The metrics Google promoted to support Gemini Ultra’s performance claims are not entirely straightforward. Google pits Gemini Ultra against GPT-4. However, Gemini Ultra is not yet available, while GPT-4 Turbo already surpasses GPT-4, which outperforms Gemini Pro. Gemini Ultra achieved 90 percent accuracy (human-expert level) on MMLU, which tests knowledge and problem-solving abilities in fields such as physics, medicine, history, and law. Yet this achievement, too, is misleading. Ultra achieved this score via chain-of-thought prompting with 32 examples, while most scores on the MMLU leaderboard were obtained with 5-shot prompting. By the latter measure, GPT-4 achieves higher accuracy.

Manipulated demo: Similarly, a video of Gemini in action initially made a splash, but it was not an authentic portrayal of the model’s capabilities. A Gemini model appeared to respond in real time, using a friendly synthesized voice, to audio/video input of voice and hand motions. Gemini breezily chatted its way through tasks like interpreting a drawing in progress as the artist added each line and explaining a sleight-of-hand trick in which a coin seemed to disappear. However, Google explained in a blog post that the actual interactions did not involve audio or video. In fact, the team had entered the spoken words as text and the video as individual frames, and the model had responded with text. In addition, the video omitted some prompts.

Why it matters: Gemini joins GPT-4V and GPT-4 Turbo in handling text, image, video, and audio input, and, unlike the GPTs, it processes those data types within the same model. The Gemini Nano models look like strong entries in an emerging race to put powerful models on small devices at the edge of the network.
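To see why scores from the two protocols aren’t directly comparable, here is a purely illustrative Python sketch of the two prompting styles; the example questions and formats are invented, not the prompts Google or OpenAI actually used.

```python
# Illustrative only: answer-only few-shot prompting (the usual MMLU leaderboard protocol)
# versus chain-of-thought prompting, which elicits step-by-step reasoning and typically
# yields higher scores. The questions and formats below are made up for illustration.

FEW_SHOT = [("What is 7 * 8?", "56"), ("Which planet is closest to the sun?", "Mercury")]
COT = [("What is 7 * 8?", "7 * 8 = 7 * 10 - 7 * 2 = 70 - 14 = 56.", "56")]

def few_shot_prompt(question: str) -> str:
    """Answer-only prompt: worked examples show just the final answer."""
    shots = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in FEW_SHOT)
    return f"{shots}\n\nQ: {question}\nA:"

def chain_of_thought_prompt(question: str) -> str:
    """Chain-of-thought prompt: worked examples include reasoning before the answer."""
    shots = "\n\n".join(f"Q: {q}\nReasoning: {r}\nA: {a}" for q, r, a in COT)
    return f"{shots}\n\nQ: {question}\nReasoning:"

question = "What is the boiling point of water at sea level in degrees Celsius?"
print(few_shot_prompt(question))
print(chain_of_thought_prompt(question))
```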
Europe Clamps Down

Europe’s sweeping AI law moved decisively toward approval.

What’s new: After years of debate, representatives of the European Union’s legislative and executive branches agreed on a draft of the AI Act, a comprehensive approach to regulating AI. As the legislative session drew to a close, the representatives negotiated nearly nonstop to approve the bill before the deadline. It will return to Europe’s parliament and member countries for final approval in spring 2024 and take effect roughly two years later.
What’s next: The representatives have agreed on these broad strokes, but they will continue to revise the details. After further vetting, the European Parliament will vote again, and a council of deputies from each EU member state will also vote, most likely in early 2024. If both bodies approve the bill, it will take effect no later than 2026.

Behind the news: The AI Act has been under construction since 2021. The technology has evolved significantly since then, and the proposal has undergone several revisions to keep pace. The advent of ChatGPT prompted a round of revisions to control foundation models. Negotiations reached a fever pitch in early December. France, Germany, and Italy, seeking to protect developers in their countries, sought to weaken restrictions on foundation models. They were opposed by Spain, which sought to strengthen oversight of the most powerful foundation models. The final negotiations concerned exceptions for police and military use of AI within member states. France led a group of countries that pushed for greater military exemptions.

Why it matters: The AI Act is the broadest and most detailed effort to regulate AI to date. The stakes are high: Not only does Europe have a budding AI industry of its own, but EU laws often dictate companies’ practices outside the union. Yet the bill won’t take effect for years, when AI may present very different challenges.

We’re thinking: Effective regulation should mitigate harm without stifling innovation. The best approach is to regulate applications rather than underlying technology such as foundation models. While the EU restricts some applications in helpful ways, it also limits foundational technology in ways that we expect will hurt innovation in EU member states. We welcome the provisions added at the last moment to lighten the load on small companies and open source developers. These 11th-hour wins reflect the efforts of many people who pushed to protect innovation and openness.
A MESSAGE FROM DEEPLEARNING.AI

Join our new short course, “Reinforcement Learning from Human Feedback,” and learn a key method to align large language models with human values and preferences. Gain a detailed understanding of the technique and use it to fine-tune Llama 2 for an application. Sign up now
Champion for Openness

A new consortium aims to support open source AI.

What’s new: Led by Meta and IBM, dozens of organizations from the software, hardware, nonprofit, public, and academic sectors formed the AI Alliance, which plans to develop tools and programs that aid open development.

How it works: The AI Alliance’s 57 founding members include established companies like AMD, Intel, Oracle, and Sony; startups like Cerebras and Stability AI; nonprofits such as Hugging Face and the Linux Foundation; public institutions like the European Organization for Nuclear Research (CERN) and the U.S. National Aeronautics and Space Administration (NASA); and universities in Asia, Europe, and North America. The group stated its intention to pursue a variety of projects in support of open development.
Behind the news: The membership includes organizations that have prioritized open source development, including Meta, Stability AI, and the Linux Foundation. Yet several organizations that provide popular models released under more permissive open source licenses, such as GPT-Neo and Mistral, are not represented. Major companies like Apple and Google, which have released some of their work under open source licenses, are also absent.

Yes, but: The meaning of “open” is contentious, and the AI Alliance does not clearly define it. In large language models, for instance, openness spans a wide spectrum.
Why it matters: More openness means faster sharing of knowledge and a greater pace of innovation. The AI Alliance can put substantial resources and breadth of influence behind proponents of openness, potentially acting as a counterweight to well-financed commercial interests that are threatened by open source development. For instance, some companies claim that restricting access to AI models is necessary to ensure that bad actors don’t misuse them; of course, such restrictions would also eliminate open source competition with those companies. On the other hand, open source advocates argue that transparency makes AI models less likely to be dangerous, since anyone can spot dangers and alter the code to reduce them.

We’re thinking: Open source is a powerful engine of innovation that enables people to build freely on earlier developments for the benefit of all. The AI Alliance’s gathering of commercial, institutional, and academic clout looks like a promising approach to promoting openness.
The Big Picture and the Details

A novel twist on self-supervised learning aims to improve on earlier methods by helping vision models learn how parts of an image relate to the whole.

What’s new: Mahmoud Assran and colleagues at Meta, McGill University, Mila, and New York University developed a vision pretraining technique that’s designed to address weaknesses in typical masked image modeling and contrastive learning approaches. They call it Image-based Joint-Embedding Predictive Architecture (I-JEPA).

Key insight: Masked image modeling trains models to reconstruct hidden or noisy patches of an image. This encourages models to learn details of training images at the expense of larger features. On the other hand, contrastive approaches train models to create similar embeddings for distorted or augmented versions of the same image. This encourages models to learn larger features, but reliance on augmentations such as zooming and cropping biases models toward those variations rather than the wider variety they’re likely to encounter in the wild. I-JEPA combines these approaches: The model learns to embed regions that are made up of many patches, some of them masked, based on the surrounding unmasked patches. This approach balances learning of low- and high-level features.

How it works: I-JEPA used three components: (i) a target encoder embedded an image’s target region, (ii) a context encoder embedded the surrounding area, and (iii) a smaller predictor network, given the context embedding, tried to produce an embedding similar to the target encoder’s embedding of the target region. All three components were transformers, though other architectures would serve. They were pretrained jointly on ImageNet-1k.
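The following is a minimal, hypothetical PyTorch-style sketch of the training step described above, not Meta’s released implementation; the encoder sizes, masking pattern, positional embeddings, and loss are simplifying assumptions.

```python
# Illustrative sketch of an I-JEPA-style training step (assumptions, not Meta's code):
# a context encoder sees only unmasked patches, a target encoder sees the full image,
# and a predictor tries to match the target encoder's embeddings of the masked region.
import torch
import torch.nn as nn

EMBED_DIM, PATCH_DIM, NUM_PATCHES = 256, 16 * 16 * 3, 196  # 14x14 grid of 16x16 RGB patches

class PatchEncoder(nn.Module):
    """Stand-in for a ViT encoder: maps a sequence of patch vectors to embeddings."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(PATCH_DIM, EMBED_DIM)
        layer = nn.TransformerEncoderLayer(d_model=EMBED_DIM, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
    def forward(self, patches):
        return self.encoder(self.proj(patches))

context_encoder = PatchEncoder()               # embeds only the unmasked (context) patches
target_encoder = PatchEncoder()                # embeds the full image to produce targets
predictor = nn.TransformerEncoderLayer(d_model=EMBED_DIM, nhead=4, batch_first=True)
mask_token = nn.Parameter(torch.zeros(1, 1, EMBED_DIM))    # query token for masked positions
pos_embed = nn.Parameter(torch.randn(1, NUM_PATCHES, EMBED_DIM) * 0.02)

def ijepa_loss(patches, target_idx, context_idx):
    """patches: (batch, NUM_PATCHES, PATCH_DIM); index lists select target/context regions."""
    with torch.no_grad():                      # targets provide the signal but get no gradient
        targets = target_encoder(patches)[:, target_idx]
    context = context_encoder(patches[:, context_idx]) + pos_embed[:, context_idx]
    queries = mask_token.expand(patches.size(0), len(target_idx), -1) + pos_embed[:, target_idx]
    preds = predictor(torch.cat([context, queries], dim=1))[:, -len(target_idx):]
    return nn.functional.mse_loss(preds, targets)

# Example: mask a 2x2 block of patches as the target region; use the rest as context.
images = torch.randn(8, NUM_PATCHES, PATCH_DIM)
target_idx = [0, 1, 14, 15]
context_idx = [i for i in range(NUM_PATCHES) if i not in target_idx]
ijepa_loss(images, target_idx, context_idx).backward()
```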
Results: An I-JEPA classifier that used ViT-H/14 encoders achieved 73.3 percent accuracy after about 2,500 GPU-hours of pretraining. A classifier trained on top of a ViT-B/16 base model that had been pretrained for 5,000 GPU-hours using the iBOT method, which relies on hand-crafted augmentations, achieved 69.7 percent accuracy. MAE, a masked modeling rival based on ViT-H/14, achieved 71.5 percent accuracy but required over 10,000 GPU-hours of pretraining.

Why it matters: In deep learning for computer vision, there’s a tension between learning details (a specialty of masked image modeling approaches) and larger-scale features (a strength of contrastive methods). I-JEPA gives models more context for learning both details and the high-level features in the training set.

We’re thinking: Given a picture of a jungle, I-JEPA would see both the forest and the trees!
Work With Andrew Ng
Join the teams that are bringing AI to the world! Check out job openings at DeepLearning.AI, AI Fund, and Landing AI.
Subscribe and view previous issues here.
Thoughts, suggestions, feedback? Please send to thebatch@deeplearning.ai. Avoid our newsletter ending up in your spam folder by adding our email address to your contacts list.