Dear friends,
Last week, the White House announced voluntary commitments by seven AI companies, as you can read below. Most of the points were sufficiently vague that it seems easy for the White House and the companies to declare success without doing much that they don’t already do. But the commitment to develop mechanisms to ensure that users know when content is AI-generated, such as watermarks, struck me as concrete and actionable. While most of the voluntary commitments are not measurable, this one is. It offers an opportunity, in the near future, to test whether the White House’s presently soft approach to regulation is effective. I was pleasantly surprised that watermarking was on the list. It’s beneficial to society, but it can be costly to implement (in terms of losing users).
As I wrote in an earlier letter, watermarking is technically feasible, and I think society would be better off if we knew what content was and wasn’t AI-generated. However, many companies won’t want it. For example, a company that uses a large language model to create marketing content may not want the output to be watermarked, because then readers would know that it was generated by AI. Also, search engines might rank generated content lower than human-written content. Thus, the government’s push to have major generative AI companies watermark their output is a good move. It reduces the competitive pressure to avoid watermarking. All the companies that agreed to the White House’s voluntary commitments employ highly skilled engineers and are highly capable of shipping products, so they should be able to keep this promise. When we look back after three or six months, it will be interesting to see which of them have actually done so.
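To make the idea concrete, here is a minimal sketch of one published watermarking approach for generated text (the “green list” method of Kirchenbauer et al.): nudge the model toward a pseudorandom subset of the vocabulary at each step, then detect the nudge statistically. The toy vocabulary, logits, and function names below are illustrative assumptions, not the mechanism any of these companies has committed to ship.

# Minimal sketch of "green list" text watermarking (after Kirchenbauer et al., 2023).
# Illustrative only: toy vocabulary and logits, not any company's production scheme.
import hashlib
import math
import random

VOCAB = [f"tok{i}" for i in range(1000)]  # stand-in vocabulary
GREEN_FRACTION = 0.5                      # share of vocabulary favored at each step
BIAS = 4.0                                # logit boost applied to green-list tokens

def green_list(prev_token: str) -> set:
    # Deterministically partition the vocabulary based on the previous token.
    seed = int(hashlib.sha256(prev_token.encode()).hexdigest(), 16) % (2**32)
    rng = random.Random(seed)
    return set(rng.sample(VOCAB, int(GREEN_FRACTION * len(VOCAB))))

def sample_watermarked(prev_token: str, logits: dict) -> str:
    # Boost green-token logits, then sample; the bias leaves a statistical trace.
    green = green_list(prev_token)
    boosted = {t: l + (BIAS if t in green else 0.0) for t, l in logits.items()}
    total = sum(math.exp(l) for l in boosted.values())
    r, acc = random.random() * total, 0.0
    for tok, l in boosted.items():
        acc += math.exp(l)
        if acc >= r:
            return tok
    return tok

def detection_z_score(tokens: list) -> float:
    # How far does the green-token rate exceed what chance would predict?
    hits = sum(tok in green_list(prev) for prev, tok in zip(tokens, tokens[1:]))
    n = len(tokens) - 1
    expected = n * GREEN_FRACTION
    variance = n * GREEN_FRACTION * (1 - GREEN_FRACTION)
    return (hits - expected) / math.sqrt(variance)

Text generated this way reads normally, but a detector that knows the hashing scheme can flag it with high confidence after a few dozen tokens, which is what makes this particular commitment checkable in principle.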
To be fair, I think it would be very difficult to enforce watermarking in open source systems, since users can easily modify the software to turn it off. But I would love to see watermarking implemented in proprietary systems. The companies involved are staffed by honorable people who want to do right by society. I hope they will take the announced commitments seriously and implement them faithfully.
I would love to get your thoughts on this as well. How can we collectively hold the U.S. government and AI companies to these commitments? Please let me know on social media!
Keep learning,
Andrew
P.S. A new short course, developed by DeepLearning.AI and Hugging Face, is available! In “Building Generative AI Applications with Gradio,” instructor Apolinário Passo shows you how to quickly create fun demos of your machine learning applications. Prompting large language models makes building applications faster than ever, but how can you demo your work, either to get feedback or to let others experience what you’ve built? This course shows you how to do it by writing only Python code.
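If you haven’t used Gradio before, the basic pattern is to wrap a Python function in a web UI. Here is a minimal sketch; the toy summarizer is a stand-in of my own, not the course’s example, and in a real application you would call a large language model inside the function.

# Minimal Gradio demo: wrap any Python function in a shareable web UI.
# The summarizer below is a placeholder; in practice you'd call an LLM here.
import gradio as gr

def summarize(text: str) -> str:
    # Placeholder logic: return the first two sentences.
    sentences = text.replace("\n", " ").split(". ")
    return ". ".join(sentences[:2]).strip()

demo = gr.Interface(
    fn=summarize,
    inputs=gr.Textbox(lines=8, label="Text to summarize"),
    outputs=gr.Textbox(label="Summary"),
    title="Toy Summarizer",
)

if __name__ == "__main__":
    demo.launch()  # pass share=True for a temporary public link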
News

AI Firms Agree to Voluntary Guidelines

In the absence of nationwide laws that regulate AI, major U.S. tech companies pledged to abide by voluntary guidelines — most of which they may already be following.

What’s new: Amazon, Anthropic, Google, Inflection, Meta, Microsoft, and OpenAI agreed to uphold a list of responsible-AI commitments, the White House announced.

How it works: President Biden, Vice President Harris, and other administration officials formulated the terms of the agreement in consultation with tech leaders. The provisions fall into three categories: safety, security, and trust.
Behind the news: The surge of generative AI has spurred calls to regulate the technology. The rising chorus has given companies ample incentive to accept voluntary limits while trying to shape forthcoming mandates.
Yes, but: The commitments — with the exception of watermarking generated output — are relatively easy to fulfill, and some companies may be able to say that they already fulfill them. For instance, many established companies employ independent parties to test for safety and security, and some publish papers that describe risks of their AI research. Leaders in the field already discuss limitations, work to reduce risks, and launch initiatives that address major societal problems. Moreover, the agreement lacks ways to determine whether companies have kept their promises or to hold shirkers to account.

Why it matters: Although some U.S. cities and states regulate AI in piecemeal fashion, the country lacks overarching national legislation. Voluntary guidelines, if companies observe them in good faith and avoid hidden pitfalls, could ease the pressure to assert top-down control over the ways the technology is developed and deployed.

We’re thinking: These commitments are a step toward guiding AI forward in ways that maximize benefits and minimize harms — even if some companies already fulfill them. Nonetheless, laws are necessary to ensure that AI’s benefits are spread far and wide throughout the world. Important work remains to craft such laws, and they’ll be more effective if the AI community helps shape them.
Apple Grapples With Generative AI

Apple insiders spoke anonymously about the company’s effort to exploit the current craze for chatbots.

What’s new: Apple built a framework for large language models and used it to develop a chatbot dubbed Apple GPT — for internal use only, Bloomberg reported.

Under wraps: The iPhone maker is proceeding cautiously to capitalize on the hottest tech trend since mobile. The results are not yet available to the public and may never be.
Behind the news: Apple tends to hold its technology close to the vest, but it has not placed the same emphasis on AI as its peers. Its pioneering Siri voice assistant has been criticized for falling behind competitors like Amazon Alexa and Google Assistant (which, in turn, were criticized for falling behind ChatGPT). Although it has published papers on generative AI in recent years, its products have not emphasized the technology. Meanwhile, its big-tech rivals have been trying to outdo one another in building and deploying ever more powerful chatbots.
Why it matters: Where some companies zig, Apple often zags. Unlike its peers, it makes its money selling devices and requires tight integration between that hardware and the software that brings it to life. Such differences may make it necessary to “think different” about generative AI.

We’re thinking: Apple’s control over the iOS and MacOS ecosystems is a huge strength in the race to capitalize on generative AI. We hope that Apple’s generative products will be wonderful, but even if they offer little advantage over the competition, its ability to get them into users’ hands will give it a significant edge over smaller competitors and even many large companies.
A MESSAGE FROM DEEPLEARNING.AI

Join “Building Generative AI Applications with Gradio,” our new course built in collaboration with Hugging Face. Learn to quickly build, demo, and ship models using Gradio’s user-interface tools! Sign up for free
ChatGPT Ain’t What It Used to Be

It wasn’t your imagination: OpenAI’s large language models have changed.

What’s new: Researchers at Stanford and UC Berkeley found that the performance of GPT-4 and GPT-3.5 has drifted in recent months. In a limited selection of tasks, some prompts yielded better results than before, some worse.
Yes, but: Commenting on the findings, Princeton computer scientists Arvind Narayanan and Sayash Kapoor noted that the performance differences reported in the paper were consistent with shifts in behavior following fine-tuning. They distinguished between a large language model’s capability (that is, what it can and can’t do given the right prompt), which is informed by pretraining, and its behavior (its response to a given prompt), which is shaped by fine-tuning. The paper showed that, while the models’ behavior had changed between March and June, this did not necessarily reflect changes in their capability. For instance, the paper’s authors asked the models to identify only prime numbers as primes; they didn’t test non-primes. Narayanan and Kapoor tested the models on non-primes and obtained far better performance.

Behind the news: For months, rumors have circulated that ChatGPT’s performance had declined. Some users speculated that the service was overwhelmed by viral popularity, that OpenAI had throttled its performance to save on processing costs, or that user feedback had thrown the model off kilter. In May, OpenAI engineer Logan Kilpatrick denied that the underlying models had changed without official announcements.

Why it matters: While conventional software infrastructure evolves relatively slowly, large language models are changing much faster. This creates a special challenge for developers, who have a much less stable environment to build upon. If they base an application on an LLM that later is fine-tuned, they may need to modify the application (for example, by updating prompts).

We’re thinking: We’ve known we needed tools to monitor and manage data drift and concept drift. Now it looks like we also need tools to check whether our applications work with shifting LLMs and, if not, to help us update them efficiently.
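One practical response is to treat a fixed set of prompts as a regression-test suite and rerun it whenever the underlying model may have shifted. Below is a minimal sketch; call_llm is a hypothetical stand-in for whatever client your application actually uses, and the checks assert behaviors rather than exact-match strings, which are brittle.

# Minimal behavior-regression harness for an LLM-backed application.
# call_llm is a hypothetical stand-in for your actual model client.
from typing import Callable, List, Tuple

def call_llm(prompt: str) -> str:
    raise NotImplementedError("Replace with your model or API client.")

# Fixed prompts paired with checks on the response.
TEST_CASES: List[Tuple[str, Callable[[str], bool]]] = [
    ("Is 17077 a prime number? Answer yes or no.",
     lambda r: "yes" in r.lower()),
    ("Is 17078 a prime number? Answer yes or no.",  # test non-primes, too
     lambda r: "no" in r.lower()),
    ("Return only valid JSON with keys 'city' and 'country' for the Eiffel Tower.",
     lambda r: r.strip().startswith("{") and '"city"' in r),
]

def run_suite(model: Callable[[str], str]) -> float:
    passed = 0
    for prompt, check in TEST_CASES:
        ok = check(model(prompt))
        passed += ok
        print(f"{'PASS' if ok else 'FAIL'}: {prompt[:50]}")
    rate = passed / len(TEST_CASES)
    print(f"Pass rate: {rate:.0%}")
    return rate

# Run on a schedule or in CI, and alert if the pass rate drops below a threshold.

The point isn’t these particular prompts; it’s that automated behavior checks can tell you when a silent model update has broken an assumption your application relies on.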
Stratego Master

Reinforcement learning agents have mastered games like Go that provide complete information about the state of the game to both players. They’ve also excelled at Texas Hold ’Em poker, which provides incomplete information, as few cards are revealed. Recent work trained an agent to excel at a popular board game that, like poker, provides incomplete information but, unlike poker, involves long-term strategy.

What’s new: Julien Perolat, Bart De Vylder, Karl Tuyls, and colleagues at DeepMind teamed up with former Stratego world champion Vincent de Boer to build DeepNash, a reinforcement learning system that reached expert-level capability at Stratego.

Stratego basics: Stratego is played by two opposing players. The goal is to capture the opponent’s flag piece by moving a piece onto the space that contains it. The game starts with a deployment phase, in which each player places 40 pieces on the board: mostly pieces that represent military ranks, plus a flag and bombs. The pieces face away from the opposing player, so neither one knows the other’s starting formation. The players then move their pieces in turns, potentially attacking by moving onto a space occupied by an opponent’s piece, which reveals that piece’s rank. If the attacking piece has a higher rank, the attack succeeds and the opponent’s piece is removed from the board. If the attacking piece has a lower rank, the attack fails and the attacking piece is removed.

Key insight: A reinforcement learning agent like AlphaGo learns to play games through self-play; that is, it plays iteratively against a copy of itself, adjusts its weights according to rewards it has received, and, after an interval of learning, adopts the weights of the better-performing copy. Typically, each copy predicts the potential outcome of every possible action and chooses the one that’s most likely to confer an advantage. However, this approach can go awry if one of the copies learns to win by exploiting a vulnerability that’s idiosyncratic to the agent but not shared by human players. That’s where regularization can help: to prevent such overfitting and enable agents to learn a more general strategy, previous work showed that it helps to reward an agent not only for good moves and winning but also for assigning actions probabilities similar to those assigned by an earlier version of itself. Updating this earlier version periodically enables the agent to keep improving.

How it works: DeepNash comprised five U-Net convolutional neural networks. One produced an embedding based on the current state of the game board and the 40 most recent previous states. The remaining four U-Nets used the embedding as follows: (i) during training, to estimate the total future reward to be expected after executing a deployment or move, (ii) during the game’s deployment phase, to predict where each piece should be deployed, (iii) during the play phase, to select which piece to move, and (iv) to decide where that piece should move.
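Here is a rough sketch of the regularization idea described under “Key insight”: alongside the usual reward-driven objective, the learner pays a penalty for drifting from the action probabilities of a frozen earlier copy of itself, and that reference copy is refreshed periodically. The function names, penalty weight, and training-loop skeleton are illustrative assumptions, not DeepNash’s actual training code.

# Sketch of self-play regularized toward a frozen earlier policy (illustrative only).
import numpy as np

def kl_divergence(p: np.ndarray, q: np.ndarray) -> float:
    # KL divergence between two action distributions over the same move set.
    p = np.clip(p, 1e-8, 1.0)
    q = np.clip(q, 1e-8, 1.0)
    return float(np.sum(p * np.log(p / q)))

def regularized_objective(policy_probs, reference_probs, action, advantage, eta=0.2):
    # Standard policy-gradient term (raise the probability of advantageous actions)
    # minus a penalty for moving away from the earlier "reference" policy.
    pg_term = np.log(max(policy_probs[action], 1e-8)) * advantage
    return pg_term - eta * kl_divergence(policy_probs, reference_probs)

# Training-loop skeleton: refresh the reference copy every so often so the agent
# keeps improving while staying anchored to a more general strategy.
# for step in range(num_steps):
#     probs = current_policy(state)          # latest network weights
#     ref_probs = reference_policy(state)    # frozen earlier copy
#     loss = -regularized_objective(probs, ref_probs, action, advantage)
#     ...update current_policy by gradient descent on loss...
#     if step % refresh_interval == 0:
#         reference_policy = copy(current_policy)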
Results: DeepNash beat the most powerful Stratego bots on the Gravon game platform, winning 97.1 percent of 800 games. It beat Gravon’s human experts 84 percent of the time, ranking third as of April 22, 2022. Along the way, it developed deceptive tactics, fooling opponents by moving less-powerful pieces as though they were more powerful and vice-versa.

Why it matters: Reinforcement learning is a computationally inefficient way to train a model from scratch to find good solutions among a plethora of possibilities. But it mastered Go, a game with 10^360 possible states, and it predicts protein shapes among 10^300 possible configurations of amino acids. DeepNash sends the message that reinforcement learning can also handle Stratego’s astronomical number of 10^535 states, even when those states are unknown.

We’re thinking: DeepNash took advantage of the Stratego board’s imperfect information by bluffing. Could it have developed a theory of mind?
A MESSAGE FROM DEEPLEARNING.AI

Join our upcoming workshop on August 3, 2023, at 10:00 a.m. Pacific Time! Learn the fundamentals of reinforcement learning and how to integrate human feedback into the learning process. Register now
Work With Andrew Ng
Join the teams that are bringing AI to the world! Check out job openings at DeepLearning.AI, AI Fund, and Landing AI.
Subscribe and view previous issues here.
Thoughts, suggestions, feedback? Please send to thebatch@deeplearning.ai. Avoid our newsletter ending up in your spam folder by adding our email address to your contacts list.