Dear friends,
Last week, I described four design patterns for AI agentic workflows that I believe will drive significant progress this year: Reflection, Tool use, Planning and Multi-agent collaboration. Instead of having an LLM generate its final output directly, an agentic workflow prompts the LLM multiple times, giving it opportunities to build step by step to higher-quality output. In this letter, I'd like to discuss Reflection. For a design pattern that’s relatively quick to implement, I've seen it lead to surprising performance gains.
You may have had the experience of prompting ChatGPT/Claude/Gemini, receiving unsatisfactory output, delivering critical feedback to help the LLM improve its response, and then getting a better response. What if you automate the step of delivering critical feedback, so the model automatically criticizes its own output and improves its response? This is the crux of Reflection.
Take the task of asking an LLM to write code. We can prompt it directly to generate code to carry out some task X. After that, we can prompt it to reflect on its own output, perhaps as follows:
Here’s code intended for task X: [previously generated code]
Check the code carefully for correctness, style, and efficiency, and give constructive criticism for how to improve it.
Sometimes this causes the LLM to spot problems and come up with constructive suggestions. Next, we can prompt the LLM with context including (i) the previously generated code and (ii) the constructive feedback, and ask it to use the feedback to rewrite the code. This can lead to a better response. Repeating the criticism/rewrite process might yield further improvements. This self-reflection process allows the LLM to spot gaps and improve its output on a variety of tasks including producing code, writing text, and answering questions. And we can go beyond self-reflection by giving the LLM tools that help evaluate its output; for example, running its code through a few unit tests to check whether it generates correct results on test cases, or searching the web to double-check text output. Then it can reflect on any errors it found and come up with ideas for improvement.
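Here is a minimal sketch of this criticize/rewrite loop in Python. It assumes the OpenAI Python SDK and an API key in the environment; the model name, prompt wording, and number of rounds are illustrative choices, and any chat-completion API could be swapped in.

```python
# Minimal sketch of the criticize/rewrite loop described above.
# Assumes the OpenAI Python SDK with OPENAI_API_KEY set; any chat API works similarly.
from openai import OpenAI

client = OpenAI()

def call_llm(prompt: str) -> str:
    """Send a single-turn prompt to the model and return its text response."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def reflect_and_rewrite(task: str, num_rounds: int = 2) -> str:
    # Step 1: generate an initial draft of the code.
    code = call_llm(f"Write code to carry out this task:\n{task}")
    for _ in range(num_rounds):
        # Step 2: ask the model to critique its own output.
        critique = call_llm(
            f"Here's code intended for this task: {task}\n\n{code}\n\n"
            "Check the code carefully for correctness, style, and efficiency, "
            "and give constructive criticism for how to improve it."
        )
        # Step 3: ask the model to rewrite the code in light of the critique.
        code = call_llm(
            f"Task: {task}\n\nPrevious code:\n{code}\n\nFeedback:\n{critique}\n\n"
            "Rewrite the code, addressing the feedback above."
        )
    return code
```

To fold in tool feedback, you could run each draft against a few unit tests and append any failures to the critique prompt before asking for the rewrite.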
Further, we can implement Reflection using a multi-agent framework. I've found it convenient to create two different agents, one prompted to generate good outputs and the other prompted to give constructive criticism of the first agent's output. The resulting discussion between the two agents leads to improved responses.
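The same idea can be sketched as two agents that differ only in their system prompts. Again this assumes the OpenAI Python SDK; the prompts and the "LGTM" stopping convention are illustrative assumptions, not a standard.

```python
# Sketch of Reflection as a conversation between a generator agent and a critic agent.
from openai import OpenAI

client = OpenAI()

GENERATOR = "You write code for the user's task. Revise it when given feedback."
CRITIC = ("You review code for correctness, style, and efficiency, and give "
          "specific, constructive criticism. Reply 'LGTM' if no issues remain.")

def chat(system: str, user: str) -> str:
    """One turn of conversation with an agent defined by its system prompt."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "system", "content": system},
                  {"role": "user", "content": user}],
    )
    return response.choices[0].message.content

def generator_critic_loop(task: str, max_rounds: int = 3) -> str:
    draft = chat(GENERATOR, f"Task: {task}")
    for _ in range(max_rounds):
        feedback = chat(CRITIC, f"Task: {task}\n\nCode:\n{draft}")
        if "LGTM" in feedback:  # the critic is satisfied, so stop early
            break
        draft = chat(GENERATOR,
                     f"Task: {task}\n\nYour previous code:\n{draft}\n\n"
                     f"Reviewer feedback:\n{feedback}\n\nPlease revise.")
    return draft
```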
Reflection is a relatively basic type of agentic workflow, but I've been delighted by how much it improved my applications’ results in a few cases. I hope you will try it in your own work. If you’re interested in learning more about reflection, I recommend these papers:
I’ll discuss the other agentic design patterns in future letters.
Keep learning! Andrew
P.S. New JavaScript short course! Learn to build full-stack web applications that use RAG in “JavaScript RAG Web Apps with LlamaIndex,” taught by Laurie Voss, VP of Developer Relations at LlamaIndex and co-founder of npm.
News

One Agent, Many Environments

AI agents are typically designed to operate in a particular software environment. Recent work enabled a single agent to take actions in a variety of three-dimensional virtual worlds.

What's new: A team of 90 people at Google and the University of British Columbia announced Scalable Instructable Multiworld Agent (SIMA), a system that learned to follow text instructions (such as “make a pile of rocks to mark this spot” or “see if you can jump over this chasm”) in seven commercial video games and four research environments.

How it works: SIMA’s architecture consists of several transformers and a vanilla neural network. The authors trained it to mimic human players using a dataset of gameplay broken into 10-second tasks, including onscreen images, text instructions, keyboard presses, and mouse motions. The video games included Goat Simulator 3 (a third-person game in which the player takes the form of a goat), No Man’s Sky (a first- or third-person game of exploration and survival in outer space), Hydroneer (a first-person game of mining and building), and others.
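A generic behavioral-cloning step on this kind of data might look like the sketch below. This is purely illustrative: the encoders, action space, and tensor shapes are placeholder assumptions, not SIMA's actual architecture or code.

```python
# Illustrative behavioral cloning: a policy maps (screen image, text instruction)
# to keyboard/mouse actions and is trained to imitate a human player's actions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Policy(nn.Module):
    def __init__(self, num_keys: int = 32, vocab_size: int = 10_000):
        super().__init__()
        self.image_enc = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256))
        self.text_enc = nn.Embedding(vocab_size, 256)  # toy instruction encoder
        self.key_head = nn.Linear(512, num_keys)       # which key to press
        self.mouse_head = nn.Linear(512, 2)            # mouse motion (dx, dy)

    def forward(self, image, instruction_tokens):
        h = torch.cat([self.image_enc(image),
                       self.text_enc(instruction_tokens).mean(dim=1)], dim=-1)
        return self.key_head(h), self.mouse_head(h)

def bc_loss(policy, image, instruction, key_target, mouse_target):
    # Imitate the human's key press (classification) and mouse motion (regression).
    key_logits, mouse_pred = policy(image, instruction)
    return F.cross_entropy(key_logits, key_target) + F.mse_loss(mouse_pred, mouse_target)
```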
Results: Judges evaluated SIMA’s success or failure at completing nearly 1,500 instructions that spanned tasks in nine categories like action (“jump”), navigation (“go to your ship”), and gathering resources (“get raspberries”). In Goat Simulator 3, SIMA completed 40 percent of the tasks. In No Man’s Sky, the judges compared SIMA’s performance to that of the human players whose gameplay produced the training data. SIMA was successful 34 percent of the time, while the players were successful 60 percent of the time. Judges also compared SIMA to versions that were trained to be experts in a single game. SIMA was successful more than 1.5 times as often as the specialized agents.

Behind the news: SIMA extends Google’s earlier successes building agents that rival or beat human players at individual games including Go, classic Atari games, and StarCraft II.

Why it matters: Training agents to follow directions in various environments, seeing the same things humans would, is a step toward building instructable agents that can work in any situation. The authors point to potential applications in robotics, simulations, and gaming, wherever an agent might need to be guided through diverse challenges.

We're thinking: This work shows that an agent trained on multiple games can perform better than an agent trained on just one, and that the richer the language inputs in a gameworld, the better the agent can perform. With only a handful of training environments under its belt, SIMA doesn’t demonstrate superhuman performance, but it gets the job done a surprising amount of the time!
NEW FROM DEEPLEARNING.AI

Join our short course on “JavaScript RAG Web Apps with LlamaIndex” to learn how to build full-stack JavaScript web applications that let you chat with your data. Harness the capabilities of large language models and retrieval augmented generation (RAG)!

Enroll for free
U.S. Deploys AI-Assisted Targeting

The United States military is using computer vision to target enemy positions in the Red Sea and elsewhere.
Behind the news: Google initially developed Maven for the U.S. Defense Department around 2017. Palantir inherited the project after Google, facing protests by employees who did not want to contribute to government intelligence systems, declined to renew its contract in 2018. The U.S. military now has more than 800 active AI projects with a wide range of technology partners and contractors. Other countries are deploying similar technology: Israel and Ukraine have used AI-assisted targeting in their ongoing conflicts.

Yes, but: Some U.S. military experts worry about Maven’s accuracy. In tests, Maven successfully identified objects about 60 percent of the time, while human analysts working with the 18th Airborne Corps did so 84 percent of the time. Moreover, the system’s training data emphasizes deserts, and its success rate drops in other types of environments.

We’re thinking: Automated targeting is increasingly used in military applications, and less-sophisticated systems have been in use for decades. However, humans should always be in control of decisions to fire. We support a global ban on fully autonomous weapons.
Robo-Football From Simulation to Reality

Humanoid robots can play football (known as soccer in the United States) in the real world, thanks to reinforcement learning.

What’s new: Tuomas Haarnoja and colleagues at Google and the University of Oxford trained an agent to play one-on-one football in a simulated environment. They applied the agent to 20-inch hardware robots on a scaled-down field. You can see it in action here.

Key insight: In reinforcement learning, an agent improves as it explores various motions. However, such exploration risks damaging expensive hardware. By training in a simulation, the agent can attempt a diversity of motions without risking a physical robot. Once the agent is trained, it can make the leap from simulation to reality.

How it works: The agent learned in a virtual world to control the robot’s motion given (i) a simulated robot’s state (including the position, velocity, and acceleration of each of 20 joints), (ii) the current game state (including the location and velocity of the ball and opponent), (iii) the game state at each of the last five time steps, and (iv) the agent’s five previous actions. Training proceeded via reinforcement learning in two stages.
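For a rough sense of what the policy sees at each time step, the sketch below assembles the observation described above into a flat vector. The field names and shapes are illustrative assumptions, not the authors' implementation.

```python
# Illustrative assembly of the observation described above (not the authors' code).
import numpy as np

NUM_JOINTS = 20   # joints whose position, velocity, and acceleration are observed
HISTORY = 5       # number of past game states and past actions kept

def build_observation(joint_pos, joint_vel, joint_acc,
                      game_state, game_state_history, previous_actions):
    """Concatenate everything the policy observes at one time step.

    joint_pos, joint_vel, joint_acc: arrays of shape (NUM_JOINTS,)
    game_state: ball and opponent position/velocity, e.g. shape (12,)
    game_state_history: shape (HISTORY, 12), the last five game states
    previous_actions: shape (HISTORY, NUM_JOINTS), the last five joint commands
    """
    return np.concatenate([
        joint_pos, joint_vel, joint_acc,   # (i) robot proprioception
        game_state,                        # (ii) current game state
        game_state_history.ravel(),        # (iii) recent game states
        previous_actions.ravel(),          # (iv) recent actions
    ])
```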
Results: The agent learned not only to turn and kick but also to anticipate the ball’s motion and block an opponent’s shots. It scored penalties against a stationary goalie with 90 percent success in simulation and 70 percent success in the physical world. It stood up in 0.9 seconds on average, while a manually designed agent stood up in 2.5 seconds. Its maximum walking speed of 0.69 meters per second beat the manually designed agent’s 0.27 meters per second. However, its kicks propelled the ball at 2.0 meters per second on average, slower than the manually designed agent’s 2.1 meters per second.

Why it matters: Controlling humanoid robots is challenging, as they’re less stable than quadrupeds. Just getting them to do one type of motion, such as jumping, can require dedicated research. This work drives humanoid robots through complex motions by combining established training methods: training in a noisy simulation, self-play, and using teacher agents to reward particular actions.

We’re thinking: This work demonstrates that robots get a kick out of machine learning.
Work With Andrew Ng
Join the teams that are bringing AI to the world! Check out job openings at DeepLearning.AI, AI Fund, and Landing AI.
Subscribe and view previous issues here.
Thoughts, suggestions, feedback? Please send to thebatch@deeplearning.ai. Avoid our newsletter ending up in your spam folder by adding our email address to your contacts list.