|
Dear friends,
Tool use, in which an LLM is given functions it can request to call for gathering information, taking action, or manipulating data, is a key design pattern of AI agentic workflows. You may be familiar with LLM-based systems that can perform a web search or execute code. Indeed, some of the large, consumer-facing LLMs already incorporate these features. But tool use goes well beyond these examples.
Further, systems are being built in which the LLM has access to hundreds of tools. In such settings, there might be too many functions at your disposal to put all of them into the LLM context, so you might use heuristics to pick the most relevant subset to include in the LLM context at the current step of processing. This technique, which is described in the Gorilla paper cited below, is reminiscent of how, if there is too much text to include as context, retrieval augmented generation (RAG) systems offer heuristics for picking a subset of the text to include.
Early in the history of LLMs, before widespread availability of large multimodal models (LMMs) like LLaVa, GPT-4V, and Gemini, LLMs could not process images directly, so a lot of work on tool use was carried out by the computer vision community. At that time, the only way for an LLM-based system to manipulate an image was by calling a function to, say, carry out object recognition or some other function on it. Since then, practices for tool use have exploded. GPT-4’s function calling capability, released in the middle of last year, was a significant step toward general-purpose tool use. Since then, more and more LLMs are being developed to similarly be facile with tool use.
Both Tool Use and Reflection, which I described in last week’s letter, are design patterns that I can get to work fairly reliably on my applications — both are capabilities well worth learning about. In future letters, I’ll describe the Planning and Multi-agent collaboration design patterns. They allow AI agents to do much more but are less mature, less predictable — albeit very exciting — technologies.
Keep learning! Andrew
P.S. Learn to carry out red-teaming attacks against your own LLM-based applications to spot and patch vulnerabilities! In our new short course, “Red Teaming LLM Applications,” Matteo Dora and Luca Martial of LLM testing company Giskard teach how to simulate malicious actions to discover vulnerabilities and improve security. We start with prompt injection, which can trick an LLM into bypassing safeguards to reveal private information or say something inappropriate. There is no one-size-fits-all approach to security, but this course will help you identify some scenarios to protect against.
We believe that widespread knowledge of red-teaming capabilities will result in greater transparency and safer LLM-based systems. However, we ask you to use the skills you gain from this course ethically.
News
Microsoft Absorbs InflectionMicrosoft took over most of the once high-flying chatbot startup Inflection AI in an unusual deal. What’s new: Microsoft hired Inflection CEO Mustafa Suleyman and much of the startup’s staff and paid roughly $650 million for access to its models and legal protections, Bloomberg reported. Inflection will shift from serving consumers to focusing on large companies.
Behind the news: Inflection was co-founded in 2022 by Suleyman (a founder of DeepMind, now a division of Google), Simonyan, and LinkedIn chairman Reed Hoffman with funding partly from Microsoft. The startup initially positioned itself as a competitor to OpenAI and Anthropic, seeking to develop AI assistants for consumers. Its flagship product was Pi, a chatbot trained to provide emotional support. Microsoft CEO Satya Nadella began courting Suleyman several months ago, and Suleyman wanted to bring Inflection’s staff along with him. Microsoft made a similar offer to OpenAI in November, during that company’s leadership shakeup, when the tech giant proposed hiring briefly-ousted CEO Sam Altman and many of his co-workers to staff a new organization at Microsoft. Yes, but: The unusual nature of the deal — with Microsoft absorbing most of Inflection’s staff while leaving the startup intact as a company — may have been designed to avoid the antitrust scrutiny that comes with acquisitions. The deal doesn’t automatically trigger a review by U.S. regulators because Microsoft did not acquire Inflection assets. Microsoft’s close relationship with OpenAI has attracted attention from regulators in the U.S., UK, and EU.
Nvidia Revs AI EngineNvidia’s latest chip promises to boost AI’s speed and energy efficiency. What’s new: The market leader in AI chips announced the B100 and B200 graphics processing units (GPUs) designed to eclipse its in-demand H100 and H200 chips. The company will also offer systems that integrate two, eight, and 72 chips. How it works: The new chips are based on Blackwell, an updated chip architecture specialized for training and inferencing transformer models. Compared to Nvidia’s earlier Hopper architecture, used by H-series chips, Blackwell features hardware and firmware upgrades intended to cut the energy required for model training and inference.
Price and availability: The B200 will cost between $30,000 and $40,000, similar to the going rate for H100s today, Nvidia CEO Jensen Huang told CNBC. Nvidia did not specify when the chip would be available. Google, Amazon, and Microsoft stated intentions to offer Blackwell GPUs to their cloud customers. Behind the news: Demand for the H100 chip has been so intense that the chip has been difficult to find, driving some users to adopt alternatives such as AMD’s MI300X. Moreover, in 2022, the U.S. restricted the export of H100s and other advanced chips to China. The B200 also falls under the ban.
NEW FROM DEEPLEARNING.AIIn our new short course “Red Teaming LLM Applications,” you will learn industry-proven manual and automated techniques to proactively test, attack, and improve the robustness of your large language model (LLM) applications. Join now!
Toward Managing AI Bio RiskScientists pledged to control their use of AI to produce potentially hazardous biological materials. What’s new: More than 150 biologists in Asia, Europe, and North America signed a voluntary commitment to internal and external oversight of machine learning models that can be used to design proteins. How it works: The scientists made 10 voluntary commitments regarding synthetic biology research. They promised broadly to avoid research likely to enable harm and to promote research that responds to infectious disease outbreaks or similar emergencies.
Behind the news: The potential role of AI in producing bioweapons is a major focus of research in AI safety. The current pledge arose from a University of Washington meeting on responsible AI and protein design held late last year. The AI Safety Summit, which took place at around the same time, also addressed the topic, and Helena, a think tank devoted to solving global problems, convened a similar meeting in mid-2023. Why it matters: DeepMind’s AlphaFold, which finds the structures of proteins, has spawned models that enable users to design proteins with specific properties. Their output could help scientists cure diseases, boost agricultural production, and craft enzymes that aid industrial processes. However, their potential for misuse has led to scrutiny by national and international organizations. The biology community’s commitment to use such models safely may reassure the public and forestall onerous regulations. We’re thinking: The commitments are long on general principles and relatively short on concrete actions. We’re glad they call for ongoing revision and action, and we hope they lead to the development of effective safeguards.
More Factual LLMsLarge language models sometimes generate false statements. New work makes them more likely to stick to the facts. What’s new: Katherine Tian, Eric Mitchell, and colleagues at Stanford and University of North Carolina proposed FactTune, a procedure that fine-tunes large language models (LLMs) to increase their truthfulness without collecting human feedback. Key insight: Just as fine-tuning based on feedback has made LLMs less harmful, it can make them more factual. The typical method for such fine-tuning is reinforcement learning from human feedback (RLHF). But a combination of direct preference optimization (DPO) and reinforcement learning from AI feedback (RLAIF) is far more efficient. DPO replaces cumbersome reinforcement learning with a simpler procedure akin to supervised learning. RLAIF eliminates the cost of collecting human feedback by substituting model-generated preferences for human preferences. How it works: The authors built models designed to deliver factual output within a specific domain.
Results: Fine-tuning by the authors’ method improved the factuality of models in two domains.
Why it matters: LLMs are known to hallucinate, and the human labor involved in fact checking their output is expensive and time-consuming. The authors applied well tested methods to improve the factuality of texts while keeping human involvement to a minimum. We’re thinking: This work, among others, shows how LLMs can bootstrap their way to better results. We’ve only just begun to explore combinations of LLMs working together as well as individual LLMs working iteratively in an agentic workflow.
Work With Andrew Ng
Join the teams that are bringing AI to the world! Check out job openings at DeepLearning.AI, AI Fund, and Landing AI.
Subscribe and view previous issues here.
Thoughts, suggestions, feedback? Please send to thebatch@deeplearning.ai. Avoid our newsletter ending up in your spam folder by adding our email address to your contacts list.
|