Dear friends,
Multi-agent collaboration is the last of the four key AI agentic design patterns that I’ve described in recent letters. Given a complex task like writing software, a multi-agent approach would break the task into subtasks to be executed by different roles — such as a software engineer, product manager, designer, QA (quality assurance) engineer, and so on — and have a different agent accomplish each subtask.
In many companies, managers routinely decide what roles to hire, and then how to split complex projects — like writing a large piece of software or preparing a research report — into smaller tasks to assign to employees with different specialties. Using multiple agents is analogous. Each agent implements its own workflow, has its own memory (itself a rapidly evolving area in agentic technology: how can an agent remember enough of its past interactions to perform better on upcoming ones?), and may ask other agents for help. Agents can also engage in Planning and Tool Use. All of this produces a cacophony of LLM calls and message passing between agents, and the resulting workflows can grow very complex. A minimal sketch of the pattern appears below.
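To make this concrete, here is a bare-bones sketch in Python. The llm() helper and the role prompts are hypothetical stand-ins for whatever chat-completion API and prompts you would actually use; frameworks like AutoGen, CrewAI, and LangGraph handle the message passing and much more for you.

```python
# Minimal multi-agent sketch. llm() is a hypothetical helper that sends a
# system prompt plus a user message to a chat model and returns its reply.
def llm(system_prompt: str, message: str) -> str:
    raise NotImplementedError("wrap the chat-completion API of your choice here")

ROLES = {
    "product_manager": "You write concise product specs for engineers.",
    "engineer": "You write Python code that satisfies a given spec.",
    "qa_engineer": "You review code against its spec and list defects.",
}

def build_feature(task: str) -> dict:
    # Each agent handles one subtask and passes its output to the next.
    spec = llm(ROLES["product_manager"], f"Write a spec for: {task}")
    code = llm(ROLES["engineer"], f"Implement this spec:\n{spec}")
    review = llm(ROLES["qa_engineer"], f"Spec:\n{spec}\n\nCode:\n{code}")
    return {"spec": spec, "code": code, "review": review}
```

In a real system, you might loop the QA agent's review back to the engineer agent until the review comes up clean; that is where the message passing gets complex.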
Keep learning! Andrew
P.S. Large language models (LLMs) can take gigabytes of memory to store, which limits your ability to run them on consumer hardware. Quantization can reduce model size by 4x or more while maintaining reasonable performance. In our new short course “Quantization Fundamentals,” taught by Hugging Face's Younes Belkada and Marc Sun, you’ll learn how to quantize LLMs and how to use int8 and bfloat16 (Brain Float 16) data types to load and run LLMs using PyTorch and the Hugging Face Transformers library. You’ll also dive into the technical details of linear quantization to map 32-bit floats to 8-bit integers. I hope you’ll check it out!
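For intuition, here is linear quantization at its simplest (a sketch, not the course's code): map a tensor's observed float range onto int8 with a scale and a zero point, then invert the mapping to recover approximate floats.

```python
import torch

def linear_quantize(x: torch.Tensor):
    # Asymmetric linear quantization: map the observed float range
    # [x.min(), x.max()] onto the int8 range [-128, 127].
    qmin, qmax = -128, 127
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(round(qmin - (x.min() / scale).item()))
    q = torch.clamp(torch.round(x / scale) + zero_point, qmin, qmax).to(torch.int8)
    return q, scale, zero_point

def linear_dequantize(q, scale, zero_point):
    # Invert the mapping; the result approximates the original floats.
    return scale * (q.to(torch.float32) - zero_point)

x = torch.randn(4, 4)
q, scale, zp = linear_quantize(x)
err = (x - linear_dequantize(q, scale, zp)).abs().max()
print(f"max reconstruction error: {err.item():.4f}")  # small relative to scale
```

Storing int8 values instead of 32-bit floats is what yields roughly the 4x size reduction mentioned above.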
News

Custom Agents, Little Coding

Google is empowering developers to build autonomous agents using little or no custom code.

What’s new: Google introduced Vertex AI Agent Builder, a low/no-code toolkit that enables Google’s AI models to run external code and ground their responses in Google search results or custom data.
Behind the news: Vertex AI Agent Builder consolidates agentic features that some of Google’s competitors have rolled out in recent months. For instance, OpenAI’s Assistants API lets developers build agents that respond to custom instructions, retrieve documents (limited by file size), call functions, and access a code interpreter. Anthropic recently launched Claude Tools, which lets developers instruct Claude language models to call customized tools. Microsoft’s Windows Copilot and Copilot Builder can call functions and retrieve information using Bing search and user documents stored via Microsoft Graph.

Why it matters: Making agents practical for commercial use can require grounding, tool use, multi-agent collaboration, and other capabilities. Google’s new tools are a step in this direction, taking advantage of investments in its hardware infrastructure as well as services such as search. As tech analyst Ben Thompson writes, Google’s combination of scale, interlocking businesses, and investment in AI infrastructure makes for a compelling synergy.

We’re thinking: Big-tech offerings like Vertex AI Agent Builder compete with an expanding universe of open source tools such as AutoGen, CrewAI, and LangGraph. The race is on to provide great agentic development frameworks!
Hallucination Creates Security Holes

Language models can generate code that erroneously points to software packages, creating vulnerabilities that attackers can exploit.

What’s new: Security researcher Bar Lanyado noticed that large language models, when used to generate code, repeatedly produced a command to install a package that was not available at the specified path, The Register reported. He created a dummy package of the same name and uploaded it to that path, and developers duly installed it.
Testing: Lanyado tested Cohere AI’s Coral, Google’s Gemini Pro, and OpenAI’s GPT-4 and GPT-3.5. His aim was to determine how often they hallucinated packages and how often they referred repeatedly to the same hallucinated package. First he collected roughly 47,000 “how to” questions related to over 100 subjects in Go, .NET, Node.js, Python, and Ruby. Then he identified questions that produced hallucinated packages from a zero-shot prompt. He selected 20 of these questions at random and prompted each model 100 times to see whether it would refer to the same package every time.
Why it matters: Lanyado’s method is not known to have been used in an attack, but it may be only a matter of time given its similarity to hacks like typosquatting, dependency confusion, and masquerading.

We’re thinking: Improved AI-driven coding tools should help to address this issue. Meanwhile, the difference between a command like pip install huggingface-cli and pip install -U "huggingface_hub[cli]" is subtle. In cases like this, package providers can look out for potential doppelgangers and warn users before they’re misled.
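One lightweight defense is to screen model-suggested package names before installing them. The sketch below assumes you're installing from PyPI and keeps a hand-vetted allowlist; note that existence on the registry alone proves nothing, since Lanyado's experiment shows an attacker can register a hallucinated name, so the allowlist does the real work.

```python
import urllib.error
import urllib.request

VETTED = {"huggingface_hub", "torch", "transformers"}  # your own reviewed allowlist

def exists_on_pypi(name: str) -> bool:
    # PyPI's JSON API returns 404 for packages that don't exist.
    try:
        with urllib.request.urlopen(f"https://pypi.org/pypi/{name}/json") as resp:
            return resp.status == 200
    except urllib.error.HTTPError:
        return False

def check_suggestion(name: str) -> str:
    if name in VETTED:
        return "ok: vetted"
    if not exists_on_pypi(name):
        return "reject: not on PyPI (likely hallucinated)"
    return "warn: exists but unvetted; could be a planted doppelganger"

print(check_suggestion("huggingface-cli"))  # the name from Lanyado's experiment
```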
NEW FROM DEEPLEARNING.AI

In the short course “Quantization Fundamentals with Hugging Face,” you’ll learn how to cut the computational and memory costs of AI models through quantization. Learn to quantize nearly any open source model! Join today
GPT Store Shows Lax Moderation

OpenAI has been moderating its GPT Store with a very light touch.

What’s new: In a survey of the GPT Store’s offerings, TechCrunch found numerous examples of custom ChatGPT instances that appear to violate the store’s own policies.

How it works: The GPT Store has a low bar for entry by design — any paid ChatGPT user can create a custom-prompted variation of the chatbot, known as a GPT, and include it in the store. The store lists GPTs in several categories, such as Writing, Productivity, Programming, and Lifestyle. While many are useful, some are questionable.
Behind the news: OpenAI launched the GPT Store in January. Since then, users have uploaded more than 3 million GPTs that include enhanced search engines, creative writing aids, and tools that produce short videos. The most popular GPTs have millions of downloads. Despite its “store” name, the GPT Store’s contents are free to download. OpenAI is piloting a program in which U.S.-based uploaders of popular GPTs can earn money.

Why it matters: The GPT Store is the chatbot era’s answer to Apple’s App Store or Android’s Google Play Store. If it succeeds, it could democratize chatbot development just as the App Store helped to popularize building smartphone applications. How OpenAI moderates the store may have real financial and reputational impacts on developers in the years ahead.
Tuning LLMs for Better RAG

Retrieval-augmented generation (RAG) enables large language models to generate better output by retrieving documents that are relevant to a user’s prompt. Fine-tuning further improves RAG performance.

What’s new: Xi Victoria Lin, Xilun Chen, Mingda Chen, and colleagues at Meta proposed RA-DIT, a fine-tuning procedure that trains an LLM and retrieval model together to improve the LLM’s ability to capitalize on retrieved content.

Retrieval-augmented generation (RAG) basics: When a user prompts an LLM, RAG supplies documents that are relevant to the prompt. A separate retrieval model computes the probability that each chunk of text in a separate dataset is relevant to the prompt. Then it grabs the chunks with the highest probability and provides them to the LLM to append to the prompt. The LLM generates each token based on the chunks plus the prompt and the tokens generated so far. (A minimal sketch of this loop follows below.)

Key insight: Typically, LLMs are not exposed to retrieval-augmented inputs during pretraining, which limits how well they can use retrieved text to improve their output. Methods that incorporate retrieval into pretraining have been proposed, but they’re costly because they require processing a lot of data. A more data-efficient, and therefore compute-efficient, approach is to (i) fine-tune the LLM to better use retrieved knowledge and then (ii) fine-tune the retrieval model to select more relevant text.

How it works: The authors fine-tuned Llama 2 (65 billion parameters) and DRAGON+, a retriever. They call the system RA-DIT 65B.
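Here is the promised sketch of the retrieve-then-generate loop. It is not RA-DIT itself: embed() and llm() are hypothetical stand-ins for a retriever's encoder (such as DRAGON+) and a language model, and cosine similarity stands in for the retriever's relevance probability.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Hypothetical: in practice, a retriever encoder such as DRAGON+.
    raise NotImplementedError

def llm(prompt: str) -> str:
    # Hypothetical: in practice, a language model such as Llama 2.
    raise NotImplementedError

def rag_answer(question: str, chunks: list[str], k: int = 3) -> str:
    q = embed(question)

    # Score each chunk's relevance to the prompt; cosine similarity
    # stands in for the retriever's relevance probability.
    def score(chunk: str) -> float:
        e = embed(chunk)
        return float(q @ e / (np.linalg.norm(q) * np.linalg.norm(e)))

    top = sorted(chunks, key=score, reverse=True)[:k]

    # Append the retrieved chunks to the prompt; the LLM generates each
    # token conditioned on the chunks, the prompt, and prior tokens.
    context = "\n\n".join(top)
    return llm(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")
```

RA-DIT's contribution is to fine-tune both ends of this pipeline, the LLM and the retriever, so each makes better use of the other.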
Results: On average, across four collections of questions from datasets such as MMLU that cover topics like elementary mathematics, United States history, computer science, and law, RA-DIT 65B achieved 49.1 percent accuracy, while the combination of Llama 2 65B and DRAGON+ without fine-tuning achieved 45.1 percent accuracy, and Llama 2 65B without retrieval achieved 32.9 percent accuracy. When the input included five examples that showed the model how to perform the task, RA-DIT 65B achieved 51.8 percent accuracy, Llama 2 65B combined with DRAGON+ achieved 51.1 percent accuracy, and Llama 2 65B alone achieved 47.2 percent accuracy. On average, over eight common-sense reasoning tasks such as ARC-C, which involves common-sense physics such as the buoyancy of wood, RA-DIT 65B achieved 74.9 percent accuracy, Llama 2 65B with DRAGON+ achieved 74.5 percent accuracy, and Llama 2 65B achieved 72.1 percent accuracy.

Why it matters: This method offers an inexpensive way to improve LLM performance with RAG.

We’re thinking: Many developers have found that putting more effort into the retriever, to make sure it provides the most relevant text, improves RAG performance. Putting more effort into the LLM helps, too.
Work With Andrew Ng
Join the teams that are bringing AI to the world! Check out job openings at DeepLearning.AI, AI Fund, and Landing AI.
Subscribe and view previous issues here.
Thoughts, suggestions, feedback? Please send to thebatch@deeplearning.ai. To keep our newsletter out of your spam folder, please add our email address to your contacts list.