Dear friends,
Parallel agents are emerging as an important new direction for scaling up AI. AI capabilities have scaled with more training data, training-time compute, and test-time compute. Running multiple agents in parallel is a growing technique for scaling further and improving performance.
It is difficult for a human manager to take a complex task (like building a large software application) and break it down into smaller tasks for human engineers to work on in parallel; scaling to huge numbers of engineers is especially challenging. It is similarly challenging to decompose tasks for parallel agents to carry out. But the falling cost of LLM inference makes it worthwhile to use many more tokens, and using them in parallel allows this to be done without significantly increasing the user’s waiting time.
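To make the timing point concrete, here is a minimal sketch of dispatching decomposed subtasks to agents concurrently. The `run_agent` helper is hypothetical (a stand-in for whatever agent framework or LLM API you use); the key idea is that total waiting time is roughly the slowest subtask, not the sum of all of them.

```python
import asyncio

# Hypothetical helper: runs an LLM-based agent on one subtask.
# In practice this would wrap your agent framework or LLM API of choice.
async def run_agent(subtask: str) -> str:
    await asyncio.sleep(1)  # stand-in for LLM calls and tool use
    return f"result for: {subtask}"

async def run_parallel(subtasks: list[str]) -> list[str]:
    # Launch all subtasks concurrently; wall-clock time is close to
    # the slowest single subtask rather than the sum of all of them.
    return await asyncio.gather(*(run_agent(t) for t in subtasks))

if __name__ == "__main__":
    subtasks = [
        "design the database schema",
        "write the API endpoints",
        "build the front-end components",
    ]
    for result in asyncio.run(run_parallel(subtasks)):
        print(result)
```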
Keep building! Andrew
A MESSAGE FROM DEEPLEARNING.AI
Automate knowledge graph construction using AI agents. Implement a multi-agent system that captures goals, selects relevant data, generates schemas for structured and unstructured sources, and constructs and connects the resulting graphs into a queryable knowledge graph. Enroll for free
News
Proactive AI Assistance for Phones
Google’s latest smartphone sports an AI assistant that anticipates the user’s needs and presents helpful information without prompting.
What’s new: Google unveiled its Pixel 10 along with an AI-powered system called Magic Cue. During calls, text conversations, and other interactions, the system automatically delivers relevant information — dates, times, names, locations, weather, photos, airline booking numbers, and so on — culled from compatible applications.
How it works: Magic Cue takes advantage of an updated version of Gemini Nano and runs on the Pixel 10’s newly upgraded Tensor G5 AI processor. The system tracks user behavior and provides relevant information proactively.
Behind the news: Google has been especially aggressive in building AI into phones. In 2021, it replaced the Qualcomm Snapdragon chip that had run AI inference on Pixel phones with its own Tensor chip, which combined a GPU, CPU, Tensor processing unit, and security subsystem. Three years later, the Pixel 8’s Tensor G3 chip provided the muscle for AI-enabled audio and video editing — but those capabilities were features within applications. Equipped with the new Tensor G5, Pixel 10 integrates AI with the operating system and applications to provide new kinds of capabilities.
Why it matters: Enabling edge devices to run powerful AI models has been a longstanding goal of big tech companies. But a smartphone’s relatively meager computational, storage, and battery resources have presented serious challenges. The combination of Gemini Nano and the Tensor G5 chip gives Google a strong foundation to keep pushing the limits of edge AI, and its control of the Android operating system gives it tremendous market power to promote its models.
We’re thinking: Apple has noticed Google’s progress. It’s reportedly negotiating with Google to use Gemini technology for its Siri AI assistant.
Mistral Measures LLM Consumption of Energy, Water, and Materials
The French AI company Mistral measured the environmental impacts of its flagship large language model.
What’s new: Mistral published an environmental analysis of Mistral Large 2 (123 billion parameters) that details the model’s emission of greenhouse gases, consumption of water, and depletion of resources, taking into account all computing and manufacturing involved. The company aims to establish a standard for evaluating the environmental impacts of AI models. The study concluded that, while individual uses of the model have little impact compared to, say, using the internet, aggregate use takes a significant toll on the environment.
How it works: Mistral tracked the model’s operations over 18 months. It tallied impacts caused by the building of data centers, manufacturing and transporting servers, training and running the model, the user’s equipment, and indirect impacts of using the model. The analysis followed the Frugal AI methodology developed by Association Française de Normalisation, a French standards organization. Environmental consultancies contributed to the analysis, and environmental auditors peer-reviewed it.
Yes, but: Mistral acknowledged a few shortcomings of the study. It struggled to calculate some impacts due to the lack of data and established standards. For instance, a reliable assessment of the environmental impact of GPUs is not available.
Behind the news: Mistral’s report follows a string of studies that assess AI’s environmental impact.
Why it matters: AI consumes enormous amounts of energy and water, and finding efficient ways to train and run models is critical to ensure that the technology can benefit large numbers of people. Mistral’s methodology provides a standardized way to assess environmental impacts. If it’s widely adopted, it could help researchers, businesses, and users compare different models, work toward more environmentally friendly AI, and potentially reduce overall impacts.
We’re thinking: Data centers and cloud computing are responsible for 1 percent of the world’s energy-related greenhouse gas emissions, according to the International Energy Agency. That’s a drop in the bucket compared to agriculture, construction, or transportation. Nonetheless, having a clear picture of AI’s consumption of resources can help us manage them more effectively as demand rises. It's heartening that major AI companies are committed to using and developing sustainable energy sources and using them efficiently, and the environmental footprint of new AI models and processors is falling steadily.
Learn More About AI With Data Points!
AI is moving faster than ever. Data Points helps you make sense of it just as fast. Data Points arrives in your inbox twice a week with six brief news stories. This week, we covered DeepSeek releasing V3.1 with hybrid thinking modes and Perplexity launching a subscription that pays publishers for AI traffic. Subscribe today!
Robot Antelope Joins Herd
Researchers in China disguised a quadruped robot as a Tibetan antelope to help study the animals close up.
What’s new: The Chinese Academy of Sciences teamed with Hangzhou-based Deep Robotics and the state news service Xinhua to introduce a robot into a herd that lives in the Hoh Xil National Nature Reserve, a mountainous area where the elevation is above 14,000 feet. The robot enables scientists to observe the shy antelopes without disturbing them.
How it works: The mechanical beast is a Deep Robotics X30 covered with an antelope’s hide. The X30, which is designed for industrial inspections and search-and-rescue tasks, is well suited to the region’s rugged terrain and conditions. It can climb open-riser staircases, function at temperatures between -20° and 55° Celsius, and resist dust and water according to ratings established by the International Electrotechnical Commission. Its vision system is designed to operate in dim or very bright light.
Behind the news: Human observation can disrupt animal behavior, so the study of animals in their natural habitat relies mostly on camera traps and drones. Increasingly, biologists are experimenting with robots mocked up to look like animals.
Why it matters: Applying AI to robotic perception, locomotion, and dexterity opens a wide range of applications. Case in point: Deep Robotics trains its robots with proximal policy optimization (PPO) to navigate difficult environments (like climbing uneven staircases) and respond to dynamic challenges (like being kicked down the stairs). Such capabilities are valuable not only in domestic and industrial uses but also in research situations like observing antelope behavior.
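PPO is a standard reinforcement-learning algorithm, and its core is a clipped objective that keeps each policy update close to the policy that collected the data. A minimal sketch of that objective follows (this is a generic PPO loss in PyTorch, not Deep Robotics’ actual training code; the locomotion observations and rewards are assumed to come from elsewhere).

```python
import torch

def ppo_policy_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    # Probability ratio between the updated policy and the policy
    # that collected the trajectories.
    ratio = torch.exp(new_log_probs - old_log_probs)
    # Clipping limits how far a single update can move the policy,
    # which stabilizes learning on tasks like legged locomotion.
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()
```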
We’re thinking: Robotics is making impressive strides!
Better Image Processing Through Self-Supervised Learning
DINOv2 showed that a vision transformer pretrained on unlabeled images could produce embeddings that are useful for a wide variety of tasks. Now it has been updated to improve the performance of its embeddings in segmentation and other vision tasks.
What’s new: Oriane Siméoni and colleagues at Meta, World Resources Institute, and France’s National Institute for Research in Digital Science and Technology released the weights and training code for DINOv3, a self-supervised model that updates the previous version with 6 times as many parameters, more training data, and a new loss function.
Key insight: Vision transformers trained in a self-supervised fashion — such as feeding them unlabeled images with missing patches and training them to fill in the blanks — yield uneven results beyond a certain number of training steps. Further training increases performance on tasks that depend on analyzing an image globally, like classification and face recognition, but degrades it in tasks that concentrate on portions of an image, like image segmentation and depth estimation. The DINOv3 team discovered the reason: The model’s embeddings of random patches become more similar as training continues. To counteract this, they used the model trained up to that point as a teacher and trained successive versions to avoid producing patch embeddings that were more similar to one another than the teacher’s embeddings were.
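A rough sketch of that regularizer appears below. It assumes the penalty compares patch-to-patch similarity (Gram) matrices between the current model and the frozen earlier-checkpoint teacher; the function and tensor names are illustrative rather than the paper’s exact formulation.

```python
import torch
import torch.nn.functional as F

def patch_similarity_loss(student_patches, teacher_patches):
    # student_patches, teacher_patches: (batch, num_patches, dim)
    # Normalize patch embeddings so dot products are cosine similarities.
    s = F.normalize(student_patches, dim=-1)
    t = F.normalize(teacher_patches, dim=-1)
    # Patch-to-patch similarity (Gram) matrices for student and teacher.
    sim_s = s @ s.transpose(1, 2)
    sim_t = t @ t.transpose(1, 2)
    # Penalize the student when its patch embeddings become more similar
    # to one another than the teacher checkpoint's embeddings were.
    return (sim_s - sim_t).pow(2).mean()
```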
How it works: The building of DINOv3 followed that of its predecessor DINOv2 but added a new loss term.
Results: The authors adapted the trained embedding model for various uses by adding separate linear layers and training them on tasks including segmentation and classification.
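A typical recipe for such an adaptation is a linear probe: freeze the pretrained backbone and train only a small linear head on the downstream task. The sketch below illustrates this under assumed names (`backbone`, `embed_dim`); it is not the authors’ exact setup.

```python
import torch
import torch.nn as nn

class LinearProbe(nn.Module):
    # Frozen pretrained backbone plus a single trainable linear layer.
    def __init__(self, backbone, embed_dim, num_classes):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False  # only the linear head is trained
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, images):
        with torch.no_grad():
            features = self.backbone(images)  # image-level embeddings
        return self.head(features)
```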
Why it matters: Self-supervised learning is important in visual AI because image and video data are more common than image-text and video-text data. The additional loss term enabled the team to use this more plentiful data to improve performance on both globally and locally focused tasks.
Work With Andrew Ng
Join the teams that are bringing AI to the world! Check out job openings at DeepLearning.AI, AI Fund, and Landing AI.
Subscribe and view previous issues here.
Thoughts, suggestions, feedback? Please send to thebatch@deeplearning.ai. Avoid our newsletter ending up in your spam folder by adding our email address to your contacts list.