Dear friends,
I am so sorry that the U.S. is letting down our friends and allies. Broad tariffs, implemented not just against adversaries but also steadfast allies, will damage the livelihoods of billions of people, create inflation, make the world more fragmented, and leave the U.S. and the world poorer. AI isn’t the solution to everything, but even amidst this challenging environment, I hope our community can hold together, keep building friendships across borders, keep sharing ideas, and keep supporting each other.
Much has been written about why high, widespread taxes on imports are harmful. In this letter, I’d like to focus on its possible effects on AI. One silver lining of the new tariffs is that they focus on physical imports, rather than digital goods and services, including intellectual property (IP) such as AI research inventions and software. IP is difficult to tax, because each piece of IP is unique and thus hard to value, and it moves across borders with little friction via the internet. Many international AI teams collaborate across borders and timezones, and software, including specifically open source software, is an important mechanism for sharing ideas. I hope that this free flow of ideas remains unhampered, even if the flow of physical goods is.
However, AI relies on hardware, and tariffs will slow down AI progress by restricting access to it. Even though a last-minute exception was made for semiconductors, taxing imports of solar panels, wind turbines, and other power-generation and -distribution equipment will diminish the ability to provide power to U.S. data centers. Taxing imports of servers, cooling hardware, networking hardware, and the like will also make it more expensive to build data centers. And taxing consumer electronics, like laptops and phones, will make it harder for citizens to learn and use AI.
With regard to data-center buildouts, another silver lining is that, with the rise of generative AI, data gravity has decreased because compute processing costs are much greater than transmission costs, meaning it’s more feasible to place data centers anywhere in the world rather than only in close proximity to end-users. Even though many places do not have enough trained technicians to build and operate data centers, I expect tariffs will encourage data centers to be built around the world, creating more job opportunities globally.
Finally, tariffs will create increased pressure for domestic manufacturing, which might create very mild tailwinds for robotics and industrial automation. As U.S. Vice President J.D. Vance pointed out in 2017, the U.S. should focus on automation (and education) rather than on tariffs. But the U.S. does not have the personnel — or know-how, or supply chain — to manufacture many of the goods that it currently counts on allies to make. Robotics can be helpful for addressing a small part of this large set of challenges. Generative AI's rate of progress in robotics is also significantly slower than in processing text, visual data, audio, and reasoning. So while the tariffs could create tailwinds for AI- enabled robotics, I expect this effect to be small.
My 4-year-old son had been complaining for a couple of weeks that his shoes were a tight fit — he was proud that he’s growing! So last Sunday, we went shoe shopping. His new shoes cost $25, and while checking out, I paused and reflected on how lucky I am to be able to afford them. But I also thought about the many families living paycheck-to-paycheck, and for whom tariffs leading to shoes at $40 a pair would mean they let their kids wear ill-fitting shoes longer. I also thought about people I’ve met in clothing manufacturing plants in Asia and Latin America, for whom reduced demand would mean less work and less money to take home to their own kids.
I don’t know what will happen next with the U.S. tariffs, and plenty of international trade will happen with or without U.S. involvement. I hope we can return to a world of vibrant global trade with strong, rules-based, U.S. participation. Until then, let’s all of us in AI keep nurturing our international friendships, keep up the digital flow of ideas — including specifically open source software — and keep supporting each other. Let’s all do what we can to keep the world as connected as we are able.
Love, Andrew
A MESSAGE FROM DEEPLEARNING.AICourse 3 of the Data Analytics Professional Certificate is live! Learn to use Python, the most important coding language in data analytics, to analyze real-world datasets, create visualizations, run tests, and apply AI tools to debug and accelerate your code. Enroll now
News
Ordinary LLMs Implicitly Take Reasoning StepsEven without explicit training in reasoning, large language models “think” in ways that may be more deliberate than previously understood. What’s new: Emmanuel Ameisen and colleagues at Anthropic devised a method to study how transformers generate responses to specific prompts. They also studied Claude 3.5 Haiku’s responses to specific prompts and found that the model, which is not trained to generate chains of thought, nonetheless appeared to take reasoning steps via its neuron activations. Key insight: A viable alternative to a fully connected layer is a cross-layer transcoder, which has two layers. The outputs of the larger first layer are sparse, which makes them interpretable “features,” or individual values that correspond to concepts. By mapping an input to highly activated features, we can identify the concepts that determine the model’s output. How it works: The team replaced fully connected layers in Claude 3.5 Haiku with cross-layer transcoders and interpreted their features.
Results: The authors built graphs that show how Claude 3.5 Haiku computes its output over a number of selected prompts.
Behind the news: Last year, Google trained models to examine individual features in Gemma 2. Before that, Anthropic used similar methods to interpret Claude 3 Sonnet’s middle layer. Why it matters: Apparently Claude 3.5 Haiku — and presumably other large language models — spontaneously perform implicit reasoning steps without being prompted to do so. Anthropic’s method reveals not only whether a model reasons or takes a shortcut, but also what it truly does well and what it only professes to do well. We’re thinking: The authors’ approach to examining how large language models generate output is interesting. We wonder whether even pre-transformer vanilla neural networks would appear to perform some sort of “reasoning” if we were to interpret them in a similar way.
Llama’s Mixture of Vision-Language ExpertsMeta updated its popular open-weights models, claiming performance superior to closed competitors in three size classes.
How it works: The team pretrained Llama 4 models on images and text in over 200 languages from publicly available and licensed data, including data from publicly shared posts on Facebook and Instagram. They trained Llama 4 Scout on 40 trillion tokens and Llama 4 Maverick on 22 trillion tokens.
Results: In tests performed by Meta, Llama 4 models showed strong performance relative to competing models — mostly not mixtures of experts, but some that are known to have higher parameter counts relative to Llama 4 models’ active parameters.
Yes, but: An experimental version of Llama 4 Maverick reached second place in Chatbot Arena behind Gemini 2.5 Pro. However, it was a variation optimized for conversation, not the currently available version. AI researchers accused Meta of attempting to manipulate the leaderboard. Why it matters: Although the version of Llama 4 Maverick that nearly topped the Chatbot Arena is not the released version, its accomplishment says a lot about the growing power of open weights. Open models are quickly reaching parity with closed competitors — a boon to developers, businesses, and society at large. We’re thinking: According to Meta, Behemoth beats GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro, topping all but the best reasoning models — but it isn’t available yet. Something to look forward to!
Learn More About AI With Data Points!AI is moving faster than ever. Data Points helps you make sense of it just as fast. Data Points arrives in your inbox twice a week with six very brief news stories. This week, we covered OpenAI’s promise to be more open and Meta’s release of a new herd of AI models. Subscribe today!
Better Multimodal Performance With Open WeightsAlibaba’s latest open-weights system raises the bar for multimodal tasks in a relatively small model. What’s new: Alibaba released Qwen2.5-Omni 7B.
How it works: Qwen2.5-Omni 7B comprises a pretrained text transformer (Qwen 2.5 7B), pretrained vision encoder (Qwen2.5-VL), pretrained audio encoder (Whisper-large-v3), speech transformer, and audio decoder (a transformer plus BigVGAN), along with corresponding adapters of undisclosed architecture.
Results: The authors compared Qwen2.5-Omni 7B to similarly sized models. It performed especially well on audio-to-text, image-to-text, and video-to-text tasks. However, it performed less well on text-to-text and text-to-speech tasks.
Behind the news: Multimodal systems with open weights are multiplying. For instance, AnyGPT (open weights, training, and inference code) accepts and generates speech, text, images, and music. Similarly, Mini-Omni2 (open weights and inference code) accepts and generates text, speech, and images. Why it matters: Multimodal models typically show steep degradation on measurements of instruction-following when shifting from voice to text, but Qwen2.5-Omni does not. As the world moves toward voice-to-voice interactions, open systems that deliver performance comparable to that of closed competitors accelerate progress towards better conversations. We’re thinking: The Qwen team is on fire! Alibaba’s steady stream of highly capable open-weights models is a gift to AI developers.
Better Than Trees for Tabular DataIf you have a collection of variables that represent, say, a cancer patient and you want to classify the patient’s illness as likely cancer or not, algorithms based on decision trees, such as gradient-boosted trees, typically perform better than neural networks. A transformer tailored to tabular data could change this situation.
Results: The authors tested the system on 29 classification datasets and 28 regression datasets from the AutoML benchmark and OpenML-CTR23. Each dataset contained up to 10,000 rows, 500 columns, and 10 classes. They compared TabPFN to the popular gradient-boosted tree approaches CatBoost, LightGBM, and XGBoost.
Yes, but: The authors’ method is slower than decision tree methods with respect to inference. To process a 10,000-row dataset, TabPFN required 0.2 seconds while CatBoost took 0.0002 seconds.
Work With Andrew Ng
Join the teams that are bringing AI to the world! Check out job openings at DeepLearning.AI, AI Fund, and Landing AI.
Subscribe and view previous issues here.
Thoughts, suggestions, feedback? Please send to thebatch@deeplearning.ai. Avoid our newsletter ending up in your spam folder by adding our email address to your contacts list.
|