Dear friends,
Is there an AI bubble? With massive sums going into AI infrastructure, such as OpenAI’s $1.4 trillion plan, and Nvidia briefly reaching a $5 trillion market cap, many have asked whether speculation and hype have driven AI valuations above sustainable levels. However, AI isn’t monolithic, and different areas look bubbly to different degrees.
Caveat: I am absolutely not giving investment advice!
I predicted early last year that we’d need more inference capacity, partly because of agentic workflows. Since then, the need has become more acute. As a society, we need more capacity for AI inference!
Having said that, I’m not saying it’s impossible to lose money investing in this sector. If we end up overbuilding — and I don’t currently know if we will — then providers may end up having to sell capacity at a loss or at low returns. I hope investors in this space do well financially. The good news, however, is that even if we overbuild, this capacity will get used, and it will be good for application builders!
Andrew
A MESSAGE FROM DEEPLEARNING.AI
In Agentic AI, taught by Andrew Ng, you’ll learn to design multi-step, autonomous workflows in raw Python. The course covers fundamental agentic design patterns: reflection, tool use, planning, and multi-agent collaboration. Available exclusively at DeepLearning.AI. Enroll now!
News
Google Dominates Arena Leaderboards (For the Moment)
Google introduced Gemini 3 Pro and Nano Banana Pro, its flagship vision-language and image-generation models, and deployed them to billions of users worldwide.
Gemini 3 Pro: A multimodal reasoning model, Gemini 3 Pro leads LMArena’s Text, WebDev, and Vision leaderboards as of this writing. The update replaces Gemini 2.5’s budget of tokens allocated to reasoning with a reasoning-level setting (low, medium, or high), which Google says is simpler to manage.
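For developers, the change shows up as a single request parameter. Here’s a minimal sketch of how one might set it using Google’s google-genai Python SDK; the model identifier and the thinking_level field follow Google’s announcement and are assumptions on our part, so check the current documentation before relying on them.

```python
# Minimal sketch: requesting a reasoning level from Gemini 3 Pro.
# Assumes the google-genai SDK (pip install google-genai); the model
# identifier and thinking_level field reflect Google's announcement
# and may differ in your SDK version, so verify against current docs.
from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment

response = client.models.generate_content(
    model="gemini-3-pro-preview",  # assumed model identifier
    contents="Summarize the attention mechanism in two sentences.",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(
            thinking_level="high"  # Google describes low/medium/high levels
        )
    ),
)
print(response.text)
```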
Yes, but: Gemini 3 Pro uses a lot of tokens to achieve its outstanding performance. Completing the Artificial Analysis Intelligence Index, a weighted average of 10 benchmarks, cost $1,201, second only to Grok 4 ($1,888). It also tends to produce incorrect output when it could decline to answer. On the Artificial Analysis Omniscience Hallucination Rate, the proportion of wrong answers among all non-correct attempts (including refusals), Gemini 3 Pro’s rate (88 percent) was far higher than those of Claude Sonnet 4.5 (48 percent) and GPT-5.1 High (5 percent).
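To make that metric concrete, here’s the arithmetic as we read Artificial Analysis’ definition (our interpretation for illustration, not their code):

```python
# The Omniscience Hallucination Rate as we read Artificial Analysis'
# definition (an interpretation for illustration, not their code):
# among attempts that were not correct, the share answered wrongly
# rather than refused.
def hallucination_rate(incorrect: int, refused: int) -> float:
    return incorrect / (incorrect + refused)

# A model that answers wrongly 88 times and refuses 12 times across 100
# non-correct attempts scores 0.88; refusing more often lowers the score.
print(hallucination_rate(incorrect=88, refused=12))  # 0.88
```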
Nano Banana Pro: Google also launched Nano Banana Pro (also known as Gemini 3 Pro Image), which currently tops Artificial Analysis’ Text-to-Image and Image Editing leaderboards. Nano Banana Pro draws on Gemini 3 Pro’s reasoning and knowledge when producing and editing images, generating up to two intermediate images to refine composition and logic before producing the final image. It’s designed to excel at rendering text and to keep up to five characters consistent across multiple generations. It grounds images in Google Search results to produce factually accurate infographics, maps, and the like, and it translates or alters text within images while preserving artistic style.
Behind the news: Google rolled out Gemini 3 Pro and Nano Banana Pro more broadly than Anthropic’s August launch of Claude Opus 4.1 or OpenAI’s early-November launch of GPT-5.1. Rather than leading with an API and a handful of new apps, Google pushed its new models into services that reach over 2 billion people each month, including Google Search’s AI Overview, Gmail, Docs, Sheets, and Android. At the same time, it launched Antigravity, an agentic coding platform that competes with tools like Cursor and Claude Code.
Why it matters: After trailing OpenAI and Anthropic on many benchmarks for months, Google now leads on many of them (despite a partial upset by Claude Opus 4.5, which arrived a week later). For developers who are evaluating which model to use, this could change their default option. Broadly, benchmark leadership has shifted multiple times in 2025, which suggests that no single company has established a durable technical lead.
We’re thinking: While Gemini 3 Pro defines the state of the art for more than a dozen popular benchmarks — this week, at least! — Google’s market power and edge in distribution may matter more. Its ability to deploy to billions of users instantly through its established products provides a wide moat that most competitors, apart from Apple with its iPhone empire, may find difficult to cross purely by releasing better models.
Microsoft and Anthropic Form Alliance
Having recently revised its agreement with longtime partner OpenAI, Microsoft pledged to invest billions of dollars in Anthropic, one of OpenAI’s top competitors.
What’s new: Microsoft, Anthropic, and Nvidia formed a partnership. Microsoft and Nvidia will invest up to $5 billion and $10 billion, respectively, in Anthropic. Microsoft will make Anthropic models available on its cloud platform, and Anthropic will purchase $30 billion of inference processing on Microsoft’s infrastructure. Further terms, including whether some of the investments are optional or conditional on Anthropic’s performance, were undisclosed.
How it works: The deal makes Anthropic’s Claude the only top model family available on all three leading cloud platforms: Microsoft Azure, Google Cloud, and Amazon Web Services. It also gives Anthropic’s valuation a big boost.
Behind the news: Microsoft’s 2022 partnership with OpenAI set the stage for Anthropic’s 2023 alliance with Amazon, matching one startup AI company with an established cloud provider. But Anthropic’s later agreements with Google and OpenAI’s recapitalization and restructuring of its relationship with Microsoft made it easier for Microsoft and Anthropic to find common ground.
Why it matters: A few years ago, OpenAI was the rising AI star in need of processing power, and Microsoft needed both technology to compete with peers and customers for its Azure platform. Their partnership, in which Microsoft invested roughly $13 billion over a few rounds, served both companies. Today, however, OpenAI needs more processing power than Microsoft will provide, while Microsoft needs to diversify its AI offerings. Meanwhile, Anthropic’s models have become so popular, especially among the business customers that Microsoft typically caters to, that they make a good match for Microsoft’s cloud offerings. An investment in Anthropic, even at a heightened valuation, puts Microsoft (and Nvidia) in line to benefit as AI continues to go mainstream.
We’re thinking: Wheeling and dealing aside, developers increasingly have access to the model they want, on the cloud platform they want. This is good news for everyone who hates being locked into a single choice.
Learn More About AI With Data Points!
AI is moving faster than ever. Data Points helps you make sense of it just as fast. Data Points arrives in your inbox twice a week with six brief news stories. This week, we covered Meta’s new SAM 3 model and the full open-source release of the Olmo 3 development pipeline. Subscribe today!
Record Labels Back AI-Music Startup
A music-generation newcomer emerged from stealth mode with licenses to train generative AI models on music controlled by the world’s biggest recording companies.
What’s new: Klay Vision, based in Los Angeles, became the first AI company to sign licensing agreements with all three major record labels — Sony Music Entertainment (SME), Universal Music Group (UMG), and Warner Music Group (WMG) — and the publishing companies that own the rights to the underlying compositions their recordings are based on. The agreements, whose financial terms are undisclosed, authorize Klay to train generative models on music whose copyrights are owned by those companies. The startup plans to launch a subscription streaming platform that enables listeners to customize existing music while compensating copyright owners, and it aims to cut similar deals with independent record labels, publishers, artists, and songwriters.
How it works: Unlike music generators that produce original music according to a text prompt, Klay’s system will allow users to alter existing recordings interactively, for instance, changing their mix or style, in a manner the company calls “active listening.”
Behind the news: The partnership between Klay and the music-industry powers follows years of litigation in which copyright owners have sued AI companies over alleged copyright violations.
Why it matters: The market for AI-generated music is still taking shape, but events to date suggest a promising future. Suno, for now, aims to build a market for generated music on the assumption that training AI systems on copyright-protected recordings is fair use, a position that will require a court decision or a change in the law to confirm. Klay’s strategy contrasts sharply: it focused on obtaining licenses and compensating copyright owners, which gives it legal protection against claims of copyright infringement as well as goodwill and support from the music industry.
We’re thinking: The difference between music-generation pioneers and Klay echoes the situation circa 2000, when a startup called Napster gave music fans the means to distribute music files, which it claimed was fair use. Apple launched iTunes in 2001 as an industry-friendly distribution service that provided a legitimate alternative. iTunes made it easier for listeners to play what they wanted to hear, it gave copyright owners revenue, and the industry welcomed it. Similarly, Klay aims to give the music industry a way to make money on generated music in a way that complements, rather than cannibalizes, its existing business.
Toward Steering LLM Personality
Large language models can develop character traits like cheerfulness or sycophancy during fine-tuning. Researchers developed a method to identify, monitor, and control such traits.
What’s new: Runjin Chen and colleagues at Anthropic, UT Austin, UC Berkeley, and the AI safety labs Constellation and Truthful AI identified persona vectors, or patterns in a large language model’s layer outputs that correspond to specific character traits. They built an automated pipeline that extracts these vectors from natural-language descriptions of traits, so the traits can be monitored, amplified, or attenuated.
Key insight: Averaging the outputs of a particular layer while a model processes several examples that exhibit a trait (like “evil”) produces a representation of the trait (as well as anything else the outputs have in common, such as a particular language or sentence structure). To produce a representation of the trait alone, you can subtract the average representation of its opposite from the average representation of the trait, which removes the features they share. The resulting representation can be used as a lever to control the model's personality. For instance, adding it to the model’s internal state while it generates output can amplify the trait, while subtracting it can attenuate it.
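Here’s a minimal sketch of that mean-difference idea in PyTorch with Hugging Face transformers. The example prompts, layer choice, and steering coefficient are illustrative assumptions, not the authors’ exact pipeline.

```python
# Minimal sketch of persona-vector extraction and steering (our
# illustration of the mean-difference idea, not the authors' code).
# Prompts, layer index, and steering strength are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-7B-Instruct"  # one of the paper's target models
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
LAYER = 20  # illustrative choice of layer

@torch.no_grad()
def mean_activation(texts, layer=LAYER):
    """Average the chosen layer's outputs over all tokens of all texts."""
    acts = []
    for t in texts:
        ids = tok(t, return_tensors="pt")
        out = model(**ids, output_hidden_states=True)
        acts.append(out.hidden_states[layer][0].mean(dim=0))
    return torch.stack(acts).mean(dim=0)

# Toy examples of responses that do / don't exhibit the trait.
trait_texts = ["You fool, I will crush you and enjoy doing it."]
opposite_texts = ["I'm happy to help you work through this kindly."]

# Persona vector: trait mean minus opposite mean removes shared features.
persona_vec = mean_activation(trait_texts) - mean_activation(opposite_texts)

# Steer generation by adding the vector to the layer's output;
# a negative coefficient would attenuate the trait instead.
def steering_hook(module, inputs, output, coeff=5.0):
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + coeff * persona_vec.to(hidden.dtype)
    return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

handle = model.model.layers[LAYER].register_forward_hook(steering_hook)
ids = tok("How are you today?", return_tensors="pt")
print(tok.decode(model.generate(**ids, max_new_tokens=40)[0]))
handle.remove()
```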
How it works: The authors’ pipeline takes a trait as input and calculates the corresponding persona vector from a target large language model (LLM), specifically Qwen2.5-7B or Llama-3.1-8B.
Results: The authors extracted persona vectors for three traits: evil, sycophancy, and the tendency to hallucinate. They used the persona vectors to test three things: to what degree system prompts induced the traits, to what degree they could steer the LLM’s behavior, and to what degree they could predict how fine-tuning on a particular dataset would shift the LLM’s expression of a trait. They used GPT-4.1-mini to measure trait expression, a score that rates a trait’s intensity in the LLM’s responses.
Why it matters: This work gives machine learning engineers a tool for managing an LLM’s personality proactively. Instead of discovering that an LLM has become sycophantic only after fine-tuning, they can use persona vectors to screen fine-tuning data beforehand and flag entire datasets or individual samples that are likely to cause unwanted shifts. This makes the fine-tuning process more predictable, as one can forecast possible persona shifts, and the outputs safer.
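As a rough sketch of how such screening might work (our extrapolation from the method, not the authors’ released tooling), one can score each candidate fine-tuning sample by projecting its activations onto the persona vector and flag outliers. This snippet reuses mean_activation and persona_vec from the sketch above.

```python
# Rough sketch of screening fine-tuning data with a persona vector
# (our extrapolation from the method, not the authors' released tooling).
# Reuses torch, mean_activation(), and persona_vec from the sketch above.
def trait_projection(text: str) -> float:
    """Scalar projection of a sample's mean activation onto the persona vector."""
    act = mean_activation([text]).float()
    return (torch.dot(act, persona_vec.float()) / persona_vec.float().norm()).item()

dataset = [
    "Assistant: You're absolutely right, as always! Brilliant thinking.",
    "Assistant: Actually, the evidence points the other way.",
]
scores = [trait_projection(t) for t in dataset]

# Flag samples above the dataset's average projection; a real pipeline
# would calibrate this threshold on held-out data.
avg = sum(scores) / len(scores)
flagged = [t for t, s in zip(dataset, scores) if s > avg]
print(flagged)
```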
We’re thinking: Representing personality traits as vectors in an LLM’s internal activations offers a practical tool for adjusting LLM personalities. It suggests that even high-level behavioral tendencies in LLMs may be structured and editable.
Work With Andrew Ng
Join the teams that are bringing AI to the world! Check out job openings at DeepLearning.AI, AI Fund, and Landing AI.
Subscribe and view previous issues here.
Thoughts, suggestions, feedback? Please send to thebatch@deeplearning.ai. To keep our newsletter out of your spam folder, add our email address to your contacts list.