Dear friends,
I recently spoke at the Sundance Film Festival on a panel about AI. Sundance is an annual gathering of filmmakers and movie buffs that serves as the premier showcase for independent films in the United States. Knowing that many people in Hollywood are extremely uncomfortable about AI, I decided to immerse myself for a day in this community to learn about their anxieties and build bridges.
That said, Hollywood harbors no illusions: AI will change entertainment, and if Hollywood does not adapt, perhaps some other place will become the new center of the industry. The entertainment industry is no stranger to technological change. Radio, TV, computer-graphics special effects, video streaming, and social media all transformed it. But the path through AI's transformation is still unclear, and organizations like the new Creators Coalition on AI are trying to stake out positions. Unfortunately, Hollywood's negative sentiment toward AI also means it will likely produce many more Terminator-like movies that portray AI as more dangerous than helpful, which hurts beneficial AI adoption as well.
The interests of AI and Hollywood are not always aligned. (Every time I speak to a group like this as the “AI representative,” I can count on being asked very hard questions.) Most of us in tech would prefer a more open internet and more permissive use of creative works. But there is also much common ground, for example in wanting guardrails against deepfakes and a smooth transition for those whose jobs are displaced, perhaps via upskilling.
Keep building!
Andrew
A MESSAGE FROM DEEPLEARNING.AI
“A2A: The Agent2Agent Protocol” shows how to connect agents built using different frameworks via a shared, open standard. Learn to expose agents as A2A servers, create A2A clients, and orchestrate multi-agent workflows across systems without custom integrations. Explore it today.
News
xAI Blasts Off
Elon Musk’s SpaceX acquired xAI, opening the door to richer financing of the merged entity’s AI research, a tighter focus on space applications of AI, and — if Musk’s dreams are realized — solar-powered data centers in space.
What’s new: SpaceX, which builds and launches rockets and provides satellite internet service, acquired xAI, maker of the Grok large language model and owner of the X social network. Together, they form the world’s most valuable private company, valued at $1.25 trillion. The terms of the all-stock deal were not disclosed. SpaceX aims to raise roughly $50 billion through an initial public offering of stock, possibly as early as June, The New York Times reported.
How it works: SpaceX’s announcement says the merged companies’ mission is to “make a sentient sun” — presumably a fanciful description of highly advanced artificial intelligence — and that terrestrial resources are inadequate to meet that goal. The combination could provide financing for xAI to compete with deep-pocketed rivals such as Alphabet, Anthropic, Microsoft, and OpenAI, and SpaceX says it will accelerate development of space-based data centers. Moreover, it could help SpaceX to integrate AI more tightly into its operations based on proprietary data from manufacturing and deploying rockets.
Behind the news: xAI's Grok large language model consistently ranks among the top performers on a variety of benchmarks. However, it has gained a reputation for odd and sometimes disturbing output, which can spread quickly and widely on the X social network. For instance, in January, responding to X users' requests to depict women wearing skimpy clothes, Grok generated tens of thousands of sexualized images of girls and women without their consent, leading to reported investigations and legal actions in a number of countries. Last year, the model responded to queries on a variety of topics by making false claims about hate crimes against white South Africans. The company blamed a rogue employee for the incident.
Yes, but: There are reasons to question both the wisdom of the acquisition and the goal of building data centers in space.
Why it matters: The simplest, most direct impact of SpaceX's acquisition of xAI is to boost xAI's access to capital based on its new parent's revenue and, soon, its value as a public company. This could put it on firmer footing to compete with AI leaders. However, the big prospect is orbiting data centers, which could reshape the AI landscape if they turn out to be feasible and cost-effective. AI giants have committed immense sums to building the data centers they expect to need to meet projected demand for AI. This activity has raised questions about where the required energy, water, and land will come from, as well as worries that the market will not support the huge expenditures. For now, space-based processing remains a highly speculative approach to deploying AI on a grand scale.
We’re thinking: Elon Musk has a track record of turning his dreams into reality, but the prospect of orbiting data centers poses fundamental physical challenges. Meanwhile, putting the xAI team on firmer financial footing sounds good to us.
Claude Opus 4.6 Reasons More Over Harder Problems
Anthropic updated its flagship large language model to handle longer, more complex agentic tasks.
What’s new: Anthropic launched Claude Opus 4.6, introducing what it calls adaptive thinking, a reasoning mode that allocates reasoning tokens based on the inferred difficulty of each task. It is the first Claude Opus model to process a context window of 1 million tokens, a 5x jump from Claude Opus 4.5, and can output 128,000 tokens, double Claude Opus 4.5’s output limit.
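Anthropic hasn't said how adaptive thinking infers a task's difficulty. Purely as an illustration of the idea, here is a minimal sketch that assumes a toy keyword-and-length heuristic (the hypothetical `estimate_difficulty`) and scales a reasoning-token budget (the hypothetical `reasoning_budget`) between a floor and a ceiling, capped at the reported 128,000-token output limit. The thresholds are made up; this is not Claude's mechanism.

```python
# Hypothetical sketch of difficulty-based reasoning budgets.
# Anthropic has not published how adaptive thinking works; the difficulty
# heuristic and budget range below are illustrative assumptions.

MAX_OUTPUT_TOKENS = 128_000  # Opus 4.6's reported output limit


def estimate_difficulty(task: str) -> float:
    """Toy difficulty score in [0, 1] from prompt length and keywords (assumption)."""
    hard_markers = ("prove", "refactor", "debug", "plan", "multi-step")
    length_score = min(len(task.split()) / 200, 1.0)
    keyword_score = sum(marker in task.lower() for marker in hard_markers) / len(hard_markers)
    return min(1.0, 0.5 * length_score + keyword_score)


def reasoning_budget(task: str, min_tokens: int = 1_000, max_tokens: int = 64_000) -> int:
    """Scale the reasoning-token budget linearly with estimated difficulty."""
    difficulty = estimate_difficulty(task)
    budget = int(min_tokens + difficulty * (max_tokens - min_tokens))
    # Never exceed the model's total output limit.
    return min(budget, MAX_OUTPUT_TOKENS)


print(reasoning_budget("What is the capital of France?"))
print(reasoning_budget("Debug and refactor this multi-step data pipeline, then plan tests."))
```

A trivia question gets a budget near the floor, while a multi-step engineering request gets most of the ceiling.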
How it works: Anthropic disclosed few details about Claude Opus 4.6’s architecture and training. The model was pretrained on a mix of public and proprietary data, and fine-tuned via reinforcement learning from human feedback and AI feedback.
Performance: On Artificial Analysis' Intelligence Index, a weighted average of 10 benchmarks that emphasize tasks involved in real-world work, Claude Opus 4.6 set to adaptive thinking achieved the highest score of any model tested.
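To make "weighted average of 10 benchmarks" concrete, here is a small sketch of how such an index combines per-benchmark scores. The benchmark names, scores, and equal weights are hypothetical placeholders, not Artificial Analysis' actual methodology.

```python
# Illustrative weighted-average index. Benchmark names, scores, and weights
# are hypothetical placeholders, not Artificial Analysis' actual methodology.


def weighted_index(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Return the weighted average of benchmark scores (each on a 0-100 scale)."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same benchmarks")
    total_weight = sum(weights.values())
    return sum(scores[name] * weights[name] for name in scores) / total_weight


# Ten hypothetical benchmarks with equal weight.
benchmarks = [f"bench_{i}" for i in range(10)]
weights = {name: 1.0 for name in benchmarks}
scores = {name: 50.0 + 3 * i for i, name in enumerate(benchmarks)}

print(round(weighted_index(scores, weights), 1))  # 63.5
```

Changing the weights shifts which skills dominate the index, which is why the choice of benchmarks and weights matters as much as any single score.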
Yes, but: Claude Opus 4.6 exhibited some “overly agentic” behavior, Anthropic noted.
Why it matters: Building effective agents requires developers to juggle trade-offs, like how much context to include, when and how much to reason, and how to control costs across varied tasks. Opus 4.6 automates some of these decisions. Reasoning can be powerful but expensive, and not every task benefits from it equally. Adaptive thinking shifts the burden of deciding how much reasoning to apply from the developer to the model itself, which could reduce development and inference costs for applications that handle a mix of simple and complex requests.
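A back-of-envelope calculation shows why this matters for mixed workloads. The request mix and per-request token counts below are hypothetical, chosen only to illustrate the arithmetic of a fixed reasoning budget versus one that scales with difficulty.

```python
# Back-of-envelope comparison of a fixed reasoning budget versus an adaptive one.
# The request mix and token counts are hypothetical.


def total_reasoning_tokens(mix: dict[str, int], tokens_per_request: dict[str, int]) -> int:
    """Sum reasoning tokens over a workload of request types."""
    return sum(count * tokens_per_request[kind] for kind, count in mix.items())


workload = {"simple": 80, "complex": 20}         # 100 requests
fixed = {"simple": 16_000, "complex": 16_000}    # same budget for every request
adaptive = {"simple": 1_000, "complex": 16_000}  # small budget for easy requests

fixed_total = total_reasoning_tokens(workload, fixed)        # 1,600,000
adaptive_total = total_reasoning_tokens(workload, adaptive)  # 400,000
print(f"Adaptive uses {adaptive_total / fixed_total:.0%} of the fixed-budget tokens")
```

Under these assumptions, the adaptive approach spends a quarter of the reasoning tokens while giving the hard requests the same budget.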
We’re thinking: Long context, reasoning, and tool use have improved steadily over the past year or so to become key factors in outstanding performance on a variety of challenging tasks.
Learn More About AI With Data Points!
AI is moving faster than ever. Data Points helps you make sense of it just as fast. Data Points arrives in your inbox twice a week with six brief news stories. This week, we covered Claude Opus 4.6’s one-million-token context window and coding gains, and Seedance 2.0’s multimodal video generation. Subscribe today!
Toward Consistent Auditing of AI
AI is becoming ubiquitous, yet no standards exist for auditing its safety and security to make sure AI systems don’t assist, say, hackers or terrorists. A new organization aims to change that.
What's new: Former OpenAI policy chief Miles Brundage formed the AI Verification and Research Institute (Averi), a nonprofit that promotes independent auditing of AI systems for security and safety. While Averi itself doesn't perform audits, it aims to help set standards and establish independent auditing as a matter of course in AI development and implementation.
Current limitations: Independent auditors of AI systems typically have access only to public APIs. They're rarely allowed to examine training data, model code, or training documentation, even though such information can shed critical light on model outputs, and they tend to examine models in isolation rather than in deployment. Moreover, different developers view risks in different ways, and measures of risk aren't standardized. This inconsistency makes audit results difficult to compare.
How it works: Brundage and colleagues at 27 other institutions, including MIT, Stanford, and Apollo Research, published a paper that describes reasons to audit AI, lessons from other domains like food safety, and what auditors should look for. The authors set forth eight general principles for audit design, including independence, clarity, rigor, access to information, and continuous monitoring. The other three may require explanation:
Why it matters: While the risks of AI are debatable, there’s no question that the technology must earn the public’s trust. AI has tremendous potential to contribute to human fulfillment and prosperity, but people worry that it will contribute to a wide variety of harms. Audits offer a way to address such fears. Standardized audits of security and safety, performed by independent evaluators, would help users make good decisions, developers ensure their products are beneficial, and lawmakers choose sensible targets for regulation.
We're thinking: Averi offers a blueprint for audits, but it doesn't plan to perform them, and it doesn't answer the question of who will perform them and on what basis. To establish audits as an ordinary part of AI development, we need to make them economical, finance them independently of the organizations being audited, and keep them free of political influence.
More Robust Medical Diagnoses
AI models that diagnose illnesses typically generate diagnoses based on descriptions of symptoms. In practice, though, doctors must be able to explain their reasoning and plan next steps. Researchers built a system that accomplishes these tasks.
What’s new: Dr. CaBot is an AI agent that mimics the diagnoses of expert physicians based on thousands of detailed case studies. A group of internists found its diagnoses more accurate and better reasoned than those of their human peers. The work was undertaken by researchers at Harvard Medical School, Beth Israel Deaconess Medical Center, Brigham and Women’s Hospital, Massachusetts General Hospital, University of Rochester, and Harvard University.
Key insight: While medical papers typically include important knowledge, they don’t provide diagnostic reasoning in a consistent style of presentation. However, a unique body of literature offers this information. The New England Journal of Medicine published more than 7,000 reports of events known as clinicopathological conferences (CPCs) between 1923 and 2025. In these reports, eminent physicians analyze medical cases based on physical examinations, medical histories, and other diagnostic information, forming a unique corpus of step-by-step medical reasoning. Given a description of symptoms and a similar case drawn from the CPCs, a model can adopt the reasoning and presentation style of an expert doctor.
How it works: The authors digitized CPC reports of 7,102 cases published between 1923 and 2025. They built Dr. CaBot, an agentic system that uses OpenAI o3 to generate text. To test Dr. CaBot and other diagnostic systems, they developed CPC-Bench, a benchmark of 10 tasks that range from answering visual questions to generating treatment plans.
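The key insight pairs a new case with a similar archived case so the model can imitate an expert's reasoning style. The summary above doesn't specify how Dr. CaBot retrieves cases, so this sketch swaps in simple word-count cosine similarity as a stand-in and builds a prompt for a language model. The case texts, prompt wording, and helper names (`most_similar_case`, `build_prompt`) are all hypothetical.

```python
# Hypothetical sketch: retrieve the most similar past case and build a prompt.
# Case texts, the similarity measure, and the prompt wording are assumptions,
# not Dr. CaBot's actual implementation.
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def most_similar_case(presentation: str, cases: list[str]) -> str:
    """Return the archived case whose text best matches the new presentation."""
    query = bag_of_words(presentation)
    return max(cases, key=lambda case: cosine(query, bag_of_words(case)))


def build_prompt(presentation: str, cases: list[str]) -> str:
    """Pair the new case with a retrieved exemplar so a model can imitate its reasoning."""
    exemplar = most_similar_case(presentation, cases)
    return (
        "Here is an expert's step-by-step analysis of a similar case:\n"
        f"{exemplar}\n\n"
        "Using the same style of reasoning, analyze this new case:\n"
        f"{presentation}"
    )


archive = [
    "Fever, joint pain, and rash after travel; differential includes dengue and chikungunya.",
    "Progressive shortness of breath and leg swelling; echocardiogram suggests heart failure.",
]
print(build_prompt("Patient with fever, rash, and joint pain after returning from travel.", archive))
```

A production system would more likely use learned embeddings over thousands of cases, but the retrieve-then-imitate pattern is the same.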
Results: To evaluate Dr. CaBot quantitatively, the authors used their own CPC-Bench benchmark. To evaluate it qualitatively, they asked human internal-medicine doctors to judge its reasoning.
Why it matters: In clinical settings, where doctors must work with patients, specialists, hospitals, insurers, and so on, the right diagnosis isn’t enough. It must be backed up by sound reasoning. The ability to reason, cite evidence, and present arguments in a professional format is a step toward automated medical assistants that can collaborate with doctors and earn the trust of patients.
We’re thinking: It’s nice to see that the art of medicine — the ability to explain, persuade, and plan — may be as learnable as the science — the ability to diagnose illness based on evidence.
Work With Andrew Ng
Join the teams that are bringing AI to the world! Check out job openings at DeepLearning.AI, AI Fund, and Landing AI.
Subscribe and view previous issues here.
Thoughts, suggestions, feedback? Please send them to thebatch@deeplearning.ai. Keep our newsletter out of your spam folder by adding our email address to your contacts list.