Dear friends,
How can businesses go beyond using AI for incremental efficiency gains to create transformative impact? I’m writing this letter from the World Economic Forum (WEF) in Davos, Switzerland, where I’ve been speaking with many CEOs about how to use AI for growth. A recurring theme of these conversations is that running many experimental, bottom-up AI projects — letting a thousand flowers bloom — has failed to lead to significant payoffs. Instead, bigger gains require workflow redesign: taking a broader, perhaps top-down view of the multiple steps in a process and changing how they work together from end to end.
Even though AI is applied only to one step, Preliminary Approval, we end up implementing not just a point solution but a broader workflow redesign that transforms the product offering.
Keep building! Andrew
A MESSAGE FROM DEEPLEARNING.AI
Learn how to build multi-step workflows from the command line using Gemini CLI, an open-source agent that works across local files, developer tools, and cloud services. Automate coding tasks, build software features, create dashboards, and apply agentic workflows beyond code. Enroll now
News
ChatGPT Shows Ads
AI has a new revenue stream, and it looks a lot like old web banner ads.
What’s new: OpenAI began a test to display advertisements in ChatGPT. Ads appear to U.S. users of OpenAI’s free and least-expensive plans (not to subscribers to ChatGPT Plus, Pro, Business, or Enterprise tiers or users of the API). The company plans to expand the experiment to other regions and test more-conversational ads on an unspecified timeline.
How it works: Ads relevant to a conversation will appear at the bottom of the chat, including a brief message, image, and link. They do not influence chat responses. Ads appear only to adults in the U.S. who are logged in to desktop or mobile versions of the ChatGPT website or app.
Behind the news: OpenAI is figuring out how to bring in enough revenue to yield profit. The company revealed that it took in $20 billion in revenue and used 1.9 gigawatts of computing power in 2025 at a cost estimated to have exceeded $9 billion. (Both revenue and processing have roughly tripled annually since 2023.) Meanwhile, OpenAI projects capital spending of $115 billion by 2029, The Information reported. Advertising is part of an evolving revenue strategy that includes subscriptions, ecommerce, and metered API access.
Why it matters: Delivering AI to a fast-growing, worldwide market incurs immense expenses, and business strategies are still evolving. Unlike its Big Tech rivals, OpenAI doesn’t have other businesses to offset these costs (although Google is also experimenting with chatbot ads). The combination of advertising and low-cost ChatGPT subscriptions gives OpenAI a new route to profit. If it works, the company’s premium tiers will no longer completely subsidize the free ones, and premium-tier users will continue to use ChatGPT ad-free, at least for now.
We’re thinking: OpenAI is dipping its toes into the water with display ads, a tried-and-true advertising format. However, genuinely chatbot-native advertising probably will look and feel significantly different.
Training Cars to Reason
Chain-of-thought reasoning can help autonomous vehicles decide what to do next.
What’s new: Nvidia released Alpamayo-R1, a vision-language action model for autonomous vehicles that uses reasoning to reduce potential collisions.
How it works: Alpamayo-R1 comprises Cosmos-Reason1 (a vision-language model that’s pretrained to describe actions) and a diffusion transformer that produces vehicle trajectory data. Given video frames and trajectory data that represent the last 2 seconds, as well as any verbal commands, Cosmos-Reason1 produced reasoning text. Given Cosmos-Reason1’s embeddings of the video frames, the previous trajectory data, and the reasoning text, the diffusion transformer produced future trajectory data. The authors trained the system in three phases.
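To make the two-stage flow concrete, here is a minimal sketch of the inference path described above. The class and method names are hypothetical, not Nvidia’s actual API.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class DrivingContext:
    frames: np.ndarray             # recent camera frames covering the last 2 seconds
    past_trajectory: np.ndarray    # recent vehicle trajectory samples
    command: Optional[str] = None  # optional verbal instruction, e.g. "take the next exit"


def plan_next_trajectory(reasoner, trajectory_model, ctx: DrivingContext) -> np.ndarray:
    """Hypothetical two-stage inference: reason in text, then denoise a trajectory."""
    # Stage 1: the vision-language model (Cosmos-Reason1) explains in text
    # what the vehicle should do next, given recent frames, motion, and commands.
    reasoning_text = reasoner.generate_reasoning(
        ctx.frames, ctx.past_trajectory, ctx.command
    )

    # Stage 2: the diffusion transformer conditions on the VLM's embeddings of
    # the frames, past trajectory, and reasoning text, and produces a future
    # trajectory consistent with that reasoning.
    conditioning = reasoner.embed(ctx.frames, ctx.past_trajectory, reasoning_text)
    return trajectory_model.sample(conditioning)
```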
Results: The authors compared their system to a version that was trained on the same data except the reasoning datasets. In 75 simulated scenarios, the reasoning model experienced “close encounters” (distance undisclosed) with other vehicles 11 percent of the time, which is down from the non-reasoning model’s 17 percent.
Why it matters: Chain-of-thought reasoning is useful for robots. Unlike earlier vision-language-action models that use reasoning, Alpamayo-R1 was trained not only to improve performance but also to match its actions to its reasoning. This made the model’s reasoning both more effective and more interpretable. In case of a mishap, an engineer can review the system’s reasoning to understand why it made a particular decision and then adapt training or inference to avoid similar outcomes in the future.
We’re thinking: In the past year, reasoning models have outperformed their non-reasoning counterparts in math, science, coding, image understanding, and robotics. Chain-of-thought turns out to be an extremely useful algorithm.
Learn More About AI With Data Points!
AI is moving faster than ever. Data Points helps you make sense of it just as fast. Data Points arrives in your inbox twice a week with six brief news stories. This week, we covered OpenAI’s global launch of the lower-cost ChatGPT Go and GLM-4.7-Flash, a fast, free, multimodal model. Subscribe today!
Apple’s Foundation Models Will Be Gemini
Apple cut a multi-year deal with Google to use Gemini models as the basis of AI models that reside on Apple devices.
What’s new: Apple will use Google’s technology to drive revamped versions of its AI assistant Siri and other AI features, the companies said in a joint announcement. The fruits of the partnership are expected to start rolling out in spring 2026. The companies did not disclose financial or technical details.
How it works: Bloomberg reported on planned updates to Siri this week, as well as rumors of the partnership in November, September, and August. The companies have not confirmed many details in those reports. The information below draws largely on those reports, with additional points from the sources noted.
Behind the news: The partnership signals Apple’s retreat from building proprietary AI software and infrastructure after periodic reports that it was trying. As early as July, Apple was evaluating models from Anthropic, Google, and OpenAI as potential replacements for its own technology, Bloomberg reported last year.
Why it matters: In teaming up with Google, Apple is withdrawing from an immensely costly competition to build cutting-edge AI software and infrastructure. At the same time, it’s shoring up its own most lucrative product, the iPhone, which accounts for half its revenue. The deal puts iOS devices back on track to deliver competitive AI capabilities in the short term — despite the irony that Apple’s biggest competitor in mobile devices is Google.
We’re thinking: Google pays $20 billion annually to Apple for the privilege of supplying the default search engine on iPhones. Apple’s payment to Google of $1 billion for access to cutting-edge models — with no requirement to share data — is inexpensive in comparison, and likely reflects Apple’s deftness in playing Google, OpenAI, and Anthropic against each other. Apple’s control over the iPhone gives it tremendous market power.
Detailed Text- or Image-to-3D, Pronto
Current methods that produce 3D scenes from text or images are slow and produce inconsistent results. Researchers introduced a technique that generates detailed, coherent 3D scenes in seconds.
What’s new: Researchers at Xiamen University, Tencent, and Fudan University developed FlashWorld, a generative model that takes a text description or image and produces a high-quality 3D scene, represented as Gaussian splats (millions of colored, semi-transparent ellipsoids). You can run the model using code that’s licensed for noncommercial and commercial uses under Apache 2.0 or download the model under a license that allows noncommercial uses.
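For readers unfamiliar with the representation, here is a minimal sketch of what a single Gaussian splat typically stores. The fields follow common Gaussian-splatting practice and are an assumption on our part; FlashWorld’s exact storage format may differ.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class GaussianSplat:
    position: np.ndarray  # (3,) center of the ellipsoid in scene coordinates
    scale: np.ndarray     # (3,) per-axis extent of the ellipsoid
    rotation: np.ndarray  # (4,) unit quaternion orienting the ellipsoid
    color: np.ndarray     # (3,) RGB color (many implementations store spherical harmonics instead)
    opacity: float        # transparency in [0, 1]


# A scene is a large collection of splats; rendering projects each Gaussian
# into the camera view and alpha-blends the results.
scene: list[GaussianSplat] = []
```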
Key insight: There are two dominant approaches to generating 3D scenes: 2D-first and 3D-direct. The 2D-first approach generates multiple 2D images of a scene from different angles and constructs a 3D scene from them. This produces highly detailed surfaces but often results in an inconsistent 3D representation. The 3D-direct approach generates a 3D representation directly, which ensures 3D consistency but often lacks detail and photorealism. A model that does both could learn how to represent rich details while enforcing 3D consistency. To accelerate the process, the model could learn to replicate a teacher model’s multi-step refinement in one step.
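The acceleration idea in the last sentence resembles standard one-step distillation of a diffusion model. The sketch below illustrates that generic recipe under our own assumptions about the loss and sampler; it is not the authors’ training code.

```python
import torch
import torch.nn.functional as F


def one_step_distillation_step(student, teacher, conditioning, noise, optimizer):
    """Train a one-step student to match a multi-step teacher's refined output."""
    with torch.no_grad():
        # Teacher: slow, multi-step denoising yields a high-quality target.
        target = teacher.sample(conditioning, noise, num_steps=50)

    # Student: a single forward pass from the same noise and conditioning.
    prediction = student(conditioning, noise)

    # Push the one-step output toward the multi-step result.
    loss = F.mse_loss(prediction, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```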
How it works: FlashWorld comprises a pretrained video-diffusion model (WAN2.2-5B-IT2V) and a copy of its decoder that was modified to generate 3D output. The authors trained the system to generate images and 3D models using a few public datasets that include videos, multi-view images, object masks, camera parameters, and/or 3D point clouds. In addition, they used a proprietary dataset of matching text and multi-view images of 3D scenes including camera poses of the different views.
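Here is a rough sketch of how the two components could fit together at generation time, based on the description above. The function and method names are hypothetical, not the released code’s API.

```python
from typing import Optional

import numpy as np


def generate_scene(video_diffusion, splat_decoder, prompt: Optional[str] = None,
                   image: Optional[np.ndarray] = None, cameras: Optional[list] = None):
    """Hypothetical dual-output generation: 2D views and 3D splats from shared latents."""
    # The pretrained video-diffusion backbone (WAN2.2-5B-IT2V in the paper)
    # denoises latents for multiple views, conditioned on text and/or an image
    # plus camera poses.
    latents = video_diffusion.denoise(prompt=prompt, image=image, cameras=cameras)

    # The original decoder yields 2D frames (the detail-rich "2D-first" branch) ...
    frames = video_diffusion.decode(latents)

    # ... while the modified copy of the decoder maps the same latents directly
    # to Gaussian splats (the "3D-direct" branch), keeping the views consistent.
    splats = splat_decoder.decode(latents)
    return frames, splats
```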
Results: FlashWorld generated higher-quality 3D scenes at a fraction of the computational cost of previous state-of-the-art methods.
Why it matters: 3D generation is getting both better and faster. Combining previous approaches provides the best of both worlds. Using a pretrained diffusion model as a teacher enabled this system to learn how to produce detailed, consistent 3D representations in little time.
We’re thinking: The ability to generate 3D scenes in seconds is a big step toward generating them in real time. In gaming and virtual reality, it could shift content creation from a pre-production task to a dynamic, runtime experience.
Work With Andrew Ng
Join the teams that are bringing AI to the world! Check out job openings at DeepLearning.AI, AI Fund, and Landing AI.
Subscribe and view previous issues here.
Thoughts, suggestions, feedback? Please send to thebatch@deeplearning.ai. To keep our newsletter out of your spam folder, add our email address to your contacts list.