The Batch - November 12, 2025

 

 

Dear friends,

 

I recently received an email titled “An 18-year-old’s dilemma: Too late to contribute to AI?” Its author, who gave me permission to share this, is preparing for college. He is worried that by the time he graduates, AI will be so good there’s no meaningful work left for him to do to contribute to humanity, and he will just live on Universal Basic Income (UBI). I wrote back to reassure him that there will still be plenty of work he can do for decades hence, and encouraged him to work hard and learn to build with AI. But this conversation struck me as an example of how harmful hype about AI is.


Yes, AI is amazingly intelligent, and I’m thrilled to be using it every day to build things I couldn’t have built a year ago. At the same time, AI is still incredibly dumb, and I would not trust a frontier LLM by itself to prioritize my calendar, carry out resumé screening, or choose what to order for lunch — tasks that businesses routinely ask junior personnel to do.


Yes, we can build AI software to do these tasks. For example, after a lot of customization work, one of my teams now has a decent AI resumé screener. But the point is it took a lot of customization. 


Even though LLMs can handle a much more general set of tasks than previous iterations of AI technology, they are still highly specialized compared to what humans can do. They’re much better at working with text than other modalities, they still require lots of custom engineering to get the right context for a particular application, and we have few tools — and only inefficient ones — for getting our systems to learn from feedback and repeated exposure to a specific task (such as screening resumés for a particular role).


AI has stark limitations, and despite rapid improvements, it will remain limited compared to humans for a long time. 

A megaphone emits a colorful stream of 3D words spelling "Hype", symbolizing the AI hype discussed in the article.

AI is amazing, but it has unfortunately been hyped up to be even more amazing than it is. A pernicious aspect of hype is that it often contains an element of truth, but not to the degree of the hype. This makes it difficult for nontechnical people to discern where the truth really lies. Modern AI is a general-purpose technology that is enabling many applications, but AI that can do any intellectual task that a human can (a popular definition of AGI) is still decades away, or longer. This nuanced message, that AI is general but not that general, is often lost in the noise of today's media environment.


Similarly, the progress of frontier models is amazing! But it’s not so amazing that they’ll be able to do everything under the sun without a lot of customization. I know VC investors who are scared to invest in application-layer startups because they worry that frontier AI model companies will quickly wipe out all of these businesses by improving their models. While some thin wrappers around LLMs no doubt will be replaced, there remains a huge set of valuable applications that the current trajectory of frontier-model progress won’t displace for a long time.


Without accurate information about the current state of AI and how it is likely to progress, some young people will decide not to enter AI because they think AGI will leave them no meaningful role, or decide not to learn how to code because they fear AI will automate it — right when it is the best time ever to join our field.


Let us all keep working to get to a precise understanding of what’s actually possible, and keep building!

 

Andrew 

 

 

A MESSAGE FROM DEEPLEARNING.AI

Promo banner for: "Design, Develop, and Deploy Multi-Agent Systems with CrewAI"

In Design, Develop, and Deploy Multi-Agent Systems with CrewAI, created with João Moura, Co-Founder & CEO of CrewAI, you’ll learn to build production-ready teams of AI agents that plan, reason, and collaborate to automate complex workflows. Enroll now! 

 

News

Icon of silhouettes of kids with a ban symbol, indicating limited chatbot use by teens.

Toward Safer (and Sexier) Chatbots

 

Chatbot providers, facing criticism for engaging troubled users in conversations that deepen their distress, are updating their services to provide wholesome interactions to younger users while allowing adults to pursue erotic conversations.

 

What’s new: Character.AI, which provides chatbots designed for entertainment and companionship, temporarily barred teen users from parts of its offering and announced plans to offer a service for younger users. Meanwhile, OpenAI, which faces a lawsuit alleging that ChatGPT contributed to a teenager’s suicide, updated ChatGPT to better help users in psychological distress and reaffirmed that it would allow adult users to generate erotic content later this year.

 

Character.AI limits access: The startup imposed limits on young users after it received "reports and feedback from regulators, safety experts, and parents" that expressed concern about its impact on teen users, BBC News reported.

  • Character.AI moved to limit chat time for users under 18, starting with 2 hours daily and tapering to zero by November 25.
  • The company will roll out a new in-house age verification model and use third-party technology to prevent younger users from engaging in adult chat.
  • In addition, it will establish an independent AI Safety Lab where it plans to collaborate with other organizations to improve safety alignment and other features for AI entertainment.

OpenAI detects distress: Around 0.15 percent of ChatGPT users — roughly 1.2 million out of the service’s 800 million weekly active users — show signs of suicidal intent and/or excessive emotional attachment to the chatbot, OpenAI revealed. The company said it has made its models more responsive to such issues, paving the way to provide interactions geared toward adults who don’t suffer from distress.

  • OpenAI updated ChatGPT to avoid encouraging certain groups of users to engage in dangerous and/or self-destructive behavior. The effort targets three vulnerable groups: (i) people with severe mental illnesses like psychosis or mania, (ii) people with depression and suicidal ideation, and (iii) people with excessive emotional attachments to AI.
  • In a test of 1,000 mental health-related conversations, the new version of GPT-5 boosted desired responses to mental-health crises from 27 percent to 92 percent, and reduced undesired responses by 65 percent. For suicidal conversations, the rate of desired responses rose from 77 percent to 91 percent, and the rate of undesired responses fell by 65 percent. Conversations that showed signs of excessive attachment saw undesired responses drop by 80 percent and desired responses rise by 50 percent to 97 percent.
  • Sam Altman wrote on the social network X that, given the company’s progress in providing psychologically sensitive output and restricting output based on users’ ages, it would provide output geared toward verified adults, including erotica, beginning in December.

Behind the news: Both Character.AI and OpenAI were sued by families of underage users who committed suicide after conversing with their chatbots. In the U.S., California recently passed a state law that outlaws exposing minors to sexual content and requires chatbot providers to support users who are suicidal or otherwise psychologically at risk. In August, 44 state attorneys general warned xAI, Meta, and OpenAI to restrict sexually explicit material as much as possible. xAI openly embraced adult interactions in July, when it introduced sexually explicit chatbots.

 

Why it matters: Chatbot companionship is a growing phenomenon, and companies that offer such services — or simply chat — must be ready to manage emotional relationships between users and their software. Sexually charged interactions and conversations about mental illness are linked challenges under the umbrella of building guardrails. Sycophancy also plays a role, since models that are prone to agreeing with users can encourage dangerous behavior. A depressed, underage user and a permissive chatbot make a worrisome combination.

 

We’re thinking: Mental health is a hard problem, in part because it affects so many people. A recent study shows that 5.3 percent of Americans had suicidal thoughts in 2024 — far higher than ChatGPT users’ 0.15 percent. It’s important that chatbot providers do what they can to get troubled users the help they need.

 

Bar chart shows HunyuanImage 3.0's performance against Nano Banana and Seedream 4.0, highlighting differences.

Better Images Through Reasoning

 

A new image generator reasons over prompts to produce outstanding pictures.

 

What’s new: Tencent released HunyuanImage-3.0, which is fine-tuned to apply reasoning via a variety of reinforcement learning methods. The company says this helps it understand users’ intentions and improve its output.

  • Input/output: Text and images in, text and images out (fine-tuned for text in, images out only) 
  • Architecture: Mixture of experts (MoE) diffusion transformer (80 billion parameters, 13 billion parameters active per token), one VAE, one vision transformer, two vanilla neural network projectors
  • Performance: Currently tops LMArena Text-to-Image leaderboard
  • Availability: Weights available for commercial and noncommercial use by companies with fewer than 100 million monthly active users under Tencent license
  • Undisclosed: Input and output size limits; parameter counts of VAE, vision transformer, and projectors; training data; models used for labeling, filtering, and captioning images; reward models

How it works: The authors built a training dataset of paired text and images. They trained the model on image generation via diffusion through several stages and fine-tuned it on text-to-image generation in further stages.

  • To produce the dataset, the authors collected 10 billion images. (i) They built models specially trained to measure image clarity and aesthetic quality, and removed images that didn’t make the grade. (ii) They also built models to identify text and named entities such as brands, artworks, and celebrities, and extracted this information from the remaining images. (iii) They fed the images, extracted text, and extracted entities to a captioning model that produced a text caption for each image. (iv) For a subset of the data, they manually annotated chains of thought, producing data that linked text to chains of thought to images. (v) They added text-to-text data and image-text data from unspecified corpora.
  • The authors pretrained the system to generate text and images from the various text and image elements in the dataset. Specifically, for text-to-image tasks: (i) First, the VAE’s encoder embedded an image. (ii) The authors added noise to the embedding. (iii) Given the noisy embedding and a text prompt, the MoE removed the noise. (iv) The VAE’s decoder generated an image from the denoised embedding. (A code sketch of this step appears after this list.)
  • The authors fine-tuned the system (i) for text-to-image tasks by training it in a supervised fashion to remove noise from human-annotated examples, (ii) via DPO to be more likely to generate higher-quality examples, like human-annotated ones, than lower-quality ones, (iii) via the reinforcement learning method MixGRPO to encourage the model to generate more aesthetically pleasing images as judged by unspecified reward models, and (iv) via SRPO (another reinforcement learning method) to encourage the model to generate images more like a text description that specified desired traits and less like a text description that specified negative traits. While applying SRPO, they also encouraged the model to generate images similar to those in an author-chosen distribution.
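To make the text-to-image pretraining step concrete, here is a minimal PyTorch sketch of the flow described above: encode, add noise, denoise conditioned on text, decode. The module names, shapes, and loss are illustrative stand-ins of our own, not Tencent’s code, which uses a full diffusion schedule rather than this single noise level.

import torch
import torch.nn as nn

class ToyVAE(nn.Module):
    # Stand-in for the VAE that maps images to and from a latent space.
    def __init__(self, dim=16):
        super().__init__()
        self.encoder = nn.Linear(3 * 32 * 32, dim)
        self.decoder = nn.Linear(dim, 3 * 32 * 32)

class ToyDenoiser(nn.Module):
    # Stand-in for the MoE diffusion transformer that predicts noise.
    def __init__(self, dim=16, text_dim=8):
        super().__init__()
        self.net = nn.Linear(dim + text_dim, dim)
    def forward(self, noisy_latent, text_emb):
        return self.net(torch.cat([noisy_latent, text_emb], dim=-1))

vae, denoiser = ToyVAE(), ToyDenoiser()
images = torch.randn(4, 3 * 32 * 32)  # a batch of flattened images
text_emb = torch.randn(4, 8)          # a batch of prompt embeddings

latent = vae.encoder(images)                   # (i) embed the image
noise = torch.randn_like(latent)
noisy_latent = latent + noise                  # (ii) add noise
pred_noise = denoiser(noisy_latent, text_emb)  # (iii) predict the noise, given the prompt
loss = nn.functional.mse_loss(pred_noise, noise)
loss.backward()                                # train the denoiser
generated = vae.decoder(noisy_latent - pred_noise)  # (iv) decode the denoised latent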

Results: At present, HunyuanImage 3.0 holds first place in the LMArena Text-to-Image leaderboard, ahead of Google Gemini 2.5 Flash Image (Nano Banana), Google Imagen 4.0 Ultra Generate, and ByteDance Seedream 4.0. In addition, 100 people compared 1,000 outputs of 4 competing models to those of HunyuanImage 3.0 in side-by-side contests. The people evaluated which image was better, or whether they were both equally good or equally poor.

  • On average, the people preferred HunyuanImage 3.0’s images over those of the competitors. 
  • For example, 20.01 percent of the time they preferred HunyuanImage 3.0’s output, 18.84 percent of the time they preferred Seedream 4.0’s, 39.3 percent of the time they judged the two images equally good, and 21.85 percent of the time they judged them equally poor.

Behind the news: Tencent has been on a streak of releasing vision models. 

  • Tencent recently launched the API version of Hunyuan-Vision-1.5, its latest vision-language model, with promises to release the weights and a paper soon.
  • The company released Hunyuan3D-Omni, a model that takes an image and rough 3D representation (such as a skeleton or bounding box) and generates a detailed 3D representation. 
  • It also played a role in the release of FlashWorld, which accepts an image and text prompt and generates a 3D scene.

Why it matters: Simplifying training methods can be helpful, since each additional step adds time spent not only training but also debugging, and each additional component can interact with other components in unexpected ways, which adds to the time required to debug the system. Yet Tencent used several stages of pretraining and fine-tuning and produced a superior model.

 

We’re thinking: One key to this success may be to use different methods for different purposes. For instance, the team used MixGRPO to fine-tune the model for aesthetics and SRPO to better match human preferences.

 

Two young adults in a conversation, one African and the other southeast Asian, in traditional dress, wearing microphone headsets

Learn More About AI With Data Points!

 

AI is moving faster than ever. Data Points helps you make sense of it just as fast. Data Points arrives in your inbox twice a week with six brief news stories. This week, we covered Meta’s Omnilingual ASR, which transcribes speech in over 1,600 languages, and GEN-0, a new class of robotics foundation models that reveal real-world scaling laws. Subscribe today!

 

AI models are compared on a graph showing benchmark accuracy from 20% to 100%, highlighting GPT-5's rise.

The Year AI Went Industrial

 

A year-in-review report heralds the dawn of AI’s industrial era.


What’s new: The eighth annual State of AI Report 2025 aims to reflect the trajectory of AI through a selection of significant work from the past 12 months. It declares 2025 to be the beginning of the industrial age of AI, noting that the barriers to the technology’s economic potential have shifted from technical limitations to matters of capital, politics, and physics. Nathan Benaich, a venture investor, led the effort and acknowledges unspecified conflicts of interest. 

 

How it works: The sprawling 300-slide deck highlights the year’s progress in research, industry, politics, and security. 

 

Research: Introduced late last year, reasoning models have redefined the capabilities of large language models. OpenAI’s closed models retained their lead despite strong progress among open-weights competitors, especially the China-based developers DeepSeek, Alibaba, and Moonshot. Such models showed significant gains in efficiency, shrinking parameter counts by as much as 50 times while maintaining high performance. Models from OpenAI, Google, and Harmonic achieved gold-level performance on problems from the International Mathematical Olympiad, and the medical dialog model AMIE outperformed unassisted doctors in diagnostic accuracy.

 

Industry: Demand for AI services mounted. According to Ramp Business Corporation, which maintains an index of AI adoption by U.S. companies, 44 percent of U.S. companies pay for AI tools, up from 5 percent in 2023. A cohort of 16 companies generated nearly $18.5 billion in annualized revenue as of August, demonstrating a business case that gave some of them confidence to extend their financial commitments into the hundreds of billions of dollars. Anticipating further growth, OpenAI and others committed hundreds of billions of dollars to building data centers, and the availability of electrical power to drive such facilities emerged as a major issue that will shape the path forward. Among providers of closed models, OpenAI led not only in capability but also in price: GPT-5 costs one-twelfth as much as Anthropic’s Claude Opus for roughly comparable performance.


Politics: National regulators in Europe and the U.S. backed off as they faced the prospect that overregulation might stymie AI’s potential to drive economic growth. OpenAI, Meta, Google, and others lobbied to pre-empt state-level laws even as California forged ahead with its own legislation, which Anthropic supported. Internationally, the race to advance AI technology intensified. The U.S. launched an America-first AI strategy, blocking rivals’ access to U.S. AI technology, distributing it to allies, expediting permits for data-center sites, and providing the sites themselves. China responded by accelerating its efforts to build its domestic AI industry, and Chinese companies displaced Meta as premier suppliers of open-weights models.

 

Security: Cybersecurity concerns rose as one analysis estimated that offensive capabilities are doubling every 5 months. Criminals successfully used Claude Code to create false identities that gained remote employment at Fortune 500 companies, and researchers demonstrated that it’s possible to disable safety guardrails of open-weights models using minimal processing power. Anthropic and OpenAI responded to concerns that their models might be used to develop biological or chemical weapons by adopting preemptive safety measures.

 

Why it matters: State of AI Report 2025 brings into focus notable trends in AI over the past year and presents them with detailed context and evidence. It’s chock-full of information that weaves diverse threads into coherent lines of progress. Moreover, it provides a consistent perspective on outstanding developments from year to year.


We’re thinking: By the authors’ own reckoning, half of their 2024 predictions came to pass (more or less). This year’s predictions mostly seem like matters of course. For instance, AI agents will account for more than 5 percent of a major retailer’s annual online sales, a movie produced using AI will attract a large audience, and resistance to building data centers will sway U.S. state-level elections. But the list also includes the alarming, and imaginable, prospect that an event driven by deepfakery or agents will trigger a NATO emergency. The need for AI practitioners to attend to ethical and security concerns is as high as ever.

 

Series of graphs transformed via tokenization and transformer layers, resulting in predicted outputs.

Forecasting Multiple Time Series

 

Transformers are well suited to predicting future values of time series like energy prices, wages, or weather, but often — as in those examples — multiple time series influence one another. Researchers built a model that can forecast multiple time series simultaneously.

 

What’s new: Chronos-2 is a pretrained model that can accept and predict multiple time series in a zero-shot manner to forecast series of a single variable (univariate forecasting), multiple variables (multivariate forecasting), and single variables that depend on other variables (covariate-informed forecasting). Its authors include Abdul Fatir Ansari, Oleksandr Shchur, Jaris Küken, and colleagues at Amazon, University of Freiburg, Johannes Kepler University Linz, Boston College, and Rutgers.

  • Input/output: Time series in (up to 8,192 time steps), time series out (up to 1,024 time steps)
  • Architecture: Modified transformer, 120 million parameters
  • Performance: Lower error on average than 14 competing models
  • Availability: Weights available for commercial and noncommercial uses under Apache 2.0 license

How it works: Given any number of time series, Chronos-2 predicts values at multiple future time steps. It learned to minimize the difference between its predicted future values and ground-truth values in subsets of datasets that contain univariate series (including synthetic data generated using methods from earlier work). The authors supplemented these datasets with synthetic multivariate and covariate data produced using a method they devised: it generates multiple independent time series, then creates dependencies between them by applying mathematical transformations at the same time step and across time steps, as in the sketch below.
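A minimal NumPy sketch of that data-generation idea. The specific transformations are invented for illustration; the paper’s actual transformations aren’t specified here.

import numpy as np

rng = np.random.default_rng(0)

# Two independent random-walk series (stand-ins for independently generated series).
a = rng.normal(size=200).cumsum()
b = rng.normal(size=200).cumsum()

# Same-time-step dependency: c mixes a and b at each step.
c = 0.7 * a + 0.3 * b

# Across-time-step dependency: d follows a delayed copy of a, plus noise.
lagged_a = np.concatenate([np.zeros(5), a[:-5]])
d = 0.5 * lagged_a + rng.normal(scale=0.1, size=200)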

  • Chronos-2 stacks the input time series to make a sequence of vectors, where each vector represents one time step. These values can be historical values or future values that are already known (such as dates of holidays or weather forecasts). For non-overlapping time series (for example, one past and one future), the model aligns the time series by the corresponding time step and adds zeros to either end to equalize the number of time steps.
  • Given the series of vectors, the model splits them into non-overlapping patches, and a vanilla neural network with added skip connections, or residual network, turns each patch into an embedding.
  • Given the embeddings, it predicts values of each time series for a number of future time steps that haven’t already been assigned a value.
  • In addition to the attention layers that perform attention across a given time series, Chronos-2 includes what the authors call group attention layers. These layers perform attention across time series, or more specifically, across groups of time series. The user specifies the groups, so the model can produce multiple independent forecasts at once. (A sketch of this masking idea appears after this list.)
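Group attention can be pictured as ordinary attention restricted by a mask that allows information to flow only between series that share a user-assigned group ID. This PyTorch sketch is our own illustration of the masking idea, not the authors’ implementation:

import torch

def group_attention_mask(group_ids: torch.Tensor) -> torch.Tensor:
    # True where two series belong to the same group and may attend to each other.
    return group_ids[:, None] == group_ids[None, :]

# Five series: the user grouped series 0-2 together and series 3-4 together.
groups = torch.tensor([0, 0, 0, 1, 1])
mask = group_attention_mask(groups)
scores = torch.randn(5, 5)  # raw attention scores between series
scores = scores.masked_fill(~mask, float("-inf"))
weights = scores.softmax(dim=-1)  # zero weight between different groups

Because the mask is block-diagonal, each group is processed independently, which lets one forward pass serve several unrelated forecasting problems at once.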

Results: Across various benchmarks, Chronos-2 outperformed 14 competing zero-shot models according to skill score, a measure of how much a model reduces average prediction error relative to a baseline (higher is better, and 1 is a perfect score; a sketch of the formula appears after the list below).

  • Across univariate, multivariate, and covariate subsets of fev-bench, Chronos-2 achieved the highest skill score.
  • On fev-bench, 100 realistic time-series tasks including single and multiple input and output time series, Chronos-2 (0.473) outperformed TiRex (0.426), which processes only univariate time series, and Toto-1.0 (0.407), which can process multivariate and covariate time series in some cases.
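For reference, here is skill score in its common form, which matches the description above (we’re assuming fev-bench uses this standard definition):

def skill_score(model_error: float, baseline_error: float) -> float:
    # 1 - (model error / baseline error): 1.0 is perfect, 0.0 matches the baseline.
    return 1.0 - model_error / baseline_error

# Under this reading, Chronos-2's 0.473 on fev-bench corresponds to roughly
# 47.3 percent less error than the benchmark's baseline forecaster.
print(skill_score(0.527, 1.0))  # 0.473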

Behind the news: Most previous models, including the earlier versions Chronos and Chronos-Bolt, predict only univariate time series. Later models like Toto-1.0 and COSMIC process multiple inputs or outputs in limited ways. For instance, Toto-1.0 processes multiple inputs and outputs, but the multiple inputs can only represent past information, not future or static information. COSMIC, on the other hand, can handle multiple inputs (past or future) but not multiple outputs.

 

Why it matters: Chronos-2 can handle past, future, and static inputs as well as multiple outputs, giving developers, researchers, and companies alike the ability to better predict complex time series.

 

We’re thinking: The authors’ attention setup is similar to the way many video transformers apply attention separately across space and time. It saves memory compared to performing attention across both at once, yet remains an effective method for understanding data across both.

 

A MESSAGE FROM RAPIDFIRE AI

Promo banner for: "PyPI Project RapidFire AI"

RapidFire AI helps you find the RAG configuration that grounds best for your domain. Empirical RAG, not guesswork. Try it on PyPI

 

Work With Andrew Ng

 

Join the teams that are bringing AI to the world! Check out job openings at DeepLearning.AI, AI Fund, and Landing AI.

 

Subscribe and view previous issues here.

 

Thoughts, suggestions, feedback? Please send to thebatch@deeplearning.ai. Avoid our newsletter ending up in your spam folder by adding our email address to your contacts list.

 

DeepLearning.AI, 400 Castro St., Suite 600, Mountain View, CA 94041, United States
