Dear friends,
Just as many businesses are transforming to become more capable by using AI, universities are too. I recently visited the UK to receive an honorary doctorate from the University of Exeter’s Faculty of Environment, Science and Economy. The name of this faculty stood out to me as a particularly forward-looking way to organize an academic division. Having Computer Science sit alongside Environmental Science and the Business School creates natural opportunities for collaboration across these fields.
Having been born in the UK and spent most of my career in Silicon Valley, I find it exciting to see Exeter’s leadership embrace AI with an enthusiasm I more often associate with California. The UK has always punched above its weight in research, and seeing that tradition continue in the AI era is encouraging.
Like almost all other universities and businesses worldwide, Exeter’s AI transformation is just beginning. But the enthusiastic embrace of AI by its leadership will give it momentum. As someone who is proud to be an honorary graduate of the university, I look forward to seeing what comes next!
Keep building, Andrew
A MESSAGE FROM DEEPLEARNING.AI
Build working GenAI app prototypes in hours, not weeks. Turn Python scripts into interactive web apps using Streamlit and Snowflake's Cortex AI, and deploy to Snowflake or Streamlit Cloud for user feedback. Learn more and enroll now!
News
GPT-5 Takeoff Encounters Turbulence
OpenAI launched GPT-5, the highly anticipated successor to its groundbreaking series of large language models, but glitches in the rollout left many early users disappointed and frustrated.
What’s new: Rather than a family of models, GPT-5 is a family of systems — GPT-5, GPT-5 Mini, GPT-5 Nano, and GPT-5 Pro — that include non-reasoning and variable-reasoning models along with a router that switches between them automatically depending on the input. OpenAI made GPT-5 the only option in the ChatGPT user interface without prior notice, but the router failed right out of the gate, causing the company to reinstate ChatGPT access to earlier models for paid users.
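OpenAI has not published how its router decides which model handles a given input. Purely as an illustration of the idea of routing requests among model tiers, a toy router might use crude signals of whether a prompt needs step-by-step reasoning (the tier names and heuristics below are invented for this sketch, not OpenAI's):

```python
# Toy illustration of input-based model routing. The tier names and
# heuristics are hypothetical; OpenAI's actual router is undocumented.

REASONING_HINTS = ("prove", "step by step", "derive", "debug", "why")

def route(prompt: str) -> str:
    """Return a hypothetical model tier for a given prompt."""
    text = prompt.lower()
    if any(hint in text for hint in REASONING_HINTS):
        return "reasoning-model"   # slower, more compute spent on reasoning
    if len(text) < 40:
        return "small-fast-model"  # cheap tier for short, simple queries
    return "standard-model"       # default non-reasoning tier

print(route("What's the capital of France?"))  # prints "small-fast-model"
```

A real router would likely be a learned classifier rather than keyword rules, but the interface — one entry point that silently dispatches to models of different cost and capability — is the same.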
How it works: OpenAI revealed few details about GPT-5’s architecture and training, except for “safe completions,” a fine-tuning method intended to balance safety and helpfulness, which is documented in a paper.
Results: GPT-5 topped some benchmarks according to OpenAI's evaluations. However, it fell short of competing models on some measures of abstract reasoning in independent tests.
Behind the news: Launched in March 2023, GPT-4 raised the bar for vision-language performance, and anticipation of the next version grew steadily over the two years since. In December 2024, The Wall Street Journal reported GPT-5 was delayed as the scale of the project stretched OpenAI’s computational limits. In a mid-February 2025 post on the X social network, OpenAI CEO Sam Altman offered GPT-4.5 as a stopgap and outlined the improvements expected with GPT-5. But in April, he said GPT-5 would be delayed further and launched o3 and o4-mini, whose performance once again topped leaderboards. GPT-5’s August 7 debut brought an end to the long wait, but misleading graphs of its performance, rate limits, and the malfunctioning switcher marred the event, while the unexpected deprecation of earlier models in ChatGPT hamstrung many users.
Why it matters: OpenAI models have consistently topped language benchmarks. With GPT-5, the company has launched a system architecture that integrates its best models and takes advantage of the strengths of each: rapid output, slower output with adjustable computation devoted to reasoning, and graceful degradation to smaller versions.
We’re thinking: Novices may find that the GPT-5 router’s ability to choose a model for any given input simplifies things, but it remains to be seen whether expert users, who may be better at selecting the appropriate model for their tasks, will be happy to give up this control.
India Pushes to Build Indigenous AI
India, which has limited funding and large numbers of languages and dialects, is redoubling its efforts to build native large language models.
What’s new: India is funding startups and marshaling processing resources, MIT Technology Review reported. Companies such as CoRover, Sarvam AI, and Soket AI Labs are working on efficient models that can process many of the 22 officially recognized languages spoken in India while running on relatively small compute budgets.
Challenges: India is home to more than 120 languages and 19,500 dialects. However, training models to process them faces hurdles both cultural and technical.
Initiatives: To overcome the challenges, India’s government, cloud providers, and startups are attempting to kickstart indigenous model development. Several Indian AI leaders said they’re inspired by DeepSeek, the Chinese developer that built a leading large language model while spending far less than its international competitors.
Why it matters: As LLMs have become more sophisticated, it has become clear that one size doesn’t fit all. Countries (and subcultures within countries) need models that reflect their values, habits of thought, and languages. Yet resources are unequally distributed, leaving developers in some countries struggling to realize this dream. India is making a push to overcome the obstacles and develop AI that suits its own needs.
We’re thinking: Different countries deserve models that reflect their distinctive characters, but their development efforts need not remain insular. AI is an international project, and teams in different countries benefit by collaborating with one another. Let’s all help one another realize the benefits of AI worldwide.
Learn More About AI With Data Points!
AI is moving faster than ever. Data Points helps you make sense of it just as fast. Data Points arrives in your inbox twice a week with six brief news stories. This week, we covered OpenAI’s launch of GPT-5 with routed reasoning and competitive pricing, and its decision to restore GPT-4o access while doubling GPT-5 usage limits. Subscribe today!
AI Video Goes Mainstream
Generated video clips are capturing eyeballs in viral videos, ad campaigns, and a Netflix show.
What’s new: The Dor Brothers, a digital video studio based in Berlin, uses AI-generated clips to produce social-media hits including “The Drill,” which has been viewed 16 million times. Similarly, AI-focused creative agency Genre.ai made a raucous commercial for gaming company Kalshi for less than $2,000, stirring debate about the future of advertising. Netflix generated a scene for one of its streaming productions, the sci-fi series The Eternaut.
How it works: For Genre.ai and The Dor Brothers, making stand-out videos means entering new prompts repeatedly until the team is satisfied with the output, then assembling the best clips using traditional digital video editing tools. For the Kalshi ad, for instance, Genre.ai generated 300 to 400 clips to get 15 keepers. Netflix did not describe its video-generation process.
Behind the news: Top makers of video generation models have been courting commercial filmmakers to fit generative AI into their production processes.
Why it matters: Video generation enables studios to produce finished work on schedules and budgets that would be unattainable any other way. Sets, lighting, cameras, talent, makeup, even scripts and scores — generative AI subsumes them all. For newcomers like The Dor Brothers or Genre.ai, this is liberating. They can focus on realizing their ideas without going to the effort and expense of working with people, video equipment, and locations. For established studios, it’s an opportunity to transform traditional methods and do more with less.
We’re thinking: AI is rapidly transforming the labor, cost, and esthetics of filmmaking. This isn’t the first time: It follows close upon streaming and social video, or before that, computer-generated effects and digital cameras. The Screen Actors Guild and Writers Guild of America negotiated agreements with film/video producers that limit some applications of AI, but creative people will find ways to use the technology to make products that audiences like. This creates opportunities for producers not only to boost their productivity but also to expand their revenue — which, we hope, will be used to make more and better productions than ever before.
Training Data for Coding Assistants
A bottleneck in fine-tuning large language models for software engineering is building a dataset that can show them how to edit code, search for subroutines, write test scripts, control a terminal, manage a file system, and so on. Researchers built a pipeline that produces such data automatically.
What’s new: John Yang and colleagues at Stanford, Princeton, and Alibaba introduced SWE-smith, a method that generates realistic examples of bug fixes and other code alterations. The code, dataset, and a model that was fine-tuned on the data are freely available for commercial and noncommercial uses.
Key insight: Automated unit tests determine whether code does what it’s supposed to do. Code that doesn’t pass a unit test has a bug, so one way to generate bug-fix examples is to start with code that passes a unit test and modify it until it doesn’t. Another is to start with working code and revert to previous versions that contain bugs or lack desired features. Having introduced issues, we can prompt an LLM to eliminate them, producing valid before-and-after examples that don’t require manual validation.
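The first of these strategies can be sketched in a few lines. In this toy example (the function, test, and string-based mutation are invented for illustration; SWE-smith uses LLM rewrites and procedural edits on real repositories), we mutate working code and keep the (buggy, fixed) pair only if the mutation makes the unit test fail:

```python
# Sketch of unit-test-driven bug-pair generation: mutate passing code,
# keep the (buggy, fixed) pair only when the test now fails.

def run_test(source: str) -> bool:
    """Execute candidate code plus its unit test; True if the test passes."""
    namespace = {}
    try:
        exec(source, namespace)
        namespace["test_add"]()
        return True
    except Exception:
        return False

WORKING = """
def add(a, b):
    return a + b

def test_add():
    assert add(2, 3) == 5
"""

def make_bug_pair(source: str):
    # Toy "mutation": flip the first '+' to '-'. Real pipelines use
    # LLM rewrites or procedural AST edits instead.
    buggy = source.replace("a + b", "a - b", 1)
    if run_test(source) and not run_test(buggy):
        return {"buggy": buggy, "fixed": source}
    return None  # mutation didn't break the test, so discard it

pair = make_bug_pair(WORKING)
```

Because the unit test itself judges whether the mutated code is broken, the resulting before-and-after pairs need no manual validation.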
How it works: The authors started with 128 GitHub repositories of Python code.
Results: The authors fine-tuned Qwen 2.5 Coder-32B on 5,000 examples, focusing on the two bug-generation methods they found most effective. To represent a diversity of bugs, they kept no more than 3 example fixes for any given bug. Paired with SWE-agent, their model solved software engineering problems in SWE-bench Verified in one attempt 40.2 percent of the time. By comparison, paired with the OpenHands agentic framework, the same-size R2E-Gym-32B (fine-tuned on different data) and the much bigger Qwen3-235B-A22B (not fine-tuned) solved 34.4 percent in one attempt.
Why it matters: Previous datasets for fine-tuning LLMs on coding tasks are small, often comprising thousands of training instances drawn from fewer than a dozen repositories. The authors’ method can produce such data at scale, potentially enabling major developers to improve their AI-assisted coding models and everyone else to build better systems.
We’re thinking: AI-assisted coding is revolutionizing software development, and the tools are still evolving. The ability to produce effective training data at scale is likely to further accelerate the progress — already moving at breakneck speed! — in this area.
Work With Andrew Ng
Join the teams that are bringing AI to the world! Check out job openings at DeepLearning.AI, AI Fund, and Landing AI.
Subscribe and view previous issues here.
Thoughts, suggestions, feedback? Please send to thebatch@deeplearning.ai. Avoid our newsletter ending up in your spam folder by adding our email address to your contacts list.