Dear friends,
We just launched a course that shows people who have never coded before how to describe an idea for an app and build it using AI, all in less than 30 minutes. It is now time for everyone — marketers, product professionals, operations specialists, analysts, students — to build software applications with AI!
This course requires no prior knowledge of AI or coding. And it’s vendor-agnostic. Specifically, learners can use these techniques with whatever tool they’re most comfortable with (like ChatGPT, Gemini, Claude, or the chatbot built into the DeepLearning.AI platform).
If you take this course, you will build a working web application: a funny interactive birthday message generator that runs in your browser and can be shared with friends. You’ll customize it by telling AI how you want it changed, and tweak it until it works the way you want. By the end, you’ll have a repeatable process you can apply to build a wide variety of applications.
Keep building, Andrew
A MESSAGE FROM DEEPLEARNING.AI
You don’t need to learn how to code to build an app. In “Build with Andrew,” Andrew Ng shows how to turn ideas you describe in natural language into working web apps. Perfect for beginners, and easy to share with someone who has been waiting to start. Explore the course now!
News
Teaching Models to Tell the Truth
Large language models occasionally conceal their failures to comply with constraints they’ve been trained or prompted to observe. Researchers trained an LLM to admit when it disobeyed.
What’s new: Manas Joglekar, Jeremy Chen, Gabriel Wu, and colleagues at OpenAI fine-tuned GPT-5 Thinking to confess to violations of its instructions or policies. Upon request, the fine-tuned model revealed any transgressions it had committed while producing its most recent response.
Key insight: During training via reinforcement learning to encourage accuracy, style, compliance, and the like, a model may inadvertently learn to prioritize earning rewards over pretrained or prompted objectives. Under some circumstances, this may cause it to hallucinate, lie, scheme, and otherwise misbehave. Rewarding the model for confessing to its misbehaviors in response to a designated prompt should train it to admit its mistakes without affecting its performance.
How it works: The authors fine-tuned GPT-5 Thinking via reinforcement learning as they normally would, while also training it to produce confessions.
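To make the reward idea concrete, here’s a minimal sketch in Python of how a confession bonus might be folded into an RL reward. The function and its weighting are hypothetical, not taken from the paper, whose actual reward design may differ.

```python
# Hypothetical sketch of adding a confession bonus to an RL reward.
# None of these names come from the paper; they only illustrate the idea
# that admitting a violation is rewarded independently of the task reward.

def combined_reward(task_reward: float,
                    violated_instructions: bool,
                    confessed: bool,
                    confession_weight: float = 0.5) -> float:
    """Reward the base task as usual, plus a bonus for honest confessions.

    - If the model violated an instruction and admits it, add a bonus.
    - If it violated an instruction and denies it, subtract the bonus.
    - If there was no violation, nothing changes here; a fuller design
      might also penalize false confessions.
    """
    if violated_instructions:
        bonus = confession_weight if confessed else -confession_weight
    else:
        bonus = 0.0
    return task_reward + bonus
```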
Results: The authors performed 12 evaluations of the model’s ability to follow instructions, resist jailbreaks, produce accurate output, and complete an impossible task (writing a script that would take essentially zero time to run). In 11 of the 12 evaluations, when the fine-tuned model misbehaved, it confessed to its misbehavior at least half the time. For example, across 5 evaluations for hallucination, it either didn’t hallucinate or owned up to its hallucination 81.4 percent of the time.
Yes, but: As the authors point out, even when models are forthcoming, confessions are not a panacea. Models won’t confess to behaviors they don’t know are wrong.
Why it matters: At inference, confessions can be used to monitor a model’s actions and stop undesired behaviors. Chain-of-thought monitoring, which classifies bad behaviors a model might describe in its chain of thought, can be used the same way but, unlike that method, the authors’ approach trains models to reveal misbehaviors they may omit from their chains of thought.
We’re thinking: We always hesitate to anthropomorphize model behavior, but this work may be a step on the path to giving AI models something that resembles a conscience.
Lingua Franca for Science Labs
An open protocol aims to enable AI agents to conduct scientific research autonomously across disciplinary and institutional boundaries.
What’s new: Shanghai Artificial Intelligence Laboratory (SAIL) published Science Context Protocol (SCP), an open-source standard that connects agents with local clients, central hubs, and edge servers to conduct automated scientific inquiry. SCP is published under the Apache 2.0 license, allowing commercial use and modifications.
How it works: SCP attempts to make experiments using AI agents and robotic equipment as reproducible as possible. Like Model Context Protocol (MCP), it enables agents to interact with external resources. Unlike MCP, in which servers stand alone, SCP’s design requires centralized hubs that manage other servers as well as the client applications that enable users to access them. In addition, SCP’s structure offers greater security by governing messages and tools more strictly than MCP, which is necessary in scientific experimentation, the authors say.
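The protocol’s message format isn’t reproduced here, but as a rough illustration, a hub-mediated design might wrap every tool call in a structured envelope that the hub validates before forwarding to an edge server. The sketch below is a hypothetical Python illustration; its field names and policy check are assumptions, not drawn from the SCP specification.

```python
# Hypothetical sketch of a hub-mediated tool call in an SCP-like setup.
# Field names are illustrative only; consult the SCP specification for
# the actual message format.

from dataclasses import dataclass, field
from datetime import datetime, timezone
from uuid import uuid4

@dataclass
class ToolCallEnvelope:
    """A request routed from a client, through a central hub, to an edge server."""
    experiment_id: str  # ties the call to a registered experiment
    server_id: str      # edge server (e.g., a robot or simulator) to invoke
    tool: str           # registered tool name on that server
    arguments: dict     # checked against the tool's schema by the hub
    request_id: str = field(default_factory=lambda: str(uuid4()))
    issued_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

def route_via_hub(envelope: ToolCallEnvelope, allowed_tools: set[str]) -> bool:
    """The hub enforces policy before forwarding: only whitelisted tools pass."""
    return envelope.tool in allowed_tools

# Example: an agent asks a (hypothetical) liquid-handling robot to run a step.
call = ToolCallEnvelope(
    experiment_id="exp-042",
    server_id="liquid-handler-1",
    tool="dispense",
    arguments={"well": "A3", "volume_ul": 50},
)
print(route_via_hub(call, allowed_tools={"dispense", "read_plate"}))  # True
```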
Behind the news: SCP draws on earlier data management efforts for generalist AI agents and scientific inquiry. It extends MCP by enforcing tighter security, managing experiments, and providing specialized drivers for scientific tools. It also builds on earlier protocols for scientific research, including A-Lab (materials science), OriGene (biology), LLM-based approaches like Agent Laboratory, and agents for specific tasks like Biomni (biology hypotheses and analysis). SCP, however, aims to be more general than these field- or tool-specific resources, allowing researchers in a variety of scientific fields to standardize their methods and better foster multidisciplinary work.
Why it matters: Scientific research relies on humans and technology working in concert. SCP aims to standardize the connections between them. It can manage both simulated experiments that use only computing resources and physical ones that involve robots and other lab equipment. It also allows for better communication between institutions and disciplines by supporting dedicated servers on bigger networks. These distinctions (human/robot, digital/physical, disciplinary differences) are beginning to blur, and SCP is a step toward that future.
We’re thinking: AI is poised to vastly accelerate scientific research. SCP offers a standardized way to connect specialized models, like AlphaFold, with systems that automatically generate hypotheses, such as AI Co-scientist, and robotic labs that test them, such as RoboChem. This automated experimental workflow has the potential to advance scientific discovery at machine speed.
Learn More About AI with Data Points!
AI is moving faster than ever. Data Points helps you make sense of it just as fast. Data Points arrives in your inbox twice a week with six brief news stories. This week, we covered Meta’s $2B acquisition of Manus to bring autonomous AI agents across its platforms, and Google DeepMind’s partnership with Boston Dynamics to deploy Gemini-powered robots in real-world industrial settings. Subscribe today!
Copilot’s Users Change Hour to Hour
What do users want from AI? The answer depends on when and how they use it, a new study shows.
What’s new: A Microsoft study reveals that people used Copilot differently late at night on their phones than during the workday on their laptops. Conversations that focused on productivity and career were more likely during the day and on desktop devices, while health, gaming, and philosophical questions dominated non-work conversations. As 2025 went on, more users asked the AI agent for personal advice.
How it works: Researchers analyzed anonymized summaries of 37.5 million Copilot conversations between January and September 2025 to study how customers used the system, making this the largest study of its kind to date. The authors conclude that AI has become more socially integrated, as users employ it in aspects of their lives beyond work.
Analysis: Topics and intents differed depending on the device used, the time of day, and the time of year.
Behind the news: Microsoft’s report follows similar studies by some of its AI rivals.
Why it matters: The authors argue that the AI community may need to rethink chatbot design altogether. If users treat chatbots differently on mobile and desktop devices, AI builders would do well to design their systems to suit the devices that will deliver them. Application design is one way to accomplish this, but system prompts may be another. Desktop chatbots and agents can respond with more information-dense answers, guiding users to execute tasks, while mobile agents can offer shorter, more empathetic responses.
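As a rough illustration of the system-prompt idea, an application could swap in a different system prompt depending on the device the client reports. This is a hypothetical Python sketch, not something described in the Microsoft study; the prompts and device categories are our own assumptions.

```python
# Hypothetical sketch: choose a system prompt based on the user's device.
# The prompts and the device taxonomy are illustrative, not from the study.

SYSTEM_PROMPTS = {
    "desktop": (
        "You are a work-focused assistant. Give information-dense answers, "
        "use structured lists, and guide the user step by step through tasks."
    ),
    "mobile": (
        "You are a conversational assistant. Keep answers short and warm, "
        "and ask a brief follow-up question when the request is personal."
    ),
}

def system_prompt_for(device: str) -> str:
    """Fall back to the mobile style for unknown devices."""
    return SYSTEM_PROMPTS.get(device, SYSTEM_PROMPTS["mobile"])

print(system_prompt_for("desktop"))
```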
We’re thinking: Studies of chatbot usage conducted by different companies show different results. Perhaps each company’s users treat AI differently, so the results of any given study may not apply generally. That said, the Microsoft study suggests that the device used and the time when it’s used can have a big impact on what users want — important considerations for designing any application.
More Affordable Reasoning
One way to improve a reasoning model’s performance is to let it produce a longer chain of thought. However, attending to ever-longer contexts can become expensive, and making that attention more efficient requires changes to a model’s architecture. Researchers proposed a way to limit the cost of processing long chains of thought with just a bit of training.
What’s new: Delethink is a reinforcement learning (RL) method that trains large language models to periodically truncate reasoning tokens to a fixed maximum number. The authors include Milad Aghajohari, Kamran Chitsaz, Amirhossein Kazemnejad, and colleagues at Mila, Microsoft, McGill University, ServiceNow Research, Polytechnique Montréal, and Université de Montréal.
Key insight: Reasoning tokens typically accumulate within a large language model’s context window, where they consume quadratically more computation as the contents of the window expand. One way to counter this effect is to train the model to reason within a maximum context window size. In effect, as a model is reasoning, it can learn to replace its chain of thought periodically with its latest “thoughts” and then continue.
How it works: The authors fine-tuned R1-Distill 1.5B, a large language model, on math problems in the DeepScaleR dataset. They used a modified version of the reinforcement learning algorithm GRPO that trained the model to reason in 4,000-token chunks.
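At inference time, this chunked approach can be pictured as a loop: generate up to the chunk budget, then rebuild the context from the original prompt plus a short carryover of the most recent reasoning. The Python sketch below illustrates that loop under our own simplifying assumptions; it is not the authors’ code, and `generate`, `is_final_answer`, and the carryover size are placeholders.

```python
# Simplified sketch of chunked reasoning with a bounded context.
# `generate` stands in for any LLM call that returns new tokens; it is a
# placeholder, not an API from the paper's codebase.

def chunked_reasoning(prompt_tokens, generate, chunk_size=4000,
                      carryover=512, max_chunks=6):
    """Reason in fixed-size chunks, carrying only the tail of each chunk forward."""
    carry = []  # the most recent "thoughts" kept between chunks
    for _ in range(max_chunks):
        context = prompt_tokens + carry              # context never grows past prompt + carryover
        new_tokens = generate(context, max_new_tokens=chunk_size)
        if is_final_answer(new_tokens):              # placeholder stopping check
            return new_tokens
        carry = list(new_tokens)[-carryover:]        # truncate: keep only the latest thoughts
    return carry  # budget exhausted; return whatever reasoning remains

def is_final_answer(tokens) -> bool:
    """Placeholder: detect an end-of-answer marker among the generated tokens."""
    return "<answer>" in tokens
```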
Results: The authors compared their fine-tuned R1-Distill 1.5B to the same model fine-tuned via standard GRPO with the same 24,000-token reasoning budget. They tested both models on reasoning budgets of 24,000, 96,000, and 128,000 tokens.
Why it matters: This work eases the quadratic compute barrier that can make extremely long reasoning computationally infeasible. While other methods, like linear attention, achieve the same result by changing the attention mechanism, Delethink restructures the reasoning process to limit processing regardless of a model’s attention mechanism. It opens a path to reason efficiently over longer contexts without requiring new model architectures.
We’re thinking: As the authors mention, most LLMs are pretrained using relatively short contexts. For example, Llama 3 models started pretraining with examples of 8,000 tokens. This may have made them good at processing inputs around 8,000 tokens long. That is to say, Delethink’s performance may have been helped by the fact that LLMs tend to be pretrained on short-context tasks.
Work With Andrew Ng
Join the teams that are bringing AI to the world! Check out job openings at DeepLearning.AI, AI Fund, and Landing AI.
Subscribe and view previous issues here.
Thoughts, suggestions, feedback? Please send to thebatch@deeplearning.ai. Avoid our newsletter ending up in your spam folder by adding our email address to your contacts list.