Dear friends,
Automated software testing is growing in importance in the era of AI-assisted coding. Agentic coding systems accelerate development but are also unreliable. Agentic testing — where you ask AI to write tests and check your code against them — is helping. Automatically testing infrastructure software components that you intend to build on top of is especially helpful and results in more stable infrastructure and less downstream debugging.
Software testing methodologies such as Test-Driven Development (TDD), a test-intensive approach in which you first write rigorous tests for correctness and only then write code that passes those tests, are an important way to find bugs. But writing tests can be a lot of work. (I personally never adopted TDD for that reason.) Because AI is quite good at writing tests, agentic testing is enjoying growing attention.
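To make agentic testing concrete, here is the sort of unit test an agent might generate for a small infrastructure component. The RateLimiter class and test below are hypothetical illustrations, not code from any particular project:

```python
# Hypothetical example: a test an agent might write for a small
# infrastructure component (a sliding-window rate limiter).
import time

class RateLimiter:
    """Allow at most max_calls within a sliding window of `period` seconds."""
    def __init__(self, max_calls: int, period: float):
        self.max_calls = max_calls
        self.period = period
        self.calls: list[float] = []

    def allow(self) -> bool:
        now = time.monotonic()
        # Drop timestamps that have aged out of the window.
        self.calls = [t for t in self.calls if now - t < self.period]
        if len(self.calls) < self.max_calls:
            self.calls.append(now)
            return True
        return False

def test_rate_limiter_blocks_excess_calls():
    limiter = RateLimiter(max_calls=2, period=60.0)
    assert limiter.allow()
    assert limiter.allow()
    assert not limiter.allow()  # third call within the window is rejected
```

Run with pytest; a stable test like this catches regressions in the component before they cause downstream debugging.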
In one case, when pressed about a mistake, the coding agent apologized and agreed “that was an incredibly stupid mistake.” This made us feel better, but the damage had already been done!
I love coding agents despite such mistakes and see them making us dramatically more productive. To make them more reliable, I’ve found that prioritizing where to test helps.
I rarely write (or direct an agent to write) extensive tests for front-end code. If there's a bug, hopefully it will be easy to see and will cause little lasting damage. For example, I find generated code's front-end bugs, say in the display of information on a web page, relatively easy to spot. When the front end of a website looks wrong, you'll see it immediately, and you can tell the agent and have it iterate to fix it. (A more advanced technique: Use MCP to let the agent integrate with software like Playwright to automatically take screenshots, so it can autonomously see if something is wrong and debug, as in the sketch below.)
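As a minimal sketch of that screenshot step, the following uses Playwright's Python API directly (the MCP wiring is omitted); the URL and output path are placeholders:

```python
# Minimal sketch: capture a screenshot of a rendered page so an agent can
# inspect the front end. Requires `pip install playwright` and
# `playwright install`. The URL and output path are placeholders.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("http://localhost:3000")  # placeholder: your dev server
    page.screenshot(path="homepage.png", full_page=True)
    browser.close()
```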
Keep testing! Andrew
A MESSAGE FROM DEEPLEARNING.AI
Build an app that works with Box content in place. In this short course, you’ll connect to the Box MCP server to discover tools, list folders, and extract fields without moving files, then evolve the solution into a multi-agent system coordinated via A2A. Sign up for free
News
Qwen3-Next Accelerates
Alibaba updated its popular Qwen3 open-weights models with a number of fresh, speed-boosting tweaks.
What’s new: Alibaba released weights for Qwen3-Next-80B-A3B in Instruct and Thinking variations. They incorporate some of the latest research on alternate forms of attention and mixture-of-experts approaches to use less processing power at inference.
How it works: The team modified the Qwen3-30B-A3B architecture and training method to increase training efficiency and stability, swapping vanilla attention for more-efficient layers, increasing the sparsity of the mixture-of-experts layers, and adding prediction of multiple tokens at once.
Results: Qwen3-Next models were faster than Qwen3-30B-A3B and Qwen3-32B in Alibaba’s tests. They performed in the middle of the pack in independent tests.
Behind the news: Since transformers gained traction, researchers have been working to design faster variants of attention and new layers (like Mamba). However, the resulting models tend to be limited in size and performance relative to the state of the art when the innovations were proposed, sometimes because adapting them to existing GPU hardware is difficult. Qwen3-Next takes advantage of recent research without these limitations. It outperforms current large and popular models, potentially pointing a way toward future LLM architectures.
Why it matters: Qwen3-Next offers a recipe for faster inference without compromising performance. Mixture-of-experts architectures enable models to learn more while using fewer active parameters at inference, increasing throughput. Swapping vanilla attention for more-efficient layers boosts throughput further, especially as context lengths increase. Predicting multiple tokens at once provides an additional edge.
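To illustrate the active-parameter idea, here is a toy sketch of top-k expert routing in PyTorch. The dimensions, expert count, and k are made-up values, and this is not Qwen3-Next's actual implementation; the point is that each token runs only k of the experts:

```python
# Toy mixture-of-experts layer: a router picks the top-k experts per token,
# so only k of num_experts expert MLPs execute at inference. All sizes here
# are illustrative, not Qwen3-Next's actual configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoE(nn.Module):
    def __init__(self, d_model=64, num_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        weights = F.softmax(self.router(x), dim=-1)    # (tokens, num_experts)
        top_w, top_idx = weights.topk(self.k, dim=-1)  # keep k experts per token
        out = torch.zeros_like(x)
        for token, (idxs, ws) in enumerate(zip(top_idx, top_w)):
            for i, w in zip(idxs.tolist(), ws):
                out[token] += w * self.experts[i](x[token])  # only k experts run
        return out

x = torch.randn(5, 64)   # 5 tokens
y = ToyMoE()(x)          # each token touches 2 of 8 experts
```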
We’re thinking: Rapidly rising demand for cheaper and faster token generation is pushing more teams to tune mixture-of-experts architectures so they use fewer active parameters. Such techniques will continue to grow in importance as demand for inference increases.
States Ban AI-Driven Treatments for Mental Health
Illinois became the second U.S. state, after Nevada, to ban AI applications that administer psychotherapy.
What’s new: Illinois passed the Wellness and Oversight for Psychological Resources Act, which prohibits uses of AI to treat mental-health conditions without a doctor’s direct participation. Violations could result in fines up to $10,000 for each use.
How it works: The bill effectively bans the use of chatbots to administer therapy on their own and restricts some other uses of AI in mental-health care, even by licensed professionals. Proponents say it will protect patients from unproven treatments and human therapists from being replaced by AI systems.
Behind the news: In June, Nevada became the first U.S. state to prohibit AI in treatments for mental health, and California, New Jersey, and Pennsylvania are considering their own limits. These actions come as some experts in public and mental health warn of potential hazards posed by chatbots that deliver therapy without having established their safety and effectiveness. An April study found that many general-purpose chatbots failed to respond appropriately when given conversational prompts that simulated mental-health issues. Recent weeks have seen reports detailing unhealthy relationships between chatbots and users, and some conversations between chatbots and vulnerable people have led to harm.
Why it matters: In the absence of national laws, regulation of AI in the U.S. is proceeding state by state. The Illinois and Nevada laws essentially ban AI-driven therapy, whether it’s dispensed by general-purpose models or those that have been fine-tuned and shown to behave in ways that are consistent with accepted clinical practice. They prohibit companies from marketing poorly designed and untested AI systems as beneficial therapeutic agents, but they also prevent licensed mental-health professionals from using specialized systems to make treatment decisions. The upshot is that helpful AI models will be unavailable to people who may benefit from them.
We’re thinking: We favor regulations based on applications rather than underlying technology. However, by banning AI-driven therapy outright, Illinois and Nevada have left no room for legitimate AI-powered applications that provide effective therapy. Large language models are helping many people with therapy-like matters. They can lower the cost of therapy, offer around-the-clock service, and alleviate shortages of qualified professionals. They’re not yet perfect replacements for human therapists, but they will improve. Banning them will do more harm than good.
Learn More About AI With Data Points!
AI is moving faster than ever. Data Points helps you make sense of it just as fast. Data Points arrives in your inbox twice a week with six brief news stories. This week, we covered Claude’s new ability to create and edit Microsoft Office files directly, and Alibaba’s Qwen3-Next model, which uses hybrid attention for long-context inputs. Subscribe today!
Drone Swarms Go to War
Swarms of drones that coordinate with one another autonomously have become a battlefield staple in Ukraine.
What’s new: The Ukrainian army is deploying squads of weaponized drones that decide among themselves which will attack first and when. Small swarms controlled by software developed by Swarmer, a U.S.-based startup, have been targeting Russian soldiers, equipment, and infrastructure for the better part of a year, The Wall Street Journal reported.
How it works: Swarmer’s swarm-control software is designed to work with a wide variety of unmanned aerial vehicles. A human operator makes decisions about use of lethal force: “You set the target and they do the rest,” said a Ukrainian officer whose unit has used Swarmer’s technology more than 100 times. Unlike popular drone light shows, in which a crowd of drones is pre-programmed to move in particular ways, Swarmer swarms adapt to one another’s motions. And unlike typical drones, which depend on cloud computing, they operate in ways designed to resist enemy interference with communications; for instance, the human operator need transmit to the swarm only once per minute. The units maintain distance and avoid collisions with one another, but they navigate independently to avoid presenting an aggregate target.
Behind the news: Drones are deployed en masse by both sides as Ukraine defends itself against invasion by Russia. They have changed the course of the war, as tactical and strategic goals have shifted to accommodate enormous fleets of unmanned air power, often in the form of consumer-grade equipment.
Why it matters: AI has a long history in warfare, and drone swarms are only the latest of what promises to be an ongoing stream of military uses of the technology. Yet the increasing autonomy of military drone systems poses difficult challenges, both practical and ethical. Swarmer’s software keeps humans in the loop to make firing decisions but, driven by the brutal logic of armed conflict, drones seem bound to become more widespread, capable, and autonomous.
We’re thinking: War is tragic. At the same time, democratic nations must have the means to defend themselves, and we support the Ukrainian people in their struggle to defend their country.
Transformers Energized
A new type of transformer can check its work. Instead of guessing the next output token in one shot like a typical transformer, it starts with a rough version of the token and improves it step by step.
Energy-based model basics: For a given input context paired with a candidate response (for example, a prompt and a potential next token), an energy-based model produces a number called “energy” that represents how likely the candidate is to follow the context. During training, the model learns to assign low energy to a context/response pair that is very likely and high energy to one that is not.
Key insight: A typical transformer is trained to predict the next token directly, while an energy-based model learns to score an input text. How, then, can an energy-based model predict the next token? A naive way would be to append a random token to the prompt, measure the energy, randomly modify the token a number of times, and select the prompt-token combination with the lowest energy. Instead of random modification, the model can repeatedly use gradient descent to compute the change that decreases the token’s energy. This process refines the token over several steps, ultimately producing a token with low energy (and thus a high likelihood of following the previous text), as sketched below.
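Here is a schematic sketch, in PyTorch, of that refinement loop. The energy_model callable stands in for a trained energy-based model that returns a scalar energy; the step size and iteration count are arbitrary assumptions, not values from the paper:

```python
# Schematic sketch of next-token prediction with an energy-based model:
# start from a rough candidate distribution over the vocabulary and refine
# it by gradient descent on the energy. `energy_model` stands in for a
# trained model; `steps` and `lr` are arbitrary choices.
import torch

def predict_next_token(energy_model, context, vocab_size, steps=10, lr=0.1):
    logits = torch.zeros(vocab_size, requires_grad=True)  # rough initial guess
    for _ in range(steps):
        candidate = torch.softmax(logits, dim=-1)  # candidate next-token distribution
        energy = energy_model(context, candidate)  # scalar: lower = more likely
        grad, = torch.autograd.grad(energy, logits)
        with torch.no_grad():
            logits -= lr * grad                    # nudge the candidate downhill in energy
    return torch.softmax(logits, dim=-1).argmax()  # token with the lowest final energy
```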
How it works: Among other models, the authors trained a 44 million-parameter autoregressive energy-based transformer (EBT) on the RedPajama-Data-v2 dataset of 32 billion text tokens scraped from the web. As input, the EBT received a sequence of tokens and a probability vector that represented a candidate next token. It learned to output an energy score that measured the likelihood that the candidate would follow the context.
Results: The authors compared EBTs and transformers of the same sizes, trained on the same numbers of tokens, by measuring perplexity (a measure of the likelihood that a model will predict the next word; lower is better) on several benchmarks including math problems, question answering, and reading comprehension. Overall, EBTs proved better at generalization but worse at generating text that followed the training data’s distribution. At the sizes tested, EBTs were significantly less compute-efficient than transformers, but they scaled better, so larger versions may prove more efficient than transformers.
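For reference, the standard definition of perplexity (not quoted from the paper) is the exponentiated average negative log-likelihood of a sequence of N tokens:

\[
\mathrm{PPL} = \exp\!\left(-\frac{1}{N}\sum_{i=1}^{N}\log p_\theta\left(x_i \mid x_{<i}\right)\right)
\]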
Why it matters: This work offers intriguing possibilities for higher performance at larger scales. A typical transformer learns to predict the next token directly, but that locks it into a single forward pass per output token and provides no built-in measure of whether the prediction is good. In contrast, an EBT learns to assign a score that it uses both to generate tokens (by iteratively lowering their energy) and to verify them (by checking whether the final energy is low). Work remains to learn whether larger EBTs can be more compute-efficient.
We’re thinking: When it comes to energy, AI research is a renewable resource!
Work With Andrew Ng
Join the teams that are bringing AI to the world! Check out job openings at DeepLearning.AI, AI Fund, and Landing AI.
Subscribe and view previous issues here.
Thoughts, suggestions, feedback? Please send to thebatch@deeplearning.ai. Avoid our newsletter ending up in your spam folder by adding our email address to your contacts list.