Dear friends,
There have been intense efforts over the past few years to lobby governments to pass AI laws for regulatory capture or to suppress open source. This week, the White House issued an executive order that provides new guidance for companies that build frontier models. It promotes AI development while taking into account its impact on security. I’ve long been concerned that overregulation will stifle AI progress. In the case of this executive order, it’s a close call, but the result is a reasonable compromise between encouraging AI development and protecting security.
We could have ended up with a stifling executive order that would have been very burdensome for model builders, as I’ve written in earlier letters. I’m grateful to David Sacks, who co-chairs the President’s Council of Advisors on Science and Technology, as well as AI policy advisor Sriram Krishnan and others who worked hard to make the order reasonable. At the same time, I remain cautious about ongoing lobbying efforts and the temptation to overregulate.
Unfortunately, whenever there are legitimate risks, there is also a temptation to overregulate. Take commercial operations that braid hair. This is a very safe activity, but it does carry small risks. After all, we don’t want hair stylists to have such poor hygiene that they infect their clients with lice or diseases. But many U.S. states require someone wishing to braid hair commercially to engage in hundreds of hours of training to obtain a license. This requirement unnecessarily stifles small businesses. In a choice between excessive regulations and no regulation at all on this art, we would be better off with no regulation. The extremely low risk of an infection is better than stifling the whole industry.
Keep building!
Andrew
A MESSAGE FROM DEEPLEARNING.AI
Learn to run open-source LLMs faster with vLLM. Quantize a model, serve it efficiently, and benchmark performance so you can make informed tradeoffs between speed, cost, and accuracy. Enroll for free
News
Qwen3.7-Max Adds Speed and Power
Alibaba updated its flagship large language model for long-running agentic work, pushing it into the top rank among LLMs built in China.
What’s new: Alibaba positions Qwen3.7-Max as its preferred model for text-only work like coding and scientific discovery. Like other top-tier Qwen models since late 2025, its weights are not open. (Simultaneously Alibaba released the multimodal Qwen3.7-Plus-Preview.)
How it works: Alibaba described Qwen3.7-Max’s reinforcement-learning approach at a high level. The approach separates three components that Alibaba says are typically coupled in agent training: the task to be performed, an agentic harness that calls tools, and a verifier that decides whether the system succeeded. Alibaba trained the model on many combinations of task, harness, and verifier to prevent it from learning tricks specific to a single setup.
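To make the decoupling concrete, here is a minimal sketch of sampling training setups from every combination of task, harness, and verifier. It is an illustration only, not Alibaba's implementation: the task, harness, and verifier names and the `sample_setups` helper are hypothetical, and a real system would likely exclude pairings that don't make sense together.

```python
import itertools
import random

# Hypothetical component names, invented for illustration only.
TASKS = ["fix_bug", "optimize_kernel", "write_tests"]
HARNESSES = ["shell_tools", "ide_tools"]
VERIFIERS = ["unit_tests", "benchmark_timer"]

# Every (task, harness, verifier) triple is a distinct training setup.
ALL_SETUPS = list(itertools.product(TASKS, HARNESSES, VERIFIERS))


def sample_setups(n, seed=0):
    """Draw n setups uniformly at random, with replacement.

    Mixing components independently means no single harness or verifier
    is always paired with a given task, which is the property the
    article says discourages setup-specific tricks.
    """
    rng = random.Random(seed)
    return [rng.choice(ALL_SETUPS) for _ in range(n)]


if __name__ == "__main__":
    for task, harness, verifier in sample_setups(5):
        print(f"task={task} harness={harness} verifier={verifier}")
```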
Performance: Qwen3.7-Max trails the top tier of reasoning models on the Artificial Analysis Intelligence Index, just behind leading U.S. models from OpenAI, Anthropic, and Google. It excels at delivering correct output partly by declining to respond more often than peers.
Yes, but: Although Alibaba touts Qwen3.7-Max’s agentic capabilities, the claim is based on an internal test that is not yet validated by independent benchmarks. The model autonomously optimized an attention kernel on hardware it had not encountered during training. In 35 hours, it made 1,158 tool calls and ran 432 kernel evaluations (test runs of candidate code). The resulting code ran roughly 10 times faster than a standard reference implementation. Artificial Analysis has not yet tested Qwen3.7-Max on its benchmark of long-running agentic tasks.
Behind the news: Qwen3.7-Max continues Alibaba’s shift from open to closed models. In addition to Qwen3.7-Max, Qwen3.6-Max-Preview and Qwen3.6-Plus have closed weights, while the weights for the less capable Qwen3.6-27B and Qwen3.6-35B-A3B are freely available. At the same time, Alibaba started charging for access to Qwen Code, a command-line coding tool. These changes follow turnover in the Qwen team’s leadership and suggest that Alibaba aims to leverage its top-tier models to produce revenue rather than maximize its reach.
Why it matters: Qwen3.7-Max is the smartest Chinese LLM, judging by the Artificial Analysis Intelligence Index, and it’s the third-fastest overall.
We’re thinking: We’re saddened by Alibaba’s turn toward closed weights, but we’re pleased that it’s keeping its lower tiers open. AI companies need to innovate in ways to turn open weights into revenue as well as in model architectures and training methods.
How AI is Saving Whales
An AI-powered network of thermal sensors is helping ships avoid collisions with whales.
What’s new: WhaleSpotter detects gray whales in real time based on their heat signatures and relays images to human experts for validation. Newly deployed in the San Francisco Bay, the system alerts ship captains to the presence of whales, despite glare, darkness, or light fog, with enough lead time for large ships to change course.
How it works: WhaleSpotter’s algorithm takes input from heat-sensing cameras that can be mounted on land or vessels. When the algorithm detects a whale, the system transmits a video excerpt to experts, who can send an alert to ships in the area. Within a week and a half of operation, it had logged 6,600 whales.
Behind the news: WhaleSpotter’s system is the result of more than a decade of research at the Woods Hole Oceanographic Institution (WHOI). In 2024, the team formed the company to commercialize the technology. The shipping company Matson provided early support, and it put units in vessels that served Alaska and Hawaii, becoming the first container carrier to deploy the technology commercially. Today more than 70 WhaleSpotter systems are deployed in vessels, ports, and offshore-energy operations. The San Francisco installation is the first that includes both stationary and moving cameras.
Why it matters: WhaleSpotter addresses a longstanding problem among mariners. Ships strike and kill 20,000 whales of all kinds each year, according to the conservation group Ocean Wise. Traditionally, ship operators rely on humans who look for visual cues of whales on the water’s surface (which is effective only if whales surface in conditions of high visibility) and listen for whale vocalizations picked up by hydrophones mounted on buoys or ship hulls (which works only if whales vocalize). In recent years, as ocean temperatures have warmed, greater numbers of gray whales have entered San Francisco Bay in search of food. Of whales that die there, around 40 percent are struck by vessels, according to a study by Marin County’s Marine Mammal Center and California Academy of Sciences.
We’re thinking: A whale of an opportunity awaits AI builders who combine advances in sensors with deep domain knowledge and workflow integration!
Learn More About AI With Data Points!
AI is moving faster than ever. Data Points helps you make sense of it just as fast. Data Points arrives in your inbox twice a week with six brief news stories. This week, we covered Claude Opus 4.8’s new ability to express uncertainty and Microsoft’s shift toward training its own AI models from scratch. Subscribe today!
Inside the Gray Market for LLM Access
An ecosystem of API proxy servers enables AI developers in China to access top U.S. models at deeply discounted prices.
How it works: Major AI models built in the U.S., including OpenAI ChatGPT, Anthropic Claude, Google Gemini, and Midjourney, are not officially available in mainland China. Instead, developers there can rely on an informal network that adapts to shifting legal, market, and security conditions. Transactions may involve illegal activities such as credit card theft or unauthorized circumvention of China’s Great Firewall to connect to servers in countries such as Singapore. Other parts of the network may violate providers’ terms of service, exploit people who provide biometric data, or misrepresent products for sale.
Behind the news: This gray market has been implicated in allegations that Chinese developers of open source models routinely train them to mimic proprietary models built by U.S. companies. For instance, in February, Anthropic accused three Chinese AI labs – DeepSeek, Moonshot, and MiniMax – of systematically extracting Claude’s outputs to improve their own models in an effort Anthropic called “industrial-scale” distillation. While Anthropic acknowledged that distillation is a well-established training method, the company detected over 16 million exchanges from 24,000 fraudulent accounts. It argued that “illicitly distilled models lack necessary safeguards, creating significant national security risks.” Reactions to Anthropic’s accusation were mixed.
Why it matters: The ChinaTalk report is based largely on interviews and circumstantial evidence, and some of its claims have not been verified independently. But it calls into question the structure of the international AI market. Apparently, limits put in place to manage access to AI have created incentives for a parallel market that may undermine the economics and governance of AI systems. Developers who use proxy servers may not gain access to models they’ve paid for, and their prompts, code, and agent traces may be logged and used for purposes beyond their control. AI companies may not be paid fairly for services rendered, and they may have little visibility into who uses their technology. Models built by distilling the outputs of low-cost API calls may evade guardrails that were designed to keep the parent models from aiding criminal activity.
We’re thinking: We place a high value on openness. The benefits of AI should be available to all to the greatest extent possible. Knowledge distillation is a valuable technique that should be available to developers everywhere, and restrictions on models can fail to stop determined actors while harming legitimate developers and researchers. At the same time, using fraudulent and otherwise dishonest means to gain access to proprietary AI models is not acceptable. Businesses in China, or anywhere, that aim to offer access to closed U.S. models should come to terms with the models’ providers in a legitimate way.
Fine-Tuning LLMs to Expand on Summaries Unearths Pretraining Texts
Fine-tuning large language models on a seemingly benign task that would be useful to writers — expanding plot summaries into paragraphs of polished fiction — causes them to regurgitate substantial portions of books on which they were pretrained.
Results: The authors prompted the fine-tuned models with plot summaries of paragraphs drawn from books that were not included in the fine-tuning dataset, along with the names of the books’ authors. They generated 100 outputs per prompt and measured how much the outputs directly echoed the books, whether the summarized paragraphs or other parts. They measured such regurgitation according to what they call book memorization coverage (BMC), the percentage of words in a book that a model reproduces in a contiguous span. They considered spans of 5 words or more (BMC@5). GPT-4o without fine-tuning served as a baseline. Given a plot summary and the name of the corresponding author, it produced little verbatim text (7.36 percent BMC@5).
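For readers who want a feel for the metric, here is a minimal sketch of a BMC@k-style calculation based only on the definition above. It is not the authors' code: the function name `bmc_at_k` is hypothetical, and the lowercase, whitespace-split tokenization is a simplifying assumption (the study's tokenization and matching rules may differ).

```python
def bmc_at_k(book_text, generations, k=5):
    """Percentage of a book's words that fall inside a span of at least
    k consecutive words that also appears verbatim in some generation."""
    # Assumption: crude tokenization (lowercase, split on whitespace).
    book = book_text.lower().split()
    if not book:
        return 0.0

    # Collect every k-word sequence the model produced across all outputs.
    generated_kgrams = set()
    for text in generations:
        words = text.lower().split()
        for i in range(len(words) - k + 1):
            generated_kgrams.add(tuple(words[i:i + k]))

    # Mark book words covered by a matching k-gram. Overlapping matches
    # merge into longer spans, so any span of k or more words is counted.
    covered = [False] * len(book)
    for i in range(len(book) - k + 1):
        if tuple(book[i:i + k]) in generated_kgrams:
            for j in range(i, i + k):
                covered[j] = True

    return 100.0 * sum(covered) / len(book)


if __name__ == "__main__":
    book = "the quick brown fox jumps over the lazy dog today"
    outputs = ["he said the quick brown fox jumps high"]
    print(bmc_at_k(book, outputs, k=5))
```

In this toy case, one five-word run of a ten-word "book" is reproduced, so half of the book's words are covered. Pooling the k-grams from all 100 outputs per prompt, as done here, credits a book word if any single output reproduces it.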
Why it matters: It’s well known that current procedures to align models, including paraphrasing rather than repeating verbatim, act as brittle filters rather than strong barriers. In fact, they leave open loopholes for hapless users and determined adversaries. The ease with which fine-tuning can disable anti-plagiarism guardrails demonstrates that engineers can’t assume that such guardrails will hold after they’ve customized a model. This is a critical consideration not just for organizations that deploy fine-tuned models in production, but also for model providers that allow customers to fine-tune their models.
We're thinking: In our view, as non-lawyers who don’t dispense legal advice, the law should consider the training of AI systems on publicly available text a fair use of copyrighted works. However, models should not reproduce copyrighted works freely without permission. The models in this study were prompted explicitly to produce text in a particular author’s style. Would the fine-tuned versions have plagiarized without this instruction? The team didn’t present results in that case.
Work With Andrew Ng
Join the teams that are bringing AI to the world! Check out job openings at DeepLearning.AI, AI Fund, and Landing AI.
Subscribe and view previous issues here.
Thoughts, suggestions, feedback? Please send to thebatch@deeplearning.ai. Avoid our newsletter ending up in your spam folder by adding our email address to your contacts list.