Dear friends,
As AI agents accelerate coding, what is the future of software engineering? Some trends are clear, such as the Product Management Bottleneck, the idea that we are constrained more by deciding what to build than by the actual building. But many implications, like AI’s impact on the job market, how software teams will be organized, and more, are still being sorted out.
The theme of our AI Developer Conference on April 28-29 in San Francisco is The Future of Software Engineering. I look forward to speaking about this topic there, hearing from other speakers on this theme, and chatting with attendees about it. We’re shaping the future, and I hope you will join me there!
In software engineering, I see a lot of exciting work ahead to adapt our workflows. It is already clear that: (i) As AI makes coding easier, a lot more people will be doing it. (ii) Writing code by hand and even reading (generated) code are not that important, because we can ask an LLM about the code and operate at a higher level than the raw syntax (although how high we can or should go is rapidly changing). (iii) There will be a lot more custom applications, because now it’s economical to write software for smaller and smaller audiences. (iv) Deciding what to build, more than the actual building, is becoming a bottleneck. (v) The cost of paying down technical debt is decreasing, since AI can refactor code for you.
I’m excited to explore these and other questions about the future of software engineering at AI Dev. I expect this to be an exciting event. Please join us!
Keep building,
Andrew
A MESSAGE FROM DEEPLEARNING.AI
New course available: Efficient Inference with SGLang: Text and Image Generation. Learn how LLM inference works and how to reduce cost and latency using KV cache and RadixAttention in SGLang. Apply the same principles to accelerate diffusion models and image generation. Enroll now
News
Claude Mythos Preview Raises Security Worries
Anthropic took unusual steps to prepare the world for a forthcoming large language model that it said poses extraordinary risks to cybersecurity.
What’s new: Claude Mythos Preview, which is not generally available, broadly outperforms the two-month-old Claude Opus 4.6, but it’s “strikingly capable” of identifying and exploiting vulnerabilities in existing code, Anthropic said. The company detailed its capabilities in a model card that fills 244 pages — the first time it has published a model card without making the model itself available commercially. Anthropic did not announce plans for a commercial release.
Precautions: To harden existing code against such capabilities, the company assembled a consortium called Project Glasswing that includes Amazon Web Services, Apple, CrowdStrike, Google, JPMorganChase, Linux Foundation, Microsoft, and Nvidia along with more than 40 other organizations. Anthropic is funding exclusive access for Glasswing members ($100 million worth of credits at $25/$125 per million input/output tokens) as well as $4 million in donations to organizations that are devoted to maintaining open source projects, so these organizations can discover and patch vulnerabilities in code they control before the model, or another one like it, becomes widely available. Anthropic promised to share what Glasswing does and learns.
Security risks: Anthropic didn’t train Claude Mythos Preview on security-related tasks. The model’s skills arose from training in coding, reasoning, and autonomous behavior.
Performance: Claude Mythos Preview’s reported performance is impressive. In tests conducted by Anthropic, it substantially outperformed Claude Opus 4.6, OpenAI GPT-5.4, and Google Gemini 3.1 Pro on several popular benchmarks. The model card details the team’s efforts to minimize the impact of contamination of the model’s training data with benchmark test sets.
Yes, but: Anthropic’s way of introducing Claude Mythos Preview — promoting safety worries while withholding access from all but a small number of selected parties — comes right out of OpenAI’s early playbook. In 2019, that company promoted GPT-2’s ability to generate plausible text while keeping the model under wraps, citing the danger of its ability to produce disinformation and spam. Of course, the world greeted GPT-3 and subsequent iterations with unprecedented enthusiasm, and society has adjusted to the foibles and limitations of successive large language models. Anthropic’s caution may be justified but, like OpenAI’s product-release strategy, it has elements of a publicity stunt.
Why it matters: As large language models become more capable of coding, they also become more capable of finding bugs and exploiting them. Anthropic says the forthcoming Claude Mythos Preview does this dramatically better than its predecessors, posing risks to critical software that keeps society running. As long as the model outperforms top competitors, the company has little to lose — and potentially much to gain (say, avoiding damage to its brand after making the world less secure, if nothing else) — by creating a buzz while it prepares to deploy the model for commercial use.
We’re thinking: In the long term, as coding agents become more capable, defenders will gain the upper hand, as easy identification of vulnerabilities results in more-secure systems. But navigating this transition will be tricky, since advanced attackers may use tools that defenders have not yet gotten around to using.
Pitfalls in Assistive Models for the Blind
People whose vision is impaired increasingly use AI to assess their own appearance, raising questions about the psychological impact of AI models that are trained on conventional standards of beauty.
What’s new: Milagros Costabel, a blind freelance journalist, wrote about her experiences using a vision-language model as a virtual mirror. Her article on BBC.com explores challenges and potential pitfalls of relying on AI to judge personal qualities that are largely subjective and individual.
How it works: Costabel uses Be My Eyes, a smartphone app that provides a voice chatbot based on GPT-4 Vision. (Users can request to speak with a human volunteer to address critical or difficult issues.) She acknowledges the benefit of greater independence but highlights the challenge for blind people, who have little choice but to trust AI’s interpretation of what it sees. “For many blind people interviewed for this article, the experience feels both empowering and disorienting at once,” she writes.
Behind the news: A number of products aim to use vision-language models to assist visually impaired users. In addition to Be My Eyes and Envision AI, offerings include Microsoft Seeing AI, Aira Explorer, and navigation app Oko. Such apps increasingly connect with wearable devices. For instance, Envision Glasses and Ray-Ban Meta Smart Glasses (you can read a vision-impaired user’s report here) provide hands-free, real-time narration that describes surroundings, reads documents, and identifies specific faces.
Why it matters: AI applications that serve visually impaired users should provide objective, factual interpretations of visual input, to the extent that’s feasible. More broadly, truly accessible AI products must accommodate users who have no way to verify their output. This may require further technological development; meanwhile, keeping humans in the loop (as Be My Eyes, Aira Explorer, and others do) or providing certainty scores that help users modulate their trust in a model’s output can help bridge the gap.
We’re thinking: Building products of any kind requires empathy with users, but building AI products that help people to overcome sensory and other impairments requires exceptional empathy. Extensive testing in the real world and careful revisions based on user feedback will go a long way toward making products that help people both do and feel their best.
Learn More About AI With Data Points!
AI is moving faster than ever. Data Points helps you make sense of it just as fast. Data Points arrives in your inbox twice a week with six brief news stories. This week, we covered how NeurIPS reversed restrictions on Chinese researchers after a boycott threat, highlighting rising tensions in global AI research. Subscribe today!
Dark DNA Unveiled
An open-weights model could help scientists compare the impact of genetic variations, identify mutations that cause diseases, and develop treatments.
What’s new: AlphaGenome interprets the 98 percent of the human and mouse genomes that don’t code for proteins but regulate gene expression and other functions. It finds properties such as where in a DNA sequence a gene begins and ends; how much RNA it directs a cell to produce; and where, as a cell reads a gene, it skips over parts of the gene sequence, a process in which errors can cause a variety of diseases.
How it works: The authors pretrained 64 models of identical architecture on human and mouse DNA sequences and their properties, drawn from four large public datasets, and then distilled the ensemble’s knowledge into a single model. Thus AlphaGenome learned to match the aggregate predictions of all 64 models.
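A minimal sketch of such an ensemble-distillation step, assuming the student simply regresses toward the teachers’ averaged predictions (the model definitions, averaging scheme, and MSE loss here are illustrative assumptions, not AlphaGenome’s actual recipe):

```python
# Hypothetical distillation step: a student model learns to match the
# averaged predictions of an ensemble of pretrained teacher models.
import torch
import torch.nn.functional as F

def distillation_step(student, teachers, dna_batch, optimizer):
    # Average the teachers' outputs to form a single soft target.
    with torch.no_grad():
        target = torch.stack([t(dna_batch) for t in teachers]).mean(dim=0)
    pred = student(dna_batch)        # student predicts the same gene properties
    loss = F.mse_loss(pred, target)  # match the ensemble's aggregate output
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```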
Results: The authors compared AlphaGenome to nine earlier models across two broad evaluations: finding properties of a gene sequence and predicting the effect of mutation (an alteration in the sequence) on those properties.
Why it matters: As recently as 15 years ago, non-coding DNA was widely believed to have no function at all. Since then, probing its functions has required painstaking experimentation. AlphaGenome puts that research into a model that anyone can use to find connections between this genomic netherworld and biological processes. For instance, the model makes it practical to compare functional differences between normal and mutated genes, revealing information that could be valuable in medicine and other biological disciplines.
We’re thinking: The notion that most of the human genome was “junk DNA” always seemed curious, and indeed scientists have discovered that it does essential things. We may be about to learn just how much it can do.
How Liquids and Gases Behave
Simulating complex physical systems through traditional numerical methods is slow and expensive, and simulations based on machine learning are usually specialized for a specific type of system, such as water in a pipe or the atmosphere surrounding a planet. Researchers built a general, transformer-based model for liquids, gases, and plasmas.
What’s new: Michael McCabe and colleagues at Polymathic AI Collaboration, a multi-institution, multi-disciplinary lab for scientific AI, released Walrus, a 1.3 billion-parameter model that simulates how fluids move, interact, and change over time. The model is freely available under an MIT license.
Key insight: Models often fail to simulate chaotic systems, which are highly sensitive to initial conditions, over long time horizons, because errors compound with each step. In transformers, these failures also stem from aliasing, in which errors compound at specific locations over multiple time steps. (The resulting artifacts resemble pixelation in image processing.) Randomly jittering, or time-shifting, the data at each time step before feeding it back into the model reduces these artifacts.
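As a rough illustration, one way such jittering could look in an autoregressive rollout is to shift the input by a small random offset before each prediction and undo the shift afterward, so errors cannot pile up at fixed grid locations. The function below is an assumption-laden sketch, not Walrus’s actual code:

```python
# Hypothetical rollout with random spatial jitter. The shift-and-undo
# scheme, shift size, and context handling are illustrative assumptions.
import torch

def jittered_rollout(model, frames, steps, max_shift=4):
    history = list(frames)  # initial context frames, each an (H, W) grid
    for _ in range(steps):
        x = torch.stack(history[-len(frames):])  # most recent context window
        dy = int(torch.randint(-max_shift, max_shift + 1, (1,)))
        dx = int(torch.randint(-max_shift, max_shift + 1, (1,)))
        x = torch.roll(x, shifts=(dy, dx), dims=(-2, -1))  # jitter the input
        nxt = model(x)  # predict the next (shifted) frame
        nxt = torch.roll(nxt, shifts=(-dy, -dx), dims=(-2, -1))  # undo the shift
        history.append(nxt)
    return history
```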
How it works: Walrus predicts the next state of a physical system given a sequence of previous states. It comprises (i) two encoders, one for 2D data like velocity and one for 3D data like volume, that compress previous snapshots of the physical system, or frames, into tokens; (ii) a split attention block that generates tokens that represent the next frame; and (iii) two decoders (2D and 3D) that turn those tokens into the next frame.
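The three-part structure described above might be sketched as follows; every module here is a stand-in, and the layer types, sizes, and the way the 2D and 3D token streams are merged are assumptions rather than Walrus’s implementation:

```python
# Hypothetical skeleton of the encoder / split-attention / decoder design.
import torch
import torch.nn as nn

class WalrusLikeModel(nn.Module):
    def __init__(self, dim=256, feat_2d=64, feat_3d=128):
        super().__init__()
        self.encode_2d = nn.Linear(feat_2d, dim)  # stands in for the 2D encoder
        self.encode_3d = nn.Linear(feat_3d, dim)  # stands in for the 3D encoder
        self.attn = nn.TransformerEncoder(        # stands in for split attention
            nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True),
            num_layers=4,
        )
        self.decode_2d = nn.Linear(dim, feat_2d)
        self.decode_3d = nn.Linear(dim, feat_3d)

    def forward(self, tokens_2d, tokens_3d):
        # Encode previous frames' 2D and 3D fields into a shared token space.
        x = torch.cat([self.encode_2d(tokens_2d), self.encode_3d(tokens_3d)], dim=1)
        x = self.attn(x)  # generate tokens representing the next frame
        n2 = tokens_2d.shape[1]
        # Decode the two token streams back into physical fields.
        return self.decode_2d(x[:, :n2]), self.decode_3d(x[:, n2:])
```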
Results: The authors compared Walrus to earlier physics models including MPP-AViT, Poseidon, and DPOT.
Why it matters: Walrus potentially accelerates simulations in fields like climate science, aerospace, and materials science. Moreover, the authors’ jittering technique may improve vision and video generation models by suppressing artifacts that are common to transformer architectures. In fact, the pixel-like artifacts common to vision transformers led them to take this approach.
We’re thinking: Physics’ shift from specialized numerical solvers and special-purpose models to general-purpose transformers mirrors natural language processing’s evolution from task-specific models to LLMs. Just as LLMs learn to read and predict the most likely next words across a wide range of tasks and languages, transformers trained on diverse data appear to be able to predict the behavior of diverse materials in a wide array of domains.
Work With Andrew Ng
Join the teams that are bringing AI to the world! Check out job openings at DeepLearning.AI, AI Fund, and Landing AI.
Subscribe and view previous issues here.
Thoughts, suggestions, feedback? Please send to thebatch@deeplearning.ai. Avoid our newsletter ending up in your spam folder by adding our email address to your contacts list.