Dear friends,
In the age of AI, large corporations — not just startups — can move fast. I often speak with the C-suites and boards of large companies about AI strategy and implementation, and I’d like to share some ideas that apply to big companies. One key is to create an environment where small, scrappy teams don’t need permission to innovate. Let me explain.
Large companies are slower than startups for many reasons. But why are even 3-person, scrappy teams within large companies slower than startups of a similar size? One major reason is that large companies have more to lose, and cannot afford for a small team to build and ship a feature that leaks sensitive information, damages the company brand, hurts revenue, invites regulatory scrutiny, or otherwise damages an important part of the business. To prevent these outcomes, I have seen companies require privacy review, marketing review, financial review, legal review, and so on before a team can ship anything. But if engineers need sign-off from 5 vice presidents before they’re even allowed to launch an MVP (minimum viable product) to run an experiment, how can they ever discover what customers want, iterate quickly, or invent any meaningful new product?
Thanks to AI-assisted coding, the world can now build software prototypes really fast. But many large companies’ processes, designed to protect against legitimate downside risks, keep them from taking advantage of this capability. In contrast, in small startups with no revenue, no customers, and no brand reputation, the downside is limited. In fact, going out of business is a very real possibility anyway, so moving fast is a better tradeoff than moving slowly to protect against downside risk. In the worst case, a startup might invent a new way to go out of business; in a good case, it might become very valuable.
Fortunately, large companies have a way out of this conundrum. They can create a sandbox environment for teams to experiment in a way that strictly limits the downside risk. Then those teams can go much faster and not have to slow down to get anyone’s permission.
The sandbox environment can be a set of written policies, not necessarily a software implementation of a sandbox. For example, it might permit a team to test the nascent product only on company employees and perhaps alpha testers who have signed an NDA, and grant no access to sensitive information. The team might be allowed to launch product experiments only under newly created brands not tied directly to the company. Perhaps it must operate within a pre-allocated compute budget.
Within this sandbox, there can be broad scope for experimentation, and — importantly — a team is free to experiment without frequently needing to ask for permission, because the downside they can create is limited. Further, when a prototype shows sufficient promise to bring it to scale, the company can then invest in making sure the software is reliable, secure, treats sensitive information appropriately, is consistent with the company’s brand, and so on.
Under this framework, it is easier to build a company culture that encourages learning, building, and experimentation and celebrates even the inevitable failures that now come with modest cost. Dozens or hundreds of prototypes can be built and quickly discarded as part of the price of finding one or two ideas that turn out to be home runs.
Importantly, this also lets teams move quickly as they churn through those dozens of prototypes needed to get to the valuable ones.
When I speak with large companies about AI strategy and implementation, my quick checklist of things to consider is people, process, and platform. This letter has addressed only one part of process, with an emphasis on moving fast. I’m bullish about what both startups and large companies can do with AI, and I will write about the roles of people and platforms in future letters.
Keep building!
Andrew
A MESSAGE FROM DEEPLEARNING.AI
In “Reinforcement Fine-Tuning LLMs with GRPO,” you’ll learn to fine-tune models using a scalable reinforcement learning algorithm that replaces human-labeled data with programmable rewards. You’ll explore techniques for evaluating outputs, handling subjective tasks, and preventing reward hacking, all without relying on human feedback.
News
Your Robot Dev Team
OpenAI launched an agentic software-development system.
What’s new: Codex, which is available as a preview via ChatGPT, is designed to work like a team of virtual coworkers in the cloud. An update of OpenAI’s earlier Codex command-line software (Codex CLI), it uses agents to perform tasks such as writing code, running tests, and fixing bugs in parallel. Codex is available to users of ChatGPT Pro, Enterprise, and Team, with Plus and Edu coming soon. A smaller version of the underlying model, called codex-mini-latest, is designed to work with Codex CLI and available via API for $1.50/$6.00 per 1 million tokens of input/output.
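For developers who want to try the smaller model programmatically, here is a minimal sketch. It assumes the official openai Python SDK and that codex-mini-latest is reachable through the Responses endpoint; the prompt is purely illustrative.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Ask codex-mini-latest for a small, self-contained coding task.
response = client.responses.create(
    model="codex-mini-latest",
    input="Write a Python function that checks whether a string is a palindrome, with a short docstring.",
)

print(response.output_text)  # the model's generated code
```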
How it works: The model that underpins Codex is codex-1, a version of OpenAI’s top-of-the-line o3 reasoning model that was fine-tuned for software engineering. OpenAI trained the model on real-world coding tasks via reinforcement learning. Codex does not accept image input (say, a sketch of a user interface) or allow users to redirect an agent while it’s operating. OpenAI promises to add these features to a future version.
Results: In OpenAI’s tests conducted without AGENTS.md files or additional scaffolding such as tools or test logic, codex-1 outperformed other OpenAI reasoning models.
Behind the news: Agentic coding tools have become a key battleground for AI providers in the past year. Such tools have made developers more efficient, accelerated development cycles, and spawned the AI-assisted programming method known as vibe coding.
Why it matters: AI-assisted software development yields significant productivity gains for developers. Earlier code-completion models are giving way to tools that perform more complex and varied development tasks with greater autonomy. Managing multiple agents that work in parallel is a logical next step.
We’re thinking: Many engineers resist going into management because they love writing code. But with the rise of coding agents, we'll be able to keep coding even as we manage a virtual team!
Grok’s Fixation on South Africa
An unauthorized update by an xAI employee caused the Grok chatbot to introduce South African politics into unrelated conversations, the company said.
Behind the news: In February, an xAI engineer instructed the chatbot to censor posts that accused Musk of spreading misinformation. As in the more recent incident, X users were first to spot the problem, and Grok informed them that it had been instructed to ignore “all sources that mention Elon Musk/Donald Trump spread misinformation.” Before founding xAI, Musk, who was raised in South Africa, professed his intention to build AI that’s free of political bias. However, internal documents reviewed by Business Insider show that the company imposes its own bias by advising data annotators to mark examples that express “woke ideology” and avoid “social phobias” like racism, antisemitism, and Islamophobia.
Learn More About AI With Data Points!
AI is moving faster than ever. Data Points helps you make sense of it just as fast. Data Points arrives in your inbox twice a week with six brief news stories. This week, we covered how radiologists are using AI to automate tasks rather than replace jobs and DeepSeek’s insights into V3 training and hardware limitations. Subscribe today!
U.S. to Supply Middle Eastern AI Hubs
The United States government announced sweeping agreements to sell tens of billions of dollars worth of AI technology and services to Saudi Arabia and the United Arab Emirates.
Behind the news: Earlier this month, the Trump administration rescinded restrictions on advanced chips that had been imposed in January by then-President Biden.
Why it matters: Although these deals relax U.S. efforts to limit access to advanced AI, they are likely to expand U.S. influence in the Middle East while helping Saudi Arabia and the UAE diversify their oil-based economies. They also strengthen the technological prowess of Saudi Arabia relative to its arch rival Iran and tie the region’s AI progress to the U.S. at the expense of China. Locally, the immense investments will fuel homegrown technology development, building on the UAE’s achievement with its Falcon large language model and Saudi Arabia’s aspiration to become a global AI hub.
We’re thinking: Residents of Saudi Arabia and the UAE stand to benefit from better AI infrastructure, models, and services. As China explores exporting its homegrown chips, the U.S. effort to encourage more nations to use its chips makes sense for the country.
4-Bit Efficiency, 16-Bit Accuracy
Using an 8-bit number format like FP8 during training saves computation compared to 16- or 32-bit formats, but it can yield less-accurate results. Researchers trained models using 4-bit numbers without sacrificing accuracy.
What’s new: Ruizhe Wang and colleagues at Microsoft and the University of Science and Technology of China trained large language models (LLMs) using FP4 for matrix multiplications and achieved accuracy comparable to LLMs trained using the popular BF16 format. Since matrix multiplications account for 95 percent of computation in LLM training, FP4 could significantly accelerate computation and reduce memory costs.
Key insight: Quantization functions, which accelerate computation by reducing the precision of model weights and layer outputs, make ordinary gradient-based training impossible because they’re not differentiable. A common workaround, the straight-through estimator, passes the derivative through as though quantization didn’t occur, but this degrades the resulting model’s accuracy. A differentiable approximation of a quantization function enables quantization to reduce training computation while maintaining the accuracy of the trained model.
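To make the key insight concrete, here is a toy PyTorch sketch that contrasts the straight-through workaround with a smooth, differentiable surrogate for rounding. It is illustrative only: the 8-level grid and the tanh surrogate are assumptions, not the paper’s actual estimator.

```python
import torch

# Straight-through estimator (STE): quantize in the forward pass, but
# pretend quantization was the identity function in the backward pass.
class QuantizeSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, levels):
        return torch.round(x * levels) / levels  # snap to a coarse grid

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output, None  # gradient passes through unchanged

# Differentiable surrogate: replace hard rounding with a smooth function,
# so gradients reflect the quantizer's local behavior instead of ignoring it.
def quantize_smooth(x, levels, sharpness=10.0):
    scaled = x * levels
    frac = scaled - torch.floor(scaled)
    soft = torch.floor(scaled) + 0.5 * (1 + torch.tanh(sharpness * (frac - 0.5)))
    return soft / levels

x = torch.randn(4, requires_grad=True)
QuantizeSTE.apply(x, 8).sum().backward()
print(x.grad)  # all ones: the STE ignores the quantizer entirely

y = torch.randn(4, requires_grad=True)
quantize_smooth(y, 8).sum().backward()
print(y.grad)  # varies with each value's position relative to the grid
```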
How it works: The authors pretrained Llama 2 13B on 100 billion tokens of text scraped from the web. They used FP4 for matrix multiplications and FP8, BF16, or FP16 for the other operations such as optimizer updates.
Results: The authors simulated FP4 hardware on Nvidia H100 GPUs, which don’t directly support that number format. FP4 achieved accuracy similar to that of BF16 during training and across a wide variety of tasks at inference.
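For readers curious what such a simulation looks like, here is a rough “fake quantization” sketch: values are snapped to the FP4 (E2M1) grid with a simple per-tensor absmax scale, then multiplied in BF16. The scaling scheme and tensor shapes are illustrative assumptions, not the paper’s recipe.

```python
import torch

# Nonnegative values representable in FP4 (E2M1); signs are handled separately.
FP4_GRID = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def fake_quantize_fp4(x):
    """Snap each element to the nearest FP4 value after per-tensor absmax scaling."""
    scale = x.abs().max() / FP4_GRID.max()
    scaled = (x / scale).clamp(-6.0, 6.0)
    idx = (scaled.abs().unsqueeze(-1) - FP4_GRID).abs().argmin(dim=-1)
    return torch.sign(scaled) * FP4_GRID[idx] * scale

# Simulated mixed precision: operands "stored" in FP4, matmul carried out in BF16.
activations = torch.randn(64, 128)
weights = torch.randn(128, 32)
out = fake_quantize_fp4(activations).bfloat16() @ fake_quantize_fp4(weights).bfloat16()
print(out.shape, out.dtype)  # torch.Size([64, 32]) torch.bfloat16
```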
Why it matters: Training LLMs at FP4 precision ought to reduce computation dramatically on hardware that supports FP4 matrix multiplications.
We’re thinking: FP4-ready hardware became available in the cloud only early this year, so the authors weren’t able to measure the actual acceleration. As capable hardware becomes more widely used, FP4 promises faster, more energy-efficient training.
Work With Andrew Ng
Join the teams that are bringing AI to the world! Check out job openings at DeepLearning.AI, AI Fund, and Landing AI.
Subscribe and view previous issues here.
Thoughts, suggestions, feedback? Please send to thebatch@deeplearning.ai. To keep our newsletter out of your spam folder, please add our email address to your contacts list.