Dear friends,
LandingAI’s Agentic Document Extraction (ADE) turns PDF files into LLM-ready markdown text. I’m excited about this tool providing a powerful building block for developers to use in applications in financial services, healthcare, logistics, legal, insurance, and many other sectors.
Accurate extraction of data is important in many valuable applications. However, achieving accuracy is not easy.
Further, even though LLMs hallucinate, our intuition is still that computers are good at math. Some of the most disconcerting mistakes I’ve seen a computer make occurred when a system incorrectly extracted figures from a large table of numbers or a complex form and output a confident-sounding but wrong financial figure. Because we expect computers to be good at numbers (after all, computers are supposed to be good at computing!), I’ve seen users find these silent failures — incorrect numerical outputs — particularly hard to catch.
How can we accurately extract information from large PDF files? Humans don’t just glance at a document and reach a conclusion on that basis. Instead, they iteratively examine different parts of the document to pull out information piece by piece. An agentic workflow can do the same.
ADE iteratively decomposes complex documents into smaller sections for careful examination. It uses a new custom model we call the Document Pre-trained Transformer (DPT); more details are in this video. For example, given a complex document, it might extract a table and then further extract the table structure, identifying rows, columns, merged cells, and so on. This breaks down complex documents into smaller and easier subproblems for processing, resulting in much more accurate results.
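The decomposition loop described above can be sketched roughly as follows. This is a hypothetical illustration, not LandingAI’s implementation; `classify_region`, `parse_region`, and `extract_table_cells` are toy stand-ins for calls to a document model such as DPT.

```python
# Hypothetical sketch of an agentic extraction loop in the spirit of ADE.
# The three helper functions below are toy stand-ins for model calls.

def classify_region(region):
    # Toy classifier: a dict with "rows" is a table; a list is a section.
    if isinstance(region, dict):
        return "table"
    if isinstance(region, list):
        return "section"
    return "text"

def parse_region(region):
    return list(region)  # split a section into its subregions

def extract_table_cells(region):
    return region["rows"]  # recover the table's row/column structure

def extract(region, depth=0, max_depth=4):
    """Recursively decompose a document region into smaller subproblems."""
    if depth >= max_depth:
        return {"type": "text", "content": region}
    kind = classify_region(region)
    if kind == "table":
        # Examine the table again to recover rows, columns, merged cells.
        return {"type": "table", "cells": extract_table_cells(region)}
    if kind == "section":
        # Split into subregions and examine each one separately.
        return [extract(sub, depth + 1, max_depth) for sub in parse_region(region)]
    return {"type": "text", "content": region}

# A toy "document": a paragraph followed by a small table.
doc = ["Intro paragraph", {"rows": [["Q1", "100"], ["Q2", "120"]]}]
result = extract(doc)
```

The key idea is that each recursive call faces a smaller, easier subproblem than the one before it, which is what makes the final extraction more accurate.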
Keep building! Andrew
A MESSAGE FROM RAPIDFIRE.AI
Run 20x more experiments using the same resources, even on a single GPU! RapidFire AI is open source software for efficient model training. Run configs in hyper-parallel with live control and automatic orchestration. Get started: pip install rapidfireai
News
OpenAI On the Road to Trillion-Dollar Spending
A flurry of announcements brought into sharper focus OpenAI’s plans to build what may amount to trillions of dollars of global computing capacity.
What’s new: OpenAI, Oracle, and SoftBank, the primary partners in the massive data-center buildout called Stargate, announced 5 new sites in the United States that entail $400 billion in spending on top of their prior commitments. Separately, OpenAI introduced Stargate UK, a partnership with Nvidia and the data-center builder Nscale to build AI infrastructure in the United Kingdom. All told, OpenAI’s current plans will cost $1 trillion, The Wall Street Journal reported.
How it works: OpenAI forecasts demand for data centers in terms of the electricity they will consume. Each 1-gigawatt increment of capacity (roughly enough to light 100 million LED bulbs) costs around $50 billion to build. The company’s current plans amount to 20 gigawatts worldwide, and it predicts demand as high as 100 gigawatts, according to one executive. Satisfying that level of demand would bring the total outlay to $5 trillion (roughly the gross domestic product of Germany).
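The cost arithmetic above is easy to check. A minimal sketch, using the article’s $50-billion-per-gigawatt estimate:

```python
# Back-of-the-envelope check of the figures above.
COST_PER_GW = 50e9          # ~$50 billion per gigawatt of capacity (article's estimate)

current_plan_gw = 20        # OpenAI's announced plans worldwide
projected_demand_gw = 100   # demand forecast cited by one executive

current_cost = current_plan_gw * COST_PER_GW        # $1 trillion
projected_cost = projected_demand_gw * COST_PER_GW  # $5 trillion
print(f"${current_cost/1e12:.0f}T now, ${projected_cost/1e12:.0f}T at full demand")
# prints "$1T now, $5T at full demand"
```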
Behind the news: Stargate, a partnership between OpenAI, Oracle, and SoftBank to build 20 data centers over four years at a cost of $500 billion, began in January. That plan is proceeding ahead of schedule and has expanded considerably.
Yes, but: Some analysts worry that giant infrastructure commitments by big AI companies could jeopardize their financial health if demand for AI doesn’t keep pace. “Someone is going to lose a phenomenal amount of money,” OpenAI CEO Sam Altman told The Verge, adding that winners will gain even more.
Why it matters: Big AI’s capital spending continues to rise. In addition to Stargate, Alphabet, Amazon, Meta, and Microsoft together plan to spend more than $325 billion this year on data centers, with much more to come. This outsized effort brings with it outsized risks: Companies are betting their balance sheets, investors are putting money on the line, governments are hoping that data centers will supercharge their economies, energy providers are scrambling to provide sufficient electricity, and communities are balancing potential prosperity versus environmental hazard. The optimistic view sees AI’s value rising, costs falling, social benefits spreading, and energy use declining as AI models produce higher-quality output with greater efficiency.
We’re thinking: $5 trillion spent on AI infrastructure is more than 10 times OpenAI’s latest valuation. But the company’s valuation has increased by more than 20 times since it launched ChatGPT in 2022. So far, its bets are paying off.
AI Generates Viral Genomes
Researchers used AI models to create novel viruses from scratch.
What’s new: Samuel King and colleagues at the nonprofit biotech lab Arc Institute, Stanford University, and Memorial Sloan Kettering Cancer Center used model architectures related to transformers, trained on DNA sequences rather than text, to synthesize viruses that fight a common bacterial infection.
Key insight: The class of models known as genomic language models can produce DNA sequences by generating chains of nucleotides, the building blocks of DNA. Typically such models produce sequences up to the length of a single gene, of which many are required to make a genome. But fine-tuning such models on sequences associated with a family of viruses can enable them to produce longer sequences within that family. At inference, feeding the fine-tuned model the initial part of the genome of a virus from the fine-tuned family can prompt the model to generate an entire novel genome.
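The prompting step described above amounts to autoregressive sampling over nucleotides. The sketch below is a toy illustration, not the authors’ code; `next_nucleotide_probs` stands in for the fine-tuned model (and here returns a uniform distribution, where a real model would condition on the sequence so far), and the prefix is made up.

```python
import random

# Toy sketch of prompting an autoregressive genomic language model with a
# genome prefix. next_nucleotide_probs is a stand-in for the fine-tuned model.

def next_nucleotide_probs(context):
    # Hypothetical model call: return P(next base | context).
    # Here: a uniform toy distribution over the four DNA bases.
    return {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

def generate_genome(prefix, length, seed=0):
    """Extend a genome prefix one nucleotide at a time, up to `length` bases."""
    rng = random.Random(seed)
    seq = list(prefix)
    while len(seq) < length:
        probs = next_nucleotide_probs("".join(seq))
        bases, weights = zip(*probs.items())
        seq.append(rng.choices(bases, weights=weights)[0])
    return "".join(seq)

# Prompt with the initial part of a genome (made-up prefix for illustration).
genome = generate_genome("GAGTTTTATCGCTTCC", length=100)
```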
How it works: The authors fine-tuned existing genomic language models on the genomes of 14,500 viruses in the Microviridae family of bacteriophages, viruses that kill specific bacteria. Using the fine-tuned models, they generated potential viral genomes similar to Microviridae, identified the most promising ones, and synthesized them.
Results: The authors tested a cocktail of 16 synthetic viruses on 3 bacterial strains that are resistant to ΦX174, a natural Microviridae bacteriophage. Initially, the cocktail failed to kill the bacteria within three hours. However, when they moved the viruses to new cultures of the same bacterial strain to give them opportunities to recombine and mutate, the bacteria succumbed.
Behind the news: Genome engineering typically relies on selective breeding, introducing random mutations, or making specific changes based on known biology, all of which modify existing genomes instead of designing new ones. These approaches struggle to change features like genome lengths and the speed at which bacteriophages kill bacterial cells.
Why it matters: Bacteriophage therapy is a potential alternative to antibiotics. However, bacteria can evolve resistance to bacteriophages, just as they develop resistance to antibiotics. In this work, AI generated genomes for viable, diverse, novel synthetic bacteriophages that defeated resistant bacteria, which could give doctors a new weapon against bacterial infections.
We’re thinking: Making new viruses from scratch is cause for both excitement and concern. On one hand, the implications for medicine and other fields are enormous. On the other, although the authors took care to produce viruses that can’t infect humans, malicious actors may not. Research into responding to biological threats is as critical as research that enables us to create such threats.
Learn More About AI With Data Points!
AI is moving faster than ever. Data Points helps you make sense of it just as fast. Data Points arrives in your inbox twice a week with six brief news stories. This week, we covered Meta unveiling an open world model for code generation research and Anthropic releasing Claude Sonnet 4.5 with major upgrades for coding. Subscribe today!
Generating Music, Paying Musicians
A Swedish organization that collects royalties on behalf of songwriters and record companies has formed a technology-legal-business ecosystem designed to allow AI developers to use music legally while compensating publishers of recordings and compositions.
What’s new: STIM, which collects royalties on behalf of over 100,000 composers and recording artists, devised a license for use of musical works to train AI models. Sureel, a Swedish startup, provides technology that calculates the influence of a given training example on a model’s output. The music-generation startup Songfox is the first licensee.
How it works: STIM considers its deal with Songfox a pilot project that will shape future licensing arrangements. Members of the organization can license their music if they (i) opt in to allowing AI developers to use it and (ii) distribute it via STIM’s music-by-subscription subsidiary Cora Music.
Yes, but: To take advantage of the license, AI developers must integrate Sureel’s attribution technology into their model training process. Consequently, the STIM license is not useful for artists who aim to collect revenue from music-generation companies such as Suno and Udio, which trained their models without Sureel’s involvement.
Behind the news: Owners of copyrights to creative works have sued AI companies for training models on their works without permission, but the likely outcomes of such lawsuits are uncertain.
Why it matters: It remains to be seen whether training AI models on copyrighted works qualifies as fair use under the laws of many countries. Regardless, the current uncertainty over the interpretation of existing laws opens AI companies to potential liability for claims that they have infringed copyrights. Licensing could help to insulate AI developers from legal risk and incentivize creative people to continue to produce fresh works on which to train next-generation models. The STIM license is an early effort to find a formula that works for both parties.
We’re thinking: As technology has evolved from recording to broadcast to streaming, the avenues for musicians to profit from their work have increased, and we expect AI to continue to expand the options.
Earth Modeled in 10-Meter Squares
Researchers built a model that integrates satellite imagery and other sensor readings across the entire surface of the Earth to reveal patterns of climate, land use, and other features.
What’s new: Christopher F. Brown, Michal R. Kazmierski, Valerie J. Pasquarella, and colleagues at Google built AlphaEarth Foundations (AEF), a model that produces embeddings that represent every 10-meter-square spot on the globe for each year between 2017 and 2024. The embeddings can be used to track a wide variety of planetary characteristics such as humidity, precipitation, or vegetation and global challenges such as food production, wildfire risk, or reservoir levels. You can download them here for commercial and noncommercial uses under a CC BY 4.0 license. Google offers financial grants to researchers who want to use them.
Key insight: During training, feeding a model only one data type limits its performance. On the other hand, feeding it too many types can cause it to learn spurious patterns. A sensible compromise is to feed it the smallest set of input data types that contains most of the relevant information.
How it works: The authors used three data types — optical, radar, and thermal videos taken by satellites — as training inputs, but the loss terms referred to several others. Given the three types of satellite videos, each of which represented around 1.28 square kilometers, AEF encoded each video using unspecified encoders. It fed the encoded video to a custom module that integrated both self-attention (within and across frames) and convolutional layers. This architecture enabled the model to produce embeddings that represented each 10x10-meter area over the course of a year. To learn to produce good embeddings, the team trained the model using four loss terms.
Results: The authors compared AEF to 9 alternatives, including manually designed approaches to embedding satellite imagery such as MOSAIKS and CCDC as well as learned models like SatCLIP. Across 11 datasets, AEF outperformed the alternatives by a significant margin.
Why it matters: Satellites examine much of Earth’s surface, but their output is fragmentary (due to cloud cover and orbital coverage) and difficult to integrate. Machine learning can pack a vast range of overhead data into a comprehensive set of embeddings that can be used with Google’s own Earth Engine system and other models. By embedding pixels, AEF makes it easier to map environmental phenomena and track changes over time, and the 10x10-meter resolution offers insight into small-scale features of Earth’s surface. The team continues to collect data, revise the model, and publish updated embeddings.
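As a hypothetical example of tracking change over time with per-pixel annual embeddings, the sketch below compares a location’s embeddings across two years using cosine similarity. The array shapes, embedding dimensionality, and threshold are assumptions for illustration, not AEF’s actual data format.

```python
import numpy as np

# Sketch of one way annual per-pixel embeddings could be used: flag change
# between years via cosine similarity. Here embeddings are H x W x D arrays,
# one per year; shapes and the threshold are assumptions, not AEF's format.

def change_map(emb_y1, emb_y2, threshold=0.9):
    """Return a boolean mask where a pixel's embedding shifted between years."""
    dot = (emb_y1 * emb_y2).sum(axis=-1)
    norm = np.linalg.norm(emb_y1, axis=-1) * np.linalg.norm(emb_y2, axis=-1)
    cosine = dot / np.clip(norm, 1e-9, None)
    return cosine < threshold  # low similarity suggests land-surface change

rng = np.random.default_rng(0)
year1 = rng.normal(size=(4, 4, 8))   # toy 4x4 grid of 8-dim embeddings
year2 = year1.copy()
year2[0, 0] = -year1[0, 0]           # flip one pixel's embedding to simulate change
mask = change_map(year1, year2)      # True only at the changed pixel
```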
We’re thinking: This project brings AI to the whole world!
Work With Andrew Ng
Join the teams that are bringing AI to the world! Check out job openings at DeepLearning.AI, AI Fund, and Landing AI.
Subscribe and view previous issues here.
Thoughts, suggestions, feedback? Please send to thebatch@deeplearning.ai. Avoid our newsletter ending up in your spam folder by adding our email address to your contacts list.