Dear friends,
One of the most effective things the U.S. or any other nation can do to ensure its competitiveness in AI is to welcome high-skilled immigration and international students who have the potential to become high-skilled. For centuries, the U.S. has welcomed immigrants, and this helped make it a worldwide leader in technology. Letting immigrants and native-born Americans collaborate makes everyone better off. Reversing this stance would have a huge negative impact on U.S. technology development.
Nonetheless, other nations are working hard to attract immigrants who can drive innovation — a good move for them! Many have thoughtful programs to attract AI and other talent. Examples include the UK’s Global Talent Visa, France’s French Tech Visa, Australia’s Global Talent Visa, the UAE’s Golden Visa, Taiwan’s Employment Gold Card, China’s Thousand Talents Plan, and many more. The U.S. is fortunate that many people already want to come here to study and work. Squandering that advantage would be a huge unforced error.
These stories, and many far worse, are heartbreaking. While I do what I can to help individuals I know personally, it is tragic that we are creating such an uncertain environment for immigrants that many people who have extraordinary skills and talents will no longer want to come here.
Keep building! Andrew
A MESSAGE FROM DEEPLEARNING.AI
Learn to build with Meta’s Llama 4 models in this new short course! Work with the official Llama API, explore multimodal reasoning with image grounding, reason over long contexts, and use tools for prompt optimization and synthetic data generation. Join in for free
News
Apple Sharpens Its GenAI Profile
Apple revamped two vision-language models in a bid to catch up with fast-moving competitors.
What’s new: Apple updated the Apple Foundation Models (AFM) family, including smaller on-device and larger server-hosted versions, to improve their capabilities, speed, and efficiency. It also released the Foundation Models framework, an API that enables developers to call the on-device model on Apple devices that have Apple Intelligence enabled.
How it works: Introduced last year, AFM models use a vision encoder to produce an image embedding, which a vision adapter modifies for the LLM. The LLM takes the modified image embedding and text prompt and generates a response. The team trained the systems to predict the next token, align embeddings produced by the vision encoder and LLM, and align responses with human feedback. They trained the models on text and image-text data from publicly available datasets, data scraped from the web, and data licensed from publishers.
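In rough outline, the flow described above looks like this (a minimal sketch with toy dimensions and stand-in modules, not Apple’s implementation):

```python
import torch
import torch.nn as nn

# Toy sketch of the AFM-style pipeline: a vision encoder produces an image
# embedding, an adapter maps it into the LLM's embedding space, and the LLM
# generates text from the combined sequence. All sizes and modules here are
# illustrative stand-ins, not Apple's actual architecture.

D_VISION, D_MODEL, VOCAB = 512, 768, 32000

vision_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, D_VISION))
adapter = nn.Linear(D_VISION, D_MODEL)          # aligns image and text embeddings
token_embedding = nn.Embedding(VOCAB, D_MODEL)  # the LLM's input embeddings
llm_backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(D_MODEL, nhead=8, batch_first=True), num_layers=2
)
lm_head = nn.Linear(D_MODEL, VOCAB)             # predicts the next token

image = torch.randn(1, 3, 224, 224)             # dummy image
prompt = torch.randint(0, VOCAB, (1, 16))       # dummy prompt tokens

img_emb = adapter(vision_encoder(image)).unsqueeze(1)  # (1, 1, D_MODEL)
txt_emb = token_embedding(prompt)                      # (1, 16, D_MODEL)
hidden = llm_backbone(torch.cat([img_emb, txt_emb], dim=1))
next_token_logits = lm_head(hidden[:, -1])             # next-token prediction
```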
Performance: In human evaluations, the AFM models achieved mixed performance compared to selected models of similar or greater size. The tests included language tasks in U.S. English, non-U.S. English (including Canadian and British English), and a basket of European and Asian languages.
Behind the news: Apple dominated social media last week with a controversial paper that purported to show that 5 state-of-the-art reasoning models couldn’t solve puzzles beyond a certain level of complexity.
Why it matters: Apple has been viewed as falling behind in AI. A promised upgrade of Siri, Apple’s AI assistant, is delayed indefinitely, and the lack of advanced AI features in new iPhones has led to a class-action lawsuit. Meanwhile, Google and its Android smartphone platform are racing ahead. The new models, especially the Foundation Models framework, look like a bid for a reset.
We’re thinking: Apple may be behind in AI, but its control over iOS is a huge advantage. If the operating system ships with a certain model and loads it into the limited memory by default, developers have a far greater incentive to use that model than an alternative. Limited memory on phones and the large size of good models make it impractical for many app developers to bundle models with their software, so if a model is favored by Apple (or Android), it’s likely to gain significant adoption for on-device uses.
Hollywood Joins AI Copyright Fight
Hollywood studios joined record companies, publishers, and artists in the fight against companies that have trained AI models on their copyrighted works.
What’s new: Disney and Universal sued Midjourney, accusing the image-generation startup of training its models on “countless” unauthorized copies of their copyrighted works and distributing images that depict characters the plaintiffs created.
How it works: Disney and Universal asked the court to order Midjourney to cease its alleged unauthorized distribution of their intellectual property. Further, they want Midjourney, which took in revenue of $300 million in 2024, to pay unspecified damages based on the claim that copyright law entitles them to $150,000 per infringed image. The studios accuse Midjourney of both direct infringement (that is, directly violating their copyrights by copying, displaying, or distributing their work without permission) and secondary infringement (enabling or encouraging direct infringement by others).
Behind the news: Copyright law is ambiguous on whether training AI systems on copyrighted works requires permission from the copyright holders, and several cases are winding their way through U.S. courts to answer this question. Starting in 2023, artists, authors, and publishers initiated legal actions against Alphabet, Meta, and OpenAI. Last year, some of the largest companies in the recording industry sued the AI music startups Suno and Udio. In February, a Delaware federal court issued the first major decision in this area, when a U.S. Circuit judge ruled that an AI-powered legal research service could not claim that training its models on writings produced by Thomson Reuters was a fair use because the resulting products competed with Thomson Reuters’ own products.
Why it matters: AI systems require enormous amounts of data. Historically, developers have felt free to use whatever copyrighted works they could find, typically online. As AI systems show greater potential to erode the market for human-made creative works — and to reproduce such works and create new works derived from them — owners of copyrighted material are looking for compensation as well as protection against this new form of competition. A single lawsuit won’t settle the issue, but this case, brought by two of the most powerful entertainment companies in the world, could set a precedent that strongly influences future lawsuits, the behavior of AI companies, and future legislation to update copyright for the AI era.
We’re thinking: Film studios and music labels once considered YouTube a copyright violator. Viacom, the entertainment company behind MTV and The Jersey Shore, once sued YouTube for copyright infringement. YouTube prevailed in two proceedings before the parties settled out of court, and YouTube subsequently improved its ability to detect and remove copyrighted works. Today, movie and recording companies rely on the enormously popular web video service to promote their wares. Given that history, Hollywood might consider partnering with AI companies instead of suing them. The pie would be bigger if Hollywood and AI companies worked together, although how to divide it would need to be worked out.
Learn More About AI With Data Points!
AI is moving faster than ever. Data Points helps you make sense of it just as fast. Data Points arrives in your inbox twice a week with six brief news stories. This week, we covered how Google DeepMind is using neural networks to predict hurricanes up to 15 days in advance. Subscribe today!
More Reasoning for Harder Problems
OpenAI launched o3-pro, a more capable version of its most advanced reasoning vision-language model.
What’s new: o3-pro is designed to respond to difficult challenges involving science, mathematics, and coding. But its reasoning firepower dramatically slows response times.
Performance: o3-pro outperformed OpenAI’s own o3 (set to medium effort) and o1-pro in tests performed by OpenAI.
What they’re saying: Reviews of o3-pro so far are generally positive, but the model has been criticized for the time it takes to respond. Box CEO Aaron Levie commented that o3-pro is “crazy good at math and logic.” However, entrepreneur Yuchen Jin noted that it’s the “slowest and most overthinking model.”
Behind the news: OpenAI rolled out o3-pro at a lower price ($20/$80 per 1 million input/output tokens) than o1-pro, which was priced at $150/$600 per 1 million input/output tokens and has been deprecated in favor of the new model. Simultaneously, it cut the price of o3 by 80 percent to $2/$8 per 1 million input/output tokens. These moves extend the steep decline in inference prices over the past year. DeepSeek-R1 offers performance that approaches that of top models for $0.55/$2.19 per 1 million input/output tokens.
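For a sense of the gap, here’s the arithmetic on a hypothetical workload at the per-token prices above (the token counts are made up for illustration):

```python
# Back-of-the-envelope cost comparison using the prices quoted above.
# The workload (1M input tokens, 200K output tokens) is hypothetical.
PRICES = {  # (input, output) in dollars per 1 million tokens
    "o1-pro": (150.00, 600.00),
    "o3-pro": (20.00, 80.00),
    "o3": (2.00, 8.00),
    "DeepSeek-R1": (0.55, 2.19),
}
input_tokens, output_tokens = 1_000_000, 200_000
for model, (price_in, price_out) in PRICES.items():
    cost = input_tokens / 1e6 * price_in + output_tokens / 1e6 * price_out
    print(f"{model}: ${cost:,.2f}")
# o1-pro: $270.00, o3-pro: $36.00, o3: $3.60, DeepSeek-R1: $0.99
```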
Why it matters: OpenAI is pushing the limits of current approaches to reasoning, and the results are promising if incremental. o3-pro’s extensive reasoning may appeal to developers who are working on multi-step scientific problems. For many uses, though, the high price and slow speed may be a dealbreaker.
We’re thinking: Letting developers choose between o3 and o3-pro lets them calibrate their computational budget to the difficulty of the task at hand. What if we want to do the same with a trained, open-weights large language model? Forcing an LLM to generate “Wait” in its output causes it to keep thinking, and this can improve its output significantly.
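Here’s a minimal sketch of that trick, sometimes called budget forcing, assuming a Hugging Face-style open-weights reasoning model whose chain of thought ends with a `</think>` delimiter; the model name and delimiter are assumptions, and details vary by model:

```python
# Sketch: extend a reasoning model's chain of thought by appending "Wait".
# Assumes the model marks the end of its reasoning with "</think>"; the
# model name is a placeholder and the delimiter varies by model.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "open-reasoning-model"  # hypothetical model name
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)

def generate_with_extra_thinking(prompt, extra_rounds=2, max_new_tokens=1024):
    text = prompt
    for _ in range(extra_rounds):
        ids = tok(text, return_tensors="pt").input_ids
        out = model.generate(ids, max_new_tokens=max_new_tokens)
        text = tok.decode(out[0], skip_special_tokens=False)
        if "</think>" not in text:
            break  # no end-of-thinking marker found; stop forcing
        # Cut the output off at the marker and nudge the model to continue
        text = text.split("</think>")[0] + "Wait"
    ids = tok(text, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=max_new_tokens)
    return tok.decode(out[0], skip_special_tokens=True)
```

Each extra round trades latency and tokens for more deliberation, which is the o3/o3-pro tradeoff in miniature.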
LLM Rights Historical Wrongs
In Northern California, old property deeds may still include racial clauses: language, made illegal decades ago, designed to bar people of color from owning or living in certain homes. The state of California now requires counties to find and remove them, but manually combing through millions of documents would take years. Researchers used AI to find them automatically.
What’s new: Faiz Surani, Mirac Suzgun, and colleagues at Stanford University and Princeton University fine-tuned a large language model to find racial clauses in deeds for property in the California county of Santa Clara.
Key insight: Manual and keyword searches may fail to catch racial clauses if they’re obscured by subtle wording or errors in optical character recognition (OCR). But a fine-tuned large language model can understand context, identify relevant phrases, and avoid potential false alarms like the surnames Black or White. Lawyers can confirm the model’s findings.
How it works: The authors used an OCR system to extract text from 5.2 million pages of Santa Clara property deeds filed between 1850 and 1980. They drew examples from that corpus to form training and validation datasets and then processed the rest to find deeds that contained racial clauses.
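The core detection loop is simple. Here’s a minimal sketch, assuming the fine-tuned model is wrapped in a generate function; the prompt wording and names are hypothetical stand-ins for the authors’ setup:

```python
# Sketch of the detection loop described above. The prompt wording, the
# model wrapper, and the page format are hypothetical stand-ins for the
# authors' actual fine-tuned setup.

def contains_racial_clause(ocr_text: str, model) -> bool:
    """Ask the fine-tuned LLM whether a deed page contains a racial clause."""
    prompt = (
        "Does the following property deed text contain a racially "
        "restrictive covenant? Answer YES or NO.\n\n" + ocr_text
    )
    return model.generate(prompt).strip().upper().startswith("YES")

def scan_deeds(pages, model):
    """Yield IDs of pages flagged for human review before redaction."""
    for page_id, ocr_text in pages:
        if contains_racial_clause(ocr_text, model):
            yield page_id
```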
Results: The authors fed the remaining roughly 5.2 million unlabeled pages to the fine-tuned model. When the model identified a deed that contained a racial clause, county staff confirmed the finding and redacted the clause.
Why it matters: Large language models can interpret historical documents to reveal the nature and scope of actions in the past that otherwise would remain obscure — in this case, housing discrimination. By flagging discriminatory language, this work enables historians to identify areas affected by racial clauses and trace their broader social and economic effects. The team open-sourced the model, streamlining the process for other U.S. counties.
We’re thinking: While AI is making history, it’s also illuminating it!
Work With Andrew Ng
Join the teams that are bringing AI to the world! Check out job openings at DeepLearning.AI, AI Fund, and Landing AI.
Subscribe and view previous issues here.
Thoughts, suggestions, feedback? Please send to thebatch@deeplearning.ai. To keep our newsletter out of your spam folder, please add our email address to your contacts list.