The Batch - June 18, 2025

 

 

Dear friends,

 

One of the most effective things the U.S. or any other nation can do to ensure its competitiveness in AI is to welcome high-skilled immigration and international students who have the potential to become high-skilled. For centuries, the U.S. has welcomed immigrants, and this helped make it a worldwide leader in technology. Letting immigrants and native-born Americans collaborate makes everyone better off. Reversing this stance would have a huge negative impact on U.S. technology development. 


I was born in the UK and came to the U.S. on an F-1 student visa as a relatively unskilled and clueless teenager to attend college. Fortunately I gained skills and became less clueless over time. After completing my graduate studies, I started working at Stanford under the OPT (Optional Practical Training) program, and later an H-1B visa, and ended up staying here. Many other immigrants have followed similar paths to contribute to the U.S. 


I am very concerned that making visas harder to obtain for students and high-skilled workers, such as the pause in new visa interviews that started last month and a newly chaotic process of visa cancellations, will hurt our ability to attract great students and workers. In addition, many international students without substantial means count on being able to work under OPT to pay off the high cost of a U.S. college degree. Gutting the OPT program, as has been proposed, would both hurt many international students’ ability to study here and deprive U.S. businesses of great talent. (This won’t stop students from wealthy families. But the U.S. should try to attract the best talent without regard to wealth.)

 
Failure to attract promising students and high-skilled workers would have a huge negative impact on American competitiveness in AI. Indeed, a recent report by the National Security Commission on Artificial Intelligence exhorts the government to “strengthen AI talent through immigration.” 


If talented people do not come to the U.S., will they have an equal impact on global AI development just working somewhere else? Unfortunately, the net impact will be negative. The U.S. has a number of tech hubs including Austin, Boston/Cambridge, Los Angeles, New York, Pittsburgh, Seattle, and Silicon Valley, and these hubs concentrate talent and foster innovation. (This is why cities, where people can more easily find each other and collaborate, promote innovation.) Making it harder for AI talent to find each other and collaborate will slow down innovation, and it will take time for new hubs to become as advanced.

H-1B visa approval notice from USCIS for Stanford University, dated July 11, 2005, valid from September 2005 to August 2006.

Nonetheless, other nations are working hard to attract immigrants who can drive innovation — a good move for them! Many have thoughtful programs to attract AI and other talent: the UK’s Global Talent Visa, France’s French Tech Visa, Australia’s Global Talent Visa, the UAE’s Golden Visa, Taiwan’s Employment Gold Card, China’s Thousand Talents Plan, and many more. The U.S. is fortunate that many people already want to come here to study and work. Squandering that advantage would be a huge unforced error.


Beyond the matter of national competitiveness, there is the even more important ethical matter of making sure people are treated decently. I have spoken with international students who are terrified that their visas may be canceled arbitrarily. One recently agonized about whether to attend an international conference to present a research paper, because they were worried about being unable to return. In the end, with great sadness, they cancelled their trip. I also spoke with a highly skilled technologist who is in the U.S. on an H-1B visa. Their company shut down, leaving them — after over a decade in this country, and with few ties to their nation of origin — scrambling to find alternative employment that would enable them to stay.

 

These stories, and many far worse, are heartbreaking. While I do what I can to help individuals I know personally, it is tragic that we are creating an environment so uncertain for immigrants that many people with extraordinary skills and talents will no longer want to come here.


To every immigrant or migrant in the U.S. who is concerned about the current national environment: I see you and empathize with your worries. As an immigrant myself, I will be fighting to protect everyone’s dignity and right to due process, and to encourage legal immigration, which makes both the U.S. and individuals much better off.

 

Keep building!

Andrew 

 

 

A MESSAGE FROM DEEPLEARNING.AI

Promo banner for: "Building with Llama 4"

Learn to build with Meta’s Llama 4 models in this new short course! Work with the official Llama API, explore multimodal reasoning with image grounding, reason over long contexts, and use tools for prompt optimization and synthetic data generation. Join in for free

 

News

Apple AI models outperform rivals in instruction accuracy and human text evaluations across devices and servers.

Apple Sharpens Its GenAI Profile

 

Apple revamped two vision-language models in a bid to catch up with fast-moving competitors.

 

What’s new: Apple updated the Apple Foundation Models (AFM) family, including smaller on-device and larger server-hosted versions, to improve their capabilities, speed, and efficiency. It also released the Foundation Models framework, an API that enables developers to call the on-device model on Apple devices that have Apple Intelligence enabled.

  • Input/output: Text and images in (up to 65,000 tokens), text out
  • Architecture: AFM-on-device: 3 billion-parameter transformer with a 300 million-parameter vision transformer. AFM-server: custom mixture-of-experts transformer (parameter count undisclosed) with a 1 billion-parameter vision transformer.
  • Performance: Strong in non-U.S. English, image understanding
  • Availability: AFM-on-device available to developers via the Foundation Models framework; AFM-server not available for public use
  • Features: Tool use, 15 languages, vision
  • Undisclosed: Output token limit, AFM-server parameter count, details of training datasets, vision adapter architecture, evaluation protocol

How it works: Introduced last year, AFM models use a vision encoder to produce an image embedding, which a vision adapter modifies for the LLM. The LLM takes the modified image embedding and text prompt and generates a response. The team trained the systems to predict the next token, align embeddings produced by the vision encoder and LLM, and align responses with human feedback. They trained the models on text and image-text data from publicly available datasets, data scraped from the web, and data licensed from publishers.
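
For orientation, here’s a minimal, generic sketch of that pipeline in PyTorch — not Apple’s actual code. The dimensions and layer choices are illustrative assumptions.

```python
# Generic vision-language wiring: a vision adapter projects image embeddings
# into the LLM's embedding space, and the projected image tokens are
# concatenated with text embeddings for the LLM to attend over.
# Dimensions below are assumptions, not Apple's published values.
import torch
import torch.nn as nn

class VisionAdapter(nn.Module):
    """Maps vision-encoder outputs into the LLM's embedding space."""
    def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, image_emb: torch.Tensor) -> torch.Tensor:
        # image_emb: (batch, num_patches, vision_dim) from the vision encoder
        return self.proj(image_emb)  # (batch, num_patches, llm_dim)

adapter = VisionAdapter()
image_tokens = adapter(torch.randn(1, 256, 1024))  # stand-in encoder output
text_tokens = torch.randn(1, 32, 4096)             # stand-in text embeddings
llm_input = torch.cat([image_tokens, text_tokens], dim=1)  # one joint sequence
```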

  • Quantization: The team used quantization-aware training (simulating quantization during training to improve performance of the quantized model at inference) to compress AFM-on-device to 2 bits per weight (except for the embedding layer, which was compressed to 4 bits per weight). They used Adaptive Scalable Texture Compression, a method initially designed for graphics pipelines, to compress the AFM-server model to an average of 3.56 bits per weight (again excepting the embedding layer, which was compressed to 4 bits per weight). A generic sketch of quantization-aware training appears after this list.
  • LoRA adapters: They trained LoRA adapters to recover performance lost to compression and to adapt the model to specific tasks including summarization, proofreading, replying to email, and answering questions.
  • MoE architecture: While AFM-on-device uses a transformer architecture, AFM-server uses a custom mixture-of-experts (MoE) architecture. A typical MoE splits some of a transformer’s fully connected layers into a number of parallel fully connected layers, or experts, only a few of which run for a given input at inference. In comparison, AFM-server first splits the model into groups of layers, then splits each group into parallel blocks. Each block is a separate multi-layer transformer outfitted with MoE layers (processed on a small number of hardware devices). While a typical MoE combines results across all devices at every mixture-of-experts layer, Apple’s architecture combines them only at the end of each block, which saves communication overhead during processing. A sketch of a standard MoE layer also appears below.
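
For readers unfamiliar with quantization-aware training, here is a minimal, generic sketch using a straight-through estimator. The symmetric 2-bit scheme and per-tensor scale are simplifying assumptions, not Apple’s recipe.

```python
# Quantization-aware training via a straight-through estimator: the forward
# pass sees fake-quantized weights, while the backward pass treats the
# rounding as identity, so full-precision weights keep receiving gradients.
import torch
import torch.nn as nn
import torch.nn.functional as F

def fake_quantize(w: torch.Tensor, bits: int = 2) -> torch.Tensor:
    qmax = 2 ** (bits - 1) - 1            # 2 bits -> integer levels in [-2, 1]
    scale = w.abs().max() / qmax          # per-tensor scale (a simplification)
    q = torch.round(w / scale).clamp(-qmax - 1, qmax) * scale
    return w + (q - w).detach()           # straight-through: gradient of identity

class QATLinear(nn.Linear):
    """Linear layer that trains against its quantized weights."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(x, fake_quantize(self.weight), self.bias)
```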
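And here is a standard top-k mixture-of-experts layer for comparison — a generic sketch, not Apple’s parallel-track design, with illustrative sizes:

```python
# Standard top-k MoE layer: a router scores experts per token, and only the
# top-k expert MLPs run for each token. Apple's variant differs mainly in
# how results are synchronized across devices, not in this core mechanism.
import torch
import torch.nn as nn

class MoELayer(nn.Module):
    def __init__(self, d_model: int = 512, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        weights, indices = torch.topk(self.router(x).softmax(dim=-1), self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            for k in range(self.top_k):
                routed = indices[:, k] == e          # tokens sent to expert e
                if routed.any():
                    out[routed] += weights[routed, k].unsqueeze(-1) * expert(x[routed])
        return out
```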

Performance: In human evaluations, the AFM models achieved mixed performance compared to selected models of similar or greater size. The tests included language tasks in U.S. English, non-U.S. English (including Canadian and British English), and a basket of European and Asian languages.

  • AFM-on-device: The on-device model performed better than the competitors at language tasks in non-U.S. English and image understanding. For instance, answering questions about images, AFM-on-device bested Qwen2.5-VL-3B more than 50 percent of the time and was judged worse 27 percent of the time.
  • AFM-server: The server model’s performance was not decisively better than that of the competitors. For instance, AFM-server outperformed Qwen3-23B 25.8 percent of the time but was judged worse 23.7 percent of the time. It underperformed GPT-4o in all tests reported.

Behind the news: Apple dominated social media last week with a controversial paper that purported to show that 5 state-of-the-art reasoning models couldn’t solve puzzles beyond a certain level of complexity.

  • The researchers prompted the models with four puzzles whose complexity they could control: swapping the positions of red and blue checkers on a one-dimensional board, Tower of Hanoi, River Crossing, and Blocks World. For all the puzzles and models, they found that performance fell to zero once a puzzle reached a certain degree of complexity (for example, a certain number of checkers to swap).
  • A rebuttal paper quickly appeared, penned by Open Philanthropy senior program associate Alex Lawsen with help from Claude Opus 4. Lawsen contended that Apple’s conclusions were unfounded because its tests included unsolvable puzzles, didn’t account for token output limits, and posed unrealistic criteria for judging outputs. However, he later published a blog post, “When Your Joke Paper Goes Viral,” in which he explained that he intended his paper as “obvious satire” of authors who use LLMs to write scientific papers, and that he hadn’t checked Claude Opus 4’s output. He updated his paper to correct errors in the original version but maintained his fundamental critique.

Why it matters: Apple has been viewed as falling behind in AI. A promised upgrade of Siri, Apple’s AI assistant, is delayed indefinitely, and the lack of advanced AI features in new iPhones has led to a class-action lawsuit. Meanwhile, Google and its Android smartphone platform are racing ahead. The new models, especially the Foundation Models framework, look like a bid for a reset.

 

We’re thinking: Apple may be behind in AI, but its control over iOS is a huge advantage. If the operating system ships with a particular model and loads it into memory by default, developers have a far greater incentive to use that model than an alternative. Phones’ limited memory and the large size of capable models make it impractical for many app developers to bundle models with their software, so a model favored by Apple (or Google on Android) is likely to gain significant adoption for on-device uses.

 

Midjourney AI outputs mimic Disney characters, raising copyright concerns in lawsuit by Disney and Universal.

Hollywood Joins AI Copyright Fight

 

Hollywood studios joined the record companies, publishers, and artists in the fight against companies that have trained AI models on their copyrighted works.

 

What’s new: Disney and Universal sued Midjourney, accusing the image-generation startup of training its models on “countless” unauthorized copies of their copyrighted works and distributing images that depict characters the plaintiffs created.

 

How it works: Disney and Universal asked the court to order Midjourney to cease its alleged unauthorized distribution of their intellectual property. Further, they want Midjourney, which took in revenue of $300 million in 2024, to pay unspecified damages based on the claim that copyright law entitles them to $150,000 per infringed image. The studios accuse Midjourney of both direct infringement (that is, directly violating their copyrights by copying, displaying, or distributing their work without permission) and secondary infringement (enabling or encouraging direct infringement by others).

  • The lawsuit alleges that Midjourney reproduces copyrighted and derivative works, including images of movie and television characters from Star Wars, Toy Story, Cars, Iron Man, and The Simpsons.
  • Midjourney generates such images even if users don’t ask for them explicitly. For instance, an image allegedly generated in response to the prompt, “Superhero fight scene,” includes Disney’s Spider-Man character.
  • Midjourney is aware of the infringement, the studios claim, pointing out that Midjourney’s website includes infringing images in sections curated by the company.
  • Midjourney could use software that would prevent its system from generating and distributing copyrighted material, the lawsuit says, citing other software products that identify copyrighted works automatically.
  • The filing alleges that Disney and Universal sent cease-and-desist letters to Midjourney, but the AI company didn’t stop producing and distributing images that infringe their copyrights.

Behind the news: Copyright law is ambiguous on whether training AI systems on copyrighted works requires permission from the copyright holders, and several cases are winding their way through U.S. courts to answer this question. Starting in 2023, artists, authors, and publishers initiated legal actions against Alphabet, Meta, and OpenAI. Last year, some of the largest companies in the recording industry sued the AI music startups Suno and Udio. In February, a Delaware federal court issued the first major decision in this area, when a U.S. Circuit judge ruled that an AI-powered legal research service could not claim that training its models on writings produced by Thomson Reuters was a fair use because the resulting products competed with Thomson Reuters’ own products. 

 

Why it matters: AI systems require enormous amounts of data. Historically, developers have felt free to use whatever copyrighted works they could find, typically online. As AI systems show greater potential to erode the market for human-made creative works — and to reproduce such works and create new works derived from them — owners of copyrighted material are looking for compensation as well as protection against this new form of competition. A single lawsuit won’t settle the issue, but this case, brought by two of the most powerful entertainment companies in the world, could set a precedent that strongly influences future lawsuits, the behavior of AI companies, and future legislation to update copyright for the AI era.

 

We’re thinking: Film studios and music labels once considered YouTube a copyright violator. Viacom, the entertainment company behind MTV and Jersey Shore, once sued YouTube for copyright infringement. YouTube prevailed in two proceedings before the parties settled out of court, and YouTube subsequently improved its ability to detect and remove copyrighted works. Today, movie and recording companies rely on the enormously popular web video service to promote their wares. Given that history, Hollywood might consider partnering with AI companies instead of suing them. The pie would be bigger if Hollywood and AI companies worked together, although how to divide it would need to be worked out.

 

Machine learning engineers analyzing real-time hurricane data across multiple screens in a high-tech control room.

Learn More About AI With Data Points!

 

AI is moving faster than ever. Data Points helps you make sense of it just as fast. Data Points arrives in your inbox twice a week with six brief news stories. This week, we covered how Google DeepMind is using neural networks to predict hurricanes up to 15 days in advance. Subscribe today!

 

OpenAI o3-pro outperforms o3 and o1-pro on math, science, and coding benchmarks, but responds much more slowly.

More Reasoning for Harder Problems

 

OpenAI launched o3-pro, a more capable version of its most advanced reasoning vision-language model.

 

What’s new: o3-pro is designed to respond to difficult challenges involving science, mathematics, and coding. But its reasoning firepower dramatically slows response times.

  • Input/output: Text and images in (up to 200,000 tokens), text out (up to 100,000 tokens, 20.7 tokens per second, 129.2 seconds to first token)
  • Knowledge cutoff: June 1, 2024
  • Features: Function calling including web search, structured output
  • Availability/price: Available to ChatGPT Pro and Team users and via the OpenAI API, coming soon to Enterprise and Edu users, for $20/$80 per 1 million input/output tokens
  • Undisclosed: Details about architecture, training data, and training methods

Performance: o3-pro outperformed OpenAI’s own o3 (set to medium effort) and o1-pro in tests performed by OpenAI.

  • Solving AIME 2024’s advanced high-school math competition problems on the first try, o3-pro (93 percent) bested o3 (90 percent) and o1-pro (86 percent).
  • Answering GPQA Diamond’s graduate-level science questions on the first try, o3-pro (85 percent) outperformed o3 (81 percent) and o1-pro (79 percent).
  • Completing Codeforces competition-coding problems in one pass, o3-pro (2748 CodeElo) surpassed o3 (2517 CodeElo) and o1-pro (1707 CodeElo).
  • In qualitative tests, human reviewers consistently preferred o3-pro over o3 for queries related to scientific analysis (64.9 percent), personal writing (66.7 percent), computer programming (62.7 percent), and data analysis (64.3 percent).

What they’re saying: Reviews of o3-pro so far are generally positive, but the model has been criticized for the time it takes to respond. Box CEO Aaron Levie commented that o3-pro is “crazy good at math and logic.” However, entrepreneur Yuchen Jin noted that it’s the “slowest and most overthinking model.”

 

Behind the news: OpenAI rolled out o3-pro with a lower price, $20/$80 per 1 million input/output tokens, than o1-pro (which was priced at $150/$600 per 1 million input/output tokens but was deprecated in favor of the new model). Simultaneously it cut the price of o3 by 80 percent to $2/$8 per 1 million input/output tokens. These moves continue the plummeting price of inference over the past year. DeepSeek-R1 offers performance that approaches that of top models for $0.55/$2.19 per 1 million input/output tokens.

 

Why it matters: OpenAI is pushing the limits of current approaches to reasoning, and the results are promising if incremental. o3-pro’s extensive reasoning may appeal to developers who are working on multi-step scientific problems. For many uses, though, the high price and slow speed may be a dealbreaker.

 

We’re thinking: Letting developers choose between o3 and o3-pro lets them calibrate their computational budget to the difficulty of the task at hand. What if we want to do the same with a trained, open-weights large language model? Forcing an LLM to generate “Wait” in its output causes it to keep thinking and can improve its output significantly.
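
Here’s a minimal sketch of that trick, assuming a Hugging Face transformers model that wraps its reasoning in a `</think>` delimiter; the model name, delimiter, and token budget are all assumptions.

```python
# Sketch of "budget forcing": if the model tries to close its reasoning span
# before spending a minimum token budget, strip the delimiter, append "Wait",
# and resume generation. Requires a transformers version with stop_strings.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "your/reasoning-model"  # substitute an open-weights reasoning model
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, device_map="auto")

def think_with_budget(prompt: str, min_thinking_tokens: int = 1024) -> str:
    text, spent = prompt, 0
    while spent < min_thinking_tokens:
        ids = tok(text, return_tensors="pt").to(model.device)
        out = model.generate(
            **ids,
            max_new_tokens=min_thinking_tokens - spent,
            stop_strings=["</think>"],   # halt whenever the model stops thinking
            tokenizer=tok,
        )
        new_ids = out[0][ids["input_ids"].shape[1]:]
        spent += len(new_ids)
        chunk = tok.decode(new_ids, skip_special_tokens=False)
        if "</think>" in chunk and spent < min_thinking_tokens:
            # Budget not yet spent: drop the delimiter and nudge the model on.
            text += chunk.split("</think>")[0] + "\nWait,"
        else:
            text += chunk
    return text  # generate the final answer from this extended reasoning
```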

 

Diagram showing AI pipeline using OCR and LLMs to detect racist clauses in historic California property deeds.

LLM Rights Historical Wrongs

 

In Northern California, old property deeds may still include racial clauses: language, outlawed decades ago, that was designed to bar people of color from owning or living in certain homes. The state of California now requires counties to find and remove such clauses, but manually combing through millions of documents would take years. Researchers used AI to find them automatically.

 

What’s new: Faiz Surani, Mirac Suzgun, and colleagues at Stanford University and Princeton University fine-tuned a large language model to find racial clauses in deeds for property in the California county of Santa Clara.

 

Key insight: Manual and keyword searches may fail to catch racial clauses if they’re obscured by subtle wording or errors in optical character recognition (OCR). But a fine-tuned large language model can understand context, identify relevant phrases, and avoid potential false alarms like the surnames Black or White. Lawyers can confirm the model’s findings.

 

How it works: The authors used an OCR system to extract text from 5.2 million pages of Santa Clara property deeds filed between 1850 and 1980. They drew examples from that corpus to form training and validation datasets and then processed the rest to find deeds that contained racial clauses.

  • To curate examples for training and validation, the authors started by sampling 20,000 pages at random. Since deeds have significant variation in format and quality, they added 10,000 deeds from other U.S. counties.
  • They filtered the combined examples using keywords that may indicate racial clauses, such as “Negro,” “Mongolian,” or “No person of,” yielding 3,801 pages.
  • They manually labeled the spans that included such language, which appeared on roughly 80 percent of those pages.
  • They fine-tuned Mistral-7B via LoRA on the labeled examples to learn to detect and reproduce discriminatory text. (A generic sketch of this setup appears after this list.)
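
Here’s a minimal sketch of that kind of setup using the Hugging Face peft library; the hyperparameters, target modules, and prompt format are illustrative assumptions, not the authors’ actual configuration.

```python
# Generic LoRA fine-tuning skeleton: freeze the 7B base model and train
# small low-rank adapter matrices on attention projections. The detection
# task is framed as generation: given OCR'd page text, output the
# discriminatory span, or "None" if the page is clean.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(BASE)
base_model = AutoModelForCausalLM.from_pretrained(BASE)

lora_config = LoraConfig(
    r=16,                                # rank of the adapter matrices
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"], # a common choice of attention layers
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()       # only a tiny fraction of weights train

# One training example: OCR text in, labeled span (or "None") out.
template = "### Deed page:\n{page}\n### Restrictive clause:\n{span}"
```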

Results: The authors fed the remaining roughly 5.2 million unlabeled pages to the fine-tuned model. When the model identified a deed that contained a racial clause, county staff confirmed the finding and redacted the clause.

  • The authors found 24,500 Santa Clara lots covered by racial clauses — about one in four homes in the county in 1950.
  • The analysis also revealed that 10 developers, out of what the authors estimate were hundreds, were responsible for one-third of the racial clauses, demonstrating that a small number of actors shaped decades of segregation.
  • The fine-tuned model reviewed all pages in 6 days, which would cost an estimated $258 based on current prices for cloud access to GPUs. In contrast, few-shot prompting GPT-3.5 Turbo would have been faster (3.6 days) but less accurate and over 50 times more expensive ($13,634). Working manually, a single county staff member would have needed nearly 10 years and $1.4 million.

Why it matters: Large language models can interpret historical documents to reveal the nature and scope of actions in the past that otherwise would remain obscure — in this case, housing discrimination. By flagging discriminatory language, this work enables historians to identify areas affected by racial clauses and trace their broader social and economic effects. The team open-sourced the model, streamlining the process for other U.S. counties.

 

We’re thinking: While AI is making history, it’s also illuminating it!

 

Work With Andrew Ng

 

Join the teams that are bringing AI to the world! Check out job openings at DeepLearning.AI, AI Fund, and Landing AI.

 

Subscribe and view previous issues here.

 

Thoughts, suggestions, feedback? Please send to thebatch@deeplearning.ai. Avoid our newsletter ending up in your spam folder by adding our email address to your contacts list.

 

DeepLearning.AI, 195 Page Mill Road, Suite 115, Palo Alto, CA 94306, United States
