Dear friends,
Earlier this month, my team at AI Fund held its annual co-founder and CEO summit, where many of our collaborators gathered in California for two days to discuss how to build AI companies. Three themes emerged across many presentations: persistence, fast iteration, and community.
One technique that’s now very widespread for prototyping is retrieval-augmented generation (RAG). I’ve been surprised at how many nontechnical business leaders seem to know what RAG is. Investors are sometimes leery of people who build a thin layer around LLMs. As Laurence Moroney (lead AI Advocate at Google, AI Fund Fellow) says, “I’m a huge fan of RAG. … I think this is one way to go beyond a thin veneer around [commercial] models and build a somewhat thicker veneer.”
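For readers who haven’t built one: a RAG system retrieves the documents most relevant to a query and adds them to the LLM’s prompt so the model can ground its answer in them. Below is a minimal sketch of that pattern; the embed and generate functions and the tiny in-memory index are hypothetical stand-ins, not any particular product’s API.

```python
import numpy as np

# Minimal RAG sketch. embed() and generate() are placeholders for
# whatever embedding model and LLM a real system would call.

def embed(text: str) -> np.ndarray:
    # Placeholder: deterministic pseudo-embedding, not a real model.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(128)
    return v / np.linalg.norm(v)

def generate(prompt: str) -> str:
    # Placeholder: a real system would call an LLM here.
    return f"[LLM answer conditioned on a prompt of {len(prompt)} chars]"

documents = [
    "AI Fund holds an annual summit for co-founders and CEOs.",
    "RAG retrieves relevant text and adds it to the model's prompt.",
]
index = [(doc, embed(doc)) for doc in documents]

def rag_answer(query: str, k: int = 1) -> str:
    q = embed(query)
    # Rank stored documents by cosine similarity to the query embedding.
    ranked = sorted(index, key=lambda pair: -float(q @ pair[1]))
    context = "\n".join(doc for doc, _ in ranked[:k])
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return generate(prompt)

print(rag_answer("What is RAG?"))
```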
Community. Despite the wide range of startups represented in sectors including deep AI tech, healthcare, finance, edtech, and so on, a recurring theme was that company builders end up stronger when they come together. Emil Stefanutti (co-founder of ContractRoom, Venture Advisor at AI Fund) said he was glad that many of the scars he acquired building businesses are turning out to be treasures for others, since he can share experiences that other entrepreneurs benefit from. Tim Westergren (co-founder of Pandora) said, “You can’t white-knuckle it. You also can’t do it alone.”
“It is not the critic who counts; not the [person] who points out how the strong [person] stumbles, or where the doer of deeds could have done them better. The credit belongs to the [person] who is actually in the arena, whose face is marred by dust and sweat and blood; who strives valiantly; who errs, who comes short again and again, … who knows great enthusiasms, the great devotions; who spends himself in a worthy cause.” (Theodore Roosevelt)
Keep learning!
Andrew
News
Generated Video Gets Real(er)
OpenAI’s new video generator raises the bar for detail and realism in generated videos — but the company released few details about how it built the system.
What we don’t know: OpenAI is sharing the technology with outside researchers charged with evaluating its safety, The New York Times reported. Meanwhile, the company published neither quantitative results nor comparisons to previous work. Also missing are detailed descriptions of model architectures and training methods. (Some of the results suggest that Sora was trained not only to remove noise from tokens, but also to predict future tokens and generate tokens in between other tokens.) No information is available about the source(s) of the dataset or how it may have been curated.

Why it matters: While we’ve seen transformers for video generation, diffusion models for video generation, and diffusion transformers for images, this is an early implementation of diffusion transformers for video generation (along with a recent paper). Sora shows that diffusion transformers work well for video.

We’re thinking: Did Sora learn a world model? Learning to predict the future state of an environment, perhaps given certain actions within that environment, is not the same as learning to depict that environment in pixels — just like the ability to predict that a joke will make someone smile is different from the ability to draw a picture of that smile. Given Sora’s ability to extrapolate scenes into the future, it does seem to have some understanding of the world. Its world model is also clearly flawed — for instance, it will synthesize inconsistent three-dimensional structures — but it’s a promising step toward AI systems that comprehend the 3D world through video.
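For intuition about what a diffusion transformer for video involves, here is a heavily simplified, hypothetical sketch: cut a clip into spacetime patches, add noise, and train a transformer to predict that noise. This reflects the general diffusion-transformer recipe from published work, not Sora’s actual (undisclosed) architecture; every dimension and name below is invented.

```python
import torch
import torch.nn as nn

# Toy diffusion transformer over spacetime video patches. This is a
# generic illustration of the technique, not Sora's architecture.
T, H, W, C, P, D = 8, 32, 32, 3, 8, 256  # frames, height, width, channels, patch size, embed dim

class VideoDiffusionTransformer(nn.Module):
    def __init__(self):
        super().__init__()
        n_patches = T * (H // P) * (W // P)          # spacetime patches per clip
        self.patchify = nn.Conv3d(C, D, kernel_size=(1, P, P), stride=(1, P, P))
        self.pos = nn.Parameter(torch.zeros(1, n_patches, D))
        layer = nn.TransformerEncoderLayer(D, nhead=8, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(D, C * P * P)          # predict the noise in each patch

    def forward(self, noisy_video, t_embed):
        x = self.patchify(noisy_video)               # (B, D, T, H/P, W/P)
        x = x.flatten(2).transpose(1, 2) + self.pos  # (B, n_patches, D)
        x = self.blocks(x + t_embed)                 # condition on the diffusion step
        return self.head(x)                          # per-patch noise prediction

model = VideoDiffusionTransformer()
video = torch.randn(2, C, T, H, W)                   # a batch of two clips
noise = torch.randn_like(video)
alpha = 0.7                                          # noise-schedule value for this step
noisy = alpha**0.5 * video + (1 - alpha)**0.5 * noise
t_embed = torch.randn(2, 1, D)                       # stand-in timestep embedding
pred = model(noisy, t_embed)                         # train with MSE against the patchified noise
```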
Competition Heats Up in AI Chips
Huawei is emerging as an important supplier of AI chips.

What’s new: Amid a U.S. ban on exports of advanced chips to China, demand for Huawei’s AI chips is so intense that the company is limiting production of the chip that powers one of its most popular smartphones so it can serve the AI market, Reuters reported.

Demand and supply: China’s biggest chip fabricator, Semiconductor Manufacturing International Corp. (SMIC), fabricates both the Ascend 910B, which is optimized to process neural networks, and the Kirin chip that drives Huawei’s popular Mate 60 phone. Production capacity is limited, so making more Ascend 910Bs means making fewer Kirins.
Behind the news: Nvidia accounted for 90 percent of the market for AI chips in China prior to the advent of U.S. export restrictions. China has responded to the limits by building its ability to manufacture advanced chips domestically — a tall order, since it requires technology that is very difficult to develop. In August, Baidu ordered 1,600 Ascend 910B chips for delivery by the end of the year, according to an earlier Reuters report. The order, which is tiny compared to typical data center purchases, nonetheless demonstrated that SMIC could manufacture the chips and that Baidu was experimenting with alternatives to Nvidia in anticipation of even tighter U.S. restrictions on AI chips that took effect in October. Currently, SMIC is gearing up to produce Huawei’s next-generation Ascend chips.

Why it matters: For years, Nvidia’s GPUs have been the only practical choice for processing deep learning models. The company’s lead over competitors in both hardware implementation and software support is likely to protect its dominant position for some time to come. However, competitors like AMD and Huawei are beginning to nip at Nvidia’s heels. That means more hardware options for developers, and the competition may drive lower prices and still higher performance.

We’re thinking: AI chips are at the heart of the current technological competition between the U.S. and China. While Huawei and SMIC still have a lot to prove in terms of scaling up production, their rate of progress is impressive and illustrates the limits of the current U.S. restrictions.
A MESSAGE FROM DEEPLEARNING.AI
In our next live workshop, we’ll share how to build high-quality and production-ready applications using tools like Pinecone Canopy and TruLens. Notebooks will be available for participants to explore! Register now
Gymnastics Judge’s Helper
Judges in competitive gymnastics are using an AI system to double-check their decisions.

What’s new: Olympic-level gymnastics contests have adopted the Judging Support System (JSS), an AI-based video evaluation system built by Fujitsu, MIT Technology Review reported. In September and October, for the first time, judges at the 2023 World Artistic Gymnastics Championships in Antwerp used JSS in competitions that involved the full range of gymnastics equipment including mat, balance beam, parallel bars, pommel horse, and so on.

How it works: Judges penalize gymnasts for imperfections in any pose or move. JSS identifies deviations that correspond to particular penalties. The system can evaluate roughly 2,000 poses and moves with 90 percent accuracy compared to human judges. It can assess both isolated actions and entire routines.
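Fujitsu hasn’t published JSS’s internals, but the general pattern (estimate joint positions from video, compare joint angles to a rubric’s tolerances, and map deviations to deductions) can be sketched as follows. All joint positions, thresholds, and deduction values below are invented for illustration.

```python
import numpy as np

# Illustrative deduction logic: compare an estimated joint angle to a
# rubric tolerance. Not Fujitsu's actual JSS rules; numbers are made up.

def joint_angle(a, b, c):
    """Angle at joint b (degrees) formed by 3D points a-b-c."""
    u, v = np.asarray(a) - np.asarray(b), np.asarray(c) - np.asarray(b)
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def knee_bend_deduction(hip, knee, ankle):
    """Map deviation from a straight leg (180 degrees) to a deduction."""
    deviation = 180.0 - joint_angle(hip, knee, ankle)
    if deviation <= 15:   # within tolerance: no deduction (hypothetical)
        return 0.0
    if deviation <= 45:   # small bend (hypothetical threshold)
        return 0.1
    return 0.3            # large bend

# Example: pose-estimated 3D joint positions from one video frame.
print(knee_bend_deduction(hip=[0, 1.0, 0], knee=[0, 0.5, 0.05], ankle=[0, 0.0, 0.25]))
```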
Behind the news: Sporting authorities have embraced AI both inside and outside the arena.
Why it matters: Gymnastic competitors are scored on subjective criteria such as expression, confidence, and personal style as well as technical competence, raising questions of unconscious bias and whether some judges might favor certain competitors over others. An AI system that tracks technical minutiae may help judges to avoid bias while focusing on the sport’s subjective aspects.

We’re thinking: Tracking gymnasts in motion sets a high bar for AI!
Memory-Efficient Optimizer
Researchers devised a way to reduce memory requirements when fine-tuning large language models.

What’s new: Kai Lv and colleagues at Fudan University proposed low memory optimization (LOMO), a modification of stochastic gradient descent that stores less data than other optimizers during fine-tuning.

Key insight: Optimizers require a lot of memory to store an entire network’s worth of parameters, gradients, activations, and optimizer states. While Adam has overtaken stochastic gradient descent (SGD) for training, SGD remains a popular choice for fine-tuning partly because it requires less memory (since it stores fewer optimizer states). Nonetheless, SGD must store an entire network’s gradients — which, with state-of-the-art models, can amount to tens or hundreds of gigabytes — before it updates the network all at once. Updating the network layer by layer, as each layer’s gradients become available during backpropagation, requires storing only one layer’s gradients at a time — a more memory-efficient twist on typical SGD.

How it works: The authors fine-tuned LLaMA on six datasets in SuperGLUE, a benchmark for language understanding and reasoning that includes tasks such as answering multiple-choice questions.
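Here is a minimal sketch of LOMO’s fused update in PyTorch (assuming version 2.1+ for register_post_accumulate_grad_hook): apply each parameter’s SGD step as soon as its gradient arrives during the backward pass, then free that gradient. The model and hyperparameters are illustrative, and this omits details of the paper’s full method such as gradient normalization and mixed-precision loss scaling.

```python
import torch
import torch.nn as nn

# Sketch of LOMO's key idea: fuse the gradient computation and the SGD
# update so the full set of gradients is never stored at once.
lr = 1e-3  # illustrative learning rate

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))

def sgd_update_and_free(param: torch.Tensor) -> None:
    # Apply the SGD step as soon as this parameter's gradient is ready,
    # then discard the gradient so peak gradient memory stays roughly
    # one layer's worth instead of the whole network's.
    with torch.no_grad():
        param.add_(param.grad, alpha=-lr)
    param.grad = None

for p in model.parameters():
    p.register_post_accumulate_grad_hook(sgd_update_and_free)

# One training step: backward() fires the hooks layer by layer, so the
# parameters are updated during this call and no optimizer.step() (or
# zero_grad()) is needed afterward.
x, y = torch.randn(32, 512), torch.randint(0, 10, (32,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
```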
Results: LOMO required less memory than popular optimizers and achieved better performance than the popular memory-efficient fine-tuning technique LoRA.
Why it matters: Methods like LoRA save memory by fine-tuning a small number of parameters relative to a network’s total parameter count. However, because such methods adjust only a small number of parameters, the performance gain from fine-tuning is less than it could be. LOMO fine-tunes all parameters, maximizing performance gain while reducing memory requirements.

We’re thinking: SGD’s hunger for memory is surprising. Many developers will find it helpful to have a memory-efficient alternative.
Work With Andrew Ng
Join the teams that are bringing AI to the world! Check out job openings at DeepLearning.AI, AI Fund, and Landing AI.
Subscribe and view previous issues here.
Thoughts, suggestions, feedback? Please send to thebatch@deeplearning.ai. Avoid our newsletter ending up in your spam folder by adding our email address to your contacts list.