|
Dear friends,
Much has been said about many companies’ desire for more compute (as well as data) to train larger foundation models. I think it’s under-appreciated that we have nowhere near enough compute available for inference on foundation models as well.
Fortunately, this prediction has been right so far. However, beyond training, I believe we are also far from exhausting the benefits of faster and higher volumes of inference.
Today, a lot of LLM output is primarily for human consumption. A human might read around 250 words per minute, which is around 6 tokens per second (250 words/min / (0.75 words/token) / (60 secs/min)). So it might initially seem like there’s little value to generating tokens much faster than this.
That’s why I’m excited about the work of companies like Groq, which can generate hundreds of tokens per second. Recently, SambaNova published an impressive demo that hit hundreds of tokens per second.
Keep learning!
News
Songs Made to OrderA new breed of audio generator produces synthetic performances of songs in a variety of popular styles. What’s new: Udio launched a web-based, text-to-song generator that creates songs in styles from barbershop to heavy metal. Suno, which debuted its service late last year with similar capabilities, upgraded to its offering. How it works: Both services take text prompts and generate full-band productions complete with lyrics, vocals, and instrumental solos, two separate generations per prompt. Users can generate lyrics to order or upload their own words, and they can download, share, and/or post the results for others to hear. Leaderboards rank outputs according to plays and likes.
Behind the news: Most earlier text-to-music generators were designed to produce relatively free-form instrumental compositions rather than songs with structured verses, choruses, and vocals. Released earlier this month, Stable Audio 2 generates instrumental tracks up to three minutes long that have distinct beginnings, middles, and endings. Users can also upload audio tracks and use Stable Audio 2.0 to modify them. Yes, but: Like text-to-image generators circa last year, current text-to-music models offer little ability to steer their output. They don’t respond consistently to basic musical terminology such as “tempo” and “harmony,” and requesting a generic style like “pop” can summon a variety of subgenres from the last 50 years of popular music. Why it matters: With the advent of text-to-music models that produce credible songs, audio generation seems primed for a Midjourney moment, when the public realizes that it can produce customized music at the drop of a prompt. Already Udio’s and Suno’s websites are full of whimsical paeans to users’ pets and hobbies. The technology has clear implications for professional performers and producers, who, regrettably, have little choice but to adapt to increasing automation. But for now fans have fun, new toys to play with. We’re thinking: You can dance to these algo-rhythms!
Benchmarks for IndustryHow well do large language models respond to professional-level queries in various industry domains? A new company aims to find out. What’s new: Vals.AI, an independent model testing service, developed benchmarks that rank large language models’ performance of tasks associated with income taxes, corporate finance, and contract law; it also maintains a pre-existing legal benchmark. Open AI’s GPT-4 and Anthropic’s Claude 3 Opus did especially well in recent tests. How it works: Vals AI hosts leaderboards that compare the performance of several popular large language models (LLMs) with respect to accuracy, cost, and speed, along with with analysis of the results. The company worked with independent experts to develop multiple-choice and open-ended questions in industrial domains. The datasets are not publicly available.
Results: Among 15 models, GPT-4 and Claude 3 Opus dominated Vals.AI’s leaderboards as of April 11, 2024. GPT-4 topped CorpFin and TaxEval, correctly answering 64.8 and 54.5 percent of questions, respectively. Claud 3 Opus narrowly beat GPT-4 on ContractLaw and LegalBench, achieving 74.0 and 77.7 percent, respectively. The smaller Claude 3 Sonnet took third place in ContractLaw, CorpFin, and TaxEval with 67.6, 61.4, and 37.1 percent. Google’s Gemini Pro 1.0 took third place in LegalBench with 73.6 percent. Behind the news: Many practitioners in finance and law use LLMs in applications that range from processing documents to predicting interest rates. However, LLM output in such applications requires oversight. In 2023, a New York state judge reprimanded a lawyer for submitting an AI-generated brief that referred to fictitious cases. Why it matters: Typical AI benchmarks are designed to evaluate general knowledge and cognitive abilities. Many developers would like to measure more directly performance in real-world business contexts, where specialized knowledge may come into play. We’re thinking: Open benchmarks can benefit from public scrutiny, and they’re available to all developers. However, they can be abused when developers cherry-pick benchmarks on which their models perform especially well. Moreover, they may find their way into training sets, making for unfair comparisons. Independent testing on proprietary benchmarks is one way to address these issues.
NEW FROM DEEPLEARNING.AIJoin “Getting Started with Mistral” and access Mistral AI’s open source and commercial models via API calls. Learn to select the right model for your use case and get hands-on with features like JSON mode, function calling, and effective prompting techniques. Enroll for free!
AI Progress Report: ManufacturingManufacturers are embracing AI even as they struggle to find the talent and data required. What’s new: The market-research arm of MIT Technology Review surveyed manufacturers’ use of AI in engineering, design, procurement, and production. All respondents were at least experimenting with AI, and many expect to launch their first deployments in the next year or two. Microsoft sponsored the research. How it works: The authors interviewed executives at 300 manufacturers in aerospace, automotive, chemicals, electronics, and heavy equipment. All were either applying or considering AI in product design or factory operations.
Behind the news: Manufacturers are using AI to help design products, visually inspect goods, and maintain equipment. The field has attracted major players: Last year, Microsoft and Siemens launched a pilot of Industrial Copilot, which enables users to interact in natural language with software that drives assembly lines. Why it matters: Manufacturers want to use AI, but many face obstacles of talent and data. That spells opportunities for budding practitioners as well as for manufacturers that lack infrastructure for collecting and managing data. We’re thinking: One key to successful implementation of AI in manufacturing is tailoring systems to the unique circumstances of each individual facility. The highly heterogeneous tasks, equipment, and surroundings in different factories mean that one model doesn’t fit all. Developers who can solve this long-tail problem stand to reap rewards.
A 3D Model From One 2D ImageVideo diffusion provides a new basis for generating 3D models.
Results: The authors produced 3D models from images of 50 objects in GSO, a 3D object dataset of scanned household items. They compared their 3D models to those produced by other methods including EscherNet, a method that uses an image diffusion model to generate images of an object from different angles that are used to train a pair of vanilla neural networks to produce a 3D model. Evaluated according to Chamfer distance, a measure of the distance between the points on the ground truth and generated 3D models (lower is better), their method achieved .024, while EscherNet achieved .042. Why it matters: Video diffusion models must generate different views of the same object, so they require a greater understanding of 3D objects than image diffusion models, which need to generate only one view at a time. Upgrading from an image diffusion model to a video diffusion model makes for better 3D object generation.
Work With Andrew Ng
Join the teams that are bringing AI to the world! Check out job openings at DeepLearning.AI, AI Fund, and Landing AI.
Subscribe and view previous issues here.
Thoughts, suggestions, feedback? Please send to thebatch@deeplearning.ai. Avoid our newsletter ending up in your spam folder by adding our email address to your contacts list.
|