Dear friends,
When entrepreneurs build a startup, it is often their speed and momentum that give them a shot at competing with the tech behemoths. This is true of countries as well.
I was recently in Thailand, where I was delighted to see tremendous momentum building in AI (and to sip the best Thai iced tea I’ve ever tasted). Even though Thailand is not as advanced in AI technology or applications as the leading tech countries, the enthusiasm for building AI throughout government, corporations, and academia was thrilling. I came away heartened that AI’s benefits will be spread among many countries and convinced that a country’s level of AI development right now matters less than its momentum toward increasing it.
I also met with many business leaders and enjoyed seeing a rapid pace of experimentation with AI. KBTG, an affiliate of the country’s leading digital bank KBank, is working on a financial chatbot advisor, AI-based identity verification for anti-fraud, AI for auto insurance, and a Thai-language financial large language model. These features are expanding mobile banking and broadening financial access. Many business leaders in other sectors, too, have asked their teams to run experiments. There are many AI applications yet to be built in industrial sectors, tourism, trade, and more! (KBTG is an investor in AI Fund, which I lead.)
I often visit universities in both developed and developing economies, and I’ve been surprised to see that universities in developing economies sometimes adopt AI faster. At Chulalongkorn University (known as Chula), I met with the University President Bundhit Eua-arporn and Director of Chula AI Professor Proadpran Punyabukkana. Chula AI has rolled out campus-wide training in generative AI for faculty, staff, and students. In addition, it supports building AI applications such as AI screening for depression and gastrointestinal cancer.
Keep building, Andrew
A MESSAGE FROM DEEPLEARNING.AI

Our short course “Improving Accuracy of LLM Applications” teaches a step-by-step approach to improving the accuracy of applications built on large language models. You’ll build an evaluation framework, incorporate self-reflection, and fine-tune models using LoRA and memory tuning to embed facts and reduce hallucinations. Enroll for free
News

Higher Performance, Lower Prices

Prices for access to large language models are falling as providers exploit new efficiencies and compete for new customers.

What’s new: OpenAI cut the price of calls to GPT-4o’s API by 50 percent for input tokens and 33 percent for output tokens, with an even steeper discount for asynchronous processing. Not to be outdone, Google cut the price of API calls to Gemini 1.5 Flash by approximately 75 percent.

How it works: The latest price reductions follow a steady trend, tracked by Smol.ai CEO Shawn Wang, in which providers are charging less even as model performance (as measured by LMSys’s Chatbot Arena Leaderboard Elo ratings) rises. Here’s a list of recent prices in order of each model’s rank on the leaderboard as of this writing:
Behind the news: Less than six months ago, cutting-edge large language models like GPT-4, Claude 2, Gemini 1.0, Llama 2, and Mistral Large were less capable and more expensive than their current versions. For instance, GPT-4 cost $30/$60 per million input/output tokens. Since then, models have notched higher benchmark performance even as prices have fallen. The latest models are also faster, have larger context windows, support a wider range of input types, and do better at complex tasks such as agentic workflows.

Why it matters: Competition is fierce to provide the most effective and efficient large language models, offering developers an extraordinary range of price and performance. Makers of foundation models that can’t match the best large models in performance or the best small models in cost are in a tight corner.

We’re thinking: What an amazing time to be developing AI applications! You can choose among models that are open or closed, small or large, faster or more powerful in virtually any combination. Everyone is competing for your business!
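To make the per-token arithmetic concrete, here is a minimal sketch of estimating the cost of an API call from per-million-token prices. The helper function and the example token counts are illustrative assumptions; only the $30/$60 GPT-4 figure comes from the paragraph above.

```python
# Minimal sketch: estimating the dollar cost of an LLM API call from
# per-million-token prices. The token counts below are made-up examples;
# the $30/$60 input/output prices are GPT-4's earlier list prices cited above.

def call_cost(input_tokens: int, output_tokens: int,
              price_in_per_million: float, price_out_per_million: float) -> float:
    """Return the cost in dollars of one API call."""
    return (input_tokens * price_in_per_million
            + output_tokens * price_out_per_million) / 1_000_000

# Example: 2,000 input tokens and 500 output tokens at $30/$60 per million.
print(f"${call_cost(2_000, 500, 30.0, 60.0):.4f}")  # -> $0.0900
```

A price cut of 50 percent on input and 33 percent on output tokens scales those two terms directly, which is why falling per-token prices translate immediately into lower per-call costs.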
Out of the Black Forest

A new company with deep roots in generative AI made an eye-catching debut.

What’s new: Black Forest Labs, home to alumni of Stability AI, released the Flux.1 family of text-to-image models under a variety of licenses, including open options. The largest of them outperformed Stable Diffusion 3 Ultra, Midjourney v6.0, and DALL·E 3 HD in the company’s internal qualitative tests.

How it works: The Flux.1 models are based on diffusion transformers that were trained using flow matching, a generalization of diffusion. Like other latent diffusion models, given text and a noisy image embedding, they learn to remove the noise. At inference, given text and an embedding of pure noise, they remove the noise in successive steps and render an image using a decoder that was trained for the purpose.
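For readers who want a feel for that inference procedure, here is a minimal sketch of a generic latent-diffusion generation loop. It is not Black Forest Labs’ code; the `denoiser`, `text_encoder`, and `decoder` interfaces, the step count, and the simplified update rule are all assumptions for illustration.

```python
import torch

# Minimal sketch of a generic latent-diffusion inference loop, not the
# Flux.1 implementation. `denoiser`, `text_encoder`, and `decoder` are
# hypothetical modules standing in for the trained components.

def generate(prompt: str, denoiser, text_encoder, decoder,
             num_steps: int = 28, latent_shape=(1, 16, 64, 64)) -> torch.Tensor:
    text_emb = text_encoder(prompt)        # condition generation on the prompt
    latent = torch.randn(latent_shape)     # start from an embedding of pure noise
    for t in reversed(range(num_steps)):   # remove noise in successive steps
        predicted_noise = denoiser(latent, text_emb, t)
        latent = latent - predicted_noise / num_steps  # simplified update rule
    return decoder(latent)                 # render the final image
```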
Results: Black Forest Labs evaluated the models internally in qualitative tests. Given images produced by one of the Flux.1 family and a competitor, roughly 800 people judged which they preferred for various qualities. The two larger versions achieved high scores.
Behind the news: The Black Forest Labs staff includes former core members of Stability AI, which lost many top employees in April. Black Forest CEO Robin Rombach co-authored the papers that introduced VQGAN, latent diffusion, adversarial diffusion distillation, Stable Diffusion XL, and Stable Video Diffusion.

Why it matters: Text-to-image models generally occupy three tiers: large commercial models like Midjourney v6, OpenAI’s DALL·E 3, and Adobe Firefly; offerings that are open-source to varying degrees like Stability AI’s Stable Diffusion 3 Medium; and smaller models that can run locally like Stable Diffusion XL Lightning. The Flux.1 suite checks all the boxes with high marks in head-to-head comparisons.

We’re thinking: In late 2022, Stability AI’s release of the open Stable Diffusion unleashed a wave of innovation. We see a similar wave building on the open versions of Flux.1.
AI Leadership Makes for a Difficult Balance Sheet

OpenAI may be spending roughly twice as much money as it’s bringing in, a sign of the financial pressures of blazing the trail in commercial applications of AI.

What’s new: OpenAI’s operating expenses could amount to $8.5 billion in 2024, according to an estimate by The Information based on anonymous sources. Meanwhile, its annual revenue is shaping up to be around $3.5 billion to $4.5 billion, putting it on course to lose between $4 billion and $5 billion this year.

Revenue versus expenses: The report combined previous reporting with new information from people “with direct knowledge” of OpenAI’s finances and its relationship with Microsoft, which provides computing power for GPT-4o, ChatGPT, and other OpenAI products.
Why it matters: ChatGPT famously grew at an extraordinary pace in 2023, when the number of visits ballooned to 100 million within two months of the service’s launch. OpenAI’s internal sales team turned that enthusiasm into fast-growing revenue, reportedly outpacing even Microsoft’s sales of OpenAI services. Yet that growth rests on top-performing AI models, which are expensive to develop, train, and run.

We’re thinking: OpenAI is a costly undertaking: CEO Sam Altman said it would be “the most capital-intensive startup in Silicon Valley history.” But generative AI is evolving quickly. With OpenAI’s revenue rising, its models becoming more cost-effective (witness GPT-4o mini), and the cost of inference falling, we wouldn’t bet against it.
Machine Translation Goes Agentic

Literary works are challenging to translate. Their length, cultural nuances, idiomatic expressions, and expression of an author’s individual style call for skills beyond swapping words in one language for semantically equivalent words in another. Researchers built a machine translation system to address these issues.

What’s new: Minghao Wu and colleagues at Monash University, University of Macau, and Tencent AI Lab proposed TransAgents, which uses a multi-agent workflow to translate novels from Chinese to English. You can try a demo here.

Key insight: Prompting a large language model (LLM) to translate literature often results in subpar quality. Employing multiple LLMs to mimic the human roles involved in translation breaks this complex problem into more tractable parts. For example, separate LLMs (or instances of a single LLM) can act as agents that take on roles such as translator and localization specialist, and they can check and revise each other’s work. An agentic workflow raises unsolved problems such as how to evaluate individual agents’ performance and how to measure translation quality. This work offers a preliminary exploration.

How it works: TransAgents prompted pretrained LLMs to act like a translation company working on a dataset of novels. The set included 20 Chinese novels, each containing 20 chapters, accompanied by human translations into English.
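To illustrate the kind of role-based workflow described above, here is a minimal sketch. It is not the TransAgents implementation; the `chat` helper, the specific roles, and the prompts are assumptions for illustration.

```python
# Minimal sketch of a role-based translation workflow in the spirit of
# TransAgents, not the authors' implementation. `chat(system, user)` is a
# hypothetical helper that sends one prompt to an LLM and returns its reply.

def translate_chapter(chapter_zh: str, chat) -> str:
    # Translator agent produces a first draft.
    draft = chat("You are a literary translator from Chinese to English.",
                 f"Translate this chapter, preserving tone and style:\n{chapter_zh}")

    # Localization specialist adapts idioms and cultural references.
    localized = chat("You are a localization specialist for English-language readers.",
                     f"Adapt idioms and cultural references in this draft:\n{draft}")

    # Editor agent reviews the draft against the source and revises it.
    final = chat("You are a senior editor checking a translation against its source.",
                 f"Source:\n{chapter_zh}\n\nDraft:\n{localized}\n\n"
                 "Revise the draft to fix mistranslations and awkward phrasing.")
    return final
```

Each call hands the previous agent’s output to the next role, which is what lets the agents check and revise one another’s work.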
Results: Professional translators compared TransAgents’ output with that of human translators and GPT-4 Turbo in a blind test. One said TransAgents “shows the greatest depth and sophistication,” while another praised its “sophisticated wording and personal flair” that “effectively conveys the original text’s mood and meaning.”
Why it matters: While machine translation of ordinary text and conversations has made great strides in the era of LLMs, literary translation remains a frontier. An agentic workflow that breaks the task into subtasks and delegates them to separate LLM instances makes it more manageable and appears to produce results that appeal to human judges (and an LLM as well). That said, this is preliminary work that suggests a need for new ways to measure the quality of literary translations.

We’re thinking: Agentic workflows raise pressing research questions: What is the best way to divide a task for different agents to tackle? How much does the specific prompt at each stage affect the final output? Good answers to questions like these will lead to powerful applications.
A MESSAGE FROM DEEPLEARNING.AI

Are you an experienced developer? Share your coding story and inspire new learners! We’re celebrating the launch of “AI Python for Beginners,” taught by Andrew Ng, and we’d like to feature your story to inspire coders who are just starting out. Submit your story here!
Work With Andrew Ng
Join the teams that are bringing AI to the world! Check out job openings at DeepLearning.AI, AI Fund, and Landing AI.
Subscribe and view previous issues here.
Thoughts, suggestions, feedback? Please send to thebatch@deeplearning.ai. Avoid our newsletter ending up in your spam folder by adding our email address to your contacts list.