Dear friends,
AI Product Management is evolving rapidly. The growth of generative AI and AI-based developer tools has created numerous opportunities to build AI applications. This is making it possible to build new kinds of things, which in turn is driving shifts in best practices in product management — the discipline of defining what to build to serve users — because what is possible to build has shifted. In this letter, I’ll share some best practices I have noticed.
Use concrete examples to specify AI products. Starting with a concrete idea helps teams gain speed. If a product manager (PM) proposes to build “a chatbot to answer banking inquiries that relate to user accounts,” this is a vague specification that leaves much to the imagination. For instance, should the chatbot answer questions only about account balances or also about interest rates, processes for initiating a wire transfer, and so on? But if the PM writes out a number (say, between 10 and 50) of concrete examples of conversations they’d like a chatbot to execute, the scope of their proposal becomes much clearer. Just as a machine learning algorithm needs training examples to learn from, an AI product development team needs concrete examples of what we want an AI system to do. In other words, the data is your PRD (product requirements document)!
Using examples (such as inputs and desired outputs) to specify a product has been helpful for many years, but the explosion of possible AI applications is creating a need for more product managers to learn this practice.
Assess technical feasibility of LLM-based applications by prompting. When a PM scopes out a potential AI application, whether the application can actually be built — that is, its technical feasibility — is a key criterion in deciding what to do next. For many ideas for LLM-based applications, it’s increasingly possible for a PM, who might not be a software engineer, to try prompting — or write just small amounts of code — to get an initial sense of feasibility. For example, a PM may envision a new internal tool for routing emails from customers to the right department (such as customer service, sales, etc.). They can prompt an LLM to see if they can get it to select the right department based on an input email, and see if they can achieve high accuracy. If so, this gives engineering a great starting point from which to implement the tool. If not, the PM can falsify the idea themselves and perhaps improve the product idea much faster than if they had to rely on an engineer to build a prototype.
In addition to using LLMs to help write code for prototyping, tools like Replit, Vercel’s V0, Bolt, and Anthropic’s Artifacts (I’m a fan of all of these!) are making it easier for people without a coding background to build and experiment with simple prototypes. These tools are increasingly accessible to non-technical users, though I find that those who understand basic coding are able to use them much more effectively, so it’s still important to learn basic coding. (Interestingly, highly technical, experienced developers use them too!) Many members of my teams routinely use such tools to prototype, get user feedback, and iterate quickly.
AI is enabling a lot of new applications to be built, creating massive growth in demand for AI product managers who know how to scope out and help drive progress in building these products. AI product management existed before the rise of generative AI, but the increasing ease of building applications is creating greater demand for AI applications, and thus a lot of PMs are learning AI and these emerging best practices for building AI products. I find this discipline fascinating, and will keep on sharing best practices as they grow and evolve.
Keep learning! Andrew
A MESSAGE FROM DEEPLEARNING.AIWrite and code more effectively with OpenAI Canvas, a user-friendly workspace for collaborating with AI. In this free course, explore use cases like building game apps and designing SQL databases from screenshots, and gain insights into how GPT-4o powers Canvas’ features. Join for free
NewsCompetitive Performance, Competitive PricesAmazon introduced a range of models that confront competitors head-on. What’s new: The Nova line from Amazon includes three vision-language models (Nova Premier, Nova Pro, and Nova Lite), one language model (Nova Micro), an image generator (Nova Canvas), and a video generator (Nova Reel). All but Nova Premier are available on Amazon’s Bedrock platform, and Nova Premier, which is the most capable, is expected in early 2025. In addition, Amazon plans to release a speech-to-speech model in early 2025 and a multimodal model that processes text, images, video, and audio by mid-year. (Disclosure: Andrew Ng serves on Amazon’s board of directors.) How it works: Nova models deliver competitive performance at relatively low prices. Amazon hasn’t disclosed parameter counts or details about how the models were built except to say that Nova Pro, Lite, and Micro were trained on a combination of proprietary, licensed, public, and open-source text, images, and video in over 200 languages.
Behind the news: The company launched Bedrock in April 2023 with Stability AI’s Stable Diffusion for image generation, Anthropic’s Claude and AI21’s Jurassic-2 for text generation, and its own Titan models for text generation and embeddings. Not long afterward, it added language models from Cohere as well as services for agentic applications and medical applications. It plans to continue to provide models from other companies (including Anthropic), offering a range of choices. Why it matters: While other AI giants raced to outdo one another in models for text and multimodal processing, Amazon was relatively quiet. With Nova, it has staked out a strong position in those areas, as well as the startup-dominated domains of image and video generation. Moreover, it’s strengthening its cloud AI offerings with competitive performance, pricing, and speed. Nova’s pricing continues the rapid drop in AI prices over the last year. Falling per-token prices help make AI agents or applications that process large inputs more practical. For example, Simon Willison, developer of the Django Python framework for web applications, found that Nova Lite generated descriptions for his photo library (tens of thousands of images) for less than $10. We’re thinking: The Nova suite is available via APIs only, without a web-based user interface. This accords with Amazon Web Services’ focus on developers. For consumers, Amazon offers the Rufus shopping bot.
Higher ReasoningOpenAI launched not only its highly anticipated o1 model but also an operating mode that enables the model to deliver higher performance — at a hefty price. What’s new: Kicking off a 12-day holiday blitz, OpenAI launched o1 (previously available in preview and mini versions) and introduced o1 pro mode, which processes more tokens at inference to produce more accurate output. Both options accept text and image inputs to generate text outputs. They’re available exclusively through a new ChatGPT Pro subscription for $200 monthly. API access is not yet available. How it works: According to an updated system card, o1 models were trained on a mix of public, licensed, and proprietary text, code, and images, with a focus on technical, academic, and structured datasets. They respond to prompts by breaking them down into intermediate steps, each of which consumes a number of hidden “reasoning tokens.” The models don’t reveal these steps, but ChatGPT presents a natural-language summary of the reasoning process. The new o1 and o1 pro mode perform better than o1-preview and o1-mini, but their additional reasoning requires more processing, which translates into higher costs and slower responses.
Behind the news: Since September, when OpenAI introduced o1-preview and o1-mini, other model providers have implemented similar reasoning capabilities. DeepSeek’s R1 displays reasoning steps that o1 models keep hidden. Alibaba’s QwQ 32B excels at visual reasoning but is slower and has a smaller context window. Amazon’s Nova Premier, which is billed as a model for “complex reasoning tasks,” is expected in early 2025, but Amazon has not yet described its performance, architecture, or other details. Why it matters: o1 and o1 pro mode highlight a dramatic shift in model development and pricing. Giving models more processing power at inference enables them to provide more accurate output, and it’s a key part of agentic workflows. It also continues to boost performance even as scaling laws that predict better performance with more training data and compute may be reaching their limits. However, it also raises OpenAI’s costs, and at $200 a month, the price of access to o1 and o1 pro is steep. It’s a premium choice for developers who require exceptional accuracy or extensive reasoning. We’re thinking: Discovering scaling laws for using more processing at inference, or test-time compute, is an unsolved problem. Although OpenAI hasn’t disclosed the algorithm behind o1 pro mode, recent work at Google allocated tokens dynamically at inference based on a prompt’s difficulty. This approach boosted the compute efficiency by four times and enabled a model that had shown “nontrivial success rates” to outperform one that was 14 times larger.
Learn More About the Week in AI!AI news is moving faster than ever. Data Points helps you make sense of it just as fast. Data Points, a sister publication of The Batch, arrives in your inbox twice a week with six brief news stories in each issue. Read the latest issues and subscribe today!
Game Worlds on TapA new model improves on recent progress in generating interactive virtual worlds from still images. What’s new: Jack Parker-Holder and colleagues from Google introduced Genie 2, which generates three-dimensional video game worlds that respond to keyboard inputs in real time. The model’s output remains consistent (that is, elements don’t morph or disappear) for up to a minute, and it includes first-person shooters, walking simulators, and driving games from viewpoints that include first person, third person, and isometric. Genie 2 follows up on Genie, which generates two-dimensional games. How it works: Genie 2 is a latent diffusion model that generates video frames made up of an encoder, transformer, and decoder. The developers didn’t reveal how they built it or how they improved on earlier efforts.
Behind the news: Genie 2 arrives on the heels of Oasis, which generates a Minecraft-like game in real time. Unlike Oasis, Genie 2 worlds are more consistent and not limited to one type of game. It also comes at the same time as another videogame generator, World Labs. However, where Genie 2 generates the next frame given previous frames and keyboard input (acting, in terms of game development, as both graphics and physics engines), World Labs generates a 3D mesh of a game world from a single 2D image. This leaves the implementation of physics, graphics rendering, the player’s character, and other game mechanics to external software. Why it matters: Genie 2 extends models that visualize 3D scenes based on 2D images to encompass interactive worlds, a capability that could prove valuable in design, gaming, virtual reality, and other 3D applications. It generates imagery that, the authors suggest, could serve as training data for agents to learn how to navigate and respond to commands in 3D environments. We’re thinking: Generating gameplay directly in the manner of Genie 2 is a quick approach to developing a game, but the current technology comes with caveats. Developers can’t yet control a game’s physics or mechanics and they must manage any flaws in the model (such as a tendency to generate inconsistent worlds). In contrast, generating a 3D mesh, as World Labs does, is a more cumbersome approach, but it gives developers more control.
Getting the Facts RightLarge language models that remember more hallucinate less. What’s new: Johnny Li and colleagues at Lamini introduced Mixture of Memory Experts (MoME), a method that enables large language models (LLMs) to memorize many facts with relatively modest computational requirements. (Disclosure: Andrew Ng invested in Lamini.) Key insight: The key to getting factual answers from LLMs is to keep training it until it chooses the correct answer every time. In technical terms, train past the point where tokens relevant to the answer have a similar probability distribution, and continue until a single token has 100 percent probability. But this amount of training takes a lot of computation and, since the model may overfit the training set, it also may degrade performance on the test set. Fine-tuning is one solution, and fine-tuning a LoRA adapter to memorize facts reduces the computational burden. But a single LoRA adapter isn’t enough to store all of the knowledge in a large dataset. Training multiple adapters that are selected by cross-attention enables the LLM to memorize a variety of facts. How it works: The authors extended a pretrained Llama-3-8B with a large number (on the order of 1 million) of LoRA adapters and a cross-attention layer. They froze Llama-3-8B and trained the LoRA adapters to predict the next token in a custom dataset of over 1 million questions and answers.
Results: The authors tested their LoRA-enhanced model’s ability to answer questions about a database via SQL queries. The model, which was outfitted for retrieval-augmented generation (RAG), achieved 94.7 percent accuracy. An unnamed model with RAG achieved 50 percent accuracy. Yes, but: It stands to reason that the authors’ approach saves processing, but it’s unclear how much. The authors didn’t mention the cost of fine-tuning Llama-3-8B in the usual way on their training dataset for the same number of epochs. Why it matters: The authors argue that eliminating hallucinations is possible in typical training, it’s just computationally very expensive (not to mention the risk of overfitting). An architecture designed to store and retrieve facts, via LoRA adapters in this case, makes the process more feasible. We’re thinking: While some researchers want large language models to memorize facts, others want them to avoid memorizing their training data. These aims address very different problems. Preventing LLMs from memorizing training data would make them less likely to regurgitate it verbatim and thus violate copyrights. On the other hand, this work memorizes facts so the model can deliver consistent, truthful responses that might be stated in a variety of ways.
Work With Andrew Ng
Join the teams that are bringing AI to the world! Check out job openings at DeepLearning.AI, AI Fund, and Landing AI.
Subscribe and view previous issues here.
Thoughts, suggestions, feedback? Please send to thebatch@deeplearning.ai. Avoid our newsletter ending up in your spam folder by adding our email address to your contacts list.
|