Dear friends,
The past week has been an unusual split-screen time in AI. On one side, I see rapid innovation from OpenAI, Elon Musk's Grok, and the open source Yi-34B large language model from Kai-Fu Lee's 01.AI. On the other side, the White House and participants in last week's AI Safety Summit are crafting regulations that may slow down AI by stifling innovation and limiting open source.
In my experience, speed in decision-making and execution is a huge predictor of startup success. Bearing in mind the importance of responsible AI, I respect leaders and teams that make decisions and execute quickly. In contrast, I've also seen companies where shipping a feature can require three months of legal, marketing, and privacy review. Systematically forcing yourself to make a decision quickly, rather than calling another meeting to talk about a topic some more (unless it's really necessary), can push an organization to move faster.
Keep learning!
Andrew
P.S. Vector databases are a backbone of large language model (LLM) search and data-retrieval systems, for example in retrieval augmented generation (RAG). In our new short course, created with Weaviate and taught by Sebastian Witalec, you’ll learn the technical foundations of how vector databases work and how to incorporate them in your LLM applications. You’ll also learn to build RAG and search applications. I invite you to sign up for “Vector Databases: from Embeddings to Applications”!
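As a taste of what the course covers, here is a minimal sketch of the core operation a vector database performs: embedding documents and retrieving the ones nearest a query by cosine similarity. The embed() function is a hypothetical stand-in for a real embedding model, and a production system would use a purpose-built index (as Weaviate does) rather than the brute-force search shown here.

```python
# Minimal sketch of the retrieval step at the heart of a vector database.
# embed() is a toy stand-in; a real system would call an embedding model.
import numpy as np

def embed(text: str) -> np.ndarray:
    # Hypothetical placeholder: hash the text into a fixed-size unit vector.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=128)
    return v / np.linalg.norm(v)

documents = [
    "Vector databases store embeddings for fast similarity search.",
    "Retrieval augmented generation grounds an LLM in retrieved text.",
    "Bletchley Park hosted the UK's first AI Safety Summit.",
]
index = np.stack([embed(d) for d in documents])  # one row per document

def search(query: str, k: int = 2) -> list[str]:
    scores = index @ embed(query)  # cosine similarity, since vectors are unit length
    return [documents[i] for i in np.argsort(-scores)[:k]]

print(search("How does RAG work?"))
```

In a RAG system, the retrieved passages would then be inserted into the LLM's prompt to ground its answer.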
News

Generative AI as Development Platform

OpenAI added new features designed to help developers build applications using its generative models.

Upgrades and more: The company rolled out the upgraded GPT-4 Turbo (which now underpins ChatGPT). It extended API access to its DALL·E 3 image generator, text-to-speech engine, speech recognition, and agent-style capabilities. And it showed off a new concept in chatbots called GPTs.
Why it matters: OpenAI is enabling developers to build intelligence into an ever wider range of applications. GPT-4 Turbo's 128,000-token context window makes possible applications that require tracking information across huge volumes of input. The expanded APIs open up language, vision, and multimodal capabilities as well as agent-style applications that respond to changing conditions and behave in complex ways. The opportunities for developers are immense.
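For example, a minimal sketch of calling GPT-4 Turbo through OpenAI's Python SDK (version 1.x, released alongside these announcements) might look like the following. The model identifier is the preview name announced at the time, and the input file is hypothetical; check OpenAI's current model list before relying on either.

```python
# Sketch: summarizing a long document with GPT-4 Turbo's 128,000-token context.
# Assumes the openai Python package (>= 1.0) and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY automatically

# A document far longer than earlier context windows allowed (hypothetical file).
long_document = open("annual_report.txt").read()

response = client.chat.completions.create(
    model="gpt-4-1106-preview",  # GPT-4 Turbo preview identifier at launch
    messages=[
        {"role": "system", "content": "You summarize long documents accurately."},
        {"role": "user", "content": "Summarize the key points:\n\n" + long_document},
    ],
)
print(response.choices[0].message.content)
```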
AI Safety Summit Mulls Risks

An international conference of political leaders and tech executives agreed to cooperate on managing AI risks.

What's new: 28 countries including China and the United States, as well as the European Union, signed a declaration aimed at mitigating AI risks.

How it works: The declaration kicked off the United Kingdom's first AI Safety Summit at Bletchley Park, a country house outside London where Alan Turing and others cracked Germany's Enigma code during World War II.
More to come: The AI Safety Summit is set to be the first in a series. South Korea will host a follow-up in six months, and France will host a third summit six months after that.

Yes, but: Critics found the conference wanting. Some researchers criticized it for failing to endorse concrete limits on AI. Others blamed the speakers for promoting fear, particularly UK Prime Minister Rishi Sunak, who compared the risks posed by AI to those of a global pandemic or nuclear war.

Why it matters: AI is developing rapidly, and regulatory frameworks are already emerging in China, Europe, and the U.S. The summit is an effort to lay the groundwork for a coherent international framework.

We're thinking: We applaud approaches that engage leaders in government, industry, and research. But we remain concerned that exaggerated fear of risks may lead to regulations that stifle innovation, especially by limiting open source development. UK Deputy Prime Minister Oliver Dowden spoke about the value of open source and said there should be a very high bar to restrict open source in any way. We heartily agree!
A MESSAGE FROM DEEPLEARNING.AI

Learn how to use vector databases with large language models to build applications that include hybrid and multilingual searches! Take our new course, “Vector Databases: from Embeddings to Applications.” Enroll for free
The Language of Schizophrenia

Large language models may help psychiatrists resolve unanswered questions about mental illness.

What's new: Researchers from University College London, Beijing Normal University, and Lisbon's Champalimaud Centre for the Unknown used a large language model to measure differences in the ways people with schizophrenia use words.

How it works: The authors asked 26 people who had been diagnosed with schizophrenia and 26 people who hadn't to name as many animals as they could in five minutes. They also asked the subjects to name as many words that start with the letter “P” as they could in five minutes. Using fastText word embeddings, they measured how semantically related consecutive words in each subject's list were and compared the actual ordering to an optimal, maximally clustered one.
Results: Responses by subjects with schizophrenia showed greater randomness. To control for variations in the contents of different subjects' lists, the researchers expressed the degree of randomness as a standard score, in which 0 indicates a completely random ordering and more negative scores indicate orderings closer to the optimal one. On average, people with schizophrenia scored -5.81, while people without schizophrenia scored -7.02.

Why it matters: The fastText model's embeddings helped the authors demonstrate a relationship between cognitive activity and psychiatric symptoms that previously was purely theoretical. Such a relationship has been difficult to establish through brain imaging or traditional testing.

We're thinking: It's important to note that the authors don't propose using their method as a diagnostic tool to determine whether or not a patient has schizophrenia. Unlike diagnosing, say, a cancerous tumor, establishing ground truth in mental illness is extremely complicated. The fact that AI-based measurements agree with doctors' assessments is a very positive sign.
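To make the scoring concrete, here is a minimal sketch of one way such a standard score can be computed: compare the semantic cost of a subject's actual word ordering to the distribution of costs over random shuffles of the same list. The embed() function is a hypothetical stand-in for fastText vectors, and the shuffle-based baseline is our assumption about how a score of this shape can be computed, not necessarily the authors' exact procedure.

```python
# Sketch: standard score of a word ordering relative to random shuffles.
# Near 0 means the order looks random; more negative means more semantically clustered.
import numpy as np

def embed(word: str) -> np.ndarray:
    # Hypothetical stand-in for fastText word embeddings (unit vectors).
    r = np.random.default_rng(abs(hash(word)) % (2**32))
    v = r.normal(size=50)
    return v / np.linalg.norm(v)

def ordering_cost(words: list[str]) -> float:
    # Total semantic distance between consecutive words in the list.
    vecs = [embed(w) for w in words]
    return sum(1.0 - a @ b for a, b in zip(vecs, vecs[1:]))

def randomness_score(words: list[str], n_shuffles: int = 1000) -> float:
    rng = np.random.default_rng(0)
    actual = ordering_cost(words)
    shuffled_costs = []
    for _ in range(n_shuffles):
        perm = list(words)
        rng.shuffle(perm)
        shuffled_costs.append(ordering_cost(perm))
    # Z-score of the actual ordering against the shuffled baseline.
    return (actual - np.mean(shuffled_costs)) / np.std(shuffled_costs)

print(randomness_score(["cat", "dog", "wolf", "salmon", "trout", "eel"]))
```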
Synthetic Data Helps Image Classification

Generated images can be more effective than real ones in training a vision model to classify images.

What's new: Yonglong Tian, Lijie Fan, and colleagues at Google and MIT introduced StableRep, a self-supervised method that trains vision transformers on images generated by Stability AI's Stable Diffusion image generator.

Key insight: Models that employ a contrastive loss learn to represent examples as more or less similar. For example, images that depict a particular object are more similar to each other, and images that depict other objects are less similar to the first group. The training method known as SimCLR uses a contrastive loss with two augmented (cropped, rotated, flipped, and so on) versions of each image, so a model learns that augmented versions of the same image are similar to one another but not to augmented versions of other images. Given a prompt, an image generator produces images that are closely related yet differ far more than augmented versions of a single image. This greater variety among similar examples can lead to more effective learning with a contrastive loss.

How it works: The authors generated images and trained a vision transformer on them using a contrastive loss, along the lines of the sketch below.
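Here is a minimal sketch of a contrastive loss in this spirit, in which images generated from the same prompt are treated as positives for one another (a multi-positive, SupCon-style objective). It is a simplified illustration under our assumptions, not the authors' exact loss.

```python
# Sketch: multi-positive contrastive loss over embeddings of generated images.
import torch
import torch.nn.functional as F

def multi_positive_contrastive_loss(z: torch.Tensor,
                                    prompt_ids: torch.Tensor,
                                    temperature: float = 0.1) -> torch.Tensor:
    """z: (N, D) image embeddings; prompt_ids: (N,) id of the prompt that produced each image."""
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / temperature                   # pairwise cosine similarities
    n = z.shape[0]
    self_mask = torch.eye(n, dtype=torch.bool)
    # Positives: images from the same prompt, excluding each image itself.
    pos_mask = (prompt_ids[:, None] == prompt_ids[None, :]) & ~self_mask
    sim = sim.masked_fill(self_mask, float("-inf"))  # exclude self-similarity
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)
    # Average negative log-probability over each example's positives.
    loss = -(log_prob * pos_mask).sum(dim=1) / pos_mask.sum(dim=1).clamp(min=1)
    return loss.mean()

# Toy usage: 8 images from 4 prompts (2 images per prompt), 16-dim embeddings.
z = torch.randn(8, 16)
prompt_ids = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
print(multi_positive_contrastive_loss(z, prompt_ids))
```

Compared with SimCLR's two augmented views, grouping several distinct generations of the same prompt as positives gives the model more varied examples of what counts as "similar."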
Results: The authors compared the ViT-B/16 trained using StableRep to two models of the same architecture trained using SimCLR (one using generated images, the other using images from Conceptual Captions). They also compared it to two CLIP models that produced matching embeddings for images and their paired captions, one trained on generated images and their prompts, the other on real images and their captions. For each of 11 computer vision datasets, the authors trained a linear classifier on top of each model without changing the model's weights. Comparing the classifiers' performance, StableRep achieved the best results on 9 of them. For example, on FGVC-Aircraft (10,000 images of 100 different aircraft), StableRep achieved 57.6 percent accuracy, while the best competing model, CLIP pretrained on generated images, scored 53.5 percent.

Why it matters: The fact that text-to-image generators can produce images of similar things that are quite different in appearance makes them a powerful resource for training vision models. And they provide a practically unlimited source of such images!

We're thinking: Different foundation models understand different aspects of the world. It's exciting that a large diffusion model, which is good at generating images, can be used to train a large vision transformer, which is good at analyzing images!
A MESSAGE FROM FOURTHBRAIN

Join us for two live workshops! Learn to leverage large language models in these interactive, hands-on sessions. Team registrations are available. Register here
Work With Andrew Ng
Join the teams that are bringing AI to the world! Check out job openings at DeepLearning.AI, AI Fund, and Landing AI.
Subscribe and view previous issues here.
Thoughts, suggestions, feedback? Please send to thebatch@deeplearning.ai. To keep our newsletter out of your spam folder, please add our email address to your contacts list.