Dear friends,
As the winter holiday approaches, it occurs to me that, instead of facing AI winter, we are in a boiling-hot summer of AI.
And supervised learning is still far from achieving even a small fraction of its potential! Millions of applications that can be solved by supervised learning have not yet been built. Many teams are still trying to figure out best practices for developing products through supervised learning.
Happy holidays, Andrew
Top AI Stories of 2022
A Dazzling Year in AI
As we settle into a cup of hot cocoa and badger ChatGPT to suggest holiday gifts for our loved ones, we reflect on a year of tremendous advances in AI. Systems that generate human-like text, images, and code — with video and music on the horizon — delighted users even as they raised questions about the future of creativity. Models that decode chemistry and physics drove scientific discovery, while governments moved to control the supply of specialized microprocessors that make such innovations possible. While such developments give us pause, in this special issue of The Batch — as in past years at this season — we survey the marvels wrought by AI in 2022.
Synthetic Images Everywhere
Pictures produced by AI went viral, stirred controversies, and drove investments.
Yes, but: Such models are trained on images scraped from the web. Like large language models, they inherit biases embedded in online content and can imitate inflammatory styles of expression found there.
Behind the news: Diffusion models generate output by starting with noise and removing it selectively over a series of steps. Introduced in 2015 by researchers at UC Berkeley and Stanford, they remained in the background for several years until further work showed that they could produce images competitive with the output of generative adversarial networks (GANs). Stability AI put a diffusion model at the heart of Stable Diffusion. OpenAI, which based the initial version of DALL·E on an autoregressive transformer, updated it with a diffusion model at around the same time.
Where things stand: The coming year is shaping up for a revolution in computer-aided creativity. And the groundswell of generated imagery isn’t going to stop at pictures. Google and Meta released impressive text-to-video models this year, and OpenAI accelerated text-to-3D-object generation by an order of magnitude.
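The denoising loop described above can be sketched roughly as follows. This is a toy illustration, not a faithful DDPM sampler: `predict_noise` is a hypothetical stand-in for a trained neural network, and the update rule is simplified.

```python
import numpy as np

def predict_noise(x, t):
    # Hypothetical stand-in for a trained denoising network that would
    # be conditioned on the timestep t. Here it just returns a scaled
    # copy of the input so the loop runs end to end.
    return 0.1 * x

def sample(shape, steps=50, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)   # start from pure Gaussian noise
    for t in reversed(range(steps)):
        x = x - predict_noise(x, t)  # selectively remove predicted noise
    return x

img = sample((8, 8))
print(img.shape)  # (8, 8)
```

In a real diffusion model, the network learns to predict the noise added at each step during training, and sampling reverses that process one step at a time.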
Programmer’s Best Friend
Behind schedule on a software project? There’s an app for that.
Driving the story: AI-powered code generators made their way into large companies, and even small-time developers (and non-developers) gained access to them.
Behind the news: Users of OpenAI’s GPT-3 language model showed that it could generate working code as early as mid-2020. A year later, OpenAI introduced a fine-tuned version known as Codex, which serves as the foundation for GitHub's Copilot.
Yes, but: The widely available versions of this technology aren’t yet able to write complex programs. Often their output looks right at first glance but turns out to be buggy. Moreover, their legal status may be in jeopardy. A class-action lawsuit against GitHub, OpenAI, and Microsoft claims that the training of Codex violated open source licensing agreements. The outcome could have legal implications for models that generate text, images, and other media as well.
Where things stand: AI-powered coding tools aren’t likely to replace human programmers in the near future, but they may replace the tech question-and-answer site Stack Overflow as the developer’s favorite crutch.
AI’s Eyes Evolve
Work on vision transformers exploded in 2022.
What happened: Researchers published well over 17,000 ViT papers during the year. A major theme: combining self-attention and convolution.
Driving the story: A team at Google Brain introduced vision transformers (ViTs) in 2020, and the architecture has undergone nonstop refinement since then. The latest efforts adapt ViTs to new tasks and address their shortcomings.
Behind the news: While much ViT research aims to surpass and ultimately replace convolutional neural networks (CNNs), the more potent trend is to marry the two. The ViT’s strength lies in its ability to consider relationships among all pixels in an image, at small and large scales alike. One downside is that, after random initialization, it needs additional training to learn behaviors that are baked into the CNN architecture. The CNN’s local context window (within which only nearby pixels matter) and weight sharing (which enables it to process different image locations identically) help hybrid models learn more from less data.
Where things stand: The past year expanded the ViT’s scope in a number of applications. ViTs generated plausible successive video frames, generated 3D scenes from 2D image sequences, and detected objects in point clouds. It's hard to imagine recent advances in diffusion-based text-to-image generators without them.
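The contrast above can be made concrete with a minimal sketch of the ViT idea, using toy dimensions and identity projections for illustration: an image is split into patches, and self-attention lets every patch exchange information with every other patch — unlike a CNN kernel, which sees only a local window.

```python
import numpy as np

def image_to_patches(img, patch=4):
    # Split an image into non-overlapping flattened patches,
    # the first step of a vision transformer.
    h, w = img.shape
    return np.array([img[i:i + patch, j:j + patch].ravel()
                     for i in range(0, h, patch)
                     for j in range(0, w, patch)])

def self_attention(x):
    # Single-head attention with identity query/key/value projections,
    # kept deliberately simple; a real ViT learns these projections.
    scores = x @ x.T / np.sqrt(x.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ x  # every patch mixes information from all patches

img = np.arange(64, dtype=float).reshape(8, 8)
patches = image_to_patches(img)      # 4 patches of 16 pixels each
out = self_attention(patches)
print(patches.shape, out.shape)      # (4, 16) (4, 16)
```

The global mixing in `self_attention` is what gives ViTs their large-scale receptive field, while hybrid designs reintroduce convolution's locality and weight sharing to improve data efficiency.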
A MESSAGE FROM DEEPLEARNING.AI
Mathematics for Machine Learning and Data Science is our next specialization. Set to launch in January 2023, it’s a beginner-friendly way to master the math behind AI algorithms and data analysis techniques. Join the waitlist and be among the first to enroll!
Language Models, Extended
Researchers pushed the boundaries of language models to address persistent problems of trustworthiness, bias, and updatability.
Driving the story: The capacity of language models to generate plausible text outstrips their ability to discern facts and resist spinning fantasies and expressing social biases. Researchers worked to make their output more trustworthy and less inflammatory.
Behind the news: Amid the progress came a few notable stumbles. The public demo of Meta’s Galactica, a language model trained to generate text on scientific and technical subjects, lasted three days in November before its developers pulled the plug due to its propensity to generate falsehoods and cite nonexistent sources. In August, the chatbot BlenderBot 3, also from Meta, quickly gained a reputation for spouting racist stereotypes and conspiracy theories.
Where things stand: The toolbox of techniques for truthful and civil text generation grew substantially in the past year. Successful techniques will find their way into future waves of blockbuster models.
One Model Does It All
Individual deep learning models proved their mettle in hundreds of tasks.
Driving the story: Researchers pushed the limits of how many different skills a neural network can learn. They were inspired by the emergent skills of large language models — say, the ability to compose poetry and write computer programs without architectural tuning for either — as well as the capacity of models trained on both text and images to find correspondences between the disparate data types.
Behind the news: The latest draft of the European Union’s proposed AI Act, which could become law in 2023, would require users of general-purpose AI systems to register with the authorities, assess their systems for potential misuse, and conduct regular audits. The draft defines general-purpose systems as those that “perform generally applicable functions such as image/speech recognition, audio/video generation, pattern-detection, question-answering, translation, etc.,” and are able to “have multiple intended and unintended purposes.” Some observers have criticized the definition as too broad. The emerging breed of truly general-purpose models may prompt regulators to sharpen their definition.
Where things stand: We’re still in the early phases of building algorithms that generalize to hundreds of different tasks, but the year showed that deep learning has the potential to get us there.
Subscribe and view previous issues here.
Thoughts, suggestions, feedback? Please send to thebatch@deeplearning.ai. Avoid our newsletter ending up in your spam folder by adding our email address to your contacts list.