Dear friends,
Prompt-based development is making the machine learning development cycle much faster: Projects that used to take months now may take days. I wrote in an earlier letter that this rapid development is causing developers to do away with test sets.
The speed of prompt-based development is also changing the process of scoping projects. In lieu of careful planning, it’s increasingly viable to throw a lot of projects at the wall to see what sticks, because each throw is inexpensive.
I find this workflow exciting because, in addition to increasing the speed of iteration for individual projects, it significantly increases the volume of ideas we can try. In addition to plotting the sentiment of customer emails, why not experiment with automatically routing emails to the right department, providing a brief summary of each email to managers, clustering emails to help spot trends, and many more creative ideas? Instead of planning and executing one machine learning feature, it’s increasingly possible to build many, quickly check if they look good, ship them to users if so, and get rapid feedback to drive the next step of decision making.
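To give a sense of how inexpensive each "throw" is, here's a minimal sketch (my illustration, not from an earlier letter; the model name and prompt are placeholders) of a prompt-based sentiment classifier using the OpenAI Python library:

```python
import openai  # assumes OPENAI_API_KEY is set in your environment

def classify_sentiment(email_text: str) -> str:
    """Label a customer email's sentiment with a single LLM call."""
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",  # illustrative choice of model
        messages=[{
            "role": "user",
            "content": "Classify the sentiment of this customer email as "
                       "positive, negative, or neutral. Reply with one word.\n\n"
                       f"Email:\n{email_text}",
        }],
        temperature=0,  # deterministic output for classification
    )
    return response.choices[0].message.content.strip()

print(classify_sentiment("I love the new dashboard, but checkout is broken."))
```

Swapping in a different prompt turns the same few lines into a router, a summarizer, or a trend-spotter, which is exactly why trying many ideas has become so cheap.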
One important caveat: As I mentioned in the letter about eliminating test sets, we shouldn’t let the speed of iteration lead us to forgo responsible AI. It’s fantastic that we can ship quick-and-dirty applications. But if there is risk of nontrivial harm such as bias, unfairness, privacy violation, or malevolent uses that outweigh beneficial uses, we have a responsibility to evaluate our systems’ performance carefully and ensure that they’re safe before we deploy them widely.
What ideas do you have for prompt-based applications? If you brainstorm a few different ways such applications could be useful to you or your company, I hope you’ll implement many of them (safely and responsibly) and see if some can add value!
Keep learning!
Andrew
P.S. We just announced a new short course today, LangChain: Chat with Your Data, built in collaboration with Harrison Chase, creator of the open-source LangChain framework. In this course, you’ll learn how to build one of the most-requested LLM-based applications: Answering questions based on information in a document or collection of documents. This one-hour course teaches you how to do that using retrieval augmented generation (RAG). It also covers how to use vector stores and embeddings to retrieve document chunks relevant to a query.
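As a taste of the pattern the course teaches, here's a minimal RAG sketch (my own illustration, not course code; the file path and model are placeholders, and LangChain's interfaces vary by version):

```python
# Minimal retrieval augmented generation (RAG) sketch with LangChain.
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA

# 1. Load a document and split it into chunks.
docs = PyPDFLoader("my_document.pdf").load()  # placeholder path
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=150
).split_documents(docs)

# 2. Embed the chunks and index them in a vector store.
vectordb = Chroma.from_documents(chunks, OpenAIEmbeddings())

# 3. Retrieve chunks relevant to a query and let the LLM answer over them.
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0),
    retriever=vectordb.as_retriever(),
)
print(qa.run("What are the key takeaways of this document?"))
```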
News

The Secret Life of Data Labelers

The business of supplying labeled data for building AI systems is a global industry. But the people who do the labeling face challenges that impinge on the quality of both their work and their lives.

What’s new: The Verge interviewed more than two dozen data annotators, revealing a difficult, precarious gig economy. Workers often find themselves jaded by low pay, uncertain schedules, escalating complexity, and deep secrecy about what they’re doing and why.
What they’re saying: “AI doesn’t replace work. But it does change how work is organized.” —Erik Duhaime, CEO, Centaur Labs

Behind the news: Stanford computer scientist Fei-Fei Li was an early pioneer in crowdsourcing data annotations. In 2007, she led a team at Princeton to scale the number of images used to train an image recognizer from tens of thousands to millions. To get the work done, the team hired thousands of workers via Amazon’s Mechanical Turk platform. The result was ImageNet, a key computer vision dataset.

Why it matters: Developing high-performance AI systems depends on accurately annotated data. Yet the harsh economics of annotating at scale encourages service providers to automate the work and workers to either cut corners or drop out. Notwithstanding recent improvements — for instance, Google raised its base wage for contractors who evaluate search results and ads to $15 per hour — everyone would benefit from treating data annotation less like gig work and more like a profession.
Making Government Multilingual

An app is bridging the language gap between the Indian government and its citizens, who speak a wide variety of languages.

What’s new: Jugalbandi helps Indians learn about government services, which typically are described online in English and Hindi, in their native tongues. The project is a collaboration between Microsoft and open-source developers AI4Bharat and OpenNyAI.
Behind the news: While language models are helping citizens understand their governments, they’re also helping governments understand their citizens. In March, Romania launched ION, an AI system that scans social media comments on government officials and policy and summarizes them for ministers to read.

Why it matters: India is a highly multilingual society, and around a quarter of its 1.4 billion residents are illiterate. Consequently, many people in India struggle to receive government benefits and interact with central authorities. This approach may enable Indians to use their own language via WhatsApp, which has more than 400 million users in that country.

We’re thinking: In February, Microsoft researchers showed that large language models are approaching state-of-the-art results in machine translation. Indeed, machine translation is headed toward a revolution as models like GPT-3.5 (used in the study) and GPT-4 (which is even better) make translations considerably easier and more accurate.
A MESSAGE FROM DEEPLEARNING.AI

Chatting with data is a highly valuable use case for large language models. In this short course, you’ll use the open-source LangChain framework to build a chatbot that interacts with your business or personal data. Enroll in “LangChain: Chat with Your Data” today for free!
Letting Chatbots See Your Data

A new coding framework lets you pipe your own data into large language models.

What’s new: LlamaIndex streamlines the code that enables developers to summarize, reason over, and otherwise manipulate data from documents, databases, and apps using models like GPT-4.
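To give a sense of how streamlined this can be, here's a brief sketch (my illustration, not the project's documentation; the data directory is a placeholder, and interface names may differ across versions) using LlamaIndex's high-level API:

```python
# Illustrative LlamaIndex usage: index local files, then query them with an LLM.
from llama_index import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("data").load_data()  # placeholder folder of files
index = VectorStoreIndex.from_documents(documents)     # embeds and indexes the docs
query_engine = index.as_query_engine()                 # answers queries over the index
print(query_engine.query("Summarize the main points of these documents."))
```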
Behind the news: Former Uber research scientist Jerry Liu began building LlamaIndex (originally GPT Index) in late 2022 and co-founded a company around it earlier this year. The company, which recently received $8.5 million in seed funding, plans to launch an enterprise version later this year.
Bug Finder

One challenge to making online education available worldwide is evaluating an immense volume of student work. Especially difficult is evaluating interactive computer programming assignments such as coding a game. A deep learning system automated the process by finding mistakes in completed assignments.

What’s new: Evan Zheran Liu and colleagues at Stanford proposed DreamGrader, a system that integrates reinforcement and supervised learning to identify errors (undesirable behaviors) in interactive computer programs and provide detailed information about where the problems lie.

Key insight: A reinforcement learning model can play a game, randomly at first, and, if it receives the proper rewards, learn to take actions that bring about an error. A classifier can learn to recognize that the error occurred, randomly at first, and reward the RL model when it triggers the error. In this scheme, training requires a small number of student submissions labeled with a particular error that is known to occur. The two models learn in alternating fashion: the RL model plays for a while and does or doesn’t bring about the error; the classifier classifies the RL model’s actions (that is, it predicts whether they triggered the error and, if so, dispenses a reward); then the RL model plays more, and so on. By repeating this cycle, the classifier learns to recognize the error reliably.

How it works: DreamGrader was trained on a subset of 3,500 anonymized student responses to an assignment from the online educational platform Code.org. Students were asked to code Bounce, a game in which a single player moves a paddle along a horizontal axis to send a ball into a goal. The authors identified eight possible errors (such as the ball bouncing out of the goal after entering and no new ball being launched after a goal was scored) and labeled the examples accordingly. The system comprised two components for each type of error: (i) a player that played the game (a double dueling deep Q-network) and (ii) a classifier (an LSTM followed by a vanilla neural network) that decided whether the error occurred.
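To make the alternating loop concrete, here's a toy sketch of the co-training pattern described above. It is entirely my own simplification, not the authors' code: the "student program," player, and classifier are stubbed (a tabular bandit and a per-event score stand in for the real DQN and LSTM).

```python
import random
from collections import defaultdict

# Toy co-training loop in the spirit of DreamGrader (heavily simplified).
ACTIONS = ["move_left", "move_right", "launch_ball", "score_goal"]

def run_program(action):
    """Stub 'student program': a hidden bug emits an event on one action."""
    return "no_new_ball" if action == "score_goal" else "ok"

# A few labeled submissions: (observed event, has_error) pairs stand in
# for labeled student trajectories.
labeled = [("no_new_ball", True), ("ok", False), ("ok", False)]

q = defaultdict(float)             # player: tabular action values
event_scores = defaultdict(float)  # classifier: per-event evidence of the error

for step in range(200):
    # Classifier phase: fit event scores to the labeled submissions.
    for event, has_error in labeled:
        event_scores[event] += 0.1 if has_error else -0.1

    # Player phase: epsilon-greedy action, rewarded when the classifier
    # judges that the observed event indicates the error.
    if random.random() < 0.2:
        action = random.choice(ACTIONS)
    else:
        action = max(ACTIONS, key=lambda a: q[a])
    event = run_program(action)
    reward = 1.0 if event_scores[event] > 0 else 0.0
    q[action] += 0.1 * (reward - q[action])

best = max(ACTIONS, key=lambda a: q[a])
print(f"Player learned to trigger the bug via: {best}")
```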
Results: The authors evaluated DreamGrader on a test set of Code.org student submissions. For comparison, they modified the earlier Play to Grade, which had been designed to identify error-free submissions, to predict the presence of a specific error. DreamGrader achieved 94.3 percent accuracy — 1.5 percent short of human-level performance — while Play to Grade achieved 75.5 percent accuracy. DreamGrader evaluated student submissions in around 1 second each, 180 times faster than human graders.

Yes, but: DreamGrader finds only known errors. It can’t catch bugs that instructors haven’t already seen.

Why it matters: Each student submission can be considered a different but related task. The approach known as meta-RL aims to train an agent that can learn new tasks based on experience with related tasks. Connecting these two ideas, the authors trained their model using the learning techniques of the meta-RL algorithm DREAM. Sometimes it’s not about reinventing the wheel but reframing the problem as one we already know how to solve.

We’re thinking: Teaching people how to code empowers them to lead more fulfilling lives in the digital age, just as teaching them to read has opened doors to wisdom and skill since the invention of the printing press. Accomplishing this on a global scale requires automated systems for education (like Coursera!). It’s great to see AI research that could make these systems more effective.
Work With Andrew Ng
Join the teams that are bringing AI to the world! Check out job openings at DeepLearning.AI, AI Fund, and LandingAI.
Subscribe and view previous issues here.
Thoughts, suggestions, feedback? Please send to thebatch@deeplearning.ai. To keep our newsletter out of your spam folder, add our email address to your contacts list.