Dear friends,
Is prompt engineering — the art of writing text prompts to get an AI system to generate the output you want — going to be a dominant user interface for AI? With the rise of text generators like GPT-3 and AI21’s Jurassic and image generators such as DALL·E, Midjourney, and Stable Diffusion, which take text input and produce output to match, there has been growing interest in how to craft prompts to get the output you want. For example, when generating an image of a panda, how does adding an adjective such as “beautiful” or a phrase like “trending on ArtStation” influence the output? The response to a particular prompt can be hard to predict and varies from system to system.
Some people have predicted that prompt engineering jobs would be plentiful in the future. I do believe that text prompts will be an important way to tell machines what we want — after all, they’re a dominant way to tell other humans what we want. But I think that prompt engineering will be only a small piece of the puzzle, and breathless predictions about the rise of professional prompt engineers are missing the full picture. Take speech synthesis (also called text-to-speech). Researchers have developed systems that allow users to specify which part of a sentence should be spoken with what emotion. Virtual knobs allow you to dial up or down the degree of different emotions. This provides fine control over the output that would be difficult to express in language. Further, by examining an output and then fine-tuning the controls, you can iteratively improve the output until you get the effect you want.
Keep learning!
Andrew
News

AI Chips Spark International Tension

New U.S. restrictions on chip sales aim to hamper China’s AI efforts.
What’s new: The U.S. government published sweeping limits on sales to Chinese businesses of processors that involve U.S. designs and technology. U.S. officials stated that the restrictions are meant to prevent China from militarizing AI.
New rules: The rules block sales of certain processors as well as U.S.-made equipment used to design and manufacture them. This includes high-end graphics processing units (GPUs) and other processors optimized for machine learning.
China’s response: A spokesperson for China’s foreign ministry accused the U.S. of abusing export-control measures to target Chinese firms, stating that it would hinder global cooperation and supply chains.
Why it matters: China has announced its ambition to become the global leader in AI by 2030, and this requires access to cutting-edge processing power. The most advanced chips are manufactured in Taiwan and South Korea using chip-fabrication equipment made by U.S. companies, and the leading chip designers and makers of chip-design software reside in the U.S. This gives U.S. authorities a tight grip on other countries’ ability to buy and make chips. China’s effort to build domestic capacity to produce advanced semiconductors — already hampered by the sheer difficulty and expense of etching features on silicon measured in nanometers — now faces additional hardware, software, business, and talent hurdles.
We’re thinking: International cooperation has been essential to recent progress in AI. As barriers rise between the U.S. and China, the AI community must navigate a world where geography will have a much bigger impact on access to ideas and resources.
Smarts for Farms

The next green revolution may be happening in the server room.
What’s new: Microsoft open-sourced AI tools designed to help farmers cut costs and improve yields.
How it works: FarmVibes-AI includes systems that analyze overhead imagery and sensor data to guide farm operations.
Behind the news: Nonprofits and academic institutions provide other open-source AI systems to increase food production in collaboration with large agribusiness firms, independent farmers, and rural communities.
Why it matters: The emerging practice of precision agriculture, which seeks to take into account not only entire fields but also local conditions down to the level of individual plants, could help farmers sow seeds, grow crops, fight pests, and harvest produce more efficiently. Off-the-shelf systems may not serve farmers who work in different parts of the world or grow niche crops. Open-source projects can expand their options effectively and inexpensively.
We’re thinking: Farmers tend to welcome innovations that improve yields and cut costs. They’re also famously self-sufficient, performing repairs and installing upgrades to their equipment. As self-driving tractors and precision-ag systems take root, farmers are great candidates to become early adopters of industry-focused platforms that make it easy for anyone to build useful AI applications.
A MESSAGE FROM DEEPLEARNING.AI

Looking for a career that inspires you? Break into AI! The Machine Learning Specialization teaches foundational AI concepts through an intuitive visual approach. This beginner-friendly program, created by Andrew Ng and Stanford Online, makes it easier than ever to start your AI career. Learn more
All Synthetic, All the Time

Joe Rogan meets Steve Jobs in an AI-generated podcast.
What’s new: For the debut episode of a new podcast series, Play.ht synthesized a 19-minute interview between the rock-star podcaster and the late Apple CEO. You can hear it here and propose computer-generated participants in future episodes here.
How it works: The Dubai-based text-to-speech startup created the podcast using text generation and voice cloning.
Behind the news: Rogan was also the subject of an early experiment in voice cloning. In 2019, Toronto-based Dessa released ersatz Rogan audio clips — the first of a parade of fake celebrity voices.
Why it matters: The delivery is occasionally stilted and the script meanders, with lapses into incoherence, but the rapid progress of generative audio, combined with the entertainment world’s appetite for novelty, suggests that satisfying synthetic productions may not be far off.
Massively Multilingual Translation

Sentence pairs that have equivalent meanings in different languages — typically used to train machine translation systems — have been available in sufficient quantities for only around 100 languages. New work doubled that number and produced a more capable model.
What’s new: Marta R. Costa-jussà and colleagues at Meta, Johns Hopkins, and UC Berkeley developed an automated process for scraping multilingual sentence pairs from the web. They used it to build No Language Left Behind (NLLB-200), a machine translation model that handles 200 languages, and released the model, code, and data.
Key insight: The web is full of text in various languages, including sentences that have the same meaning in different languages. For instance, unrelated pages in different languages may say the equivalent of, “Manchester United defeated Melbourne in yesterday’s match,” or “A long time ago in a galaxy far, far away.” An automated system can recognize such parallel sentences by learning to produce similar representations of sentences that have similar meanings regardless of their language. A teacher/student arrangement — with a multilingual teacher trained on languages with plentiful data to produce embeddings, and a separate monolingual student for each language scraped from the web — can align the representations produced by the students.
How they built the dataset: The authors identified languages in text scraped from the web, trained a teacher model on pre-existing multilingual data, and used it to train student models to produce similar representations for similar meanings in the web text.
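Once teacher and student models embed sentences into a shared space, mining parallel pairs reduces to nearest-neighbor search over embeddings. A minimal sketch in Python, assuming the embeddings (here, toy vectors) have already been computed; the function names and similarity threshold are illustrative, not taken from the paper, and production mining systems typically use more robust margin-based scoring rather than a raw cosine cutoff:

```python
import numpy as np

def cosine_sim(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def mine_parallel_pairs(src_embs, tgt_embs, threshold=0.8):
    """Pair each source sentence with its most similar target sentence,
    keeping the pair only if the similarity clears the threshold."""
    pairs = []
    for i, s in enumerate(src_embs):
        sims = [cosine_sim(s, t) for t in tgt_embs]
        j = int(np.argmax(sims))
        if sims[j] >= threshold:
            pairs.append((i, j, sims[j]))
    return pairs

# Toy embeddings: source sentence 0 aligns with target 0, 1 with 1.
src = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
tgt = [np.array([0.9, 0.1]), np.array([0.1, 0.9])]
pairs = mine_parallel_pairs(src, tgt)
```

In practice the candidate pools hold billions of sentences, so the exhaustive scan above would be replaced by an approximate nearest-neighbor index.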
How they built the translator: NLLB-200 is a transformer encoder-decoder that comprises 54.5 billion parameters.
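The sparse variant discussed in the results replaces some fully connected feed-forward layers with mixture-of-experts (MoE) layers, which route each token through only a small subset of expert networks. A toy top-1 routing sketch in NumPy; the dimensions, random weights, and routing rule are illustrative and not Meta's implementation (real MoE layers use learned routers with load-balancing losses):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff, n_experts = 8, 16, 4

# Each expert is a small two-layer feed-forward network (toy sizes).
experts = [
    (rng.standard_normal((d_model, d_ff)) * 0.1,
     rng.standard_normal((d_ff, d_model)) * 0.1)
    for _ in range(n_experts)
]
# The router scores each token against every expert.
router = rng.standard_normal((d_model, n_experts)) * 0.1

def moe_layer(x):
    """Send each token through only its top-1 expert (sparse compute)."""
    scores = x @ router              # shape: (n_tokens, n_experts)
    choice = scores.argmax(axis=-1)  # top-1 routing decision per token
    out = np.zeros_like(x)
    for i, e in enumerate(choice):
        w1, w2 = experts[e]
        out[i] = np.maximum(x[i] @ w1, 0.0) @ w2  # ReLU feed-forward
    return out

tokens = rng.standard_normal((5, d_model))
output = moe_layer(tokens)
```

The appeal is that total parameter count grows with the number of experts while per-token compute stays roughly constant, since each token touches only one expert.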
Results: The authors’ NLLB-200 achieved an average of 24.0 spBLEU (which measures the overlap of word fragments between machine translations and ground truth; higher is better) across all 202 languages, while the earlier DeltaLM averaged 16.7 spBLEU across 101 languages. A sparse NLLB-200 that used mixture-of-experts (MoE) rather than fully connected layers generally performed better than a dense version. For example, evaluated on Akan, a language spoken in Ghana for which little training data was available, the sparse model scored 36.2 chrF (which measures overlapping groups of consecutive characters between machine translations and ground truth; higher is better), while the dense version scored 35.6 chrF. NLLB-200 performed inconsistently compared to bilingual models: It achieved 36.2 chrF versus an English-to-Akan model’s 16.8 chrF, but 51.4 chrF versus an English-to-Gujarati model’s 51.7 chrF. A possible explanation: Languages that are dissimilar to other languages in the training data may not benefit as much from multilingual training.
Why it matters: Faced with an apparent scarcity of data, the authors extracted it from the web. The data didn’t need to be perfect: To compensate for flaws such as typographical and grammatical errors, the model learned to convert its own translations — of some flawed sentences but presumably many more correct ones — into good sentences.
We’re thinking: University of Texas machine learning professor Raymond Mooney said, “You can’t cram the meaning of a whole %&!$# sentence into a single $&!#* vector.” Apparently these researchers did it!
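chrF, the character-level metric reported above, combines precision and recall of character n-grams between a machine translation and its reference, weighting recall more heavily. A simplified Python sketch, not the official implementation (sacreBLEU provides the standard one), assuming the common defaults of n up to 6 and beta = 2:

```python
from collections import Counter

def char_ngrams(text, n):
    """Count character n-grams, ignoring spaces."""
    text = text.replace(" ", "")
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def chrf_simplified(hypothesis, reference, max_n=6, beta=2.0):
    """Character n-gram F-score, recall-weighted (beta=2), n = 1..max_n."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        if sum(hyp.values()) and sum(ref.values()):
            precisions.append(overlap / sum(hyp.values()))
            recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return 100 * (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```

Because it operates on character fragments rather than whole words, chrF gives partial credit for near-miss word forms, which makes it a popular choice for morphologically rich and low-resource languages like Akan.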
Work With Andrew Ng
Senior Controller: Woebot Health seeks a controller to help drive the development of processes, technology, compliance, and reporting and ensure that data is available for decision-making. The ideal candidate has 10-plus years of experience. MBA and CPA preferred. Apply here
Director of Finance and Accounting: Landing AI seeks a finance leader to oversee and manage all aspects of financial operations and tax planning. The ideal candidate has a finance and accounting background, experience with managing debt and raising equity, and knowledge of product-led growth strategies in business-to-business software as a service. Apply here
Subscribe and view previous issues here.
Thoughts, suggestions, feedback? Please send to thebatch@deeplearning.ai. Avoid our newsletter ending up in your spam folder by adding our email address to your contacts list.