Dear friends,
While working on Course 3 of the Machine Learning Specialization, which covers reinforcement learning, I was reflecting on how finicky reinforcement learning algorithms still are. They're very sensitive to hyperparameter choices: someone experienced at hyperparameter tuning might get 10x or 100x better performance. Supervised deep learning was equally finicky a decade ago, but it has gradually become more robust as research has yielded systematic ways to build supervised models.

My collaborators and I have applied RL to cars, helicopters, quadrupeds, robot snakes, and many other applications, yet today's RL algorithms still feel finicky. Whereas poorly tuned hyperparameters in supervised deep learning might mean that your algorithm trains 3x or 10x more slowly (which is bad), in reinforcement learning they might mean training 100x more slowly — if it converges at all!

As with supervised learning a decade ago, numerous techniques have been developed to help RL algorithms converge, such as double Q-learning, soft updates, experience replay, and epsilon-greedy exploration with a slowly decreasing epsilon. They're all clever, and I commend the researchers who developed them, but many of these techniques create additional hyperparameters that seem to me very hard to tune.
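To make that concern concrete, here is a minimal sketch in Python of two of the techniques mentioned above: epsilon-greedy exploration with a decaying epsilon, and soft updates of a target network. The constant values are illustrative choices for this example, not recommendations:

```python
import random

# Hyperparameters introduced by the stabilization tricks themselves
# (illustrative values chosen for this sketch, not recommendations):
EPSILON_START = 1.0    # initial exploration rate
EPSILON_MIN   = 0.01   # floor on the exploration rate
EPSILON_DECAY = 0.995  # multiplicative decay applied per episode
TAU           = 0.005  # soft-update mixing rate for the target network

def decay_epsilon(epsilon: float) -> float:
    """Epsilon-greedy schedule: slowly decrease epsilon, never below the floor."""
    return max(EPSILON_MIN, epsilon * EPSILON_DECAY)

def choose_action(q_values: list[float], epsilon: float) -> int:
    """With probability epsilon take a random action; otherwise act greedily."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def soft_update(target_weights: list[float], online_weights: list[float]) -> list[float]:
    """Soft update: nudge each target weight a small step toward the online weight."""
    return [(1 - TAU) * t + TAU * o for t, o in zip(target_weights, online_weights)]

# Typical per-episode usage:
epsilon = EPSILON_START
action = choose_action([0.1, 0.4, 0.2], epsilon)
epsilon = decay_epsilon(epsilon)
```

Each constant above is one more hyperparameter, and a poor choice of any of them can make the difference between converging and not converging at all.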
Keep learning! Andrew
DeepLearning.AI Exclusive

Q&A With Andrew Ng

Andrew answers questions about the new Machine Learning Specialization.
News

Where Drones Fly Free

Autonomous aircraft in the United Kingdom are getting their own superhighway.

What's new: The UK government approved Project Skyway, a 165-mile system of interconnected drone-only flight routes. The airspace is scheduled to open by 2024.
Behind the news: Project Skyway is the largest proposed designated drone flight zone, but it’s not the only one.
Yes, but: Although Skyway includes a collision-avoidance system, it’s not designed to prevent accidents during takeoff and landing, when they’re most common. Moreover, it's not yet clear whether the plan includes designated takeoff and landing sites. “The problem is what happens when you're 10 feet away from people,” one aerospace engineer told the BBC.
Large Language Models Unbound

A worldwide collaboration produced the biggest open source language model to date.

What's new: BLOOM is a family of language models built by the BigScience Research Workshop, a collective of over 1,000 researchers from 250 institutions around the globe.
Behind the news: BigScience began in May 2021 as a year-long series of workshops aimed at developing open source AI models that are more transparent, auditable, and representative of people from diverse backgrounds than their commercial counterparts. Prior to BLOOM, the collaboration released the T0 family of language models, which were English-only and topped out at 11 billion parameters.
A MESSAGE FROM DEEPLEARNING.AI

Course 3 of the Machine Learning Specialization, “Unsupervised Learning, Recommender Systems, and Reinforcement Learning,” is available! Learn unsupervised techniques for anomaly detection, clustering, and dimensionality reduction. Build a recommender system, too! Enroll now
Protection for Pollinators

A machine learning method could help chemists formulate pesticides that target harmful insects but leave bees alone.

What's new: Researchers at Oregon State University developed models that classify whether or not a chemical is fatally toxic to bees. The authors believe their approach could be used to screen pesticide formulations for potential harm to these crucial pollinators.
Results: The two models performed similarly. They accurately classified 81 to 82 percent of molecules as lethal or nonlethal to bees. Of the molecules classified as lethal, 67 to 68 percent were truly lethal.

Behind the news: Bees play a crucial role in pollinating many agricultural products. Without them, yields of important crops like cotton, avocados, and most fruit would drop precipitously. Numerous studies have shown that pesticides are harmful to bees; they have contributed to increased mortality among domesticated honey bees as well as a decline in the number of wild bee species.

Why it matters: Pesticides, herbicides, and fungicides have their dangers, but they enable farms to produce enough food to feed a growing global population. Machine learning may help chemists engineer pesticides that are benign to all creatures except their intended targets.
Choose the Right Annotators

Classification isn't always cut and dried. While the majority of doctors are men and nurses women, that doesn't mean all men who wear scrubs are doctors or all women who wear scrubs are nurses. A new method attempts to account for biases that may be held by certain subsets of labelers.

What's new: Mitchell L. Gordon and colleagues at Stanford introduced a method to control bias in machine learning model outputs. Their jury learning approach models a user-selected subset of the annotators who labeled the training data.

Key insight: A typical classifier mimics how an average labeler would annotate a given example. Such output inevitably reflects biases typically associated with an annotator's age, gender, religion, and so on, and if the distribution of such demographic characteristics among labelers is skewed, the model's output will be skewed as well. How to correct for such biases? Instead of predicting the average label, a classifier can predict the label likely to be applied by each individual in a pool of labelers whose demographic characteristics are known. Users can choose labelers who have the characteristics they desire, and the model can emulate them and assign a label accordingly. This would enable users to correct for biases (or select for them).

How it works: The authors used jury learning to train a classifier to mimic the ways different annotators label the toxicity of social media comments. The dataset comprised comments from Twitter, Reddit, and 4Chan.
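The mechanics can be sketched roughly as follows. This is a simplified illustration in Python (PyTorch); the layer sizes, names, and median-aggregation rule are assumptions for exposition, not the authors' exact design:

```python
# Hypothetical sketch of the jury-learning idea: condition the classifier on
# which annotator is judging, then aggregate predictions over a chosen jury.
import torch
import torch.nn as nn

class JuryModel(nn.Module):
    def __init__(self, num_annotators: int, text_dim: int = 64, ann_dim: int = 16):
        super().__init__()
        # Each annotator gets a learned embedding capturing labeling tendencies.
        self.annotator_emb = nn.Embedding(num_annotators, ann_dim)
        self.head = nn.Sequential(
            nn.Linear(text_dim + ann_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1),  # toxicity score as this annotator would rate it
        )

    def forward(self, text_vec: torch.Tensor, annotator_id: torch.Tensor) -> torch.Tensor:
        ann = self.annotator_emb(annotator_id)
        return self.head(torch.cat([text_vec, ann], dim=-1)).squeeze(-1)

def jury_verdict(model: JuryModel, text_vec: torch.Tensor, jury_ids: list[int]) -> torch.Tensor:
    """Predict each selected juror's label, then aggregate (median here)."""
    ids = torch.tensor(jury_ids)
    scores = model(text_vec.expand(len(jury_ids), -1), ids)
    return scores.median()

# Example: score one comment with a jury of three (untrained weights, so the
# output is meaningless; this only shows the calling pattern).
model = JuryModel(num_annotators=1000)
comment = torch.randn(64)  # stand-in for a comment embedding
verdict = jury_verdict(model, comment, jury_ids=[3, 42, 7])
```

Because predictions are conditioned on annotator identity, swapping in a different jury changes the verdict without retraining, which is what lets users correct for (or select for) particular biases.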
Results: The authors evaluated their model's ability to predict labels assigned by individual annotators. It achieved 0.61 mean absolute error, while a BERTweet model fine-tuned on the dataset achieved 0.9 mean absolute error (lower is better). The authors' model achieved fairly consistent error rates when estimating how annotators of different races would label examples: Asian (0.62), Black (0.65), Hispanic (0.57), White (0.60). In contrast, BERTweet's error rate was markedly higher for Black annotators: Asian (0.83), Black (1.12), Hispanic (0.87), White (0.87). The authors' model, which focused on estimating labels assigned by individuals, also outperformed a similar model trained to predict decisions by demographic groups, which scored 0.81 mean absolute error.

Why it matters: Users of AI systems may assume that data labels are objectively true. In fact, they're often messy approximations, and they can be influenced by the circumstances and experiences of individual annotators. The jury method gives users a way to account for this inherent subjectivity.

We're thinking: Selecting a good demographic mix of labelers can reduce some biases and ensure that diverse viewpoints are represented in the resulting labels — but it doesn't reduce biases that are pervasive across demographic groups. That problem requires a different approach.
Work With Andrew Ng
Tech Lead: Workhelix, an AI Fund portfolio company that provides data and tools for companies to manage human capital, seeks a tech lead to produce scalable software solutions for enterprise customers. You'll be part of a co-founding team responsible for the full software development life cycle. Apply here
Senior IT Manager: Woebot Health is looking for a senior IT manager to support onboarding, maintenance, and offboarding. The ideal candidate can work with engineering to ensure that the technology needed to build, maintain, and run products is operational and integrated seamlessly into the overall Woebot Health IT infrastructure. Apply here

Product Marketing Manager: DeepLearning.AI seeks a product marketing manager who can bring its products to life across multiple channels and platforms including social, email, and the web. The ideal candidate is a creative self-starter who can work collaboratively and independently to execute new ideas and projects, thrives in a fast-paced environment, and has a passion for AI and/or education. Apply here
Data Engineer (Latin America): Factored seeks top data engineers with experience in data structures and algorithms, operating systems, computer networks, and object-oriented programming. Experience with Python and excellent English skills are required. Apply here
Subscribe and view previous issues here.
Thoughts, suggestions, feedback? Please send to thebatch@deeplearning.ai. To keep our newsletter out of your spam folder, add our email address to your contacts list.