Dear friends,
I’ve seen many friends transition from an academic or research role to a corporate role. The most successful ones adjusted to corporate work by shifting their mindset in a few crucial ways.
The shift in mindset between academia and industry is significant, but knowing the key differences in advance can make the transition much smoother. I’ve enjoyed roles in both domains, and both offer valuable ways to move the world forward.
News

Crawl the Web, Absorb the Bias

The emerging generation of trillion-parameter models needs datasets of billions of examples, but the most readily available source of examples on that scale — the web — is polluted with bias and antisocial expressions. A new study examines the issue.

What’s new: Abeba Birhane and colleagues at University College Dublin and University of Edinburgh audited the LAION-400M dataset, which was released in September. It comprises data scraped from the open web, from which inaccurate entries were removed by a state-of-the-art model for matching images to text. The automated curation left plenty of worrisome examples among the remaining 400 million — including stereotypes, racial slurs, and sexual violence — raising concerns that models trained on LAION-400M would inherit its shortcomings.

Key insight: The compilers of LAION-400M paired images and text drawn from Common Crawl, a large repository of web data. To filter out low-quality pairs, they used CLIP to score the correspondence between them and discarded those with the lowest scores. But CLIP itself is trained on a massive trove of web data, so it’s bound to find high correspondence between words and pictures that are frequently associated with one another on the web, even if the associations are spurious or otherwise undesirable.

NSFT (not safe for training): The authors entered text queries into LAION-400M’s search function, which returned matching images.
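To make the curation step concrete, here is a minimal sketch of the kind of CLIP-based similarity filtering described above, assuming the public openai/clip-vit-base-patch32 checkpoint and Hugging Face’s transformers library. The 0.3 cutoff is illustrative, and the sketch scores one pair at a time rather than at web scale.

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_similarity(image: Image.Image, caption: str) -> float:
    """Cosine similarity between CLIP's image and text embeddings."""
    inputs = processor(text=[caption], images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        img = model.get_image_features(pixel_values=inputs["pixel_values"])
        txt = model.get_text_features(input_ids=inputs["input_ids"],
                                      attention_mask=inputs["attention_mask"])
    img = img / img.norm(dim=-1, keepdim=True)
    txt = txt / txt.norm(dim=-1, keepdim=True)
    return (img @ txt.T).item()

def keep_pair(image: Image.Image, caption: str, threshold: float = 0.3) -> bool:
    # Discard pairs whose image and caption disagree according to CLIP.
    # Note the circularity flagged above: CLIP learned "correspondence"
    # from web data, so spurious web associations can pass this filter.
    return clip_similarity(image, caption) >= threshold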
Behind the news: The LAION-400M team, a loosely knit collective led by Christoph Schuhmann at University of Vienna, aims to re-create Google’s Wikipedia-based Image Text dataset and ultimately use it to train open source analogs of OpenAI’s CLIP and DALL·E. The group was inspired by EleutherAI’s community effort to build an open source version of GPT-3.

Why it matters: It’s enormously expensive to manually clean a dataset that spans hundreds of millions of examples. Automated curation has been viewed as a way to ensure that immense datasets contain high-quality data. This study reveals serious flaws in that approach.

We’re thinking: Researchers have retracted or amended several widely used datasets to address issues of biased and harmful data. Yet, as the demand for data rises, there’s no ready solution to this problem. Audits like this make an important contribution, and the community — including large corporations that produce proprietary systems — would do well to take them seriously.
Transformer Speed-Up Sped Up

The transformer architecture is notoriously inefficient when processing long sequences — a problem in processing images, which are essentially long sequences of pixels. One way around this is to break up input images and process the pieces separately. New work improves on this already-streamlined approach.

What’s new: Zizhao Zhang and colleagues at Google and Rutgers University simplified an earlier proposal for using transformers to process images. They call their architecture NesT.

Key insight: A transformer that processes parts of an image and then joins them can work more efficiently than one that looks at the entire image at once. However, to relate the parts to the whole, it must learn how the pixels in different regions relate to one another. A recent model called Swin does this by shifting region boundaries in between processing regions and merging them — a step that consumes compute cycles. Using convolutions to process both within and across regions lets a model learn such relationships without shifting region boundaries, saving that computation.

How it works: The authors trained NesT to classify images in ImageNet.
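The paper’s full architecture is more involved, but the core idea (self-attention within local blocks, then a convolutional aggregation step that mixes information across blocks) can be sketched in PyTorch. The dimensions, layer counts, and pooling choice below are illustrative assumptions, not the paper’s exact configuration.

import torch
import torch.nn as nn

class LocalBlockAttention(nn.Module):
    """Applies a transformer layer independently within each image block."""
    def __init__(self, dim: int, block_size: int, heads: int = 4):
        super().__init__()
        self.block_size = block_size
        self.layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, batch_first=True)

    def forward(self, x):                       # x: (B, C, H, W)
        B, C, H, W = x.shape
        s = self.block_size
        # Partition the feature map into non-overlapping s x s blocks.
        x = x.view(B, C, H // s, s, W // s, s)
        x = x.permute(0, 2, 4, 3, 5, 1).reshape(-1, s * s, C)
        x = self.layer(x)                       # attention stays within a block
        x = x.reshape(B, H // s, W // s, s, s, C)
        x = x.permute(0, 5, 1, 3, 2, 4).reshape(B, C, H, W)
        return x

class BlockAggregation(nn.Module):
    """Convolution plus pooling mixes information across neighboring blocks,
    standing in for Swin's boundary-shifting step."""
    def __init__(self, dim: int):
        super().__init__()
        self.conv = nn.Conv2d(dim, dim, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(2)

    def forward(self, x):
        return self.pool(self.conv(x))

# One hierarchy level: local attention, then cross-block aggregation.
x = torch.randn(1, 64, 32, 32)
x = LocalBlockAttention(dim=64, block_size=8)(x)
x = BlockAggregation(dim=64)(x)                 # (1, 64, 16, 16)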
Results: A 38 million-parameter NesT achieved 83.3 percent accuracy on ImageNet, matching the performance of an 88 million-parameter Swin-B with 57 percent fewer parameters.

Why it matters: Transformers typically bog down when processing images. NesT could help vision applications take fuller advantage of the architecture’s strengths.

We’re thinking: Computational efficiency for the Swin!
A MESSAGE FROM DEEPLEARNING.AI

We’re updating our Natural Language Processing Specialization to reflect the latest advances! Join instructor Younes Bensouda Mourri and Hugging Face engineer Lewis Tunstall for a live Ask Me Anything session on November 3, 2021. Get answers to all your NLP-related questions!
Search Goes Multimodal

Google will upgrade its search engine with a new model that tracks the relationships between words, images, and, in time, videos — the first fruit of its latest research into multimodal machine learning and multilingual language modeling.

What’s new: Early next year, Google will integrate a new architecture called Multitask Unified Model (MUM) into its traditional Search algorithm and Lens photo-finding system, VentureBeat reported. The new model will enable these systems to break down complex queries (“I’ve hiked Mt. Adams and now I want to hike Mt. Fuji next fall. What should I do differently to prepare?”) into simpler requests (“prepare to hike Mt. Adams,” “prepare to hike Mt. Fuji,” “Mt. Fuji next fall”), then combine their results into a coherent answer.

How it works: Announced in May, MUM is a transformer-based natural language model. It’s based on Google’s earlier T5 and comprises around 110 billion parameters (compared to BERT’s 110 million, GPT-3’s 175 billion, and Google’s own Switch Transformer at 1.6 trillion). It was trained on a dataset of text and image documents drawn from the web, from which hateful, abusive, sexually explicit, and misleading images and text were removed.
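MUM itself isn’t publicly available, but it builds on T5’s text-to-text interface, in which every task is framed as mapping one string to another. This sketch shows that interface using the small public t5-small checkpoint; the “decompose query:” prefix is purely hypothetical, since the off-the-shelf model was never trained to decompose queries and real use would require fine-tuning.

from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

query = ("I've hiked Mt. Adams and now I want to hike Mt. Fuji next fall. "
         "What should I do differently to prepare?")

# Hypothetical task prefix: t5-small was not trained on query decomposition,
# so this demonstrates the text-to-text mechanics, not MUM's behavior.
inputs = tokenizer("decompose query: " + query, return_tensors="pt")
output_ids = model.generate(inputs["input_ids"], max_length=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))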
Behind the news: In 2019, Google Search integrated BERT. The change improved the results of 10 percent of English-language queries, the company said, particularly those that included conversational language or prepositions like “to” (the earlier version couldn’t distinguish the destination country in a phrase like “brazil traveler to usa”). BERT helped spur a trend toward larger, more capable transformer-based language models.

Why it matters: Web search is ubiquitous, but there’s still plenty of room for improvement. This work takes advantage of the rapidly expanding capabilities of transformer-based models.

We’re thinking: While we celebrate any advances in search, we found Google’s announcement short on technical detail. Apparently MUM really is the word.
Roll Over, Beethoven

Ludwig van Beethoven died before he completed what would have been his tenth and final symphony. A team of computer scientists and music scholars approximated the music that might have been.

What’s new: The Beethoven Orchestra in Bonn performed a mock-up of Beethoven’s Tenth Symphony partly composed by an AI system, the culmination of an 18-month project. You can view and hear the performance here.

How it works: The master left behind around 200 fragmentary sketches of the Tenth Symphony, presumably in four movements. A human composer in 1988 completed two movements, for which more source material was available, so the team set out to compose two more.
Everyone’s a critic: Composer Jan Swafford, who wrote a 2014 biography of Beethoven, described the finished work as uninspired and lacking Beethovenian traits such as rhythms that build to a sweeping climax.

Behind the news: In 2019, Huawei used AI powered by its smartphone processors to realize the final two movements of Franz Schubert’s unfinished Eighth Symphony. The engineers trained their model on roughly 90 pieces of Schubert’s work as well as pieces by composers who influenced him. A human composer cleaned up the output, organized it into sections, and distributed the notes among various instruments.

Why it matters: AI is finding its way into the arts in a variety of roles. As a composer, the technology generally generates short passages that humans assemble and embellish. It’s not clear how much the team massaged the model’s output in this case, but the ambition clearly is to build an end-to-end symphonic composer.

We’re thinking: Ahmed Elgammal, the Rutgers researcher who led the project’s AI team, has published work on generative adversarial networks. Could one of his GANs yield Beethoven’s Eleventh?
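As a toy illustration of that generate-then-assemble workflow, far simpler than the models such projects actually use, here is a first-order Markov chain that emits short passages of pitches for a human to curate. The one-phrase corpus is a made-up placeholder.

import random
from collections import defaultdict

# Placeholder corpus; a real system would train on a composer's works.
corpus = ["E4 E4 F4 G4 G4 F4 E4 D4 C4 C4 D4 E4".split()]

transitions = defaultdict(list)
for phrase in corpus:
    for cur, nxt in zip(phrase, phrase[1:]):
        transitions[cur].append(nxt)

def generate_passage(start, length=8):
    """Sample a short passage by walking the transition table."""
    notes = [start]
    for _ in range(length - 1):
        options = transitions.get(notes[-1])
        if not options:          # dead end: stop early
            break
        notes.append(random.choice(options))
    return notes

print(generate_passage("E4"))    # a short passage for a human to embellish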
Work With Andrew Ng
Senior Technical Program Manager: Landing AI is looking for a program manager to bridge the engineering team and business partners. The ideal candidate has a strong background in relationship management with at least three years in a direct program management position and at least two years in a technical role. Apply here
Marketing Manager, Community & Events: DeepLearning.AI is looking for a marketing manager to spearhead events and experiential marketing. The ideal candidate is a talented leader, communicator, and creative producer who is ready to create world-class events that keep DeepLearning.AI community members connected and engaged with each other. Apply here
Marketing Operations Manager: DeepLearning.AI seeks a marketing operations expert to oversee its data and analytics strategy, manage its marketing technology stack, and optimize workflows and processes. The ideal candidate is a strong project manager, communicator, and technical wizard who can help the company manage its community of learners. Apply here
Data Engineer (LatAm): Factored is looking for top data engineers with experience in data structures and algorithms, operating systems, computer networks, and object-oriented programming. You must have experience with Python and excellent skills in English. Apply here
Software Development Engineer: Landing AI seeks software development engineers to build scalable AI applications and deliver optimized inference software. A strong background in Docker, Kubernetes, infrastructure, network security, or cloud-based development is preferred. Apply in North America or Latin America
Sales Development Representative (North America): Landing AI is looking for a salesperson to generate new business opportunities through calls, strategic preparation, and delivering against quota. Experience with inside sales and enterprise products and a proven track record of achieving quotas are preferred. Apply here
Machine Learning Engineer Intern (LatAm, North America, or Taiwan): Landing AI is looking for an intern to help build end-to-end machine learning applications. A strong machine learning and deep learning background, along with technical deep-dive experience in at least two projects, is preferred. Apply here
Product Design Lead: Workera seeks a talented product designer to understand complex problems and find effective solutions. This position will help define UX processes and develop a nimble UX team to deliver high-quality work. Apply here
Head of Engineering (Full-Time Remote): Zest is looking for an engineering leader to provide thought leadership and establish a technical vision. Experience in developing and shipping consumer-facing mobile apps for iOS and Android is a must. Apply here
Software Engineers (Remote): Workera, a precision upskilling company that enables individuals and organizations to identify, measure, interpret, and develop AI skills, is looking for software engineers of all levels. You’ll own the mission-critical effort of implementing and deploying innovative learning technologies. Apply here
Solutions Consultant: Workera is looking for a solutions architect to empower its go-to-market team, create a streamlined sales-enabling environment, and accelerate business opportunities. Apply here
Various Roles: Workera seeks an enterprise lead product manager, product design lead, compliance and risk manager, financial planning and analysis manager, and senior data engineer. Apply here
Subscribe and view previous issues here.
Thoughts, suggestions, feedback? Please send to thebatch@deeplearning.ai. To keep our newsletter out of your spam folder, add our email address to your contacts list.