Dear friends,
If you want to build a career in AI, you no longer need to be located in one of a few tech hubs such as Silicon Valley or Beijing. New hubs are emerging in many parts of the world, and cities large and small offer opportunities both for local talent and for companies worldwide.
Our fastest-growing Latin American team is Factored, which helps companies build world-class AI and data engineering teams. Factored’s Latin American operation grew from 24 engineers to well over 100 in the past year. Its projects have ranged from developing MLOps pipelines for one of the largest financial-tech companies in Silicon Valley to presenting papers at NeurIPS.
The rise of opportunities in Latin America is part of the broader trend toward working from home. I can collaborate as easily with someone in Palo Alto, California, as with someone in Buenos Aires, Argentina. In fact, I’ve been spending more time in Washington State (where I enjoy free babysitting by my wonderful in-laws) rather than at my Palo Alto headquarters.
Keep learning!
Andrew
News
Machine Learning Jobs on the Rise
Jobs for machine learning engineers are growing fast, according to an analysis by LinkedIn.
What the data says: LinkedIn analyzed job openings listed on its site between January 2017 and July 2021 and ranked those that showed consistent growth over the entire period. The analysis counted open positions at different levels of seniority as a single position. It didn’t count positions occupied by interns, volunteers, or students.
Behind the news: While LinkedIn’s analysis was confined to the U.S., evidence suggests that machine learning jobs are growing worldwide.
Why it matters: North America is the world’s largest AI market, accounting for around 40 percent of AI revenue globally. The fact that remote work is an option for one in five U.S. machine learning jobs suggests a huge opportunity for applicants located in other parts of the world.
Let the Model Choose Your Outfit
Amazon’s first brick-and-mortar clothing store is getting ready to deliver automated outfit recommendations.
What’s new: The ecommerce giant announced plans to open a flagship Amazon Style location at a Los Angeles-area mall this year.
How it works: The 30,000-square-foot store will feature aisles and racks like a traditional clothing store, but customers will be able to scan QR codes using their phones to see variations in color and size as well as items recommended by machine learning models. A touchscreen in each fitting room will enable customers to request such items to try on.
Proposed innovations: Research papers provide glimpses of Amazon’s ideas for AI-driven fashion retailing. The company declined to comment on whether it plans to implement them. For instance:
Behind the news: Last summer, Amazon opened its first brick-and-mortar grocery store, where customers can take merchandise off a shelf and exit without interacting with a clerk for payment. Computer vision identifies customers at the door and recognizes the products they take, charging their accounts automatically.
Why it matters: The fashion retailing market is crowded, but Amazon’s considerable AI expertise puts it at the forefront of low-friction retailing.
We’re thinking: Fashion companies such as Stitch Fix and Wantable have used AI to recommend clothing and build valuable businesses. There are good reasons to believe that future fashion leaders will be sophisticated AI players.
High Accuracy at Low Power
Equipment that relies on computer vision while unplugged — mobile phones, drones, satellites, autonomous cars — needs power-efficient models. A new architecture set a record for accuracy per computation.
What's new: Yinpeng Chen and colleagues at Microsoft devised Mobile-Former, an image recognition system that efficiently weds a MobileNet’s convolutional eye for detail with a Vision Transformer’s attention-driven grasp of the big picture.
Key insight: Convolutional neural networks process images in patches, which makes them computationally efficient but ignores global features that span multiple patches. Transformers represent global features, but they’re inefficient: a transformer’s self-attention mechanism compares each part of an input to each other part, so the amount of computation required grows quadratically with the size of the input. Mobile-Former combines the two architectures, but instead of using self-attention, its transformers compare each part of an input to a small learned vector. This gives the system information about global features without the computational burden.
How it works: Mobile-Former is a stack of layers, each made up of three components: a MobileNet block and a transformer block joined by a two-way bridge of two attention layers (one for each direction of communication). The MobileNet blocks refine an image representation, the transformer blocks refine a set of six tokens (randomly initialized vectors that are learned during training), and the bridge further refines the image representation according to the tokens and vice versa. The authors trained the system on ImageNet.
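To make the bridge concrete, here is a minimal PyTorch-style sketch of one Mobile-Former layer. It is our illustration under stated assumptions, not the authors’ released code: the class name, dimensions, and some wiring details are ours, and the paper’s actual block also conditions the convolutional branch on the tokens, which this sketch omits.

```python
# Minimal sketch of one Mobile-Former layer (illustrative; not the paper's code).
# A MobileNet-style convolutional block and a small transformer block exchange
# information through a two-way cross-attention bridge over six learned tokens.
import torch
import torch.nn as nn

class MobileFormerLayer(nn.Module):
    def __init__(self, channels=64, token_dim=192, num_heads=4):
        super().__init__()
        # Local branch: depthwise + pointwise convolution (MobileNet-style).
        self.mobile = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels),
            nn.Conv2d(channels, channels, 1),
            nn.ReLU6(inplace=True),
        )
        # Global branch: a tiny transformer layer that operates on the tokens only.
        self.former = nn.TransformerEncoderLayer(
            d_model=token_dim, nhead=num_heads,
            dim_feedforward=2 * token_dim, batch_first=True)
        # Bridge: cross-attention in both directions (tokens <-> feature map).
        self.to_tokens = nn.MultiheadAttention(token_dim, num_heads, batch_first=True)
        self.to_pixels = nn.MultiheadAttention(token_dim, num_heads, batch_first=True)
        self.pix_proj = nn.Linear(channels, token_dim)
        self.pix_unproj = nn.Linear(token_dim, channels)

    def forward(self, x, tokens):
        # x: (batch, channels, height, width); tokens: (batch, 6, token_dim)
        b, c, h, w = x.shape
        pixels = self.pix_proj(x.flatten(2).transpose(1, 2))   # (b, h*w, token_dim)
        # Mobile -> Former: the six tokens attend to the feature map (only 6 queries).
        tokens = tokens + self.to_tokens(tokens, pixels, pixels)[0]
        tokens = self.former(tokens)            # refine the global tokens
        x = x + self.mobile(x)                  # refine the local feature map
        # Former -> Mobile: the feature map attends back to the refined tokens.
        pixels = self.pix_proj(x.flatten(2).transpose(1, 2))
        pixels = pixels + self.to_pixels(pixels, tokens, tokens)[0]
        x = x + self.pix_unproj(pixels).transpose(1, 2).reshape(b, c, h, w)
        return x, tokens

# The six tokens would be randomly initialized, learned parameters shared across images,
# e.g. tokens = nn.Parameter(torch.randn(1, 6, 192)).
```

Because only the six tokens attend to the full feature map (and vice versa), the attention cost grows linearly with the number of pixels rather than quadratically, which is the efficiency gain the paper describes.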
Results: Mobile-Former beat competitors at a similar computational budget and at much larger budgets as well. In ImageNet classification, it achieved 77.9 percent accuracy using 294 megaflops (a measure of computational operations), beating transformers that required much more computation. The nearest competitor under 1.5 gigaflops, Swin, scored 77.3 percent using 1 gigaflop. At a comparable budget of 299 megaflops, a variation on the ShuffleNetV2 convolutional network scored 72.6 percent accuracy.
Yes, but: The system is not efficient in terms of the number of parameters and thus memory requirements. Mobile-Former-294M has 11.4 million parameters, while Swin has 7.3 million and ShuffleNetV2 has 3.5 million. One reason: parameters in the MobileNet blocks, transformer blocks, and bridge aren’t shared.
Why it matters: Transformers have strengths that have propelled them into an ever wider range of applications. Integrating them with other architectures makes it possible to take advantage of the strengths of both.
We're thinking: Using more than six tokens didn’t result in better performance. It appears that the need for attention in image tasks is limited — at least for images of 224x224 resolution.
A MESSAGE FROM DEEPLEARNING.AI
We’re highlighting our global deep learner community. Read their stories, get inspired to take the next step in your AI journey, and #BeADeepLearner! Learn more
Standards for Hiring Algorithms
Some of the world’s largest corporations will use standardized criteria to evaluate AI systems that influence hiring and other personnel decisions.
What’s new: The Data and Trust Alliance, a nonprofit group devoted to mitigating tech-induced bias in workplaces, introduced resources for evaluating fairness in algorithms for personnel management. Twenty-two companies worldwide, including IBM, Meta, and Walmart, have agreed to use them.
What it says: Algorithmic Bias Safeguards for Workforce includes a questionnaire for evaluating AI system vendors, a scoring system for comparing one vendor to another, and materials for educating human-resources teams about AI.
Behind the news: Algorithms for hiring and managing employees have been at the center of several high-profile controversies.
Why it matters: Companies need ways to find and retain top talent amid widening global competition. However, worries over biased AI systems have spurred laws that limit algorithmic hiring in New York City and the United Kingdom. Similar regulations in China, the European Union, and the United States may follow.
We’re thinking: We welcome consistent standards for AI systems of all kinds. This looks like a good first step for human-resources products.
Transformers See in 3D
Visual robots typically perceive the three-dimensional world through sequences of two-dimensional images, but they don’t always know what they’re looking at. For instance, Tesla’s self-driving system has been known to mistake a full moon for a traffic light. New research aims to clear up such confusion.
What's new: Aljaž Božič and colleagues at Technical University of Munich released TransformerFusion, which set a new state of the art in deriving 3D scenes from 2D video.
Key insight: The authors teamed two architectures with a novel approach to estimating the positions of points in space.
How it works: Given a series of 2D frames, TransformerFusion learned to reconstruct the 3D space they depicted by classifying whether each 3D pixel, or voxel, belonged (or was very close) to an object’s surface. The authors trained the system on ScanNet, a dataset that contains RGB-D (video plus depth) clips shot in indoor settings like bedrooms, offices, and libraries; object segmentations; and 3D scene reconstructions.
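To give a flavor of the core prediction task, here is a minimal sketch of per-voxel surface classification in PyTorch. It assumes some upstream network has already produced a fused feature vector per voxel; the class name, dimensions, and labels below are our illustration, not the authors’ code.

```python
# Illustrative sketch of per-voxel surface classification (not the authors' code).
# A small head predicts, from a fused per-voxel feature vector, whether the voxel
# lies on or very near an object's surface.
import torch
import torch.nn as nn

class VoxelSurfaceHead(nn.Module):
    def __init__(self, feat_dim=256, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, 1),    # one logit per voxel: near-surface or not
        )

    def forward(self, voxel_feats):  # voxel_feats: (num_voxels, feat_dim)
        return self.mlp(voxel_feats).squeeze(-1)

# Training signal: binary cross-entropy against labels that mark voxels lying
# within a small distance of the ground-truth surface (e.g., from ScanNet scans).
head = VoxelSurfaceHead()
feats = torch.randn(1024, 256)                # hypothetical fused voxel features
labels = (torch.rand(1024) < 0.1).float()     # hypothetical near-surface labels
loss = nn.functional.binary_cross_entropy_with_logits(head(feats), labels)
```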
Results: The authors measured distances between TransformerFusion’s estimated points in space and ground truth. They considered an estimation correct if it matched ground truth within 5 centimeters. The system achieved an F-1 score, a balance of precision and recall where higher is better, of 0.655 (see the sketch at the end of this story for how such a metric is computed). The best competing method, Atlas, achieved 0.636. Without the 3D CNNs, TransformerFusion achieved 0.361.
Yes, but: Despite setting a new state of the art, TransformerFusion’s ability to visualize 3D scenes falls far short of human-level performance. Its scene reconstructions are distorted, and it has trouble recognizing transparent objects.
Why it matters: Transformers have gone from strength to strength — in language, 2D vision, molecular biology, and other areas — and this work shows their utility in a new domain. Yet, despite their capabilities, they can’t do the whole job. The authors took advantage of transformers where they could do well and then refined their output using an architecture more appropriate to 3D modeling.
We're thinking: Training systems on both low- and high-resolution versions of an image could improve other vision tasks as well.
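For the curious, here is roughly how a distance-thresholded F-score like the one above is computed. This is our illustration of the standard metric, not the paper’s evaluation code; the function name and inputs are assumptions.

```python
# Sketch of a distance-thresholded F-score between predicted and ground-truth
# point clouds (our illustration of the standard metric, not the paper's code).
import numpy as np

def f_score(pred, gt, threshold=0.05):
    """pred: (N, 3) predicted points; gt: (M, 3) ground-truth points; units in meters."""
    # Distance from each predicted point to its nearest ground-truth point.
    d_pred = np.min(np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1), axis=1)
    # Distance from each ground-truth point to its nearest predicted point.
    d_gt = np.min(np.linalg.norm(gt[:, None, :] - pred[None, :, :], axis=-1), axis=1)
    precision = float(np.mean(d_pred < threshold))  # predicted points near the true surface
    recall = float(np.mean(d_gt < threshold))       # true surface covered by predictions
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)  # harmonic mean
```

At a 5-centimeter threshold (0.05 meters), a higher score means more of the predicted surface lies near the true surface and more of the true surface is covered by predictions.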
Work With Andrew Ng
Senior Technical Program Manager: Landing AI seeks a program manager to bridge its team and business partners while executing its engineering programs. The ideal candidate has a strong background in customer relationship management, three years in a direct program-management position, and two years in a technical role. Apply here
Senior Insights Analyst: Woebot Health is looking for an analyst to discover insights among the 2 million messages exchanged with Woebot every week. The ideal candidate has five years of applied experience, ideally working with large amounts of data, as well as mastery of SQL and experience using Python and/or R. Apply here
Senior iOS Engineer: Woebot Health seeks an iOS engineer. The ideal candidate has five years of experience in iOS development including Swift, managing apps in the Apple App Store, leading the implementation of complex iOS features, and mentoring other engineers. Apply here
Subscribe and view previous issues here.
Thoughts, suggestions, feedback? Please send them to thebatch@deeplearning.ai. To keep our newsletter out of your spam folder, add our email address to your contacts list.