Dear friends,
Large language models, or LLMs, have transformed how we process text. Large vision models, or LVMs, are starting to change how we process images as well. But there is an important difference between LLMs and LVMs:
This week, Dan Maloney and I announced Landing AI's work on developing domain-specific LVMs. You can learn more about it in this short video (4 minutes).
While some pathology and some manufacturing images can be found on the internet, their relative scarcity means that most generic LVMs do poorly at recognizing the most important features in such images.
In experiments conducted by Landing AI's Mark Sabini, Abdelhamid Bouzid, and Bastian Renjifo, LVMs adapted to images of a particular domain, such as pathology or semiconductor wafer inspection, do much better at finding relevant features in images of that domain. Building these LVMs can be done with around 100,000 unlabeled images from that domain, and larger datasets likely would result in even better models.
Further, if you use a pretrained LVM together with a small labeled dataset to tackle a supervised learning task, a domain-specific LVM needs significantly less labeled data (around 10 percent to 30 percent as much) to achieve performance comparable to that of a generic LVM.
Keep learning!
Andrew
News
Amazon Joins Chatbot Fray
Amazon launched a chatbot for large companies even as internal tests indicated potential problems.
What’s new: Amazon introduced Q, an AI-powered assistant that enables employees to query documents and corporate systems. Days later, the tech newsletter Platformer obtained internal documents that indicate the model can generate falsehoods and leak confidential information. (Amazon Q is not to be confused with OpenAI Q*.)
How it works: Currently available as a free preview, Q analyzes private documents, databases, and code to answer questions, generate content, and take actions. Amazon plans to offer two tiers of service: a basic chatbot ($20 per month) and the chatbot plus code generation, troubleshooting, security evaluation, and human assistance from Amazon Web Services ($25 per month). Amazon promises not to train machine learning models on Q users’ data.
The issues: Three days after Amazon unveiled Q, employees began to flag issues on internal Slack and security reporting channels.
Behind the news: Amazon is not the only major AI company whose chatbot has leaked private information. Google researchers recently found that they could prompt OpenAI’s ChatGPT to divulge personal information found in its training data.
Why it matters: For Amazon, issues with a newly released system are a bump in the road to competing effectively against rivals like Microsoft Copilot and ChatGPT Enterprise. For developers, it’s a sobering reminder that when you move fast, what breaks may be your own product.
We’re thinking: In developing an AI system, often it’s necessary to launch, in a safe and responsible way, and make improvements based on real-world performance. We congratulate the Q team on getting the product out and look forward to seeing where they take it.
Seeing Darker-Skinned Pedestrians
In a study, models used to detect people walking on streets and sidewalks performed less well on adults with darker skin and children of all skin tones.
What’s new: Xinyui Li, Zhenpeng Chen, and colleagues at Peking University, University College London, and King’s College London evaluated eight widely used object detectors for bias with respect to skin color, age, and gender.
Key insight: When it comes to detecting pedestrians, biases with respect to demographic characteristics can be a life-and-death matter. Evaluating them requires a dataset of pedestrians labeled according to characteristics that might influence detection. Skin color, age, and gender are important human differences that can affect a vision model’s performance, especially depending on lighting conditions.
How it works: The authors collected over 8,000 photos from four datasets of street scenes. They annotated each image with labels for skin tone (light or dark), age group (child or adult), and gender (male or female). They tested four general-purpose object detectors (YOLOX, RetinaNet, Faster R-CNN, and Cascade R-CNN) and four pedestrian-specific detectors (ALFNet, CSP, MGAN, and PRNet) on their dataset. They compared performance across perceived skin tone, age, and gender groups and under different conditions of brightness, contrast, and weather.
Results: The study revealed significant fairness issues related to skin tone and age.
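A per-group comparison of this kind can be sketched in a few lines of Python. This is a minimal illustration, not the authors' evaluation code: it assumes miss rate (the fraction of labeled pedestrians a detector fails to find) as the metric, and the record layout, the `miss_rate_by_group` helper, and the toy annotations are all hypothetical.

```python
from collections import defaultdict


def miss_rate_by_group(records, attribute):
    """Return {group: fraction of labeled pedestrians the detector missed}.

    Each record is a dict holding demographic labels plus a boolean
    "detected" flag (hypothetical layout, chosen for illustration).
    """
    missed = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        group = record[attribute]
        total[group] += 1
        if not record["detected"]:
            missed[group] += 1
    return {group: missed[group] / total[group] for group in total}


# Made-up toy annotations, not data from the study.
toy_records = [
    {"skin_tone": "light", "age": "adult", "detected": True},
    {"skin_tone": "light", "age": "adult", "detected": True},
    {"skin_tone": "light", "age": "child", "detected": False},
    {"skin_tone": "dark", "age": "adult", "detected": False},
    {"skin_tone": "dark", "age": "adult", "detected": True},
    {"skin_tone": "dark", "age": "child", "detected": False},
]

skin_tone_rates = miss_rate_by_group(toy_records, "skin_tone")
age_rates = miss_rate_by_group(toy_records, "age")
print(skin_tone_rates)
print(age_rates)
```

A large gap between the rates of two groups under the same attribute is the kind of signal such an audit looks for. Repeating the comparison on subsets filtered by brightness, contrast, or weather would show whether the gap depends on conditions.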
Behind the news: Previous work has shown that computer vision models can harbor biases that make them less likely to recognize individuals of certain types. In 2019, MIT showed that commercial face recognition performed worse on women and darker-skinned individuals. A plethora of work evaluates bias in datasets typically used to train vision models.
Why it matters: As more road vehicles gain self-driving capabilities and as expanded robotaxi services come to major cities, a growing number of pedestrians’ lives are in the hands of computer vision algorithms. Automakers don’t disclose what pedestrian detection systems they use or the number of real-world accidents involving self-driving cars. But co-author Jie Zhang claims that the proprietary systems used in self-driving cars are “usually built upon the existing open-source models,” and “we can be certain that their models must also have similar issues.”
We’re thinking: Computer vision isn’t the only technology used by self-driving cars to detect objects. Most self-driving car manufacturers rely on lidar and radar in addition to cameras. Those technologies are blind to color and gender differences and, in the view of many engineers, make better choices for this application.
A MESSAGE FROM DEEPLEARNING.AI
Want to learn how to fine-tune large language model-based agents? In our upcoming webinar with Weights and Biases, you’ll gain insights and techniques to enhance agent performance and specificity in automating applications. Register now
Limits on AI in Life Insurance
The U.S. state of Colorado started regulating the insurance industry’s use of AI.
What’s new: Colorado implemented the first law that regulates use of AI in life insurance and proposed extending the limits to auto insurers. Other states have taken steps to rein in both life and auto insurers under earlier statutes.
How it works: States are responsible for regulating the insurance industry in the U.S. Colorado’s rules limit the kinds of data life insurers can use and how they can use it. The rules took effect in November, based on a law passed in 2021.
Other states: California ordered all insurers to notify regulators when their algorithms result in an increase to a customer’s premium; regulators can then evaluate whether the effect of the rate increase is excessive and/or discriminatory. Agencies in Connecticut and New York ordered all insurers to conform their use of AI with laws against discrimination. Washington D.C. opened an investigation to determine whether auto insurers’ use of data resulted in outcomes that discriminated against certain groups.
Behind the news: Colorado shared an initial draft of its life-insurance regulations earlier this year before revising it. Among other differences, the initial draft prohibited AI models that discriminate not only on the basis of race but with respect to all protected classes. It also required insurers to prevent unauthorized access to models, create a plan to respond to unforeseen consequences of their models, and engage outside experts to audit their models. The final draft omits these requirements.
Why it matters: Regulators are concerned that AI could perpetuate existing biases against marginalized groups, and Colorado’s implementation is likely to serve as a model for further regulation. Insurance companies face a growing number of lawsuits over claims that their algorithms wrongfully discriminate by age or race. Regulation could mitigate potential harms and ease customers’ concerns.
We’re thinking: Reporting of models that use social posts, purchases, and the like is a good first step, although we suspect that further rules will be needed to govern the complexities of the insurance business. Other states’ use of Colorado's regulations as a blueprint would avoid a state-by-state patchwork of contradictory regulations.
Robot, Find My Keys
Researchers proposed a way for robots to find objects in households where things get moved around.
What's new: Andrey Kurenkov and colleagues at Stanford University introduced Node Edge Predictor, a model that learned to predict where objects were located in houses.
Key insight: A popular way to represent objects and their locations is a graph, in which each node is either an object or its location and an edge connects the two. A recurrent model could predict the locations of objects using a separate graph to represent each time step, but that would require many graphs. Instead, the model can predict locations using a single graph in which each edge includes the time elapsed since the associated object was seen in the associated location. The model learns to predict the next most likely place to find an object based on the object’s most recent, frequent, and longstanding locations.
How it works: The authors simulated a robot looking for things in a household. They built (i) a simulator of houses, object locations, and when and where they moved; (ii) a graph that represented a house containing objects; and (iii) a machine learning system that predicted where objects might be found.
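The single-graph idea can be made concrete with a small Python sketch. It is an illustration under stated assumptions rather than the authors' implementation: the `ObjectLocationGraph` and `EdgeFeatures` names, the timestamp units, and the exact choice of three features per edge are hypothetical, picked to mirror the "recent, frequent, and longstanding" description above.

```python
from dataclasses import dataclass


@dataclass
class EdgeFeatures:
    first_seen: float  # timestamp of the earliest sighting on this edge
    last_seen: float   # timestamp of the most recent sighting
    times_seen: int    # how many sightings this edge has accumulated


class ObjectLocationGraph:
    """One graph for all time steps; time lives in the edge features."""

    def __init__(self):
        self.edges = {}  # (object, location) -> EdgeFeatures

    def observe(self, obj, location, timestamp):
        """Record that obj was seen at location at the given time."""
        key = (obj, location)
        edge = self.edges.get(key)
        if edge is None:
            self.edges[key] = EdgeFeatures(timestamp, timestamp, 1)
        else:
            edge.last_seen = timestamp
            edge.times_seen += 1

    def features(self, obj, now):
        """Return {location: (time since last seen, count, time since first seen)}."""
        return {
            location: (now - e.last_seen, e.times_seen, now - e.first_seen)
            for (o, location), e in self.edges.items()
            if o == obj
        }


graph = ObjectLocationGraph()
graph.observe("keys", "table", timestamp=0.0)
graph.observe("keys", "table", timestamp=5.0)
graph.observe("keys", "sofa", timestamp=8.0)
key_features = graph.features("keys", now=10.0)
print(key_features)
```

Feature tuples like these are what a non-recurrent model could take as input to score each candidate location, which avoids keeping one graph per time step.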
Results: The authors tested their system’s ability to find a single object in a house against a few baseline methods. The baselines included random guessing, always guessing the piece of furniture where the object was last seen, and a Bayesian model that guessed whether the object was on/in a given piece of furniture based on the percentage of times it had been seen there. On average, their system found the object in 3.2 attempts, while the next best model (Bayesian) took 3.6 attempts. Guessing the last-seen location required 6.0 attempts, and random guessing required 8.8 attempts.
Why it matters: Feature engineering is about figuring out the best way to represent data so a model can learn from it. In this work, engineering time-related features (such as the time elapsed since an object was on a piece of furniture or the number of times an object was observed on a piece of furniture over time) enabled a non-recurrent model to learn how graphs change over time.
We’re thinking: A physical robot likely would use object detection on its camera feed instead of a simulator that told it directly which objects were associated with which pieces of furniture. We look forward to future work that proves the concept using this more realistic setup.
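To make the "attempts" metric concrete, here is a simplified Python sketch of a frequency-based baseline in the spirit of the one described in the results. The details of the authors' Bayesian model are not given here, so this stand-in simply ranks furniture by the share of past sightings. The function names and the sighting history are hypothetical.

```python
from collections import Counter


def rank_by_frequency(sightings):
    """Order locations from most to least frequently observed.

    Ties are broken alphabetically so the ranking is deterministic.
    """
    counts = Counter(sightings)
    total = sum(counts.values())
    return sorted(counts, key=lambda loc: (-counts[loc] / total, loc))


def attempts_to_find(ranked_locations, true_location):
    """Number of guesses needed when checking locations in ranked order.

    Raises ValueError if the true location was never observed before.
    """
    return ranked_locations.index(true_location) + 1


# Made-up sighting history for one object, not data from the paper.
history = ["table", "table", "sofa", "shelf", "table", "sofa"]
ranking = rank_by_frequency(history)
attempts = attempts_to_find(ranking, "sofa")
print(ranking, attempts)
```

Averaging `attempts_to_find` over many simulated searches would give a number comparable in kind to the averages reported above. A lower average means the ranking puts the true location nearer the top.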
Work With Andrew Ng
Join the teams that are bringing AI to the world! Check out job openings at DeepLearning.AI, AI Fund, and Landing AI.
Subscribe and view previous issues here.
Thoughts, suggestions, feedback? Please send to thebatch@deeplearning.ai. Keep our newsletter out of your spam folder by adding our email address to your contacts list.