Dear friends,
Trump and the Republican party chalked up huge wins this week. Did manipulation of social media by generative AI play any role in this election? While many have worried about AI creating fake or misleading content that influences people, generative AI has probably not been the primary method of manipulation in this election cycle. Instead, I think a bigger impact might have been the “amplification effect,” in which software bots — which don’t have to rely heavily on generative AI — create fake engagement (likes, retweets, reshares), leading social media companies’ recommendation algorithms to amplify certain content to real users, some of whom then promote it to their own followers. This is how fake engagement leads to real engagement.
The bottleneck to disinformation is not creating it but disseminating it. It is easy to write text that proposes a certain view, but hard to get many people to read it. Rather than generating a novel message (or using deepfakes to generate a misleading image) and hoping it will go viral, it might be easier to find a message written by a real human that supports a point of view you want to spread, and use bots to amplify that.
Keep learning!
Andrew
A MESSAGE FROM DEEPLEARNING.AI

Learn the principles of effective data engineering in this four-course professional certificate taught by Joe Reis. Develop your skills in the data engineering lifecycle and gain hands-on experience building data systems on Amazon Web Services. Earn a certificate upon completion! Enroll today
News

Claude Controls Computers

API commands for Claude Sonnet 3.5 enable Anthropic’s large language model to operate desktop apps much like humans do. Be cautious, though: it’s a work in progress.

What’s new: Anthropic launched API commands for computer use. The new commands prompt Claude Sonnet 3.5 to translate natural language instructions into commands that tell a computer to open applications, fetch data from local files, complete forms, and the like. (In addition, Anthropic improved Claude Sonnet 3.5 to achieve a state-of-the-art score on the SWE-bench Verified coding benchmark and released the faster, cheaper Claude Haiku 3.5, which likewise shows exceptional performance on coding tasks.)

How it works: The commands for computer use don’t cost extra on a per-token basis, but they may require up to 1,200 additional tokens and run repeatedly until the task at hand is accomplished, consuming more input tokens. They’re available via Anthropic, Amazon Bedrock, and Google Cloud’s Vertex AI.
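To make the loop concrete, here’s a rough sketch of a computer-use request using Anthropic’s Python SDK. The tool types, beta flag, and model string below reflect our reading of Anthropic’s launch documentation and may change as the feature evolves; treat them as assumptions to verify rather than a definitive recipe.

```python
# Sketch of a computer-use request via Anthropic's Python SDK.
# Tool types, beta flag, and model name follow the launch docs as we
# understand them; check Anthropic's current documentation before use.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=[
        {
            "type": "computer_20241022",  # virtual screen, mouse, and keyboard
            "name": "computer",
            "display_width_px": 1024,
            "display_height_px": 768,
            "display_number": 1,
        },
        {"type": "text_editor_20241022", "name": "str_replace_editor"},
        {"type": "bash_20241022", "name": "bash"},
    ],
    messages=[{
        "role": "user",
        "content": "Open the spreadsheet on the desktop and total column B.",
    }],
    betas=["computer-use-2024-10-22"],
)

# Claude replies with tool_use blocks (take a screenshot, click, type, ...).
# Your code executes each action in a sandbox, sends back a tool_result
# message, and repeats until the model reports the task is complete.
print(response.content)
```

This request-execute-repeat loop is what consumes the extra input tokens mentioned above, since each screenshot and tool result is fed back into the model.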
Yes, but: The current version of computer use is experimental, and Anthropic acknowledges various limitations. The company strongly recommends using these commands only in a sandboxed environment, such as a Docker container, with limited access to the computer’s hard drive and the web to protect sensitive data and core system files. Anthropic restricts the ability to create online accounts or post to social media or other sites (but says it may lift this restriction in the future).

Behind the news: Several companies have been racing to build models that can control desktop applications. Microsoft researchers recently released OmniParser, a tool based on GPT-4V that identifies user-interface elements like windows and buttons within screenshots, potentially making it easier for agentic workflows to navigate computers. In July, Amazon hired staff and leaders from Adept, a startup that trained models to operate computer applications. (Disclosure: Andrew Ng sits on Amazon’s board of directors.) Open Interpreter is an open-source project that likewise uses a large language model to control local applications like image editors and web browsers.

Why it matters: Large multimodal models already use external tools like search engines, web browsers, calculators, calendars, databases, and email. Giving them control over a computer’s visual user interface may enable them to automate a wider range of tasks we use computers to perform, such as creating lesson plans and — more worrisome — taking academic tests.

We’re thinking: Controlling computers remains hard. For instance, using AI to read a screenshot and pick the right action to take next is very challenging. However, we’re confident that this capability will be a growth area for agentic workflows in coming years.
Robots On the Loading Dock

Shipping ports are the latest front in the rising tension between labor unions and AI-powered automation.

What’s new: Autonomous vehicles, robotic cranes, and computer vision systems increasingly manage the flow of goods in and out of ports worldwide. Dockworkers in the United States are worried that such technology threatens their livelihoods, The Wall Street Journal reported.

How it works: Automation boosts the number of containers a port can move per hour from vessel to dock. For instance, Shanghai’s Yangshan Deep Water Port, one of the world’s most automated ports, moves more than 113 containers per hour, while Oakland, California’s less-automated port moves around 25 containers per hour, according to a report by S&P Global Market Intelligence for the World Bank.
Dockworkers disagree: Harold Daggett, leader of the International Longshoremen’s Association, a union that negotiates on behalf of dockworkers, vowed to fight port automation, which he sees as a pretext to eliminate jobs. He has proposed that members of unions internationally refuse work for shipping companies that use automated equipment. Fresh from a three-day strike in early October, longshoremen will return to negotiations with shipping companies in mid-January.

Why it matters: Ports are one of many work environments where AI is bringing down costs while improving throughput. In many such situations, humans can continue to perform tasks that machines don’t do well. But where human jobs are at risk, society must determine the most productive path. Dockworkers, through their unions, have significant power in this equation. A protracted U.S. dockworker strike risks economic losses of up to $7.5 billion a week. On the other hand, automation could bring tremendous gains in safety, speed, and economic efficiency.

We’re thinking: We are very sympathetic to workers’ rights. Yet we also believe that more-efficient ports will boost commerce, creating many new jobs. As traditional roles change, workers need opportunities to learn new skills and adapt to the evolving job market. Society has a responsibility to provide a safety net as well as training and education for those whose jobs are threatened by automation.
Does Your Model Comply With the AI Act?

A new study suggests that leading AI models may meet the requirements of the European Union’s AI Act in some areas, but probably not in others.

What’s new: The Zurich-based startup LatticeFlow, working with research institutions in Bulgaria and Switzerland, developed COMPL-AI, an unofficial framework designed to evaluate large language models’ likely compliance with the AI Act. A leaderboard ranks an initial selection of models. (LatticeFlow does not work for the European Commission or have legal standing to interpret the AI Act.)

How it works: A paper explains how COMPL-AI maps the AI Act’s requirements to specific benchmarks. It evaluates each requirement using new or established tests and renders an aggregate score. These scores are relative measures, and the authors don’t propose thresholds for compliance. The assessment covers five primary categories:
Results: The authors evaluated nine open models and three proprietary ones on a scale between 0 and 1. Their reports on each model reveal considerable variability. (Note: The aggregate scores cited in the reports don’t match those in the paper.)
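For intuition only, here is a hypothetical sketch of how per-benchmark scores could be rolled up into the kind of 0-to-1 aggregates the reports describe. The category names, benchmark names, and equal weighting below are placeholders, not COMPL-AI’s actual mapping or aggregation rules, which are defined in the paper.

```python
# Hypothetical roll-up of per-benchmark scores (each 0 to 1) into category
# and aggregate scores. COMPL-AI's real benchmark mapping and weighting are
# defined in its paper; the names and numbers here are illustrative only.
from statistics import mean

benchmark_scores = {
    "robustness": {"adversarial_prompts": 0.71, "typo_perturbations": 0.64},
    "privacy": {"pii_leakage": 0.88},
    "fairness": {"bias_qa": 0.52, "stereotype_probe": 0.60},
}

# Average the benchmarks within each category, then average the categories.
category_scores = {cat: mean(tests.values()) for cat, tests in benchmark_scores.items()}
aggregate = mean(category_scores.values())  # equal weights, for illustration only

for cat, score in category_scores.items():
    print(f"{cat}: {score:.2f}")
print(f"aggregate: {aggregate:.2f}")
```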
Yes, but: The authors note that some provisions of the AI Act, including explainability, oversight (deference to human control), and corrigibility (whether an AI system can be altered to change harmful outputs, which bears on a model’s risk classification under the AI Act), are defined ambiguously under the law and can’t be measured reliably at present. These areas are under-explored in the research literature and lack benchmarks to assess them.

Why it matters: With the advent of laws that regulate AI technology, developers are responsible for assessing a model’s compliance before they release it or use it in ways that affect the public. COMPL-AI takes a first step toward assuring model builders that their work is legally defensible or else alerting them to flaws that could lead to legal risk if they’re not addressed prior to release.

We’re thinking: Thoughtful regulation of AI is necessary, but it should be done in ways that don’t impose an undue burden on developers. While the AI Act itself is overly burdensome, we’re glad to see a largely automated path to demonstrating compliance of large language models.
When Agents Train Algorithms

Coding agents are improving, but can they tackle machine learning tasks?

What’s new: Chan Jun Shern and colleagues at OpenAI introduced MLE-bench, a benchmark designed to test how well AI coding agents perform in competitions hosted by the Kaggle machine learning contest platform. The benchmark is available here.

Agentic framework basics: An agentic framework, or scaffold, consists of a large language model (LLM) and code that prompts the model to follow a certain procedure. It may also contain tools the LLM can use, such as a Python console or web browser. For example, given a problem to solve, a framework might prompt the model to generate code, run the code in the Python console, generate evaluation code, run the evaluation code, change the solution based on the console’s output, and repeat until the problem is solved (a minimal sketch of this loop follows below).

How it works: MLE-bench is an offline competition environment that contains 75 Kaggle competitions selected manually by the authors, such as contests to identify toxic comments and predict volcanic eruptions. Each competition includes a description, training and testing datasets, code to grade submissions, a leaderboard of human contestants for comparison with an agent’s performance, and a “complexity” rating (produced by OpenAI): low (an experienced human can code a solution in less than two hours, not including training time), medium (between two and 10 hours), or high (more than 10 hours). Given a competition, an agent must produce a submission by (i) generating code to train a machine learning model and (ii) running the model on the test set. Users grade the submission to evaluate the agent’s performance.
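Here is the loop from “Agentic framework basics” as a minimal Python sketch. It is illustrative only: the llm function stands in for any chat-completion call, and a real scaffold such as those tested on MLE-bench would add tool use, sandboxing, and time and cost limits.

```python
# Illustrative agent scaffold: generate code, run it, judge the output, and
# revise until the judge passes it or attempts run out. `llm` is a placeholder
# for any chat-completion call; sandboxing and error handling are omitted.
import subprocess

def llm(prompt: str) -> str:
    """Placeholder for a call to a large language model."""
    raise NotImplementedError

def run(code: str) -> str:
    """Write candidate code to a file, execute it, and capture console output."""
    with open("solution.py", "w") as f:
        f.write(code)
    result = subprocess.run(
        ["python", "solution.py"], capture_output=True, text=True, timeout=600
    )
    return result.stdout + result.stderr

def solve(task: str, max_attempts: int = 5) -> str:
    """Iteratively generate, evaluate, and revise code for the given task."""
    code = llm(f"Write Python code to solve this task:\n{task}")
    for _ in range(max_attempts):
        output = run(code)
        verdict = llm(
            f"Task:\n{task}\nConsole output:\n{output}\n"
            "Does this solve the task? Answer PASS or FAIL with reasons."
        )
        if verdict.strip().startswith("PASS"):
            break
        code = llm(
            f"Task:\n{task}\nCurrent code:\n{code}\nConsole output:\n{output}\n"
            "Revise the code to fix the problems."
        )
    return code
```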
Results: The authors evaluated agent performance according to Kaggle’s standards for awarding medals to human contestants (described in the final bullet below).
Yes, but: The percentage of medals won by agents in this study is not comparable to percentages of medals won by humans on Kaggle. The authors awarded medals for excellent performance in all competitions included in the benchmark, but Kaggle does not. The authors didn’t report a separate tally of the agents’ medal rate on only those competitions in which Kaggle itself awarded medals.

Why it matters: It’s important to evaluate the abilities of coding agents to solve all kinds of programming problems. Machine learning tasks are especially valuable because they bear on the ability of software to analyze unstructured data and adapt to changing conditions.

We’re thinking: We’re glad to see machine learning catching on among humans and machines alike!
Work With Andrew Ng
Join the teams that are bringing AI to the world! Check out job openings at DeepLearning.AI, AI Fund, and Landing AI.
Subscribe and view previous issues here.
Thoughts, suggestions, feedback? Please send to thebatch@deeplearning.ai. Avoid our newsletter ending up in your spam folder by adding our email address to your contacts list.