Dear friends,
The rise of cloud-hosted AI software has brought much discussion about the privacy implications of using it. But I find that users, including both consumers and developers building on such software, don’t always have a sophisticated framework for evaluating how software providers store, use, and share their data. For example, does a company’s promise “not to train on customer data” mean your data is private?
These levels — No Guarantees, No Outside Exposure, and Limited Access, in ascending order of protection — may seem clear, but there are many variations within a given level. For instance, a promise not to train on your data can mean different things to different companies. Some forms of generative AI, particularly image generators, can replicate their training data, so training a generative AI algorithm on customer data may run some risk of leaking it. On the other hand, tuning a handful of an algorithm’s hyperparameters (such as learning rate) on customer data, while technically part of the training process, is very unlikely to result in any direct data leakage. So how the data is used in training affects the risk of leakage.

Similarly, the Limited Access level has its complexities. If a company offers this level of privacy, it’s good to understand exactly under what circumstances its employees may look at your data. And if they might look at your data, there are shades of gray in how private the data remains. For example, if a limited group of employees in a secure environment can see only short snippets that have been disassociated from your company ID, that’s more secure than if a large number of employees can freely browse your data.
Over the past decade, cloud-hosted SaaS software has gained considerable traction. But some customers insist on running on-prem solutions within their own data centers. One reason is that many SaaS providers offer only No Guarantees or No Outside Exposure, while some customers’ data is so sensitive that it requires Limited Access.
I’ve been delighted to see that AI can help here. Daphne Li, CEO of Commonsense Privacy (disclosure: a portfolio company of AI Fund), is using large language models to help companies systematically evaluate, and potentially improve, their privacy policies as well as keep track of global regulatory changes. In the matter of privacy, as in other areas, I hope that the title of my TED AI talk — “AI Isn’t the Problem, It’s the Solution” — will prove to be true.
Keep learning!
Andrew
P.S. Check out our new short course with Amazon Web Services on “Serverless LLM Apps With Amazon Bedrock,” taught by Mike Chambers. A serverless architecture enables you to deploy applications quickly, without setting up and managing the compute servers that run them, which is often a full-time job in itself. In this course, you’ll learn how to implement serverless deployment by building event-driven systems. We illustrate this approach via an application that automatically detects incoming customer inquiries, transcribes them with automatic speech recognition, summarizes them with an LLM using Amazon Bedrock, and runs serverless with AWS Lambda. We invite you to enroll here!
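To give a concrete sense of the pattern, here is a minimal sketch (not the course’s code) of a Lambda handler that summarizes a transcribed inquiry with a Bedrock model via boto3. The event shape, prompt, and model choice are illustrative assumptions:

```python
# Minimal sketch of a serverless summarization step: an AWS Lambda handler
# that calls an Amazon Bedrock text model through boto3.
import json
import boto3

bedrock = boto3.client("bedrock-runtime")  # uses the Lambda execution role's credentials

def handler(event, context):
    # Assume the triggering event carries a transcript produced upstream
    # (e.g., by automatic speech recognition); the "transcript" key is illustrative.
    transcript = event["transcript"]
    body = json.dumps({
        "inputText": f"Summarize this customer inquiry:\n\n{transcript}",
        "textGenerationConfig": {"maxTokenCount": 256, "temperature": 0.2},
    })
    response = bedrock.invoke_model(
        modelId="amazon.titan-text-express-v1",  # any Bedrock text model would do
        body=body,
    )
    result = json.loads(response["body"].read())
    return {"summary": result["results"][0]["outputText"]}
```

Because the function runs only when an event arrives, you pay per invocation and never provision or patch a server.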
News

Ancient Scrolls Recovered

Three researchers decoded scrolls that had gone unread since they were turned into charcoal by the eruption of Mount Vesuvius in the year 79.

What’s new: Youssef Nader, Luke Farritor, and Julian Schilliger used neural networks to win the $700,000 grand prize in the Vesuvius Challenge, a competition to read charred papyrus scrolls found in the ruins of a villa at the Roman town of Herculaneum in southern Italy.

How it works: The volcanic eruption covered Herculaneum in ash. It also carbonized the papyrus scrolls, which originally would have unrolled to lengths as long as 30 feet. The winning team trained neural networks to recognize faint traces of ink in X-ray CT scans of the still-rolled scrolls after the scans had been virtually unrolled and flattened.
Behind the news: The Vesuvius Challenge launched in March 2023 with funding provided by former GitHub CEO Nat Friedman.
Why it matters: The winning team’s achievement testifies to the ability of deep learning to help solve difficult problems. And their work may have broader significance: Recovering the entire Herculaneum library could provide insights into literature, philosophy, history, science, and art at the time of Caesar.

We’re thinking: University of Kentucky computer scientist Brent Seales, who helped design the contest and pioneered the use of medical imaging and machine learning to read ancient texts, reckons that over 1,000 teams worked on the problem, expending some 10 person-years and two compute-years. It’s a great example of the power of global collaboration and open resources — central facets of the AI community — to find solutions to hard problems.
U.S. Restricts AI Robocalls

The United States outlawed unsolicited phone calls that use AI-generated voices.

What’s new: The Federal Communications Commission ruled that the current legal restriction on voice communications that use “artificial or prerecorded voices” covers AI-powered voice generation. The ruling followed an incident in which calls that featured the cloned voice of U.S. President Biden were delivered with the apparent intent of interfering with an election.

How it works: The ruling interprets the 1991 Telephone Consumer Protection Act, which controls automated calls, or robocalls. The law gives state attorneys general the power to prosecute robocallers. The FCC had proposed the move in January.
Behind the news: In January, two days before New Hampshire’s presidential primary election, thousands of the state’s Democratic Party members received phone calls in which a cloned voice of President Biden, who is a Democrat, urged them not to vote. The calls used voice cloning software from ElevenLabs, according to researchers cited by Wired. New Hampshire investigated the calls as a case of illegal voter suppression. It traced them to two telecommunications companies in the state of Texas and issued cease-and-desist orders and subpoenas to both firms. One, Lingo Telecom, said it is cooperating with federal and state investigators.

Why it matters: For all its productive uses, generative AI offers scammers fresh opportunities to deceive their marks into handing over things of value. Voice cloning can make such ploys more convincing by simulating personal, business, political, and other relationships. Election officials are especially concerned about AI’s potential to influence voting as we enter a year that will see more than 100 elections worldwide, including in seven of the 10 most populous nations.

We’re thinking: The question of how to safeguard elections against manipulations like the Biden robocall is an urgent one. A tamper-resistant watermark that identifies generated output would discourage misuse. However, providers will have a financial incentive not to apply such watermarks unless regulators require it.
A MESSAGE FROM DEEPLEARNING.AI

In our latest course, built in collaboration with Amazon Web Services, you’ll learn to deploy applications based on large language models using a serverless architecture. This enables rapid deployment and liberates you from the complexities of managing and scaling infrastructure. Start today!
GPU Data Centers Strain Grid Power

The AI boom is taxing power grids and pushing builders of data centers to rethink their sources of electricity.

What’s new: New data centers packed with GPUs optimized for AI workloads are being approved at a record pace, The Information reported. The extreme energy requirements of such chips are pushing builders to place data centers near inexpensive power sources, which may be far away from where users live.

How it works: The coming generation of GPU data centers promises to supply processing power for the burgeoning AI era. But builders aren’t always able to find electricity to run them.
What they’re saying: “We still don’t appreciate the energy needs of [AI] technology. There’s no way to get there without a breakthrough.” — Sam Altman, CEO, OpenAI, on January 16, 2024, quoted by Reuters.

Behind the news: Data centers alone account for 1 to 1.5 percent of global demand for electricity. It’s unclear how much of that figure is attributable to AI, but the share is likely to grow.

Why it matters: The world needs innovation in both energy resources and power-efficient machine learning. The dawning era of pervasive AI brings with it the challenge of producing energy to develop and deploy the technology, which can contribute to pollution that disrupts ecosystems and accelerates climate change. Fortunately, AI can shrink the environmental footprint of some energy-intensive activities; for example, searching the web for information generates far less CO2 than driving to a library.

We’re thinking: Climate change is a slow-motion tragedy. We must push toward AI infrastructure that uses less energy (for example, by using more efficient algorithms or hardware) and emits less carbon (for example, by using renewable sources of energy). That said, concentrating computation in a data center creates a point of significant leverage for optimizing energy usage. For example, it’s more economical to raise the energy efficiency of 10,000 servers in a data center than of 10,000 PCs that carry out the same workload in 10,000 homes.
Better Images, Less Training

The longer text-to-image models train, the better their output — but the training is costly. Researchers built a system that produced superior images after far less training.

What's new: Independent researcher Pablo Pernías and colleagues at Technische Hochschule Ingolstadt, Université de Montréal, and Polytechnique Montréal built Würstchen, a system that divided the task of image generation between two diffusion models.

Diffusion model basics: During training, a text-to-image generator based on diffusion takes a noisy image and a text embedding. The model learns to use the embedding to remove the noise in successive steps. At inference, it produces an image by starting with pure noise and a text embedding, and removing noise iteratively according to the text embedding. A variant known as a latent diffusion model uses less processing power by removing noise from a noisy image embedding instead of a noisy image.

Key insight: A latent diffusion model typically learns to remove noise from an embedding of an input image based solely on a text prompt. It can learn much more quickly if, in addition to the text prompt, a separate model supplies a smaller, noise-free version of the image embedding. Working as a system, the two models can learn their tasks in a fraction of the usual time.

How it works: Würstchen involves three components that required training: the encoder-decoder from VQGAN, a latent diffusion model based on U-Net, and another latent diffusion model based on ConvNeXt. The authors trained the models separately on subsets of LAION-5B, which contains matched images and text descriptions scraped from the web. A sketch of the two-stage inference pipeline appears below.
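Here is a schematic sketch of the two-stage idea. This is not the authors’ code: the function names, latent shapes, step counts, and stand-in models are illustrative assumptions, but the structure shows how a small text-conditioned embedding guides denoising of the larger image embedding.

```python
# Schematic sketch of Würstchen-style two-stage inference (illustrative only).
import torch

def sample(model, shape, cond, steps):
    """Generic diffusion sampling: start from pure noise and let the model
    iteratively remove noise, guided by the conditioning signal."""
    x = torch.randn(shape)
    for t in reversed(range(steps)):
        x = model(x, t, cond)  # model returns a less-noisy latent
    return x

def generate(text_emb, small_model, large_model, decoder):
    # First diffusion model: denoise a small, highly compressed image
    # embedding, conditioned on the text alone.
    small = sample(small_model, shape=(1, 16, 12, 12), cond=text_emb, steps=60)
    # Second diffusion model: denoise the larger image embedding, conditioned
    # on both the text and the noise-free small embedding, which already pins
    # down the image's content and makes this stage much cheaper to learn.
    large = sample(large_model, shape=(1, 4, 128, 128), cond=(text_emb, small), steps=30)
    # The VQGAN decoder maps the denoised embedding to pixels.
    return decoder(large)

# Toy stand-ins demonstrate the call pattern; the real stages are neural networks.
if __name__ == "__main__":
    fake_denoiser = lambda x, t, cond: 0.99 * x
    image = generate(torch.randn(1, 768), fake_denoiser, fake_denoiser, lambda z: z)
    print(image.shape)  # torch.Size([1, 4, 128, 128])
```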
Results: The authors compared Würstchen (trained on subsets of LAION-5B for 25,000 GPU hours) to Stable Diffusion 2.1 (trained on subsets of LAION-5B for 200,000 GPU hours). The authors generated images based on captions from MS COCO (human-written captions of pictures scraped from the web) and Parti-prompts (human-written captions designed to reflect common prompts for generative models). They asked 90 people which output they preferred. The judges expressed little preference regarding renderings of MS COCO captions: They chose Würstchen 41.3 percent of the time, Stable Diffusion 40.6 percent of the time, and neither 18.1 percent of the time. However, presented with the results of Parti-prompts, they preferred Würstchen 49.5 percent of the time, Stable Diffusion 32.8 percent of the time, and neither 17.7 percent of the time.

Why it matters: Training a latent diffusion model to denoise smaller embeddings accelerates training, but this tends to produce lower-quality images. Stacking two diffusion models — one to generate smaller embeddings, and the other to generate larger embeddings based on the smaller ones — enabled Würstchen to match or exceed the output quality of models with large embeddings while achieving the training speed of models with small embeddings.

We're thinking: Stacking models in this fashion could be useful in training video generators. They could use a speed boost because video is much more data-intensive than still images.
A MESSAGE FROM FOURTHBRAIN

Join us for an intensive three-week workshop where you’ll learn to operationalize large language models at scale, from advanced retrieval augmented generation (RAG) to deploying multimodal applications on the cloud. Register today!
Work With Andrew Ng
Join the teams that are bringing AI to the world! Check out job openings at DeepLearning.AI, AI Fund, and Landing AI.
Subscribe and view previous issues here.
Thoughts, suggestions, feedback? Please send to thebatch@deeplearning.ai. To keep our newsletter out of your spam folder, add our email address to your contacts list.