We're thrilled to announce a new course on Reinforcement Learning from Human Feedback (RLHF) built in collaboration with Google Cloud.
Large language models (LLMs) are trained on human-generated text, but aligning an LLM with human values and preferences requires additional methods. RLHF is a key technique for making LLMs more helpful, honest, and safe.
In this course, you will gain a conceptual understanding of the RLHF training process, and then practice applying RLHF to tune an LLM. You will:
Explore the two datasets (“preference” and “prompt”) that are used in RLHF training (see the data-format sketch after this list).
Use the open-source Google Cloud Pipeline Components library to fine-tune the Llama 2 model with RLHF (see the pipeline sketch after this list).
Assess the tuned LLM against the original base model by comparing loss curves and using the “Side-by-Side (SxS)” method.
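To make the two data formats concrete, here is a minimal sketch of what a preference record and a prompt record might look like as JSON Lines. The field names (`input_text`, `candidate_0`, `candidate_1`, `choice`) are illustrative assumptions, not necessarily the exact schema used in the course.

```python
import json

# Hypothetical preference record: a prompt, two candidate responses, and a
# human label indicating which candidate was preferred.
# Field names are assumptions for illustration, not a documented schema.
preference_example = {
    "input_text": "Summarize: The city council voted to expand the bike-lane network...",
    "candidate_0": "The council approved more bike lanes to ease downtown traffic.",
    "candidate_1": "Bike lanes exist in the city.",
    "choice": 0,  # annotators preferred candidate_0
}

# Hypothetical prompt record: an unlabeled prompt used during the
# reinforcement-learning phase, where the model being tuned generates
# responses that are scored by the reward model.
prompt_example = {
    "input_text": "Summarize: Researchers released a new open-weights language model...",
}

# Both datasets are commonly stored as JSON Lines: one JSON object per line.
with open("preference_data.jsonl", "w") as f:
    f.write(json.dumps(preference_example) + "\n")

with open("prompt_data.jsonl", "w") as f:
    f.write(json.dumps(prompt_example) + "\n")
```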
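For the fine-tuning step, here is a minimal sketch of compiling and launching an RLHF tuning pipeline on Vertex AI Pipelines with the Google Cloud Pipeline Components library. The module path, parameter names, and values shown here are assumptions that may differ across library versions; the course walks through the exact, current usage.

```python
# A minimal sketch, assuming the preview rlhf_pipeline component; module
# paths, parameter names, and values may vary across library versions.
from google_cloud_pipeline_components.preview.llm import rlhf_pipeline
from google.cloud import aiplatform
from kfp import compiler

# Compile the prebuilt RLHF pipeline into a reusable YAML template.
compiler.Compiler().compile(
    pipeline_func=rlhf_pipeline,
    package_path="rlhf_pipeline.yaml",
)

# Project, region, bucket paths, and step counts below are placeholders.
aiplatform.init(project="your-project-id", location="us-central1")

job = aiplatform.PipelineJob(
    display_name="rlhf-llama-2-tuning",
    template_path="rlhf_pipeline.yaml",
    pipeline_root="gs://your-bucket/pipeline_root",
    parameter_values={
        "preference_dataset": "gs://your-bucket/preference_data.jsonl",
        "prompt_dataset": "gs://your-bucket/prompt_data.jsonl",
        "large_model_reference": "llama-2-7b",   # base model to tune
        "reward_model_train_steps": 1000,        # illustrative values only
        "reinforcement_learning_train_steps": 500,
    },
)
job.run()
```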
Join instructor Nikita Namjoshi, Developer Advocate for Generative AI at Google Cloud, to learn this exciting technique you can use in building your own applications.