Reinforcement Learning from Human Feedback (RLHF)

Skills You'll Learn

Proximal policy optimization (PPO), Direct preference optimization (DPO), Hugging Face, Instruction-tuning, Reinforcement learning
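
As a point of reference for the skills above, the core DPO objective fits in a few lines of PyTorch. The sketch below is illustrative only and is not taken from the course materials; the function name and its arguments (summed per-response token log-probabilities under the trained policy and a frozen reference model) are assumptions, and beta is the KL-tradeoff temperature from the DPO paper:

    import torch
    import torch.nn.functional as F

    def dpo_loss(policy_chosen_logps, policy_rejected_logps,
                 ref_chosen_logps, ref_rejected_logps, beta=0.1):
        # Implicit rewards: beta-scaled log-ratios of the policy vs. the frozen reference
        chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
        rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
        # Maximize the reward margin of the preferred response over the rejected one
        return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

    # Usage with dummy log-probabilities for a batch of two preference pairs
    p_c = torch.tensor([-12.0, -9.5]); p_r = torch.tensor([-14.0, -11.0])
    r_c = torch.tensor([-12.5, -10.0]); r_r = torch.tensor([-13.0, -10.5])
    print(dpo_loss(p_c, p_r, r_c, r_r))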

Reviews

4.3 (72 ratings)

  • 5 stars
    73.61%
  • 4 stars
    6.94%
  • 3 stars
    4.16%
  • 2 stars
    5.55%
  • 1 star
    9.72%

MS

Mar 11, 2025

The course gave me a good understanding of fine-tuning LLMs. It made complex topics easy to learn.

AV

Mar 11, 2025

Very informative – covers advanced fine-tuning techniques in a clear and structured way.

From the lesson

Fine-Tuning Causal LLMs with Human Feedback and Direct Preference

Taught By

  • Joseph Santarcangelo

    Ph.D., Data Scientist at IBM

  • Ashutosh Sagar

  • Wojciech 'Victor' Fulmyk

  • Fateme Akbari
