Reinforcement Learning from Human Feedback (RLHF)

Skills You'll Learn

Proximal policy optimization (PPO), Direct preference optimization (DPO), Hugging Face, Instruction-tuning, Reinforcement learning
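
As a point of reference for the skills above, the core DPO objective fits in a few lines of PyTorch. The sketch below is illustrative only and is not taken from the course materials; the function name and its arguments (summed per-response token log-probabilities under the trained policy and a frozen reference model) are assumptions, and beta is the KL-tradeoff temperature from the DPO paper:

    import torch
    import torch.nn.functional as F

    def dpo_loss(policy_chosen_logps, policy_rejected_logps,
                 ref_chosen_logps, ref_rejected_logps, beta=0.1):
        # Implicit rewards: beta-scaled log-ratios of the policy vs. the frozen reference
        chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
        rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
        # Maximize the reward margin of the preferred response over the rejected one
        return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

    # Usage with dummy log-probabilities for a batch of two preference pairs
    p_c = torch.tensor([-12.0, -9.5]); p_r = torch.tensor([-14.0, -11.0])
    r_c = torch.tensor([-12.5, -10.0]); r_r = torch.tensor([-13.0, -10.5])
    print(dpo_loss(p_c, p_r, r_c, r_r))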

Reviews

4.3 (72 ratings)

  • 5 stars
    73.61%
  • 4 stars
    6.94%
  • 3 stars
    4.16%
  • 2 stars
    5.55%
  • 1 star
    9.72%

MS

Mar 11, 2025

The course gave me a good understanding of fine-tuning LLMs. It made complex topics easy to learn.

AV

Mar 11, 2025

Very informative – covers advanced fine-tuning techniques in a clear and structured way.

From the lesson

Fine-Tuning Causal LLMs with Human Feedback and Direct Preference

Taught By

  • Joseph Santarcangelo

    Ph.D., Data Scientist at IBM

  • Ashutosh Sagar

  • Wojciech 'Victor' Fulmyk

  • Fateme Akbari
