
# The Illustrated DeepSeek-R1 (Extends Chapter 12)

In Chapter 12, we go through common techniques for creating and fine-tuning a model, namely language modeling, supervised fine-tuning, and preference tuning. That chapter focuses on non-reasoning models and shows how you can fine-tune a model yourself.

DeepSeek-R1 is a reasoning LLM that was released unexpectedly, and its impact has been phenomenal: an open-weights LLM rivaling OpenAI's o1 model.

The Illustrated DeepSeek-R1 explores the model and its training process, walking through the various steps the DeepSeek team used to create a model with such exceptional capabilities.

Interestingly, the training process uses rule-based verifiers to make sure that the model's reasoning process meets a certain standard, such as checking that generated code actually compiles.
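As a rough illustration, here is a minimal sketch of what such a compile-check reward could look like, assuming the model's answer contains a fenced Python block. The function names and the extraction logic are hypothetical, not DeepSeek's actual implementation:

```python
# Hypothetical sketch of a rule-based verifier that rewards completions
# whose code at least compiles. Not DeepSeek's actual reward code.
import re


def extract_code(completion: str) -> str | None:
    """Pull the first fenced Python block out of a model completion."""
    match = re.search(r"```python\n(.*?)```", completion, re.DOTALL)
    return match.group(1) if match else None


def compile_reward(completion: str) -> float:
    """Return 1.0 if the extracted code compiles, else 0.0."""
    code = extract_code(completion)
    if code is None:
        return 0.0
    try:
        # Syntax check only; the code is never executed.
        compile(code, "<completion>", "exec")
        return 1.0
    except SyntaxError:
        return 0.0


print(compile_reward("My answer:\n```python\nprint('hi')\n```"))  # 1.0
print(compile_reward("```python\ndef broken(:\n```"))             # 0.0
```

Because such checks are cheap and fully automatic, they can score huge numbers of generated reasoning traces during reinforcement learning without a human or a learned reward model in the loop.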

The architecture is a Mixture-of-Experts with 256 experts, of which 8 are activated per token, making the model quite large.
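To make the routing concrete, here is a minimal sketch of top-k expert routing in PyTorch. The sizes in the demo are toy values, and the sketch omits refinements the real model uses (such as shared experts and load balancing):

```python
# Minimal sketch of Mixture-of-Experts routing with top-k gating.
# Toy sizes for the demo; DeepSeek-R1 routes each token to 8 of 256 experts.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoELayer(nn.Module):
    def __init__(self, d_model: int, n_experts: int, top_k: int):
        super().__init__()
        self.top_k = top_k
        # The router scores every expert for every token.
        self.router = nn.Linear(d_model, n_experts)
        # Each expert is a small feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (n_tokens, d_model)
        scores = self.router(x)                         # (n_tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # keep only top-k experts
        weights = F.softmax(weights, dim=-1)            # normalize over the chosen k
        out = torch.zeros_like(x)
        # Naive per-token loop for clarity; real implementations batch this.
        for token in range(x.shape[0]):
            for k in range(self.top_k):
                expert = self.experts[idx[token, k]]
                out[token] += weights[token, k] * expert(x[token])
        return out


layer = MoELayer(d_model=16, n_experts=32, top_k=4)  # small sizes for the demo
print(layer(torch.randn(3, 16)).shape)               # torch.Size([3, 16])
```

The appeal of this design is that the parameter count grows with the number of experts while the compute per token only grows with the number of *activated* experts, which is how the model can be so large yet remain practical to run.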