rasbt/LLMs-from-scratch

Daily Information Dashboard · 2026-02-04

Category: Open-source project
Source: github_search
Score: 47
Published: 2026-01-29T22:51:17Z

AI Summary

The GitHub project rasbt/LLMs-from-scratch provides companion code and notebooks for implementing, pretraining, and finetuning GPT-like large language models from scratch, helping readers understand and reproduce the full LLM training pipeline on ordinary hardware and making the material easy to learn from and extend experimentally.
#GitHub #repo #OpenSource #LLM #GPT #PyTorch #LoRA

Content Excerpt

# Build a Large Language Model (From Scratch)

This repository contains the code for developing, pretraining, and finetuning a GPT-like LLM and is the official code repository for the book Build a Large Language Model (From Scratch).

<br>
<br>

<a href="https://amzn.to/4fqvn0D"><img src="https://sebastianraschka.com/images/LLMs-from-scratch-images/cover.jpg?123" width="250px"></a>

<br>

In *Build a Large Language Model (From Scratch)*, you'll learn and understand how large language models (LLMs) work from the inside out by coding them from the ground up, step by step. In this book, I'll guide you through creating your own LLM, explaining each stage with clear text, diagrams, and examples.

The method described in this book for training and developing your own small-but-functional model for educational purposes mirrors the approach used in creating large-scale foundational models such as those behind ChatGPT. In addition, this book includes code for loading the weights of larger pretrained models for finetuning.
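To make the weight-loading idea concrete, here is a minimal generic sketch of the PyTorch state-dict pattern; `TinyGPT` and the fake checkpoint are illustrative stand-ins, not the book's actual model or loading API.

```python
import torch
import torch.nn as nn

# Illustrative stand-in for a GPT-like model (the book builds the real one).
class TinyGPT(nn.Module):
    def __init__(self, vocab_size=50257, emb_dim=768):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, emb_dim)
        self.out_head = nn.Linear(emb_dim, vocab_size, bias=False)

    def forward(self, idx):
        return self.out_head(self.tok_emb(idx))

model = TinyGPT()

# "Loading pretrained weights" boils down to copying a published checkpoint's
# tensors into the model's state dict before finetuning. Here the checkpoint
# is faked from the model's own (random) weights just to show the mechanics.
checkpoint = {k: v.clone() for k, v in model.state_dict().items()}
model.load_state_dict(checkpoint)  # in practice: a downloaded pretrained checkpoint
```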
- Link to the official [source code repository](https://github.com/rasbt/LLMs-from-scratch)
- Link to [the book at Manning (the publisher's website)](http://mng.bz/orYv)
- Link to [the book page on Amazon.com](https://amzn.to/4fqvn0D)
- ISBN 9781633437166

<a href="http://mng.bz/orYv#reviews"><img src="https://sebastianraschka.com//images/LLMs-from-scratch-images/other/reviews.png" width="220px"></a>

<br>
<br>

To download a copy of this repository, click on the Download ZIP button or execute the following command in your terminal:
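
```bash
git clone --depth 1 https://github.com/rasbt/LLMs-from-scratch.git
```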

<br>

(If you downloaded the code bundle from the Manning website, please consider visiting the official code repository on GitHub at https://github.com/rasbt/LLMs-from-scratch for the latest updates.)

<br>
<br>
## Table of Contents

Please note that this README.md file is a Markdown (.md) file. If you have downloaded this code bundle from the Manning website and are viewing it on your local computer, I recommend using a Markdown editor or previewer for proper viewing. If you haven't installed a Markdown editor yet, Ghostwriter is a good free option.

You can alternatively view this and other files on GitHub at https://github.com/rasbt/LLMs-from-scratch in your browser, which renders Markdown automatically.

<br>
<br>
**Tip:**
If you're seeking guidance on installing Python and Python packages and setting up your code environment, I suggest reading the README.md file located in the setup directory.

<br>
<br>

(CI status badges: code tests for Linux, Windows, and macOS)

| Chapter Title | Main Code (for Quick Access) | All Code + Supplementary |
|------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------|-------------------------------|
| Setup recommendations <br/>How to best read this book | - | - |
| Ch 1: Understanding Large Language Models | No code | - |
| Ch 2: Working with Text Data | - ch02.ipynb<br/>- dataloader.ipynb (summary)<br/>- exercise-solutions.ipynb | ./ch02 |
| Ch 3: Coding Attention Mechanisms | - ch03.ipynb<br/>- multihead-attention.ipynb (summary)<br/>- exercise-solutions.ipynb | ./ch03 |
| Ch 4: Implementing a GPT Model from Scratch | - ch04.ipynb<br/>- gpt.py (summary)<br/>- exercise-solutions.ipynb | ./ch04 |
| Ch 5: Pretraining on Unlabeled Data | - ch05.ipynb<br/>- gpt_train.py (summary) <br/>- gpt_generate.py (summary) <br/>- exercise-solutions.ipynb | ./ch05 |
| Ch 6: Finetuning for Text Classification | - ch06.ipynb <br/>- gpt_class_finetune.py <br/>- exercise-solutions.ipynb | ./ch06 |
| Ch 7: Finetuning to Follow Instructions | - ch07.ipynb<br/>- gpt_instruction_finetuning.py (summary)<br/>- ollama_evaluate.py (summary)<br/>- exercise-solutions.ipynb | ./ch07 |
| Appendix A: Introduction to PyTorch | - code-part1.ipynb<br/>- code-part2.ipynb<br/>- DDP-script.py<br/>- exercise-solutions.ipynb | ./appendix-A |
| Appendix B: References and Further Reading | No code | ./appendix-B |
| Appendix C: Exercise Solutions | - list of exercise solutions | ./appendix-C |
| Appendix D: Adding Bells and Whistles to the Training Loop | - appendix-D.ipynb | ./appendix-D |
| Appendix E: Parameter-efficient Finetuning with LoRA | - appendix-E.ipynb | ./appendix-E |

<br>
&nbsp;

The mental model below summarizes the contents covered in this book.

<img src="https://sebastianraschka.com/images/LLMs-from-scratch-images/mental-model.jpg" width="650px">

<br>
&nbsp;
## Prerequisites

The most important prerequisite is a strong foundation in Python programming. With this knowledge, you will be well prepared to explore the fascinating world of LLMs and understand the concepts and code examples presented in this book.

If you have some experience with deep neural networks, you may find certain concepts more familiar, as LLMs are built upon these architectures.

This book uses PyTorch to implement the code from scratch without using any external LLM libraries. While proficiency in PyTorch is not a prerequisite, familiarity with PyTorch basics is certainly useful. If you are new to PyTorch, Appendix A provides a concise introduction. Alternatively, you may find my book, *PyTorch in One Hour: From Tensors to Training Neural Networks on Multiple GPUs*, helpful for learning the essentials.
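As a rough gauge of what "PyTorch basics" means here, a minimal sketch of tensors, autograd, and a single gradient-descent step (a generic illustration, not code from the book):

```python
import torch

# A one-parameter model y = w * x, updated for a single training step.
w = torch.tensor(2.0, requires_grad=True)  # learnable parameter
x = torch.tensor(3.0)                      # input
loss = (w * x - 1.0) ** 2                  # squared error against target 1.0
loss.backward()                            # autograd fills in w.grad = d(loss)/d(w)
with torch.no_grad():
    w -= 0.01 * w.grad                     # gradient-descent update
print(w)  # w moves from 2.0 to 1.7, one step toward the optimum w = 1/3
```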

<br>
&nbsp;
## Hardware Requirements

The code in the main chapters of this book is designed to run on conventional laptops within a reasonable timeframe and does not require specialized hardware. This approach ensures that a wide audience can engage with the material. Additionally, the code automatically utilizes GPUs if they are available. (Please see the setup doc for additional recommendations.)
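The "automatically utilizes GPUs" behavior typically comes down to the standard PyTorch device-selection idiom; a minimal generic sketch (not the repository's exact code):

```python
import torch

# Pick the best available device: NVIDIA GPU, Apple Silicon GPU, or CPU.
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():   # Apple Silicon (Metal) backend
    device = torch.device("mps")
else:
    device = torch.device("cpu")

model = torch.nn.Linear(8, 2).to(device)  # toy model, just for illustration
batch = torch.randn(4, 8, device=device)  # create inputs on the same device
logits = model(batch)                     # runs on the GPU when one is present
```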

&nbsp;
## Video Course

A 17-hour-and-15-minute companion video course is available in which I code through each chapter of the book. It is organized into chapters and sections that mirror the book's structure, so it can be used as a standalone alternative to the book or as a complementary code-along resource.

<a href="https://www.manning.com/livevi…