<div align="center">
# ⚡ LitGPT
**20+ high-performance LLMs with recipes to pretrain, finetune, and deploy at scale.**
<pre>
✅ From scratch implementations ✅ No abstractions ✅ Beginner friendly
✅ Flash attention ✅ FSDP ✅ LoRA, QLoRA, Adapter
✅ Reduce GPU memory (fp4/8/16/32) ✅ 1-1000+ GPUs/TPUs ✅ 20+ LLMs
</pre>
---
<p align="center">
<a href="#quick-start">Quick start</a> •
<a href="#choose-from-20-llms">Models</a> •
<a href="#finetune-an-llm">Finetune</a> •
<a href="#deploy-an-llm">Deploy</a> •
<a href="#all-workflows">All workflows</a> •
<a href="#state-of-the-art-features">Features</a> •
<a href="#training-recipes">Recipes (YAML)</a> •
<a href="https://lightning.ai/">Lightning AI</a> •
<a href="#tutorials">Tutorials</a>
</p>
<a target="_blank" href="https://lightning.ai/lightning-ai/studios/litgpt-quick-start">
<img src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/app-2/get-started-badge.svg" height="36px" alt="Get started"/>
</a>
</div>
**Looking for GPUs?**
Over 340,000 developers use Lightning Cloud, purpose-built for PyTorch and PyTorch Lightning, with GPUs from $0.19.
- **Clusters**: frontier-grade training/inference clusters.
- **AI Studio (vibe train)**: workspaces where AI helps you debug, tune, and vibe train.
- **AI Studio (vibe deploy)**: workspaces where AI helps you optimize and deploy models.
- **Notebooks**: persistent GPU workspaces where AI helps you code and analyze.
- **Inference**: deploy models as inference APIs.
## Finetune, pretrain, and deploy LLMs Lightning fast ⚡⚡
Every LLM is implemented from scratch with **no abstractions** and **full control**, making them blazing fast, minimal, and performant at enterprise scale.
✅ **Enterprise ready -** Apache 2.0 for unlimited enterprise use.<br/>
✅ **Developer friendly -** Easy debugging with no abstraction layers and single-file implementations.<br/>
✅ **Optimized performance -** Models designed to maximize performance, reduce costs, and speed up training.<br/>
✅ **Proven recipes -** Highly optimized training/finetuning recipes tested at enterprise scale.<br/>
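All of these workflows are also exposed through a single CLI. A brief sketch of the main subcommands (the subcommand names come from the litgpt CLI; the model IDs are examples, and each subcommand accepts further flags — see `litgpt --help`):

```bash
# Finetune a downloaded checkpoint (LoRA is one of the supported methods).
litgpt finetune microsoft/phi-2

# Pretrain a model configuration from scratch.
litgpt pretrain EleutherAI/pythia-160m

# Chat with a model interactively.
litgpt chat microsoft/phi-2

# Serve the model as an inference API.
litgpt serve microsoft/phi-2
```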
## Quick start
Install LitGPT
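LitGPT is published on PyPI:

```bash
pip install litgpt
```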
Load and use any of the 20+ LLMs:
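A minimal sketch using the Python API, following the pattern in LitGPT's quick-start docs (`microsoft/phi-2` is just an example checkpoint; any supported model ID works):

```python
from litgpt import LLM

# Downloads the checkpoint on first use, then loads it.
llm = LLM.load("microsoft/phi-2")

# Generate a completion for a prompt.
text = llm.generate("Fix the spelling: Every fall, the family goes to the mountains.")
print(text)
```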
✅ Optimized for fast inference<br/>
✅ Quantization<br/>
✅ Runs on low-memory GPUs<br/>
✅ No layers of internal abstractions<br/>
✅ Optimized for production scale<br/>
<details>
<summary>Advanced install options</summary>
Install from source:
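One common pattern is an editable install from a clone of the repository (the `[all]` extras group here is an assumption; check the project's `pyproject.toml` for the extras it actually defines):

```bash
git clone https://github.com/Lightning-AI/litgpt
cd litgpt
pip install -e '.[all]'   # editable install; '[all]' extras assumed
```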
</details>
Explore the full Python API docs.
---
## Choose from 20+ LLMs
Every model is written from scratch to maximize performance and remove layers of abstraction:
| Model | Model size | Author | Reference |
|----|----|----|----|
| Llama 3, 3.1, 3.2, 3.3 | 1B, 3B, 8B, 70B, 405B | Meta AI | Meta AI 2024 |
| Code Llama | 7B, 13B, 34B, 70B | Meta AI | Rozière et al. 2023 |
| CodeGemma | 7B | Google | Google Team, Google Deepmind |
| Gemma 2 | 2B, 9B, 27B | Google | Google Team, Google Deepmind |
| Phi 4 | 14B | Microsoft Research | Abdin et al. 2024 |
| Qwen2.5 | 0.5B, 1.5B, 3B, 7B, 14B, 32B, 72B | Alibaba Group | Qwen Team 2024 |
| Qwen2.5 Coder | 0.5B, 1.5B, 3B, 7B, 14B, 32B | Alibaba Group | Hui, Binyuan et al. 2024 |
| R1 Distill Llama | 8B, 70B | DeepSeek AI | DeepSeek AI 2025 |
| ... | ... | ... | ... |
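The full list can also be printed from the command line via the litgpt CLI's download subcommand:

```bash
# Print every model ID that LitGPT can download and run.
litgpt download list
```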
<details>
<summary>See full list of 20+ LLMs</summary>
#### All models
| Model | Model size | Author | Reference |
|----|----|----|----|
| CodeGemma | 7B | Google | Google Team, Google Deepmind |
| Code Llama | 7B, 13B, 34B, 70B | Meta AI | Rozière et al. 2023 |
| Falcon | 7B, 40B, 180B | TII UAE | TII 2023 |
| Falcon 3 | 1B, 3B, 7B, 10B | TII UAE | TII 2024 |
| FreeWilly2 (Stable Beluga 2) | 70B | Stability AI | Stability AI 2023 |
| Function Calling Llama 2 | 7B | Trelis | Trelis et al. 2023 |
| Gemma | 2B, 7B | Google | Google Team, Google Deepmind |
| Gemma 2 | 2B, 9B, 27B | Google | Google Team, Google Deepmind |
| Gemma 3 | 1B, 4B, 12B, 27B | Google | Google Team, Google Deepmind |
| Llama 2 | 7B, 13B, 70B | Meta AI | Touvron et al. 2023 |
| Llama 3.1 | 8B, 70B | Meta AI | Meta AI 2024 |
| Llama 3.2 | 1B, 3B | Meta AI | Meta AI 2024 |
| Llama 3.3 | 70B | Meta AI | Meta AI 2024 |
| Mathstral | 7B | Mistral AI | Mistral AI 2024 |
| MicroLlama | 300M | Ken Wang | MicroLlama repo |
| Mixtral MoE | 8x7B | Mistral AI | Mistral AI 2023 |
| Mistral | 7B, 123B | Mistral AI | Mistral AI 2023 |
| Mixtral MoE | 8x22B | Mistral AI | Mistral AI 2024 |
| OLMo | 1B, 7B | Allen Institute for AI (AI2) | Groeneveld et al. 2024 |
| OpenLLaMA | 3B, 7B, 13B | OpenLM Research | Geng & Liu 2023 |
| Phi 1.5 & 2 | 1.3B, 2.7B | Microsoft Research | Li et al. 2023 |
| Phi 3 | 3.8B | Microsoft Research | Abdin et al. 2024 |
| Phi 4 | 14B | Microsoft Research | Abdin et al. 2024 |
| Phi 4 Mini Instruct | 3.8B | Microsoft Research | Microsoft 2025 |
| Phi 4 Mini Reasoning | 3.8B | Microsoft Research | Xu, Peng et al. 2025 |
| Phi 4 Reasoning | 3.8B | Microsoft Research | Abdin et al. 2025 |
| Phi 4 Reasoning Plus | 3.8B | Microsoft Research | Abdin et al. 2025 |
| Platypus | 7B, 13B, 70B | Lee et al. | Lee, Hunter, and Ruiz 2023 |
| Pythia | {14,31,70,160,410}M, {1,1.4,2.8,6.9,12}B | EleutherAI | Biderman et al. 2023 |
| Qwen2.5 | 0.5B, 1.5B, 3B, 7B, 14B, 32B, 72B | Alibaba Group | Qwen Team 2024 |
| Qwen2.5 Coder | 0.5B, 1.5B, 3B, 7B, 14B, 32B | Alibaba Group | Hui, Binyuan et al. 2024 |
| Qwen2.5 1M (Long Context) | 7B, 14B | Alibaba Group | Qwen Team 2025 |
| Qwen2.5 Math | 1.5B, 7B, 72B | Alibaba Group | An, Yang et al. 2024 |
| QwQ | 32B | Alibaba Group | Qwen Team 2025 |
| QwQ-Preview | 32B | Alibaba Group | Qwen Team 2024 |
| Qwen3 | 0.6B, 1.7B, 4B{Hybrid, Thinking-2507, Instruct-2507}, 8B, 14B, 32B | Alibaba Group | Qwen Team 2025 |
| Qwen3 MoE | 30B{Hybrid, Thinking-2507, Instruct-2507}, 235B{Hybrid, Thinking-2507, Instruct-2507} | Alibaba Group | Qwen Team 2025 |
| R1 Distill Llama | 8B, 70B | DeepSeek AI | DeepSeek AI 2025 |
| SmolLM2 | 135M, 360M, 1.7B | Hugging Face | Hugging Fa…