spmallick/learnopencv

Daily Information Dashboard · 2026-02-10

Open-Source Projects

AI Summary

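The GitHub repository spmallick/learnopencv collects a large body of computer vision, deep learning, and AI tutorials from the LearnOpenCV blog, together with their accompanying code. Its value is that it gives developers a single place for reproducible learning and practice resources covering current topics.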

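The repository is positioned as the code collection for LearnOpenCV blog articles, focused on computer vision, deep learning, and AI practice.
Coverage is broad, including the YOLO family, VLM/LLM, RAG, 3D reconstruction, SLAM, diffusion models, medical imaging, and robotics.
Most articles link to corresponding code, emphasizing a reproducible learning path from paper walkthrough to working implementation.
It includes many edge-deployment and engineering topics, such as Jetson, Arduino, Raspberry Pi, ROS2, and Carla.
The repository is updated continuously, reflecting recent popular models and technical trends (such as SAM2/3, Qwen, Gemma, and VideoRAG).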

#GitHub #repo #OpenSource #OpenCV #YOLO #RAG #Agent

Content Excerpt

LearnOpenCV

This repository contains code for Computer Vision, Deep learning, and AI research articles shared on our blog LearnOpenCV.com.

Want to become an expert in AI? AI Courses by OpenCV is a great place to start.

<a href="https://opencv.org/courses/">

<p align="center">
<img src="https://learnopencv.com/wp-content/uploads/2023/01/AI-Courses-By-OpenCV-Github.png">
</p>
</a>

List of Blog Posts

| Blog Post | Code |
| ------------- |:-------------|
| Deployment on Edge: LLM Serving on Jetson using vLLM | Code |
| Nested Learning: Is Deep Learning Architecture an Illusion? | |
| How to Build a GitHub Code-Analyser Agent for Developer Productivity | Code |
| SAM 3D: Foundation Model for Single-Image 3D Reconstruction | |
| SAM-3: What’s New, How It Works, and Why It Matters | Code |
| Image-GS: Adaptive Image Reconstruction using 2D Gaussians | Code |
| Ultimate Guide to Vector Databases and RAG Pipeline | Code |
| What Makes DeepSeek OCR So Powerful | Code |
| 2D Gaussian Splatting: Geometrically Accurate Radiance Field Reconstruction | Code |
| TRM: Tiny Recursive Models | Code |
| Deploying ML Models on Arduino: From Blink to Think | Code |
| VideoRAG: Redefining Long-Context Video Comprehension | |
| AI Agent in Action: Automating Desktop Tasks with VLMs | Code |
| Top VLM Evaluation Metrics for Optimal Performance Analysis | Code |
| Getting Started with VLM on Jetson Nano | Code |
| VLM on Edge: Worth the Hype or Just a Novelty? | Code |
| AnomalyCLIP: Harnessing CLIP for Weakly-Supervised Video Anomaly Recognition | Code |
| AI for Video Understanding: From Content Moderation to Summarization | Code |
| Video-RAG: Training-Free Retrieval for Long-Video LVLMs | Code |
| Object Detection and Spatial Understanding with VLMs ft. Qwen2.5-VL | Code |
| LangGraph: Building Self-Correcting RAG Agent for Code Generation | Code |
| Inside Sinusoidal Position Embeddings: A Sense of Order | Code |
| Inside RoPE: Rotary Magic into Position Embeddings | Code |
| SimLingo: Vision-Language-Action Model for Autonomous Driving | Code |
| FineTuning Gemma 3n for Medical VQA on ROCOv2 | Code |
| SmolLM3 Blueprint: SOTA 3B-Parameter LLM | |
| LangGraph: A Visual Automation and Summarization Pipeline | Code |
| Fine-Tuning AnomalyCLIP: Class-Agnostic Zero-Shot Anomaly Detection | Code |
| SigLIP 2: DeepMind’s Multilingual Vision-Language Model | |
| MedGemma: Google’s Medico VLM for Clinical QA, Imaging, and More | Code |
| Nanonets-OCR-s: Enabling Rich, Structured Markdown for Document Understanding | |
| Optimizing VJEPA-2: Tackling Latency & Context in Real-Time Video Classification Scripts | Code |
| V-JEPA 2: Meta’s Breakthrough in AI for the Physical World | Code |
| NVIDIA Cosmos Reason1: Video Understanding | Code |
| GR00T N1.5 Explained | |
| LLaVA | Code |
| SmolVLA: Affordable & Efficient VLA Robotics on Consumer GPUs | Code |
| Fine-Tuning Grounding DINO: Open-Vocabulary Object Detection | Code |
| Getting Started with Qwen3 – The Thinking Expert | Code |
| Inside the GPU: A Comprehensive Guide to Modern Graphics Architecture | |
| Distributed Parallel Training: PyTorch | Code |
| MONAI: The Definitive Framework for Medical Imaging Powered by PyTorch | |
| SANA-Sprint: The One-Step Revolution in High-Quality AI Image Synthesis | |
| FramePack: Video Diffusion but feels like Image Diffusion | Code |
| Model Weights File Formats in Machine Learning | |
| Unsloth: A Guide from Basics to Fine-Tuning Vision Models | Code |
| Iterative Closest Point (ICP) Algorithm Explained | Code |
| MedSAM2 Explained: One Prompt to Segment Anything in Medical Imaging | Code |
| Batch Normalization and Dropout as Regularizers | |
| DINOv2 by Meta: A Self-Supervised foundational vision model | Code |
| Beginner's Guide to Embedding Models | |
| MASt3R-SLAM: Real-Time Dense SLAM with 3D Reconstruction Priors | Code |
| Google's A2A Protocol | |
| Nvidia SANA: Faster Image Generation | |
| Fine-tuning RF-DETR | Code |
| Qwen2.5-Omni: A Real-Time Multimodal AI | |
| Vision Language Action Models: Robotic Control | Code |
| Fine-Tuning Gemma 3 VLM using QLoRA for LaTeX-OCR Dataset | Code |
| ComfyUI | Code |
| Gemma-3: A Comprehensive Introduction | |
| YOLO11 on Raspberry Pi: Optimizing Object Detection for Edge Devices | Code |
| VGGT: Visual Geometry Grounded Transformer – For Dense 3D Reconstruction | Code |
| DDIM: The Faster, Improved Version of DDPM for Efficient AI Image Generation | Code |
| Introduction to Model Context Protocol (MCP) | |
| MASt3R and MASt3R-SfM Explanation: Image Matching and 3D Reconstruction | Code |
| MatAnyone Explained: Consistent Memory for Better Video Matting | Code |
| GraphRAG: For Medical Document Analysis | Code |
| OmniParser: Vision Based GUI Agent | |
| Fine-Tuning YOLOv12: Comparison With YOLOv11 And YOLOv7 Based Darknet | Code |
| FineTuning RetinaNet for Wildlife Detection with PyTorch: A Step-by-Step Tutorial | Code |
| DUSt3R: Geometric 3D Vision Made Easy: Explanation and Results | Code |
| YOLOv12: Attention Meets Speed | Code |
| Video Generation: A Diffusion based approach | Code |
| Agentic AI: A Comprehensive Introduction | Code |
| Finetuning SAM2 for Leaf Disease Segmentation | Code |
| Object Insertion in Gaussian Splatting: Paper Explained and Training Code for MCMC and Bilateral Grid | Code |
| Depth Pro: Sharp Monocular Metric Depth | Code |
| Fine-tuning Stable Diffusion 3.5 UI images | Code |
| SimSiam: Streamlining SSL with Stop-Gradient Mechanism | Code |
| Image Captioning using ResNet and LSTM | Code |
| Molmo VLM: Paper Explanation and Demo | Code |
| 3D Gaussian Splatting Paper Explanation: Training Custom Datasets with NeRF-Studio Gsplats | Code |
| FLUX Image Generation: Experimenting with the Parameters | Code |
| Contrastive Learning: SimCLR and BYOL (With Code Example) | Code |
| The Annotated NeRF: Training on Custom Dataset from Scratch in PyTorch | Code |
| Stable Diffusion 3 and 3.5: Paper Explanation and Inference | Code |
| LightRAG - Legal Document Analy…