firecrawl/firecrawl

Daily Info Board · 2026-02-04

- Category: Open-source project
- Source: github_search
- Score: 44
- Published: 2026-02-04T03:36:37Z

AI Summary

Firecrawl is an open-source project offering APIs and SDKs for web scraping/crawling and AI-driven structured extraction. It converts any website into LLM-ready Markdown or structured data, helping AI applications reliably obtain clean data, and supports both a cloud service and partial self-hosting.

#GitHub #repo #OpenSource #Firecrawl #AGPL-3.0

Content Excerpt

<h3 align="center">
 <a name="readme-top"></a>
 <img
 src="https://raw.githubusercontent.com/firecrawl/firecrawl/main/img/firecrawl_logo.png"
 height="200"
 alt="Firecrawl logo"
 />
</h3>
<div align="center">
 <a href="https://github.com/firecrawl/firecrawl/blob/main/LICENSE">
 <img src="https://img.shields.io/github/license/firecrawl/firecrawl" alt="License">
</a>
 <a href="https://pepy.tech/project/firecrawl-py">
 <img src="https://static.pepy.tech/badge/firecrawl-py" alt="Downloads">
</a>
<a href="https://GitHub.com/firecrawl/firecrawl/graphs/contributors">
 <img src="https://img.shields.io/github/contributors/firecrawl/firecrawl.svg" alt="GitHub Contributors">
</a>
<a href="https://firecrawl.dev">
 <img src="https://img.shields.io/badge/Visit-firecrawl.dev-orange" alt="Visit firecrawl.dev">
</a>
</div>
<div>
 <p align="center">
 <a href="https://twitter.com/firecrawl_dev">
 <img src="https://img.shields.io/badge/Follow%20on%20X-000000?style=for-the-badge&logo=x&logoColor=white" alt="Follow on X" />
 </a>
 <a href="https://www.linkedin.com/company/104100957">
 <img src="https://img.shields.io/badge/Follow%20on%20LinkedIn-0077B5?style=for-the-badge&logo=linkedin&logoColor=white" alt="Follow on LinkedIn" />
 </a>
 <a href="https://discord.com/invite/gSmWdAkdwd">
 <img src="https://img.shields.io/badge/Join%20our%20Discord-5865F2?style=for-the-badge&logo=discord&logoColor=white" alt="Join our Discord" />
 </a>
 </p>
</div>
🔥 Firecrawl

Empower your AI apps with clean data from any website. Featuring advanced scraping, crawling, and data extraction capabilities.

_This repository is in development, and we’re still integrating custom modules into the monorepo. It's not fully ready for self-hosted deployment yet, but you can run it locally._
What is Firecrawl?

Firecrawl is an API service that takes a URL, crawls it, and converts it into clean markdown or structured data. We crawl all accessible subpages and give you clean data for each. No sitemap required. Check out our documentation.

Looking for our MCP? Check out the repo here.

_Pst. hey, you, join our stargazers :)_

<a href="https://github.com/firecrawl/firecrawl">
 <img src="https://img.shields.io/github/stars/firecrawl/firecrawl.svg?style=social&label=Star&maxAge=2592000" alt="GitHub stars">
</a>
How to use it?

We provide an easy-to-use API with our hosted version. You can find the playground and documentation here. You can also self-host the backend if you'd like.

Check out the following resources to get started:
- [x] **API**: Documentation
- [x] **SDKs**: Python, Node
- [x] **LLM Frameworks**: Langchain (python), Langchain (js), Llama Index, Crew.ai, Composio, PraisonAI, Superinterface, Vectorize
- [x] **Low-code Frameworks**: Dify, Langflow, Flowise AI, Cargo, Pipedream
- [x] **Community SDKs**: Go, Rust
- [x] **Others**: Zapier, Pabbly Connect
- [ ] Want an SDK or Integration? Let us know by opening an issue.

To run locally, refer to the guide here.
API Key

To use the API, you need to sign up on Firecrawl and get an API key.
Features
- **Scrape**: scrape a URL and get its content in LLM-ready format (markdown, structured data via LLM Extract, screenshot, HTML)
- **Crawl**: scrape all the URLs of a web page and return their content in LLM-ready format
- **Map**: input a website and get all of its URLs - extremely fast
- **Search**: search the web and get full content from the results
- **Extract**: get structured data from a single page, multiple pages, or entire websites with AI
Powerful Capabilities
- **LLM-ready formats**: markdown, structured data, screenshot, HTML, links, metadata
- **The hard stuff**: proxies, anti-bot mechanisms, dynamic content (JS-rendered), output parsing, orchestration
- **Customizability**: exclude tags, crawl behind auth walls with custom headers, max crawl depth, etc.
- **Media parsing**: PDFs, DOCX, images
- **Reliability first**: designed to get the data you need, no matter how hard it is
- **Actions**: click, scroll, input, wait, and more before extracting data
- **Batching**: scrape thousands of URLs at the same time with a new async endpoint
- **Change Tracking**: monitor and detect changes in website content over time

You can find all of Firecrawl's capabilities and how to use them in our documentation.
Crawling

Used to crawl a URL and all accessible subpages. This submits a crawl job and returns a job ID, plus a URL you can poll to check the crawl's status. No sitemap is required.
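The code examples were not carried over into this excerpt; as a rough sketch of what job submission could look like with only the standard library (the `/v1/crawl` path, field names, and bearer-token auth are assumptions based on Firecrawl's hosted API, not taken from this excerpt):

```python
import json
import urllib.request

API_KEY = "fc-YOUR_API_KEY"  # hypothetical placeholder key from firecrawl.dev

# Assumed v1 crawl endpoint: submitting a job returns a job id to poll.
payload = {
    "url": "https://docs.firecrawl.dev",
    "limit": 10,  # cap the number of pages crawled
    "scrapeOptions": {"formats": ["markdown", "html"]},
}
req = urllib.request.Request(
    "https://api.firecrawl.dev/v1/crawl",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
    },
    method="POST",
)
# urllib.request.urlopen(req) would submit the job; the JSON reply
# is expected to include the job id and a status URL to poll.
```

The official SDKs wrap this request and the subsequent polling for you.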
Check Crawl Job

Used to check the status of a crawl job and get its result.
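A minimal status-check sketch, assuming the job status lives at `GET /v1/crawl/{id}` (the path and the `status`/`data` fields are assumptions, and the job id below is a made-up placeholder):

```python
import urllib.request

API_KEY = "fc-YOUR_API_KEY"   # hypothetical placeholder
job_id = "123-456-789"        # the id returned when the crawl was submitted

# Assumed v1 status endpoint, authenticated with the same bearer token.
status_req = urllib.request.Request(
    f"https://api.firecrawl.dev/v1/crawl/{job_id}",
    headers={"Authorization": f"Bearer {API_KEY}"},
)
# urllib.request.urlopen(status_req) would return JSON along the lines of
# {"status": "scraping", "completed": 5, "total": 20, "data": [...]},
# with "status": "completed" and the scraped pages in "data" once done.
```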

Scraping

Used to scrape a URL and get its content in the specified formats.

The JSON response carries the page content in each requested format.
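A hedged sketch of a single-page scrape, again assuming the hosted `/v1/scrape` endpoint and its `formats` parameter (both assumptions, not shown in this excerpt):

```python
import json
import urllib.request

API_KEY = "fc-YOUR_API_KEY"  # hypothetical placeholder
payload = {"url": "https://firecrawl.dev", "formats": ["markdown", "html"]}

scrape_req = urllib.request.Request(
    "https://api.firecrawl.dev/v1/scrape",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
    },
    method="POST",
)
# A successful reply would look roughly like:
# {"success": true, "data": {"markdown": "...", "html": "...", "metadata": {...}}}
```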
Map

Used to map a URL and get URLs of the website. This returns most links present on the website.

The response lists the discovered URLs.
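The request body for Map is small; the endpoint path and the `links` field in the reply are assumptions based on the hosted API:

```python
import json

# Assumed POST body for https://api.firecrawl.dev/v1/map
payload = {"url": "https://firecrawl.dev"}
body = json.dumps(payload)
# The reply is expected to be shaped like
# {"status": "success", "links": ["https://firecrawl.dev", ...]}
```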
Map with search

Map with the `search` param allows you to search for specific URLs inside a website.

Response will be an ordered list from the most relevant to the least relevant.
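Under the same assumptions about the map endpoint, filtering would just add a `search` field to the body:

```python
import json

# Assumed POST body for https://api.firecrawl.dev/v1/map with a search filter
payload = {"url": "https://firecrawl.dev", "search": "docs"}
body = json.dumps(payload)
# The returned links would be ordered from most to least relevant to "docs".
```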
Search

Search the web and get full content from results

Firecrawl’s search API allows you to perform web searches and optionally scrape the search results in one operation.
- Choose specific output formats (markdown, HTML, links, screenshots)
- Search the web with customizable parameters (language, country, etc.)
- Optionally retrieve content from search results in various formats
- Control the number of results and set timeouts
The response contains the ranked search results; when content scraping is enabled, each result also carries the scraped page in the requested formats.
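A sketch of the search body, combining a query with optional scraping of the hits (the `query`, `lang`, `country`, and `scrapeOptions` field names are assumptions about the hosted API):

```python
import json

# Assumed POST body for https://api.firecrawl.dev/v1/search
payload = {
    "query": "firecrawl web scraping",
    "limit": 5,                                  # number of results
    "lang": "en",
    "country": "us",
    "scrapeOptions": {"formats": ["markdown"]},  # omit to skip scraping
}
body = json.dumps(payload)
```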
Extract (Beta)

Get structured data from entire websites with a prompt and/or a schema.

You can extract structured data from one or multiple URLs, including wildcards:

- **Single Page**, e.g. https://firecrawl.dev/some-page
- **Multiple Pages / Full Domain**, e.g. https://firecrawl.dev/*

When you use /*, Firecrawl will automatically crawl and parse all URLs it can discover in that domain, then extract the requested data.

If you are using the SDKs, they will automatically poll and pull the response for you.
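A sketch of an extract request over a full domain; the `urls`, `prompt`, and `schema` field names follow the hosted API as I understand it, and the schema itself is a made-up example:

```python
import json

# Assumed POST body for https://api.firecrawl.dev/v1/extract
payload = {
    "urls": ["https://firecrawl.dev/*"],  # wildcard: crawl the whole domain
    "prompt": "Extract the company mission and whether it is open source.",
    "schema": {                           # JSON Schema for the output shape
        "type": "object",
        "properties": {
            "mission": {"type": "string"},
            "is_open_source": {"type": "boolean"},
        },
        "required": ["mission", "is_open_source"],
    },
}
body = json.dumps(payload)
# Extraction runs asynchronously; the reply would carry a job id to poll,
# a step the SDKs hide from you.
```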
LLM Extraction (Beta)

Used to extract structured data from scraped pages.

Extracting without a schema (New)

You can now extract without a schema by just passing a prompt …
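Under the same assumptions, schema-less extraction would simply drop the `schema` field and rely on the prompt alone:

```python
import json

# Assumed POST body for https://api.firecrawl.dev/v1/extract, no schema
payload = {
    "urls": ["https://docs.firecrawl.dev/"],
    "prompt": "Summarize what this page says the product does.",
}
body = json.dumps(payload)
# The shape of the returned structured data is then left to the model.
```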