Firecrawl

English | 中文

This service deploys Firecrawl, a web scraping and crawling API powered by Playwright and headless browsers.

Services

api: The main Firecrawl API server with integrated workers
redis: Redis for job queue and caching
playwright-service: Playwright service for browser automation
nuq-postgres: PostgreSQL database for queue management and data storage

Environment Variables

Variable Name	Description	Default Value
FIRECRAWL_VERSION	Firecrawl image version	`latest`
REDIS_VERSION	Redis image version	`alpine`
PLAYWRIGHT_VERSION	Playwright service version	`latest`
NUQ_POSTGRES_VERSION	NUQ PostgreSQL image version	`latest`
POSTGRES_USER	PostgreSQL username	`postgres`
POSTGRES_PASSWORD	PostgreSQL password	`postgres`
POSTGRES_DB	PostgreSQL database name	`postgres`
POSTGRES_PORT_OVERRIDE	PostgreSQL port mapping	`5432`
INTERNAL_PORT	Internal API port	`3002`
FIRECRAWL_PORT_OVERRIDE	External API port mapping	`3002`
EXTRACT_WORKER_PORT	Extract worker port	`3004`
WORKER_PORT	Worker port	`3005`
USE_DB_AUTHENTICATION	Enable database authentication	`false`
OPENAI_API_KEY	OpenAI API key for AI features (optional)	`""`
OPENAI_BASE_URL	OpenAI API base URL (optional)	`""`
MODEL_NAME	AI model name (optional)	`""`
MODEL_EMBEDDING_NAME	Embedding model name (optional)	`""`
OLLAMA_BASE_URL	Ollama base URL (optional)	`""`
BULL_AUTH_KEY	Bull queue admin panel authentication key	`@`
TEST_API_KEY	Test API key (optional)	`""`
SLACK_WEBHOOK_URL	Slack webhook for notifications (optional)	`""`
POSTHOG_API_KEY	PostHog API key (optional)	`""`
POSTHOG_HOST	PostHog host (optional)	`""`
SUPABASE_ANON_TOKEN	Supabase anonymous token (optional)	`""`
SUPABASE_URL	Supabase URL (optional)	`""`
SUPABASE_SERVICE_TOKEN	Supabase service token (optional)	`""`
SELF_HOSTED_WEBHOOK_URL	Self-hosted webhook URL (optional)	`""`
SERPER_API_KEY	Serper API key for search (optional)	`""`
SEARCHAPI_API_KEY	SearchAPI key (optional)	`""`
LOGGING_LEVEL	Logging level	`info`
PROXY_SERVER	Proxy server URL (optional)	`""`
PROXY_USERNAME	Proxy username (optional)	`""`
PROXY_PASSWORD	Proxy password (optional)	`""`
BLOCK_MEDIA	Block media content	`true`
SEARXNG_ENDPOINT	SearXNG endpoint (optional)	`""`
SEARXNG_ENGINES	SearXNG engines (optional)	`""`
SEARXNG_CATEGORIES	SearXNG categories (optional)	`""`

Please modify the .env file as needed for your use case.

Volumes

redis_data: Redis data storage for job queues and caching
postgres_data: PostgreSQL data storage for queue management and metadata

Usage

Start the Services

docker compose up -d

Access the API

The Firecrawl API will be available at:

http://localhost:3002

Admin Panel

Access the Bull queue admin panel at:

http://localhost:3002/admin/@/queues

Replace @ with your BULL_AUTH_KEY value if changed.

Example API Calls

Scrape a Single Page:

curl -X POST http://localhost:3002/v1/scrape \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com"
  }'

Crawl a Website:

curl -X POST http://localhost:3002/v1/crawl \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "limit": 100
  }'

Extract Structured Data:

curl -X POST http://localhost:3002/v1/extract \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://example.com"],
    "schema": {
      "type": "object",
      "properties": {
        "title": {"type": "string"},
        "description": {"type": "string"}
      }
    }
  }'

Features

Web Scraping: Extract clean content from any webpage
Web Crawling: Recursively crawl entire websites
JavaScript Rendering: Full support for dynamic JavaScript-rendered pages
Markdown Output: Clean markdown conversion of web content
Structured Data Extraction: Extract data using JSON schemas
Queue Management: Built-in job queue with Bull
Rate Limiting: Configurable rate limiting
Proxy Support: Optional proxy configuration for all requests
AI-Powered Features: Optional OpenAI integration for advanced extraction

Architecture

This deployment uses the official Firecrawl architecture:

API Server: Handles HTTP requests and manages the job queue
Workers: Built into the main container, processes scraping jobs
PostgreSQL: Stores queue metadata and job information
Redis: Handles job queue and caching
Playwright Service: Provides browser automation capabilities

Notes

The service uses the official ghcr.io/firecrawl/firecrawl image
PostgreSQL uses the official ghcr.io/firecrawl/nuq-postgres image for queue management (NUQ - Not Quite Bull)
Redis is used for job queuing without password by default (runs on private network)
For production use, enable USE_DB_AUTHENTICATION and configure Supabase
The BULL_AUTH_KEY should be changed in production deployments
AI features require an OPENAI_API_KEY or OLLAMA_BASE_URL
All workers run within the single API container using the harness mode

License

Firecrawl is licensed under the AGPL-3.0 License.

6.6 KiB Raw Blame History