diff --git a/README.md b/README.md index 876c698..e372007 100644 --- a/README.md +++ b/README.md @@ -31,9 +31,11 @@ Compose Anything helps users quickly deploy various services by providing a set | [HashiCorp Consul](./src/consul) | 1.20.3 | | [IOPaint](./src/io-paint) | latest | | [Jenkins](./src/jenkins) | 2.486-lts | +| [JODConverter](./src/jodconverter) | latest | | [Kibana](./src/kibana) | 8.16.1 | | [Kodbox](./src/kodbox) | 1.62 | | [Kong](./src/kong) | 3.8.0 | +| [LibreOffice](./src/libreoffice) | latest | | [Langfuse](./src/langfuse) | 3.115.0 | | [LiteLLM](./src/litellm) | main-stable | | [Logstash](./src/logstash) | 8.16.1 | @@ -59,6 +61,7 @@ Compose Anything helps users quickly deploy various services by providing a set | [OpenList](./src/openlist) | latest | | [PocketBase](./src/pocketbase) | 0.30.0 | | [Portainer](./src/portainer) | 2.27.3-alpine | +| [Portkey AI Gateway](./src/portkey-gateway) | latest | | [PostgreSQL](./src/postgres) | 17.6 | | [Prometheus](./src/prometheus) | 3.5.0 | | [Qdrant](./src/qdrant) | 1.15.4 | diff --git a/README.zh.md b/README.zh.md index 289a91b..8b9106a 100644 --- a/README.zh.md +++ b/README.zh.md @@ -31,9 +31,11 @@ Compose Anything 通过提供一组高质量的 Docker Compose 配置文件, | [HashiCorp Consul](./src/consul) | 1.20.3 | | [IOPaint](./src/io-paint) | latest | | [Jenkins](./src/jenkins) | 2.486-lts | +| [JODConverter](./src/jodconverter) | latest | | [Kibana](./src/kibana) | 8.16.1 | | [Kodbox](./src/kodbox) | 1.62 | | [Kong](./src/kong) | 3.8.0 | +| [LibreOffice](./src/libreoffice) | latest | | [Langfuse](./src/langfuse) | 3.115.0 | | [LiteLLM](./src/litellm) | main-stable | | [Logstash](./src/logstash) | 8.16.1 | @@ -59,6 +61,7 @@ Compose Anything 通过提供一组高质量的 Docker Compose 配置文件, | [OpenList](./src/openlist) | latest | | [PocketBase](./src/pocketbase) | 0.30.0 | | [Portainer](./src/portainer) | 2.27.3-alpine | +| [Portkey AI 网关](./src/portkey-gateway) | latest | | [PostgreSQL](./src/postgres) | 17.6 | | 
[Prometheus](./src/prometheus) | 3.5.0 | | [Qdrant](./src/qdrant) | 1.15.4 | diff --git a/src/bolt-diy/.env.example b/src/bolt-diy/.env.example new file mode 100644 index 0000000..922ee43 --- /dev/null +++ b/src/bolt-diy/.env.example @@ -0,0 +1,17 @@ +# Bolt.diy Configuration +# For more information, visit: https://github.com/stackblitz-labs/bolt.diy + +# Container port override +# BOLT_DIY_PORT_OVERRIDE=5173 + +# Log level (trace, debug, info, warn, error) +# VITE_LOG_LEVEL=info + +# Enable experimental features +# ENABLE_EXPERIMENTAL_FEATURES=false + +# Bolt.diy version +# BOLT_DIY_VERSION=latest + +# Timezone +# TZ=UTC diff --git a/src/bolt-diy/README.md b/src/bolt-diy/README.md new file mode 100644 index 0000000..8657220 --- /dev/null +++ b/src/bolt-diy/README.md @@ -0,0 +1,57 @@ +# Bolt.diy + +Bolt.diy is an AI-powered web IDE that enables you to build full-stack web applications directly in your browser. It combines the power of AI with a modern development environment to streamline your development workflow. 
+ +## Quick Start + +```bash +docker compose up -d +``` + +Access Bolt.diy at [http://localhost:5173](http://localhost:5173) + +## Features + +- **AI-Powered Development**: Leverage AI to assist with code generation and development +- **Full-Stack Development**: Build complete web applications with frontend and backend capabilities +- **Real-time Preview**: See your changes in real-time as you develop +- **Built-in Terminal**: Execute commands directly within the IDE +- **Git Integration**: Manage your repositories within the IDE + +## Configuration + +### Environment Variables + +| Variable | Default | Description | +| ------------------------------ | ------- | ------------------------------------------- | +| `BOLT_DIY_PORT_OVERRIDE` | 5173 | Host port for accessing Bolt.diy | +| `BOLT_DIY_VERSION` | latest | Docker image version | +| `VITE_LOG_LEVEL` | info | Log level (trace, debug, info, warn, error) | +| `ENABLE_EXPERIMENTAL_FEATURES` | false | Enable experimental features | +| `TZ` | UTC | Timezone | + +### Port Mapping + +- **5173**: Bolt.diy web interface + +## Volume + +The container uses in-memory storage for the development environment. For persistent storage, you can mount volumes as needed. + +## Health Check + +The service includes a health check that monitors the availability of the web interface. 
+ +## Resource Limits + +- **CPU**: 2 cores (limit) / 0.5 cores (reservation) +- **Memory**: 2GB (limit) / 512MB (reservation) + +## Documentation + +- [Official Bolt.diy Repository](https://github.com/stackblitz-labs/bolt.diy) +- [Bolt.diy Documentation](https://docs.bolt.new/) + +## License + +Refer to the [Bolt.diy License](https://github.com/stackblitz-labs/bolt.diy/blob/main/LICENSE) diff --git a/src/bolt-diy/README.zh.md b/src/bolt-diy/README.zh.md new file mode 100644 index 0000000..d3840aa --- /dev/null +++ b/src/bolt-diy/README.zh.md @@ -0,0 +1,57 @@ +# Bolt.diy + +Bolt.diy 是一个由 AI 驱动的网页版 IDE,让你可以直接在浏览器中构建全栈 web 应用程序。它将 AI 的强大功能与现代开发环境相结合,以简化你的开发工作流程。 + +## 快速开始 + +```bash +docker compose up -d +``` + +在 [http://localhost:5173](http://localhost:5173) 访问 Bolt.diy + +## 功能特性 + +- **AI 驱动开发**:利用 AI 辅助代码生成和开发 +- **全栈开发**:构建具有前端和后端功能的完整 web 应用程序 +- **实时预览**:在开发时实时查看你的更改 +- **内置终端**:直接在 IDE 中执行命令 +- **Git 集成**:在 IDE 中管理你的代码库 + +## 配置 + +### 环境变量 + +| 变量 | 默认值 | 说明 | +| ------------------------------ | ------ | ------------------------------------------- | +| `BOLT_DIY_PORT_OVERRIDE` | 5173 | 访问 Bolt.diy 的主机端口 | +| `BOLT_DIY_VERSION` | latest | Docker 镜像版本 | +| `VITE_LOG_LEVEL` | info | 日志级别(trace、debug、info、warn、error) | +| `ENABLE_EXPERIMENTAL_FEATURES` | false | 启用实验性功能 | +| `TZ` | UTC | 时区 | + +### 端口映射 + +- **5173**:Bolt.diy web 界面 + +## 存储卷 + +容器为开发环境使用内存存储。如需持久化存储,可根据需要挂载卷。 + +## 健康检查 + +该服务包含一个健康检查,监控 web 界面的可用性。 + +## 资源限制 + +- **CPU**:2 核心(上限)/ 0.5 核心(预留) +- **内存**:2GB(上限)/ 512MB(预留) + +## 文档 + +- [Bolt.diy 官方仓库](https://github.com/stackblitz-labs/bolt.diy) +- [Bolt.diy 文档](https://docs.bolt.new/) + +## 许可证 + +参考 [Bolt.diy 许可证](https://github.com/stackblitz-labs/bolt.diy/blob/main/LICENSE) diff --git a/src/bolt-diy/docker-compose.yaml b/src/bolt-diy/docker-compose.yaml new file mode 100644 index 0000000..b447c97 --- /dev/null +++ b/src/bolt-diy/docker-compose.yaml @@ -0,0 +1,32 @@ +x-default: &default + restart: unless-stopped + logging: + driver: 
json-file + options: + max-size: 100m + max-file: "3" + +services: + bolt-diy: + <<: *default + image: stackblitz/bolt:${BOLT_DIY_VERSION:-latest} + ports: + - "${BOLT_DIY_PORT_OVERRIDE:-5173}:5173" + environment: + - TZ=${TZ:-UTC} + - VITE_LOG_LEVEL=${VITE_LOG_LEVEL:-info} + - ENABLE_EXPERIMENTAL_FEATURES=${ENABLE_EXPERIMENTAL_FEATURES:-false} + deploy: + resources: + limits: + cpus: '2.00' + memory: 2G + reservations: + cpus: '0.5' + memory: 512M + healthcheck: + test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:5173/"] + interval: 30s + timeout: 10s + retries: 3 + start_period: 10s diff --git a/src/firecrawl/.env.example b/src/firecrawl/.env.example index 6d6fd6d..ff769f3 100644 --- a/src/firecrawl/.env.example +++ b/src/firecrawl/.env.example @@ -1,25 +1,72 @@ # Firecrawl version -FIRECRAWL_VERSION="v1.16.0" +FIRECRAWL_VERSION="latest" # Redis version -REDIS_VERSION="7.4.2-alpine" +REDIS_VERSION="alpine" # Playwright version PLAYWRIGHT_VERSION="latest" +# PostgreSQL version (official Firecrawl nuq-postgres image) +NUQ_POSTGRES_VERSION="latest" + +# PostgreSQL configuration +POSTGRES_USER="postgres" +POSTGRES_PASSWORD="postgres" +POSTGRES_DB="postgres" +POSTGRES_PORT_OVERRIDE=5432 + # Redis configuration -REDIS_PASSWORD="firecrawl" +# REDIS_URL is auto-configured by docker-compose +# REDIS_URL="redis://redis:6379" -# Firecrawl configuration -NUM_WORKERS_PER_QUEUE=8 -SCRAPE_RATE_LIMIT_TOKEN_BUCKET_SIZE=20 -SCRAPE_RATE_LIMIT_TOKEN_BUCKET_REFILL=1 +# Firecrawl API configuration +INTERNAL_PORT=3002 +FIRECRAWL_PORT_OVERRIDE=3002 +EXTRACT_WORKER_PORT=3004 +WORKER_PORT=3005 -# Playwright configuration (optional) +# Database authentication +USE_DB_AUTHENTICATION="false" + +# AI features (Optional) +# OPENAI_API_KEY="" +# OPENAI_BASE_URL="" +# MODEL_NAME="" +# MODEL_EMBEDDING_NAME="" +# OLLAMA_BASE_URL="" + +# Admin and security +BULL_AUTH_KEY="@" +# TEST_API_KEY="" + +# Monitoring (Optional) +# SLACK_WEBHOOK_URL="" +# POSTHOG_API_KEY="" +# 
POSTHOG_HOST="" + +# Supabase authentication (Optional) +# SUPABASE_ANON_TOKEN="" +# SUPABASE_URL="" +# SUPABASE_SERVICE_TOKEN="" + +# Webhook configuration (Optional) +# SELF_HOSTED_WEBHOOK_URL="" + +# Search API keys (Optional) +# SERPER_API_KEY="" +# SEARCHAPI_API_KEY="" + +# Logging +LOGGING_LEVEL="info" + +# Playwright proxy configuration (Optional) PROXY_SERVER="" PROXY_USERNAME="" PROXY_PASSWORD="" BLOCK_MEDIA="true" -# Port overrides -FIRECRAWL_PORT_OVERRIDE=3002 +# SearXNG configuration (Optional) +# SEARXNG_ENDPOINT="" +# SEARXNG_ENGINES="" +# SEARXNG_CATEGORIES="" diff --git a/src/firecrawl/README.md b/src/firecrawl/README.md index 5fde0fc..e980910 100644 --- a/src/firecrawl/README.md +++ b/src/firecrawl/README.md @@ -6,39 +6,66 @@ This service deploys Firecrawl, a web scraping and crawling API powered by Playw ## Services -- `firecrawl`: The main Firecrawl API server. -- `redis`: Redis for job queue and caching. -- `playwright`: Playwright service for browser automation. +- `api`: The main Firecrawl API server with integrated workers +- `redis`: Redis for job queue and caching +- `playwright-service`: Playwright service for browser automation +- `nuq-postgres`: PostgreSQL database for queue management and data storage ## Environment Variables -| Variable Name | Description | Default Value | -| ------------------------------------- | ----------------------------------- | -------------- | -| FIRECRAWL_VERSION | Firecrawl image version | `v1.16.0` | -| REDIS_VERSION | Redis image version | `7.4.2-alpine` | -| PLAYWRIGHT_VERSION | Playwright service version | `latest` | -| REDIS_PASSWORD | Redis password | `firecrawl` | -| NUM_WORKERS_PER_QUEUE | Number of workers per queue | `8` | -| SCRAPE_RATE_LIMIT_TOKEN_BUCKET_SIZE | Token bucket size for rate limiting | `20` | -| SCRAPE_RATE_LIMIT_TOKEN_BUCKET_REFILL | Token refill rate per second | `1` | -| PROXY_SERVER | Proxy server URL (optional) | `""` | -| PROXY_USERNAME | Proxy username (optional) | `""` | -| 
PROXY_PASSWORD | Proxy password (optional) | `""` | -| BLOCK_MEDIA | Block media content | `true` | -| FIRECRAWL_PORT_OVERRIDE | Firecrawl API port | `3002` | +| Variable Name | Description | Default Value | +| ----------------------- | ------------------------------------------ | ------------- | +| FIRECRAWL_VERSION | Firecrawl image version | `latest` | +| REDIS_VERSION | Redis image version | `alpine` | +| PLAYWRIGHT_VERSION | Playwright service version | `latest` | +| NUQ_POSTGRES_VERSION | NUQ PostgreSQL image version | `latest` | +| POSTGRES_USER | PostgreSQL username | `postgres` | +| POSTGRES_PASSWORD | PostgreSQL password | `postgres` | +| POSTGRES_DB | PostgreSQL database name | `postgres` | +| POSTGRES_PORT_OVERRIDE | PostgreSQL port mapping | `5432` | +| INTERNAL_PORT | Internal API port | `3002` | +| FIRECRAWL_PORT_OVERRIDE | External API port mapping | `3002` | +| EXTRACT_WORKER_PORT | Extract worker port | `3004` | +| WORKER_PORT | Worker port | `3005` | +| USE_DB_AUTHENTICATION | Enable database authentication | `false` | +| OPENAI_API_KEY | OpenAI API key for AI features (optional) | `""` | +| OPENAI_BASE_URL | OpenAI API base URL (optional) | `""` | +| MODEL_NAME | AI model name (optional) | `""` | +| MODEL_EMBEDDING_NAME | Embedding model name (optional) | `""` | +| OLLAMA_BASE_URL | Ollama base URL (optional) | `""` | +| BULL_AUTH_KEY | Bull queue admin panel authentication key | `@` | +| TEST_API_KEY | Test API key (optional) | `""` | +| SLACK_WEBHOOK_URL | Slack webhook for notifications (optional) | `""` | +| POSTHOG_API_KEY | PostHog API key (optional) | `""` | +| POSTHOG_HOST | PostHog host (optional) | `""` | +| SUPABASE_ANON_TOKEN | Supabase anonymous token (optional) | `""` | +| SUPABASE_URL | Supabase URL (optional) | `""` | +| SUPABASE_SERVICE_TOKEN | Supabase service token (optional) | `""` | +| SELF_HOSTED_WEBHOOK_URL | Self-hosted webhook URL (optional) | `""` | +| SERPER_API_KEY | Serper API key for search (optional) | `""` | +| 
SEARCHAPI_API_KEY | SearchAPI key (optional) | `""` | +| LOGGING_LEVEL | Logging level | `info` | +| PROXY_SERVER | Proxy server URL (optional) | `""` | +| PROXY_USERNAME | Proxy username (optional) | `""` | +| PROXY_PASSWORD | Proxy password (optional) | `""` | +| BLOCK_MEDIA | Block media content | `true` | +| SEARXNG_ENDPOINT | SearXNG endpoint (optional) | `""` | +| SEARXNG_ENGINES | SearXNG engines (optional) | `""` | +| SEARXNG_CATEGORIES | SearXNG categories (optional) | `""` | Please modify the `.env` file as needed for your use case. ## Volumes -- `redis_data`: Redis data storage for job queues and caching. +- `redis_data`: Redis data storage for job queues and caching +- `postgres_data`: PostgreSQL data storage for queue management and metadata ## Usage ### Start the Services ```bash -docker-compose up -d +docker compose up -d ``` ### Access the API @@ -49,12 +76,22 @@ The Firecrawl API will be available at: http://localhost:3002 ``` +### Admin Panel + +Access the Bull queue admin panel at: + +```text +http://localhost:3002/admin/@/queues +``` + +Replace `@` with your `BULL_AUTH_KEY` value if changed. 
+ ### Example API Calls **Scrape a Single Page:** ```bash -curl -X POST http://localhost:3002/v0/scrape \ +curl -X POST http://localhost:3002/v1/scrape \ -H "Content-Type: application/json" \ -d '{ "url": "https://example.com" @@ -64,12 +101,27 @@ curl -X POST http://localhost:3002/v0/scrape \ **Crawl a Website:** ```bash -curl -X POST http://localhost:3002/v0/crawl \ +curl -X POST http://localhost:3002/v1/crawl \ -H "Content-Type: application/json" \ -d '{ "url": "https://example.com", - "crawlerOptions": { - "limit": 100 + "limit": 100 + }' +``` + +**Extract Structured Data:** + +```bash +curl -X POST http://localhost:3002/v1/extract \ + -H "Content-Type: application/json" \ + -d '{ + "urls": ["https://example.com"], + "schema": { + "type": "object", + "properties": { + "title": {"type": "string"}, + "description": {"type": "string"} + } } }' ``` @@ -80,16 +132,31 @@ curl -X POST http://localhost:3002/v0/crawl \ - **Web Crawling**: Recursively crawl entire websites - **JavaScript Rendering**: Full support for dynamic JavaScript-rendered pages - **Markdown Output**: Clean markdown conversion of web content -- **Rate Limiting**: Built-in rate limiting to prevent abuse +- **Structured Data Extraction**: Extract data using JSON schemas +- **Queue Management**: Built-in job queue with Bull +- **Rate Limiting**: Configurable rate limiting - **Proxy Support**: Optional proxy configuration for all requests +- **AI-Powered Features**: Optional OpenAI integration for advanced extraction + +## Architecture + +This deployment uses the official Firecrawl architecture: + +- **API Server**: Handles HTTP requests and manages the job queue +- **Workers**: Built into the main container, processes scraping jobs +- **PostgreSQL**: Stores queue metadata and job information +- **Redis**: Handles job queue and caching +- **Playwright Service**: Provides browser automation capabilities ## Notes -- The service uses Playwright for browser automation, supporting complex web pages -- Redis 
is used for job queuing and caching -- Rate limiting is configurable via environment variables -- For production use, consider scaling the number of workers -- BLOCK_MEDIA can reduce memory usage by blocking images/videos +- The service uses the official `ghcr.io/firecrawl/firecrawl` image +- PostgreSQL uses the official `ghcr.io/firecrawl/nuq-postgres` image for queue management (NUQ - Not Quite Bull) +- Redis is used for job queuing without password by default (runs on private network) +- For production use, enable `USE_DB_AUTHENTICATION` and configure Supabase +- The `BULL_AUTH_KEY` should be changed in production deployments +- AI features require an `OPENAI_API_KEY` or `OLLAMA_BASE_URL` +- All workers run within the single API container using the harness mode ## License diff --git a/src/firecrawl/README.zh.md b/src/firecrawl/README.zh.md index 13df775..12de982 100644 --- a/src/firecrawl/README.zh.md +++ b/src/firecrawl/README.zh.md @@ -6,39 +6,66 @@ ## 服务 -- `firecrawl`: Firecrawl API 主服务器。 -- `redis`: 用于作业队列和缓存的 Redis。 -- `playwright`: 用于浏览器自动化的 Playwright 服务。 +- `api`: Firecrawl API 主服务器,集成了工作进程 +- `redis`: 用于作业队列和缓存的 Redis +- `playwright-service`: 用于浏览器自动化的 Playwright 服务 +- `nuq-postgres`: 用于队列管理和数据存储的 PostgreSQL 数据库 ## 环境变量 -| 变量名 | 说明 | 默认值 | -| ------------------------------------- | ---------------------- | -------------- | -| FIRECRAWL_VERSION | Firecrawl 镜像版本 | `v1.16.0` | -| REDIS_VERSION | Redis 镜像版本 | `7.4.2-alpine` | -| PLAYWRIGHT_VERSION | Playwright 服务版本 | `latest` | -| REDIS_PASSWORD | Redis 密码 | `firecrawl` | -| NUM_WORKERS_PER_QUEUE | 每个队列的工作进程数 | `8` | -| SCRAPE_RATE_LIMIT_TOKEN_BUCKET_SIZE | 速率限制的令牌桶大小 | `20` | -| SCRAPE_RATE_LIMIT_TOKEN_BUCKET_REFILL | 每秒令牌填充速率 | `1` | -| PROXY_SERVER | 代理服务器 URL(可选) | `""` | -| PROXY_USERNAME | 代理用户名(可选) | `""` | -| PROXY_PASSWORD | 代理密码(可选) | `""` | -| BLOCK_MEDIA | 阻止媒体内容 | `true` | -| FIRECRAWL_PORT_OVERRIDE | Firecrawl API 端口 | `3002` | +| 变量名 | 说明 | 默认值 | +| ----------------------- | 
----------------------------- | ---------- | +| FIRECRAWL_VERSION | Firecrawl 镜像版本 | `latest` | +| REDIS_VERSION | Redis 镜像版本 | `alpine` | +| PLAYWRIGHT_VERSION | Playwright 服务版本 | `latest` | +| NUQ_POSTGRES_VERSION | NUQ PostgreSQL 镜像版本 | `latest` | +| POSTGRES_USER | PostgreSQL 用户名 | `postgres` | +| POSTGRES_PASSWORD | PostgreSQL 密码 | `postgres` | +| POSTGRES_DB | PostgreSQL 数据库名称 | `postgres` | +| POSTGRES_PORT_OVERRIDE | PostgreSQL 端口映射 | `5432` | +| INTERNAL_PORT | 内部 API 端口 | `3002` | +| FIRECRAWL_PORT_OVERRIDE | 外部 API 端口映射 | `3002` | +| EXTRACT_WORKER_PORT | 提取工作进程端口 | `3004` | +| WORKER_PORT | 工作进程端口 | `3005` | +| USE_DB_AUTHENTICATION | 启用数据库身份验证 | `false` | +| OPENAI_API_KEY | OpenAI API 密钥(可选) | `""` | +| OPENAI_BASE_URL | OpenAI API 基础 URL(可选) | `""` | +| MODEL_NAME | AI 模型名称(可选) | `""` | +| MODEL_EMBEDDING_NAME | 嵌入模型名称(可选) | `""` | +| OLLAMA_BASE_URL | Ollama 基础 URL(可选) | `""` | +| BULL_AUTH_KEY | Bull 队列管理面板身份验证密钥 | `@` | +| TEST_API_KEY | 测试 API 密钥(可选) | `""` | +| SLACK_WEBHOOK_URL | Slack Webhook 通知(可选) | `""` | +| POSTHOG_API_KEY | PostHog API 密钥(可选) | `""` | +| POSTHOG_HOST | PostHog 主机(可选) | `""` | +| SUPABASE_ANON_TOKEN | Supabase 匿名令牌(可选) | `""` | +| SUPABASE_URL | Supabase URL(可选) | `""` | +| SUPABASE_SERVICE_TOKEN | Supabase 服务令牌(可选) | `""` | +| SELF_HOSTED_WEBHOOK_URL | 自托管 Webhook URL(可选) | `""` | +| SERPER_API_KEY | Serper 搜索 API 密钥(可选) | `""` | +| SEARCHAPI_API_KEY | SearchAPI 密钥(可选) | `""` | +| LOGGING_LEVEL | 日志级别 | `info` | +| PROXY_SERVER | 代理服务器 URL(可选) | `""` | +| PROXY_USERNAME | 代理用户名(可选) | `""` | +| PROXY_PASSWORD | 代理密码(可选) | `""` | +| BLOCK_MEDIA | 阻止媒体内容 | `true` | +| SEARXNG_ENDPOINT | SearXNG 端点(可选) | `""` | +| SEARXNG_ENGINES | SearXNG 引擎(可选) | `""` | +| SEARXNG_CATEGORIES | SearXNG 分类(可选) | `""` | 请根据实际需求修改 `.env` 文件。 ## 卷 -- `redis_data`: 用于作业队列和缓存的 Redis 数据存储。 +- `redis_data`: 用于作业队列和缓存的 Redis 数据存储 +- `postgres_data`: 用于队列管理和元数据的 PostgreSQL 数据存储 ## 使用方法 ### 启动服务 ```bash -docker-compose up -d +docker compose up -d ``` 
### 访问 API @@ -49,12 +76,22 @@ Firecrawl API 可在以下地址访问: http://localhost:3002 ``` +### 管理面板 + +访问 Bull 队列管理面板: + +```text +http://localhost:3002/admin/@/queues +``` + +如果修改了 `BULL_AUTH_KEY`,请将 `@` 替换为您的值。 + ### API 调用示例 **抓取单个页面:** ```bash -curl -X POST http://localhost:3002/v0/scrape \ +curl -X POST http://localhost:3002/v1/scrape \ -H "Content-Type: application/json" \ -d '{ "url": "https://example.com" @@ -64,12 +101,27 @@ curl -X POST http://localhost:3002/v0/scrape \ **爬取网站:** ```bash -curl -X POST http://localhost:3002/v0/crawl \ +curl -X POST http://localhost:3002/v1/crawl \ -H "Content-Type: application/json" \ -d '{ "url": "https://example.com", - "crawlerOptions": { - "limit": 100 + "limit": 100 + }' +``` + +**提取结构化数据:** + +```bash +curl -X POST http://localhost:3002/v1/extract \ + -H "Content-Type: application/json" \ + -d '{ + "urls": ["https://example.com"], + "schema": { + "type": "object", + "properties": { + "title": {"type": "string"}, + "description": {"type": "string"} + } } }' ``` @@ -80,16 +132,31 @@ curl -X POST http://localhost:3002/v0/crawl \ - **网站爬取**: 递归爬取整个网站 - **JavaScript 渲染**: 完全支持动态 JavaScript 渲染的页面 - **Markdown 输出**: 将网页内容清晰地转换为 markdown -- **速率限制**: 内置速率限制以防止滥用 +- **结构化数据提取**: 使用 JSON Schema 提取数据 +- **队列管理**: 内置 Bull 作业队列 +- **速率限制**: 可配置的速率限制 - **代理支持**: 所有请求的可选代理配置 +- **AI 驱动功能**: 可选的 OpenAI 集成以进行高级提取 + +## 架构 + +此部署使用官方 Firecrawl 架构: + +- **API 服务器**: 处理 HTTP 请求并管理作业队列 +- **工作进程**: 内置于主容器中,处理抓取作业 +- **PostgreSQL**: 存储队列元数据和作业信息 +- **Redis**: 处理作业队列和缓存 +- **Playwright 服务**: 提供浏览器自动化功能 ## 注意事项 -- 该服务使用 Playwright 进行浏览器自动化,支持复杂的网页 -- Redis 用于作业队列和缓存 -- 速率限制可通过环境变量配置 -- 对于生产环境,考虑扩展工作进程数量 -- BLOCK_MEDIA 可以通过阻止图像/视频来减少内存使用 +- 该服务使用官方的 `ghcr.io/firecrawl/firecrawl` 镜像 +- PostgreSQL 使用官方的 `ghcr.io/firecrawl/nuq-postgres` 镜像进行队列管理(NUQ - Not Quite Bull) +- Redis 默认不使用密码(运行在私有网络上) +- 对于生产环境,启用 `USE_DB_AUTHENTICATION` 并配置 Supabase +- 在生产部署中应更改 `BULL_AUTH_KEY` +- AI 功能需要 `OPENAI_API_KEY` 或 `OLLAMA_BASE_URL` +- 所有工作进程都在单个 API 容器中使用 harness 
模式运行 ## 许可证 diff --git a/src/firecrawl/docker-compose.yaml b/src/firecrawl/docker-compose.yaml index 137a796..e7f1323 100644 --- a/src/firecrawl/docker-compose.yaml +++ b/src/firecrawl/docker-compose.yaml @@ -6,68 +6,41 @@ x-default: &default max-size: 100m max-file: "3" +x-common-env: &common-env + REDIS_URL: ${REDIS_URL:-redis://redis:6379} + REDIS_RATE_LIMIT_URL: ${REDIS_URL:-redis://redis:6379} + PLAYWRIGHT_MICROSERVICE_URL: ${PLAYWRIGHT_MICROSERVICE_URL:-http://playwright-service:3000/scrape} + NUQ_DATABASE_URL: ${NUQ_DATABASE_URL:-postgres://postgres:postgres@nuq-postgres:5432/postgres} + USE_DB_AUTHENTICATION: ${USE_DB_AUTHENTICATION:-false} + OPENAI_API_KEY: ${OPENAI_API_KEY:-} + OPENAI_BASE_URL: ${OPENAI_BASE_URL:-} + MODEL_NAME: ${MODEL_NAME:-} + MODEL_EMBEDDING_NAME: ${MODEL_EMBEDDING_NAME:-} + OLLAMA_BASE_URL: ${OLLAMA_BASE_URL:-} + SLACK_WEBHOOK_URL: ${SLACK_WEBHOOK_URL:-} + BULL_AUTH_KEY: ${BULL_AUTH_KEY:-@} + TEST_API_KEY: ${TEST_API_KEY:-} + POSTHOG_API_KEY: ${POSTHOG_API_KEY:-} + POSTHOG_HOST: ${POSTHOG_HOST:-} + SUPABASE_ANON_TOKEN: ${SUPABASE_ANON_TOKEN:-} + SUPABASE_URL: ${SUPABASE_URL:-} + SUPABASE_SERVICE_TOKEN: ${SUPABASE_SERVICE_TOKEN:-} + SELF_HOSTED_WEBHOOK_URL: ${SELF_HOSTED_WEBHOOK_URL:-} + SERPER_API_KEY: ${SERPER_API_KEY:-} + SEARCHAPI_API_KEY: ${SEARCHAPI_API_KEY:-} + LOGGING_LEVEL: ${LOGGING_LEVEL:-info} + PROXY_SERVER: ${PROXY_SERVER:-} + PROXY_USERNAME: ${PROXY_USERNAME:-} + PROXY_PASSWORD: ${PROXY_PASSWORD:-} + SEARXNG_ENDPOINT: ${SEARXNG_ENDPOINT:-} + SEARXNG_ENGINES: ${SEARXNG_ENGINES:-} + SEARXNG_CATEGORIES: ${SEARXNG_CATEGORIES:-} + services: - firecrawl: + playwright-service: <<: *default - image: mendableai/firecrawl:${FIRECRAWL_VERSION:-v1.16.0} - ports: - - "${FIRECRAWL_PORT_OVERRIDE:-3002}:3002" + image: ghcr.io/firecrawl/playwright-service:${PLAYWRIGHT_VERSION:-latest} environment: - TZ: ${TZ:-UTC} - REDIS_URL: redis://:${REDIS_PASSWORD:-firecrawl}@redis:6379 - PLAYWRIGHT_MICROSERVICE_URL: http://playwright:3000 - PORT: 
3002 - NUM_WORKERS_PER_QUEUE: ${NUM_WORKERS_PER_QUEUE:-8} - SCRAPE_RATE_LIMIT_TOKEN_BUCKET_SIZE: ${SCRAPE_RATE_LIMIT_TOKEN_BUCKET_SIZE:-20} - SCRAPE_RATE_LIMIT_TOKEN_BUCKET_REFILL: ${SCRAPE_RATE_LIMIT_TOKEN_BUCKET_REFILL:-1} - depends_on: - redis: - condition: service_healthy - playwright: - condition: service_started - deploy: - resources: - limits: - cpus: '2.0' - memory: 4G - reservations: - cpus: '1.0' - memory: 2G - healthcheck: - test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:3002/health"] - interval: 30s - timeout: 10s - retries: 3 - start_period: 30s - - redis: - <<: *default - image: redis:${REDIS_VERSION:-7.4.2-alpine} - command: redis-server --requirepass ${REDIS_PASSWORD:-firecrawl} --appendonly yes - environment: - - TZ=${TZ:-UTC} - volumes: - - redis_data:/data - deploy: - resources: - limits: - cpus: '1.0' - memory: 512M - reservations: - cpus: '0.5' - memory: 256M - healthcheck: - test: ["CMD", "redis-cli", "ping"] - interval: 10s - timeout: 3s - retries: 3 - start_period: 5s - - playwright: - <<: *default - image: mendableai/firecrawl-playwright:${PLAYWRIGHT_VERSION:-latest} - environment: - TZ: ${TZ:-UTC} PORT: 3000 PROXY_SERVER: ${PROXY_SERVER:-} PROXY_USERNAME: ${PROXY_USERNAME:-} @@ -76,11 +49,102 @@ services: deploy: resources: limits: - cpus: '2.0' - memory: 2G - reservations: - cpus: '1.0' + cpus: "1.0" memory: 1G + reservations: + cpus: "0.5" + memory: 512M + + api: + <<: *default + image: ghcr.io/firecrawl/firecrawl:${FIRECRAWL_VERSION:-latest} + environment: + <<: *common-env + HOST: 0.0.0.0 + PORT: ${INTERNAL_PORT:-3002} + EXTRACT_WORKER_PORT: ${EXTRACT_WORKER_PORT:-3004} + WORKER_PORT: ${WORKER_PORT:-3005} + ENV: local + depends_on: + redis: + condition: service_healthy + playwright-service: + condition: service_started + nuq-postgres: + condition: service_started + ports: + - "${FIRECRAWL_PORT_OVERRIDE:-3002}:${INTERNAL_PORT:-3002}" + command: node dist/src/harness.js --start-docker + deploy: + 
resources: + limits: + cpus: "2.0" + memory: 4G + reservations: + cpus: "1.0" + memory: 2G + healthcheck: + test: + [ + "CMD", + "wget", + "--no-verbose", + "--tries=1", + "--spider", + "http://localhost:${INTERNAL_PORT:-3002}/health", + ] + interval: 30s + timeout: 10s + retries: 3 + start_period: 30s + + redis: + <<: *default + image: redis:${REDIS_VERSION:-alpine} + command: redis-server --bind 0.0.0.0 + volumes: + - redis_data:/data + deploy: + resources: + limits: + cpus: "0.5" + memory: 512M + reservations: + cpus: "0.25" + memory: 256M + healthcheck: + test: ["CMD", "redis-cli", "ping"] + interval: 10s + timeout: 3s + retries: 3 + start_period: 5s + + nuq-postgres: + <<: *default + image: ghcr.io/firecrawl/nuq-postgres:${NUQ_POSTGRES_VERSION:-latest} + environment: + POSTGRES_USER: ${POSTGRES_USER:-postgres} + POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-postgres} + POSTGRES_DB: ${POSTGRES_DB:-postgres} + ports: + - "${POSTGRES_PORT_OVERRIDE:-5432}:5432" + volumes: + - postgres_data:/var/lib/postgresql/data + deploy: + resources: + limits: + cpus: "1.0" + memory: 1G + reservations: + cpus: "0.5" + memory: 512M + healthcheck: + test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER:-postgres}"] + interval: 10s + timeout: 5s + retries: 5 + start_period: 10s volumes: redis_data: + postgres_data: diff --git a/src/jodconverter/.env.example b/src/jodconverter/.env.example new file mode 100644 index 0000000..86a34be --- /dev/null +++ b/src/jodconverter/.env.example @@ -0,0 +1,17 @@ +# OfficeConverter (based on jodconverter) version +OFFICECONVERTER_VERSION=latest + +# Timezone +TZ=UTC + +# LibreOffice instances for document conversion +CONVERTER_LIBREOFFICE_INSTANCES=2 + +# Maximum conversion queue size +CONVERTER_QUEUE_SIZE=1000 + +# Java heap memory configuration +JAVA_OPTS=-Xmx1024m + +# Port override (optional) +# OFFICECONVERTER_PORT_OVERRIDE=8000 diff --git a/src/jodconverter/README.md b/src/jodconverter/README.md new file mode 100644 index 0000000..5b3ed00 --- /dev/null +++ 
b/src/jodconverter/README.md @@ -0,0 +1,169 @@ +# OfficeConverter (JODConverter) + +[English](./README.md) | [中文](./README.zh.md) + +This service deploys OfficeConverter, a modern REST API for document conversion based on JODConverter and LibreOffice. It automates document conversions between various formats including Word, PDF, Excel, PowerPoint, and more. The officeconverter project is an extended and actively maintained version of jodconverter-samples-rest. + +## Services + +- `officeconverter`: The REST API service for document conversion with integrated LibreOffice instances. + +## Environment Variables + +| Variable Name | Description | Default Value | +| ------------------------------- | ---------------------------------------- | ------------- | +| OFFICECONVERTER_VERSION | OfficeConverter image version | `latest` | +| OFFICECONVERTER_PORT_OVERRIDE | Host port mapping (maps to port 8000) | 8000 | +| CONVERTER_LIBREOFFICE_INSTANCES | Number of parallel LibreOffice instances | `2` | +| CONVERTER_QUEUE_SIZE | Maximum conversion queue size | `1000` | +| JAVA_OPTS | Java heap memory configuration | `-Xmx1024m` | +| TZ | Timezone | `UTC` | + +Please modify the `.env` file as needed for your use case. + +## Volumes + +- `officeconverter_config`: A volume for storing OfficeConverter configuration at `/etc/app`. + +## Usage + +1. Start the service: + + ```bash + docker compose up -d + ``` + +2. The OfficeConverter REST API will be available at `http://localhost:8000` (or your configured port). + +3. 
Check service readiness at `http://localhost:8000/ready` + +## Document Conversion + +### Basic Conversion + +Convert a document using the REST API: + +```bash +curl -X POST "http://localhost:8000/conversion?format=pdf" \ + -F "file=@input.docx" \ + -o output.pdf +``` + +### REST Endpoints + +- `POST /conversion?format=` - Convert a document to the specified format + - Query parameter: `format` - Output format (e.g., pdf, html, docx, xlsx) + - Form parameter: `file` - The file to convert +- `GET /ready` - Health check endpoint + +### Supported Formats + +OfficeConverter supports conversion between various document formats including: + +- Documents: DOCX, DOC, ODT, RTF, TXT, DOTX +- Spreadsheets: XLSX, XLS, ODS, CSV, XLTX +- Presentations: PPTX, PPT, ODP +- PDF and HTML conversion + +Additional formats can be added by editing `src/resources/document-formats.json`. + +## Configuration + +### LibreOffice Instances + +Control the number of LibreOffice instances for parallel document processing: + +```dotenv +CONVERTER_LIBREOFFICE_INSTANCES=4 +``` + +More instances allow for greater concurrency but consume more memory. + +### Memory Configuration + +Adjust Java heap memory based on your conversion load: + +```dotenv +JAVA_OPTS=-Xmx2048m +``` + +### Custom Configuration + +Mount a custom `application.yml` file for advanced configuration: + +```yaml +# /etc/app/application.yml +converter: + libreoffice-instances: 4 + queue: + max-size: 2000 +``` + +## Resource Limits + +- CPU: Limited to 2 cores with a reservation of 0.5 cores +- Memory: Limited to 2 GB with a reservation of 512 MB + +The resource limits can be adjusted in docker-compose.yaml based on your conversion workload. + +## Health Checks + +The service includes a health check that verifies the `/ready` endpoint. The container will be considered healthy after 30 seconds of successful health checks. + +## Advanced Usage + +### Conversion with Options + +Some conversions support additional parameters. 
Check the OfficeConverter documentation for advanced options. + +### Monitoring + +View logs to monitor conversion activity: + +```bash +docker compose logs -f officeconverter +``` + +### Performance Tuning + +For high-volume conversion workloads, consider: + +- Increasing `CONVERTER_LIBREOFFICE_INSTANCES` to 4-8 +- Increasing `JAVA_OPTS` memory limit +- Increasing `CONVERTER_QUEUE_SIZE` for more pending jobs + +## Troubleshooting + +### Service Not Ready + +Check if the service is fully initialized: + +```bash +curl http://localhost:8000/ready +``` + +If not ready, check the logs: + +```bash +docker compose logs officeconverter +``` + +### Memory Issues + +If conversions fail with memory errors, increase the Java heap: + +```dotenv +JAVA_OPTS=-Xmx2048m +``` + +And increase the memory limit in docker-compose.yaml. + +### Conversion Failures + +Check service logs for detailed error messages: + +```bash +docker compose logs officeconverter | grep -i error +``` + +For more information, visit the [OfficeConverter GitHub repository](https://github.com/EugenMayer/officeconverter). 
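Reviewer sketch (not part of the committed README): the usage and troubleshooting steps above can be combined into one small helper that waits for `GET /ready` and then posts a file to `POST /conversion?format=`. This is a minimal sketch under stated assumptions: the function names and the `OFFICECONVERTER_URL` variable are hypothetical, and it assumes `/ready` returns a non-success status until the service is initialized.

```bash
# Hypothetical helper functions for the OfficeConverter REST API.
# OFFICECONVERTER_URL is an illustrative variable, not one the service defines.
BASE_URL="${OFFICECONVERTER_URL:-http://localhost:8000}"

# Derive the output path next to the input file:
# docs/report.docx + pdf -> docs/report.pdf
output_name() {
  local input="$1" format="$2" dir base
  dir="$(dirname "$input")"
  base="$(basename "$input")"
  printf '%s/%s.%s\n' "$dir" "${base%.*}" "$format"
}

# Poll GET /ready every 2 seconds until it succeeds or attempts run out.
# Assumption: the endpoint answers with a non-2xx status while starting up.
wait_ready() {
  local attempts="${1:-30}" i=0
  while [ "$i" -lt "$attempts" ]; do
    if curl -fsS -o /dev/null "$BASE_URL/ready"; then
      return 0
    fi
    i=$((i + 1))
    sleep 2
  done
  return 1
}

# Upload a file and write the converted result next to it.
# The URL is quoted so the shell never treats "?" as a glob character.
convert() {
  local input="$1" format="${2:-pdf}" output
  output="$(output_name "$input" "$format")"
  curl -fsS -X POST "$BASE_URL/conversion?format=$format" \
    -F "file=@$input" \
    -o "$output"
  printf '%s\n' "$output"
}
```

After sourcing the snippet, a typical call would be `wait_ready && convert input.docx pdf`, which prints the path of the converted file on success.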
diff --git a/src/jodconverter/README.zh.md b/src/jodconverter/README.zh.md new file mode 100644 index 0000000..77ff5b4 --- /dev/null +++ b/src/jodconverter/README.zh.md @@ -0,0 +1,169 @@ +# OfficeConverter(JODConverter) + +[English](./README.md) | [中文](./README.zh.md) + +此服务部署 OfficeConverter,一个基于 JODConverter 和 LibreOffice 的现代 REST API 文档转换服务。它自动进行文档转换,支持多种格式包括 Word、PDF、Excel、PowerPoint 等。officeconverter 项目是 jodconverter-samples-rest 的扩展和积极维护的版本。 + +## 服务 + +- `officeconverter`:具有集成 LibreOffice 实例的 REST API 文档转换服务。 + +## 环境变量 + +| 变量名 | 描述 | 默认值 | +| ------------------------------- | ------------------------------- | ----------- | +| OFFICECONVERTER_VERSION | OfficeConverter 镜像版本 | `latest` | +| OFFICECONVERTER_PORT_OVERRIDE | 主机端口映射(映射到端口 8000) | 8000 | +| CONVERTER_LIBREOFFICE_INSTANCES | 并行 LibreOffice 实例数 | `2` | +| CONVERTER_QUEUE_SIZE | 最大转换队列大小 | `1000` | +| JAVA_OPTS | Java 堆内存配置 | `-Xmx1024m` | +| TZ | 时区 | `UTC` | + +请根据您的使用情况修改 `.env` 文件。 + +## 卷 + +- `officeconverter_config`:用于存储 OfficeConverter 配置的卷,位于 `/etc/app`。 + +## 使用方法 + +1. 启动服务: + + ```bash + docker compose up -d + ``` + +2. OfficeConverter REST API 将在 `http://localhost:8000`(或您配置的端口)上可用。 + +3. 
在 `http://localhost:8000/ready` 检查服务就绪状态 + +## 文档转换 + +### 基本转换 + +使用 REST API 转换文档: + +```bash +curl -X POST http://localhost:8000/conversion?format=pdf \ + -F "file=@input.docx" \ + -o output.pdf +``` + +### REST 端点 + +- `POST /conversion?format=` - 将文档转换为指定格式 + - 查询参数:`format` - 输出格式(例如 pdf、html、docx、xlsx) + - 表单参数:`file` - 待转换文件 +- `GET /ready` - 健康检查端点 + +### 支持的格式 + +OfficeConverter 支持各种文档格式之间的转换,包括: + +- 文档:DOCX、DOC、ODT、RTF、TXT、DOTX +- 电子表格:XLSX、XLS、ODS、CSV、XLTX +- 演示文稿:PPTX、PPT、ODP +- PDF 和 HTML 转换 + +可以通过编辑 `src/resources/document-formats.json` 添加其他格式。 + +## 配置 + +### LibreOffice 实例 + +控制 LibreOffice 实例数量以实现并行文档处理: + +```dotenv +CONVERTER_LIBREOFFICE_INSTANCES=4 +``` + +更多实例允许更高的并发性,但会消耗更多内存。 + +### 内存配置 + +根据您的转换负载调整 Java 堆内存: + +```dotenv +JAVA_OPTS=-Xmx2048m +``` + +### 自定义配置 + +挂载自定义 `application.yml` 文件以进行高级配置: + +```yaml +# /etc/app/application.yml +converter: + libreoffice-instances: 4 + queue: + max-size: 2000 +``` + +## 资源限制 + +- CPU:限制为 2 核,预留 0.5 核 +- 内存:限制为 2 GB,预留 512 MB + +资源限制可以根据您的转换工作负载在 docker-compose.yaml 中调整。 + +## 健康检查 + +该服务包括一个健康检查,验证 `/ready` 端点。在 30 秒的成功健康检查后,容器将被视为健康。 + +## 高级使用 + +### 带选项的转换 + +某些转换支持其他参数。查看 OfficeConverter 文档了解高级选项。 + +### 监控 + +查看日志以监视转换活动: + +```bash +docker compose logs -f officeconverter +``` + +### 性能调优 + +对于高容量转换工作负载,请考虑: + +- 将 `CONVERTER_LIBREOFFICE_INSTANCES` 增加到 4-8 +- 增加 `JAVA_OPTS` 内存限制 +- 增加 `CONVERTER_QUEUE_SIZE` 以支持更多待处理作业 + +## 故障排除 + +### 服务未就绪 + +检查服务是否已完全初始化: + +```bash +curl http://localhost:8000/ready +``` + +如果未就绪,检查日志: + +```bash +docker compose logs officeconverter +``` + +### 内存问题 + +如果转换因内存错误而失败,请增加 Java 堆: + +```dotenv +JAVA_OPTS=-Xmx2048m +``` + +并增加 docker-compose.yaml 中的内存限制。 + +### 转换失败 + +检查服务日志以获取详细错误消息: + +```bash +docker compose logs officeconverter | grep -i error +``` + +有关更多信息,请访问 [OfficeConverter GitHub 仓库](https://github.com/EugenMayer/officeconverter)。 diff --git a/src/jodconverter/docker-compose.yaml b/src/jodconverter/docker-compose.yaml new file mode 100644 index 
0000000..6017b86 --- /dev/null +++ b/src/jodconverter/docker-compose.yaml @@ -0,0 +1,38 @@ +x-default: &default + restart: unless-stopped + logging: + driver: json-file + options: + max-size: 100m + max-file: "3" + +services: + officeconverter: + <<: *default + image: ghcr.io/eugenmayer/kontextwork-converter:${OFFICECONVERTER_VERSION:-latest} + ports: + - "${OFFICECONVERTER_PORT_OVERRIDE:-8000}:8000" + volumes: + - officeconverter_config:/etc/app + environment: + - TZ=${TZ:-UTC} + - CONVERTER_LIBREOFFICE_INSTANCES=${CONVERTER_LIBREOFFICE_INSTANCES:-2} + - CONVERTER_QUEUE_SIZE=${CONVERTER_QUEUE_SIZE:-1000} + - JAVA_OPTS=${JAVA_OPTS:--Xmx1024m} + deploy: + resources: + limits: + cpus: '2.00' + memory: 2G + reservations: + cpus: '0.50' + memory: 512M + healthcheck: + test: ["CMD", "curl", "-f", "http://localhost:8000/ready"] + interval: 30s + timeout: 10s + retries: 3 + start_period: 30s + +volumes: + officeconverter_config: diff --git a/src/libreoffice/.env.example b/src/libreoffice/.env.example new file mode 100644 index 0000000..7d51a28 --- /dev/null +++ b/src/libreoffice/.env.example @@ -0,0 +1,21 @@ +# LibreOffice version +LIBREOFFICE_VERSION=latest + +# User and group IDs for permission management +PUID=1000 +PGID=1000 + +# Timezone +TZ=UTC + +# HTTP Basic Authentication (optional) +# If PASSWORD is set, CUSTOM_USER will be used as the username +CUSTOM_USER=abc +# PASSWORD=your_password + +# Umask for file permissions (default: 022) +UMASK=022 + +# Port overrides (optional) +# LIBREOFFICE_HTTP_PORT_OVERRIDE=3000 +# LIBREOFFICE_HTTPS_PORT_OVERRIDE=3001 diff --git a/src/libreoffice/README.md b/src/libreoffice/README.md new file mode 100644 index 0000000..447f2bf --- /dev/null +++ b/src/libreoffice/README.md @@ -0,0 +1,83 @@ +# LibreOffice + +[English](./README.md) | [中文](./README.zh.md) + +This service deploys LibreOffice, a free and open-source office suite. The linuxserver.io image provides a desktop GUI accessible through a web browser with HTTPS support. 
+ +## Services + +- `libreoffice`: The LibreOffice desktop environment accessible via web browser. + +## Environment Variables + +| Variable Name | Description | Default Value | +| ------------------------------- | ----------------------------------------------------- | ------------- | +| LIBREOFFICE_VERSION | LibreOffice image version | `latest` | +| LIBREOFFICE_HTTP_PORT_OVERRIDE | Host port mapping for HTTP (maps to port 3000) | `3000` | +| LIBREOFFICE_HTTPS_PORT_OVERRIDE | Host port mapping for HTTPS (maps to port 3001) | `3001` | +| PUID | User ID for permission management | `1000` | +| PGID | Group ID for permission management | `1000` | +| CUSTOM_USER | Username for HTTP Basic Auth | `abc` | +| PASSWORD | Password for HTTP Basic Auth (leave empty to disable) | (empty) | +| TZ | Timezone | `UTC` | +| UMASK | Umask for file permissions | `022` | + +Please modify the `.env` file as needed for your use case. + +## Volumes + +- `libreoffice_config`: A volume for storing the LibreOffice user's home directory, program settings, and documents. + +## Usage + +1. Start the service: + + ```bash + docker compose up -d + ``` + +2. The service will be available at: + - HTTP: `http://localhost:3000` + - HTTPS: `https://localhost:3001` + +3. Access the LibreOffice desktop through your web browser. + +## Security + +**HTTPS is required for full functionality.** Modern browser features such as WebCodecs, used for video and audio, will not function over an insecure HTTP connection. + +### Authentication + +By default, the container has no authentication. To enable HTTP Basic Auth: + +1. Set the `PASSWORD` environment variable in your `.env` file +2. Optionally, set `CUSTOM_USER` to change the username (default: `abc`) + +If you expose the service to the internet, we strongly recommend placing the container behind a reverse proxy with robust authentication. 
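The two authentication steps above amount to setting `PASSWORD` (and optionally `CUSTOM_USER`) in `.env`. A minimal sketch of doing this from the shell follows; `set_env_var` is a hypothetical helper written for this example, not something shipped with the image or this repository, and it assumes the key is a plain variable name because it is interpolated into a regular expression.

```bash
#!/bin/sh
# Sketch only: set_env_var is a hypothetical helper for editing a .env file.

# Set KEY=VALUE in an env file, replacing any existing entry for KEY,
# including a commented-out one such as "# PASSWORD=your_password".
set_env_var() {
  file="$1"; key="$2"; value="$3"
  touch "$file"
  # Copy every line except existing assignments of KEY (grep exits 1 if nothing is left).
  grep -v -E "^#? ?${key}=" "$file" > "${file}.tmp" || true
  printf '%s=%s\n' "$key" "$value" >> "${file}.tmp"
  mv "${file}.tmp" "$file"
}

# Example usage (generates a random password, then restarts the service):
#   set_env_var .env CUSTOM_USER admin
#   set_env_var .env PASSWORD "$(openssl rand -hex 16)"
#   docker compose up -d
```

A hex password avoids characters that could need quoting in a `.env` file.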
+ +### Important Security Note + +Be aware of the following: + +- The container has elevated access to system resources: the compose file disables the default seccomp profile (`seccomp:unconfined`) to meet GUI requirements +- The GUI includes a terminal with passwordless `sudo` access within the container +- Any user with access to the GUI can gain root control within the container + +**Do not expose this container to the Internet unless properly secured.** + +## Configuration + +- User and group IDs can be customized via `PUID` and `PGID` to match your host system +- Language support is available via the `LC_ALL` environment variable (e.g., `LC_ALL=zh_CN.UTF-8` for Chinese); add it to the `environment` section of `docker-compose.yaml`, which does not pass it by default +- The `seccomp:unconfined` setting allows modern GUI applications to function on Docker + +## Resource Limits + +- CPU: Limited to 2 cores with a reservation of 0.5 cores +- Memory: Limited to 2 GB with a reservation of 512 MB + +## Troubleshooting + +The compose file already sets `seccomp:unconfined` (equivalent to `--security-opt seccomp=unconfined` on the command line), which should prevent syscall-related errors on older kernel versions. If you removed this setting and encounter such errors, restore it. + +For more information, visit the [linuxserver.io LibreOffice documentation](https://docs.linuxserver.io/images/docker-libreoffice/). 
diff --git a/src/libreoffice/README.zh.md b/src/libreoffice/README.zh.md new file mode 100644 index 0000000..f49a4d6 --- /dev/null +++ b/src/libreoffice/README.zh.md @@ -0,0 +1,83 @@ +# LibreOffice + +[English](./README.md) | [中文](./README.zh.md) + +此服务部署 LibreOffice,一个免费开源的办公套件。linuxserver.io 镜像提供了一个可通过网络浏览器访问的桌面 GUI,支持 HTTPS。 + +## 服务 + +- `libreoffice`:可通过网络浏览器访问的 LibreOffice 桌面环境。 + +## 环境变量 + +| 变量名 | 描述 | 默认值 | +| ------------------------------- | ------------------------------------- | -------- | +| LIBREOFFICE_VERSION | LibreOffice 镜像版本 | `latest` | +| LIBREOFFICE_HTTP_PORT_OVERRIDE | HTTP 主机端口映射(映射到端口 3000) | 3000 | +| LIBREOFFICE_HTTPS_PORT_OVERRIDE | HTTPS 主机端口映射(映射到端口 3001) | 3001 | +| PUID | 用户 ID,用于权限管理 | `1000` | +| PGID | 组 ID,用于权限管理 | `1000` | +| CUSTOM_USER | HTTP 基本身份验证用户名 | `abc` | +| PASSWORD | HTTP 基本身份验证密码(留空禁用) | (空) | +| TZ | 时区 | `UTC` | +| UMASK | 文件权限掩码 | `022` | + +请根据您的使用情况修改 `.env` 文件。 + +## 卷 + +- `libreoffice_config`:用于存储 LibreOffice 用户主目录、程序设置和文档的卷。 + +## 使用方法 + +1. 启动服务: + + ```bash + docker compose up -d + ``` + +2. 该服务将在以下地址可用: + - HTTP:`http://localhost:3000` + - HTTPS:`https://localhost:3001` + +3. 通过网络浏览器访问 LibreOffice 桌面。 + +## 安全 + +**完整功能需要 HTTPS。** 现代浏览器功能(如用于视频和音频的 WebCodecs)在不安全的 HTTP 连接上无法运行。 + +### 身份验证 + +默认情况下,容器没有身份验证。要启用 HTTP 基本身份验证: + +1. 在 `.env` 文件中设置 `PASSWORD` 环境变量 +2. 
可选地自定义 `CUSTOM_USER`(默认:`abc`) + +对于互联网暴露,我们强烈建议将容器放在具有强大身份验证机制的反向代理后面。 + +### 重要安全注意事项 + +此容器包括: + +- 对系统资源的特权访问(由于 GUI 需求) +- 容器内无密码 `sudo` 访问的终端 +- 任何有权访问 GUI 的用户都可以在容器内获得 root 控制权 + +**除非适当保护,否则不要将此容器暴露到互联网。** + +## 配置 + +- 用户和组 ID 可通过 `PUID` 和 `PGID` 自定义以匹配您的主机系统 +- 语言支持可通过 `LC_ALL` 环境变量获得(例如 `LC_ALL=zh_CN.UTF-8` 用于中文) +- `seccomp: unconfined` 设置允许现代 GUI 应用程序在 Docker 上运行 + +## 资源限制 + +- CPU:限制为 2 核,预留 0.5 核 +- 内存:限制为 2 GB,预留 512 MB + +## 故障排除 + +如果遇到系统调用相关错误,已包含的 `--security-opt seccomp=unconfined` 设置应该可以解决旧内核版本上的问题。 + +有关更多信息,请访问 [linuxserver.io LibreOffice 文档](https://docs.linuxserver.io/images/docker-libreoffice/)。 diff --git a/src/libreoffice/docker-compose.yaml b/src/libreoffice/docker-compose.yaml new file mode 100644 index 0000000..6c3db37 --- /dev/null +++ b/src/libreoffice/docker-compose.yaml @@ -0,0 +1,43 @@ +x-default: &default + restart: unless-stopped + logging: + driver: json-file + options: + max-size: 100m + max-file: "3" + +services: + libreoffice: + <<: *default + image: lscr.io/linuxserver/libreoffice:${LIBREOFFICE_VERSION:-latest} + ports: + - "${LIBREOFFICE_HTTP_PORT_OVERRIDE:-3000}:3000" + - "${LIBREOFFICE_HTTPS_PORT_OVERRIDE:-3001}:3001" + volumes: + - libreoffice_config:/config + environment: + - PUID=${PUID:-1000} + - PGID=${PGID:-1000} + - TZ=${TZ:-UTC} + - CUSTOM_USER=${CUSTOM_USER:-abc} + - PASSWORD=${PASSWORD:-} + - UMASK=${UMASK:-022} + security_opt: + - seccomp:unconfined + deploy: + resources: + limits: + cpus: '2.00' + memory: 2G + reservations: + cpus: '0.50' + memory: 512M + healthcheck: + test: ["CMD", "curl", "-f", "-k", "https://localhost:3001/"] + interval: 30s + timeout: 10s + retries: 3 + start_period: 40s + +volumes: + libreoffice_config: diff --git a/src/ollama/docker-compose.yaml b/src/ollama/docker-compose.yaml index cb52770..57ff871 100644 --- a/src/ollama/docker-compose.yaml +++ b/src/ollama/docker-compose.yaml @@ -9,7 +9,7 @@ x-default: &default services: ollama: <<: *default - image: 
ollama/ollama:${OLLAMA_VERSION:-0.12.0} + image: ollama/ollama:${OLLAMA_VERSION:-0.12.6} ports: - "${OLLAMA_PORT_OVERRIDE:-11434}:11434" volumes: diff --git a/src/portkey-gateway/.env.example b/src/portkey-gateway/.env.example new file mode 100644 index 0000000..c91e474 --- /dev/null +++ b/src/portkey-gateway/.env.example @@ -0,0 +1,9 @@ +# Portkey Gateway Configuration +# ============================ + +# Portkey Gateway Service +PORTKEY_GATEWAY_VERSION=latest +PORTKEY_GATEWAY_PORT_OVERRIDE=8787 + +# Timezone +TZ=UTC diff --git a/src/portkey-gateway/README.md b/src/portkey-gateway/README.md new file mode 100644 index 0000000..ef9432a --- /dev/null +++ b/src/portkey-gateway/README.md @@ -0,0 +1,63 @@ +# Portkey AI Gateway + +[Portkey AI Gateway](https://github.com/Portkey-AI/gateway) is a blazing fast, open-source AI Gateway that allows you to route to 200+ language, vision, audio, and image models from a single API. It provides reliable routing, security features, cost management, and enterprise-ready deployment options. 
+ +## Features + +- **Multi-LLM Routing**: Route to 200+ LLMs with a single API +- **Reliable Routing**: Fallbacks, automatic retries, load balancing, and request timeouts +- **Security & Accuracy**: Guardrails, secure key management, RBAC, SOC2/HIPAA/GDPR compliance +- **Cost Management**: Smart caching, usage analytics, provider optimization +- **Collaboration**: Agent framework support, prompt template management +- **Enterprise Ready**: Private deployments with advanced capabilities + +## Quick Start + +```bash +docker compose up -d +``` + +The gateway will be available at `http://localhost:8787`. + +Access the console at `http://localhost:8787/public/`. + +## Environment Variables + +- `PORTKEY_GATEWAY_VERSION`: Docker image version (default: `latest`) +- `PORTKEY_GATEWAY_PORT_OVERRIDE`: Host port to expose (default: `8787`) +- `TZ`: Timezone (default: `UTC`) + +## Documentation + +- [Portkey Gateway Documentation](https://portkey.ai/docs) +- [GitHub Repository](https://github.com/Portkey-AI/gateway) +- [API Reference](https://portkey.ai/docs/welcome/make-your-first-request) + +## Default Port + +- **Gateway API**: `8787` +- **Console**: `8787` + +## Configuration + +The gateway provides an extensive configuration system through the console. Key features include: + +- Model routing rules and conditions +- Fallback and retry strategies +- Input/output guardrails +- Custom plugins and integrations +- Key management and virtual keys + +Visit the console at `http://localhost:8787/public/` to configure the gateway. + +## Integrations + +Portkey Gateway integrates with: + +- **LLM Frameworks**: LangChain, LlamaIndex, Autogen, CrewAI +- **Agent Frameworks**: Support for custom agents +- **Monitoring**: Logging and tracing capabilities + +## License + +Portkey AI Gateway is open source and available under the MIT License. 
diff --git a/src/portkey-gateway/README.zh.md b/src/portkey-gateway/README.zh.md new file mode 100644 index 0000000..8d926f3 --- /dev/null +++ b/src/portkey-gateway/README.zh.md @@ -0,0 +1,63 @@ +# Portkey AI 网关 + +[Portkey AI 网关](https://github.com/Portkey-AI/gateway)是一个快速、开源的 AI 网关,允许您通过单个 API 路由到 200+ 个语言、视觉、音频和图像模型。它提供可靠的路由、安全功能、成本管理和企业级部署选项。 + +## 特性 + +- **多 LLM 路由**:通过单个 API 路由到 200+ 个 LLM +- **可靠的路由**:故障转移、自动重试、负载均衡和请求超时 +- **安全性和准确性**:防护栏、安全密钥管理、RBAC、SOC2/HIPAA/GDPR 合规 +- **成本管理**:智能缓存、使用分析、提供者优化 +- **协作**:代理框架支持、提示模板管理 +- **企业级就绪**:具有高级功能的私有部署 + +## 快速开始 + +```bash +docker compose up -d +``` + +网关将在 `http://localhost:8787` 可用 + +访问控制台 `http://localhost:8787/public/` + +## 环境变量 + +- `PORTKEY_GATEWAY_VERSION`:Docker 镜像版本(默认:`latest`) +- `PORTKEY_GATEWAY_PORT_OVERRIDE`:暴露的主机端口(默认:`8787`) +- `TZ`:时区(默认:`UTC`) + +## 文档 + +- [Portkey 网关文档](https://portkey.ai/docs) +- [GitHub 仓库](https://github.com/Portkey-AI/gateway) +- [API 参考](https://portkey.ai/docs/welcome/make-your-first-request) + +## 默认端口 + +- **网关 API**:`8787`() +- **控制台**:`8787`() + +## 配置 + +网关通过控制台提供广泛的配置系统。主要功能包括: + +- 模型路由规则和条件 +- 故障转移和重试策略 +- 输入/输出防护栏 +- 自定义插件和集成 +- 密钥管理和虚拟密钥 + +访问控制台 `http://localhost:8787/public/` 来配置网关。 + +## 集成 + +Portkey 网关与以下集成: + +- **LLM 框架**:LangChain、LlamaIndex、Autogen、CrewAI +- **代理框架**:支持自定义代理 +- **监控**:日志记录和跟踪功能 + +## 许可证 + +Portkey AI 网关是开源的,采用 MIT 许可证。 diff --git a/src/portkey-gateway/docker-compose.yaml b/src/portkey-gateway/docker-compose.yaml new file mode 100644 index 0000000..9991a59 --- /dev/null +++ b/src/portkey-gateway/docker-compose.yaml @@ -0,0 +1,30 @@ +x-default: &default + restart: unless-stopped + logging: + driver: json-file + options: + max-size: 100m + max-file: "3" + +services: + portkey-gateway: + <<: *default + image: portkeyai/gateway:${PORTKEY_GATEWAY_VERSION:-latest} + ports: + - "${PORTKEY_GATEWAY_PORT_OVERRIDE:-8787}:8787" + environment: + - TZ=${TZ:-UTC} + healthcheck: + test: ["CMD", "wget", "--spider", "-q", "http://localhost:8787"] + 
interval: 30s + timeout: 10s + retries: 3 + start_period: 10s + deploy: + resources: + limits: + cpus: '1.00' + memory: 512M + reservations: + cpus: '0.25' + memory: 128M