# Nexa SDK

Nexa SDK is a comprehensive toolkit for running AI models locally. It provides inference for a range of model types, including large language models (LLMs), vision language models (VLMs), text-to-speech (TTS), automatic speech recognition (ASR), and more. Built with performance in mind, it supports both CPU and GPU acceleration.

## Features

- **Multi-Model Support**: Run LLM, VLM, TTS, ASR, embedding, reranking, and image generation models
- **OpenAI-Compatible API**: Provides standard OpenAI API endpoints for easy integration
- **GPU Acceleration**: Optional GPU support via NVIDIA CUDA for faster inference
- **Resource Management**: Configurable CPU/memory limits and GPU layer offloading
- **Model Caching**: Persistent model storage for faster startup
- **Profile Support**: Easy switching between CPU-only and GPU-accelerated modes

## Quick Start

### Prerequisites

- Docker and Docker Compose
- For GPU support: NVIDIA Docker runtime and a compatible GPU

### Basic Usage (CPU)

```bash
# Copy environment file
cp .env.example .env

# Edit .env to configure your model and settings
# NEXA_MODEL=gemma-2-2b-instruct

# Start the service with CPU profile
docker compose --profile cpu up -d
```
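
Once the service is up, following the logs is the quickest way to watch the model download and load:

```bash
# Stream container logs (Ctrl+C to stop following)
docker compose --profile cpu logs -f
```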

### GPU-Accelerated Usage

```bash
# Copy environment file
cp .env.example .env

# Configure for GPU usage
# NEXA_MODEL=gemma-2-2b-instruct
# NEXA_GPU_LAYERS=-1   # -1 means all layers on GPU

# Start the service with GPU profile
docker compose --profile gpu up -d
```

## Configuration

### Environment Variables

| Variable                 | Default               | Description                                            |
| ------------------------ | --------------------- | ------------------------------------------------------ |
| `NEXA_SDK_VERSION`       | `latest`              | Nexa SDK Docker image version                          |
| `NEXA_SDK_PORT_OVERRIDE` | `8080`                | Host port for API access                               |
| `NEXA_MODEL`             | `gemma-2-2b-instruct` | Model to load (e.g., qwen3-4b, llama-3-8b, mistral-7b) |
| `NEXA_HOST`              | `0.0.0.0:8080`        | Server bind address                                    |
| `NEXA_KEEPALIVE`         | `300`                 | Model keepalive timeout in seconds                     |
| `NEXA_ORIGINS`           | `*`                   | CORS allowed origins                                   |
| `NEXA_HFTOKEN`           | -                     | Hugging Face token for private models                  |
| `NEXA_LOG`               | `none`                | Logging level (none, debug, info, warn, error)         |
| `NEXA_GPU_LAYERS`        | `-1`                  | GPU layers to offload (-1 = all, 0 = CPU only)         |
| `NEXA_SHM_SIZE`          | `2g`                  | Shared memory size                                     |
| `TZ`                     | `UTC`                 | Container timezone                                     |
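
As a point of reference, a minimal `.env` using only variables from the table above might look like this (the values are illustrative, not requirements):

```bash
# Illustrative .env sketch; adjust to your hardware and model choice
NEXA_SDK_VERSION=latest
NEXA_SDK_PORT_OVERRIDE=8080
NEXA_MODEL=gemma-2-2b-instruct
NEXA_GPU_LAYERS=-1
NEXA_LOG=info
TZ=UTC
```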

### Resource Limits

| Variable                      | Default | Description        |
| ----------------------------- | ------- | ------------------ |
| `NEXA_SDK_CPU_LIMIT`          | `4.0`   | Maximum CPU cores  |
| `NEXA_SDK_MEMORY_LIMIT`       | `8G`    | Maximum memory     |
| `NEXA_SDK_CPU_RESERVATION`    | `2.0`   | Reserved CPU cores |
| `NEXA_SDK_MEMORY_RESERVATION` | `4G`    | Reserved memory    |
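
To confirm what the running container actually gets, a one-shot `docker stats` snapshot shows current usage against the configured limits:

```bash
# One-shot snapshot of CPU and memory usage for running containers
docker stats --no-stream
```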

### Profiles

- `cpu`: Run with CPU-only inference
- `gpu`: Run with GPU acceleration (requires an NVIDIA GPU)

A profile must be selected explicitly; neither is enabled by default.

## Usage Examples

### Test the API

```bash
# Check available models
curl http://localhost:8080/v1/models

# Chat completion
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma-2-2b-instruct",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
```
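
Because the API follows the OpenAI schema, streaming responses should work the same way as against OpenAI itself; a sketch, assuming the server honors the standard `stream` flag:

```bash
# Stream tokens as server-sent events instead of waiting for the full reply
curl -N http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma-2-2b-instruct",
    "stream": true,
    "messages": [{"role": "user", "content": "Tell me a short joke."}]
  }'
```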

### Using Different Models

Edit `.env` to change the model:

```bash
# Small models for limited resources
NEXA_MODEL=gemma-2-2b-instruct
# or
NEXA_MODEL=qwen3-4b

# Larger models for better quality
NEXA_MODEL=llama-3-8b
# or
NEXA_MODEL=mistral-7b
```

### GPU Configuration

For GPU acceleration, adjust the number of layers:

```bash
# Offload all layers to GPU (fastest)
NEXA_GPU_LAYERS=-1

# Offload 30 layers (hybrid mode)
NEXA_GPU_LAYERS=30

# CPU only
NEXA_GPU_LAYERS=0
```

## Model Management

Models are automatically downloaded on first run and cached in the `nexa_models` volume. The default cache location inside the container is `/root/.cache/nexa`.

To use a different model:

1. Update `NEXA_MODEL` in `.env`
2. Restart the service: `docker compose --profile <cpu|gpu> restart` (see the sketch below)
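
As a concrete sketch of those two steps (the `sed` one-liner assumes `NEXA_MODEL` already has an uncommented line in `.env`, and uses GNU sed syntax):

```bash
# Point the service at a different model, then restart it
sed -i 's/^NEXA_MODEL=.*/NEXA_MODEL=qwen3-4b/' .env
docker compose --profile gpu restart
```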

## API Endpoints

Nexa SDK provides OpenAI-compatible API endpoints:

- `GET /v1/models` - List available models
- `POST /v1/chat/completions` - Chat completions
- `POST /v1/completions` - Text completions
- `POST /v1/embeddings` - Text embeddings
- `GET /health` - Health check
- `GET /docs` - API documentation (Swagger UI)
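
The embeddings endpoint, for instance, can be called with the standard OpenAI request shape (a sketch; the model name here is illustrative, and an embedding-capable model would need to be loaded via `NEXA_MODEL`):

```bash
curl http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma-2-2b-instruct",
    "input": "Local inference with Nexa SDK"
  }'
```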

## Troubleshooting

### Out of Memory

Increase memory limits or use a smaller model:

```bash
NEXA_SDK_MEMORY_LIMIT=16G
NEXA_SDK_MEMORY_RESERVATION=8G
# Or switch to a smaller model
NEXA_MODEL=gemma-2-2b-instruct
```

### GPU Not Detected

Ensure NVIDIA Docker runtime is installed:

```bash
# Check GPU availability
docker run --rm --gpus all nvidia/cuda:12.8.1-base-ubuntu22.04 nvidia-smi
```
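
If that standalone check passes but the service still reports no GPU, running `nvidia-smi` inside the service container narrows the problem down (the service name `nexa-sdk` is an assumption; check `docker compose ps` for yours):

```bash
docker compose --profile gpu exec nexa-sdk nvidia-smi
```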

### Model Download Issues

Set a Hugging Face token if accessing private models:

```bash
NEXA_HFTOKEN=your_hf_token_here
```

### Slow Performance

- Use GPU profile for better performance
- Increase `NEXA_GPU_LAYERS` to offload more computation to GPU
- Allocate more resources or use a smaller model
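
To check whether a change actually helped, timing a single request gives a rough end-to-end latency number (the prompt and model here are illustrative):

```bash
# Rough wall-clock latency for one short completion
time curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gemma-2-2b-instruct", "messages": [{"role": "user", "content": "Hi"}]}' \
  > /dev/null
```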

## Advanced Configuration

### Custom Model Path

If you want to use local model files, mount them as a volume:

```yaml
volumes:
  - ./models:/models
  - nexa_models:/root/.cache/nexa
```

Then reference the model by its path inside the container (for example, a file under `/models`) in the service command.

### HTTPS Configuration

Set the environment variable to enable HTTPS:

```bash
NEXA_ENABLEHTTPS=true
```

Mount certificate files:

```yaml
volumes:
  - ./certs/cert.pem:/app/cert.pem:ro
  - ./certs/key.pem:/app/key.pem:ro
```
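
A quick way to verify HTTPS is serving (`-k` skips certificate verification, which is only appropriate for self-signed test certificates):

```bash
curl -k https://localhost:8080/v1/models
```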

## Health Check

The service includes a health check that verifies the API is responding:

```bash
curl http://localhost:8080/v1/models
```
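
Docker's own verdict on that health check can be read from the container state (the container name is an assumption; find yours with `docker compose ps`):

```bash
docker inspect --format '{{.State.Health.Status}}' nexa-sdk
```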

## License

Nexa SDK is developed by Nexa AI. Please refer to the [official repository](https://github.com/NexaAI/nexa-sdk) for license information.

## Links

- [Official Repository](https://github.com/NexaAI/nexa-sdk)
- [Nexa AI Website](https://nexa.ai)
- [Documentation](https://docs.nexa.ai)
- [Model Hub](https://sdk.nexa.ai)