
NexaSDK


This service deploys the NexaSDK Docker image to run AI models behind an OpenAI-compatible REST API. It supports LLM, VLM, embedding, reranking, computer vision (CV), and ASR models.

Features

  • OpenAI-compatible API: Drop-in replacement for OpenAI API endpoints
  • Multiple Model Types: LLM, VLM, Embeddings, Reranking, CV, ASR
  • GPU Acceleration: CUDA support for NVIDIA GPUs
  • NPU Support: Optimized for Qualcomm NPU on ARM64

Supported Models

Modality    Models
LLM         NexaAI/LFM2-1.2B-npu, NexaAI/Granite-4.0-h-350M-NPU
VLM         NexaAI/OmniNeural-4B
Embedding   NexaAI/embeddinggemma-300m-npu, NexaAI/EmbedNeural
Rerank      NexaAI/jina-v2-rerank-npu
CV          NexaAI/yolov12-npu, NexaAI/convnext-tiny-npu-IoT
ASR         NexaAI/parakeet-tdt-0.6b-v3-npu

Usage

CPU Mode

docker compose --profile cpu up -d

GPU Mode (CUDA)

docker compose --profile gpu up -d nexa-sdk-cuda
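
In either mode, you can confirm the service came up and watch the model server's startup logs with standard Docker Compose commands (use nexa-sdk-cuda as the service name in GPU mode):

# Check that the container is running
docker compose ps

# Follow the server logs to confirm startup
docker compose logs -f nexa-sdk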

Pull a Model

docker exec -it nexa-sdk nexa pull NexaAI/Granite-4.0-h-350M-NPU
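
To confirm the model was downloaded into the container, you can list locally available models; this assumes your NexaSDK build provides a list subcommand (check nexa --help if it differs):

docker exec -it nexa-sdk nexa list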

Interactive CLI

docker exec -it nexa-sdk nexa infer NexaAI/Granite-4.0-h-350M-NPU

API Examples

  • Chat completions:

    curl -X POST http://localhost:18181/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
        "model": "NexaAI/Granite-4.0-h-350M-NPU",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'
    
  • Embeddings:

    curl -X POST http://localhost:18181/v1/embeddings \
      -H "Content-Type: application/json" \
      -d '{
        "model": "NexaAI/EmbedNeural",
        "input": "Hello, world!"
      }'
    
  • Swagger UI: Visit http://localhost:18181/docs/ui
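
Since the service mirrors the OpenAI chat completions schema, the assistant's reply can be extracted directly from the JSON response. The sketch below assumes the standard choices[0].message.content layout and requires jq on the host:

curl -s -X POST http://localhost:18181/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "NexaAI/Granite-4.0-h-350M-NPU",
    "messages": [{"role": "user", "content": "Hello!"}]
  }' | jq -r '.choices[0].message.content'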

Services

  • nexa-sdk: CPU-based NexaSDK service (default)
  • nexa-sdk-cuda: GPU-accelerated service with CUDA support (profile: gpu)

Configuration

Variable                Description                 Default
NEXA_SDK_VERSION        NexaSDK image version       v0.2.65
NEXA_SDK_PORT_OVERRIDE  Host port for REST API      18181
NEXA_TOKEN              Nexa API token (required)   -
TZ                      Timezone                    UTC
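
A minimal .env next to the compose file could look like the following; the values are illustrative, only NEXA_TOKEN is strictly required, and the rest fall back to the defaults above:

# .env
NEXA_SDK_VERSION=v0.2.65
NEXA_SDK_PORT_OVERRIDE=18181
NEXA_TOKEN=your-token-here
TZ=UTC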

Volumes

  • nexa_data: Volume for storing downloaded models and data
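
Downloaded models persist in this named volume across container restarts. You can inspect it, or reclaim disk space once the stack is down, with standard Docker commands; note that Compose may prefix the volume with your project name (e.g. <project>_nexa_data):

docker volume inspect nexa_data

# Removes all downloaded models; run only after docker compose down
docker volume rm nexa_data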

Getting a Token

  1. Create an account at sdk.nexa.ai
  2. Go to Deployment → Create Token
  3. Copy the token to your .env file

References