# LMDeploy Docker Compose

[LMDeploy](https://github.com/InternLM/lmdeploy) is a toolkit for compressing, deploying, and serving LLMs.

## Quick Start

1. (Optional) Configure the model and port in `.env` (an example `.env` is shown below).
2. Start the service:

   ```bash
   docker compose up -d
   ```

3. Access the OpenAI-compatible API at `http://localhost:23333/v1` (see "Testing the API" below).

## Configuration

| Environment Variable     | Default                        | Description                           |
| ------------------------ | ------------------------------ | ------------------------------------- |
| `LMDEPLOY_VERSION`       | `v0.11.1-cu12.8`               | LMDeploy image version                |
| `LMDEPLOY_PORT_OVERRIDE` | `23333`                        | Host port for the API server          |
| `LMDEPLOY_MODEL`         | `internlm/internlm2-chat-1_8b` | HuggingFace model ID or local path    |
| `HF_TOKEN`               |                                | HuggingFace token for private models  |

## Monitoring Health

The service includes a health check that verifies the OpenAI `/v1/models` endpoint is responsive.

## GPU Support

By default, this configuration reserves one NVIDIA GPU. Ensure the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html) is installed on your host.
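For reference, a minimal sketch of how a single-GPU reservation typically looks in a Compose file; the service name and surrounding options are illustrative, and the actual `docker-compose.yml` in this repository may differ in detail:

```yaml
# Sketch only: service name and surrounding options are illustrative.
services:
  lmdeploy:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia      # requires the NVIDIA Container Toolkit
              count: 1            # number of GPUs reserved
              capabilities: [gpu]
```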
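## Example `.env`

A minimal sketch of an `.env` file using the variables from the Configuration table above; all values here are illustrative:

```bash
# All values are illustrative; see the Configuration table for defaults.
LMDEPLOY_VERSION=v0.11.1-cu12.8
LMDEPLOY_PORT_OVERRIDE=8080                   # expose the API on host port 8080
LMDEPLOY_MODEL=internlm/internlm2-chat-1_8b   # HuggingFace model ID or local path
HF_TOKEN=hf_your_token_here                   # only needed for private models
```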
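## Testing the API

Once the container reports healthy, the OpenAI-compatible endpoints can be exercised with `curl`. A sketch, assuming the default port and model; the `model` field should match whatever `LMDEPLOY_MODEL` resolves to:

```bash
# List served models (the same endpoint the health check probes).
curl http://localhost:23333/v1/models

# Send a chat completion request.
curl http://localhost:23333/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "internlm/internlm2-chat-1_8b",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'
```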