# PyTorch

[English](./README.md) | [中文](./README.zh.md)

This service deploys PyTorch with CUDA support, Jupyter Lab, and TensorBoard for deep learning development.

## Services

- `pytorch`: PyTorch container with GPU support, Jupyter Lab, and TensorBoard.

## Prerequisites

**NVIDIA GPU Required**: This service requires an NVIDIA GPU with CUDA support and the NVIDIA Container Toolkit installed.

### Install NVIDIA Container Toolkit

**Linux:**

```bash
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
```

**Windows (Docker Desktop):**

Ensure you have WSL2 with NVIDIA drivers installed and Docker Desktop configured to use the WSL2 backend.

## Environment Variables

| Variable Name              | Description                | Default Value                   |
| -------------------------- | -------------------------- | ------------------------------- |
| PYTORCH_VERSION            | PyTorch image version      | `2.6.0-cuda12.6-cudnn9-runtime` |
| JUPYTER_ENABLE_LAB         | Enable Jupyter Lab         | `yes`                           |
| JUPYTER_TOKEN              | Jupyter access token       | `pytorch`                       |
| NVIDIA_VISIBLE_DEVICES     | GPUs to use                | `all`                           |
| NVIDIA_DRIVER_CAPABILITIES | Driver capabilities        | `compute,utility`               |
| GPU_COUNT                  | Number of GPUs to allocate | `1`                             |
| JUPYTER_PORT_OVERRIDE      | Jupyter Lab port           | `8888`                          |
| TENSORBOARD_PORT_OVERRIDE  | TensorBoard port           | `6006`                          |

Please modify the `.env` file as needed for your use case.

## Volumes

- `pytorch_notebooks`: Jupyter notebooks and scripts.
- `pytorch_data`: Training data and datasets.

## Usage

### Start the Service

```bash
docker-compose up -d
```

### Access Jupyter Lab

Open your browser and navigate to:

```text
http://localhost:8888
```

Log in with the token specified in `JUPYTER_TOKEN` (default: `pytorch`).
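
If you changed `JUPYTER_PORT_OVERRIDE` or `JUPYTER_TOKEN`, the login URL changes with them. The sketch below builds that URL from a dictionary of settings. It assumes the container runs a standard Jupyter server, which serves Lab under `/lab` and accepts the token as a `token` query parameter. The `jupyter_url` helper is hypothetical and not part of this service. Values in `.env` are not necessarily exported to your shell, so pass them explicitly if needed.

```python
import os
from urllib.parse import urlencode


def jupyter_url(env=None, host="localhost"):
    """Build a Jupyter Lab login URL (hypothetical helper, not part of this service)."""
    # Fall back to the process environment when no mapping is given.
    env = os.environ if env is None else env
    # Defaults mirror the Environment Variables table above.
    port = env.get("JUPYTER_PORT_OVERRIDE", "8888")
    token = env.get("JUPYTER_TOKEN", "pytorch")
    # Assumption: the server accepts the token as a `token` query parameter.
    return f"http://{host}:{port}/lab?{urlencode({'token': token})}"


# With no overrides, the defaults from the table are used.
print(jupyter_url({}))
```

Opening the printed URL should skip the token prompt. Avoid sharing it, since it embeds the access token.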

### Verify GPU Access

In a Jupyter notebook:

```python
import torch

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA version: {torch.version.cuda}")
print(f"Number of GPUs: {torch.cuda.device_count()}")

if torch.cuda.is_available():
    print(f"GPU name: {torch.cuda.get_device_name(0)}")
```

### Example Training Script

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Set device (falls back to CPU if CUDA is unavailable)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Define a simple model
model = nn.Sequential(
    nn.Linear(784, 128),
    nn.ReLU(),
    nn.Linear(128, 10)
).to(device)

# Create dummy data: 64 samples, 784 features, 10 classes
x = torch.randn(64, 784).to(device)
y = torch.randint(0, 10, (64,)).to(device)

# Single training step
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters())

optimizer.zero_grad()  # clear stale gradients before the backward pass
output = model(x)
loss = criterion(output, y)
loss.backward()
optimizer.step()

print(f"Loss: {loss.item()}")
```

### Access TensorBoard

The TensorBoard port is exposed, but TensorBoard itself must be started manually. First, create a `SummaryWriter` in your training code so that logs are written to `/workspace/runs`:

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter('/workspace/runs')
```

Then start TensorBoard:

```bash
docker exec pytorch tensorboard --logdir=/workspace/runs --host=0.0.0.0
```

Access at: `http://localhost:6006`

## Features

- **GPU Acceleration**: CUDA support for fast training
- **Jupyter Lab**: Interactive development environment
- **TensorBoard**: Visualization for training metrics
- **Pre-installed**: PyTorch, CUDA, cuDNN ready to use
- **Persistent Storage**: Notebooks and data stored in volumes

## Notes

- GPU is required for optimal performance
- Recommended: 8GB+ VRAM for most deep learning tasks
- The container installs Jupyter and TensorBoard on first start
- Use `pytorch/pytorch:*-devel` for building custom extensions
- For multi-GPU training, adjust `GPU_COUNT` and use `torch.nn.DataParallel`

## License

PyTorch is licensed under a BSD-style license.