
gpu support

Bibo Hao edited this page Jul 2, 2026 · 1 revision

GPU & CUDA Integration

This module details how NVIDIA GPU drivers, CUDA toolkits, and machine learning frameworks (PyTorch, TensorFlow, PaddlePaddle) are integrated into the LabNow Docker ecosystem.


1. CUDA Wrappers and Images (docker_cuda)

GPU compatibility is achieved by wrapping official NVIDIA CUDA development images with LabNow customizations.

Wrap Hierarchy

Because NVIDIA images start from a plain OS configuration without LabNow's customizations, the build system wraps the CUDA base images in a multi-step pipeline:

  1. Atom Wrap: docker_atom/atom.Dockerfile is built using the NVIDIA base image (e.g. nvidia/cuda:12.8.1-cudnn-devel-ubuntu24.04) passed via BASE_IMG. This yields an nvidia-cuda atom image.
  2. Base Wrap: docker_base/base.Dockerfile is built on top of the nvidia-cuda atom image to add Conda, Python, and base tools.
  3. CUDA Finalize: docker_cuda/nvidia-cuda.Dockerfile inherits the base-wrapped image, sets NVIDIA_DISABLE_REQUIRE=1 (so the NVIDIA container runtime skips its NVIDIA_REQUIRE_* CUDA version checks), updates the debpython path configuration, builds and installs GPU monitoring utilities, and cleans up.
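The three wrap steps can be sketched as a chain of `docker build` calls. This is a minimal sketch, not the project's actual build script: the image tags (`labnow/nvidia-cuda-atom`, `labnow/nvidia-cuda-base`, `labnow/nvidia-cuda`) are hypothetical, and it assumes that base.Dockerfile and nvidia-cuda.Dockerfile also take their parent image through a `BASE_IMG` build argument, with the repository root as build context. By default it only prints the commands; set `DRY_RUN=0` to execute them.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print commands by default; set DRY_RUN=0 to actually run them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Upstream NVIDIA image (example tag from this page).
CUDA_BASE="nvidia/cuda:12.8.1-cudnn-devel-ubuntu24.04"
# Hypothetical tags for the intermediate and final images.
ATOM_TAG="labnow/nvidia-cuda-atom"
BASE_TAG="labnow/nvidia-cuda-base"
FINAL_TAG="labnow/nvidia-cuda"

build_cuda_chain() {
  # 1. Atom wrap: LabNow atom layer on top of the raw NVIDIA image.
  run docker build -f docker_atom/atom.Dockerfile \
    --build-arg "BASE_IMG=${CUDA_BASE}" -t "${ATOM_TAG}" .
  # 2. Base wrap: Conda, Python and base tools (assumes BASE_IMG is honored here too).
  run docker build -f docker_base/base.Dockerfile \
    --build-arg "BASE_IMG=${ATOM_TAG}" -t "${BASE_TAG}" .
  # 3. CUDA finalize: NVIDIA_DISABLE_REQUIRE=1, path fixes, monitoring tools, cleanup.
  run docker build -f docker_cuda/nvidia-cuda.Dockerfile \
    --build-arg "BASE_IMG=${BASE_TAG}" -t "${FINAL_TAG}" .
}

build_cuda_chain
```

Each step consumes the tag produced by the previous one, which is what makes the NVIDIA image "wrapped" rather than rebuilt from scratch.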

GPU Monitoring (setup_nvtop)

  • Downloads and builds nvtop from source to display NVIDIA, AMD, and Intel GPU status.
  • Requires CMake >= 3.18. If the base OS has an older CMake, it temporarily adds the Kitware APT repository during build.
  • Compiles nvtop against the NVIDIA Management Library (NVML) provided by the host driver, then removes the compile-time dependencies to keep the layer small.
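The CMake gate can be sketched as a version comparison with GNU `sort -V`. This is an illustrative sketch: the helper names are hypothetical, the Kitware repository setup is only reported rather than performed, and the trailing comment shows a generic CMake out-of-tree build rather than nvtop's own documented build options.

```shell
#!/usr/bin/env bash
set -euo pipefail

MIN_CMAKE="3.18"

# Succeed if version $1 >= version $2 (GNU sort -V orders version strings).
version_ge() {
  [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Extract "3.28.3" from the first line of `cmake --version` ("cmake version 3.28.3").
cmake_version() {
  cmake --version 2>/dev/null | awk 'NR==1 {print $3}'
}

# Hypothetical helper: is the Kitware APT repository needed for this CMake version?
# An empty version (CMake not installed) also counts as too old.
needs_kitware_cmake() {
  local current="${1:-}"
  [ -z "$current" ] || ! version_ge "$current" "$MIN_CMAKE"
}

current="$(cmake_version || true)"
if needs_kitware_cmake "$current"; then
  echo "CMake ${current:-missing} < ${MIN_CMAKE}: add the Kitware APT repo temporarily"
else
  echo "CMake ${current} is new enough"
fi
# A generic out-of-tree build afterwards could look like:
#   cmake -S nvtop -B nvtop/build && cmake --build nvtop/build && cmake --install nvtop/build
```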

2. Machine Learning Framework Installation

When ARG_PROFILE_PYTHON includes torch, tf1/tf2, or paddle, the core Docker installation hook runs a dedicated setup procedure that configures CUDA acceleration.
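The profile dispatch can be pictured as a `case` statement over the profile list. This is a hypothetical sketch: the separator used in ARG_PROFILE_PYTHON (commas and/or spaces) and the hook names `setup_torch`, `setup_tf`, and `setup_paddle` are assumptions, and each hook here only prints its name.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical setup hooks; stand-ins for the real LabNow install procedures.
setup_torch()  { echo "setup_torch"; }
setup_tf()     { echo "setup_tf $1"; }
setup_paddle() { echo "setup_paddle"; }

# Assumption: profiles are separated by commas and/or spaces, e.g. "base,torch".
dispatch_profiles() {
  local profile
  for profile in $(printf '%s' "$1" | tr ',' ' '); do
    case "$profile" in
      torch)   setup_torch ;;
      tf1|tf2) setup_tf "$profile" ;;
      paddle)  setup_paddle ;;
      *)       : ;;  # other profiles are not GPU-framework related
    esac
  done
}

dispatch_profiles "${ARG_PROFILE_PYTHON:-}"
```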

CUDA Version & Device Index Auto-Detection

The build script automatically checks whether the CUDA compiler (nvcc) is present:

  • Evaluates $CUDA_VERSION to generate a shortened string $CUDA_VER (e.g., 12.1 -> 121).
  • Sets $IDX to cu${CUDA_VER} (e.g. cu121) if a GPU compiler is present, else defaults to cpu.
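A minimal shell sketch of this detection. The helper name `cuda_idx` is hypothetical, and so is the rule of keeping only major.minor from a three-part version (so 12.8.1 becomes cu128); the page itself only shows the two-part case.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: derive the wheel-index suffix.
# $1: CUDA version such as "12.1" or "12.8.1" (e.g. $CUDA_VERSION)
# $2: "1" if nvcc is available, anything else otherwise
cuda_idx() {
  local version="$1" has_nvcc="$2" short
  if [ "$has_nvcc" = "1" ] && [ -n "$version" ]; then
    # Keep major.minor and drop the dot: 12.8.1 -> 12.8 -> 128
    short="$(printf '%s' "$version" | cut -d. -f1,2 | tr -d '.')"
    printf 'cu%s\n' "$short"
  else
    printf 'cpu\n'
  fi
}

# In a real build the flag would come from looking up nvcc on PATH.
if command -v nvcc >/dev/null 2>&1; then HAS_NVCC=1; else HAS_NVCC=0; fi
IDX="$(cuda_idx "${CUDA_VERSION:-}" "$HAS_NVCC")"
echo "IDX=${IDX}"
```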

PyTorch Setup (torch Profile)

  • Selects the PyTorch major version from the CUDA version: PyTorch 1.x if CUDA is older than 11.7, otherwise PyTorch 2.x.
  • Runs pip install targeting the official PyTorch wheel index:
    pip install --no-cache-dir --root-user-action=ignore -U --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/${IDX}
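The version gate can be sketched with GNU `sort -V`. The helper names and the `torch<2` / `torch>=2` requirement strings are assumptions used to express "1.x" versus "2.x"; otherwise the printed command follows the pip invocation above, with the requirement quoted so the shell does not treat `<` or `>` as a redirection.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Succeed if version $1 is strictly lower than version $2 (GNU sort -V).
version_lt() {
  [ "$1" != "$2" ] && [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n1)" = "$1" ]
}

# Hypothetical: map the CUDA version to a torch requirement (1.x below 11.7, else 2.x).
torch_requirement() {
  local cuda="${1:-}"
  if [ -n "$cuda" ] && version_lt "$cuda" "11.7"; then
    echo "torch<2"
  else
    echo "torch>=2"
  fi
}

# Build the pip command; $2 is the index suffix (cu121, cpu, ...).
torch_pip_cmd() {
  local req idx="$2"
  req="$(torch_requirement "$1")"
  echo "pip install --no-cache-dir --root-user-action=ignore -U --pre" \
       "'${req}' torchvision torchaudio" \
       "--index-url https://download.pytorch.org/whl/${idx}"
}

torch_pip_cmd "${CUDA_VERSION:-}" "${IDX:-cpu}"
```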

TensorFlow Setup (tf1 / tf2 Profiles)

  • Installs tensorflow for the tf2 profile or tensorflow-gpu for the tf1 profile. Since TensorFlow 2.1, the tensorflow package itself includes GPU support, so tf2 needs no separate GPU package.
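A hedged sketch of the profile-to-package mapping. The `tensorflow-gpu<2` pin for tf1, the helper name `tf_requirement`, and the `TF_PROFILE` variable are assumptions; the page only names the two packages.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: map a TensorFlow profile to a pip requirement.
tf_requirement() {
  case "$1" in
    tf1) echo "tensorflow-gpu<2" ;;  # assumed pin to stay on the 1.x line
    tf2) echo "tensorflow" ;;        # the 2.x package includes GPU support
    *)   echo "unknown TensorFlow profile: $1" >&2; return 1 ;;
  esac
}

tf_requirement "${TF_PROFILE:-tf2}"
```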

PaddlePaddle Setup (paddle Profile)

  • Checks whether nvcc is present and installs paddlepaddle-gpu if it is, otherwise paddlepaddle.
  • Uses the official index URL https://www.paddlepaddle.org.cn/packages/stable/${IDX}/.
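A sketch of the package and index selection. It assumes the nvcc check is reduced to a 1/0 flag and that the CPU build always comes from the `cpu` index; `paddle_pip_cmd` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: compose the PaddlePaddle install command.
# $1: "1" if nvcc is available; $2: index suffix such as cu118 or cpu.
paddle_pip_cmd() {
  local has_nvcc="$1" idx="$2" pkg
  if [ "$has_nvcc" = "1" ]; then
    pkg="paddlepaddle-gpu"
  else
    pkg="paddlepaddle"
    idx="cpu"  # without nvcc, always fall back to the CPU index
  fi
  echo "pip install ${pkg} --index-url https://www.paddlepaddle.org.cn/packages/stable/${idx}/"
}

if command -v nvcc >/dev/null 2>&1; then HAS_NVCC=1; else HAS_NVCC=0; fi
paddle_pip_cmd "$HAS_NVCC" "${IDX:-cpu}"
```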

3. NVIDIA Package Size Optimization (Crucial Step)

A major source of layer bloat in GPU images is the set of NVIDIA CUDA runtime wheels pulled in by pip packages (e.g. nvidia-cuda-runtime-cu12, nvidia-cudnn-cu12). These wheels duplicate libraries that the CUDA base image already provides at the system level.

To reduce image size substantially, the build:

  1. Searches the pip freeze output for nvidia-* packages and uninstalls them:
    pip freeze | awk -F= 'tolower($1) ~ /^nvidia-/ {print $1}' | xargs -r pip uninstall -y
  2. Installs the required shared libraries system-wide through APT instead:
    apt-get update && apt-get install -y --no-install-recommends libcusparselt0 libnccl2 libnccl-dev
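The two steps can be combined into one shell sketch. `nvidia_pkgs` and `purge_and_replace` are hypothetical helper names; the awk field separator is widened to also split on `@` and spaces, an assumed hardening so that direct-URL lines such as `name @ file:///...` yield just the package name; and the final `rm -rf /var/lib/apt/lists/*` is a common Docker cleanup habit that this page does not mention. Run as-is, the script only lists the nvidia-* packages it would remove.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Read `pip freeze` output on stdin and print the names of nvidia-* distributions.
# Splitting on '=', '@' and spaces also handles "name @ file:///..." lines.
nvidia_pkgs() {
  awk -F'[=@ ]' 'tolower($1) ~ /^nvidia-/ {print $1}'
}

# Uninstall all nvidia-* wheels, then install the system libraries via APT.
purge_and_replace() {
  pip freeze | nvidia_pkgs | xargs -r pip uninstall -y
  apt-get update
  apt-get install -y --no-install-recommends libcusparselt0 libnccl2 libnccl-dev
  # Dropping the APT package lists keeps the resulting layer small.
  rm -rf /var/lib/apt/lists/*
}

# Dry run: show what would be removed without changing anything.
if command -v pip >/dev/null 2>&1; then
  pip freeze | nvidia_pkgs || true
fi
```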

This step typically shaves several gigabytes off the final GPU image layers, while PyTorch and TensorFlow keep full functionality by loading the system-level CUDA libraries instead of the removed wheels.
