# Enable Mthreads GPU sharing

## Introduction
We now support `mthreads.com/vgpu` by implementing most of the device-sharing features available for NVIDIA GPUs, including:

- **GPU sharing**: each task can allocate a portion of a GPU instead of a whole card, so a single GPU can be shared among multiple tasks.
- **Device memory control**: GPUs can be allocated a specific amount of device memory on supported types (e.g. MTT S4000), and the limit is enforced so a task cannot exceed it.
- **Device core control**: GPUs can be allocated a limited number of compute cores on supported types (e.g. MTT S4000), and the limit is enforced so a task cannot exceed it.
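As a sketch of how these three controls are requested together, the pod below asks for a slice of one GPU. The resource names `mthreads.com/vgpu`, `mthreads.com/sgpu-memory`, and `mthreads.com/sgpu-core` follow HAMi's naming for mthreads devices; the exact unit that one `sgpu-memory` count maps to depends on your deployment, so check your installation's values before copying the numbers.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mthreads-shared-pod   # example name
spec:
  containers:
    - name: app
      image: ubuntu:22.04     # placeholder image
      command: ["sleep", "infinity"]
      resources:
        limits:
          mthreads.com/vgpu: 1        # one shared GPU slice
          mthreads.com/sgpu-memory: 32 # device-memory units (deployment-specific unit size)
          mthreads.com/sgpu-core: 8    # compute-core units
```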
## Important Notes
- Device sharing across multiple cards is not supported.
- Only one mthreads device can be shared in a pod, even if the pod contains multiple containers.
- Allocating an exclusive mthreads GPU is supported by specifying `mthreads.com/vgpu` only.
- These features have been tested on the MTT S4000.
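Following the exclusive-allocation note above, a minimal sketch of a pod that takes a whole GPU simply requests `mthreads.com/vgpu` and omits the memory and core limits (the pod name and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mthreads-exclusive-pod  # example name
spec:
  containers:
    - name: app
      image: ubuntu:22.04       # placeholder image
      command: ["sleep", "infinity"]
      resources:
        limits:
          mthreads.com/vgpu: 1  # no sgpu-memory/sgpu-core: the whole card is allocated
```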
## Prerequisites
- MT CloudNative Toolkits > 1.9.0
- driver version >= 1.2.0
## Enabling GPU-sharing Support
- Deploy MT-CloudNative Toolkit on the mthreads nodes (please consult your device provider for its package and documentation).

  NOTICE: you can optionally remove `mt-mutating-webhook` and `mt-gpu-scheduler` after installation.
- Set `devices.mthreads.enabled=true` when installing HAMi:

  ```sh
  helm install hami hami-charts/hami --set scheduler.kubeScheduler.imageTag={your kubernetes version} --set devices.mthreads.enabled=true -n kube-system
  ```