# Enable Mthreads GPU sharing

## Introduction
We now support `mthreads.com/vgpu` by implementing most of the device-sharing features available for NVIDIA GPUs, including:

- **GPU sharing**: each task can allocate a portion of a GPU instead of a whole card, so a single GPU can be shared among multiple tasks.
- **Device memory control**: GPUs can be allocated a specific amount of device memory on supported types (e.g. MTT S4000), and the limit is enforced so a task cannot exceed it.
- **Device core control**: GPUs can be allocated a limited number of compute cores on supported types (e.g. MTT S4000), and the limit is enforced so a task cannot exceed it.
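As a sketch of how these three controls are requested together, the pod below asks for a slice of one GPU. The resource names `mthreads.com/vgpu`, `mthreads.com/sgpu-memory`, and `mthreads.com/sgpu-core` follow HAMi's naming for mthreads devices; the exact unit that one `sgpu-memory` count maps to depends on your deployment, so check your installation's values before copying the numbers.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mthreads-shared-pod   # example name
spec:
  containers:
    - name: app
      image: ubuntu:22.04     # placeholder image
      command: ["sleep", "infinity"]
      resources:
        limits:
          mthreads.com/vgpu: 1        # one shared GPU slice
          mthreads.com/sgpu-memory: 32 # device-memory units (deployment-specific unit size)
          mthreads.com/sgpu-core: 8    # compute-core units
```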
## Important Notes
- Device sharing across multiple cards is not supported.
- Only one mthreads device can be shared in a pod, even if the pod contains multiple containers.
- Allocating an exclusive mthreads GPU is supported by specifying `mthreads.com/vgpu` only.
- These features have been tested on the MTT S4000.
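Following the exclusive-allocation note above, a minimal sketch of a pod that takes a whole GPU simply requests `mthreads.com/vgpu` and omits the memory and core limits (the pod name and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mthreads-exclusive-pod  # example name
spec:
  containers:
    - name: app
      image: ubuntu:22.04       # placeholder image
      command: ["sleep", "infinity"]
      resources:
        limits:
          mthreads.com/vgpu: 1  # no sgpu-memory/sgpu-core: the whole card is allocated
```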
## Prerequisites
- MT CloudNative Toolkits > 1.9.0
- driver version >= 1.2.0
## Enabling GPU-sharing Support
- Deploy MT-CloudNative Toolkit on the mthreads nodes (please consult your device provider for its package and documentation).

  NOTICE: you can optionally remove `mt-mutating-webhook` and `mt-gpu-scheduler` after installation.
- Set `devices.mthreads.enabled=true` when installing HAMi:

  ```sh
  helm install hami hami-charts/hami --set scheduler.kubeScheduler.imageTag={your kubernetes version} --set devices.mthreads.enabled=true -n kube-system
  ```