Write-Up: How to Run Two or More Pods Within One GPU in a GKE Cluster
This is useful if you need to run small models without reserving a whole GPU per pod, especially for beginners who are experimenting with their solutions.
To launch a k8s cluster in GCP, it's convenient to use the public modules available at: https://registry.terraform.io/modules/terraform-google-modules/kubernetes-engine/google/latest/submodules/beta-private-cluster-update-variant
Depending on your requirements, you can launch a cluster with public or private nodes, including adding a node pool with GPUs to the cluster. For example:
{
  name              = "gpu"
  machine_type      = "g2-standard-8"
  node_locations    = "us-central1-a"
  node_count        = 0
  total_min_count   = 0
  total_max_count   = 0
  auto_upgrade      = true
  autoscaling       = false
  spot              = true
  disk_type         = "pd-ssd"
  accelerator_count = 1
  accelerator_type  = "nvidia-l4"
  enable_gcfs       = true
}
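This map is one entry in the module's node_pools input. A minimal sketch of how it plugs into the module (the version pin is illustrative, and required variables such as project_id and networking are omitted):

module "gke" {
  source  = "terraform-google-modules/kubernetes-engine/google//modules/beta-private-cluster-update-variant"
  version = "~> 29.0" # illustrative; pin a version appropriate for your setup

  # project_id, name, region, network, subnetwork, ip ranges, etc. go here

  node_pools = [
    {
      name              = "gpu"
      machine_type      = "g2-standard-8"
      accelerator_count = 1
      accelerator_type  = "nvidia-l4"
      spot              = true
      # remaining keys from the example above
    },
  ]
}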
However, with these modules alone you won't be able to run multiple pods within a GPU node pool, because of how the GPU resource is advertised. By default, a node with one attached GPU exposes:
nvidia.com/gpu: "1"
This means the whole GPU is allocated to a single pod: only one pod at a time can request nvidia.com/gpu on a node in our pool.
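You can confirm this on a live cluster. Assuming you have kubectl access, query the node's allocatable resources (the node name below is a placeholder):

kubectl get node gke-my-cluster-gpu-node -o jsonpath='{.status.allocatable.nvidia\.com/gpu}'
# prints: 1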
But what if we need to run multiple pods? One option is GPU time-sharing: https://cloud.google.com/kubernetes-engine/docs/concepts/timesharing-gpus
Let's describe such a node pool and add it to the cluster.
Create a file named gpu-time-sharing.tf and fill in all the necessary input data. Here is an example of what it looks like:
resource "google_container_node_pool" "gpu_time_sharing" {
name = "gpu-time-sharing"
cluster = "cluster-id"
project = "my-project-id"
location = "us-central1"
version = "1.27.3-gke.100"
node_locations = ["us-central1-b"]
autoscaling {
location_policy = "ANY"
total_max_node_count = "0"
total_min_node_count = "1"
}
management {
auto_upgrade = true
auto_repair = true
}
network_config {
create_pod_range = false
enable_private_nodes = true
pod_range = "kubernetes-pods-subnet-name"
}
node_config {
image_type = "COS_CONTAINERD"
machine_type = "g2-standard-8"
service_account = "cluster_service_account"
spot = true
local_ssd_count = 0
disk_size_gb = 200
disk_type = "pd-ssd"
logging_variant = null
oauth_scopes = [
"https://www.googleapis.com/auth/cloud-platform"
]
gcfs_config {
enabled = true
}
gvnic {
enabled = true
}
guest_accelerator {
type = "nvidia-l4"
count = 1
gpu_driver_installation_config {
gpu_driver_version = "LATEST"
}
gpu_sharing_config {
gpu_sharing_strategy = "TIME_SHARING"
max_shared_clients_per_gpu = 4
}
}
workload_metadata_config {
mode = "GCE_METADATA"
}
labels = {
cluster_name = local.name
purpose = "gpu-time-sharing"
node_pool = "gpu-time-sharing"
}
taint = [
{
key = "purpose"
value = "gpu-time-sharing"
effect = "NO_SCHEDULE"
},
{
effect = "NO_SCHEDULE"
key = "nvidia.com/gpu"
value = "present"
},
]
tags = ["gke-my-project-id-region",
"gke-my-project-id-region-gpu-time-sharing"]
}
timeouts {
create = "30m"
update = "20m"
}
}
Note: Make sure to adjust the input data according to your requirements.
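If you prefer the CLI over Terraform, the GKE documentation linked above describes an equivalent gcloud command. A sketch with placeholder names:

gcloud container node-pools create gpu-time-sharing \
    --cluster=CLUSTER_NAME \
    --region=us-central1 \
    --machine-type=g2-standard-8 \
    --accelerator="type=nvidia-l4,count=1,gpu-sharing-strategy=time-sharing,max-shared-clients-per-gpu=4,gpu-driver-version=latest"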
Let's focus on the most critical part:
guest_accelerator {
  type  = "nvidia-l4"
  count = 1

  gpu_driver_installation_config {
    gpu_driver_version = "LATEST"
  }

  gpu_sharing_config {
    gpu_sharing_strategy       = "TIME_SHARING"
    max_shared_clients_per_gpu = 4
  }
}
Here we specify the accelerator type 'nvidia-l4' and the driver version, in our case 'LATEST'. You can pin whichever driver version you require (the provider also accepts values such as 'DEFAULT' and 'INSTALLATION_DISABLED').
gpu_sharing_config {
  gpu_sharing_strategy       = "TIME_SHARING"
  max_shared_clients_per_gpu = 4
}
And this block tells GKE to share each physical GPU between up to four clients via time-slicing. The GPU is not partitioned into pieces of hardware; instead, each pod gets the whole GPU during its time slice.
After applying, we obtain a node pool whose nodes advertise nvidia.com/gpu: "4", so up to four pods with GPU requests can run on a single GPU.
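To verify, you can inspect the nodes of the new pool once they come up. The label selector below relies on a label that, per the time-sharing docs linked above, GKE applies to time-shared GPU nodes:

kubectl get nodes -l cloud.google.com/gke-gpu-sharing-strategy=time-sharing \
    -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.status.allocatable.nvidia\.com/gpu}{"\n"}{end}'
# expected output (the node name will differ): gke-...-gpu-time-sharing-...: 4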
To launch our pod, we need to specify the resources, node selector, and tolerations for it:
resources:
  limits:
    cpu: 2
    memory: 8Gi
    nvidia.com/gpu: "1"
  requests:
    cpu: 2
    memory: 8Gi
    nvidia.com/gpu: "1"
nodeSelector:
  purpose: gpu-time-sharing
tolerations:
  - effect: NoSchedule
    key: purpose
    operator: Equal
    value: gpu-time-sharing
  - effect: NoSchedule
    key: nvidia.com/gpu
    operator: Equal
    value: present
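Putting it all together, a minimal Pod manifest could look like the sketch below. The pod name and image are illustrative; the public CUDA base image is used only to run nvidia-smi as a smoke test:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-time-sharing-test
spec:
  restartPolicy: Never
  nodeSelector:
    purpose: gpu-time-sharing
  tolerations:
    - effect: NoSchedule
      key: purpose
      operator: Equal
      value: gpu-time-sharing
    - effect: NoSchedule
      key: nvidia.com/gpu
      operator: Equal
      value: present
  containers:
    - name: cuda
      image: nvidia/cuda:12.2.0-base-ubuntu22.04 # illustrative image
      command: ["nvidia-smi"] # prints the GPU the pod sees
      resources:
        limits:
          cpu: 2
          memory: 8Gi
          nvidia.com/gpu: "1"
        requests:
          cpu: 2
          memory: 8Gi
          nvidia.com/gpu: "1"

If you run four copies of this spec (for example, as a Deployment with four replicas), all four should be scheduled onto the same node, each requesting one time-shared slice of the GPU.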
In conclusion, this setup allows efficient utilization of resources within the GPU node pool: several small workloads can share a single accelerator, which suits anyone experimenting with various solutions while keeping GPU costs to a minimum.