Set up a MultiKueue environment
This tutorial explains how you can configure a manager cluster and one worker cluster to run JobSets and batch/Jobs in a MultiKueue environment.
Check the concepts section for a MultiKueue overview.
Let's assume that your manager cluster is named manager-cluster and your worker cluster is named worker1-cluster. To follow this tutorial, ensure that the credentials of all these clusters are present in the kubeconfig on your local machine. Check the kubectl documentation to learn more about how to configure access to multiple clusters.
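For example, assuming the cluster names above, you can list the contexts available in your local kubeconfig to confirm that both clusters are reachable by name:

kubectl config get-contexts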
In the worker cluster
Note
Make sure your current kubectl configuration points to the worker cluster.
Run:
kubectl config use-context worker1-cluster
When MultiKueue dispatches a workload from the manager cluster to a worker cluster, it expects that the Job's namespace and LocalQueue also exist in the worker cluster. In other words, you should ensure that the worker cluster configuration mirrors the one of the manager cluster in terms of namespaces and LocalQueues.
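For example, if on the manager cluster you submit Jobs to a namespace other than default (say team-a, a name used here only for illustration), create that namespace in the worker cluster first:

kubectl create namespace team-a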
To create a sample queue setup in the default namespace, you can apply the following manifest:
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: "default-flavor"
spec:
  nodeLabels:
    key1: value
    key2: value
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "cluster-queue"
spec:
  namespaceSelector: {} # match all.
  resourceGroups:
  - coveredResources: ["cpu", "memory"]
    flavors:
    - name: "default-flavor"
      resources:
      - name: "cpu"
        nominalQuota: 9
      - name: "memory"
        nominalQuota: 36Gi
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  namespace: "default"
  name: "user-queue"
spec:
  clusterQueue: "cluster-queue"
MultiKueue specific Kubeconfig
In order to delegate jobs to the worker cluster, the manager cluster needs to be able to create, delete, and watch workloads and their parent Jobs.
While kubectl is set up to use the worker cluster, download the following script as create-multikueue-kubeconfig.sh:
#!/bin/bash
# Copyright 2024 The Kubernetes Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
set -o errexit
set -o nounset
set -o pipefail
KUBECONFIG_OUT=${1:-kubeconfig}
MULTIKUEUE_SA=multikueue-sa
NAMESPACE=kueue-system
# Creating a restricted MultiKueue role, service account and role binding
kubectl apply -f - <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ${MULTIKUEUE_SA}
  namespace: ${NAMESPACE}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: ${MULTIKUEUE_SA}-role
rules:
- apiGroups:
  - batch
  resources:
  - jobs
  verbs:
  - create
  - delete
  - get
  - list
  - watch
- apiGroups:
  - batch
  resources:
  - jobs/status
  verbs:
  - get
- apiGroups:
  - jobset.x-k8s.io
  resources:
  - jobsets
  verbs:
  - create
  - delete
  - get
  - list
  - watch
- apiGroups:
  - jobset.x-k8s.io
  resources:
  - jobsets/status
  verbs:
  - get
- apiGroups:
  - kueue.x-k8s.io
  resources:
  - workloads
  verbs:
  - create
  - delete
  - get
  - list
  - watch
- apiGroups:
  - kueue.x-k8s.io
  resources:
  - workloads/status
  verbs:
  - get
  - patch
  - update
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: ${MULTIKUEUE_SA}-crb
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: ${MULTIKUEUE_SA}-role
subjects:
- kind: ServiceAccount
  name: ${MULTIKUEUE_SA}
  namespace: ${NAMESPACE}
EOF
# Get or create a secret bound to the new service account.
SA_SECRET_NAME=$(kubectl get -n ${NAMESPACE} sa/${MULTIKUEUE_SA} -o "jsonpath={.secrets[0]..name}")
if [ -z $SA_SECRET_NAME ]
then
kubectl apply -f - <<EOF
apiVersion: v1
kind: Secret
type: kubernetes.io/service-account-token
metadata:
  name: ${MULTIKUEUE_SA}
  namespace: ${NAMESPACE}
  annotations:
    kubernetes.io/service-account.name: "${MULTIKUEUE_SA}"
EOF
SA_SECRET_NAME=${MULTIKUEUE_SA}
fi
# Note: service account token is stored base64-encoded in the secret but must
# be plaintext in kubeconfig.
SA_TOKEN=$(kubectl get -n ${NAMESPACE} secrets/${SA_SECRET_NAME} -o "jsonpath={.data['token']}" | base64 -d)
CA_CERT=$(kubectl get -n ${NAMESPACE} secrets/${SA_SECRET_NAME} -o "jsonpath={.data['ca\.crt']}")
# Extract cluster IP from the current context
CURRENT_CONTEXT=$(kubectl config current-context)
CURRENT_CLUSTER=$(kubectl config view -o jsonpath="{.contexts[?(@.name == \"${CURRENT_CONTEXT}\")].context.cluster}")
CURRENT_CLUSTER_ADDR=$(kubectl config view -o jsonpath="{.clusters[?(@.name == \"${CURRENT_CLUSTER}\")].cluster.server}")
# Create the Kubeconfig file
echo "Writing kubeconfig in ${KUBECONFIG_OUT}"
cat > ${KUBECONFIG_OUT} <<EOF
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: ${CA_CERT}
    server: ${CURRENT_CLUSTER_ADDR}
  name: ${CURRENT_CLUSTER}
contexts:
- context:
    cluster: ${CURRENT_CLUSTER}
    user: ${CURRENT_CLUSTER}-${MULTIKUEUE_SA}
  name: ${CURRENT_CONTEXT}
current-context: ${CURRENT_CONTEXT}
kind: Config
preferences: {}
users:
- name: ${CURRENT_CLUSTER}-${MULTIKUEUE_SA}
  user:
    token: ${SA_TOKEN}
EOF
and run:
chmod +x create-multikueue-kubeconfig.sh
./create-multikueue-kubeconfig.sh worker1.kubeconfig
to create a Kubeconfig that can be used in the manager cluster to delegate Jobs to the current worker.
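As an optional sanity check (worker1.kubeconfig being the file generated above), you can verify that the restricted service account has the permissions MultiKueue needs, for example:

kubectl --kubeconfig worker1.kubeconfig auth can-i create workloads.kueue.x-k8s.io -n default
kubectl --kubeconfig worker1.kubeconfig auth can-i create jobsets.jobset.x-k8s.io -n default

Both commands should print yes.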
In the manager cluster
Note
Make sure your current kubectl configuration points to the manager cluster.
Run:
kubectl config use-context manager-cluster
JobSet installation
If you are using Kueue version 0.7.0 or newer, install JobSet on the manager cluster (see the JobSet installation documentation for more details). Install JobSet version 0.5.1 or newer for MultiKueue.
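For example, assuming you install JobSet v0.5.1 from its release manifests (adjust the version as needed), the installation would look like:

kubectl apply --server-side -f https://github.com/kubernetes-sigs/jobset/releases/download/v0.5.1/manifests.yaml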
Warning
If you are using an older version of Kueue (below 0.7.0), install only the JobSet CRDs on the manager cluster. You can do this by running:
kubectl apply --server-side -f https://raw.githubusercontent.com/kubernetes-sigs/jobset/v0.5.1/config/components/crd/bases/jobset.x-k8s.io_jobsets.yaml
Enable the MultiKueue feature
Enable the MultiKueue feature gate. Check the installation guide for details on feature gate configuration.
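A minimal sketch of one way to do this, assuming Kueue was installed from the official manifests and its controller manager runs as the kueue-controller-manager Deployment in the kueue-system namespace, is to add the feature gate flag to the manager container's arguments:

kubectl edit deployment kueue-controller-manager -n kueue-system
# in the manager container spec, add:
#   args:
#   - --feature-gates=MultiKueue=true

If you installed Kueue another way (for example via Helm), pass the feature gate through that tool's configuration instead.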
Create the worker's Kubeconfig secret
For the following example, with the worker1 cluster Kubeconfig stored in a file called worker1.kubeconfig, you can create the worker1-secret secret by running the following command:
kubectl create secret generic worker1-secret -n kueue-system --from-file=kubeconfig=worker1.kubeconfig
Check the worker section for details on Kubeconfig generation.
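Optionally, confirm that the secret was created in the namespace where the Kueue controller manager runs:

kubectl get secret worker1-secret -n kueue-system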
Create a sample setup
Apply the following to create a sample setup in which the Jobs submitted in the ClusterQueue cluster-queue are delegated to the worker worker1:
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: "default-flavor"
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "cluster-queue"
spec:
  namespaceSelector: {} # match all.
  resourceGroups:
  - coveredResources: ["cpu", "memory"]
    flavors:
    - name: "default-flavor"
      resources:
      - name: "cpu"
        nominalQuota: 9
      - name: "memory"
        nominalQuota: 36Gi
  admissionChecks:
  - sample-multikueue
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  namespace: "default"
  name: "user-queue"
spec:
  clusterQueue: "cluster-queue"
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: AdmissionCheck
metadata:
  name: sample-multikueue
spec:
  controllerName: kueue.x-k8s.io/multikueue
  parameters:
    apiGroup: kueue.x-k8s.io
    kind: MultiKueueConfig
    name: multikueue-test
---
apiVersion: kueue.x-k8s.io/v1alpha1
kind: MultiKueueConfig
metadata:
  name: multikueue-test
spec:
  clusters:
  - multikueue-test-worker1
---
apiVersion: kueue.x-k8s.io/v1alpha1
kind: MultiKueueCluster
metadata:
  name: multikueue-test-worker1
spec:
  kubeConfig:
    locationType: Secret
    location: worker1-secret
    # a secret called "worker1-secret" should be created in the namespace the kueue
    # controller manager runs in, holding the kubeConfig needed to connect to the
    # worker cluster in the "kubeconfig" key;
Upon successful configuration, the created ClusterQueue, AdmissionCheck and MultiKueueCluster will become active.
Run:
kubectl get clusterqueues cluster-queue -o jsonpath="{range .status.conditions[?(@.type == \"Active\")]}CQ - Active: {@.status} Reason: {@.reason} Message: {@.message}{'\n'}{end}"
kubectl get admissionchecks sample-multikueue -o jsonpath="{range .status.conditions[?(@.type == \"Active\")]}AC - Active: {@.status} Reason: {@.reason} Message: {@.message}{'\n'}{end}"
kubectl get multikueuecluster multikueue-test-worker1 -o jsonpath="{range .status.conditions[?(@.type == \"Active\")]}MC - Active: {@.status} Reason: {@.reason} Message: {@.message}{'\n'}{end}"
and expect an output like:
CQ - Active: True Reason: Ready Message: Can admit new workloads
AC - Active: True Reason: Active Message: The admission check is active
MC - Active: True Reason: Active Message: Connected