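Pending Workloads on Demand
Monitor pending workloads with the on-demand visibility API
This page shows you how to monitor pending workloads with the VisibilityOnDemand feature.
The intended audience for this page is batch administrators, and also batch users for the Local Queue visibility section.
Starting with v0.6.0, Kueue lets batch administrators monitor the pipeline of pending jobs and helps users estimate when their jobs will start.
Before you begin
Make sure the following conditions are met:
- A Kubernetes cluster is running.
- The kubectl command-line tool can communicate with your cluster.
- Kueue v0.6.0 or later is installed.
Enable the VisibilityOnDemand feature gate
VisibilityOnDemand is an Alpha feature that is disabled by default. To use the visibility API, change the feature gates configuration and set VisibilityOnDemand=true.
Install the visibility API
To install the visibility API, set VERSION to the release tag of your Kueue installation (v0.6.0 or later) and run the following command: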
kubectl apply --server-side -f https://github.com/kubernetes-sigs/kueue/releases/download/$VERSION/visibility-api.yaml
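The command above relies on a VERSION variable that this page does not define. A minimal sketch of resolving the manifest URL before applying it, assuming v0.6.0 as the release tag (substitute the tag you actually installed); MANIFEST_URL is an illustrative name, not part of Kueue:

```shell
#!/usr/bin/env bash
# Assumption: v0.6.0 is used only as an example tag; replace it with the
# release tag of the Kueue version installed in your cluster.
VERSION="v0.6.0"

# Illustrative variable holding the resolved manifest location.
MANIFEST_URL="https://github.com/kubernetes-sigs/kueue/releases/download/${VERSION}/visibility-api.yaml"

# Print the URL so it can be checked before anything is applied.
echo "${MANIFEST_URL}"

# Once the URL looks right, apply it:
# kubectl apply --server-side -f "${MANIFEST_URL}"
```

Printing the URL first makes an empty or mistyped VERSION obvious before kubectl tries to fetch the manifest.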
Monitor pending workloads on demand
Feature state: stable since Kueue v0.6
To install a simple ClusterQueue setup, use the following manifest:
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: "default-flavor"
spec:
  nodeLabels:
    key1: value
    key2: value
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "cluster-queue"
spec:
  namespaceSelector: {} # match all.
  resourceGroups:
  - coveredResources: ["cpu", "memory"]
    flavors:
    - name: "default-flavor"
      resources:
      - name: "cpu"
        nominalQuota: 9
      - name: "memory"
        nominalQuota: 36Gi
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  namespace: "default"
  name: "user-queue"
spec:
  clusterQueue: "cluster-queue"
Run the following command to apply it:
kubectl apply -f https://kueue.kubernetes.ac.cn/examples/admin/single-clusterqueue-setup.yaml
Now, create 6 jobs from the following manifest:
apiVersion: batch/v1
kind: Job
metadata:
  generateName: sample-job-
  namespace: default
  labels:
    kueue.x-k8s.io/queue-name: user-queue
spec:
  parallelism: 3
  completions: 3
  suspend: true
  template:
    spec:
      containers:
      - name: dummy-job
        image: gcr.io/k8s-staging-perf-tests/sleep:v0.1.0
        args: ["30s"]
        resources:
          requests:
            cpu: 1
            memory: "200Mi"
      restartPolicy: Never
Use this command:
for i in {1..6}; do kubectl create -f https://kueue.kubernetes.ac.cn/examples/jobs/sample-job.yaml; done
3 of them saturate the ClusterQueue, and the other 3 should remain pending.
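The 3/3 split follows from the quota arithmetic in the manifests: each Job runs 3 pods that request 1 CPU each, and cluster-queue has a nominal CPU quota of 9. A minimal sketch of that calculation, assuming CPU is the binding resource (memory is far from its limit, at 3 × 200Mi per Job against 36Gi); the variable names are illustrative only:

```shell
#!/usr/bin/env bash
# Values copied from the ClusterQueue and Job manifests on this page.
cpu_quota=9        # nominalQuota for "cpu" in cluster-queue
parallelism=3      # pods per Job
cpu_per_pod=1      # CPU request per pod
jobs_submitted=6   # number of Jobs created by the loop

# CPU needed to admit one whole Job.
cpu_per_job=$(( parallelism * cpu_per_pod ))

# How many whole Jobs fit in the quota, capped at the number submitted.
admitted=$(( cpu_quota / cpu_per_job ))
if (( admitted > jobs_submitted )); then
  admitted=${jobs_submitted}
fi
pending=$(( jobs_submitted - admitted ))

echo "admitted=${admitted} pending=${pending}"
```

With the values above this prints admitted=3 pending=3, which matches the three entries returned by the visibility API.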
Cluster Queue visibility
To view pending workloads in the ClusterQueue cluster-queue, run the following command:
kubectl get --raw "/apis/visibility.kueue.x-k8s.io/v1alpha1/clusterqueues/cluster-queue/pendingworkloads"
You should get results similar to:
{
  "kind": "PendingWorkloadsSummary",
  "apiVersion": "visibility.kueue.x-k8s.io/v1alpha1",
  "metadata": {
    "creationTimestamp": null
  },
  "items": [
    {
      "metadata": {
        "name": "job-sample-job-jrjfr-8d56e",
        "namespace": "default",
        "creationTimestamp": "2023-12-05T15:42:03Z",
        "ownerReferences": [
          {
            "apiVersion": "batch/v1",
            "kind": "Job",
            "name": "sample-job-jrjfr",
            "uid": "5863cf0e-b0e7-43bf-a445-f41fa1abedfa"
          }
        ]
      },
      "priority": 0,
      "localQueueName": "user-queue",
      "positionInClusterQueue": 0,
      "positionInLocalQueue": 0
    },
    {
      "metadata": {
        "name": "job-sample-job-jg9dw-5f1a3",
        "namespace": "default",
        "creationTimestamp": "2023-12-05T15:42:03Z",
        "ownerReferences": [
          {
            "apiVersion": "batch/v1",
            "kind": "Job",
            "name": "sample-job-jg9dw",
            "uid": "fd5d1796-f61d-402f-a4c8-cbda646e2676"
          }
        ]
      },
      "priority": 0,
      "localQueueName": "user-queue",
      "positionInClusterQueue": 1,
      "positionInLocalQueue": 1
    },
    {
      "metadata": {
        "name": "job-sample-job-t9b8m-4e770",
        "namespace": "default",
        "creationTimestamp": "2023-12-05T15:42:03Z",
        "ownerReferences": [
          {
            "apiVersion": "batch/v1",
            "kind": "Job",
            "name": "sample-job-t9b8m",
            "uid": "64c26c73-6334-4d13-a1a8-38d99196baa5"
          }
        ]
      },
      "priority": 0,
      "localQueueName": "user-queue",
      "positionInClusterQueue": 2,
      "positionInLocalQueue": 2
    }
  ]
}
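You can pass the following optional query parameters:
- limit <integer>: 1000 by default. The maximum number of pending workloads to fetch.
- offset <integer>: 0 by default. The position of the first pending workload to fetch, counting from 0.
To view only 1 pending workload, starting at position 1 in the ClusterQueue, run: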
kubectl get --raw "/apis/visibility.kueue.x-k8s.io/v1alpha1/clusterqueues/cluster-queue/pendingworkloads?limit=1&offset=1"
You should get results similar to:
{
  "kind": "PendingWorkloadsSummary",
  "apiVersion": "visibility.kueue.x-k8s.io/v1alpha1",
  "metadata": {
    "creationTimestamp": null
  },
  "items": [
    {
      "metadata": {
        "name": "job-sample-job-jg9dw-5f1a3",
        "namespace": "default",
        "creationTimestamp": "2023-12-05T15:42:03Z",
        "ownerReferences": [
          {
            "apiVersion": "batch/v1",
            "kind": "Job",
            "name": "sample-job-jg9dw",
            "uid": "fd5d1796-f61d-402f-a4c8-cbda646e2676"
          }
        ]
      },
      "priority": 0,
      "localQueueName": "user-queue",
      "positionInClusterQueue": 1,
      "positionInLocalQueue": 1
    }
  ]
}
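When the limit and offset values change often, it can help to build the request path in one place. The sketch below is only an illustration: cq_pending_path is a hypothetical helper name, not something Kueue or kubectl provides, and it appends a query string only for the parameters that are actually given.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Kueue): prints the visibility API path
# for the pending workloads of a ClusterQueue.
# Usage: cq_pending_path <clusterqueue-name> [limit] [offset]
cq_pending_path() {
  local name="$1" limit="${2:-}" offset="${3:-}"
  local path="/apis/visibility.kueue.x-k8s.io/v1alpha1/clusterqueues/${name}/pendingworkloads"
  local query=""

  # Add each parameter only if the caller supplied it, so that the
  # server-side defaults (limit=1000, offset=0) apply otherwise.
  if [ -n "${limit}" ]; then
    query="limit=${limit}"
  fi
  if [ -n "${offset}" ]; then
    query="${query:+${query}&}offset=${offset}"
  fi

  printf '%s%s\n' "${path}" "${query:+?${query}}"
}

# Example call against a cluster (commented out so the sketch has no side effects):
# kubectl get --raw "$(cq_pending_path cluster-queue 1 1)"
```

Keeping the command substitution inside double quotes prevents the shell from interpreting the & in the query string.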
Local Queue visibility
As with the ClusterQueue, to view pending workloads in the LocalQueue user-queue, run the following command:
kubectl get --raw /apis/visibility.kueue.x-k8s.io/v1alpha1/namespaces/default/localqueues/user-queue/pendingworkloads
You should get results similar to:
{
  "kind": "PendingWorkloadsSummary",
  "apiVersion": "visibility.kueue.x-k8s.io/v1alpha1",
  "metadata": {
    "creationTimestamp": null
  },
  "items": [
    {
      "metadata": {
        "name": "job-sample-job-jrjfr-8d56e",
        "namespace": "default",
        "creationTimestamp": "2023-12-05T15:42:03Z",
        "ownerReferences": [
          {
            "apiVersion": "batch/v1",
            "kind": "Job",
            "name": "sample-job-jrjfr",
            "uid": "5863cf0e-b0e7-43bf-a445-f41fa1abedfa"
          }
        ]
      },
      "priority": 0,
      "localQueueName": "user-queue",
      "positionInClusterQueue": 0,
      "positionInLocalQueue": 0
    },
    {
      "metadata": {
        "name": "job-sample-job-jg9dw-5f1a3",
        "namespace": "default",
        "creationTimestamp": "2023-12-05T15:42:03Z",
        "ownerReferences": [
          {
            "apiVersion": "batch/v1",
            "kind": "Job",
            "name": "sample-job-jg9dw",
            "uid": "fd5d1796-f61d-402f-a4c8-cbda646e2676"
          }
        ]
      },
      "priority": 0,
      "localQueueName": "user-queue",
      "positionInClusterQueue": 1,
      "positionInLocalQueue": 1
    },
    {
      "metadata": {
        "name": "job-sample-job-t9b8m-4e770",
        "namespace": "default",
        "creationTimestamp": "2023-12-05T15:42:03Z",
        "ownerReferences": [
          {
            "apiVersion": "batch/v1",
            "kind": "Job",
            "name": "sample-job-t9b8m",
            "uid": "64c26c73-6334-4d13-a1a8-38d99196baa5"
          }
        ]
      },
      "priority": 0,
      "localQueueName": "user-queue",
      "positionInClusterQueue": 2,
      "positionInLocalQueue": 2
    }
  ]
}
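You can pass the following optional query parameters:
- limit <integer>: 1000 by default. The maximum number of pending workloads to fetch.
- offset <integer>: 0 by default. The position of the first pending workload to fetch, counting from 0.
To view only 1 pending workload, starting at position 1 in the LocalQueue, run: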
kubectl get --raw "/apis/visibility.kueue.x-k8s.io/v1alpha1/namespaces/default/localqueues/user-queue/pendingworkloads?limit=1&offset=1"
You should get results similar to:
{
  "kind": "PendingWorkloadsSummary",
  "apiVersion": "visibility.kueue.x-k8s.io/v1alpha1",
  "metadata": {
    "creationTimestamp": null
  },
  "items": [
    {
      "metadata": {
        "name": "job-sample-job-jg9dw-5f1a3",
        "namespace": "default",
        "creationTimestamp": "2023-12-05T15:42:03Z",
        "ownerReferences": [
          {
            "apiVersion": "batch/v1",
            "kind": "Job",
            "name": "sample-job-jg9dw",
            "uid": "fd5d1796-f61d-402f-a4c8-cbda646e2676"
          }
        ]
      },
      "priority": 0,
      "localQueueName": "user-queue",
      "positionInClusterQueue": 1,
      "positionInLocalQueue": 1
    }
  ]
}
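Unlike a ClusterQueue, a LocalQueue is namespaced, so its visibility path includes a namespaces/<namespace> segment. A minimal sketch of a path builder that makes the namespace an explicit argument; lq_pending_path is a hypothetical name used only for illustration, and it spells out the documented defaults (limit 1000, offset 0) rather than omitting them:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Kueue): prints the visibility API path
# for the pending workloads of a LocalQueue in a given namespace.
# Usage: lq_pending_path <namespace> <localqueue-name> [limit] [offset]
lq_pending_path() {
  local namespace="$1" name="$2"
  # Fall back to the defaults documented on this page when not supplied.
  local limit="${3:-1000}" offset="${4:-0}"

  printf '/apis/visibility.kueue.x-k8s.io/v1alpha1/namespaces/%s/localqueues/%s/pendingworkloads?limit=%s&offset=%s\n' \
    "${namespace}" "${name}" "${limit}" "${offset}"
}

# Example call against a cluster (commented out so the sketch has no side effects):
# kubectl get --raw "$(lq_pending_path default user-queue 1 1)"
```

Requiring the namespace as the first argument makes it harder to drop that segment from the path by accident.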