K8s Study Notes (6): Scheduling and Scaling

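Preface

This is the sixth post in the K8s study notes series. It covers Pod scheduling and autoscaling: labels and selectors, nodeSelector, affinity, taints and tolerations, and HPA. Prerequisite: Configuration and Storage (part 5).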

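1. Labels and Selectors

Labels are the core organizing mechanism in K8s. Almost every resource can carry them, and Services, Deployments, and the scheduler all find their targets through label selectors.

1.1 Label conventions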

metadata:
  labels:
    app: web-app              # application name
    env: prod                 # environment
    tier: frontend            # tier
    version: v2.0             # version
    team: backend             # owning team

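Label rules:
- Keys may carry an optional DNS-subdomain prefix: team.example.com/role=admin
- The key name (the part after the prefix) and the value are each limited to 63 characters
- Allowed characters are letters, digits, -, _ and .; names and non-empty values must begin and end with a letter or digit
- Use one consistent labeling scheme across the cluster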
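
The rules above can be checked mechanically. The following is a minimal Python sketch, not the validator K8s itself ships; the function names are made up for these notes, and prefix validation is left out on purpose.

```python
import re

# Name part of a key, and non-empty values: alphanumeric at both ends,
# with -, _ and . allowed in between.
_NAME_RE = re.compile(r"[A-Za-z0-9]([-A-Za-z0-9_.]*[A-Za-z0-9])?")
MAX_LEN = 63


def is_valid_label_value(value: str) -> bool:
    """Return True if value is acceptable as a label value (empty is allowed)."""
    if value == "":
        return True
    return len(value) <= MAX_LEN and _NAME_RE.fullmatch(value) is not None


def is_valid_label_name(name: str) -> bool:
    """Return True if name is acceptable as the name part of a label key."""
    return 0 < len(name) <= MAX_LEN and _NAME_RE.fullmatch(name) is not None
```

Under this sketch, v2.0 and web-app pass, while -bad, bad- and any value longer than 63 characters are rejected.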

1.2 Working with labels

# Show labels
kubectl get pods --show-labels
kubectl get nodes --show-labels

# Filter by label
kubectl get pods -l app=web-app
kubectl get pods -l 'env in (prod, staging)'
kubectl get pods -l 'env!=dev'
kubectl get pods -l 'version in (v1, v2),app=web-app'

# Add a label; changing an existing one requires --overwrite
kubectl label pod <pod-name> env=prod
kubectl label pod <pod-name> env=staging --overwrite

# Remove a label (trailing "-")
kubectl label pod <pod-name> env-

# Label nodes
kubectl label node <node-name> disktype=ssd
kubectl label node <node-name> zone=cn-east-1

1.3 Selectors in YAML

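# Service: spec.selector is a flat key/value map (matchLabels is not supported here)
spec:
  selector:
    app: web-app

# Deployment: matchLabels and matchExpressions are ANDed together
spec:
  selector:
    matchLabels:
      app: web-app
    matchExpressions:
    - key: env
      operator: In
      values: ["prod", "staging"]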
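
To make the matching semantics concrete, here is a small Python sketch of how a selector with matchLabels and matchExpressions could be evaluated against a label map. It is an illustration written for these notes (the function name is hypothetical), not code taken from K8s.

```python
def matches_selector(labels, match_labels=None, match_expressions=None):
    """Return True if the labels satisfy every requirement of the selector."""
    # matchLabels: every key must be present with exactly the given value.
    for key, value in (match_labels or {}).items():
        if labels.get(key) != value:
            return False
    # matchExpressions: every expression must hold as well (logical AND).
    for expr in match_expressions or []:
        key, op = expr["key"], expr["operator"]
        values = expr.get("values", [])
        if op == "In":
            ok = key in labels and labels[key] in values
        elif op == "NotIn":
            # A missing key also satisfies NotIn.
            ok = key not in labels or labels[key] not in values
        elif op == "Exists":
            ok = key in labels
        elif op == "DoesNotExist":
            ok = key not in labels
        else:
            raise ValueError(f"unsupported operator: {op}")
        if not ok:
            return False
    return True
```

With the Deployment selector above, a Pod labeled app=web-app, env=prod matches, while the same Pod with env=dev does not.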

2. Pod Scheduling

2.1 The scheduling process

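1. A user creates a Pod
2. kube-scheduler notices the Pod has no node assigned
3. Filter: rule out nodes that cannot run the Pod
4. Score: rank the remaining feasible nodes
5. Bind: assign the Pod to the highest-scoring node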
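
The filter, score and bind steps can be sketched in a few lines of Python. This is a toy model: the real scheduler runs many filter and score plugins, whereas this sketch filters only on nodeSelector labels and free CPU and scores by CPU left over. The dictionary fields (free_cpu_millis, cpu_millis, node_selector) are invented for the illustration.

```python
def fits(pod, node):
    """Filter step: the node needs every nodeSelector label and enough free CPU."""
    selector = pod.get("node_selector", {})
    has_labels = all(node["labels"].get(k) == v for k, v in selector.items())
    return has_labels and node["free_cpu_millis"] >= pod["cpu_millis"]


def score(pod, node):
    """Score step (toy rule): prefer the node with the most CPU left after placement."""
    return node["free_cpu_millis"] - pod["cpu_millis"]


def schedule(pod, nodes):
    """Bind step: return the chosen node name, or None if the Pod stays Pending."""
    feasible = [n for n in nodes if fits(pod, n)]
    if not feasible:
        return None
    return max(feasible, key=lambda n: score(pod, n))["name"]
```

A Pod for which no node passes the filter step is not bound at all; in a real cluster it stays Pending until a suitable node appears.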

2.2 nodeSelector (the simplest option)

Schedule a Pod only onto nodes that carry specific labels:

# Label the node
kubectl label node node-1 disktype=ssd

spec:
  nodeSelector:
    disktype: ssd              # only nodes carrying this label are eligible

2.3 nodeAffinity (more flexible)

spec:
  affinity:
    nodeAffinity:
      # Must be satisfied (hard requirement)
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: zone
            operator: In
            values: ["cn-east-1", "cn-east-2"]
      # Satisfied when possible (soft preference)
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 80             # weight, 1 to 100
        preference:
          matchExpressions:
          - key: disktype
            operator: In
            values: ["ssd"]

IgnoredDuringExecution means the rule is only checked at scheduling time: if node labels change after the Pod is running, the Pod is not evicted.
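
A rough Python sketch of how the two halves of nodeAffinity could be evaluated: the required nodeSelectorTerms are ORed (any one term is enough), the expressions inside a term are ANDed, and the weights of matching preferences are summed into a score. The helper names are invented for these notes, and the Gt/Lt operators are left out.

```python
def _expr_ok(labels, expr):
    """Evaluate one matchExpressions entry against a node's labels."""
    key, op, values = expr["key"], expr["operator"], expr.get("values", [])
    if op == "In":
        return key in labels and labels[key] in values
    if op == "NotIn":
        return key not in labels or labels[key] not in values
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    raise ValueError(f"unsupported operator: {op}")


def term_matches(node_labels, term):
    """All expressions inside one term must hold (AND)."""
    return all(_expr_ok(node_labels, e) for e in term["matchExpressions"])


def is_eligible(node_labels, required_terms):
    """Hard requirement: at least one nodeSelectorTerm must match (OR)."""
    return any(term_matches(node_labels, t) for t in required_terms)


def preference_score(node_labels, preferred):
    """Soft preference: add up the weights of every matching preference."""
    return sum(p["weight"] for p in preferred if term_matches(node_labels, p["preference"]))
```

Applied to the manifest above, a node in cn-east-1 with disktype=ssd is eligible and scores 80, a node in cn-east-2 without the ssd label is eligible with a score of 0, and a node in any other zone is filtered out regardless of its disk type.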

2.4 podAffinity / podAntiAffinity

These control how Pods are placed relative to each other (together or apart):

spec:
  affinity:
    # Pod affinity (a hard rule here): only schedule where a matching Pod already runs
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: cache          # same node as a cache Pod
        topologyKey: kubernetes.io/hostname

    # Pod anti-affinity: avoid nodes that already run a matching Pod (spread out)
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: web-app      # spread across nodes when possible
          topologyKey: kubernetes.io/hostname

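Anti-affinity is common in production: it spreads the replicas of one Deployment across nodes so that a single node failure does not take down every instance. The preferred form shown here is best-effort; use the required form when replicas must never share a node.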
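
The role of topologyKey is easier to see with a small model. The Python sketch below (hypothetical function and data layout, written for these notes) returns the nodes whose topology domain already hosts a matching Pod; anti-affinity steers new Pods away from those nodes, affinity steers them toward them.

```python
def nodes_in_occupied_domains(nodes, pods, match_labels, topology_key):
    """Return the names of nodes whose topology domain already hosts a matching Pod.

    nodes: dict mapping node name -> node labels
    pods:  list of {"node": node name, "labels": pod labels}
    """
    occupied = set()
    for pod in pods:
        if all(pod["labels"].get(k) == v for k, v in match_labels.items()):
            node_labels = nodes[pod["node"]]
            if topology_key in node_labels:
                occupied.add(node_labels[topology_key])
    return sorted(
        name for name, labels in nodes.items()
        if labels.get(topology_key) in occupied
    )
```

With kubernetes.io/hostname as the key, each node is its own domain, so only the node running the matching Pod is returned. With a zone label as the key, every node in that zone is returned, which is how the same rule spreads Pods across zones instead of across nodes.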

3. Taints and Tolerations

A taint makes a node repel Pods; a toleration lets a Pod be scheduled onto a node despite a matching taint.

3.1 Node taints

# Add taints
kubectl taint nodes node-1 dedicated=gpu:NoSchedule      # new Pods are not scheduled here
kubectl taint nodes node-1 special=true:NoExecute        # no new Pods, and running Pods without a toleration are evicted
kubectl taint nodes node-1 maintenance=true:NoSchedule

# View taints
kubectl describe node node-1 | grep Taints

# Remove a taint (trailing "-")
kubectl taint nodes node-1 dedicated=gpu:NoSchedule-

3.2 Taint effects

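NoSchedule       : new Pods are not scheduled (running Pods are unaffected)
PreferNoSchedule : the scheduler tries to avoid the node (not enforced)
NoExecute        : new Pods are not scheduled, and running Pods without a matching toleration are evicted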

3.3 Pod tolerations

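spec:
  tolerations:
  # Tolerate the dedicated=gpu:NoSchedule taint
  - key: "dedicated"
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"

  # Tolerate every taint (common for DaemonSets)
  - operator: "Exists"

  # Tolerate the control-plane (master) node taint
  - key: "node-role.kubernetes.io/control-plane"
    operator: "Exists"
    effect: "NoSchedule"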
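
How a toleration is compared against a taint can be sketched as follows. This is a simplified Python model written for these notes (function names are hypothetical); it ignores tolerationSeconds and treats PreferNoSchedule as a soft hint that never blocks scheduling.

```python
def tolerates(toleration, taint):
    """Return True if a single toleration matches a single taint."""
    # An empty effect on the toleration matches every effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    key = toleration.get("key")
    op = toleration.get("operator", "Equal")  # Equal is the default operator
    if op == "Exists":
        # Exists with no key tolerates every taint.
        return not key or key == taint["key"]
    if op == "Equal":
        return key == taint["key"] and toleration.get("value", "") == taint.get("value", "")
    raise ValueError(f"unsupported operator: {op}")


def can_schedule(tolerations, taints):
    """Every NoSchedule / NoExecute taint on the node needs a matching toleration."""
    hard = [t for t in taints if t["effect"] in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in tolerations) for t in hard)
```

A Pod with no tolerations is kept off a node tainted dedicated=gpu:NoSchedule, while a Pod carrying the first toleration from the manifest above, or the catch-all Exists toleration, is allowed on.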

3.4 Common scenarios

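GPU nodes            → taint dedicated=gpu:NoSchedule; only Pods with a matching toleration can be scheduled there
Control-plane nodes  → tainted by default (for example by kubeadm); only system components tolerate it
Dedicated nodes      → taint for one workload so other Pods stay off; pair with nodeSelector or nodeAffinity to steer that workload onto them
Node maintenance     → add a NoExecute taint to evict Pods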

4. HPA Autoscaling

The HPA (Horizontal Pod Autoscaler) adjusts the number of Pod replicas automatically based on metrics.

4.1 CPU-based HPA

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70     # target: average CPU at 70% of the Pods' CPU requests; above it, scale up

4.2 Memory-based and multi-metric HPA

Memory works the same way; only the metrics section changes:

metrics:
- type: Resource
  resource:
    name: memory
    target:
      type: Utilization
      averageUtilization: 80

Several metrics can be listed together:

metrics:
- type: Resource
  resource:
    name: cpu
    target:
      type: Utilization
      averageUtilization: 70
- type: Resource
  resource:
    name: memory
    target:
      type: Utilization
      averageUtilization: 80

With multiple metrics, the HPA computes a desired replica count for each metric and uses the largest one.
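
A short worked example of the "largest wins" rule, as a Python sketch with made-up numbers and a hypothetical function name. Each metric gets its own desired replica count, and the maximum is taken.

```python
import math


def desired_replicas_multi(current_replicas, metrics):
    """metrics maps a metric name to (current average utilization, target utilization).

    Returns the overall desired replica count and the per-metric breakdown.
    """
    per_metric = {
        name: math.ceil(current_replicas * current / target)
        for name, (current, target) in metrics.items()
    }
    return max(per_metric.values()), per_metric


desired, breakdown = desired_replicas_multi(4, {"cpu": (90, 70), "memory": (60, 80)})
```

With 4 replicas, CPU at 90% against a 70% target asks for 6 replicas, memory at 60% against an 80% target asks for 3, so the result is 6: the HPA never scales below what the most loaded metric needs.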

4.3 Creating an HPA from the command line

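# Quick create (the resulting HPA is named after the Deployment: web-app)
kubectl autoscale deployment web-app --min=2 --max=10 --cpu-percent=70

# List HPAs
kubectl get hpa

# Show details (web-app-hpa is the HPA defined in the YAML from 4.1)
kubectl describe hpa web-app-hpa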

4.4 How the HPA works

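1. Metrics Server collects CPU and memory metrics from Pods
2. The HPA controller checks the metrics periodically (every 15 seconds by default)
3. Desired replicas = ceil(current replicas × (current metric value / target value)), clamped to [minReplicas, maxReplicas]
4. The controller updates replicas on the Deployment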
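
Step 3 can be written out as a small Python function. This is a sketch of the documented formula, not the controller's source: it assumes the default 10% tolerance (no change while the ratio stays within 0.9 to 1.1) and ignores details such as Pods that are not yet ready or have missing metrics. The function name is hypothetical.

```python
import math


def desired_replicas(current_replicas, current_value, target_value,
                     min_replicas, max_replicas, tolerance=0.1):
    """Compute the replica count the HPA would aim for, for a single metric."""
    ratio = current_value / target_value
    # Within the tolerance band the HPA leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * current_value / target_value)
    # The result is always kept inside [minReplicas, maxReplicas].
    return max(min_replicas, min(max_replicas, desired))
```

For the HPA from 4.1 (target 70%, 2 to 10 replicas): 4 replicas at 90% CPU become 6, 4 replicas at 72% stay at 4, 4 replicas at 10% drop to the minimum of 2, and 8 replicas at 140% are capped at 10.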

4.5 Installing Metrics Server

# minikube
minikube addons enable metrics-server

# Generic install
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Verify (metrics may take a minute to appear)
kubectl top nodes
kubectl top pods

4.6 Configuring scaling behavior

spec:
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0     # scale up immediately
      policies:
      - type: Percent
        value: 100                       # add at most 100% (double) per period
        periodSeconds: 15
    scaleDown:
      stabilizationWindowSeconds: 300    # wait 5 minutes before scaling down
      policies:
      - type: Percent
        value: 10                        # remove at most 10% per period
        periodSeconds: 60
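
The scale-down stabilization window is the least obvious of these settings, so here is a simplified Python model of it. During scale-down the controller looks at the replica counts it recommended over the window and acts on the highest one, which smooths out brief dips in load. The function name and the (timestamp, replicas) history format are assumptions made for this sketch, and the Percent rate policies are not modeled.

```python
def stabilized_scale_down(recommendations, now, window_seconds=300):
    """Return the replica count to use for scale-down at time `now`.

    recommendations: list of (timestamp_seconds, desired_replicas) pairs.
    Only recommendations made within the last `window_seconds` are considered,
    and the highest of them wins.
    """
    recent = [replicas for ts, replicas in recommendations if now - ts <= window_seconds]
    if not recent:
        raise ValueError("no recommendations inside the stabilization window")
    return max(recent)
```

With the history [(0, 10), (100, 6), (250, 4), (400, 3)] and a 300-second window, the result at time 400 is 6 rather than 3, because the recommendation of 6 is still inside the window. Only once the higher recommendations age out does the count fall further.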

5. Scheduling Scenarios at a Glance

Pin to specific nodes    → nodeSelector
Flexible placement       → nodeAffinity
Spread Pods apart        → podAntiAffinity
Dedicated nodes          → Taint + Toleration
Autoscaling              → HPA

6. Summary

This post covered scheduling and scaling in K8s:

- Labels and selectors
- nodeSelector and nodeAffinity
- podAffinity / podAntiAffinity
- Taints and tolerations
- HPA autoscaling

The next post covers security and RBAC.