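Kubernetes

A systematic, in-depth, interview-ready guide covering core concepts, architecture internals, resource objects, networking and storage, scheduling, security, monitoring, CI/CD, and troubleshooting. Based on Kubernetes 1.29+.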

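Contents

Core Architecture
Core Resource Objects
Configuration and Storage
Networking
Scheduling
Security
Monitoring and Logging
Helm
CI/CD
Troubleshooting
Common Interview Questions
kubectl Command Cheat Sheet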

1. Core Architecture

1.1 Overall Architecture

┌──────────────────────────────────────────────────┐
│           Control Plane (master nodes)           │
│                                                  │
│  ┌──────────────┐  ┌───────────────────────────┐ │
│  │  API Server  │  │ etcd (cluster state store)│ │
│  │  (kube-      │  │ - talks only to API Server│ │
│  │   apiserver) │  │ - Raft consensus          │ │
│  └──────┬───────┘  └───────────────────────────┘ │
│         │                                        │
│  ┌──────┴───────┐  ┌───────────────────────────┐ │
│  │  Scheduler   │  │   Controller Manager      │ │
│  │ (schedules   │  │   (Deployment/RS/Node...) │ │
│  │  Pods)       │  │                           │ │
│  └──────────────┘  └───────────────────────────┘ │
│  ┌─────────────────────────────────────────────┐ │
│  │   Cloud Controller Manager (optional)       │ │
│  └─────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────┘

┌──────────────────────┐  ┌──────────────────────┐
│     Worker Node 1    │  │     Worker Node 2    │
│                      │  │                      │
│  ┌────────────────┐  │  │  ┌────────────────┐  │
│  │    kubelet     │  │  │  │    kubelet     │  │
│  │  (manages Pod  │  │  │  │                │  │
│  │   lifecycle)   │  │  │  │                │  │
│  └────────────────┘  │  │  └────────────────┘  │
│  ┌────────────────┐  │  │  ┌────────────────┐  │
│  │  kube-proxy    │  │  │  │  kube-proxy    │  │
│  │  (Service      │  │  │  │  (iptables/    │  │
│  │   traffic)     │  │  │  │   ipvs)        │  │
│  └────────────────┘  │  │  └────────────────┘  │
│  ┌────────────────┐  │  │  ┌────────────────┐  │
│  │  Container     │  │  │  │  Container     │  │
│  │  Runtime       │  │  │  │  Runtime       │  │
│  │  (containerd)  │  │  │  │                │  │
│  └────────────────┘  │  │  └────────────────┘  │
└──────────────────────┘  └──────────────────────┘

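1.2 Control Plane Components

kube-apiserver

Single entry point: every component (kubelet, scheduler, controller-manager) talks only to the API Server
Every request passes three gates: authentication, authorization, and admission control
Watch mechanism: supports List-Watch, so components watch resources for changes and react to events
Horizontal scaling: multiple instances can run behind a load balancer
Stateless: all data lives in etcd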

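etcd

Distributed key-value store that holds all cluster state (Pods, Services, ConfigMaps, etc.)
Uses the Raft consensus algorithm. Run an odd number of members (3/5/7). The cluster keeps working only while a majority (quorum) of members is alive
In production, a dedicated etcd cluster separate from the API Server nodes is often recommended, because co-located workloads can hurt etcd performance (etcd is sensitive to disk and network latency)
Key format: /registry/{resource_type}/{namespace}/{name} (cluster-scoped objects omit the namespace segment)
Only the API Server reads and writes etcd directly. Other components go through the API Server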
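The majority rule comes down to simple arithmetic. The Python sketch below is an illustration, not etcd code. It computes the quorum of an n-member Raft cluster and the number of member failures the cluster survives, which shows why an even member count adds no fault tolerance.

```python
def quorum(members: int) -> int:
    """Smallest majority of a Raft cluster with `members` voting members."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """Number of members that may fail while a quorum is still alive."""
    return members - quorum(members)


if __name__ == "__main__":
    for n in range(1, 8):
        print(f"{n} members: quorum={quorum(n)}, tolerates {fault_tolerance(n)} failure(s)")
```

For example, 3 and 4 members both tolerate one failure. The 4-member cluster, however, needs 3 acknowledgements per write instead of 2.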

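kube-scheduler

Watches for unscheduled Pods (empty spec.nodeName) and picks the best Node for each one
Scheduling flow: Filter → Score → Bind
Filter (called Predicates in older versions): enough resources, taints/tolerations, affinity, host port conflicts, etc.
Score (called Priorities in older versions): remaining resources, data locality, load spreading, etc.
The decision is posted to the API Server as a Binding, which sets the Pod's spec.nodeName
Supports custom schedulers and the Scheduling Framework (plugin-based extension points)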

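kube-controller-manager

Runs many controllers, each responsible for one kind of resource
Control loop: watch the desired state → compare it with the actual state → reconcile the difference
Common controllers:
- Deployment controller: manages ReplicaSets
- ReplicaSet controller: keeps the desired number of Pod replicas
- Node controller: monitors Node health
- Job controller: manages Jobs (the separate CronJob controller creates Jobs from CronJobs)
- ServiceAccount controller: creates the default ServiceAccount in each namespace
- Endpoints controller: maintains Service Endpoints (EndpointSlices are maintained by the EndpointSlice controller)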
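The control loop can be sketched as a level-triggered function. It looks only at the current desired and actual state, so running it again after it has converged does nothing. This is a toy model with simplifying assumptions:
- Pods are plain name strings.
- The hypothetical `reconcile` mutates a list instead of calling the API Server.
- Surplus Pods are removed newest-first. The real ReplicaSet controller ranks Pods before choosing which to delete.

```python
import itertools
from dataclasses import dataclass, field

_pod_ids = itertools.count()  # source of unique Pod name suffixes


@dataclass
class ReplicaSetState:
    desired: int                                   # spec.replicas (desired state)
    pods: list[str] = field(default_factory=list)  # existing Pods (actual state)


def reconcile(rs: ReplicaSetState) -> list[tuple[str, str]]:
    """One reconcile pass: compare desired vs. actual state and close the gap.

    Returns the actions taken; a real controller would issue API calls instead.
    """
    actions = []
    diff = rs.desired - len(rs.pods)
    if diff > 0:
        # Too few Pods: create the missing ones.
        for _ in range(diff):
            name = f"pod-{next(_pod_ids)}"
            rs.pods.append(name)
            actions.append(("create", name))
    elif diff < 0:
        # Too many Pods: delete the surplus (newest first in this toy model).
        for _ in range(-diff):
            actions.append(("delete", rs.pods.pop()))
    return actions


if __name__ == "__main__":
    rs = ReplicaSetState(desired=3)
    print(reconcile(rs))   # three create actions
    print(reconcile(rs))   # [] because the state has already converged
    rs.desired = 1
    print(reconcile(rs))   # two delete actions
```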

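1.3 Worker Node Components

kubelet

The core agent on every Node, communicating with the API Server
Calls the container runtime (containerd/CRI-O) through the CRI (Container Runtime Interface)
Pod networking is configured by CNI (Container Network Interface) plugins, which the container runtime invokes when it creates the Pod sandbox
Mounts storage through the CSI (Container Storage Interface)
Periodically reports Node status to the API Server (heartbeats via Node status updates and Lease objects)
Runs liveness/readiness/startup probes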

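kube-proxy

Maintains the iptables/ipvs rules that implement Service load balancing
Proxy modes:
- userspace (deprecated, removed in 1.26)
- iptables (default on Linux): rules are matched sequentially, O(n), so performance degrades with many Services
- ipvs (often recommended for large clusters): uses kernel IPVS hash tables, O(1) lookup, and offers more load-balancing algorithms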
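In iptables mode, kube-proxy picks an endpoint at random using a chain of rules with the iptables statistic match. With n endpoints:
- Rule i matches with probability 1/(n - i).
- The last rule always matches.

The sketch below is plain arithmetic, not kube-proxy code. It checks that this chain yields a uniform 1/n per endpoint. It also shows that the rules are walked one after another.

```python
from fractions import Fraction


def rule_probabilities(n: int) -> list[Fraction]:
    """Match probability of each rule in a Service's endpoint chain.

    Rule i is only reached if rules 0..i-1 did not match; it then matches
    with probability 1/(n - i), so the last rule (i = n-1) always matches.
    """
    return [Fraction(1, n - i) for i in range(n)]


def effective_probabilities(n: int) -> list[Fraction]:
    """Overall probability that a new connection lands on each endpoint."""
    result = []
    not_matched_yet = Fraction(1)        # probability of reaching rule i
    for p in rule_probabilities(n):      # rules are evaluated in sequence
        result.append(not_matched_yet * p)
        not_matched_yet *= 1 - p
    return result


if __name__ == "__main__":
    print([str(p) for p in rule_probabilities(3)])       # ['1/3', '1/2', '1']
    print([str(p) for p in effective_probabilities(3)])  # ['1/3', '1/3', '1/3']
```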

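Container Runtime

containerd (the mainstream choice; dockershim was deprecated in 1.20 and removed in 1.24)
CRI-O (built specifically for Kubernetes, lightweight)
Docker Engine through the cri-dockerd adapter (not recommended)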

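1.4 Core Mechanism: List-Watch

kubelet/scheduler/controller-manager
         │
         │  1. List: fetch all current resources (full sync)
         ▼
    API Server ──── etcd
         │
         │  2. Watch: open a long-lived connection and receive change events
         ▼
    Event types: ADDED / MODIFIED / DELETED
         │
         │  3. Push into the local WorkQueue
         ▼
    Controller: desired state vs. actual state → Reconcile

Why List-Watch instead of polling?

Polling has high latency and high overhead
Watch runs over a long-lived HTTP connection (HTTP/2 or chunked transfer encoding) and pushes events to the client
After a disconnect, the client re-watches from its last resourceVersion. If that version is too old (410 Gone), it re-lists, which keeps its local cache consistent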
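The work queue in step 3 turns a burst of events into a single reconcile. Controllers queue the object key (namespace/name) rather than the event itself, and a key that is already waiting is not added twice.

Below is a minimal sketch in the spirit of client-go's workqueue. The class and function names are hypothetical. The real implementation also tracks keys that are currently being processed, rate-limits retries, and is thread-safe.

```python
from collections import deque


class WorkQueue:
    """Minimal de-duplicating FIFO of object keys ("namespace/name")."""

    def __init__(self) -> None:
        self._queue = deque()    # keys in arrival order
        self._waiting = set()    # keys currently in the queue

    def add(self, key: str) -> None:
        if key not in self._waiting:     # coalesce repeated events for one object
            self._waiting.add(key)
            self._queue.append(key)

    def get(self) -> str:
        key = self._queue.popleft()
        self._waiting.discard(key)
        return key

    def __len__(self) -> int:
        return len(self._queue)


def on_watch_events(events, queue: WorkQueue) -> None:
    """Event handler: enqueue only the key. The worker later reads the
    latest object state from its cache, so the event type is not needed."""
    for _event_type, key in events:
        queue.add(key)


if __name__ == "__main__":
    q = WorkQueue()
    on_watch_events([("ADDED", "default/web"), ("MODIFIED", "default/web"),
                     ("MODIFIED", "default/db"), ("MODIFIED", "default/web")], q)
    while q:
        print("reconcile", q.get())   # default/web once, then default/db
```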

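1.5 Pod Creation, End to End

User runs kubectl apply -f pod.yaml
         │
         ▼
1. The API Server receives the request → authentication → authorization → admission control → persists it to etcd

2. The Scheduler sees an unscheduled Pod (empty nodeName)
   → filters feasible Nodes
   → scores them and picks the best Node
   → binds the Pod through the API Server (sets pod.spec.nodeName)

3. The kubelet on the target Node sees a Pod assigned to it
   → calls the CRI to create the Pause (sandbox) container, which holds the Pod network namespace
   → CNI configures Pod networking (assigns the IP)
   → runs the init containers one at a time, in order
   → starts the application containers
   → runs PostStart hooks
   → starts the probes

4. The kubelet periodically reports Pod status to the API Server
5. Once the Pod is Running and Ready (readiness probe passing), the endpoints controllers add the Pod IP to the Service's Endpoints/EndpointSlices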

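2. Core Resource Objects

2.1 Pod

A Pod is the smallest schedulable unit in Kubernetes. It holds one or more containers, which share:

the network namespace (same IP and port space)
storage (Volumes)
the IPC namespace (containers can communicate through shared memory)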

apiVersion: v1
kind: Pod
metadata:
  name: myapp
  namespace: default
  labels:
    app: myapp
    version: "1.0"
  annotations:
    description: "production application"
spec:
  # Init containers (run in order; app containers start only after all of them succeed)
  initContainers:
  - name: init-db
    image: busybox:1.35
    command: ['sh', '-c', 'until nc -z mysql 3306; do sleep 2; done']

  containers:
  - name: app
    image: myapp:1.0.0
    imagePullPolicy: IfNotPresent  # Always / IfNotPresent / Never

    ports:
    - containerPort: 8080
      protocol: TCP

    # Resource requests and limits
    resources:
      requests:
        cpu:    "100m"     # 0.1 CPU core
        memory: "128Mi"
      limits:
        cpu:    "500m"
        memory: "512Mi"

    # Environment variables
    env:
    - name: DB_HOST
      value: "mysql-service"
    - name: DB_PASSWORD
      valueFrom:
        secretKeyRef:
          name: db-secret
          key:  password
    - name: MY_POD_IP
      valueFrom:
        fieldRef:
          fieldPath: status.podIP

    # Inject every key of a ConfigMap/Secret as an environment variable
    envFrom:
    - configMapRef:
        name: app-config
    - secretRef:
        name: app-secret

    # Volume mounts
    volumeMounts:
    - name: config-vol
      mountPath: /etc/config
      readOnly: true
    - name: data-vol
      mountPath: /data
    - name: log-vol            # shared with the log-agent sidecar
      mountPath: /var/log/app

    # Health checks
    startupProbe:             # startup probe (protects slow starts; liveness/readiness wait until it succeeds)
      httpGet:
        path: /health
        port: 8080
      failureThreshold: 30
      periodSeconds: 10

    livenessProbe:            # liveness probe (the container is restarted on failure)
      httpGet:
        path: /health
        port: 8080
      initialDelaySeconds: 30
      periodSeconds: 10
      timeoutSeconds: 5
      failureThreshold: 3

    readinessProbe:           # readiness probe (the Pod is removed from Service endpoints on failure)
      httpGet:
        path: /ready
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 5
      successThreshold: 1
      failureThreshold: 3

    # Lifecycle hooks
    lifecycle:
      postStart:
        exec:
          command: ["/bin/sh", "-c", "echo started > /tmp/started"]
      preStop:                # graceful shutdown (runs before the container receives SIGTERM)
        exec:
          command: ["/bin/sh", "-c", "sleep 10"]

  # Sidecar container
  - name: log-agent
    image: fluentd:v1.16
    volumeMounts:
    - name: log-vol
      mountPath: /var/log/app

  volumes:
  - name: config-vol
    configMap:
      name: app-config
  - name: data-vol
    persistentVolumeClaim:
      claimName: app-pvc
  - name: log-vol
    emptyDir: {}

  # Restart policy
  restartPolicy: Always       # Always / OnFailure / Never

  # Graceful termination period (seconds)
  terminationGracePeriodSeconds: 30

  # Node selection
  nodeSelector:
    disktype: ssd

  # Service account
  serviceAccountName: myapp-sa

  # DNS policy
  dnsPolicy: ClusterFirst     # ClusterFirst / ClusterFirstWithHostNet / Default / None

  # Host networking
  hostNetwork: false

  # Image pull credentials
  imagePullSecrets:
  - name: registry-secret

2.2 Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-deployment
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp

  # Update strategy
  strategy:
    type: RollingUpdate          # RollingUpdate / Recreate
    rollingUpdate:
      maxSurge:       1          # max Pods above replicas during an update (absolute number or percentage)
      maxUnavailable: 0          # max Pods that may be unavailable during an update

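  # Number of old ReplicaSets kept for rollback
  revisionHistoryLimit: 10

  # A new Pod counts as available only after it has been Ready for this many seconds
  minReadySeconds: 10

  # Seconds without progress before the rollout is reported as failed (ProgressDeadlineExceeded)
  progressDeadlineSeconds: 600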

  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: app
        image: myapp:1.0.0
        resources:
          requests: { cpu: "100m", memory: "128Mi" }
          limits:   { cpu: "500m", memory: "512Mi" }

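How a Deployment rolling update works (replicas=3, maxSurge=1, maxUnavailable=0):

Old RS (v1): 3 Pods  →  New RS (v2): 0 Pods
           ↓ create 1 new Pod (maxSurge=1, 4 in total)
Old RS (v1): 3 Pods  →  New RS (v2): 1 Pod (once Ready)
           ↓ delete 1 old Pod
Old RS (v1): 2 Pods  →  New RS (v2): 1 Pod
           ↓ repeat until done
Old RS (v1): 0 Pods  →  New RS (v2): 3 Pods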
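The same steps can be simulated. The sketch below makes these assumptions:
- Every new Pod becomes Ready immediately.
- minReadySeconds is ignored.
- maxSurge and maxUnavailable are absolute numbers. The real controller also accepts percentages and rounds them.

`rolling_update` is a hypothetical helper, not Deployment controller code.

```python
def rolling_update(replicas: int, max_surge: int,
                   max_unavailable: int) -> list[tuple[int, int]]:
    """Simulate a RollingUpdate; returns (old_pods, new_pods) after each step.

    Assumes new Pods are Ready immediately and both limits are absolute numbers.
    """
    if max_surge == 0 and max_unavailable == 0:
        raise ValueError("maxSurge and maxUnavailable cannot both be 0")
    old, new = replicas, 0
    steps = [(old, new)]
    max_total = replicas + max_surge             # upper bound on all Pods
    min_available = replicas - max_unavailable   # lower bound on Ready Pods
    while old > 0 or new < replicas:
        # Scale up the new ReplicaSet as far as maxSurge allows.
        grow = min(max_total - (old + new), replicas - new)
        if grow > 0:
            new += grow
            steps.append((old, new))
        # Scale down the old ReplicaSet without dropping below min_available.
        shrink = min(old, (old + new) - min_available)
        if shrink > 0:
            old -= shrink
            steps.append((old, new))
    return steps


if __name__ == "__main__":
    for old, new in rolling_update(replicas=3, max_surge=1, max_unavailable=0):
        print(f"old RS: {old}  new RS: {new}")
```

With replicas=3, maxSurge=1 and maxUnavailable=0 it reproduces the sequence above: the Pod total never exceeds 4 and never drops below 3.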

# Common operations
kubectl apply -f deployment.yaml
kubectl rollout status deployment/myapp-deployment   # watch rollout status
kubectl rollout history deployment/myapp-deployment  # show revision history
kubectl rollout undo deployment/myapp-deployment     # roll back to the previous revision
kubectl rollout undo deployment/myapp-deployment --to-revision=2  # roll back to a specific revision
kubectl rollout pause deployment/myapp-deployment    # pause the rollout
kubectl rollout resume deployment/myapp-deployment   # resume the rollout
kubectl scale deployment myapp-deployment --replicas=5  # scale manually

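2.3 StatefulSet

For stateful applications (databases, message queues, distributed storage, etc.). It provides:

Stable network identity: fixed Pod names (pod-0, pod-1, pod-2)
Stable persistent storage: each Pod gets its own PVC, and the data survives Pod deletion
Ordered deployment and scaling: under the default OrderedReady policy, Pods are created in ordinal order and deleted in reverse (0→1→2 / 2→1→0)
Ordered rolling updates: Pods are updated starting from the highest ordinal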

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql-headless    # name of the governing Headless Service
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:8.0
        env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-secret
              key: root-password
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
  # A dedicated PVC is created automatically for each Pod
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: fast-ssd
      resources:
        requests:
          storage: 10Gi

---
# The required Headless Service (clusterIP: None)
apiVersion: v1
kind: Service
metadata:
  name: mysql-headless
spec:
  clusterIP: None              # Headless Service
  selector:
    app: mysql
  ports:
  - port: 3306

Access via the Headless Service + DNS:

mysql-0.mysql-headless.default.svc.cluster.local
mysql-1.mysql-headless.default.svc.cluster.local
mysql-2.mysql-headless.default.svc.cluster.local
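The naming and ordering rules are mechanical enough to express in a few lines. The sketch assumes:
- the default cluster domain cluster.local (it is configurable);
- the default OrderedReady pod management policy.

The helper names are hypothetical.

```python
def pod_fqdns(sts: str, service: str, namespace: str, replicas: int,
              cluster_domain: str = "cluster.local") -> list[str]:
    """Stable DNS name of each StatefulSet Pod behind its headless Service."""
    return [f"{sts}-{i}.{service}.{namespace}.svc.{cluster_domain}"
            for i in range(replicas)]


def scale_plan(current: int, target: int) -> list[str]:
    """Pod operations, in order, under the OrderedReady policy: scale-up
    creates ascending ordinals (each waits for the previous one to be Ready),
    scale-down deletes from the highest ordinal down."""
    if target >= current:
        return [f"create {i}" for i in range(current, target)]
    return [f"delete {i}" for i in range(current - 1, target - 1, -1)]


if __name__ == "__main__":
    print(pod_fqdns("mysql", "mysql-headless", "default", 3))
    print(scale_plan(3, 5))   # ['create 3', 'create 4']
    print(scale_plan(5, 2))   # ['delete 4', 'delete 3', 'delete 2']
```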

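2.4 DaemonSet

Ensures that every Node (or every selected Node) runs one copy of the Pod. Typical uses: log collection (Fluentd), metrics collection (Node Exporter), network plugins (CNI), and storage plugins.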

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      hostNetwork: true
      hostPID: true
      tolerations:                  # tolerate the control-plane taint so the Pod also runs on control-plane nodes
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule
      containers:
      - name: node-exporter
        image: prom/node-exporter:v1.7.0
        ports:
        - containerPort: 9100
          hostPort: 9100
        volumeMounts:
        - name: proc
          mountPath: /host/proc
          readOnly: true
        - name: sys
          mountPath: /host/sys
          readOnly: true
      volumes:
      - name: proc
        hostPath:
          path: /proc
      - name: sys
        hostPath:
          path: /sys

2.5 Job and CronJob

# Job: runs a task to completion; it is not restarted once it succeeds
apiVersion: batch/v1
kind: Job
metadata:
  name: data-migration
spec:
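  completions:  1       # number of Pods that must complete successfully
  parallelism:  1       # number of Pods running in parallel
  backoffLimit: 3       # retries before the Job is marked failed
  activeDeadlineSeconds: 3600  # maximum run time (seconds)
  ttlSecondsAfterFinished: 86400  # delete the finished Job after this many seconds
  template:
    spec:
      restartPolicy: OnFailure   # a Job must use OnFailure or Never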
      containers:
      - name: migration
        image: myapp:1.0
        command: ["python", "migrate.py"]

---
# CronJob: scheduled tasks
apiVersion: batch/v1
kind: CronJob
metadata:
  name: backup
spec:
  schedule: "0 2 * * *"          # cron expression: every day at 02:00
  timeZone: "Asia/Shanghai"       # GA since Kubernetes 1.27
  concurrencyPolicy: Forbid       # Allow / Forbid / Replace
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  startingDeadlineSeconds: 300    # a missed run may still start up to this many seconds late
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: backup
            image: backup-tool:1.0
            command: ["./backup.sh"]
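When a Job's Pods fail, the Job controller recreates them with an exponential back-off delay (10s, 20s, 40s, ...) capped at six minutes. It gives up once backoffLimit is exhausted.

The sketch below only computes that delay schedule. It is an approximation: with restartPolicy: OnFailure, containers are restarted in place on the node, where the kubelet applies its own restart back-off. The helper name is hypothetical.

```python
def job_retry_delays(backoff_limit: int, base_s: int = 10,
                     cap_s: int = 360) -> list[int]:
    """Approximate delay (seconds) before each Pod re-creation of a failing Job:
    base_s doubled on every failure, capped at cap_s (6 minutes)."""
    return [min(base_s * 2 ** attempt, cap_s) for attempt in range(backoff_limit)]


if __name__ == "__main__":
    print(job_retry_delays(3))   # [10, 20, 40] for backoffLimit: 3
    print(job_retry_delays(8))   # [10, 20, 40, 80, 160, 320, 360, 360]
```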

2.6 Service

A Service gives a group of Pods a stable access point (fixed IP + DNS name).

# ClusterIP (default; reachable only inside the cluster)
apiVersion: v1
kind: Service
metadata:
  name: myapp-svc
spec:
  type: ClusterIP
  selector:
    app: myapp
  ports:
  - name: http
    port: 80            # Service port
    targetPort: 8080    # container port on the Pod
    protocol: TCP

---
# NodePort (reachable from outside the cluster at NodeIP:NodePort)
apiVersion: v1
kind: Service
metadata:
  name: myapp-nodeport
spec:
  type: NodePort
  selector:
    app: myapp
  ports:
  - port: 80
    targetPort: 8080
    nodePort: 30080     # default range 30000-32767; allocated automatically if omitted

---
# LoadBalancer (provisions a cloud provider load balancer)
apiVersion: v1
kind: Service
metadata:
  name: myapp-lb
spec:
  type: LoadBalancer
  selector:
    app: myapp
  ports:
  - port: 80
    targetPort: 8080

---
# ExternalName (maps the Service to an external DNS name via a CNAME record, so in-cluster clients can reach an external service)
apiVersion: v1
kind: Service
metadata:
  name: external-mysql
  namespace: default
spec:
  type: ExternalName
  externalName: mysql.example.com

---
# Headless Service (clusterIP: None; DNS returns the Pod IPs directly)
apiVersion: v1
kind: Service
metadata:
  name: myapp-headless
spec:
  clusterIP: None
  selector:
    app: myapp
  ports:
  - port: 8080

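Service type comparison

Type	Reachable from	Use case
ClusterIP	Inside the cluster	Service-to-service traffic
NodePort	Inside + outside the cluster	Development and testing
LoadBalancer	Inside + outside the cluster	Exposing production services (cloud)
ExternalName	Inside the cluster	Accessing external services
Headless	Inside the cluster	StatefulSets, service discovery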

2.7 Ingress

Layer 7 (HTTP/HTTPS) routing that provides a single entry point into the cluster.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-ingress
  annotations:
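    nginx.ingress.kubernetes.io/rewrite-target: /     # caution: rewrites every matched request path to "/" (e.g. /api/users → /)
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/proxy-body-size: "50m"
    nginx.ingress.kubernetes.io/limit-rps: "100"      # requests per second per client IP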
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - myapp.example.com
    secretName: myapp-tls             # Secret holding the TLS certificate
  rules:
  - host: myapp.example.com
    http:
      paths:
      - path: /api
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 80
      - path: /
        pathType: Prefix
        backend:
          service:
            name: frontend-service
            port:
              number: 80
  - host: admin.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: admin-service
            port:
              number: 80

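Ingress traffic path:

External request → DNS → LoadBalancer/NodePort
       → Ingress Controller Pod (Nginx/Traefik/Kong)
       → matched against Host/Path rules
       → backend ClusterIP Service (many controllers, e.g. ingress-nginx, proxy straight to its Pod endpoints)
       → Pod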

2.8 HPA (Horizontal Pod Autoscaler)

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp-deployment
  minReplicas: 2
  maxReplicas: 20
  metrics:
  # CPU utilization (percentage of the Pods' CPU requests)
  - type: Resource
    resource:
      name: cpu
      target:
        type:               Utilization
        averageUtilization: 70    # target 70% average CPU utilization
  # Memory usage (average absolute value per Pod)
  - type: Resource
    resource:
      name: memory
      target:
        type:         AverageValue
        averageValue: 400Mi
  # Custom metric (requires an adapter serving the custom.metrics.k8s.io API, e.g. Prometheus Adapter)
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type:         AverageValue
        averageValue: "1k"
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # scale-down stabilization window
      policies:
      - type:          Pods
        value:         2
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type:          Percent
        value:         100
        periodSeconds: 15
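The controller's core calculation, per the HPA documentation, is:

desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue)

The sketch below applies that formula per metric. It leaves a metric's proposal unchanged when its ratio is within the default 10% tolerance, takes the largest proposal across metrics, and clamps the result to minReplicas/maxReplicas. It deliberately ignores behavior policies, stabilization windows, missing metrics, and not-yet-ready Pods. `desired_replicas` is a hypothetical helper.

```python
import math


def desired_replicas(current: int, metrics: list[tuple[float, float]],
                     min_replicas: int, max_replicas: int,
                     tolerance: float = 0.1) -> int:
    """metrics holds (current_value, target_value) pairs, one per metric."""
    proposals = []
    for current_value, target_value in metrics:
        ratio = current_value / target_value
        if abs(ratio - 1.0) <= tolerance:
            proposals.append(current)                   # close enough: no change
        else:
            proposals.append(math.ceil(current * ratio))
    # With several metrics the largest proposal wins, then clamp to the bounds.
    return max(min_replicas, min(max_replicas, max(proposals)))


if __name__ == "__main__":
    # 3 replicas; CPU at 90% vs. a 70% target, memory at 300Mi vs. 400Mi.
    print(desired_replicas(3, [(90, 70), (300, 400)], 2, 20))   # 4
```

In the example, CPU proposes ceil(3 × 90/70) = 4 and memory proposes ceil(3 × 0.75) = 3, so the HPA scales to 4.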

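2.9 VPA (Vertical Pod Autoscaler)

# Prerequisites: metrics-server plus the VPA components (recommender, updater, admission controller) from the kubernetes/autoscaler project; VPA is not built into Kubernetes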
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: myapp-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp-deployment
  updatePolicy:
    updateMode: "Auto"    # Off / Initial / Recreate / Auto
  resourcePolicy:
    containerPolicies:
    - containerName: app
      minAllowed:
        cpu: "100m"
        memory: "128Mi"
      maxAllowed:
        cpu: "2"
        memory: "2Gi"

2.10 Namespace

apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    env: prod

# LimitRange (per-namespace defaults and limits for container resources)
---
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: production
spec:
  limits:
  - type: Container
    default:
      cpu:    "200m"
      memory: "256Mi"
    defaultRequest:
      cpu:    "100m"
      memory: "128Mi"
    max:
      cpu:    "2"
      memory: "2Gi"

# ResourceQuota (per-namespace resource quota)
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: ns-quota
  namespace: production
spec:
  hard:
    requests.cpu:    "20"
    requests.memory: "40Gi"
    limits.cpu:      "40"
    limits.memory:   "80Gi"
    pods:            "100"
    services:        "20"
    persistentvolumeclaims: "20"
    secrets:         "50"
    configmaps:      "50"

3. Configuration and Storage

3.1 ConfigMap

# Create a ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  # key-value pairs
  LOG_LEVEL:   "info"
  DB_HOST:     "mysql-service"
  APP_PORT:    "8080"
  # multi-line configuration files
  application.yaml: |
    server:
      port: 8080
    database:
      host: mysql-service
      port: 3306
  nginx.conf: |
    server {
        listen 80;
        location / {
            proxy_pass http://backend;
        }
    }

# Usage 1: all keys as environment variables
envFrom:
- configMapRef:
    name: app-config

# Usage 2: a single key as an environment variable
env:
- name: LOG_LEVEL
  valueFrom:
    configMapKeyRef:
      name: app-config
      key:  LOG_LEVEL

# Usage 3: mount as a volume (picks up updates automatically)
volumes:
- name: config-vol
  configMap:
    name: app-config
    items:                          # mount only the listed keys
    - key:  application.yaml
      path: app.yaml

volumeMounts:
- name:      config-vol
  mountPath: /etc/config

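Note: volume-mounted ConfigMaps are updated automatically. The kubelet syncs them periodically, so a change can take up to the kubelet sync period (1 minute by default) plus a cache propagation delay to appear. Volumes mounted with subPath never receive updates. Environment variables are not updated at all; the Pod must be restarted.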

3.2 Secret

apiVersion: v1
kind: Secret
metadata:
  name: db-secret
type: Opaque                  # Opaque / kubernetes.io/dockerconfigjson / kubernetes.io/tls, etc.
data:
  username: YWRtaW4=          # base64-encoded
  password: cGFzc3dvcmQxMjM=
stringData:                   # plain text; base64-encoded into data on write (wins over data on key conflicts)
  api-key: "my-secret-api-key"

# Create from the command line
kubectl create secret generic db-secret \
  --from-literal=username=admin \
  --from-literal=password=password123

# Create from files
kubectl create secret generic tls-secret \
  --from-file=tls.crt=server.crt \
  --from-file=tls.key=server.key

# TLS Secret
kubectl create secret tls myapp-tls \
  --cert=server.crt \
  --key=server.key

# Image pull Secret
kubectl create secret docker-registry registry-secret \
  --docker-server=registry.example.com \
  --docker-username=user \
  --docker-password=pass

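Secret vs. ConfigMap

	ConfigMap	Secret
Contents	Non-sensitive configuration	Sensitive data (passwords/tokens/certificates)
Storage format	Plain text	base64-encoded (NOT encrypted!)
Encryption at rest in etcd	No (by default)	Can be enabled via EncryptionConfiguration
Volume backing on the node	Node disk	tmpfs (never written to disk)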
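The "NOT encrypted" point is easy to demonstrate. Anyone who can read the Secret object can reverse the base64 encoding, for example with kubectl get secret db-secret -o jsonpath='{.data.password}' | base64 -d. Here is the same round trip in Python, using the values from the example Secret above.

```python
import base64


def encode_secret_value(plain: str) -> str:
    """Encode a value the way it is stored under a Secret's .data field."""
    return base64.b64encode(plain.encode("utf-8")).decode("ascii")


def decode_secret_value(encoded: str) -> str:
    """Reverse it: no key is needed, which is why base64 is not encryption."""
    return base64.b64decode(encoded).decode("utf-8")


if __name__ == "__main__":
    print(decode_secret_value("YWRtaW4="))          # admin
    print(decode_secret_value("cGFzc3dvcmQxMjM="))  # password123
    print(encode_secret_value("admin"))             # YWRtaW4=
```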

3.3 PV / PVC / StorageClass

Storage is layered:

StorageClass
    ↓ dynamic provisioning
PV (PersistentVolume)    ← created manually by an admin, or dynamically
    ↓ binding
PVC (PersistentVolumeClaim)  ← the developer declares storage requirements
    ↓ mount
Pod

# StorageClass (dynamic provisioning: PVs are created on demand)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"  # default StorageClass
provisioner: ebs.csi.aws.com   # AWS EBS CSI driver (the in-tree kubernetes.io/aws-ebs plugin is deprecated in favor of it)
parameters:
  type: gp3
  iops: "3000"
reclaimPolicy: Delete         # Delete (default: deleting the PVC deletes the PV) / Retain
allowVolumeExpansion: true    # allow volume expansion
volumeBindingMode: WaitForFirstConsumer  # Immediate / WaitForFirstConsumer (bind once a Pod uses the PVC)

---
# PV (created by an admin, or automatically through a StorageClass)
apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
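  - ReadWriteOnce       # RWO: read-write, mounted by a single node
  # - ReadWriteMany     # RWX: read-write, many nodes
  # - ReadOnlyMany      # ROX: read-only, many nodes
  # - ReadWriteOncePod  # RWOP: read-write, a single Pod (GA in 1.29)
  persistentVolumeReclaimPolicy: Retain
  storageClassName: fast-ssd
  hostPath:             # demo only; do not use hostPath in production
    path: /data/storage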

---
# PVC (the developer's storage request)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-pvc
  namespace: default
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: fast-ssd    # must match a StorageClass
  resources:
    requests:
      storage: 5Gi

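PV lifecycle:

Available → Bound (claimed by a PVC) → Released (PVC deleted)
                                             ↓ reclaimPolicy
                  Delete (PV and backing storage removed) / Retain (PV kept for manual cleanup)
Failed: automatic reclamation failed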

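4. Networking

4.1 The Kubernetes Network Model

The model rests on three basic rules:

Every Pod can communicate with every other Pod directly (no NAT)
Every Node can communicate with every Pod directly (no NAT)
The IP a Pod sees for itself is the same IP other Pods see for it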

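Traffic paths:

Containers in the same Pod   → localhost (shared network namespace)
Pods on the same Node        → veth pair + virtual bridge (cbr0/cni0)
Pods on different Nodes      → overlay (VXLAN/IPIP) or BGP routing (depends on the CNI)
Pod to Service               → iptables/ipvs rules (maintained by kube-proxy)
Service to Pod               → Service → Endpoints/EndpointSlices → Pod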

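4.2 CNI Network Plugins

Plugin	Implementation	Characteristics
Flannel	VXLAN (overlay)	Simple, good for getting started; does not enforce NetworkPolicy
Calico	BGP routing (underlay; IPIP/VXLAN also available)	High performance, supports NetworkPolicy
Cilium	eBPF	Very high performance, supports L7 policies
Weave	VXLAN + UDP	Easy to deploy
Canal	Flannel + Calico	Flannel networking with Calico network policy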

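4.3 NetworkPolicy

By default every Pod can reach every other Pod. A NetworkPolicy restricts traffic to and from the Pods it selects. It is enforced only if the CNI plugin supports NetworkPolicy.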

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: app-network-policy
  namespace: production
spec:
  # Which Pods this policy applies to
  podSelector:
    matchLabels:
      app: myapp

  policyTypes:
  - Ingress
  - Egress

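  # Ingress rules
  ingress:
  - from:
    # frontend Pods in the same namespace
    - podSelector:
        matchLabels:
          role: frontend
    # any Pod in the monitoring namespace
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: monitoring   # label set automatically on every namespace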
    ports:
    - protocol: TCP
      port: 8080

  # Egress rules
  egress:
  - to:
    # only mysql Pods
    - podSelector:
        matchLabels:
          app: mysql
    ports:
    - protocol: TCP
      port: 3306
  # allow DNS lookups (UDP and TCP)
  - ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
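The matching rules are easy to get wrong, so here is a simplified evaluator for ingress:
- Rules are ORed.
- Within a rule, the from peers are ORed and the ports are ORed, and a connection must match a peer AND a port.
- An empty from or ports list matches everything.
- A policy with no ingress rules denies all ingress to the selected Pods.

The data model is hypothetical. Each peer is either a podSelector (Pods in the policy's own namespace) or a namespaceSelector. matchExpressions, ipBlock, and named ports are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    kind: str            # "pod" = podSelector (policy's namespace), "namespace" = namespaceSelector
    match_labels: dict


@dataclass
class Rule:
    peers: list          # Peer items from "from"; empty list = any source
    ports: list          # (protocol, port) tuples; empty list = any port


@dataclass
class Source:
    namespace: str
    pod_labels: dict
    namespace_labels: dict


def _selects(match_labels: dict, labels: dict) -> bool:
    return all(labels.get(k) == v for k, v in match_labels.items())


def _peer_matches(peer: Peer, src: Source, policy_ns: str) -> bool:
    if peer.kind == "pod":
        return src.namespace == policy_ns and _selects(peer.match_labels, src.pod_labels)
    return _selects(peer.match_labels, src.namespace_labels)


def ingress_allowed(rules: list, src: Source, policy_ns: str,
                    protocol: str, port: int) -> bool:
    """True if any rule admits the connection (rules are ORed)."""
    for rule in rules:
        peer_ok = not rule.peers or any(_peer_matches(p, src, policy_ns) for p in rule.peers)
        port_ok = not rule.ports or (protocol, port) in rule.ports
        if peer_ok and port_ok:       # peer AND port must both match
            return True
    return False                      # no matching rule (or no rules) = denied


if __name__ == "__main__":
    rules = [Rule(peers=[Peer("pod", {"role": "frontend"}),
                         Peer("namespace", {"kubernetes.io/metadata.name": "monitoring"})],
                  ports=[("TCP", 8080)])]
    frontend = Source("production", {"role": "frontend"},
                      {"kubernetes.io/metadata.name": "production"})
    print(ingress_allowed(rules, frontend, "production", "TCP", 8080))  # True
    print(ingress_allowed(rules, frontend, "production", "TCP", 22))    # False
```

One common pitfall is putting podSelector and namespaceSelector in the same from element instead of in two separate list items. That combines them with AND: the source must sit in a matching namespace and also carry matching Pod labels.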

4.4 Service DNS

Kubernetes ships CoreDNS for service discovery.

# Service DNS format (cluster.local is the default cluster domain)
{service-name}.{namespace}.svc.cluster.local

# Within the same namespace, service-name alone is enough
# Across namespaces, use service-name.namespace

# Pod DNS format (StatefulSet)
{pod-name}.{headless-service}.{namespace}.svc.cluster.local

# Examples
mysql.default.svc.cluster.local       # mysql Service
mysql-0.mysql-headless.default.svc.cluster.local  # StatefulSet Pod
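Short names work because of the search list in the Pod's /etc/resolv.conf. The sketch below assumes:
- the resolv.conf the kubelet typically generates for dnsPolicy: ClusterFirst, i.e. search {namespace}.svc.cluster.local svc.cluster.local cluster.local with options ndots:5;
- the usual glibc rule that a name with fewer dots than ndots goes through the search list first.

Search domains inherited from the node are ignored, and the helper name is hypothetical.

```python
def candidate_queries(name: str, namespace: str, cluster_domain: str = "cluster.local",
                      ndots: int = 5) -> list[str]:
    """Order in which a ClusterFirst Pod's resolver tries fully qualified names."""
    if name.endswith("."):
        return [name]                          # already absolute: no search list
    search = [f"{namespace}.svc.{cluster_domain}", f"svc.{cluster_domain}", cluster_domain]
    expanded = [f"{name}.{domain}." for domain in search]
    absolute = f"{name}."
    if name.count(".") >= ndots:
        return [absolute] + expanded           # "enough" dots: try as-is first
    return expanded + [absolute]               # otherwise search list first


if __name__ == "__main__":
    for query in candidate_queries("mysql.production", "default"):
        print(query)
```

So mysql.production resolves via the second search domain. An external name such as api.example.com (2 dots, fewer than 5) triggers several in-cluster lookups before the absolute query. A trailing dot (api.example.com.) skips the search list.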

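5. Scheduling

5.1 The Scheduling Flow in Detail

Scheduling queue (activeQ)
     │
     ▼
Filter phase (drop infeasible nodes):
  - NodeResourcesFit: enough CPU/memory for the Pod's requests
  - NodeAffinity: nodeSelector and node affinity
  - InterPodAffinity: Pod affinity/anti-affinity
  - TaintToleration: taints/tolerations
  - NodeName: spec.nodeName match
  - NodePorts: host port conflicts
  - VolumeBinding: volumes can be bound
  - ... and more Filter plugins
     │
     ▼
Score phase (each Score plugin rates every candidate 0-100; scores are combined as a weighted sum):
  - NodeResourcesFit, LeastAllocated strategy (default): less-used nodes score higher (spreading)
  - NodeResourcesFit, MostAllocated strategy: more-used nodes score higher (bin packing)
  - NodeAffinity: preferred node affinity
  - InterPodAffinity: preferred Pod affinity
  - ImageLocality: nodes that already have the image score higher
  - ... and more Score plugins
     │
     ▼
Pick the highest-scoring Node (ties broken at random) → bind the Pod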
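Here is a toy version of the two phases.
- Filter: drop nodes that lack free resources, fail the nodeSelector, or carry an untolerated taint.
- Score: rate the rest with a LeastAllocated-style formula, the percentage of CPU and memory left free after placing the Pod, averaged.

All types and numbers are hypothetical, and only a handful of the real plugins are imitated. Unlike the real scheduler, ties are not broken at random.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    alloc_cpu_m: int      # allocatable CPU, millicores
    alloc_mem_mi: int     # allocatable memory, MiB
    free_cpu_m: int       # allocatable minus requests of Pods already on the node
    free_mem_mi: int
    labels: dict
    taints: set           # keys of NoSchedule taints (simplified)


@dataclass
class PodRequest:
    cpu_m: int
    mem_mi: int
    node_selector: dict
    tolerations: set      # taint keys the Pod tolerates (simplified)


def feasible(node: Node, pod: PodRequest) -> bool:
    """Filter phase: imitates NodeResourcesFit, NodeAffinity (nodeSelector)
    and TaintToleration."""
    fits = node.free_cpu_m >= pod.cpu_m and node.free_mem_mi >= pod.mem_mi
    selector_ok = all(node.labels.get(k) == v for k, v in pod.node_selector.items())
    taints_ok = node.taints.issubset(pod.tolerations)
    return fits and selector_ok and taints_ok


def least_allocated_score(node: Node, pod: PodRequest) -> int:
    """Score phase (0-100): average share of CPU and memory still free
    after placing the Pod; emptier nodes score higher."""
    cpu = (node.free_cpu_m - pod.cpu_m) * 100 // node.alloc_cpu_m
    mem = (node.free_mem_mi - pod.mem_mi) * 100 // node.alloc_mem_mi
    return (cpu + mem) // 2


def schedule(nodes: list, pod: PodRequest):
    """Return the chosen node name, or None (the Pod stays Pending)."""
    candidates = [n for n in nodes if feasible(n, pod)]
    if not candidates:
        return None
    return max(candidates, key=lambda n: least_allocated_score(n, pod)).name


if __name__ == "__main__":
    nodes = [
        Node("node1", 4000, 8192, 1000, 1024, {"disktype": "ssd"}, set()),
        Node("node2", 4000, 8192, 3000, 6144, {"disktype": "ssd"}, set()),
        Node("node3", 4000, 8192, 3900, 8000, {"disktype": "hdd"}, set()),
        Node("node4", 4000, 8192, 4000, 8192, {"disktype": "ssd"}, {"dedicated"}),
    ]
    pod = PodRequest(500, 512, {"disktype": "ssd"}, set())
    print(schedule(nodes, pod))   # node2: node3 fails nodeSelector, node4 is tainted
```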

5.2 Node Affinity

spec:
  affinity:
    # Node affinity
    nodeAffinity:
      # Hard requirement (must be satisfied)
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key:      kubernetes.io/arch
            operator: In
            values:   [amd64]
          - key:      disktype
            operator: In
            values:   [ssd]
      # Soft preference (best effort; the Pod can still be scheduled if it is not met)
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 80
        preference:
          matchExpressions:
          - key:      zone
            operator: In
            values:   [zone-a]
      - weight: 20
        preference:
          matchExpressions:
          - key:      zone
            operator: In
            values:   [zone-b]

    # Pod affinity (which Pods to co-locate with)
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: cache
        topologyKey: kubernetes.io/hostname  # same Node

    # Pod anti-affinity (which Pods to keep apart from)
    podAntiAffinity:
      # Hard: no two Pods of the same app on one Node (high availability)
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: myapp
        topologyKey: kubernetes.io/hostname
      # Soft: spread across availability zones where possible
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: myapp
          topologyKey: topology.kubernetes.io/zone

5.3 Taints and Tolerations

Taints keep Pods off particular nodes, for example to reserve nodes for specific workloads.

# Taint a node
kubectl taint nodes node1 key=value:NoSchedule
kubectl taint nodes node1 key=value:NoExecute
kubectl taint nodes node1 key=value:PreferNoSchedule

# Remove a taint
kubectl taint nodes node1 key=value:NoSchedule-

# Effects
# NoSchedule:       new Pods are not scheduled (existing Pods are unaffected)
# NoExecute:        new Pods are not scheduled, and existing Pods without a matching toleration are evicted
# PreferNoSchedule: avoid scheduling if possible (soft)

# Pod tolerations
spec:
  tolerations:
  - key:      "key"
    operator: "Equal"    # Equal / Exists
    value:    "value"
    effect:   "NoSchedule"
  # Tolerate every taint (used by some system DaemonSets)
  - operator: "Exists"
  # NoExecute tolerations can set a time limit
  - key:      "node.kubernetes.io/not-ready"
    operator: "Exists"
    effect:   "NoExecute"
    tolerationSeconds: 300   # evicted after 300 seconds
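The toleration matching rules can be written down directly. A toleration matches a taint when all of the following hold:
- the effects agree (an empty toleration effect matches any effect);
- the keys agree (an empty key with operator Exists matches any key);
- either the operator is Exists, or the values are equal.

This is a minimal sketch with hypothetical class names. tolerationSeconds is carried but not used.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Taint:
    key: str
    value: str
    effect: str                       # NoSchedule / PreferNoSchedule / NoExecute


@dataclass
class Toleration:
    key: str = ""                     # "" with operator Exists matches every key
    operator: str = "Equal"           # Equal / Exists
    value: str = ""
    effect: str = ""                  # "" matches every effect
    toleration_seconds: Optional[int] = None   # NoExecute only; not modeled here


def tolerates(tol: Toleration, taint: Taint) -> bool:
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.key and tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        return True                   # value is ignored for Exists
    return tol.key == taint.key and tol.value == taint.value


def can_schedule(tolerations: list, taints: list) -> bool:
    """Hard check: every NoSchedule/NoExecute taint needs a matching
    toleration; PreferNoSchedule taints only lower the node's score."""
    return all(
        any(tolerates(t, taint) for t in tolerations)
        for taint in taints
        if taint.effect in ("NoSchedule", "NoExecute")
    )


if __name__ == "__main__":
    taint = Taint("key", "value", "NoSchedule")
    print(tolerates(Toleration("key", "Equal", "value", "NoSchedule"), taint))  # True
    print(can_schedule([], [taint]))                                            # False
```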

5.4 TopologySpreadConstraints

# Spread Pods evenly across zones and nodes
spec:
  topologySpreadConstraints:
  - maxSkew:           1           # max allowed difference in matching Pod count between topology domains
    topologyKey:       topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule   # DoNotSchedule / ScheduleAnyway
    labelSelector:
      matchLabels:
        app: myapp
  - maxSkew:           1
    topologyKey:       kubernetes.io/hostname
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: myapp
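maxSkew is checked per candidate domain. Under DoNotSchedule, placing the Pod in domain d is allowed only if:

(matching Pods in d + 1) - (minimum matching Pods over all eligible domains) ≤ maxSkew

The sketch takes the per-domain counts as given. It ignores minDomains, node inclusion policies, and nodes excluded by affinity. The helper name is hypothetical.

```python
def allowed_domains(counts: dict, max_skew: int) -> list:
    """Topology domains (e.g. zones) where one more matching Pod still
    satisfies maxSkew under whenUnsatisfiable: DoNotSchedule.

    counts maps domain -> number of Pods matching the labelSelector.
    """
    global_min = min(counts.values())
    return sorted(
        domain for domain, count in counts.items()
        if (count + 1) - global_min <= max_skew
    )


if __name__ == "__main__":
    print(allowed_domains({"zone-a": 2, "zone-b": 1, "zone-c": 1}, max_skew=1))
    # ['zone-b', 'zone-c']
```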

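5.5 PodDisruptionBudget

# Limits voluntary disruptions made through the Eviction API (e.g. kubectl drain during node maintenance) by keeping a minimum number of Pods available; Deployment rolling updates are governed by maxUnavailable instead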
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb
spec:
  minAvailable:   2     # keep at least 2 Pods available (or use maxUnavailable)
  # maxUnavailable: 1   # at most 1 Pod unavailable
  selector:
    matchLabels:
      app: myapp
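The disruption controller turns the budget into a number of allowed disruptions:
- desiredHealthy is minAvailable, or the expected Pod count minus maxUnavailable.
- Percentages are rounded up.
- An eviction is allowed only while currentHealthy - desiredHealthy is positive.

Below is a minimal sketch of that arithmetic, assuming all selected Pods belong to one controller. The helper names are hypothetical.

```python
from typing import Optional, Union

IntOrPercent = Union[int, str]   # e.g. 2 or "50%"


def _scaled(value: IntOrPercent, total: int) -> int:
    """Resolve an int-or-percent value against total, rounding percentages up."""
    if isinstance(value, str) and value.endswith("%"):
        pct = int(value[:-1])
        return (total * pct + 99) // 100    # integer ceil(total * pct / 100)
    return int(value)


def disruptions_allowed(current_healthy: int, expected_pods: int,
                        min_available: Optional[IntOrPercent] = None,
                        max_unavailable: Optional[IntOrPercent] = None) -> int:
    """How many more voluntary evictions the budget permits right now."""
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available / max_unavailable")
    if min_available is not None:
        desired_healthy = _scaled(min_available, expected_pods)
    else:
        desired_healthy = expected_pods - _scaled(max_unavailable, expected_pods)
    return max(0, current_healthy - desired_healthy)


if __name__ == "__main__":
    # 3 healthy replicas with minAvailable: 2 -> a drain may evict one Pod at a time.
    print(disruptions_allowed(3, 3, min_available=2))   # 1
```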

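6. Security

6.1 Authentication, Authorization, Admission Control

API request → Authentication → Authorization → Admission control → etcd

Authentication methods:

X.509 client certificates
Bearer tokens (ServiceAccount tokens / static tokens)
OpenID Connect (OIDC)
Webhook token authentication

Authorization modes (RBAC is the most common):

RBAC (role-based access control)
ABAC (attribute-based access control)
Node authorization
Webhook

Admission controllers:

MutatingWebhook: modifies objects (e.g. sidecar injection); runs before validation
ValidatingWebhook: validates objects
Built-in controllers such as LimitRanger, ResourceQuota, and PodSecurity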

6.2 RBAC

# ServiceAccount
apiVersion: v1
kind: ServiceAccount
metadata:
  name: myapp-sa
  namespace: default

---
# Role (namespace-scoped permissions)
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: default
rules:
- apiGroups: [""]             # "" means the core API group
  resources: ["pods", "pods/log", "pods/exec"]
  verbs:     ["get", "list", "watch"]
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs:     ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: [""]
  resources: ["configmaps"]
  resourceNames: ["app-config"]   # restrict access to this named object only
  verbs:     ["get"]

---
# RoleBinding (binds a Role to ServiceAccounts/Users/Groups)
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: default
subjects:
- kind:      ServiceAccount
  name:      myapp-sa
  namespace: default
- kind:      User
  name:      alice
  apiGroup:  rbac.authorization.k8s.io
roleRef:
  kind:     Role
  name:     pod-reader
  apiGroup: rbac.authorization.k8s.io

---
# ClusterRole (cluster-scoped permissions)
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: node-viewer
rules:
- apiGroups: [""]
  resources: ["nodes"]
  verbs:     ["get", "list", "watch"]
- nonResourceURLs: ["/metrics", "/healthz"]
  verbs:           ["get"]

---
# ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: node-viewer-binding
subjects:
- kind: ServiceAccount
  name: monitor-sa
  namespace: monitoring
roleRef:
  kind:     ClusterRole
  name:     node-viewer
  apiGroup: rbac.authorization.k8s.io
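RBAC is purely additive. A request is allowed if any rule in any bound Role/ClusterRole matches, and there are no deny rules.

Below is a simplified checker for one request against a list of rules, with hypothetical type names. Subresources such as pods/log are matched as separate resource strings. Wildcards are supported only as a bare "*".

```python
from dataclasses import dataclass, field


@dataclass
class PolicyRule:
    api_groups: list
    resources: list
    verbs: list
    resource_names: list = field(default_factory=list)   # empty = any object name


def _hit(allowed: list, value: str) -> bool:
    return "*" in allowed or value in allowed


def rule_allows(rule: PolicyRule, api_group: str, resource: str,
                verb: str, name: str = "") -> bool:
    if not (_hit(rule.api_groups, api_group)
            and _hit(rule.resources, resource)
            and _hit(rule.verbs, verb)):
        return False
    return not rule.resource_names or name in rule.resource_names


def allowed(rules: list, api_group: str, resource: str,
            verb: str, name: str = "") -> bool:
    """Granted if ANY rule matches; everything else is denied by default."""
    return any(rule_allows(r, api_group, resource, verb, name) for r in rules)


if __name__ == "__main__":
    pod_reader = [
        PolicyRule([""], ["pods", "pods/log", "pods/exec"], ["get", "list", "watch"]),
        PolicyRule([""], ["configmaps"], ["get"], ["app-config"]),
    ]
    print(allowed(pod_reader, "", "pods/log", "get"))                  # True
    print(allowed(pod_reader, "", "configmaps", "get", "other-config"))  # False
```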

6.3 PodSecurityContext

spec:
  securityContext:
    runAsUser:    1000         # run as this UID
    runAsGroup:   3000         # run as this primary GID
    runAsNonRoot: true         # refuse to start a container that would run as root
    fsGroup:      2000         # supplementary GID applied to mounted volumes
    seccompProfile:
      type: RuntimeDefault

  containers:
  - name: app
    securityContext:
      allowPrivilegeEscalation: false  # forbid privilege escalation
      readOnlyRootFilesystem: true     # read-only root filesystem
      capabilities:
        drop: ["ALL"]                  # drop all Linux capabilities
        add:  ["NET_BIND_SERVICE"]     # add back only what is needed
      privileged: false

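7. Monitoring and Logging

7.1 Monitoring Stack

Prometheus (metrics collection)
  ├── kube-state-metrics: metrics about Kubernetes objects
  ├── node-exporter: node/OS metrics
  ├── cAdvisor (built into the kubelet): container resource metrics
  └── application metrics (/metrics endpoint)

Grafana (dashboards)
AlertManager (alert routing and management)

# ServiceMonitor (used by the Prometheus Operator)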
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myapp-monitor
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: myapp
  endpoints:
  - port: metrics
    path: /metrics
    interval: 30s

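7.2 Logging

Container logs → stdout/stderr
         ↓ written by the container runtime to /var/log/pods/ (symlinked from /var/log/containers/)
         ↓ collected by a DaemonSet (Fluentd/Filebeat/Promtail)
         ↓ aggregated (Elasticsearch/Loki)
         ↓ visualized (Kibana/Grafana)

# Common stacks
EFK: Elasticsearch + Fluentd + Kibana
PLG: Promtail + Loki + Grafana (lighter weight, recommended)

# Viewing logs
kubectl logs pod-name -c container-name    # logs of a specific container
kubectl logs pod-name --previous           # logs of the previous (crashed) container instance
kubectl logs pod-name -f --tail=100        # follow, starting from the last 100 lines
kubectl logs -l app=myapp --all-containers # logs of all Pods matching a label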

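8. Helm

8.1 Basic Concepts

Chart: a Helm package (like an apt/pip package)
Release: an installed instance of a Chart in the cluster
Repository: a Chart repository
Values: a Chart's configuration parameters

# Add repositories
helm repo add stable https://charts.helm.sh/stable   # legacy "stable" repo (deprecated since 2020)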
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update

# Search
helm search repo nginx
helm search hub wordpress

# Install a Chart
helm install my-nginx bitnami/nginx
helm install my-nginx bitnami/nginx --namespace web --create-namespace
helm install my-nginx bitnami/nginx -f custom-values.yaml
helm install my-nginx bitnami/nginx --set replicaCount=3,image.tag=1.25

# Inspect
helm list -n web
helm status my-nginx
helm get values my-nginx
helm get manifest my-nginx

# Upgrade
helm upgrade my-nginx bitnami/nginx --set replicaCount=5
helm upgrade --install my-nginx bitnami/nginx    # install if the release does not exist yet

# Roll back
helm history my-nginx
helm rollback my-nginx 1

# Uninstall
helm uninstall my-nginx

8.2 Creating a Chart

helm create myapp
# Generated layout:
# myapp/
# ├── Chart.yaml          # Chart metadata
# ├── values.yaml         # default values
# ├── templates/          # Kubernetes manifest templates
# │   ├── deployment.yaml
# │   ├── service.yaml
# │   ├── ingress.yaml
# │   ├── hpa.yaml
# │   ├── serviceaccount.yaml
# │   ├── NOTES.txt       # post-install notes
# │   └── _helpers.tpl    # template helpers
# ├── charts/             # dependency Charts
# └── .helmignore

# values.yaml
replicaCount: 2
image:
  repository: myapp
  tag: "1.0.0"
  pullPolicy: IfNotPresent
service:
  type: ClusterIP
  port: 80
ingress:
  enabled: true
  host: myapp.example.com
resources:
  limits:
    cpu: 500m
    memory: 512Mi
  requests:
    cpu: 100m
    memory: 128Mi

# templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "myapp.fullname" . }}
  labels:
    {{- include "myapp.labels" . | nindent 4 }}
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      {{- include "myapp.selectorLabels" . | nindent 6 }}
  template:
    metadata:
      labels:
        {{- include "myapp.selectorLabels" . | nindent 8 }}
    spec:
      containers:
      - name: {{ .Chart.Name }}
        image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
        imagePullPolicy: {{ .Values.image.pullPolicy }}
        ports:
        - containerPort: 80
        resources:
          {{- toYaml .Values.resources | nindent 10 }}
        {{- if .Values.env }}
        env:
          {{- range .Values.env }}
          - name:  {{ .name }}
            value: {{ .value | quote }}
          {{- end }}
        {{- end }}

9. CI/CD

9.1 A Typical GitOps Flow

Developer runs git push
    ↓
CI pipeline (GitHub Actions / GitLab CI / Jenkins)
    ├── lint / test
    ├── build the image (docker build)
    ├── push the image (docker push)
    └── update Helm values or Kubernetes manifests (git commit)
    ↓
GitOps controller (ArgoCD / FluxCD)
    ├── watches the Git repository
    ├── compares the live cluster state with the desired state
    └── syncs automatically (equivalent to kubectl apply)
    ↓
Kubernetes cluster

9.2 ArgoCD Example

# ArgoCD Application
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/myorg/myapp-manifests
    targetRevision: main
    path: k8s/overlays/production      # Kustomize directory
    # Or point path at a Helm chart stored in the repo (e.g. path: charts/myapp) and add:
    # helm:
    #   valueFiles: [values-prod.yaml]
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune:    true     # delete live resources that were removed from Git
      selfHeal: true     # revert manual changes when the live state drifts from Git
    syncOptions:
    - CreateNamespace=true
    retry:
      limit: 3
      backoff:
        duration:    5s
        maxDuration: 3m

9.3 Kustomize

k8s/
├── base/                  # shared base configuration
│   ├── kustomization.yaml
│   ├── deployment.yaml
│   └── service.yaml
└── overlays/
    ├── staging/           # staging environment
    │   ├── kustomization.yaml
    │   └── patch-replicas.yaml
    └── production/        # production environment
        ├── kustomization.yaml
        └── patch-replicas.yaml

# base/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- deployment.yaml
- service.yaml
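labels:                    # replaces the deprecated commonLabels field
- pairs:
    app: myapp
  includeSelectors: true   # also apply the label to selectors, as commonLabels did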

# overlays/production/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:                 # replaces the deprecated bases field
- ../../base
namespace: production
images:
- name:    myapp
  newTag:  v2.0.0
patches:
- path: patch-replicas.yaml
configMapGenerator:
- name: app-config
  literals:
  - LOG_LEVEL=warn

# overlays/production/patch-replicas.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 5
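For comparison, the staging overlay from the directory tree could be a smaller variant of the production one. This is a sketch under assumptions: the `staging` namespace, the image tag and the replica count are made up, and it uses an inline JSON 6902 patch instead of a separate patch file to show the other patch style `patches` accepts. The `add` operation is used because it also overwrites an existing `replicas` value.

```yaml
# overlays/staging/kustomization.yaml (assumed contents)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- ../../base
namespace: staging
images:
- name: myapp
  newTag: v2.1.0-rc1        # hypothetical pre-release tag
patches:
- target:                   # which object the inline patch applies to
    kind: Deployment
    name: myapp
  patch: |-
    - op: add
      path: /spec/replicas
      value: 1
```

Either overlay can be rendered with `kubectl kustomize overlays/staging` to inspect the output, or applied directly with `kubectl apply -k overlays/staging`.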

10. Troubleshooting

10.1 Common Pod Problems

# Check Pod status and events
kubectl get pod myapp-xxx -o wide
kubectl get pods -n aas -o custom-columns='NAME:.metadata.name,CONTAINERS:.spec.containers[*].name,PORTS:.spec.containers[*].ports[*].containerPort,STATUS:.status.phase'
# quoted so the shell does not glob [*]; lists container ports per Pod, similar to docker ps

kubectl describe pod myapp-xxx     # key: read the Events and Conditions
kubectl logs myapp-xxx -c app      # logs of container "app"
kubectl logs myapp-xxx --previous  # logs of the previous (crashed) container instance

# Open a shell in a container (slim images may lack bash; fall back to sh)
kubectl exec -it myapp-xxx -c app -- bash
kubectl exec -it myapp-xxx -- sh

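# Attach an ephemeral debug container that shares the "app" container's process namespace (the original container is left untouched)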
kubectl debug -it myapp-xxx --image=busybox --target=app

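Pod status cheat sheet:

Status	Meaning	Where to look
`Pending`	Not scheduled yet	`describe` Events: insufficient resources / no node matches selectors or affinity / PVC not bound
`ContainerCreating`	Being created	Image pull failure / Volume mount failure
`CrashLoopBackOff`	Crashing repeatedly	Check `logs --previous`: the app fails at startup / OOM
`OOMKilled`	Out of memory (hit the memory limit)	Raise the memory limit or reduce the app's memory usage
`ImagePullBackOff`	Image pull failed	Wrong image name or tag / image does not exist / missing private registry credentials (imagePullSecrets)
`Evicted`	Evicted by the kubelet	Node under resource pressure (disk/memory)
`Terminating` (stuck)	Cannot terminate	Check finalizers and the node's kubelet; last resort: `kubectl delete pod xxx --force --grace-period=0`
`Unknown`	Node unreachable	Node network / kubelet problems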

10.2 Troubleshooting Workflow

# ---- Node problems ----
kubectl get nodes
kubectl describe node node-1
# check Conditions: Ready/MemoryPressure/DiskPressure/PIDPressure
# log in to the node
ssh node-1
systemctl status kubelet
journalctl -u kubelet -f --since="1 hour ago"
df -h              # disk usage
free -h            # memory usage
top                # CPU usage

# ---- Service unreachable ----
# check the Service
kubectl get svc myapp-svc
kubectl describe svc myapp-svc
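# check Endpoints (does the Service have any Pod IPs?)
kubectl get endpoints myapp-svc
kubectl get endpointslices -l kubernetes.io/service-name=myapp-svc   # same data via the EndpointSlice API
# if Endpoints is empty: check that the selector matches the Pod labels and that the Pods are Ready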
kubectl get pod -l app=myapp

# test connectivity from inside a Pod
kubectl exec -it debug-pod -- curl myapp-svc.default.svc.cluster.local
kubectl exec -it debug-pod -- nslookup myapp-svc
kubectl exec -it debug-pod -- wget -O- myapp-svc:80

# ---- DNS problems ----
kubectl get pods -n kube-system | grep coredns
kubectl logs -n kube-system coredns-xxx
# test DNS
kubectl exec -it debug-pod -- nslookup kubernetes.default
kubectl exec -it debug-pod -- cat /etc/resolv.conf

# ---- Network problems ----
# check kube-proxy
kubectl get pods -n kube-system | grep kube-proxy
kubectl logs -n kube-system kube-proxy-xxx
# check iptables rules (run on the node, iptables mode only; -n skips slow reverse DNS lookups)
iptables -t nat -L KUBE-SERVICES -n | grep myapp-svc

# ---- Resource quota problems ----
kubectl describe resourcequota -n production
kubectl describe limitrange -n production

# ---- etcd problems ----
kubectl get pods -n kube-system | grep etcd
kubectl exec -it etcd-master -n kube-system -- etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  endpoint health
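The Service and DNS checks above assume a Pod named `debug-pod` already exists. A minimal sketch of such a throwaway Pod follows; the `nicolaka/netshoot` image is an assumption (any image that ships curl, wget and nslookup works), as is running it in the `default` namespace next to `myapp-svc`.

```yaml
# debug-pod.yaml: throwaway Pod for in-cluster network and DNS checks (image choice is an assumption)
apiVersion: v1
kind: Pod
metadata:
  name: debug-pod
  namespace: default            # same namespace as myapp-svc, so the short Service name resolves
spec:
  restartPolicy: Never
  containers:
  - name: debug
    image: nicolaka/netshoot    # network troubleshooting image, assumed to include curl/wget/nslookup
    command: ["sleep", "3600"]  # keep the container running for an hour so kubectl exec can attach
```

Create it with `kubectl apply -f debug-pod.yaml`, run the checks, and remove it with `kubectl delete pod debug-pod`. A one-liner with the same effect is `kubectl run debug-pod --image=nicolaka/netshoot --restart=Never -- sleep 3600`.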

10.3 Performance Troubleshooting

# node and Pod resource usage (requires metrics-server)
kubectl top nodes
kubectl top pods --all-namespaces
kubectl top pod myapp-xxx --containers

# show requests and limits
kubectl describe pod myapp-xxx | grep -A 5 "Requests\|Limits"

# HPA status
kubectl get hpa
kubectl describe hpa myapp-hpa

# scheduling failures
kubectl get events --field-selector reason=FailedScheduling

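11. Frequently Asked Interview Questions

Q1: Describe the K8s architecture and the role of each component

Answer: K8s uses a Master-Worker (control plane / worker node) architecture.

Control Plane (master node):

kube-apiserver: the single entry point to the cluster; performs authentication, authorization and admission control, and every other component talks only to it
etcd: distributed key-value store that holds all cluster state; uses the Raft consensus protocol and is accessed only by the API Server
kube-scheduler: watches for unscheduled Pods and picks the best node through Filter → Score
kube-controller-manager: runs the built-in controllers, whose control loops reconcile the actual state toward the desired state

Worker Node:

kubelet: the agent on each node; manages the Pod lifecycle through CRI and CSI, with Pod networking set up through CNI
kube-proxy: maintains iptables/ipvs rules that load-balance Service traffic
Container Runtime: containerd/CRI-O, which actually creates and manages the containers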

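Q2: How does a Pod differ from a container, and why are Pods needed?

Answer:

A Pod is the smallest schedulable unit in K8s and can contain one or more containers
Containers in the same Pod share the network namespace (same IP and port space), storage Volumes, and the IPC namespace
Why Pods are needed:
1. Atomic scheduling: tightly coupled containers (e.g. an app plus a log-collecting sidecar) must be scheduled onto the same node together
2. Shared resources: the sidecar pattern needs shared network/storage, e.g. the Istio Envoy proxy
3. Init Containers: run initialization before the main containers start (waiting for dependencies, migrating data, etc.)
4. Pause container: the "root container" that holds the Pod's network namespace; its lifecycle matches the Pod's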
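A minimal sketch tying these points together: one Pod with an init container that waits for a dependency, an app container, and a log-shipping sidecar that reads the app's log files through a shared `emptyDir` volume. The names `web-with-sidecar` and `mysql-svc` are hypothetical, and the images are illustrative choices.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-with-sidecar              # hypothetical name
spec:
  initContainers:
  - name: wait-for-db                 # must exit successfully before the app containers start
    image: busybox:1.28
    command: ["sh", "-c", "until nslookup mysql-svc; do echo waiting for mysql-svc; sleep 2; done"]
  containers:
  - name: app
    image: nginx:1.25
    volumeMounts:
    - name: logs
      mountPath: /var/log/nginx       # nginx writes access.log/error.log into the shared volume
  - name: log-shipper                 # sidecar: same Pod IP, same volume
    image: busybox:1.28
    command: ["sh", "-c", "tail -F /logs/access.log"]
    volumeMounts:
    - name: logs
      mountPath: /logs
  volumes:
  - name: logs
    emptyDir: {}                      # scratch volume that lives exactly as long as the Pod
```

Since 1.29 (beta, enabled by default) a sidecar can instead be declared under `initContainers` with `restartPolicy: Always`, which makes it start before and keep running alongside the main container.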

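Q3: How does a Deployment differ from a StatefulSet?

Aspect	Deployment	StatefulSet
Use case	Stateless apps (web, API)	Stateful apps (databases, message queues, ZooKeeper)
Pod name	Random suffix (pod-xkd8f)	Stable ordinal (pod-0, pod-1)
Network identity	Reached through a shared Service ClusterIP	Each Pod gets its own stable DNS name (requires a Headless Service)
Storage	All Pods share one PVC (or none)	Each Pod gets its own PVC (not shared)
Startup order	Started in parallel	Started in ordinal order (0→1→2)
Update strategy	Rolling update (several Pods at once)	Updated one at a time, starting from the highest ordinal
After a Pod is deleted	A new Pod with a new random name is created	A Pod with the same name is recreated and mounts the same PVC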
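A minimal sketch of the StatefulSet side of the table: a headless Service plus a StatefulSet whose `volumeClaimTemplates` give every replica its own PVC. PostgreSQL and the plain-text password are illustrative assumptions only; the three replicas are independent instances, and replication is out of scope here.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: db
spec:
  clusterIP: None               # headless: DNS returns per-Pod records such as db-0.db.<namespace>.svc.cluster.local
  selector:
    app: db
  ports:
  - port: 5432
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
spec:
  serviceName: db               # must name the headless Service above
  replicas: 3                   # Pods db-0, db-1, db-2, created in that order
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
      - name: postgres
        image: postgres:16
        env:
        - name: POSTGRES_PASSWORD
          value: example        # demo only; use a Secret in practice
        - name: PGDATA
          value: /var/lib/postgresql/data/pgdata   # subdirectory avoids lost+found at the volume root
        ports:
        - containerPort: 5432
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:         # one PVC per Pod: data-db-0, data-db-1, data-db-2
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi
```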

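Q4: What Service types exist and how do they differ?

Answer:

ClusterIP (default): reachable only inside the cluster through an allocated virtual IP
NodePort: on top of ClusterIP, opens a port (30000-32767 by default) on every Node; external clients connect to NodeIP:NodePort
LoadBalancer: on top of NodePort, asks the cloud provider for an external load balancer with a public IP
ExternalName: resolves to an external domain name via a CNAME record; no proxying
Headless (clusterIP: None): no VIP; DNS returns the Pod IPs directly; used by StatefulSets and for service discovery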
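A sketch of two of these types, assuming the `myapp` Pods listen on port 8080 and using made-up names (`myapp-nodeport`, `legacy-db`, `db.example.com`). Changing `type: NodePort` to `type: LoadBalancer` in the first manifest is enough to request a cloud load balancer on top of the same ports.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: myapp-nodeport
spec:
  type: NodePort
  selector:
    app: myapp
  ports:
  - port: 80            # ClusterIP port used inside the cluster
    targetPort: 8080    # containerPort on the Pods (assumed)
    nodePort: 30080     # must fall in the NodePort range; omit it to let K8s pick one
---
apiVersion: v1
kind: Service
metadata:
  name: legacy-db
spec:
  type: ExternalName
  externalName: db.example.com   # cluster DNS answers legacy-db lookups with a CNAME to this name
```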

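Q5: How do kube-proxy's iptables and ipvs modes differ?

Aspect	iptables	ipvs
Implementation	Chains of iptables rules	Linux kernel IPVS module
Lookup complexity	O(n): slows down as rules grow	O(1): hash-table lookup
Load-balancing algorithm	Random	RR/WRR/LC/SH/DH and others
Scale	Performance degrades beyond roughly 1000 Services	Handles large clusters
Connection tracking	conntrack	conntrack (can be disabled)
Typical choice	Small and medium clusters	Recommended for production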
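Switching modes is a kube-proxy configuration change. A minimal fragment is sketched below; on kubeadm clusters it lives in the `kube-proxy` ConfigMap in `kube-system`, and the scheduler choice is just an example.

```yaml
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"          # "iptables" is the default on Linux
ipvs:
  scheduler: "rr"     # e.g. rr, wrr, lc, sh, dh; an empty value means rr
```

The nodes need the `ip_vs` kernel modules loaded. After editing the ConfigMap, restart kube-proxy with `kubectl -n kube-system rollout restart daemonset kube-proxy`, and inspect the resulting virtual servers on a node with `ipvsadm -Ln`.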

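Q6: Describe the K8s scheduling process. How can you influence scheduling decisions?

Answer: scheduling runs Filter → Score → Bind

Ways to influence scheduling:

nodeSelector: the simplest option; exact match on node labels
nodeAffinity: node affinity with hard (required) and soft (preferred) rules
podAffinity/podAntiAffinity: affinity/anti-affinity between Pods, used to co-locate or spread them
Taints/Tolerations: taint nodes to reserve them for specific workloads; only Pods that tolerate the taint can be scheduled there
Resource requests: Pods declare what they need via requests, and the scheduler checks each node's remaining allocatable resources
topologySpreadConstraints: spread Pods evenly across zones/nodes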
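A sketch combining most of these knobs in one Deployment. The `disktype=ssd` label matches the `kubectl label node` example in the cheat sheet; the `dedicated=web:NoSchedule` taint is a hypothetical one that would be added with `kubectl taint nodes node-1 dedicated=web:NoSchedule`.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 4
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:    # hard rule: SSD nodes only
            nodeSelectorTerms:
            - matchExpressions:
              - key: disktype
                operator: In
                values: ["ssd"]
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:   # soft rule: avoid two replicas on one node
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: myapp
              topologyKey: kubernetes.io/hostname
      tolerations:                                           # allowed onto nodes tainted dedicated=web:NoSchedule
      - key: dedicated
        operator: Equal
        value: web
        effect: NoSchedule
      topologySpreadConstraints:                             # try to keep zones within 1 Pod of each other
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app: myapp
      containers:
      - name: app
        image: myapp:1.0.0
        resources:
          requests:            # the scheduler filters nodes on requests, not limits
            cpu: 100m
            memory: 128Mi
```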

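Q7: What are the differences between Liveness, Readiness and Startup probes?

Probe	On failure	Use case
startupProbe	Container is restarted	Slow-starting apps; keeps liveness from killing them before startup completes
livenessProbe	Container is restarted	Detect deadlocked/hung apps and trigger self-healing
readinessProbe	Pod is removed from Service Endpoints	Detect whether the app can take traffic (still starting / dependency unavailable)

Key differences:

Liveness fails → container is killed → restarted (subject to restartPolicy)
Readiness fails → container is not killed → it just stops receiving traffic
Until the startup probe succeeds → liveness and readiness probes do not run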
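A container-spec fragment showing all three probes together. The `/healthz` and `/ready` endpoints and port 8080 are hypothetical; the thresholds are example values.

```yaml
containers:
- name: app
  image: myapp:1.0.0
  ports:
  - containerPort: 8080
  startupProbe:              # allows up to 30 × 10s = 300s for startup before liveness takes over
    httpGet:
      path: /healthz
      port: 8080
    periodSeconds: 10
    failureThreshold: 30
  livenessProbe:             # restart after 3 consecutive failures (about 30s)
    httpGet:
      path: /healthz
      port: 8080
    periodSeconds: 10
    failureThreshold: 3
  readinessProbe:            # drop out of Endpoints while e.g. a dependency is unavailable
    httpGet:
      path: /ready
      port: 8080
    periodSeconds: 5
    failureThreshold: 3
```

Using a separate readiness endpoint lets the app report "alive but not ready" without triggering a restart.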

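Q8: How do ConfigMap and Secret differ? Are Secrets secure?

Answer:

ConfigMaps hold non-sensitive configuration; Secrets hold sensitive data (passwords/tokens/certificates)
Secret volumes are mounted into Pods on tmpfs (never written to the node's disk); ConfigMap volumes are mounted as ordinary files
Secrets are not really encrypted, only base64-encoded (anyone can decode them), and by default they are stored unencrypted in etcd
Ways to harden Secrets:
1. Enable encryption at rest in the API server with an EncryptionConfiguration (aescbc/aesgcm/secretbox, or a KMS provider)
2. Use Vault with the Secrets Store CSI Driver
3. Use Sealed Secrets (encrypt before committing to Git)
4. Restrict access to Secrets with RBAC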
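A minimal sketch of option 1. The file path is an assumption, and the key is a placeholder to be replaced with 32 random bytes, base64-encoded (for example from `head -c 32 /dev/urandom | base64`). The file is passed to kube-apiserver with `--encryption-provider-config`.

```yaml
# /etc/kubernetes/enc/encryption-config.yaml (assumed path)
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
- resources:
  - secrets
  providers:
  - aescbc:                 # the first provider encrypts all new writes
      keys:
      - name: key1
        secret: <BASE64_ENCODED_32_BYTE_KEY>   # placeholder, not a real key
  - identity: {}            # still lets the API server read Secrets written before encryption was enabled
```

On kubeadm clusters the file must also be mounted into the kube-apiserver static Pod. Existing Secrets stay unencrypted until they are rewritten, e.g. with `kubectl get secrets --all-namespaces -o json | kubectl replace -f -`.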

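Q9: How are PV, PVC and StorageClass related?

Answer:

StorageClass: defines a storage type and its parameters (e.g. cloud disk type); enables dynamic provisioning
PV (PersistentVolume): a cluster-scoped storage resource, created manually by an administrator or dynamically through a StorageClass
PVC (PersistentVolumeClaim): a namespaced request for storage; developers only declare how much space and which access mode they need

Binding flow:

PVC created → K8s looks for an existing PV that satisfies it → bound
If the PVC uses a StorageClass → a PV is created automatically (dynamic provisioning) → bound
Pod mounts the PVC → data persists beyond the Pod's lifetime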
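The dynamic-provisioning path as three manifests. The provisioner `ebs.csi.aws.com` and its `type: gp3` parameter assume the AWS EBS CSI driver; other clusters would substitute their own CSI driver and parameters. All object names are illustrative.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com              # assumption: AWS EBS CSI driver is installed
parameters:
  type: gp3
reclaimPolicy: Delete                     # delete the backing disk when the PVC is deleted
volumeBindingMode: WaitForFirstConsumer   # provision only after a Pod using the PVC is scheduled
allowVolumeExpansion: true
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
spec:
  storageClassName: fast-ssd
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 20Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
  - name: app
    image: nginx:1.25
    volumeMounts:
    - name: data
      mountPath: /usr/share/nginx/html
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: data                     # PVC and Pod must be in the same namespace
```

With `WaitForFirstConsumer`, `kubectl get pvc data` shows `Pending` until the Pod is scheduled, which lets the disk be created in the Pod's zone.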

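Q10: What is the difference between Role and ClusterRole in RBAC?

Answer:

Role + RoleBinding: namespaced; effective only in the specified Namespace
ClusterRole + ClusterRoleBinding: cluster-wide; applies to all Namespaces and to cluster-scoped resources (Node/PV)
ClusterRole + RoleBinding (mixed use): grants a ClusterRole's permissions within a single Namespace only, so one permission definition can be reused across Namespaces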
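A sketch of the mixed pattern: a reusable ClusterRole granted to a ServiceAccount in one Namespace only. The `myapp-sa` ServiceAccount matches the `kubectl auth can-i` example in the cheat sheet; the role and binding names are made up.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: pod-reader              # defined once, reusable from any namespace
rules:
- apiGroups: [""]               # "" is the core API group (pods, services, ...)
  resources: ["pods", "pods/log"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding               # namespaced binding: permissions apply only in "production"
metadata:
  name: myapp-sa-pod-reader
  namespace: production
subjects:
- kind: ServiceAccount
  name: myapp-sa
  namespace: default
roleRef:
  kind: ClusterRole
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```

After applying it, `kubectl auth can-i list pods -n production --as=system:serviceaccount:default:myapp-sa` should answer yes, while the same check against any other namespace should answer no.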

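Q11: What is an Operator? What is a CRD?

Answer:

CRD (CustomResourceDefinition): defines a custom resource type that extends the K8s API, e.g. Kafka, MySQLCluster, Certificate
Operator: a custom controller combined with CRDs; it watches custom resources and runs operational logic (automatic scaling, backups, failure recovery)
Typical Operators: Prometheus Operator, MySQL Operator, cert-manager, Strimzi (Kafka)

# list the CRDs installed in the cluster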
kubectl get crd
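A minimal CRD plus one custom resource, using a hypothetical `Backup` kind in a made-up `example.com` group. On its own the CRD only teaches the API server to store and validate `Backup` objects; an Operator would be the controller that watches them and actually runs backups.

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: backups.example.com     # must be <plural>.<group>
spec:
  group: example.com
  scope: Namespaced
  names:
    plural: backups
    singular: backup
    kind: Backup
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              schedule:
                type: string    # e.g. a cron expression
              retention:
                type: integer   # number of backups to keep
---
apiVersion: example.com/v1
kind: Backup
metadata:
  name: nightly
spec:
  schedule: "0 2 * * *"
  retention: 7
```

Apply the CRD first and wait for it to be established before creating `Backup` objects; afterwards `kubectl get backups` works like any built-in resource.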

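Q12: How do you achieve zero-downtime deployments?

Answer: several mechanisms have to work together:

Deployment rolling update: maxUnavailable: 0 + maxSurge: 1, so the number of available Pods never drops below the desired replica count
readinessProbe: a new Pod joins the load balancer only once it is ready, and only then is an old Pod removed
PodDisruptionBudget: guarantees a minimum number of available replicas during voluntary disruptions such as node drains
preStop hook + terminationGracePeriodSeconds: the old Pod first stops receiving traffic (it is removed from Endpoints), then waits for in-flight requests to finish (graceful shutdown)
Service layer: Endpoint removal propagates asynchronously (to kube-proxy, ingress controllers, etc.), so the preStop hook should sleep a few seconds before the application starts shutting down
Multiple replicas: run at least 2 (a single replica cannot guarantee zero downtime, e.g. during a node drain)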
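The checklist above as manifests, in sketch form. The `/ready` endpoint, port 8080 and the timing values are assumptions to tune per application, and the `preStop` exec hook assumes the image contains a `sleep` binary.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0      # never drop below 3 available Pods during a rollout
      maxSurge: 1            # bring up one extra Pod at a time
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      terminationGracePeriodSeconds: 60    # must cover the preStop sleep plus request draining
      containers:
      - name: app
        image: myapp:1.0.0
        readinessProbe:
          httpGet:
            path: /ready     # hypothetical endpoint
            port: 8080
          periodSeconds: 5
        lifecycle:
          preStop:
            exec:
              command: ["sleep", "10"]     # give Endpoints/kube-proxy/ingress time to drop the Pod before SIGTERM
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb
spec:
  minAvailable: 2            # with 3 healthy replicas, a drain may evict only one at a time
  selector:
    matchLabels:
      app: myapp
```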

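Q13: How does HPA work?

Answer: the HPA control loop works as follows:

Every 15s (default sync period) it fetches Pod metrics from Metrics Server (or a custom/external metrics API)
It compares the current value with the target: desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue); changes within the default 10% tolerance are ignored
Stabilization windows prevent flapping: by default scale-down considers the last 300s (5 min) of recommendations before removing Pods, while scale-up has no window (0s) and is limited only by rate policies; both are configurable under spec.behavior
The replica count always stays between minReplicas and maxReplicas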
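An `autoscaling/v2` HPA sketch equivalent to `kubectl autoscale deploy myapp --min=2 --max=10 --cpu-percent=70`, with the `behavior` section spelled out using the documented default values, so it can be used as a starting point for tuning. CPU utilization is measured against the Pods' CPU requests, so the target Deployment must set them.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70     # percent of the Pods' CPU requests
  behavior:                        # default values, written out explicitly
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent              # at most double the replicas every 15s...
        value: 100
        periodSeconds: 15
      - type: Pods                 # ...or add 4 Pods every 15s
        value: 4
        periodSeconds: 15
      selectPolicy: Max            # whichever of the two allows the larger change
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
```

Worked example: 4 replicas averaging 105% of their CPU requests against the 70% target give ceil(4 × 105 / 70) = ceil(6.0) = 6 replicas, which the default scale-up policies allow in a single step.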

12. kubectl Cheat Sheet

# ---- Basics ----
kubectl get nodes -o wide
kubectl get pods -n kube-system
kubectl get all -n production
kubectl get pod -l app=myapp,env=prod
kubectl get pod myapp-xxx -o yaml
kubectl get pod myapp-xxx -o json
kubectl get pod myapp-xxx -o jsonpath='{.status.podIP}'
kubectl describe pod myapp-xxx
kubectl delete pod myapp-xxx --grace-period=0 --force   # force delete (skips graceful shutdown)

# ---- Create / update ----
kubectl apply -f deploy.yaml
kubectl apply -f ./k8s/          # apply every manifest in the directory
kubectl create -f pod.yaml
kubectl replace -f pod.yaml      # replace the live object (add --force to delete and recreate it)
kubectl patch deploy myapp -p '{"spec":{"replicas":5}}'
kubectl edit deploy myapp        # edit the live object in your editor

# ---- Debugging ----
kubectl exec -it pod-xxx -- bash
kubectl exec pod-xxx -c sidecar -- ls /tmp
kubectl logs pod-xxx -f --tail=50
kubectl logs -l app=myapp --all-containers=true
kubectl port-forward pod/myapp-xxx 8080:80
kubectl port-forward svc/myapp-svc 8080:80

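# ---- Images (containerd, run on the node) ----
sudo ctr -n k8s.io containers ls | grep aas-server      # find the container in containerd's k8s.io namespace
sudo ctr -n k8s.io images import /tmp/your-image.tar    # import an image tarball (e.g. from docker save) so the kubelet can use it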

# ---- Scaling / rollback ----
kubectl scale deploy myapp --replicas=5
kubectl autoscale deploy myapp --min=2 --max=10 --cpu-percent=70
kubectl rollout status deploy/myapp
kubectl rollout history deploy/myapp
kubectl rollout undo deploy/myapp --to-revision=2
kubectl rollout restart deployment/aas-server -n aas  # recreate all Pods (e.g. to pick up changed ConfigMaps/Secrets)
kubectl set image deploy/myapp app=myapp:v2.0.0

# ---- Resources ----
kubectl top nodes
kubectl top pods --all-namespaces --sort-by=cpu
kubectl resource-capacity          # requires the resource-capacity krew plugin

# ---- Namespaces ----
kubectl config set-context --current --namespace=production   # switch the default namespace
kubectl get pods -A              # all namespaces

# ---- Labels / annotations ----
kubectl label pod myapp-xxx env=prod
kubectl annotate pod myapp-xxx description="test pod"
kubectl label node node-1 disktype=ssd

# ---- Secret / ConfigMap ----
kubectl create secret generic mysecret --from-literal=key=value
kubectl create configmap myconfig --from-file=./config/

# ---- Permissions (RBAC checks) ----
kubectl auth can-i create pods --namespace=production
kubectl auth can-i create pods --as=system:serviceaccount:default:myapp-sa

# ---- Cluster info ----
kubectl cluster-info
kubectl version
kubectl api-resources        # all resource types
kubectl api-versions         # all API versions
kubectl explain pod.spec.containers.resources   # field documentation

References

K8s official documentation: https://kubernetes.io/docs/
K8s source code: https://github.com/kubernetes/kubernetes
CKAD exam: https://training.linuxfoundation.org/certification/certified-kubernetes-application-developer-ckad/
CKA exam: https://training.linuxfoundation.org/certification/certified-kubernetes-administrator-cka/
Helm documentation: https://helm.sh/docs/
ArgoCD documentation: https://argo-cd.readthedocs.io/