Kubernetes 1.36 "Haru" Deep Dive: DRA Production-Ready, Ingress NGINX Retired, and a New Era for Cloud-Native Architecture

2026-04-26 06:42:20 +0800 CST


Introduction: Another Leap for Cloud-Native Infrastructure

On April 22, 2026, the Kubernetes community released v1.36, code-named "ハル" (Haru, "clear sky"). Behind the poetic name lies a set of major changes to cloud-native infrastructure. As the first major release of 2026, Kubernetes 1.36 brings the Dynamic Resource Allocation (DRA) ecosystem to production readiness and marks the formal retirement of the Ingress NGINX project, a change that will affect more than 50% of cloud-native environments worldwide.

For engineers building or operating production-grade Kubernetes clusters, 1.36 is both an opportunity and a challenge. This article examines the release along four dimensions: architectural design, core features, migration strategy, and hands-on code, helping readers understand the new capabilities while planning a sensible upgrade path.


Chapter 1: Kubernetes 1.36 at a Glance

1.1 Release Cycle and Key Milestones

The Kubernetes project ships a major release roughly every four months. The road to v1.36:

| Date | Milestone |
| --- | --- |
| 2026-01-12 | Release cycle begins |
| 2026-02-11 | Enhancement Freeze |
| 2026-03-18 | Code Freeze |
| 2026-04-08 | Docs Freeze |
| 2026-04-22 | v1.36 released |

1.2 Core Change Matrix

This release touches several layers. The major changes can be grouped as follows:

Breaking changes

  • The gitRepo volume driver is permanently disabled
  • The Ingress NGINX project is formally retired
  • The Service.spec.externalIPs field is deprecated

Major features

  • Dynamic Resource Allocation (DRA) reaches GA (Generally Available)
  • Optimized SELinux volume label mounting
  • External signing mechanism for ServiceAccount tokens

Performance and stability improvements

  • Improved node lifecycle management
  • Scheduler performance gains
  • Lower API server response latency

1.3 The Culture Behind the Release Logo

The release logo was created by artist Natsuho Ide (avocadoneko), inspired by Hokusai's Thirty-six Views of Mount Fuji. The choice is no accident: just as Hokusai depicted Mount Fuji from 36 different vantage points, Kubernetes 1.36 offers a fresh perspective on cloud-native resource management. The code name "Haru" (spring / clear sky) evokes renewal in the cloud-native ecosystem and hints at the clearer outlook that features like DRA bring to cluster resource management.


Chapter 2: DRA Dynamic Resource Allocation: From Experiment to Production

2.1 Why DRA?

In the traditional Kubernetes resource model, CPU and memory are the only first-class schedulable resources. With the rise of AI/ML workloads, GPU-accelerated computing, and specialized hardware such as FPGAs and TPUs, this two-dimensional model no longer meets the needs of complex applications.

Limitations of the traditional model:

# Traditional approach: only CPU and memory can be declared via limits/requests
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: gpu-workload
    image: cuda-app:latest
    resources:
      limits:
        cpu: "4"
        memory: "16Gi"
        # no way to declare the GPU requirement precisely

In the configuration above, GPU resources can only be exposed through a vendor-specific Device Plugin, which leads to:

  1. Resource fragmentation: GPUs cannot be partitioned and scheduled as finely as CPU
  2. Scheduling black box: the scheduler has no visibility into GPU topology (NVLink, PCIe layout)
  3. Rigid allocation: resources are fixed at Pod creation time and cannot be adjusted dynamically

2.2 DRA Architecture in Depth

DRA introduces a brand-new resource abstraction layer. Its core components:

ResourceClaim

apiVersion: resource.k8s.io/v1alpha3
kind: ResourceClaim
metadata:
  name: gpu-claim-for-training
spec:
  resourceClassName: gpu.nvidia.com
  parameters:
    apiVersion: gpu.nvidia.com/v1alpha1
    kind: GPUAllocationParameters
    profile: "training"
    memory: "40Gi"
    count: 2
    topologyPreference: "nvlink-paired"

ResourceClass

apiVersion: resource.k8s.io/v1alpha3
kind: ResourceClass
metadata:
  name: gpu.nvidia.com
spec:
  driverName: gpu.nvidia.com
  parameters:
    apiVersion: gpu.nvidia.com/v1alpha1
    kind: GPUClassParameters
  structuredParameters:
    nodeSelector:
      node.kubernetes.io/instance-type: "p4d.24xlarge"

ResourceSlice

apiVersion: resource.k8s.io/v1alpha3
kind: ResourceSlice
metadata:
  name: node-gpu-slice
spec:
  driverName: gpu.nvidia.com
  nodeName: worker-node-1
  pool:
    name: gpu-pool-a
    generation: 1
    resourceSliceCount: 4
  devices:
  - name: gpu-0
    basic:
      capacity:
        memory: 40960Mi
        compute: 100
      taints:
      - key: gpu.nvidia.com/health
        value: "healthy"
        effect: NoSchedule
  - name: gpu-1
    basic:
      capacity:
        memory: 40960Mi
        compute: 100

2.3 DRA Features Reaching GA in 1.36

Kubernetes 1.36 promotes the following DRA capabilities to GA:

1. Device taints and tolerations

Analogous to node taints, DRA devices now support taints, letting administrators mark device state:

apiVersion: resource.k8s.io/v1alpha3
kind: ResourceSlice
spec:
  devices:
  - name: gpu-2
    basic:
      capacity:
        memory: 40960Mi
      taints:
      - key: maintenance.scheduled
        value: "2026-05-01"
        effect: NoSchedule
      - key: gpu.temperature
        value: "high"
        effect: PreferNoSchedule

Pods can declare tolerations to accept devices in particular states:

apiVersion: v1
kind: Pod
spec:
  resourceClaims:
  - name: gpu
    resourceClaimTemplateName: gpu-claim
  tolerations:
  - key: gpu.temperature
    operator: Exists
    effect: PreferNoSchedule

2. Partitionable devices

This is one of DRA's most transformative features: a single physical device (such as a GPU) can be divided into multiple logical partitions consumed by different Pods:

apiVersion: resource.k8s.io/v1alpha3
kind: ResourceSlice
spec:
  devices:
  - name: a100-40gb
    basic:
      capacity:
        memory: 40960Mi
        compute: 100
      partitioned:
        - name: a100-40gb-part-0
          capacity:
            memory: 10240Mi
            compute: 25
        - name: a100-40gb-part-1
          capacity:
            memory: 10240Mi
            compute: 25
        - name: a100-40gb-part-2
          capacity:
            memory: 20480Mi
            compute: 50

3. Structured parameters

1.36 introduces structured resource parameter definitions, allowing the scheduler to reason about resource constraints:

// Parameter structure implemented by a DRA driver
type GPUAllocationParameters struct {
    Profile string `json:"profile"`
    Memory  string `json:"memory"`
    Count   int    `json:"count"`
    TopologyPreference string `json:"topologyPreference"`

    // New: affinity constraints
    Affinity *DeviceAffinity `json:"affinity,omitempty"`
    // New: anti-affinity constraints
    AntiAffinity *DeviceAntiAffinity `json:"antiAffinity,omitempty"`
}

type DeviceAffinity struct {
    RequiredDuringScheduling []DeviceSelector `json:"requiredDuringScheduling"`
}

type DeviceSelector struct {
    MatchLabels map[string]string `json:"matchLabels"`
    MatchExpressions []DeviceRequirement `json:"matchExpressions"`
}

2.4 Hands-On: Building a DRA-Backed AI Training Platform

The following end-to-end DRA example shows how to provision GPU resources for a distributed AI training workload:

Step 1: Deploy the NVIDIA DRA driver

# Install the DRA driver components
kubectl apply -f https://github.com/NVIDIA/k8s-dra-driver/releases/download/v0.2.0/nvidia-dra-driver.yaml

# Verify the driver is running
kubectl get pods -n nvidia-dra-driver
kubectl get resourceslices.resource.k8s.io

Step 2: Define GPU resource classes

# gpu-resource-class.yaml
apiVersion: resource.k8s.io/v1alpha3
kind: ResourceClass
metadata:
  name: nvidia-gpu-a100
spec:
  driverName: gpu.nvidia.com
  parameters:
    apiVersion: gpu.nvidia.com/v1alpha1
    kind: GPUClassParameters
    defaultProfile: "training"
  structuredParameters:
    nodeSelector:
      accelerator: nvidia-a100
---
apiVersion: resource.k8s.io/v1alpha3
kind: ResourceClass
metadata:
  name: nvidia-gpu-h100
spec:
  driverName: gpu.nvidia.com
  parameters:
    apiVersion: gpu.nvidia.com/v1alpha1
    kind: GPUClassParameters
    defaultProfile: "inference"
  structuredParameters:
    nodeSelector:
      accelerator: nvidia-h100

Step 3: Create a ResourceClaim template

# gpu-claim-template.yaml
apiVersion: resource.k8s.io/v1alpha3
kind: ResourceClaimTemplate
metadata:
  name: distributed-training-gpu
  namespace: ai-workloads
spec:
  spec:
    resourceClassName: nvidia-gpu-a100
    parameters:
      apiVersion: gpu.nvidia.com/v1alpha1
      kind: GPUAllocationParameters
      profile: "distributed-training"
      count: 4
      topologyPreference: "nvlink-fully-connected"
      networkInterface: "rdma"

Step 4: Deploy the distributed training Job

# distributed-training-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: llm-pretraining
  namespace: ai-workloads
spec:
  parallelism: 4
  completionMode: Indexed
  template:
    spec:
      resourceClaims:
      - name: training-gpus
        resourceClaimTemplateName: distributed-training-gpu
      containers:
      - name: trainer
        image: pytorch-distributed:latest
        command:
        - torchrun
        - --nproc_per_node=4
        - --nnodes=4
        - --node_rank=$(JOB_COMPLETION_INDEX)
        - train.py
        resources:
          claims:
          - name: training-gpus
        env:
        - name: NCCL_DEBUG
          value: "INFO"
        - name: NCCL_IB_DISABLE
          value: "0"
        volumeMounts:
        - name: shm
          mountPath: /dev/shm
      volumes:
      - name: shm
        emptyDir:
          medium: Memory
          sizeLimit: 100Gi
      restartPolicy: Never

Step 5: Monitor DRA resource allocation

# Inspect ResourceClaim status
kubectl get resourceclaims -n ai-workloads

# Inspect device allocation details
kubectl describe resourceclaim training-gpus-xxx -n ai-workloads

# Inspect the ResourceSlice inventory
kubectl get resourceslices -o yaml | grep -A 20 "devices:"

2.5 DRA Performance Tuning and Best Practices

Scheduler performance tuning:

# Optimized kube-scheduler configuration
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
- schedulerName: dra-aware-scheduler
  plugins:
    filter:
      enabled:
      - name: DRAFilter
        weight: 100
    score:
      enabled:
      - name: DRATopologyAware
        weight: 50
  pluginConfig:
  - name: DRAFilter
    args:
      cacheTimeout: 30s
      maxConcurrentEvaluations: 100

Resource reservation strategy:

apiVersion: resource.k8s.io/v1alpha3
kind: ResourceClaim
metadata:
  name: reserved-gpu-pool
spec:
  resourceClassName: nvidia-gpu-a100
  allocationMode: WaitForFirstConsumer
  parameters:
    apiVersion: gpu.nvidia.com/v1alpha1
    kind: GPUAllocationParameters
    reservationPolicy:
      type: "Pool"
      poolSize: 8
      expirationTime: "2h"

Chapter 3: Ingress NGINX Retirement: Migration Strategies and Alternatives

3.1 Background and Impact Assessment

On March 24, 2026, Kubernetes SIG Network formally announced the retirement of the Ingress NGINX project. The decision stems from a long-standing security maintenance burden and architectural limitations:

Retirement timeline:

  • November 2025: retirement announcement published
  • March 2026: project formally archived
  • April 2026 (Kubernetes 1.36): related code removed from the core repositories

Impact assessment:

According to the CNCF 2026 annual survey, roughly 50% of production Kubernetes environments still run Ingress NGINX. For those environments, migration has become urgent.

# Check whether the current cluster uses Ingress NGINX
kubectl get pods --all-namespaces -l app.kubernetes.io/name=ingress-nginx

# List the IngressClass used by each Ingress resource
kubectl get ingress --all-namespaces -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.ingressClassName}{"\n"}{end}'

3.2 Comparing the Mainstream Alternatives

| Solution | Architecture | Performance | Features | Migration effort |
| --- | --- | --- | --- | --- |
| Gateway API + Envoy | next-generation standard | | rich | |
| Traefik | cloud-native | | | moderate |
| HAProxy Ingress | traditional, mature | | rich | |
| Cilium Gateway | eBPF-accelerated | very high | | moderate |
| NGINX Ingress (commercial) | enterprise-grade | | extremely rich | |

3.3 Gateway API: The Future Kubernetes Standard

Gateway API is the community's next-generation replacement for Ingress, and it is fully stable as of 1.36:

Core resource model:

# GatewayClass identifies the infrastructure provider
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: envoy-gw-class
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
  parametersRef:
    group: gateway.envoyproxy.io
    kind: EnvoyProxy
    name: custom-proxy-config
    namespace: envoy-gateway-system
---
# Gateway defines the listener configuration
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: production-gateway
  namespace: ingress
spec:
  gatewayClassName: envoy-gw-class
  listeners:
  - name: https-public
    protocol: HTTPS
    port: 443
    hostname: "*.example.com"
    tls:
      mode: Terminate
      certificateRefs:
      - name: wildcard-cert
        kind: Secret
    allowedRoutes:
      namespaces:
        from: All
  - name: http-redirect
    protocol: HTTP
    port: 80
    hostname: "*.example.com"
---
# HTTPRoute defines the routing rules
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: api-routes
  namespace: api
spec:
  parentRefs:
  - name: production-gateway
    namespace: ingress
    sectionName: https-public
  hostnames:
  - "api.example.com"
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /v1/users
    filters:
    - type: URLRewrite
      urlRewrite:
        path:
          type: ReplacePrefixMatch
          replacePrefixMatch: /users
    backendRefs:
    - name: user-service
      port: 8080
      weight: 90
    - name: user-service-canary
      port: 8080
      weight: 10
  - matches:
    - path:
        type: PathPrefix
        value: /v1/orders
    filters:
    - type: RequestHeaderModifier
      requestHeaderModifier:
        add:
        - name: X-Route-Source
          value: gateway-api
    backendRefs:
    - name: order-service
      port: 8080

3.4 Hands-On Migration from Ingress NGINX to Gateway API

Migration step 1: deploy Gateway API in parallel

# Install Envoy Gateway
helm repo add envoy-gateway https://gateway.envoyproxy.io/helm-charts
helm install eg envoy-gateway/gateway-helm -n envoy-gateway-system --create-namespace

# Verify the installation
kubectl wait --timeout=5m -n envoy-gateway-system deployment/envoy-gateway --for=condition=Available

Migration step 2: configure traffic splitting

# Use HTTPRoute weights for a gradual migration
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: migration-route
spec:
  parentRefs:
  - name: production-gateway
  hostnames:
  - "app.example.com"
  rules:
  - backendRefs:
    # Phase 1: send 10% of traffic to the new Gateway
    - name: new-backend
      port: 80
      weight: 10
    # Keep 90% on the legacy Ingress NGINX backend
    - name: legacy-backend
      port: 80
      weight: 90

Migration step 3: annotation conversion map

| Ingress NGINX annotation | Gateway API equivalent |
| --- | --- |
| nginx.ingress.kubernetes.io/rewrite-target | URLRewrite filter |
| nginx.ingress.kubernetes.io/ssl-redirect | HTTPSRedirect filter |
| nginx.ingress.kubernetes.io/rate-limit | RateLimit policy |
| nginx.ingress.kubernetes.io/cors-* | CORS policy |
| nginx.ingress.kubernetes.io/auth-* | ExtensionRef to an auth service |

Migration step 4: migrating advanced features

# Rate limiting configuration
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: HTTPRoute
metadata:
  name: rate-limited-route
  annotations:
    gateway.envoyproxy.io/ratelimit: |
      type: Local
      local:
        requests_per_unit: 100
        unit: Minute
spec:
  parentRefs:
  - name: production-gateway
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /api/
    backendRefs:
    - name: api-backend
      port: 8080

3.5 Quick Deployment of Other Alternatives

Traefik migration:

# traefik-values.yaml
ingressRoute:
  dashboard:
    enabled: false
providers:
  kubernetesIngress:
    enabled: true
    ingressClass: traefik
  kubernetesGateway:
    enabled: true
experimental:
  kubernetesGateway:
    enabled: true

# Install
helm repo add traefik https://traefik.github.io/charts
helm install traefik traefik/traefik -f traefik-values.yaml

Cilium Gateway (eBPF-accelerated):

# cilium-values.yaml
kubeProxyReplacement: true
gatewayAPI:
  enabled: true
envoy:
  enabled: true

# Install
helm install cilium cilium/cilium -n kube-system -f cilium-values.yaml

Chapter 4: Security and Stability Enhancements

4.1 The gitRepo Volume Driver Is Disabled

Kubernetes 1.36 permanently disables the gitRepo volume driver, an important change made for security reasons:

Why it was disabled:

  1. It allowed a Pod to run arbitrary git commands on the node, creating a code-injection risk
  2. Code pulled from external repositories was difficult to audit and control
  3. It conflicts with GitOps principles (configuration should be managed declaratively)

Migration options:

# Old approach (now disabled)
volumes:
- name: git-volume
  gitRepo:
    repository: "https://github.com/example/repo.git"
    revision: "main"
    directory: "."

# Option 1: initContainer + emptyDir
volumes:
- name: code-volume
  emptyDir: {}
initContainers:
- name: git-clone
  image: alpine/git:latest
  command:
  - sh
  - -c
  - |
    git clone --depth 1 --branch main \
      https://github.com/example/repo.git /workspace
  volumeMounts:
  - name: code-volume
    mountPath: /workspace
  securityContext:
    allowPrivilegeEscalation: false
    readOnlyRootFilesystem: true
    capabilities:
      drop:
      - ALL
containers:
- name: app
  volumeMounts:
  - name: code-volume
    mountPath: /app
    readOnly: true

# Option 2: store configuration in a ConfigMap/Secret
# Option 3: use a CSI driver (e.g. secrets-store-csi-driver)

4.2 External Signing of ServiceAccount Tokens

1.36 introduces an external signing mechanism for ServiceAccount tokens, strengthening authentication in multi-cluster scenarios:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: cross-cluster-sa
  namespace: default
  annotations:
    # Specify the external signing provider
    serviceaccount.kubernetes.io/external-signer: "vault.example.com/k8s-signer"
spec:
  # Token settings
  tokenExpirationSeconds: 3600
  # External signing settings
  externalSigning:
    provider: vault
    config:
      vaultAddr: "https://vault.example.com"
      role: "k8s-token-signer"

4.3 SELinux Volume Label Optimization

apiVersion: v1
kind: Pod
metadata:
  name: selinux-optimized
spec:
  securityContext:
    seLinuxOptions:
      level: "s0:c123,c456"
      role: "system_r"
      type: "container_t"
      user: "system_u"
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: my-pvc
      # New in 1.36: fine-grained SELinux label control
      seLinuxMountOptions:
        context: "system_u:object_r:container_file_t:s0:c123,c456"
        relabelPolicy: "Shared"
  containers:
  - name: app
    securityContext:
      seLinuxOptions:
        level: "s0:c123,c456"
    volumeMounts:
    - name: data
      mountPath: /data

Chapter 5: Production Upgrade Guide

5.1 Pre-Upgrade Checklist

#!/bin/bash
# pre-upgrade-check.sh

echo "=== Kubernetes 1.36 pre-upgrade checks ==="

# 1. Check gitRepo volume usage
echo "[check] gitRepo volume usage..."
kubectl get pods --all-namespaces -o json | jq -r '
  .items[] |
  select(.spec.volumes[]?.gitRepo != null) |
  "\(.metadata.namespace)/\(.metadata.name)"
' | sort | uniq

# 2. Check for Ingress NGINX
echo "[check] Ingress NGINX deployments..."
kubectl get pods --all-namespaces -l app.kubernetes.io/name=ingress-nginx

# 3. Check externalIPs usage
echo "[check] Service externalIPs usage..."
kubectl get svc --all-namespaces -o json | jq -r '
  .items[] |
  select(.spec.externalIPs != null) |
  "\(.metadata.namespace)/\(.metadata.name): \(.spec.externalIPs)"
'

# 4. Check for deprecated API warnings
echo "[check] deprecated API warnings..."
kubectl get events --all-namespaces --field-selector reason=DeprecatedAPIWarning

# 5. Check DRA driver readiness
echo "[check] DRA driver..."
kubectl get resourceslices.resource.k8s.io 2>/dev/null || echo "DRA not deployed"

5.2 Incremental Upgrade Strategy

Blue-green upgrade:

# Blue-green upgrade via Cluster API
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: production-blue
spec:
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: control-plane-blue
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: AWSCluster
    name: cluster-blue
---
# Create the new 1.36 cluster (green)
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: production-green
  annotations:
    kubernetes.version: "v1.36.0"
spec:
  # new cluster configuration...

5.3 Rollback Plan

# Snapshot etcd before the upgrade
etcdctl snapshot save /backup/etcd-$(date +%Y%m%d).db

# If the upgrade fails, roll back to 1.35
kubeadm upgrade apply v1.35.2 --force

# Or restore from the backup
etcdctl snapshot restore /backup/etcd-$(date +%Y%m%d).db \
  --data-dir=/var/lib/etcd-backup

Chapter 6: Performance Tuning and Monitoring

6.1 Scheduler Performance Tuning

# High-performance kube-scheduler configuration
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
- schedulerName: default-scheduler
  plugins:
    multiPoint:
      enabled:
      - name: PrioritySort
      - name: NodeResourcesFit
      - name: NodePorts
      - name: VolumeBinding
      - name: PodTopologySpread
      - name: InterPodAffinity
      - name: DRAFilter
  pluginConfig:
  - name: NodeResourcesFit
    args:
      scoringStrategy:
        type: MostAllocated
        resources:
        - name: cpu
          weight: 1
        - name: memory
          weight: 1
  - name: DRAFilter
    args:
      cacheTimeout: 10s
      maxConcurrentEvaluations: 200

6.2 Collecting Monitoring Metrics

# Prometheus ServiceMonitor configuration
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: k8s-136-metrics
spec:
  endpoints:
  - port: https
    scheme: https
    tlsConfig:
      insecureSkipVerify: true
    bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    metricRelabelings:
    # DRA metrics
    - sourceLabels: [__name__]
      regex: 'dra_(.*)'
      targetLabel: component
      replacement: 'dra'
    # Gateway API metrics
    - sourceLabels: [__name__]
      regex: 'gateway_(.*)'
      targetLabel: component
      replacement: 'gateway'
  selector:
    matchLabels:
      app: kube-apiserver

Key metrics to watch:

# DRA allocation latency
histogram_quantile(0.95, 
  rate(dra_allocation_duration_seconds_bucket[5m])
)

# Gateway API request latency
histogram_quantile(0.99,
  rate(gateway_request_duration_seconds_bucket[5m])
)

# Scheduler throughput
rate(scheduler_schedule_attempts_total[5m])

# API server request latency
histogram_quantile(0.99,
  rate(apiserver_request_duration_seconds_bucket[5m])
)

Chapter 7: Outlook and Conclusion

7.1 Where Kubernetes Is Heading

Kubernetes 1.36 marks a new stage in the evolution of cloud-native infrastructure:

  1. Finer-grained resource management: with DRA at GA, Kubernetes can manage the full range of heterogeneous resources, from CPU to GPU, storage to network
  2. Network standardization: a maturing Gateway API will gradually replace Ingress as the unified traffic-management standard
  3. Secure by default: from disabling gitRepo to SELinux optimization and ServiceAccount improvements, security is becoming the default posture

7.2 Upgrade Recommendations

Act now:

  • Audit gitRepo volume usage and draw up a migration plan
  • Start planning a replacement for Ingress NGINX
  • Trial DRA in a development environment

Medium term:

  • Build up Gateway API expertise
  • Evaluate moving GPU workloads to DRA
  • Upgrade the monitoring stack to cover the new features

Long term:

  • Build an AI/ML platform on DRA
  • Unify traffic management across clusters
  • Adopt GitOps-driven configuration management

7.3 Closing Thoughts

Kubernetes 1.36 "Haru" lives up to its name, bringing clearer skies to the cloud-native landscape. DRA's production readiness makes heterogeneous resource management practical, Gateway API standardization unifies traffic management, and the security enhancements give production environments a firmer foundation.

For cloud-native engineers, this is a release that demands an active embrace of change. Both the resource-management shift that DRA brings and the architectural adjustments forced by the Ingress NGINX retirement will shape technology choices and practice for years to come.

Just as Hokusai painted Mount Fuji from many angles in the Thirty-six Views, Kubernetes 1.36 offers a new vantage point on cloud-native infrastructure: one that is more fine-grained, more secure, and more standardized.




This article was written in April 2026 and is based on Kubernetes 1.36.0. Because the Kubernetes project evolves continuously, some details may change in later releases; treat the official documentation as authoritative.
