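1. The Complete Guide to Kubernetes Container Orchestration: From a Single Host to a Distributed Cluster
Last year our team lived through a nerve-racking traffic spike: during the Double 11 (Singles' Day) sale, our previously stable Docker Compose architecture collapsed completely when traffic surged by 300%. After that incident we took the lesson to heart and migrated everything to Kubernetes. Today the same kind of traffic swing is routine: the cluster scales out automatically, and engineers no longer get up in the middle of the night to add capacity by hand.
If you are considering the move from single-host deployment to a distributed cluster, or you are new to Kubernetes and unsure where to start, this guide will get you to the core ideas quickly. Unlike the abstract descriptions in the official documentation, I will draw on the pitfalls our team hit and the solutions we validated to give you down-to-earth, hands-on advice.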
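2. Why Kubernetes?
2.1 The fatal flaws of single-host deployment
Our first e-commerce system was deployed with Docker Compose. It ran fine in development, but production exposed one problem after another:
- Single point of failure: a data-center power outage once took the whole service down for 2 hours. A multi-node Kubernetes cluster automatically reschedules Pods onto healthy nodes, so replicated services can fail over within seconds.
- Slow manual scaling: before every big sale we had to forecast traffic and add servers by hand. One forecast was wrong, and there was no time to bring up machines at the last minute. The Kubernetes HPA (Horizontal Pod Autoscaler) adds and removes Pods automatically based on CPU/memory utilization.
- Upgrades felt like walking a tightrope: every release required a maintenance window, and users complained constantly. Kubernetes rolling updates allow zero-downtime deployments because an old Pod is terminated only after its replacement is ready.
- Wasted resources: to survive peaks we kept about 30% of server capacity idle the rest of the time. Kubernetes schedules by resource requests and can be configured for bin packing (for example with the MostAllocated scoring strategy), which packs applications more densely and raises utilization.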
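To make the autoscaling point concrete, here is a minimal sketch of an HPA for a Deployment named `frontend`. The object name, the replica bounds, and the 70% CPU target are illustrative assumptions rather than values from our cluster. It also assumes that metrics-server is installed and that the target containers declare CPU requests, since utilization is computed relative to requests.

```yaml
# Hypothetical HPA: the names and thresholds below are assumptions for illustration.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: frontend-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: frontend
  minReplicas: 3            # floor kept even when traffic is low
  maxReplicas: 30           # ceiling that protects the cluster from runaway scaling
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # percent of the containers' CPU requests
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # wait 5 minutes before scaling in
```

The scale-down stabilization window is one way to avoid flapping when traffic oscillates; tune it to your own traffic pattern.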
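2.2 What the distributed setup bought us
After a year in production, we compared the key metrics:
| Metric | Docker Compose | Kubernetes | Improvement |
|---|---|---|---|
| Deployment time | 45 min per release | 3 min per release | 15x |
| Failure recovery time | 15-30 min | < 1 min | up to 30x |
| Server utilization | 40-60% | 75-85% | about 50% |
| Burst traffic capacity | 3x baseline | 10x baseline | 3.3x |
| Operations effort | 3 person-days | 0.5 person-days | 6x |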
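3. Core architecture in depth
3.1 How the control plane components cooperate
The control plane (master) nodes are the brain of the cluster. Four key components work closely together:
- API Server: the single entry point for all requests, built around a declarative API. When it receives a request to create a Deployment, it:
  - validates the request
  - writes the object to etcd
  - notifies watchers, which is what sets the Controller Manager to work
- etcd: a key-value store that uses the Raft protocol for consistency. A full etcd disk once brought our cluster down, so we now monitor its storage usage strictly. Our recommendations:
  - use SSD disks
  - take snapshot backups regularly
  - enable automatic compaction of historical data
- Controller Manager: bundles more than 30 controllers that continuously compare actual state with desired state. For example, when the Deployment controller sees fewer Pods than `replicas` specifies, it creates new Pods through API Server.
- Scheduler: the decision engine that picks the best node for each Pod. What we learned while tuning scheduling:
  - add affinity rules to database Pods to pin them to specific nodes
  - set anti-affinity on compute-intensive applications to avoid resource contention
  - define priority and preemption policies so critical workloads go first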
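The priority and node-pinning tips above can be sketched as follows. Every name here (`business-critical`, `orders-db`, the `workload-type=database` node label) is hypothetical, and the pause image is only a placeholder for a real database container.

```yaml
# Hypothetical PriorityClass: the name and value are illustrative assumptions.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: business-critical
value: 1000000                      # higher values are scheduled (and preempt) first
globalDefault: false
preemptionPolicy: PreemptLowerPriority
description: "Workloads that may preempt lower-priority Pods."
---
apiVersion: v1
kind: Pod
metadata:
  name: orders-db                   # hypothetical database Pod
  labels:
    app: orders-db
spec:
  priorityClassName: business-critical
  affinity:
    nodeAffinity:
      # Hard requirement: only nodes labelled workload-type=database qualify.
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: workload-type  # assumed node label, applied by the operator
                operator: In
                values: ["database"]
  containers:
    - name: db
      image: registry.k8s.io/pause:3.9   # placeholder image for illustration
```

A required node affinity keeps the Pod Pending if no matching node exists, so label the nodes before applying a manifest like this.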
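3.2 Inside a worker node
Each worker node runs three core processes:
- kubelet: the most complex component. It is responsible for:
  - reporting node status to API Server periodically
  - creating and destroying containers according to the PodSpec
  - running liveness and readiness probes
  - mounting storage volumes
  A common issue: when an image pull fails, kubelet keeps retrying. We suggest:
  ```bash
  # Check the kubelet logs to locate the problem
  journalctl -u kubelet -n 50
  ```
- kube-proxy: the traffic cop of cluster networking. Using iptables or IPVS, it handles:
  - translation from a Service virtual IP to Pod IPs
  - maintenance of load-balancing rules
  Note that NetworkPolicy is enforced by the CNI plugin (Calico, Cilium, and so on), not by kube-proxy.
- Container runtime: after comparative testing we chose containerd. Compared with Docker we measured:
  - 40% lower memory usage
  - 20% faster startup
  - a more stable CRI implementation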
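4. Building a production-grade cluster
4.1 Highly available control plane
When building a production cluster with kubeadm, the settings annotated below matter most:
```yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
controlPlaneEndpoint: "<load-balancer-IP>:6443"   # required for HA
apiServer:
  extraArgs:
    advertise-address: "192.168.1.100"
    # Note: for an external etcd cluster, kubeadm normally takes the endpoints
    # from etcd.external.endpoints rather than from an API Server flag.
    etcd-servers: "https://etcd1:2379,https://etcd2:2379"
controllerManager:
  extraArgs:
    node-monitor-period: "2s"   # faster failure detection
scheduler:
  extraArgs:
    bind-address: "0.0.0.0"
networking:
  podSubnet: "10.244.0.0/16"    # must match the CNI plugin
  serviceSubnet: "10.96.0.0/12"
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
nodeRegistration:
  criSocket: "unix:///var/run/containerd/containerd.sock"
  kubeletExtraArgs:
    node-ip: "192.168.1.100"    # pin the node IP explicitly
```
After deployment, always check:
```bash
# Verify component health
# (componentstatuses is deprecated since v1.19; on newer clusters
#  `kubectl get --raw='/readyz?verbose'` is the more reliable check)
kubectl get cs
# Check certificate expiry
kubeadm certs check-expiration
# Test etcd cluster health
ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key endpoint health
```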
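4.2 Choosing the key add-ons
- CNI network plugin: after performance testing we settled on Calico:
  - throughput: 30% higher than Flannel
  - latency: 15 ms lower on average
  - supports NetworkPolicy
  - easy to troubleshoot
  Install commands (the stock custom-resources.yaml uses the 192.168.0.0/16 pool, so edit its `cidr` to match your `podSubnet` before applying):
  ```bash
  kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.25.0/manifests/tigera-operator.yaml
  kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.25.0/manifests/custom-resources.yaml
  ```
- Ingress Controller: choose according to the protocols you need:
  - Nginx Ingress: the most general-purpose option, supports gRPC
  - Traefik: better dashboard and automatic service discovery
  - ALB Ingress: deep integration with AWS
- Monitoring: Prometheus Operator plus Grafana is the standard stack, but pay particular attention to:
  - setting resource limits to avoid OOM kills
  - using Thanos or VictoriaMetrics for long-term storage
  - setting sensible alert thresholds on core business metrics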
5. Advanced workload management
5.1 An advanced Deployment template
This is the hardened Deployment configuration we run in production. It collects the practices we rely on:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
  labels:
    app: frontend
  annotations:
    # Record the change history for auditing
    change-log: "2023-08-01 - added resource limits"
spec:
  revisionHistoryLimit: 5        # keep 5 old revisions for rollback
  progressDeadlineSeconds: 600   # rollout timeout
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%              # how far the Pod count may exceed replicas
      maxUnavailable: 0          # always keep every replica available
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
      annotations:
        # Inject Linkerd for fine-grained monitoring
        linkerd.io/inject: enabled
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: app
                      operator: In
                      values: [frontend]
                topologyKey: kubernetes.io/hostname
      containers:
        - name: web
          image: nginx:1.23-alpine
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 80
              protocol: TCP
          resources:
            requests:
              cpu: "100m"
              memory: "128Mi"
            limits:
              cpu: "500m"
              memory: "512Mi"
          livenessProbe:
            httpGet:
              path: /healthz
              port: 80
            initialDelaySeconds: 15
            periodSeconds: 20
            timeoutSeconds: 3
            successThreshold: 1
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /readyz
              port: 80
            initialDelaySeconds: 5
            periodSeconds: 10
            timeoutSeconds: 2
            successThreshold: 1
            failureThreshold: 3
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 10"]   # wait before graceful shutdown
          envFrom:
            - configMapRef:
                name: frontend-config
          volumeMounts:
            - name: tmp-volume
              mountPath: /tmp
      volumes:
        - name: tmp-volume
          emptyDir: {}
      terminationGracePeriodSeconds: 30   # Pod termination grace period
```
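`maxUnavailable: 0` only protects availability during a rollout. Voluntary disruptions such as node drains are governed separately by a PodDisruptionBudget. A minimal sketch follows, assuming the `frontend` Deployment runs at least 3 replicas (the template above does not set `replicas`, so that number is an assumption); the object name is hypothetical.

```yaml
# Hypothetical PodDisruptionBudget for the frontend Pods.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: frontend-pdb
spec:
  minAvailable: 2          # assumes the Deployment runs 3 or more replicas
  selector:
    matchLabels:
      app: frontend        # same label the Deployment's Pod template uses
```

With this in place, `kubectl drain` evicts frontend Pods only while at least two of them remain available.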
5.2 Managing stateful applications with StatefulSet
The classic pattern for deploying a MySQL cluster:
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql   # must match the Headless Service
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      initContainers:
        - name: init-mysql
          image: mysql:8.0
          command:
            - bash
            - "-c"
            - |
              set -ex
              # Derive server-id from the Pod ordinal
              [[ $(hostname) =~ -([0-9]+)$ ]] || exit 1
              ordinal=${BASH_REMATCH[1]}
              echo [mysqld] > /mnt/conf.d/server-id.cnf
              echo server-id=$((100 + $ordinal)) >> /mnt/conf.d/server-id.cnf
              # Primary/replica settings; each file needs its own [mysqld] header
              if [[ $ordinal -eq 0 ]]; then
                printf '[mysqld]\nbinlog_format=ROW\n' > /mnt/conf.d/master.cnf
              else
                printf '[mysqld]\nbinlog_format=ROW\n' > /mnt/conf.d/slave.cnf
              fi
          volumeMounts:
            - name: conf
              mountPath: /mnt/conf.d
      containers:
        - name: mysql
          image: mysql:8.0
          env:
            - name: MYSQL_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mysql-secret
                  key: password
          ports:
            - name: mysql
              containerPort: 3306
          volumeMounts:
            - name: data
              mountPath: /var/lib/mysql
              subPath: mysql
            - name: conf
              mountPath: /etc/mysql/conf.d
          resources:
            requests:
              cpu: 500m
              memory: 1Gi
      volumes:
        - name: conf
          emptyDir: {}
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: "ssd"
        resources:
          requests:
            storage: 50Gi
```
Key points to watch:
- Primary/replica replication must be initialized manually on first start
- For backups we recommend Percona XtraBackup
- Monitor binlog lag and connection counts
- Upgrades must follow the Pod order strictly
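The StatefulSet references a Headless Service through `serviceName: mysql`, but that Service is not shown. A minimal sketch of what it could look like (the port name mirrors the container port above; everything else is the standard headless pattern):

```yaml
# Headless Service assumed by serviceName: mysql in the StatefulSet.
apiVersion: v1
kind: Service
metadata:
  name: mysql
  labels:
    app: mysql
spec:
  clusterIP: None        # headless: DNS returns the Pod IPs directly
  selector:
    app: mysql
  ports:
    - name: mysql
      port: 3306
```

Each Pod then gets a stable DNS name of the form `mysql-0.mysql.<namespace>.svc.cluster.local`, which is what replicas would use to reach the primary.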
6. Networking and service governance in practice
6.1 Advanced Service traffic control
Scenario 1: splitting traffic during a canary release. The canary Ingress only takes effect next to a regular Ingress for the same host, and it must point at the Service of the new version (`frontend-canary` here):
```yaml
apiVersion: v1
kind: Service
metadata:
  name: frontend
spec:
  selector:
    app: frontend
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80   # matches containerPort in the Deployment from 5.1
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: frontend-canary-ingress
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "20"   # 20% of traffic to the new version
spec:
  rules:
    - host: example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: frontend-canary   # Service of the new version
                port:
                  number: 80
```
Scenario 2: routing by request header
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: frontend-canary-ingress
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-by-header: "X-Canary"
    nginx.ingress.kubernetes.io/canary-by-header-value: "true"
spec:
  rules:
    - host: example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: frontend-canary   # canary version
                port:
                  number: 80
```
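Canary annotations in ingress-nginx are evaluated against a regular Ingress for the same host, so the scenarios above also need a primary Ingress and a Service for the canary Pods. A rough sketch follows; the names `frontend-main` and `frontend-canary`, the `app: frontend-canary` label, and the `nginx` ingress class are assumptions for illustration.

```yaml
# Hypothetical Service for the canary Pods (assumes they carry app: frontend-canary).
apiVersion: v1
kind: Service
metadata:
  name: frontend-canary
spec:
  selector:
    app: frontend-canary
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
---
# Primary Ingress without canary annotations; it serves the stable version.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: frontend-main
spec:
  ingressClassName: nginx      # assumed ingress class name
  rules:
    - host: example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: frontend   # stable Service
                port:
                  number: 80
```

Giving the canary Pods a distinct `app` label keeps the stable Service's selector from matching them as well.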
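6.2 Fine-grained control with NetworkPolicy
Allow only Pods from a specific namespace to reach MySQL:
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: mysql-allow-only-backend
spec:
  podSelector:
    matchLabels:
      app: mysql
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: backend
      ports:
        - protocol: TCP
          port: 3306
```
A default policy that blocks all cross-namespace traffic (note that the egress rule also blocks DNS lookups to kube-system unless a separate policy allows them):
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-cross-namespace
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - podSelector: {}   # only Pods in the same namespace
  egress:
    - to:
        - podSelector: {}   # only Pods in the same namespace
```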
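Because the egress rule above only allows traffic to Pods in the same namespace, DNS resolution through kube-system stops working. NetworkPolicies are additive, so a second policy can re-open just that path. This sketch assumes the cluster DNS Pods carry the label `k8s-app: kube-dns` (the kubeadm default for CoreDNS); verify the label on your cluster first.

```yaml
# Hypothetical companion policy: allow DNS egress from every Pod in this namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        # namespaceSelector and podSelector in one entry are ANDed together
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns   # assumed label of the cluster DNS Pods
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```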
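7. Choosing and tuning storage
7.1 Persistent volume performance comparison
Storage class performance figures from our tests on AWS:
| Volume type | IOPS | Throughput (MB/s) | Latency (ms) | Use case |
|---|---|---|---|---|
| gp2 (default) | 3000 | 250 | 1-2 | General workloads |
| gp3 (recommended) | 6000 | 500 | 0.5-1 | Cost-effective general purpose |
| io1 (high IOPS) | 16000 | 1000 | 0.3-0.5 | Databases |
| st1 (throughput optimized) | 500 | 500 | 5-10 | Big data analytics |
| sc1 (cold storage) | 250 | 250 | 10-20 | Archived data |
Recommended production configuration (gp3 with custom IOPS and throughput requires the EBS CSI driver, so the provisioner is `ebs.csi.aws.com` rather than the in-tree `kubernetes.io/aws-ebs`):
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-encrypted
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "4000"         # adjust to the workload
  throughput: "250"    # MiB/s
  encrypted: "true"    # encryption is mandatory
  csi.storage.k8s.io/fstype: ext4
volumeBindingMode: WaitForFirstConsumer   # delay binding until a Pod is scheduled
reclaimPolicy: Retain                     # guard against accidental deletion
allowVolumeExpansion: true                # allow online expansion
```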
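A workload consumes this class through a PersistentVolumeClaim. A minimal sketch, with a hypothetical claim name and size:

```yaml
# Hypothetical claim against the gp3-encrypted StorageClass.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: gp3-encrypted
  resources:
    requests:
      storage: 20Gi    # raise this value later to expand the volume
```

With `WaitForFirstConsumer`, the claim stays Pending until a Pod that mounts it is scheduled. That is expected, and it lets the volume be created in the same availability zone as the Pod.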
7.2 Backup and restore
Backup strategy:
- Application-level backup: export data with tools such as mysqldump
- Volume snapshots: create EBS snapshots regularly
- Whole-cluster backup: use Velero
Velero backup example:
```bash
# Install the CLI
brew install velero
# Configure AWS credentials
aws configure
# Install the server side
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.5.0 \
  --bucket my-backup-bucket \
  --backup-location-config region=us-west-2 \
  --snapshot-location-config region=us-west-2 \
  --use-volume-snapshots=true \
  --secret-file ./credentials-velero
# Create a scheduled backup
velero schedule create daily-backup \
  --schedule="0 3 * * *" \
  --include-namespaces=production \
  --ttl=720h
# Restore from a backup
velero restore create --from-backup daily-backup-20230801
```
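If you manage everything through Git, the same schedule can be declared as a Velero `Schedule` resource instead of being created from the CLI. This sketch assumes Velero is installed in its default `velero` namespace and mirrors the flags used above.

```yaml
# Declarative equivalent of `velero schedule create daily-backup ...`
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-backup
  namespace: velero          # assumed install namespace
spec:
  schedule: "0 3 * * *"      # every day at 03:00
  template:
    includedNamespaces:
      - production
    ttl: 720h0m0s            # keep each backup for 30 days
    snapshotVolumes: true
```

Backups created by a schedule are named after the schedule plus a timestamp, so list them with `velero backup get` to find the exact name before restoring.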
8. Security hardening best practices
8.1 Applying least privilege
- ServiceAccount permission control:
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: frontend-sa
automountServiceAccountToken: false   # do not mount the token automatically
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: frontend-role
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list"]
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: frontend-rolebinding
subjects:
  - kind: ServiceAccount
    name: frontend-sa
    namespace: production   # required for ServiceAccount subjects; namespace assumed
roleRef:
  kind: Role
  name: frontend-role
  apiGroup: rbac.authorization.k8s.io
```
- Pod security (PodSecurityPolicy is deprecated; use Pod Security Admission instead):
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/warn: baseline
```
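Once `enforce: restricted` is set, Pods that lack the required security context fields are rejected at admission. As an illustration, here is a Pod spec that should pass the restricted profile. The Pod name, the user ID, and the `my-app:latest` image are placeholders, and the image is assumed to run correctly as a non-root user.

```yaml
# Hypothetical Pod that satisfies the "restricted" Pod Security Standard.
apiVersion: v1
kind: Pod
metadata:
  name: restricted-demo
  namespace: production
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 10001             # any non-zero UID the image supports
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: app
      image: my-app:latest       # placeholder image
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]
```

Setting a label such as `pod-security.kubernetes.io/audit: restricted` on a namespace first is a low-risk way to see which existing workloads would be rejected before switching to `enforce`.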
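8.2 Protecting sensitive data
- Encrypting Secrets at rest in etcd:
```yaml
# Generate a 32-byte key first:  head -c 32 /dev/urandom | base64
# Save this file on the control plane nodes and point API Server at it
# with --encryption-provider-config.
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      - aescbc:
          keys:
            - name: key1
              secret: <base64-encoded-secret>
      - identity: {}   # allows reading existing, still unencrypted data
```
- Vault integration:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  selector:                # required by apps/v1; labels are illustrative
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      initContainers:
        - name: vault-agent
          image: vault:1.12
          command: ["/bin/sh", "-c"]
          args:
            - |
              # config.hcl must set exit_after_auth = true, otherwise this
              # init container never exits and the app never starts
              vault agent -config=/etc/vault/config.hcl
          volumeMounts:
            - name: vault-config
              mountPath: /etc/vault
      containers:
        - name: app
          image: my-app:latest
          env:
            - name: DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: vault-secret   # must exist already; Vault Agent does not create Kubernetes Secrets
                  key: db_password
      volumes:
        - name: vault-config
          configMap:
            name: vault-agent-config
```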
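9. Building the monitoring and logging stack
9.1 Golden signals for metrics
The Kubernetes metrics hierarchy we defined:
- Cluster health:
  - API Server latency and error rate
  - etcd write latency and heartbeat interval
  - node CPU, memory, and disk pressure
- Workloads:
  - Pod restart count
  - container CPU and memory usage
  - network throughput and packet error rate
  - storage IOPS and latency
- Business:
  - HTTP request success rate
  - P99 service response time
  - queue backlog
  - database connection pool usage
Prometheus configuration example:
```yaml
- job_name: 'kubernetes-pods'
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    # Scrape only Pods annotated with prometheus.io/scrape: "true"
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
      action: keep
      regex: true
    # Honor a custom metrics path from prometheus.io/path
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
      action: replace
      target_label: __metrics_path__
      regex: (.+)
    # Rewrite the scrape address to use the port from prometheus.io/port
    - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
      action: replace
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: $1:$2
      target_label: __address__
```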
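The workload metrics listed above only help if they fire alerts. With Prometheus Operator, an alert on Pod restarts could be declared roughly like this. The rule name, the `monitoring` namespace, the `release: prometheus` label, and the thresholds are assumptions; the metric itself comes from kube-state-metrics, which must be installed.

```yaml
# Hypothetical alert rule for frequent container restarts.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: workload-alerts
  namespace: monitoring
  labels:
    release: prometheus    # must match your Prometheus ruleSelector (assumed value)
spec:
  groups:
    - name: workload.rules
      rules:
        - alert: PodRestartingFrequently
          # More than 3 restarts within 15 minutes
          expr: increase(kube_pod_container_status_restarts_total[15m]) > 3
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Container {{ $labels.container }} in {{ $labels.namespace }}/{{ $labels.pod }} is restarting frequently"
```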
9.2 Optimizing log collection
Tuning tips for an EFK stack:
- Fluentd configuration:
```xml
<source>
  @type tail
  path /var/log/containers/*.log
  pos_file /var/log/fluentd-containers.log.pos
  tag kubernetes.*
  read_from_head true
  <parse>
    @type json
    time_format %Y-%m-%dT%H:%M:%S.%NZ
    keep_time_key true
  </parse>
</source>
<filter kubernetes.**>
  @type record_transformer
  enable_ruby true
  <record>
    pod_name ${record.dig("kubernetes", "pod_name")}
    namespace ${record.dig("kubernetes", "namespace_name")}
    container_name ${record.dig("kubernetes", "container_name")}
    log ${record["log"].strip}
  </record>
</filter>
<match kubernetes.**>
  @type elasticsearch
  host elasticsearch
  port 9200
  logstash_format true
  logstash_prefix kubernetes
  buffer_chunk_limit 2M
  buffer_queue_limit 32
  flush_interval 5s
  max_retry_wait 30
  disable_retry_limit
  num_threads 4
</match>
```
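- Log sampling strategy (the plugin type and parameter names are kept as in our configuration; check them against the sampling plugin you actually install):
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-config
data:
  fluent.conf: |
    <filter kubernetes.**>
      @type sample
      rate 10                  # 10% sampling rate
      invert_sampling true     # keep only the sampled logs
      add_tag_prefix sampled.
    </filter>
```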
10. Continuous delivery and GitOps
10.1 Deployment pipeline with Argo CD
A typical application directory layout:
```text
apps/
└── frontend/
    ├── base/
    │   ├── deployment.yaml
    │   ├── service.yaml
    │   └── kustomization.yaml
    └── overlays/
        ├── production/
        │   ├── replica-count-patch.yaml
        │   └── kustomization.yaml
        └── staging/
            ├── resource-limits-patch.yaml
            └── kustomization.yaml
```
Argo CD Application configuration:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: frontend-prod
spec:
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  project: default
  source:
    path: apps/frontend/overlays/production
    repoURL: git@github.com:my-org/gitops-repo.git
    targetRevision: HEAD
  syncPolicy:
    automated:
      prune: true      # remove resources that were deleted from Git
      selfHeal: true   # revert drift automatically
    syncOptions:
      - CreateNamespace=true
```
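The Application points at `apps/frontend/overlays/production`, so the content of that overlay decides what actually gets deployed. The two files there might look like the sketch below. They are shown in one listing separated by `---`, but they are separate files, and the replica count of 5 is an assumption.

```yaml
# File: apps/frontend/overlays/production/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: production
resources:
  - ../../base                       # reuse the shared manifests
patches:
  - path: replica-count-patch.yaml   # strategic-merge patch, matched by kind and name
---
# File: apps/frontend/overlays/production/replica-count-patch.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
spec:
  replicas: 5                        # illustrative production replica count
```

Running `kubectl kustomize apps/frontend/overlays/production` locally renders the same output Argo CD would apply, which is handy for reviewing a change before merging it.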
10.2 Canary release strategy
A progressive delivery setup with Flagger:
```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: frontend
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: frontend
  service:
    port: 80
    targetPort: 8080   # must match the container port of the target Deployment
  analysis:
    interval: 1m
    threshold: 5
    maxWeight: 50
    stepWeight: 5
    metrics:
      - name: request-success-rate
        thresholdRange:
          min: 99
        interval: 1m
      - name: request-duration
        thresholdRange:
          max: 500
        interval: 30s
    webhooks:
      - name: load-test
        type: pre-rollout
        url: http://loadtester/start
        timeout: 5m
        metadata:
          cmd: "hey -z 1m -q 10 -c 2 http://frontend-canary.production/"
      - name: acceptance-test
        type: rollout
        url: http://test-server/
        timeout: 5m
```
11. Performance tuning notes
11.1 Scheduler tuning parameters (including its API Server client limits)
The key parameters after our adjustments:
```yaml
apiVersion: kubescheduler.config.k8s.io/v1beta2
kind: KubeSchedulerConfiguration
clientConnection:
  kubeconfig: /etc/kubernetes/scheduler.conf
  qps: 100     # default 50
  burst: 200   # default 100
leaderElection:
  leaderElect: true
  leaseDuration: 15s   # default 15s
  renewDeadline: 10s   # default 10s
  retryPeriod: 2s      # default 2s
profiles:
  - schedulerName: default-scheduler
    pluginConfig:
      - name: DefaultPreemption
        args:
          minCandidateNodesPercentage: 20   # default 10
          minCandidateNodesAbsolute: 50     # default 100
```
11.2 Node kernel parameter tuning
Our tuned /etc/sysctl.d/k8s.conf:
```text
# Enlarge the connection tracking table
net.netfilter.nf_conntrack_max = 1048576
net.netfilter.nf_conntrack_buckets = 65536
# Widen the local port range
net.ipv4.ip_local_port_range = 1024 65535
# Speed up TIME_WAIT recycling
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 30
# Raise file descriptor limits
fs.file-max = 2097152
fs.inotify.max_user_instances = 8192
fs.inotify.max_user_watches = 524288
# Memory allocation policy
vm.swappiness = 0
vm.overcommit_memory = 1
vm.dirty_ratio = 10
vm.dirty_background_ratio = 5
```
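12. Troubleshooting toolbox
12.1 Diagnosis flow
```text
Troubleshooting path for an unhealthy Pod:
1. kubectl describe pod <name>
   - read the errors in the Events section
   - check the timeline of state changes
2. Common categories:
   ├── Pending
   │   ├── insufficient resources → check kubectl describe nodes
   │   ├── PVC not bound → check the PV and StorageClass
   │   └── image pull failure → check the image address and credentials
   │
   ├── CrashLoopBackOff
   │   ├→ kubectl logs --previous to read the last run's logs
   │   └→ check whether resource limits are too small
   │
   └── Running but unresponsive
       ├→ kubectl exec into the container and inspect the process
       └→ check NetworkPolicy and Service configuration
```
12.2 Handy diagnostic commands
```bash
# Show resource allocation
kubectl describe nodes | grep -A 10 "Allocated resources"
# Check API Server request latency
kubectl get --raw /metrics | grep apiserver_request_duration_seconds
# Trace the path from a Service to its Pods
kubectl get endpoints <service-name>
kubectl get pods -o wide | grep <pod-ip>
# Diagnose DNS problems (the "> " lines are typed inside the container)
kubectl run -it --rm --image=infoblox/dnstools:latest dnstools
> dig <service>.<namespace>.svc.cluster.local
# Test network connectivity
kubectl run -it --rm --image=alpine:latest nettest -- sh
> ping <target-ip>
> nc -zv <service> <port>
# Check certificate expiry
openssl x509 -noout -dates -in /etc/kubernetes/pki/apiserver.crt
```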
13. Multi-cluster management
13.1 Cluster API architecture
The deployment model for managing multiple clusters with Cluster API:
```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: production-cluster
spec:
  clusterNetwork:
    pods:
      cidrBlocks: ["10.244.0.0/16"]
    services:
      cidrBlocks: ["10.96.0.0/12"]
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: production-control-plane
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: AWSCluster
    name: production-cluster
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSCluster
metadata:
  name: production-cluster
spec:
  region: us-west-2
  sshKeyName: cluster-api
  networkSpec:
    vpc:
      cidrBlock: "10.0.0.0/16"
    subnets:
      - cidrBlock: "10.0.1.0/24"
        availabilityZone: us-west-2a
        isPublic: false
```
13.2 Federated cluster configuration
A core KubeFed configuration example:
```yaml
apiVersion: core.kubefed.io/v1beta1
kind: KubeFedCluster
metadata:
  name: cluster-1
spec:
  apiEndpoint: https://cluster1.example.com:6443
  secretRef:
    name: cluster-1-credentials
---
apiVersion: types.kubefed.io/v1beta1
kind: FederatedDeployment
metadata:
  name: frontend
  namespace: production
spec:
  placement:
    clusters:
      - name: cluster-1
      - name: cluster-2
  template:
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: frontend
      template:
        metadata:
          labels:
            app: frontend
        spec:
          containers:
            - name: nginx
              image: nginx:1.21
              ports:
                - containerPort: 80
  overrides:
    - clusterName: cluster-2
      clusterOverrides:
        - path: "/spec/replicas"
          value: 5
```
14. Cost optimization in practice
14.1 Node autoscaling strategy
Recommended Cluster Autoscaler configuration:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler
      containers:
        # k8s.gcr.io is frozen; images are now served from registry.k8s.io
        - image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.25.0
          name: cluster-autoscaler
          command:
            - ./cluster-autoscaler
            - --v=4
            - --stderrthreshold=info
            - --cloud-provider=aws
            - --skip-nodes-with-local-storage=false
            - --expander=least-waste   # pick the node group that wastes the least resources
            - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
            - --balance-similar-node-groups
            - --skip-nodes-with-system-pods=false
          resources:
            limits:
              cpu: 100m
              memory: 600Mi
            requests:
              cpu: 100m
              memory: 600Mi
```
14.2 Raising resource utilization
- Workload density:
  - Use the Vertical Pod Autoscaler to adjust requests/limits automatically
  - Estimate deployment density per node:
  ```bash
  kubectl top nodes
  # -A counts Pods from all namespaces on the node
  kubectl get pods -A -o wide | grep <node-name> | wc -l
  ```
- Spot instance integration:
  ```yaml
  apiVersion: karpenter.sh/v1alpha5
  kind: Provisioner
  metadata:
    name: spot
  spec:
    requirements:
      - key: "karpenter.sh/capacity-type"
        operator: In
        values: ["spot"]
      - key: "node.kubernetes.io/instance-type"
        operator: In
        values: ["m5.large", "m5a.large", "m5d.large"]
    limits:
      resources:
        cpu: 100
        memory: 1000Gi
    ttlSecondsAfterEmpty: 60   # reclaim empty nodes after 60 seconds
  ```
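For the Vertical Pod Autoscaler item in the list above, a cautious starting point is recommendation-only mode, which reports suggested requests without restarting any Pods. The sketch below targets the `frontend` Deployment and its `web` container from section 5.1. The bounds are illustrative assumptions, and VPA itself is a separate add-on that must be installed first.

```yaml
# Hypothetical VPA in recommendation-only mode.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: frontend-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: frontend
  updatePolicy:
    updateMode: "Off"          # only compute recommendations, never evict Pods
  resourcePolicy:
    containerPolicies:
      - containerName: web
        minAllowed:
          cpu: 50m
          memory: 64Mi
        maxAllowed:
          cpu: "1"
          memory: 1Gi
```

`kubectl describe vpa frontend-vpa` then shows the recommended requests. Avoid letting VPA and an HPA act on the same CPU or memory metric for one workload, because the two controllers would work against each other.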
15. Outlook: integrating emerging technologies
15.1 eBPF network acceleration
A Cilium + eBPF deployment example:
```yaml
apiVersion: cilium.io/v2
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: allow-cross-namespace
spec:
  endpointSelector: {}
  egress:
    - toEntities:
        - cluster
  ingress:
    - fromEntities:
        - cluster
```
---
apiVersion: cilium.io/v2
kind: Cilium