With cloud computing and containerization now mainstream, Kubernetes (K8s) has become the de facto standard for container orchestration. As an engineer with years of DevOps practice, I have deployed K8s in all kinds of environments; CentOS, an enterprise-grade Linux distribution valued for its stability and long-term support, remains a popular base for production Kubernetes clusters. This article walks through a production-ready Kubernetes deployment on CentOS 7/8, step by step.
Unlike the generic official documentation, I will focus on the typical problems that come up in real enterprise deployments and how to solve them. For example, in one financial-sector deployment, unsynchronized node clocks caused certificate validation failures; in an e-commerce rollout, a misconfigured firewall broke inter-node communication. Those lessons are woven into each deployment step below.
For a production environment, prepare dedicated machines for the control plane and the workers (the high-availability section below assumes three masters). A test environment can get by with two virtual machines (1 master + 1 worker), but keep in mind that such a setup has no redundancy and is only suitable for experimentation.
Key point: every node's hostname must be resolvable from every other node; certificate generation depends on it. I have seen kubeadm init hang at the certificate-generation stage purely because hostname resolution failed.
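One simple way to guarantee mutual resolution is a shared /etc/hosts block on every node. The sketch below writes to a demo file so the effect is visible; in practice you would append the same block to /etc/hosts itself. The IPs and hostnames are placeholders matching the 192.168.1.x addresses used later in this article:

```shell
# Demo target file; on a real node, append to /etc/hosts instead.
HOSTS_FILE=./hosts.demo

# Hypothetical cluster layout (matches the addresses used in this article).
cat >> "$HOSTS_FILE" <<'EOF'
192.168.1.100 k8s-master1
192.168.1.101 k8s-master2
192.168.1.102 k8s-master3
EOF

# Every node name should now resolve via the hosts file.
grep -c 'k8s-' "$HOSTS_FILE"
```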
Run the following on all nodes:
```bash
# Put SELinux into permissive mode immediately...
setenforce 0
# ...and persistently across reboots
sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config

# Stop the firewall (in production, open the required ports instead)
systemctl stop firewalld
systemctl disable firewalld

# Load the br_netfilter module
modprobe br_netfilter
echo 'br_netfilter' > /etc/modules-load.d/br_netfilter.conf

# Configure kernel parameters
cat <<EOF > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
sysctl --system
```
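One prerequisite the steps above do not show: kubeadm's preflight checks fail when swap is enabled, so swap must be switched off on every node. The sketch below edits a demo copy of fstab so it is safe to run anywhere; on a real node you would run `swapoff -a` and apply the same sed to /etc/fstab:

```shell
# Demo fstab with a swap entry (stand-in for the real /etc/fstab).
cat > ./fstab.demo <<'EOF'
/dev/mapper/centos-root /      xfs  defaults 0 0
/dev/mapper/centos-swap swap   swap defaults 0 0
EOF

# Comment out every swap mount so it stays off after reboot.
# On a real node: swapoff -a && sed -i '/ swap / s/^/#/' /etc/fstab
sed -i '/ swap / s/^/#/' ./fstab.demo
grep swap ./fstab.demo
```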
Although K8s supports several container runtimes, Docker is still the most mature choice for this setup (note that kubelet's built-in Docker support, dockershim, was removed in v1.24; the v1.22 series used here still includes it):
```bash
# Install dependencies
yum install -y yum-utils device-mapper-persistent-data lvm2

# Add the Docker repository
yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo

# Install a pinned version (avoids compatibility surprises from the latest release)
yum install -y docker-ce-20.10.7 docker-ce-cli-20.10.7 containerd.io

# Set the cgroup driver to systemd (must match the kubelet's cgroup driver)
mkdir -p /etc/docker
cat <<EOF > /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2"
}
EOF

# Start the service
systemctl enable docker && systemctl start docker
```
Configure the K8s repository and install the three core packages (note: the packages.cloud.google.com repositories have since been frozen; newer clusters should use the community repositories at pkgs.k8s.io):
```bash
cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
EOF

# Install a pinned version (lock versions in production)
yum install -y kubelet-1.22.3 kubeadm-1.22.3 kubectl-1.22.3

# Prevent accidental upgrades
yum -y install yum-versionlock
yum versionlock add kubelet kubeadm kubectl

# Start kubelet
systemctl enable --now kubelet
```
Initialize the first control-plane node with kubeadm:
```bash
kubeadm init \
  --apiserver-advertise-address=192.168.1.100 \
  --pod-network-cidr=10.244.0.0/16 \
  --service-cidr=10.96.0.0/12 \
  --kubernetes-version=v1.22.3 \
  --image-repository=registry.aliyuncs.com/google_containers
```
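The same flags can also be captured declaratively in a kubeadm configuration file, which is easier to version-control and reuse. A sketch equivalent to the command above (the v1beta3 API matches the kubeadm 1.22 used here):

```yaml
# kubeadm-config.yaml -- pass with: kubeadm init --config kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.1.100
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.22.3
imageRepository: registry.aliyuncs.com/google_containers
networking:
  podSubnet: 10.244.0.0/16
  serviceSubnet: 10.96.0.0/12
```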
After initialization succeeds, configure kubectl as the output instructs:
```bash
mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config
```
I chose Calico as the network plugin for its good compatibility. One caveat: Calico's default pool is 192.168.0.0/16, and since the cluster was initialized with --pod-network-cidr=10.244.0.0/16, set CALICO_IPV4POOL_CIDR in calico.yaml to match before applying it:
```bash
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
```
Verify the state of the core components:

```bash
kubectl get pods -n kube-system
# Wait until every Pod is Running
```
On each worker node, run the join command printed at the end of the master's init output:
```bash
kubeadm join 192.168.1.100:6443 \
  --token abcdef.0123456789abcdef \
  --discovery-token-ca-cert-hash sha256:xxxxxxxxxx
```
Back on the master, verify node status:

```bash
watch kubectl get nodes
# Wait until every node reports Ready
```
Enable automatic certificate rotation by adjusting the kubelet configuration:
```bash
# Edit the kubelet configuration
vi /var/lib/kubelet/config.yaml
# Add or modify:
#   rotateCertificates: true
#   serverTLSBootstrap: true

# Restart kubelet
systemctl restart kubelet
```
For a highly available control plane, you need to put a load balancer in front of multiple API server instances and join additional control-plane nodes through it. An example HAProxy configuration fragment:
```text
frontend k8s-api
    bind *:6443
    mode tcp
    default_backend k8s-api-nodes

backend k8s-api-nodes
    mode tcp
    balance roundrobin
    server master1 192.168.1.100:6443 check
    server master2 192.168.1.101:6443 check
    server master3 192.168.1.102:6443 check
```
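HAProxy itself then becomes a single point of failure, so it is commonly paired with keepalived to float a virtual IP between two load-balancer hosts. A hedged sketch (the VIP 192.168.1.200, interface eth0, and router ID are assumptions, not from the original deployment):

```text
vrrp_instance K8S_API {
    state MASTER          # BACKUP on the standby host
    interface eth0
    virtual_router_id 51
    priority 100          # set lower on the standby host
    advert_int 1
    virtual_ipaddress {
        192.168.1.200     # the VIP that kubelets and kubectl point at
    }
}
```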
Check in this order:

1. `systemctl status kubelet`
2. `journalctl -xeu kubelet`
3. `docker ps -a` or `crictl ps`
4. `kubectl get pods -n kube-system`

Typical causes and fixes:

- Expired certificates: `kubeadm certs renew all` (on older releases the subcommand was `kubeadm alpha certs renew all`)
- Resource exhaustion: inspect the `free -h` and `df -h` output

Pod-to-Pod communication checks:
```bash
# Test from inside a Pod
kubectl run test-$RANDOM --image=busybox --rm -it -- sh
# Then run: ping <another Pod IP>

# Check the Calico logs
kubectl logs -n kube-system <calico-pod-name>
```
Troubleshooting an unreachable Service:

- Check that the Service has endpoints: `kubectl get endpoints <service-name>`
- Check the kube-proxy logs: `kubectl logs -n kube-system <kube-proxy-pod>`
- Inspect the generated rules: `iptables-save | grep <service-ip>`

Append the following to /etc/sysctl.d/k8s.conf:
```text
# Increase the connection-tracking table size
net.netfilter.nf_conntrack_max=1048576
net.netfilter.nf_conntrack_tcp_timeout_established=86400

# Network performance tuning
net.core.somaxconn=32768
net.ipv4.tcp_max_syn_backlog=8192
net.ipv4.tcp_tw_reuse=1
```
Edit /var/lib/kubelet/config.yaml:
```yaml
systemReserved:
  cpu: "500m"
  memory: "500Mi"
kubeReserved:
  cpu: "500m"
  memory: "500Mi"
evictionHard:
  memory.available: "200Mi"
  nodefs.available: "10%"
```
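With the reservations above, the node's allocatable memory is its capacity minus systemReserved, minus kubeReserved, minus the hard eviction threshold. For a hypothetical 8 GiB node:

```shell
# allocatable = capacity - systemReserved - kubeReserved - evictionHard
capacity=8192; system_reserved=500; kube_reserved=500; eviction_hard=200
echo "allocatable: $((capacity - system_reserved - kube_reserved - eviction_hard))Mi"
# prints: allocatable: 6992Mi
```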
Create a minimally privileged ServiceAccount:
```bash
kubectl create serviceaccount limited-user
kubectl create role pod-reader --verb=get,list,watch --resource=pods
kubectl create rolebinding limited-user-binding \
  --role=pod-reader \
  --serviceaccount=default:limited-user
```
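The imperative commands above map to declarative manifests, which are easier to review and keep in version control. A sketch of the equivalent Role and RoleBinding:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: default
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: limited-user-binding
  namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: pod-reader
subjects:
- kind: ServiceAccount
  name: limited-user
  namespace: default
```

You can verify the grant with `kubectl auth can-i list pods --as=system:serviceaccount:default:limited-user`.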
Enable PSP (requires the PodSecurityPolicy admission plugin on kube-apiserver; note that PSP is deprecated since v1.21 and was removed in v1.25 in favor of Pod Security admission):
```yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted
spec:
  privileged: false
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: 'MustRunAs'
    ranges:
    - min: 1
      max: 65535
  runAsUser:
    rule: 'MustRunAsNonRoot'
  fsGroup:
    rule: 'MustRunAs'
    ranges:
    - min: 1
      max: 65535
```
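A Pod that this policy admits must run as a non-root user and declare group IDs inside the allowed ranges. A minimal hedged example (the name, image, and UID/GID values are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: psp-demo          # hypothetical name
spec:
  securityContext:
    runAsUser: 1000       # satisfies MustRunAsNonRoot
    runAsNonRoot: true
    fsGroup: 2000         # inside the 1-65535 MustRunAs range
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
```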
Back up the cluster state regularly:
```bash
# Back up all resource definitions
kubectl get all --all-namespaces -o yaml > cluster-state-$(date +%F).yaml

# Back up etcd (run on a master node; host networking is needed so the
# container can reach the etcd member listening on 127.0.0.1)
docker run --rm --network host \
  -v /etc/kubernetes/pki/etcd:/etc/kubernetes/pki/etcd \
  -v /var/lib/etcd:/var/lib/etcd \
  --env ETCDCTL_API=3 \
  k8s.gcr.io/etcd:3.5.0 \
  etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  snapshot save /var/lib/etcd/snapshot.db
```
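Snapshots only help if they are taken on a schedule and pruned. A sketch of a retention step (the directory name, cron entry, and 7-day window are assumptions) that could run after each snapshot:

```shell
# Hypothetical backup directory holding dated etcd snapshots.
BACKUP_DIR=./etcd-backups
mkdir -p "$BACKUP_DIR"

# e.g. cron entry: 0 2 * * * /usr/local/bin/etcd-backup.sh   (hypothetical script)
# After saving today's snapshot as snapshot-$(date +%F).db, prune old ones:
find "$BACKUP_DIR" -name 'snapshot-*.db' -mtime +7 -delete
```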
Restore etcd from a backup:
```bash
# Stop the static Pods: kubelet stops kube-apiserver and etcd once their
# manifests leave the manifests directory (docker stop alone is not enough --
# kubelet would simply restart the containers)
mv /etc/kubernetes/manifests/kube-apiserver.yaml /etc/kubernetes/manifests/etcd.yaml /tmp/

# Perform the restore
docker run --rm \
  -v /etc/kubernetes/pki/etcd:/etc/kubernetes/pki/etcd \
  -v /var/lib/etcd:/var/lib/etcd \
  --env ETCDCTL_API=3 \
  k8s.gcr.io/etcd:3.5.0 \
  etcdctl snapshot restore /var/lib/etcd/snapshot.db \
  --data-dir /var/lib/etcd/restored

# Swap in the restored data, then move the manifests back
mv /var/lib/etcd/member /var/lib/etcd/member.bak
mv /var/lib/etcd/restored/member /var/lib/etcd/
mv /tmp/kube-apiserver.yaml /tmp/etcd.yaml /etc/kubernetes/manifests/
```
Recommended K8s upgrade path: one minor version at a time (for example 1.22 → 1.23 → 1.24); skipping minor versions is not supported.
Example commands:
```bash
# List the versions available to upgrade to
yum list --showduplicates kubeadm --disableexcludes=kubernetes

# Upgrade kubeadm first
yum install -y kubeadm-1.23.5 --disableexcludes=kubernetes

# Review the upgrade plan
kubeadm upgrade plan

# Upgrade the master node
kubeadm upgrade apply v1.23.5

# Upgrade kubelet and kubectl
yum install -y kubelet-1.23.5 kubectl-1.23.5 --disableexcludes=kubernetes
systemctl daemon-reload
systemctl restart kubelet
```
In our financial-sector production practice, we typically validate a new version in the test environment for at least two weeks, then roll out upgrades during off-peak hours, waiting a week between each step to watch for instability. This conservative strategy is slow, but it maximizes business continuity.