Kubernetes has become the de facto standard for container orchestration, but users in mainland China routinely run into image-pull failures and unstable network connectivity during deployment. This guide walks through deploying a production-ready Kubernetes cluster step by step, optimized for the Chinese network environment with practical techniques such as Alibaba Cloud mirror acceleration.
Before deploying a production-grade Kubernetes cluster, make sure the base environment meets the following requirements:

Hardware: at least 2 CPUs and 2 GB of RAM per node (kubeadm's documented minimum; size up for production workloads).

Operating system:
```bash
# Check the OS version
cat /etc/redhat-release
# Expected output: CentOS Linux release 7.x
```
Production environments must disable swap and tune the kernel parameters:
```bash
# Disable swap permanently
sudo swapoff -a
sudo sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab

# Load the br_netfilter module
sudo modprobe br_netfilter
echo "br_netfilter" | sudo tee /etc/modules-load.d/br_netfilter.conf

# Set kernel parameters
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
sudo sysctl --system
```
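The `sed` rule above comments out every `/etc/fstab` line containing ` swap `. A dry run on a throwaway sample (hypothetical fstab content) shows the effect before touching the real file:

```bash
# Apply the same sed rule to a sample fstab copy and inspect the result.
cat > /tmp/fstab.sample <<'EOF'
/dev/mapper/centos-root /     xfs  defaults 0 0
/dev/mapper/centos-swap swap  swap defaults 0 0
EOF
sed -i '/ swap / s/^\(.*\)$/#\1/g' /tmp/fstab.sample
cat /tmp/fstab.sample   # only the swap line gains a leading '#'
```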
Tip: if you keep the firewall enabled in production, open the following ports:
- Master nodes: 6443, 2379-2380, 10250-10252
- Worker nodes: 10250, 30000-32767
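With firewalld, the master-node list above translates into commands like the following. The loop only prints them so the rules can be reviewed first; pipe the output to `sudo bash` to apply (a sketch, assuming the default zone):

```bash
# Emit (rather than execute) the firewalld rules for a master node.
for p in 6443 2379-2380 10250-10252; do
  echo "firewall-cmd --permanent --add-port=${p}/tcp"
done
echo "firewall-cmd --reload"
```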
Install Docker using the Alibaba Cloud mirror for faster downloads:
```bash
# Install dependencies
sudo yum install -y yum-utils device-mapper-persistent-data lvm2
# Add the Alibaba Cloud Docker repository
sudo yum-config-manager --add-repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
# Install a pinned version (19.03+ recommended for production)
sudo yum install -y docker-ce-19.03.15 docker-ce-cli-19.03.15 containerd.io
```
Configure the registry mirror and the cgroup driver in `/etc/docker/daemon.json`:

```json
{
  "registry-mirrors": ["https://<your-aliyun-mirror>.mirror.aliyuncs.com"],
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2"
}
```
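A syntax error in `daemon.json` keeps Docker from starting at all, so it is worth validating the file before restarting. The check below runs on a sample copy (the mirror URL is a placeholder); point it at `/etc/docker/daemon.json` in practice:

```bash
# Validate daemon.json with Python's built-in JSON parser before restarting Docker.
cat > /tmp/daemon.json <<'EOF'
{
  "registry-mirrors": ["https://example.mirror.aliyuncs.com"],
  "exec-opts": ["native.cgroupdriver=systemd"],
  "storage-driver": "overlay2"
}
EOF
python3 -m json.tool /tmp/daemon.json > /dev/null && echo "daemon.json: valid JSON"
```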
Restart the service and enable it at boot:

```bash
sudo systemctl daemon-reload
sudo systemctl restart docker
sudo systemctl enable docker
```
Configure the Alibaba Cloud Kubernetes repository:

```bash
cat <<EOF | sudo tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
```
Install pinned component versions:

```bash
# List available versions
yum list kubeadm --showduplicates | sort -r
# Install a specific version (1.18+ recommended for production)
sudo yum install -y kubeadm-1.18.20 kubelet-1.18.20 kubectl-1.18.20
sudo systemctl enable kubelet
```
Initialize the control plane on the master node:

```bash
sudo kubeadm init \
  --kubernetes-version=v1.18.20 \
  --apiserver-advertise-address=<MASTER_IP> \
  --image-repository=registry.aliyuncs.com/google_containers \
  --service-cidr=10.96.0.0/12 \
  --pod-network-cidr=192.168.0.0/16 \
  --upload-certs
```
After a successful init, record the join commands printed at the end. The control-plane variant looks like:

```bash
kubeadm join <MASTER_IP>:6443 --token <TOKEN> \
  --discovery-token-ca-cert-hash sha256:<HASH> \
  --control-plane --certificate-key <KEY>
```
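If the printed join command is lost, `<HASH>` can be recomputed from the cluster CA: it is the SHA-256 digest of the CA's DER-encoded public key. The pipeline below demonstrates the derivation on a freshly generated self-signed certificate standing in for `/etc/kubernetes/pki/ca.crt`:

```bash
# Generate a throwaway cert to stand in for the cluster CA.
openssl req -x509 -newkey rsa:2048 -nodes -keyout /tmp/ca.key -out /tmp/ca.crt \
  -days 1 -subj "/CN=kubernetes" 2>/dev/null
# Derive the discovery-token-ca-cert-hash (64 hex characters).
openssl x509 -pubkey -in /tmp/ca.crt -noout \
  | openssl rsa -pubin -outform der 2>/dev/null \
  | openssl dgst -sha256 | awk '{print $NF}'
```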
Configure kubectl:

```bash
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
```
Download the Calico manifest:

```bash
wget https://docs.projectcalico.org/manifests/calico.yaml -O calico.yaml
```
Edit the manifest so the CIDR matches the `--pod-network-cidr` value used at init:

```yaml
# In calico.yaml
- name: CALICO_IPV4POOL_CIDR
  value: "192.168.0.0/16"
```
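In the stock manifest these two lines ship commented out, so they must be uncommented as well as matched to the pod CIDR. A scripted edit avoids error-prone hand-editing; shown here on a minimal excerpt of the file:

```bash
# Uncomment CALICO_IPV4POOL_CIDR in (an excerpt of) calico.yaml.
cat > /tmp/calico-excerpt.yaml <<'EOF'
            # - name: CALICO_IPV4POOL_CIDR
            #   value: "192.168.0.0/16"
EOF
sed -i -e 's|# - name: CALICO_IPV4POOL_CIDR|- name: CALICO_IPV4POOL_CIDR|' \
       -e 's|#   value: "192.168.0.0/16"|  value: "192.168.0.0/16"|' /tmp/calico-excerpt.yaml
cat /tmp/calico-excerpt.yaml
```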
Deploy the network plugin:

```bash
kubectl apply -f calico.yaml
```
Verify the installation:

```bash
watch kubectl get pods -n kube-system
# Wait until all Pods reach the Running state
```
On each worker node, run:

```bash
sudo kubeadm join <MASTER_IP>:6443 --token <TOKEN> \
  --discovery-token-ca-cert-hash sha256:<HASH>
```
Verify node status:

```bash
kubectl get nodes
# All nodes should report Ready
```
Download the Dashboard manifest:

```bash
wget https://raw.githubusercontent.com/kubernetes/dashboard/v2.0.0/aio/deploy/recommended.yaml -O dashboard.yaml
```
Change the Service to NodePort by appending an override at the end of dashboard.yaml (with `kubectl apply`, the later definition of the same object replaces the stock ClusterIP Service):

```yaml
---
kind: Service
apiVersion: v1
metadata:
  labels:
    k8s-app: kubernetes-dashboard
  name: kubernetes-dashboard
  namespace: kubernetes-dashboard
spec:
  type: NodePort
  ports:
    - port: 443
      targetPort: 8443
      nodePort: 30443
  selector:
    k8s-app: kubernetes-dashboard
```
Deploy the Dashboard:

```bash
kubectl apply -f dashboard.yaml
```
Create an admin account:

```bash
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-user
  namespace: kubernetes-dashboard
EOF

cat <<EOF | kubectl apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: admin-user
  namespace: kubernetes-dashboard
EOF
```
Retrieve the access token:

```bash
kubectl -n kubernetes-dashboard describe secret $(kubectl -n kubernetes-dashboard get secret | grep admin-user | awk '{print $1}')
```
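The `grep | awk` pipeline in that command only extracts the secret's name from the listing; run against sample `get secret` output (hypothetical names), it behaves like this:

```bash
# Simulated 'kubectl -n kubernetes-dashboard get secret' output.
sample='NAME                     TYPE                                  DATA   AGE
admin-user-token-abc12   kubernetes.io/service-account-token   3      1m
default-token-xyz99      kubernetes.io/service-account-token   3      5m'
echo "$sample" | grep admin-user | awk '{print $1}'   # -> admin-user-token-abc12
```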
Open the Dashboard in a browser (accept the self-signed certificate warning):

```text
https://<MASTER_IP>:30443
```
| Component | Recommended setup | Notes |
|---|---|---|
| API Server | Load-balanced across 3 nodes | Implement with Nginx or HAProxy |
| etcd | Odd-sized cluster (3/5 nodes) | Spread across racks/availability zones |
| Control plane | Multiple master nodes | Add with `kubeadm join --control-plane` |
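As a sketch of the API Server row above, a TCP pass-through HAProxy configuration for three masters might look like this (the IPs and backend name are placeholders, not from the original article):

```
# /etc/haproxy/haproxy.cfg (fragment): TCP pass-through to three kube-apiservers
frontend k8s-api
    bind *:6443
    mode tcp
    default_backend k8s-masters

backend k8s-masters
    mode tcp
    balance roundrobin
    option tcp-check
    server master1 192.168.1.11:6443 check
    server master2 192.168.1.12:6443 check
    server master3 192.168.1.13:6443 check
```

TCP mode keeps TLS termination on the apiservers themselves, so existing kubeconfig certificates continue to work unchanged.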
Monitoring:

```bash
# Install the Prometheus Operator (kube-prometheus)
kubectl apply -f https://raw.githubusercontent.com/prometheus-operator/kube-prometheus/main/manifests/setup
kubectl apply -f https://raw.githubusercontent.com/prometheus-operator/kube-prometheus/main/manifests/
```
Log collection:

```bash
# Install Filebeat (part of an EFK stack) via the ECK recipe
kubectl apply -f https://raw.githubusercontent.com/elastic/cloud-on-k8s/2.7/config/recipes/beats/filebeat.yaml
```
Check cluster health:

```bash
kubectl get componentstatuses
kubectl get events --all-namespaces --sort-by='.metadata.creationTimestamp'
```
Upgrade the cluster safely (the commands pin 1.18.20 for illustration; substitute your actual target version, which must be newer than the one installed):

```bash
# Upgrade kubeadm first
sudo yum install -y kubeadm-1.18.20-0
# Upgrade the control plane
sudo kubeadm upgrade apply v1.18.20
# On each node: upgrade the node, then kubelet and kubectl
sudo kubeadm upgrade node
sudo yum install -y kubelet-1.18.20-0 kubectl-1.18.20-0
sudo systemctl daemon-reload
sudo systemctl restart kubelet
```