Kubernetes Node Operations: A Practical Guide to Safely Decommissioning and Adding Nodes - 代码聚汇网

斯迈尔齿科

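1. Standard Procedures for Decommissioning and Adding Kubernetes Nodes

When managing a Kubernetes cluster in production, taking nodes out of service and adding new ones are among the most basic operations, and also among the easiest to get wrong. I have been through several outages caused by mishandled node operations; in containerization projects in the financial sector in particular, a single careless drain can make a core trading system briefly unavailable. This article shares a production-tested standard procedure, along with the details you usually only learn the hard way.

2. Decommissioning a Node, Step by Step

2.1 Pre-checks and Business Impact Assessment

Before any decommissioning step, assess the business impact thoroughly. I usually run the following checks: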

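# List all Pods running on the target node
kubectl get pods -A -o wide --field-selector spec.nodeName=k8s-node02

# Check resource allocation on the node
kubectl describe node k8s-node02 | grep -A 10 "Allocated resources"

# Identify critical business Pods (assumes they carry the label appType=critical)
kubectl get pods -A -l appType=critical -o wide --field-selector spec.nodeName=k8s-node02

Important: confirm the maintenance window with the business teams in advance. Latency-sensitive services such as payment systems should be handled during off-peak hours.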
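
As a rough aid for the impact assessment, the Pod list can be condensed into a per-namespace count so that the affected teams are easy to identify. The sketch below assumes the default column layout of kubectl get pods -A -o wide (namespace in the first column); the function name summarize_pods_by_namespace is made up for this example.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count Pods per namespace.
# Reads `kubectl get pods -A -o wide` output on stdin (header line first).
summarize_pods_by_namespace() {
  awk 'NR > 1 && NF > 0 { count[$1]++ }
       END { for (ns in count) printf "%s %d\n", ns, count[ns] }' | sort
}
```

A possible invocation is `kubectl get pods -A -o wide --field-selector spec.nodeName=k8s-node02 | summarize_pods_by_namespace`, which prints one line per namespace with the number of Pods that would be evicted.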

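2.2 Marking the Node Unschedulable (Cordon)

Marking the node unschedulable is the first step of decommissioning; it prevents new Pods from being scheduled onto the node:

kubectl cordon k8s-node02

Verify the status:

kubectl get node k8s-node02

The STATUS column should show SchedulingDisabled (for example, Ready,SchedulingDisabled). One detail: cordon does not affect Pods already running on the node; it only blocks new Pods from being scheduled there.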

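2.3 Pod Eviction (Drain) in Depth

Evicting Pods is the most error-prone part of the whole procedure. Pay particular attention to the following points.

2.3.1 Basic Drain Command

kubectl drain k8s-node02 \
  --ignore-daemonsets \
  --delete-emptydir-data \
  --grace-period=300 \
  --timeout=10m

Parameters:

--ignore-daemonsets: required in practice; without it drain refuses to proceed when DaemonSet-managed Pods are present
--delete-emptydir-data: proceed even if Pods use emptyDir volumes; that data is deleted when the Pod is evicted
--grace-period: graceful termination time given to each Pod, in seconds; it overrides the Pod's own terminationGracePeriodSeconds
--timeout: how long to wait for the whole drain before giving up; keep it longer than --grace-period (10m here against a 300-second grace period), otherwise drain can time out while Pods are still shutting down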
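
To make the relationship between the two time limits concrete, here is a minimal sketch that derives --timeout from the grace period plus a buffer and prints the resulting command instead of running it. The function name build_drain_cmd and the 120-second default buffer are assumptions chosen for illustration; they are not part of kubectl.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a drain command whose --timeout always exceeds
# --grace-period. It only prints; nothing is sent to the cluster.
build_drain_cmd() {
  local node="$1"
  local grace="${2:-300}"    # graceful termination time in seconds
  local buffer="${3:-120}"   # assumed extra headroom in seconds
  local timeout=$(( grace + buffer ))
  printf 'kubectl drain %s --ignore-daemonsets --delete-emptydir-data --grace-period=%d --timeout=%ds\n' \
    "$node" "$grace" "$timeout"
}
```

Calling `build_drain_cmd k8s-node02 300` prints a command with --timeout=420s, which can be reviewed and then run by hand.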

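2.3.2 Handling Special Pods

StatefulSet Pods:

# Find StatefulSet-owned Pods on the node (kubectl get statefulset does not show node names)
kubectl get pods -A --field-selector spec.nodeName=k8s-node02 -o json \
  | jq -r '.items[] | select(any(.metadata.ownerReferences[]?; .kind == "StatefulSet")) | "\(.metadata.namespace)/\(.metadata.name)"'
# Confirm the replica count and the storage type
kubectl get statefulset -A
kubectl describe pod <pod-name> -n <namespace> | grep -A 5 Volumes

If the StatefulSet uses node-local storage (hostPath) and has a single replica, one of the following must be done first:

- Increase the replica count in the StatefulSet spec (this only helps if the application replicates its own data)
- Migrate the data to PVC-backed storage
- Agree on a downtime window with the business

Single-replica Deployments:

# Temporarily increase the replica count; the new replica lands on another node because this one is cordoned
kubectl scale deployment/<name> --replicas=2 -n <namespace>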

2.3.3 Post-drain Verification

# Check whether any non-DaemonSet Pods remain on the node
kubectl get pods -A --field-selector spec.nodeName=k8s-node02 -o wide

# Look for Pods that are not fully ready after rescheduling
kubectl get pods -A -o wide | grep -Ev '1/1|2/2|3/3|Completed|Terminating'
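
Matching on fixed strings such as 1/1 or 2/2 misses Pods with more containers. A slightly more general check, sketched below, compares the two halves of the READY column. It assumes the default kubectl get pods -A column order (NAMESPACE, NAME, READY, STATUS); list_unready_pods is an illustrative name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print Pods whose ready count differs from the total,
# skipping Completed and Terminating Pods.
# Reads `kubectl get pods -A -o wide` output on stdin (header line first).
list_unready_pods() {
  awk 'NR > 1 && NF >= 4 {
         split($3, ready, "/")
         if ($4 != "Completed" && $4 != "Terminating" && ready[1] != ready[2])
           print $1 "/" $2, $3, $4
       }'
}
```

For example, `kubectl get pods -A -o wide | list_unready_pods` prints nothing when every Pod is fully ready.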

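2.4 Removing and Cleaning Up the Node

2.4.1 Removing the Node from the Cluster

kubectl delete node k8s-node02

2.4.2 Local Cleanup on the Node

Run on the removed node:

systemctl stop kubelet
kubeadm reset -f
rm -rf /etc/cni/net.d /var/lib/kubelet /var/lib/etcd /etc/kubernetes
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
ipvsadm --clear   # only needed when kube-proxy ran in IPVS mode

Note: some CNI plugins need their own cleanup as well. With Calico, for example, the node's Calico resources can be removed by running the following from a host where calicoctl is configured:
calicoctl delete node k8s-node02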

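3. Standard Procedure for Adding a Node

3.1 Preparing the Node

3.1.1 Base System Configuration

# Set the hostname
hostnamectl set-hostname k8s-node03

# Configure hosts (keep identical on all nodes)
echo "192.168.1.10 k8s-master01
192.168.1.11 k8s-node01
192.168.1.12 k8s-node02
192.168.1.13 k8s-node03" >> /etc/hosts

# Disable swap
swapoff -a
sed -i '/ swap / s/^/#/' /etc/fstab

# Load the kernel modules that the bridge sysctls below depend on
printf 'overlay\nbr_netfilter\n' > /etc/modules-load.d/k8s.conf
modprobe overlay
modprobe br_netfilter

# Kernel parameters
cat <<EOF | tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
sysctl --system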
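
After applying the settings it is worth confirming that they actually took effect, since the bridge keys only exist once the br_netfilter module is loaded. The following is a minimal sketch of such a check; check_k8s_sysctls is a hypothetical name, and the function expects the "key = value" lines that the sysctl command prints.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `sysctl <keys>` output ("key = value" lines) on
# stdin and report any required key that is missing or not set to 1.
# Returns non-zero when at least one key fails the check.
check_k8s_sysctls() {
  awk -F' *= *' '
    BEGIN {
      want["net.bridge.bridge-nf-call-iptables"] = 1
      want["net.bridge.bridge-nf-call-ip6tables"] = 1
      want["net.ipv4.ip_forward"] = 1
    }
    ($1 in want) && $2 == 1 { delete want[$1] }
    END {
      bad = 0
      for (key in want) { print "not set to 1: " key; bad = 1 }
      exit bad
    }'
}
```

One way to use it is `sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward | check_k8s_sysctls`, followed by a check of the exit status before continuing with the join.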

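3.1.2 Installing the Container Runtime

Using containerd as an example:

# Install containerd and generate the default configuration
apt-get install -y containerd
mkdir -p /etc/containerd
containerd config default | tee /etc/containerd/config.toml

# Switch to the systemd cgroup driver, then restart so the change takes effect
sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
systemctl restart containerd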

3.2 Joining the Cluster

3.2.1 Getting the Join Command

Run on the master node:

kubeadm token create --print-join-command

Example output:

kubeadm join 192.168.1.10:6443 --token abcdef.0123456789abcdef \
  --discovery-token-ca-cert-hash sha256:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
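
Join commands are often copied between terminals or chat tools, where they can get truncated. A small pre-check of the argument shapes can catch that before kubeadm runs. The sketch below relies on the documented kubeadm formats (a token of 6 characters, a dot, and 16 characters from a-z0-9; a hash of sha256: followed by 64 hex digits); the function name valid_join_args is an assumption for this example.

```bash
#!/usr/bin/env bash
# Hypothetical pre-check: validate the shape of kubeadm join arguments.
# It does not contact the cluster and cannot tell whether a token has expired.
valid_join_args() {
  local token="$1" hash="$2"
  if ! printf '%s\n' "$token" | grep -Eq '^[a-z0-9]{6}\.[a-z0-9]{16}$'; then
    echo "malformed token" >&2
    return 1
  fi
  if ! printf '%s\n' "$hash" | grep -Eq '^sha256:[a-f0-9]{64}$'; then
    echo "malformed CA cert hash" >&2
    return 1
  fi
}
```

The placeholder hash made of x characters in the example output above would be rejected, which is the intended behavior for an incomplete command.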

3.2.2 Running the Join Command

Run the command above on the new node. On success, the output looks similar to:

[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.

3.3 Post-join Verification and Configuration

3.3.1 Basic Verification

# Check node status from the master node
kubectl get nodes
kubectl describe node k8s-node03

# Check the node's resource capacity
kubectl get node k8s-node03 -o json | jq '.status.capacity'

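3.3.2 Network Plugin Setup

If you use Calico:

# Install the calicoctl CLI on the new node; pick the release that matches your cluster's Calico version
# (the CNI plugin itself is typically deployed to the node by the calico-node DaemonSet)
curl -L https://github.com/projectcalico/calico/releases/download/v3.24.1/calicoctl-linux-amd64 -o /usr/local/bin/calicoctl
chmod +x /usr/local/bin/calicoctl

# Verify network status (run as root on the node)
calicoctl node status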

3.3.3 Node Labels and Taints

# Add node labels
kubectl label nodes k8s-node03 node-role.kubernetes.io/worker=worker
kubectl label nodes k8s-node03 topology.kubernetes.io/zone=zone-a

# Set a taint (optional)
kubectl taint nodes k8s-node03 dedicated=special-user:NoSchedule

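4. Key Considerations for Production

4.1 Pitfalls When Decommissioning

DaemonSets:
- With --ignore-daemonsets, DaemonSet Pods stay on the node
- Deleting them by hand does not empty the node, because the DaemonSet controller recreates them even on a cordoned node; they are removed once the node object is deleted
Pod Disruption Budgets (PDB):
```
kubectl get poddisruptionbudgets -A
```
- drain evicts through the Eviction API, which honors PDBs: an eviction that would violate a PDB is refused and retried until --timeout expires, so check the ALLOWED DISRUPTIONS column beforehand
- Temporarily adjust the PDB if necessary
Long-lived connections:
- For services that hold long-lived connections (such as WebSocket), trigger a graceful shutdown through the service's API first
- Monitor the connection count until it drops to 0, then run drain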
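4.2 Troubleshooting Newly Added Nodes

Node stuck in NotReady:

# Follow the kubelet logs
journalctl -u kubelet -f

# Common causes:
# - CNI plugin not installed correctly
# - No network path between the node and the master (check port 6443)
# - Certificate problems (check /var/lib/kubelet/pki)

Pods cannot be scheduled onto the new node:

kubectl describe node k8s-node03 | grep -A 10 Events
kubectl get pods -n kube-system -o wide | grep k8s-node03

- Check that the node has enough free resources
- Verify that the node's labels and taints match the Pod's affinity rules and tolerations

Network connectivity problems:

# Test the connection to the master from the new node
telnet <master-ip> 6443

# Check the CNI plugin log
cat /var/log/calico/cni/cni.log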

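4.3 Performance Tuning Suggestions

Node join tuning for large clusters:

# kube-controller-manager flags, set on the master node
# (with kubeadm: /etc/kubernetes/manifests/kube-controller-manager.yaml)
--node-monitor-grace-period=20s
--node-startup-grace-period=1m

These flags control how long the controller manager tolerates an unresponsive running node and a newly starting node, respectively, before marking it unhealthy. Shorter values detect failures sooner but can cause flapping on a busy control plane, so test them before rolling them out.

Batch script example:

# Take several nodes out of service in parallel.
# Node names are examples; make sure the remaining nodes can absorb the evicted workloads.
for node in k8s-node{01..03}; do
  kubectl cordon "$node" &
done
wait

for node in k8s-node{01..03}; do
  kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data &
done
wait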
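
Draining several nodes in parallel is fast, but for workloads without a PDB it can evict every replica at the same moment. A more conservative variant, sketched here, handles one node at a time and stops at the first failure. The helper names run and drain_nodes_sequentially, the DRY_RUN switch, and the 10m timeout are assumptions for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: cordon and drain nodes one at a time, stopping at the
# first failure. With DRY_RUN=1 the commands are printed instead of executed.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

drain_nodes_sequentially() {
  local node
  for node in "$@"; do
    run kubectl cordon "$node" || return 1
    run kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data --timeout=10m || return 1
  done
}
```

Setting `DRY_RUN=1` and calling `drain_nodes_sequentially k8s-node01 k8s-node02` prints the four commands in order, which is a cheap way to review a batch before running it for real.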

Node resource reservations:

# In the kubelet configuration (/var/lib/kubelet/config.yaml); restart the kubelet after editing
systemReserved:
  cpu: "500m"
  memory: "1Gi"
kubeReserved:
  cpu: "500m"
  memory: "1Gi"
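
The effect of these reservations on schedulable capacity follows from simple arithmetic: allocatable = capacity - systemReserved - kubeReserved - hard eviction threshold. The sketch below works this out for memory in Mi. The 16384 Mi capacity in the usage example and the function name allocatable_memory_mi are assumptions; 100 Mi is the kubelet's default memory.available hard eviction threshold.

```bash
#!/usr/bin/env bash
# Hypothetical helper: estimate allocatable memory (in Mi) for a node.
# Arguments: capacity, systemReserved, kubeReserved, [eviction threshold],
# all in Mi. The eviction threshold defaults to 100 Mi.
allocatable_memory_mi() {
  local capacity="$1" system_reserved="$2" kube_reserved="$3"
  local eviction="${4:-100}"
  echo $(( capacity - system_reserved - kube_reserved - eviction ))
}
```

For a node with 16384 Mi of memory and the 1Gi reservations shown above, `allocatable_memory_mi 16384 1024 1024` prints 14236, which should roughly match the memory value under Allocatable in kubectl describe node.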

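In real production environments, I recommend wrapping these operations in Ansible playbooks or Terraform modules and integrating them into the CI/CD pipeline. For large clusters, consider tools such as Cluster API for automated management. Before every node operation, validate the procedure in a test environment, especially for compatibility after a Kubernetes version upgrade.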