Deploying a Kubernetes (k8s) cluster on the ARM architecture is becoming a common requirement in edge computing and hybrid cloud scenarios. Unlike traditional x86 environments, the differences in ARM's CPU instruction set raise special considerations around container image compatibility and kernel parameter tuning. I recently deployed k8s 1.34.5 + KubeSphere 3.4.1 on a Raspberry Pi cluster and on Huawei Kunpeng servers, and this post is a complete record of the process, from environment preparation to final rollout.
There are two reasons for this particular version combination: k8s 1.34.5 is among the releases with the most stable ARM compatibility, and KubeSphere 3.4.1, an open-source Kubernetes management platform, officially supports the ARM64 architecture. I will cover both online and offline deployment: online deployment suits environments with internet access, while offline deployment targets air-gapped production networks.
ARM hardware options range widely, from development boards like the Raspberry Pi to Huawei Kunpeng servers. The following configuration recommendations have been verified:
Note: compatibility varies considerably between ARM chips; prefer CPUs based on Cortex-A72/A76 or newer. The Raspberry Pi 4B (Cortex-A72) works in practice, but earlier models such as the Raspberry Pi 3B (Cortex-A53) may run into kernel module loading problems.
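The architecture check can be scripted before you start. This is a minimal sketch; the `check_arm_support` helper name is our own, and the verdicts simply follow the note above:

```shell
# Map the output of `uname -m` to a rough compatibility verdict.
# (Sketch only; the categories follow the Cortex-A72/A76 guidance above.)
check_arm_support() {
  case "$1" in
    aarch64|arm64) echo "ok: 64-bit ARM, supported" ;;
    armv7l|armv6l) echo "warn: 32-bit ARM, kernel module issues likely" ;;
    x86_64)        echo "info: x86_64, not an ARM node" ;;
    *)             echo "unknown architecture: $1" ;;
  esac
}

# Check the current machine
check_arm_support "$(uname -m)"
```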
Run the following initialization steps on every node:
```bash
# Disable swap
sudo swapoff -a
sudo sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab

# Configure hostname resolution
sudo tee -a /etc/hosts <<EOF
192.168.1.100 k8s-master
192.168.1.101 k8s-node1
192.168.1.102 k8s-node2
EOF

# Load kernel modules
sudo modprobe overlay
sudo modprobe br_netfilter

# Set kernel parameters
sudo tee /etc/sysctl.d/k8s.conf <<EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
sudo sysctl --system
```
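A quick sanity check after running the steps above, assuming they completed; the `has_active_swap` helper is our own, not a standard tool:

```shell
# Returns success if any uncommented swap entry remains in the given fstab file.
has_active_swap() {
  grep -vE '^[[:space:]]*#' "$1" 2>/dev/null | grep -q ' swap '
}

if has_active_swap /etc/fstab; then
  echo "warning: an active swap entry is still present in /etc/fstab"
else
  echo "swap entries disabled (or no fstab present)"
fi
```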
On ARM, containerd is the recommended container runtime:
```bash
# Install containerd
sudo apt-get update && sudo apt-get install -y containerd
# Generate the default configuration
sudo mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml
# Adjust the configuration: use systemd cgroups, and a domestic mirror for the pause image
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/g' /etc/containerd/config.toml
sudo sed -i 's/sandbox_image = "k8s.gcr.io\/pause:3.6"/sandbox_image = "registry.aliyuncs.com\/google_containers\/pause:3.6"/g' /etc/containerd/config.toml
# Restart and enable the service
sudo systemctl restart containerd
sudo systemctl enable containerd
```
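To confirm that both sed edits actually landed in the config file, a small check helps. This is a sketch; `verify_containerd_cfg` is our own helper name:

```shell
# Succeeds only if the file contains both expected settings from the seds above.
verify_containerd_cfg() {
  grep -q 'SystemdCgroup = true' "$1" 2>/dev/null &&
  grep -q 'registry.aliyuncs.com/google_containers/pause' "$1" 2>/dev/null
}

if verify_containerd_cfg /etc/containerd/config.toml; then
  echo "containerd config OK"
else
  echo "containerd config needs attention"
fi
```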
```bash
# Add the Kubernetes apt repository
sudo apt-get update && sudo apt-get install -y apt-transport-https ca-certificates curl
curl -fsSL https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | sudo apt-key add -
sudo tee /etc/apt/sources.list.d/kubernetes.list <<EOF
deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main
EOF
# Install the pinned component versions
sudo apt-get update
sudo apt-get install -y kubelet=1.34.5-00 kubeadm=1.34.5-00 kubectl=1.34.5-00
sudo apt-mark hold kubelet kubeadm kubectl
```
```bash
sudo kubeadm init \
  --image-repository registry.aliyuncs.com/google_containers \
  --kubernetes-version v1.34.5 \
  --pod-network-cidr=10.244.0.0/16 \
  --apiserver-advertise-address=192.168.1.100 \
  --service-cidr=10.96.0.0/12
```
Once initialization completes, configure kubectl as instructed by the output:
```bash
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
```
For ARM, flannel is the recommended network plugin:
```bash
kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml
```
On an x86 machine with internet access, run:
```bash
# Pull the ARM64 images
docker pull --platform=arm64 registry.aliyuncs.com/google_containers/kube-apiserver:v1.34.5
docker pull --platform=arm64 registry.aliyuncs.com/google_containers/kube-controller-manager:v1.34.5
docker pull --platform=arm64 registry.aliyuncs.com/google_containers/kube-scheduler:v1.34.5
docker pull --platform=arm64 registry.aliyuncs.com/google_containers/kube-proxy:v1.34.5
docker pull --platform=arm64 registry.aliyuncs.com/google_containers/pause:3.6
docker pull --platform=arm64 registry.aliyuncs.com/google_containers/etcd:3.5.6-0
docker pull --platform=arm64 coredns/coredns:1.8.6

# Save the images to a tar archive
docker save -o k8s-images-arm64.tar \
  registry.aliyuncs.com/google_containers/kube-apiserver:v1.34.5 \
  registry.aliyuncs.com/google_containers/kube-controller-manager:v1.34.5 \
  registry.aliyuncs.com/google_containers/kube-scheduler:v1.34.5 \
  registry.aliyuncs.com/google_containers/kube-proxy:v1.34.5 \
  registry.aliyuncs.com/google_containers/pause:3.6 \
  registry.aliyuncs.com/google_containers/etcd:3.5.6-0 \
  coredns/coredns:1.8.6
```
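The repeated pull commands above can be driven from a single image list. This sketch only echoes the commands (a dry run); drop the `echo` prefixes to actually pull and save:

```shell
# Image list matching the commands above; REPO is a shorthand variable of our own.
REPO=registry.aliyuncs.com/google_containers
IMAGES="$REPO/kube-apiserver:v1.34.5 $REPO/kube-controller-manager:v1.34.5 \
$REPO/kube-scheduler:v1.34.5 $REPO/kube-proxy:v1.34.5 \
$REPO/pause:3.6 $REPO/etcd:3.5.6-0 coredns/coredns:1.8.6"

# Dry run: print each pull command (remove `echo` to execute)
for img in $IMAGES; do
  echo docker pull --platform=arm64 "$img"
done
echo docker save -o k8s-images-arm64.tar $IMAGES
```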
Copy the tar archive to the ARM nodes and load it:
```bash
docker load -i k8s-images-arm64.tar
```
On a machine with internet access, download the deb packages:
```bash
wget https://mirrors.aliyun.com/kubernetes/apt/pool/kubeadm_1.34.5-00_arm64.deb
wget https://mirrors.aliyun.com/kubernetes/apt/pool/kubelet_1.34.5-00_arm64.deb
wget https://mirrors.aliyun.com/kubernetes/apt/pool/kubectl_1.34.5-00_arm64.deb
```
Transfer them to the ARM nodes and install:
```bash
sudo dpkg -i kubeadm_1.34.5-00_arm64.deb kubelet_1.34.5-00_arm64.deb kubectl_1.34.5-00_arm64.deb
```
Create the kubeadm configuration file kubeadm-config.yaml:
```yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
nodeRegistration:
  criSocket: unix:///var/run/containerd/containerd.sock
  imagePullPolicy: IfNotPresent
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.34.5
imageRepository: registry.aliyuncs.com/google_containers
networking:
  podSubnet: 10.244.0.0/16
  serviceSubnet: 10.96.0.0/12
```
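Before running init, the file and its CIDRs can be sanity-checked. Recent kubeadm releases offer `kubeadm config validate --config kubeadm-config.yaml`; the rough offline format check below needs no cluster (`looks_like_cidr` is our own helper, and it only checks the string shape, not octet ranges):

```shell
# Minimal CIDR-shape check: four dotted octets plus a /prefix.
looks_like_cidr() {
  printf '%s' "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}/[0-9]{1,2}$'
}

# Values from the kubeadm config above
looks_like_cidr 10.244.0.0/16 && looks_like_cidr 10.96.0.0/12 \
  && echo "CIDR format OK"
```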
Run the initialization:
```bash
sudo kubeadm init --config kubeadm-config.yaml --upload-certs
```
```bash
# Check cluster status
kubectl get nodes
kubectl get pods --all-namespaces
# All nodes should report Ready
# All pods in the kube-system namespace should be running normally
```
```bash
kubectl apply -f https://github.com/kubesphere/ks-installer/releases/download/v3.4.1/kubesphere-installer.yaml
kubectl apply -f https://github.com/kubesphere/ks-installer/releases/download/v3.4.1/cluster-configuration.yaml
# Follow the installation log
kubectl logs -n kubesphere-system $(kubectl get pod -n kubesphere-system -l app=ks-install -o jsonpath='{.items[0].metadata.name}') -f
```
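The installer can take a while, so a small retry helper makes it easy to poll the log instead of watching it. The `wait_for` helper is our own sketch; the commented usage assumes the `app=ks-install` pod label used above and that the installer prints its usual "Welcome to KubeSphere" banner on success:

```shell
# Retry an arbitrary command until it succeeds or attempts run out.
# Usage: wait_for <attempts> <sleep_seconds> <command...>
wait_for() {
  attempts=$1; pause=$2; shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    "$@" && return 0
    i=$((i + 1))
    sleep "$pause"
  done
  return 1
}

# Example (assumptions noted in the lead-in):
# wait_for 60 30 sh -c \
#   'kubectl logs -n kubesphere-system -l app=ks-install --tail=50 | grep -q "Welcome to KubeSphere"'
```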
On a machine with internet access, download:
```bash
wget https://github.com/kubesphere/ks-installer/releases/download/v3.4.1/kubesphere-installer.yaml
wget https://github.com/kubesphere/ks-installer/releases/download/v3.4.1/cluster-configuration.yaml
wget https://github.com/kubesphere/ks-installer/releases/download/v3.4.1/kubesphere-images-arm64.tar.gz
```
On the ARM nodes:
```bash
docker load -i kubesphere-images-arm64.tar.gz
```
```bash
kubectl apply -f kubesphere-installer.yaml
kubectl apply -f cluster-configuration.yaml
```
Problem 1: coredns pods stuck in CrashLoopBackOff
Solution: edit the coredns deployment:

```bash
kubectl edit deployment coredns -n kube-system
```

and add the following environment variable to the container spec:

```yaml
- name: GODEBUG
  value: "netdns=go"
```
Problem 2: node stuck in NotReady
Check, in order:

```bash
systemctl status kubelet
crictl ps
journalctl -u kubelet -f
```

Problem: installation hangs at "Waiting for etcd to be ready"
Solution: inspect the etcd pod logs first:

```bash
kubectl logs -n kube-system etcd-<node-name>
```

Then compact and defragment etcd:

```bash
# Get the current revision (-w json includes the revision field)
ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key endpoint status -w json
# Compact old revisions
ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key compact <revision>
# Defragment
ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key defrag
```
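The three commands above repeat the same endpoint and TLS flags; a small wrapper keeps them in one place. The `etcdctl_local` function is our own convenience sketch, not part of etcdctl, and it assumes the standard kubeadm certificate paths shown above:

```shell
# Wrap etcdctl with the local endpoint and kubeadm-default TLS material.
etcdctl_local() {
  ETCDCTL_API=3 etcdctl \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    "$@"
}

# Usage on the control-plane node:
# etcdctl_local endpoint status -w json
# etcdctl_local compact <revision>
# etcdctl_local defrag
```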
```bash
# Append to /etc/sysctl.conf
sudo tee -a /etc/sysctl.conf <<EOF
# Raise the file descriptor limit
fs.file-max = 655350
# Network tuning
net.core.somaxconn = 32768
net.ipv4.tcp_max_syn_backlog = 8192
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_tw_reuse = 1
net.ipv4.ip_local_port_range = 1024 65000
EOF
# Apply the configuration
sudo sysctl -p
```
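To spot-check that the values are live after `sysctl -p`, a small helper suffices. This is a sketch; `sysctl_is` is our own name, and only two of the keys above are checked as examples:

```shell
# Compare a sysctl key's current value with the expected one.
sysctl_is() {
  [ "$(sysctl -n "$1" 2>/dev/null)" = "$2" ]
}

# Example checks against two values set above
sysctl_is net.ipv4.tcp_syncookies 1 || echo "net.ipv4.tcp_syncookies not applied"
sysctl_is net.core.somaxconn 32768 || echo "net.core.somaxconn not applied"
```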
Create /etc/default/kubelet:
```bash
KUBELET_EXTRA_ARGS="--max-pods=100 --kube-reserved=cpu=500m,memory=1Gi --system-reserved=cpu=500m,memory=1Gi --eviction-hard=memory.available<500Mi,nodefs.available<10%"
```
Restart kubelet:
```bash
sudo systemctl daemon-reload
sudo systemctl restart kubelet
```
For resource-constrained ARM devices, the resource requests of KubeSphere components can be lowered:
```bash
kubectl edit deployments -n kubesphere-system ks-console
```

Adjust the resources section:

```yaml
resources:
  requests:
    cpu: "200m"
    memory: "256Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"
```