When your Kubernetes cluster needs to run stateful applications, managing PVs (Persistent Volumes) by hand is like digging a tunnel with a spoon: theoretically possible, but maddeningly inefficient. Manually provisioning a PV for every new PVC (Persistent Volume Claim)? Tolerable in a test environment, but a nightmare for operators in production. This article walks you through building true "claim and use" dynamic storage with StorageClass and NFS-Client-Provisioner, leaving hand-configured PVs in the stone age.
Under the traditional static provisioning model, administrators pre-create PVs, like preparing a pile of empty USB drives waiting to be claimed. With dynamic provisioning, the system creates and binds a PV on demand the moment a PVC asks for storage, like a smart warehouse that manufactures USB drives as orders come in.
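To make the contrast concrete, here is the kind of static PV an administrator would have to write by hand for every single claim (a minimal sketch; the server address and export path match the NFS setup used later in this article):

```bash
# Static provisioning: the administrator pre-creates each PV manually
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolume
metadata:
  name: manual-pv-001
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: 192.168.1.100
    path: /data/k8s_storage/manual-pv-001
EOF
```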
Behind this automation, three key components work together:
```mermaid
graph TD
    A[PVC requests storage] -->|references a StorageClass| B(StorageClass)
    B -->|triggers| C[NFS-Client-Provisioner]
    C -->|automatically creates| D[PV]
    D -->|binds to| A
```
This architecture is a particularly good fit for development and test clusters, CI pipelines, and other workloads that need many small shared volumes without demanding IO performance.
A dynamic storage system needs shared storage as its backend, and we will use NFS, the most common choice. Assuming an existing NFS server (192.168.1.100), create the shared directory:
```bash
# Run on the NFS server
mkdir -p /data/k8s_storage
chmod 777 /data/k8s_storage   # permissive mode for the demo; tighten in production
echo "/data/k8s_storage *(rw,sync,no_subtree_check,no_root_squash)" >> /etc/exports
exportfs -a
systemctl restart nfs-server
```
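Before moving on to the Kubernetes nodes, confirm the export is actually visible:

```bash
# Query the export list; run from the server or any host with NFS tools installed
showmount -e 192.168.1.100
# The output should list /data/k8s_storage
```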
Install the NFS client tools on every Kubernetes node:
```bash
# Run on every node; RHEL/CentOS use nfs-utils, Debian/Ubuntu use nfs-common
yum install -y nfs-utils || apt-get install -y nfs-common
```
Test the mount from any node:
```bash
mkdir -p /mnt/nfs_test
mount -t nfs 192.168.1.100:/data/k8s_storage /mnt/nfs_test
touch /mnt/nfs_test/testfile
umount /mnt/nfs_test
```
If the test file is created without errors, the NFS configuration is correct.
The provisioner needs specific permissions to manage PV resources. Create rbac.yaml:
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: nfs-client-provisioner
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: nfs-client-provisioner-runner
rules:
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "update"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create", "update", "patch"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: run-nfs-client-provisioner
subjects:
  - kind: ServiceAccount
    name: nfs-client-provisioner
    namespace: default
roleRef:
  kind: ClusterRole
  name: nfs-client-provisioner-runner
  apiGroup: rbac.authorization.k8s.io
```
Apply the configuration:

```bash
kubectl apply -f rbac.yaml
```
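Before deploying anything, you can sanity-check that the binding took effect (this assumes the ServiceAccount landed in the default namespace):

```bash
# Ask the API server whether the ServiceAccount may create PVs cluster-wide
kubectl auth can-i create persistentvolumes \
  --as=system:serviceaccount:default:nfs-client-provisioner
# Expected answer: yes
```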
Create deployment.yaml to deploy the provisioner:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nfs-client-provisioner
  labels:
    app: nfs-client-provisioner
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: nfs-client-provisioner
  template:
    metadata:
      labels:
        app: nfs-client-provisioner
    spec:
      serviceAccountName: nfs-client-provisioner
      containers:
        - name: nfs-client-provisioner
          image: k8s.gcr.io/sig-storage/nfs-subdir-external-provisioner:v4.0.2
          volumeMounts:
            - name: nfs-client-root
              mountPath: /persistentvolumes
          env:
            - name: PROVISIONER_NAME
              value: k8s-sigs.io/nfs-subdir-external-provisioner
            - name: NFS_SERVER
              value: "192.168.1.100"
            - name: NFS_PATH
              value: /data/k8s_storage
      volumes:
        - name: nfs-client-root
          nfs:
            server: 192.168.1.100
            path: /data/k8s_storage
```
Deploy and verify:

```bash
kubectl apply -f deployment.yaml
kubectl get pods -l app=nfs-client-provisioner
```
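If the pod is not Running, or PVCs hang later on, the provisioner log is the first place to look (a habit this article comes back to in the closing notes):

```bash
# Mount failures and RBAC denials both surface here
kubectl logs -l app=nfs-client-provisioner --tail=20 -f
```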
Define the storage class in storageclass.yaml:
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-dynamic
provisioner: k8s-sigs.io/nfs-subdir-external-provisioner
parameters:
  archiveOnDelete: "false"
reclaimPolicy: Delete
volumeBindingMode: Immediate
```
Key parameters explained:
| Parameter | Values | Description |
|---|---|---|
| reclaimPolicy | Delete / Retain | PV reclaim policy; with Delete, the PV is removed when its PVC is deleted |
| archiveOnDelete | "true" / "false" | With "false", deleting the PVC also deletes the data on NFS; with "true", the directory is kept but renamed with an archived- prefix |
| volumeBindingMode | Immediate / WaitForFirstConsumer | Bind the PV immediately, or defer binding until a Pod is scheduled |
Apply the configuration:

```bash
kubectl apply -f storageclass.yaml
```
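Optionally, mark the class as the cluster default so that PVCs omitting storageClassName still get dynamic volumes (skip this if another default class already exists):

```bash
kubectl patch storageclass nfs-dynamic \
  -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
kubectl get storageclass   # nfs-dynamic should now show "(default)"
```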
Create pvc-test.yaml:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
spec:
  storageClassName: nfs-dynamic
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```
Create it and watch a PV appear automatically:
```bash
kubectl apply -f pvc-test.yaml
kubectl get pvc test-pvc -w
kubectl get pv
```
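On the NFS server you can also see the backing directory the provisioner created; by default, nfs-subdir-external-provisioner names it `${namespace}-${pvcName}-${pvName}`:

```bash
# Run on the NFS server: each claim gets its own subdirectory
ls -l /data/k8s_storage
# Expect a directory like default-test-pvc-pvc-<uid>
```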
Create test-pod.yaml:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  containers:
    - name: nginx
      image: nginx:alpine
      volumeMounts:
        - name: data
          mountPath: /usr/share/nginx/html
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: test-pvc
```
Verify that data persists across Pod restarts:
```bash
kubectl exec -it test-pod -- sh -c "echo 'Hello Dynamic PV' > /usr/share/nginx/html/test.txt"
kubectl delete pod test-pod
kubectl apply -f test-pod.yaml
kubectl wait --for=condition=Ready pod/test-pod   # let the new pod mount the volume first
kubectl exec -it test-pod -- cat /usr/share/nginx/html/test.txt
```
For stateful applications, use volumeClaimTemplates so that each Pod gets its own dedicated volume:
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: "nginx"
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:alpine
          ports:
            - containerPort: 80
              name: web
          volumeMounts:
            - name: www
              mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
    - metadata:
        name: www
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: "nfs-dynamic"
        resources:
          requests:
            storage: 1Gi
```
Watch the automatically created PVCs and PVs:

```bash
kubectl get pvc -l app=nginx
kubectl get pv
```
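One StatefulSet behavior worth verifying: PVCs are deliberately kept when you scale down, so each replica finds its old data when it comes back:

```bash
kubectl scale statefulset web --replicas=1
kubectl get pvc -l app=nginx                 # www-web-1 and www-web-2 stay Bound
kubectl scale statefulset web --replicas=3   # returning pods reattach their old volumes
```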
NFS server tuning:

```bash
# Raise the number of nfsd worker threads (the default of 8 is low for busy clusters);
# persist the value via the [nfsd] section of /etc/nfs.conf
echo 16 > /proc/fs/nfsd/threads
# Also consider putting the export on faster disks (SSD/NVMe)
```
StorageClass mount-option tuning (note that mountOptions is a top-level StorageClass field, not a parameter):
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-fast
provisioner: k8s-sigs.io/nfs-subdir-external-provisioner
mountOptions:
  - hard
  - nolock
  - noatime
  - nodiratime
  - rsize=65536
  - wsize=65536
```
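Mount options only affect PVs provisioned after the change. To confirm they took effect, check the mount table from a pod using the new class (the pod name is a placeholder):

```bash
kubectl exec -it <pod-name> -- mount | grep nfs
# The option string should include rsize=65536 and wsize=65536
```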
Problem 1: a PVC stays in Pending

Work through these checks in order:

1. Inspect the provisioner logs for mount or permission errors:

```bash
kubectl logs -l app=nfs-client-provisioner --tail=50
```

2. Confirm the StorageClass exists and its provisioner field matches the PROVISIONER_NAME environment variable:

```bash
kubectl get storageclass nfs-dynamic -o yaml
```

3. Verify the NFS share is actually mounted inside the provisioner pod:

```bash
kubectl exec -it <provisioner-pod> -- mount | grep nfs
```
Problem 2: data on NFS is not cleaned up after deleting a PVC

Solution: make sure the StorageClass sets reclaimPolicy: Delete and archiveOnDelete: "false"; with archiveOnDelete: "true" the directory is kept and merely renamed with an archived- prefix. PVs created before the change keep their old reclaim policy and must be patched individually, as shown below.
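A minimal patch for an existing PV (pv-name is a placeholder):

```bash
kubectl patch pv <pv-name> -p '{"spec":{"persistentVolumeReclaimPolicy":"Delete"}}'
```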
Run multiple provisioner replicas: the provisioner itself is stateless, and extra replicas guard against a single point of failure (the underlying sig-storage provisioner library supports leader election, so replicas do not double-provision):
```yaml
spec:
  replicas: 3
  strategy:
    type: RollingUpdate   # overrides the Recreate strategy used earlier
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
```
NFS server high availability: use DRBD plus Keepalived to make the NFS service itself highly available.
Use the pathPattern StorageClass parameter to give each team its own subdirectory:
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-team-a
provisioner: k8s-sigs.io/nfs-subdir-external-provisioner
parameters:
  pathPattern: "${.PVC.namespace}/${.PVC.name}"
```
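With this pattern, claims land in predictable per-namespace paths on the NFS server, which keeps per-team backups and audits simple. For example (hypothetical namespaces and claim names):

```bash
ls -R /data/k8s_storage
# team-a/mysql-data
# team-a/redis-data
# team-b/pg-data
```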
Combine this with a ResourceQuota to cap storage usage per namespace:
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: storage-quota
spec:
  hard:
    requests.storage: "100Gi"
    persistentvolumeclaims: "20"
```
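ResourceQuota is a namespaced object, so apply it in each team namespace and check consumption against the limits (team-a is a placeholder):

```bash
kubectl apply -f quota.yaml -n team-a
kubectl describe resourcequota storage-quota -n team-a
# Shows Used vs. Hard for requests.storage and persistentvolumeclaims
```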
When multiple Kubernetes clusters share the same NFS backend, their provisioned directories must not collide. pathPattern only expands PVC metadata, so give each cluster its own StorageClass with a distinct static prefix:
```yaml
parameters:
  # one StorageClass per cluster, each with its own hard-coded prefix
  pathPattern: "cluster-a/${.PVC.namespace}/${.PVC.name}"
```
If you scrape the provisioner's metrics endpoint, you can alert when provisioning operations start failing. An example rule (verify the metric names against your deployment's /metrics output before enabling it):
```yaml
- alert: ProvisionerErrorsHigh
  expr: rate(provisioner_operations_errors_total[5m]) / rate(provisioner_operations_total[5m]) > 0.1
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "High error rate in NFS provisioner ({{ $value }})"
```
Use the kubelet's VolumeStats metrics to catch volumes that are running out of space:
```yaml
- alert: NFSStorageRunningLow
  expr: kubelet_volume_stats_available_bytes / kubelet_volume_stats_capacity_bytes < 0.2
  for: 1h
  labels:
    severity: warning
```
NFS is simple and broadly compatible, but high-performance workloads may call for a different dynamic provisioning backend:
| Solution | Strengths | Weaknesses | Best fit |
|---|---|---|---|
| NFS dynamic provisioning | Simple to deploy, broad compatibility | Lower performance, single point of failure | Dev/test, low-frequency IO |
| Ceph RBD | High-performance block storage (use CephFS for shared RWX volumes) | Complex to deploy, needs a dedicated storage cluster | Production, high-IOPS workloads |
| AWS EBS | Fully managed, elastic scaling | AWS-only, higher cost | AWS cloud-native applications |
| Longhorn | Pure software-defined, easy to manage | Consumes node resources, moderate performance | Hybrid cloud, edge computing |
The thorniest problem I hit when actually deploying NFS-Client-Provisioner in a Kubernetes cluster was version compatibility. After one cluster upgrade the provisioner suddenly stopped working, with permission-denied errors in its log. The root cause turned out to be stricter RBAC validation in the newer Kubernetes release, which required granting the ServiceAccount the additional events resource permissions. The lesson stuck with me: in any dynamic storage setup, the provisioner's log should be the first place you look.