Python Kubernetes Client in Practice: Cluster Management and Operations Automation - 代码聚汇网

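1. The Python Kubernetes Client: From Basics to Mastery

Having worked in the cloud-native space for years, I have watched Kubernetes become the de facto standard for deploying modern applications. Python is one of the most widely used languages, and its Kubernetes client library is a capable tool for talking to a cluster. In this article I share how to manage clusters efficiently with the Python Kubernetes client, based on what I have learned running it in production.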

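1.1 Why use Python to operate Kubernetes?

Day-to-day cluster management involves many repetitive tasks. kubectl is powerful, but the Python client has clear advantages in these scenarios:

Batch operations: when the same change must be applied to hundreds of resources, a Python script is far more efficient than running kubectl by hand
Complex logic: operations that need conditions, loops, and other program logic are more natural to express in Python
System integration: Kubernetes management can be embedded in existing Python systems such as operations platforms and CI/CD pipelines
Custom controllers: controllers with business-specific logic can be built on the Watch mechanism

Tip: The Python Kubernetes client is an officially maintained project. Its models are generated from the Kubernetes OpenAPI specification, so each release tracks a specific Kubernetes API version.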

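1.2 Core components

The Python Kubernetes client exposes one class per API group. The most important ones are:

Component	Purpose	API version
CoreV1Api	Core resources such as Pod, Service, ConfigMap, and PersistentVolume	v1
AppsV1Api	Workloads such as Deployment and StatefulSet	apps/v1
BatchV1Api	Batch workloads such as Job and CronJob	batch/v1
NetworkingV1Api	Network resources such as Ingress and NetworkPolicy	networking.k8s.io/v1
StorageV1Api	Storage resources such as StorageClass and VolumeAttachment	storage.k8s.io/v1
CustomObjectsApi	Custom resources defined by CRDs	Any group/version

Because each API group has its own class, a script only instantiates the groups it actually uses.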

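2. Environment setup and authentication

2.1 Installation and version management

Install a pinned release of the Python Kubernetes client:

pip install kubernetes==29.0.0

Version guidance:

Production: use the client release that matches your cluster's Kubernetes version
Development: a newer client is fine, but watch for API compatibility

Note: The client's major version (for example 29.x.x) corresponds to Kubernetes 1.29.x; minor and patch releases within a major version generally keep the API compatible.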
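The version rule in the note can be checked mechanically. The sketch below is only an illustration: `check_version_skew` is a hypothetical helper name, and it assumes the 29.x.x to 1.29.x mapping described above. In a real script the two strings could come from `kubernetes.__version__` and from `client.VersionApi().get_code().git_version`.

```python
import re

def check_version_skew(client_version: str, server_git_version: str) -> int:
    """Return client_major - server_minor (0 means the versions match)."""
    client_major = int(client_version.split(".")[0])
    # Server versions look like "v1.29.2" or "v1.29.2-gke.100"
    match = re.match(r"v?(\d+)\.(\d+)", server_git_version)
    if match is None:
        raise ValueError(f"unrecognized server version: {server_git_version!r}")
    server_minor = int(match.group(2))
    return client_major - server_minor

skew = check_version_skew("29.0.0", "v1.29.2")
print("client matches cluster" if skew == 0 else f"version skew: {skew:+d}")
```

A negative result means the client is older than the cluster, which is the safer direction for production.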

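2.2 Authentication in depth

The Python client supports several ways to load credentials. You call the loader that matches where the code runs; config.load_config() tries a kubeconfig first and falls back to in-cluster settings.

2.2.1 Local development

from kubernetes import config

# Load the default kubeconfig file (~/.kube/config)
config.load_kube_config()

# Load a kubeconfig from a custom path
config.load_kube_config(config_file="/path/to/kubeconfig")

What the loader does:

Parses the kubeconfig file
Reads the current context
Loads the certificates, token, or other credentials that the context references
Stores the API server address and credentials in the default client configuration (the connection itself is opened on the first API call)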
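The lookup steps above can be sketched on an already-parsed kubeconfig. This is a simplified illustration, not the library's loader: `resolve_current_context` and the sample data are hypothetical, and the real implementation also handles inline certificate data, exec plugins, and token refresh.

```python
def resolve_current_context(kubeconfig: dict) -> dict:
    """Return the server URL and user credentials for the current context."""
    current = kubeconfig["current-context"]
    # Each section is a list of {"name": ..., "<kind>": {...}} entries
    context = next(c["context"] for c in kubeconfig["contexts"] if c["name"] == current)
    cluster = next(c["cluster"] for c in kubeconfig["clusters"] if c["name"] == context["cluster"])
    user = next(u["user"] for u in kubeconfig["users"] if u["name"] == context["user"])
    return {"server": cluster["server"], "user": user}

# Hypothetical kubeconfig content, as yaml.safe_load would return it
sample = {
    "current-context": "dev",
    "contexts": [{"name": "dev", "context": {"cluster": "dev-cluster", "user": "dev-user"}}],
    "clusters": [{"name": "dev-cluster", "cluster": {"server": "https://dev.example:6443"}}],
    "users": [{"name": "dev-user", "user": {"token": "example-token"}}],
}
print(resolve_current_context(sample)["server"])
```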

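2.2.2 In-cluster configuration

When the application runs inside a Kubernetes Pod, authenticate with the Pod's service account:

config.load_incluster_config()

In this mode the client reads:

The API server endpoint (from the KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT environment variables)
The service account token and CA certificate (mounted under /var/run/secrets/kubernetes.io/serviceaccount/)
It does not load the Pod's namespace; that is mounted in the same directory as a file named namespace, and you read it yourself when you need it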
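The namespace file sits next to the token in the service account directory. Here is a minimal sketch for reading it, assuming the standard mount path; `current_namespace` and its fallback behavior are my own illustration rather than part of the client library.

```python
from pathlib import Path

# Standard mount point of the service account volume
SERVICE_ACCOUNT_DIR = Path("/var/run/secrets/kubernetes.io/serviceaccount")

def current_namespace(sa_dir: Path = SERVICE_ACCOUNT_DIR, default: str = "default") -> str:
    """Return the Pod's namespace, or `default` when not running in a cluster."""
    namespace_file = sa_dir / "namespace"
    try:
        return namespace_file.read_text().strip() or default
    except OSError:
        # File is absent outside a Pod (for example on a developer laptop)
        return default

print(current_namespace())
```

Passing the directory as a parameter keeps the function easy to test against a temporary directory.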

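2.2.3 Advanced authentication

When credentials are obtained dynamically, configure the client by hand:

from kubernetes import client
from kubernetes.client import Configuration

conf = Configuration()
conf.host = "https://your-k8s-api:6443"
conf.ssl_ca_cert = "/path/to/ca.crt"
conf.api_key = {"authorization": "Bearer your-token"}

# Make this the default configuration for new API clients
client.Configuration.set_default(conf)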

3. Core APIs in practice

3.1 Pod lifecycle management

3.1.1 Creating a Pod

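from kubernetes import client
from kubernetes.client import (
    V1Container, V1ContainerPort, V1ObjectMeta, V1Pod, V1PodSpec,
    V1ResourceRequirements,
)
from kubernetes.client.rest import ApiException

# Assumes credentials were loaded earlier (see section 2.2)
def create_pod():
    container = V1Container(
        name="nginx",
        image="nginx:1.25",
        ports=[V1ContainerPort(container_port=80)],
        resources=V1ResourceRequirements(
            requests={"cpu": "100m", "memory": "128Mi"}
        ),
    )

    pod_spec = V1PodSpec(
        containers=[container],
        restart_policy="Always",
        node_selector={"env": "prod"},  # schedule only onto nodes labeled env=prod
    )

    pod = V1Pod(
        api_version="v1",
        kind="Pod",
        metadata=V1ObjectMeta(name="secure-nginx", labels={"app": "web"}),
        spec=pod_spec,
    )

    try:
        api_response = client.CoreV1Api().create_namespaced_pod(
            namespace="default",
            body=pod
        )
        print(f"Pod created: {api_response.metadata.name}")
    except ApiException as e:
        # e.status holds the HTTP code, for example 409 if the Pod already exists
        print(f"Failed to create Pod: {e.status} {e.reason}")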

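Key points:

Typed objects such as V1Pod reject misspelled field names when they are constructed, which plain dicts do not
Always set resource requests so the scheduler can place the Pod properly and workloads do not compete for resources
Use nodeSelector deliberately to control scheduling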

3.1.2 Monitoring Pod status and troubleshooting

from kubernetes import client, watch

def monitor_pod(name, namespace="default"):
    w = watch.Watch()
    try:
        for event in w.stream(
            client.CoreV1Api().list_namespaced_pod,
            namespace=namespace,
            field_selector=f"metadata.name={name}",
            timeout_seconds=300
        ):
            pod = event["object"]
            print(f"Event type: {event['type']}")
            print(f"Phase: {pod.status.phase}")

            # Inspect the state of each container
            for container in pod.status.container_statuses or []:
                if container.state.waiting:
                    print(f"Container {container.name} waiting: {container.state.waiting.reason}")
                if container.state.terminated:
                    print(f"Container {container.name} exit code: {container.state.terminated.exit_code}")

            # Stop watching once the Pod is running
            if pod.status.phase == "Running":
                w.stop()
                break
    except Exception as e:
        print(f"Watch failed: {e}")

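Common troubleshooting tips:

ImagePullBackOff: check the image name and pull credentials
CrashLoopBackOff: inspect the container logs and exit code
Pending: check resource quotas and node scheduling constraints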
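The tips above can be turned into a small lookup that a monitoring script prints next to each failing Pod. This is a sketch only: `troubleshooting_hint` is a hypothetical helper, and `ErrImagePull` is an extra reason I added because the kubelet reports it before backing off to `ImagePullBackOff`.

```python
from typing import Optional

# Hints taken from the troubleshooting list above
WAITING_HINTS = {
    "ImagePullBackOff": "check the image name and pull credentials",
    "ErrImagePull": "check the image name and pull credentials",
    "CrashLoopBackOff": "inspect the container logs and exit code",
}

def troubleshooting_hint(phase: str, waiting_reason: Optional[str] = None) -> str:
    """Map a Pod phase and container waiting reason to a first debugging step."""
    if waiting_reason in WAITING_HINTS:
        return WAITING_HINTS[waiting_reason]
    if phase == "Pending":
        return "check resource quotas and node scheduling constraints"
    return "no known hint; describe the Pod and read its events"

print(troubleshooting_hint("Pending", "ImagePullBackOff"))
```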

3.2 Advanced Deployment management

3.2.1 Configuring the rolling update strategy

from kubernetes import client
from kubernetes.client import V1Deployment, V1DeploymentSpec

def create_deployment():
    # Plain dicts are sent to the API unchanged, so their keys must use the
    # API's camelCase field names (matchLabels, containerPort, maxSurge)
    deployment = V1Deployment(
        api_version="apps/v1",
        kind="Deployment",
        metadata={"name": "canary-demo"},
        spec=V1DeploymentSpec(
            replicas=3,
            selector={"matchLabels": {"app": "canary"}},
            template={
                "metadata": {"labels": {"app": "canary"}},
                "spec": {
                    "containers": [{
                        "name": "web",
                        "image": "nginx:1.24",
                        "ports": [{"containerPort": 80}]
                    }]
                }
            },
            strategy={
                "type": "RollingUpdate",
                "rollingUpdate": {
                    "maxUnavailable": 1,
                    "maxSurge": "25%"
                }
            }
        )
    )

    client.AppsV1Api().create_namespaced_deployment(
        namespace="default",
        body=deployment
    )

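Rolling update parameters:

maxUnavailable: the number (or percentage) of Pods that may be unavailable during the update
maxSurge: the number (or percentage) of Pods that may be created above the desired replica count
minReadySeconds: how long a new Pod must stay ready before it counts as available; it is a field of the Deployment spec rather than of rollingUpdate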
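Percentages are converted to absolute Pod counts before they are applied: Kubernetes rounds maxSurge up and maxUnavailable down. The sketch below reproduces that arithmetic; `resolve_rolling_update` is a hypothetical helper for reasoning about a rollout, not a client API.

```python
from typing import Tuple, Union

def resolve_rolling_update(
    replicas: int,
    max_surge: Union[int, str],
    max_unavailable: Union[int, str],
) -> Tuple[int, int]:
    """Return (surge, unavailable) as absolute Pod counts."""
    def to_count(value, round_up):
        if isinstance(value, str) and value.endswith("%"):
            total = replicas * int(value[:-1])
            # Integer arithmetic avoids floating-point rounding surprises
            return -(-total // 100) if round_up else total // 100
        return int(value)
    # maxSurge rounds up, maxUnavailable rounds down
    return to_count(max_surge, True), to_count(max_unavailable, False)

surge, unavailable = resolve_rolling_update(3, "25%", 1)
print(f"at most {3 + surge} Pods in total, at least {3 - unavailable} available")
```

With 3 replicas, a 25% surge rounds up to one extra Pod, so the rollout never runs more than 4 Pods and never fewer than 2 available ones.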

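3.2.2 Implementing a canary release

import copy
from kubernetes import client

def canary_release(deployment_name, new_image, canary_percent=20, namespace="default"):
    api = client.AppsV1Api()
    canary_name = f"{deployment_name}-canary"

    # 1. Read the current Deployment
    dep = api.read_namespaced_deployment(deployment_name, namespace)

    # 2. Build the canary from a deep copy so the original spec is not mutated.
    #    The copy keeps the original selector and Pod labels; consider adding a
    #    distinguishing label (for example track=canary) so the selectors do not overlap
    canary_spec = copy.deepcopy(dep.spec)
    canary_spec.replicas = max(1, int((dep.spec.replicas or 1) * canary_percent / 100))
    canary_spec.template.spec.containers[0].image = new_image
    canary_dep = client.V1Deployment(
        metadata=client.V1ObjectMeta(name=canary_name, labels={"release": "canary"}),
        spec=canary_spec,
    )

    # 3. Create the canary Deployment
    api.create_namespaced_deployment(namespace, canary_dep)

    # 4. Watch the canary; monitor_deployment is a helper you provide
    monitor_deployment(canary_name)

    # 5. Once the canary looks healthy, roll the image out to the main Deployment
    container_name = dep.spec.template.spec.containers[0].name
    patch = {"spec": {"template": {"spec": {"containers": [
        {"name": container_name, "image": new_image}
    ]}}}}
    api.patch_namespaced_deployment(deployment_name, namespace, patch)

    # 6. Remove the canary
    api.delete_namespaced_deployment(canary_name, namespace)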
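The code above calls `monitor_deployment` without defining it. One way to build it is around a readiness check like the one below, which follows the usual rollout criteria (generation observed, all replicas updated and available). `is_rollout_complete` is a hypothetical name, and the demo object is a stand-in that only mimics the attribute names of `V1Deployment`.

```python
from types import SimpleNamespace

def is_rollout_complete(deployment) -> bool:
    """True once every desired replica is updated and available."""
    spec, status, meta = deployment.spec, deployment.status, deployment.metadata
    desired = spec.replicas or 0
    # The controller must have observed the latest spec generation
    if (status.observed_generation or 0) < (meta.generation or 0):
        return False
    return (
        (status.updated_replicas or 0) == desired
        and (status.available_replicas or 0) == desired
        and (status.replicas or 0) == desired
    )

# Stand-in object with the same attribute names as V1Deployment
demo = SimpleNamespace(
    metadata=SimpleNamespace(generation=2),
    spec=SimpleNamespace(replicas=3),
    status=SimpleNamespace(observed_generation=2, replicas=3,
                           updated_replicas=3, available_replicas=3),
)
print(is_rollout_complete(demo))
```

A `monitor_deployment` helper could poll `read_namespaced_deployment_status` every few seconds and return once this check passes or a timeout expires.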

3.3 Advanced monitoring and event handling

3.3.1 Real-time monitoring with Watch

from kubernetes import client, watch

def watch_resources(resource_type, namespace=None):
    api_map = {
        "pods": client.CoreV1Api().list_namespaced_pod,
        "deployments": client.AppsV1Api().list_namespaced_deployment,
        "services": client.CoreV1Api().list_namespaced_service
    }

    w = watch.Watch()
    try:
        for event in w.stream(
            api_map[resource_type],
            namespace=namespace or "default",
            timeout_seconds=3600
        ):
            handle_event(event)
    except Exception as e:
        print(f"Watch failed: {e}")

def handle_event(event):
    obj = event["object"]
    print(f"[{event['type']}] {obj.kind} {obj.metadata.name}")

    # React to specific events; send_alert and analyze_pod_failure
    # are hooks you implement yourself
    if event["type"] == "DELETED":
        send_alert(f"{obj.kind} {obj.metadata.name} was deleted")
    elif obj.kind == "Pod" and obj.status.phase == "Failed":
        analyze_pod_failure(obj)

3.3.2 Collecting resource metrics

from kubernetes import client

# Requires metrics-server, which serves the metrics.k8s.io API
def collect_metrics():
    metrics_api = client.CustomObjectsApi()

    # Node metrics
    node_metrics = metrics_api.list_cluster_custom_object(
        "metrics.k8s.io", "v1beta1", "nodes"
    )

    # Pod metrics
    pod_metrics = metrics_api.list_namespaced_custom_object(
        "metrics.k8s.io", "v1beta1", "default", "pods"
    )

    # Process the metrics (usage values are quantity strings)
    for metric in node_metrics["items"]:
        print(f"Node {metric['metadata']['name']}:")
        print(f"  CPU usage: {metric['usage']['cpu']}")
        print(f"  Memory usage: {metric['usage']['memory']}")
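The usage values returned by the metrics API are quantity strings such as `156340915n` (nanocores) or `2048Ki`, so they need converting before any arithmetic. Below is a minimal parser covering the common suffixes; it is my own sketch and handles only a subset of the quantity grammar. Recent client releases also provide `kubernetes.utils.parse_quantity`, which is the better choice when it is available.

```python
from decimal import Decimal

# Suffix multipliers for common Kubernetes quantity formats
_SUFFIXES = {
    "n": Decimal("1e-9"), "u": Decimal("1e-6"), "m": Decimal("1e-3"),
    "k": Decimal(10) ** 3, "M": Decimal(10) ** 6,
    "G": Decimal(10) ** 9, "T": Decimal(10) ** 12,
    "Ki": Decimal(2) ** 10, "Mi": Decimal(2) ** 20,
    "Gi": Decimal(2) ** 30, "Ti": Decimal(2) ** 40,
}

def parse_quantity(quantity: str) -> Decimal:
    """Convert a quantity string such as '250m' or '512Mi' to a Decimal."""
    # Try two-character suffixes before one-character ones ("Mi" before "M")
    for length in (2, 1):
        suffix = quantity[-length:]
        if suffix in _SUFFIXES:
            return Decimal(quantity[:-length]) * _SUFFIXES[suffix]
    return Decimal(quantity)

cpu_cores = parse_quantity("156340915n")
memory_bytes = parse_quantity("2048Ki")
print(f"CPU: {cpu_cores} cores, memory: {memory_bytes} bytes")
```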

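4. Production best practices

4.1 Error handling and retries

from kubernetes import client
from kubernetes.client.rest import ApiException
from retrying import retry

def is_retryable(exc):
    # Retry only rate limiting (429) and server-side errors (5xx)
    if not isinstance(exc, ApiException):
        return False
    status = exc.status or 0
    return status == 429 or status >= 500

@retry(
    stop_max_attempt_number=3,
    wait_exponential_multiplier=1000,  # milliseconds
    wait_exponential_max=10000,
    retry_on_exception=is_retryable
)
def safe_k8s_operation():
    try:
        return client.CoreV1Api().list_namespaced_pod("default")
    except ApiException as e:
        print(f"API call failed: {e.status} {e.reason}")
        # Re-raise so the decorator decides; errors such as 404 are not retried
        raise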

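Retry strategy:

Exponential backoff: the delay doubles after each attempt and is capped at 10 seconds
Retry only server-side errors (5xx) and rate limiting (429)
Do not retry other client errors (4xx)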
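If you would rather not depend on a retry library, the same strategy fits in a few lines of standard-library code. This is a sketch under my own assumptions: `retry_with_backoff` and `is_retryable` are hypothetical names, and the check relies only on the exception having a `status` attribute, as `ApiException` does.

```python
import functools
import time

def is_retryable(exc: Exception) -> bool:
    """Retry rate limiting (429) and server errors (5xx) only."""
    status = getattr(exc, "status", None) or 0
    return status == 429 or status >= 500

def retry_with_backoff(max_attempts=3, base_delay=1.0, max_delay=10.0, sleep=time.sleep):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception as exc:
                    # Give up on the last attempt or on non-retryable errors
                    if attempt == max_attempts or not is_retryable(exc):
                        raise
                    # Delays of 1s, 2s, 4s, ... capped at max_delay
                    sleep(min(base_delay * 2 ** (attempt - 1), max_delay))
        return wrapper
    return decorator
```

Decorate any function that calls the API, for example `@retry_with_backoff(max_attempts=5)`. The `sleep` parameter exists so tests can replace the real delay.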

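4.2 Performance tips

Batch operations: Kubernetes has no bulk-create endpoint, so create objects in a loop with one shared API object; bulk deletion is available through the delete_collection_* methods

# Create several Pods with a single API object
api = client.CoreV1Api()
for pod in [pod1, pod2, pod3]:
    api.create_namespaced_pod(namespace="default", body=pod)

# Delete every Pod that matches a label selector in one request
api.delete_collection_namespaced_pod(namespace="default", label_selector="app=web")

Resource caching: cache resources that rarely change, such as Node information, locally

from cachetools import TTLCache
from kubernetes import client

node_cache = TTLCache(maxsize=100, ttl=300)

def get_nodes():
    if "nodes" not in node_cache:
        node_cache["nodes"] = client.CoreV1Api().list_node().items
    return node_cache["nodes"]

Connection settings: tune the connection pool and set request timeouts

conf = client.Configuration.get_default_copy()
conf.connection_pool_maxsize = 20  # connections kept per host
api = client.CoreV1Api(client.ApiClient(conf))
# Per-request timeout in seconds (a (connect, read) tuple also works)
api.list_node(_request_timeout=10)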

4.3 Security hardening

Least privilege: give the service account precisely scoped RBAC permissions

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: pod-reader
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]

Sensitive data: store credentials in a Secret rather than a ConfigMap

from kubernetes import client
from kubernetes.client import V1Secret

# Example values only; load real credentials from a secure source
secret = V1Secret(
    metadata={"name": "db-credentials"},
    string_data={
        "username": "admin",
        "password": "s3cret"
    }
)
client.CoreV1Api().create_namespaced_secret("default", secret)

Network policies: restrict traffic between Pods

from kubernetes import client
from kubernetes.client import V1NetworkPolicy

network_policy = V1NetworkPolicy(
    metadata={"name": "allow-frontend"},
    # Plain dicts are sent as-is, so keys must use the API's camelCase names
    spec={
        "podSelector": {"matchLabels": {"role": "frontend"}},
        "ingress": [{
            "from": [{
                "podSelector": {"matchLabels": {"role": "backend"}}
            }]
        }]
    }
)
client.NetworkingV1Api().create_namespaced_network_policy("default", network_policy)

5. Case study: an operations automation platform

5.1 Cluster health check

from kubernetes import client

def cluster_health_check():
    results = {
        "nodes": [],
        "pods": [],
        "deployments": []
    }

    # Check node status
    for node in client.CoreV1Api().list_node().items:
        conditions = node.status.conditions or []
        node_status = {
            "name": node.metadata.name,
            "ready": next(
                (s.status for s in conditions
                 if s.type == "Ready"), "Unknown"
            ),
            "issues": []
        }

        if node_status["ready"] != "True":
            node_status["issues"].append("node not ready")

        # Check for disk pressure
        if any(s.type == "DiskPressure" and s.status == "True"
               for s in conditions):
            node_status["issues"].append("disk pressure")

        results["nodes"].append(node_status)

    # Check Pod status
    for pod in client.CoreV1Api().list_pod_for_all_namespaces().items:
        if pod.status.phase not in ["Running", "Succeeded"]:
            pod_status = {
                "name": pod.metadata.name,
                "namespace": pod.metadata.namespace,
                "status": pod.status.phase,
                "issues": []
            }

            for container in pod.status.container_statuses or []:
                if container.state.waiting:
                    pod_status["issues"].append(
                        f"container {container.name} waiting: {container.state.waiting.reason}"
                    )
                elif container.state.terminated:
                    pod_status["issues"].append(
                        f"container {container.name} terminated: {container.state.terminated.reason}"
                    )

            results["pods"].append(pod_status)

    # Check Deployment status
    for dep in client.AppsV1Api().list_deployment_for_all_namespaces().items:
        if dep.status.unavailable_replicas:
            results["deployments"].append({
                "name": dep.metadata.name,
                "namespace": dep.metadata.namespace,
                "unavailable": dep.status.unavailable_replicas
            })

    return results

5.2 Autoscaling controller

from cachetools import TTLCache
from kubernetes import client
from prometheus_api_client import PrometheusConnect

class AutoScaler:
    def __init__(self):
        self.prom = PrometheusConnect(url="http://prometheus:9090")
        self.metrics_cache = TTLCache(maxsize=100, ttl=30)
        self.apps = client.AppsV1Api()

    def get_metric(self, query):
        if query not in self.metrics_cache:
            self.metrics_cache[query] = self.prom.custom_query(query)
        return self.metrics_cache[query]

    def scale_deployment(self, dep_name, namespace, min_replicas, max_replicas):
        # Total CPU cores used by the Deployment's Pods over the last minute
        cpu_query = f'sum(rate(container_cpu_usage_seconds_total{{namespace="{namespace}", pod=~"{dep_name}-.*"}}[1m]))'
        result = self.get_metric(cpu_query)
        if not result:
            return  # no samples yet, nothing to decide on
        cpu_usage = float(result[0]["value"][1])

        # Read the current desired replica count
        current_replicas = self.apps.read_namespaced_deployment_scale(
            dep_name, namespace
        ).spec.replicas or 0

        target_replicas = current_replicas
        if cpu_usage > 0.8:  # scale-up threshold, in CPU cores
            target_replicas = min(max(current_replicas, 1) * 2, max_replicas)
        elif cpu_usage < 0.2:  # scale-down threshold, in CPU cores
            target_replicas = max(current_replicas // 2, min_replicas)

        # Apply the change through the scale subresource
        if target_replicas != current_replicas:
            patch = {"spec": {"replicas": target_replicas}}
            self.apps.patch_namespaced_deployment_scale(
                dep_name, namespace, patch
            )
            print(f"Scaled {dep_name} from {current_replicas} to {target_replicas} replicas")
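Doubling and halving is easy to follow but reacts coarsely. For comparison, the built-in Horizontal Pod Autoscaler scales proportionally: desired = ceil(current × currentMetric / targetMetric), and it skips changes while the ratio is within a tolerance (0.1 by default). The sketch below applies that formula; `desired_replicas` and its parameter names are my own, and it is not the code used by the class above.

```python
import math

def desired_replicas(
    current_replicas: int,
    current_value: float,
    target_value: float,
    min_replicas: int,
    max_replicas: int,
    tolerance: float = 0.1,
) -> int:
    """Proportional scaling in the style of the Horizontal Pod Autoscaler."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current size
        wanted = current_replicas
    else:
        wanted = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds
    return max(min_replicas, min(wanted, max_replicas))

# 4 replicas averaging 0.9 cores each against a 0.5-core target
print(desired_replicas(4, current_value=0.9, target_value=0.5,
                       min_replicas=2, max_replicas=10))
```

Note that the formula expects a per-Pod average, whereas the query in `scale_deployment` sums usage across all Pods, so divide by the replica count first.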

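5.3 Building a management API with FastAPI

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from kubernetes import client, config

config.load_kube_config()  # use config.load_incluster_config() inside a Pod
app = FastAPI()
serializer = client.ApiClient()

# Wide-open CORS is for local experiments only; restrict origins and add
# authentication before exposing cluster operations
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_methods=["*"],
    allow_headers=["*"],
)

def to_json(obj):
    # Client models are not JSON-serializable as-is; convert them to plain dicts
    return serializer.sanitize_for_serialization(obj)

@app.get("/api/nodes")
def list_nodes():
    return to_json(client.CoreV1Api().list_node().items)

@app.get("/api/pods")
def list_pods(namespace: str = "default"):
    return to_json(client.CoreV1Api().list_namespaced_pod(namespace).items)

@app.post("/api/scale")
def scale_deployment(name: str, namespace: str, replicas: int):
    patch = {"spec": {"replicas": replicas}}
    return to_json(client.AppsV1Api().patch_namespaced_deployment_scale(
        name, namespace, patch
    ))

@app.get("/api/metrics")
def get_metrics():
    # cluster_health_check is the function from section 5.1
    return cluster_health_check()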

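6. Troubleshooting guide

6.1 Common error codes

Status code	Cause	Fix
401 Unauthorized	Authentication failed	Check the kubeconfig or service account token
403 Forbidden	Insufficient permissions	Review the RBAC rules and grant the missing permission
404 Not Found	Resource does not exist	Check the resource name and namespace
409 Conflict	Resource already exists or its resourceVersion is stale	Fetch the latest object, then retry the update
422 Unprocessable Entity	Field validation failed	Check the request body format and required fields
429 Too Many Requests	Request was rate limited	Retry with exponential backoff
500 Internal Server Error	Server-side error	Check the API server logs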

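6.2 Diagnosing connection problems

Authentication failures:

import os
from kubernetes import client, config
from kubernetes.client.rest import ApiException
from kubernetes.config import ConfigException

try:
    config.load_kube_config()
    client.CoreV1Api().list_node()
except ConfigException as e:
    # The kubeconfig is missing or invalid; check its path and the environment
    print(f"Could not load kubeconfig: {e}")
    print(f"KUBECONFIG is set to: {os.getenv('KUBECONFIG')}")
except ApiException as e:
    print(f"API request rejected: {e.status} {e.reason}")
except Exception as e:
    print(f"Connection failed: {e}")

Network connectivity:

import requests
try:
    response = requests.get(
        "https://your-k8s-api:6443/api",
        verify="/path/to/ca.crt",
        headers={"Authorization": "Bearer your-token"},
        timeout=10
    )
    print(f"API server reachable, API versions: {response.json()['versions']}")
except Exception as e:
    print(f"Connection failed: {e}")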

6.3 Diagnosing performance problems

Enable client logging:

import logging
logging.basicConfig()
logging.getLogger("kubernetes").setLevel(logging.DEBUG)

Measure request latency:

import time
from kubernetes import client

start = time.perf_counter()
client.CoreV1Api().list_pod_for_all_namespaces()
print(f"Request took {time.perf_counter() - start:.3f} seconds")

Detect memory leaks:

import tracemalloc

tracemalloc.start()
# Run the suspect operation here
snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:10]:
    print(stat)

7. Advanced development techniques

7.1 Working with custom resources (CRDs)

from kubernetes import client

# Create a CustomResourceDefinition
crd = {
    "apiVersion": "apiextensions.k8s.io/v1",
    "kind": "CustomResourceDefinition",
    "metadata": {"name": "crontabs.stable.example.com"},
    "spec": {
        "group": "stable.example.com",
        "versions": [{
            "name": "v1",
            "served": True,
            "storage": True,
            # apiextensions.k8s.io/v1 requires a schema for every version
            "schema": {"openAPIV3Schema": {
                "type": "object",
                "properties": {"spec": {
                    "type": "object",
                    "properties": {
                        "cronSpec": {"type": "string"},
                        "image": {"type": "string"}
                    }
                }}
            }}
        }],
        "scope": "Namespaced",
        "names": {
            "plural": "crontabs",
            "singular": "crontab",
            "kind": "CronTab"
        }
    }
}

client.ApiextensionsV1Api().create_custom_resource_definition(crd)

# Create an instance of the custom resource
custom_api = client.CustomObjectsApi()
custom_api.create_namespaced_custom_object(
    group="stable.example.com",
    version="v1",
    namespace="default",
    plural="crontabs",
    body={
        "apiVersion": "stable.example.com/v1",
        "kind": "CronTab",
        "metadata": {"name": "my-crontab"},
        "spec": {"cronSpec": "* * * * *", "image": "my-cron-image"}
    }
)

7.2 Managing multiple clusters

from kubernetes import client, config

class MultiClusterManager:
    def __init__(self):
        # list_kube_config_contexts() returns (all_contexts, active_context)
        contexts, active = config.list_kube_config_contexts()
        self.contexts = [c["name"] for c in contexts]
        self.active_context = active["name"]
        self.clients = {}

    def get_client(self, context_name):
        if context_name not in self.clients:
            # Builds an ApiClient with its own Configuration for this context
            self.clients[context_name] = config.new_client_from_config(
                context=context_name
            )
        return self.clients[context_name]

    def list_pods(self, context_name, namespace):
        api = client.CoreV1Api(self.get_client(context_name))
        return api.list_namespaced_pod(namespace)

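7.3 Using the asynchronous API

# kubernetes_asyncio is a separate package: pip install kubernetes_asyncio
import asyncio
from kubernetes_asyncio import client, config

async def async_operations():
    await config.load_kube_config()

    # The context manager closes the underlying HTTP session when done
    async with client.ApiClient() as api_client:
        v1 = client.CoreV1Api(api_client)

        # Fetch several resources concurrently
        pods, nodes = await asyncio.gather(
            v1.list_namespaced_pod("default"),
            v1.list_node()
        )

    print(f"Got {len(pods.items)} pods and {len(nodes.items)} nodes")

asyncio.run(async_operations())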

8. Integrating with ecosystem tools

8.1 Prometheus monitoring

from prometheus_api_client import PrometheusConnect

def get_pod_metrics(namespace, pod_name):
    # Uses the library's default URL; pass url=... to point at your Prometheus server
    prom = PrometheusConnect()

    cpu_query = f'sum(rate(container_cpu_usage_seconds_total{{namespace="{namespace}", pod="{pod_name}"}}[1m])) by (pod)'
    mem_query = f'sum(container_memory_working_set_bytes{{namespace="{namespace}", pod="{pod_name}"}}) by (pod)'

    return {
        "cpu": prom.custom_query(cpu_query),
        "memory": prom.custom_query(mem_query)
    }

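8.2 Airflow scheduling

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from kubernetes import client, config

def k8s_task():
    # Load credentials inside the task so they are resolved on the worker
    config.load_kube_config()
    pods = client.CoreV1Api().list_namespaced_pod("default")
    print(f"Found {len(pods.items)} Pods")

with DAG(
    "k8s_operations",
    schedule_interval="@daily",  # newer Airflow releases name this argument schedule
    start_date=datetime(2023, 1, 1)
) as dag:

    task = PythonOperator(
        task_id="list_pods",
        python_callable=k8s_task
    )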

8.3 Data analysis with Pandas

import pandas as pd
from kubernetes import client

def get_cluster_data():
    nodes = client.CoreV1Api().list_node()
    data = []

    for node in nodes.items:
        allocatable = node.status.allocatable
        data.append({
            "name": node.metadata.name,
            "cpu": allocatable["cpu"],
            "memory": allocatable["memory"],
            "pods": allocatable["pods"]
        })

    return pd.DataFrame(data)

df = get_cluster_data()
# allocatable values are quantity strings (for example "16Gi"), so describe()
# reports counts and unique values until they are converted to numbers
print(df.describe())

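9. Performance tuning in practice

9.1 Client configuration

from kubernetes import client
from kubernetes.client import Configuration

# Custom configuration
conf = Configuration()
conf.host = "https://your-k8s-api:6443"
conf.ssl_ca_cert = "/path/to/ca.crt"
conf.api_key = {"authorization": "Bearer your-token"}

# Tuning parameters
conf.retries = 3  # number of retries
conf.connection_pool_maxsize = 20  # connections kept per host
# Keep hostname verification enabled; conf.assert_hostname = False turns it
# off and should be limited to throwaway test clusters

# Make this the default configuration
client.Configuration.set_default(conf)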

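9.2 Batch operations

import yaml
from kubernetes import client
from kubernetes.utils import create_from_dict

def bulk_create_resources(yaml_files):
    api_client = client.ApiClient()  # reuse one client for every file
    successes = []
    failures = []

    for file in yaml_files:
        try:
            with open(file) as f:
                # safe_load_all also handles files with several YAML documents
                for doc in yaml.safe_load_all(f):
                    if doc:
                        create_from_dict(api_client, doc)
            successes.append(file)
        except Exception as e:
            failures.append((file, str(e)))

    return {"successes": successes, "failures": failures}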

9.3 Efficient queries

Field selectors: filter on the server to reduce the amount of data returned

# List only the Pods on a specific node
client.CoreV1Api().list_namespaced_pod(
    namespace="default",
    field_selector="spec.nodeName=worker-1"
)

Label selectors: filter resources precisely

# List Pods that carry specific labels
client.CoreV1Api().list_namespaced_pod(
    namespace="default",
    label_selector="app=frontend,env=prod"
)
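When selectors come from configuration rather than literals, building the string by hand invites typos. A small sketch that assembles a label selector from dictionaries follows; `build_label_selector` is a hypothetical helper that covers only equality, inequality, and existence rules, not the set-based `in`/`notin` syntax.

```python
def build_label_selector(equals=None, not_equals=None, exists=()):
    """Build a label selector string from simple equality and existence rules."""
    parts = [f"{key}={value}" for key, value in (equals or {}).items()]
    parts += [f"{key}!={value}" for key, value in (not_equals or {}).items()]
    parts += list(exists)
    # Comma-separated requirements are combined with AND by the API server
    return ",".join(parts)

selector = build_label_selector({"app": "frontend", "env": "prod"})
print(selector)  # app=frontend,env=prod
```

The result can be passed directly as the `label_selector` argument of any list call.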

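Resource versions: resume a watch efficiently

from kubernetes import client, watch

# Start watching from a known resource version; if the version is too old
# the server answers 410 Gone and you need to list again
w = watch.Watch()
for event in w.stream(
    client.CoreV1Api().list_namespaced_pod,
    namespace="default",
    resource_version="12345"
):
    print(event)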

10. Security best practices

10.1 Authentication and authorization

Least privilege for service accounts:

from kubernetes import client
from kubernetes.client import V1Role, V1RoleBinding

# Create a read-only role
role = V1Role(
    metadata={"name": "pod-reader"},
    rules=[{
        "apiGroups": [""],
        "resources": ["pods"],
        "verbs": ["get", "list", "watch"]
    }]
)

# Bind the role to a service account
role_binding = V1RoleBinding(
    metadata={"name": "read-pods"},
    subjects=[{
        "kind": "ServiceAccount",
        "name": "monitoring",
        "namespace": "default"
    }],
    role_ref={
        "kind": "Role",
        "name": "pod-reader",
        "apiGroup": "rbac.authorization.k8s.io"
    }
)

client.RbacAuthorizationV1Api().create_namespaced_role("default", role)
client.RbacAuthorizationV1Api().create_namespaced_role_binding("default", role_binding)

10.2 Network isolation

from kubernetes import client
from kubernetes.client import V1NetworkPolicy

# Allow only frontend Pods to reach the backend Pods
network_policy = V1NetworkPolicy(
    metadata={"name": "backend-allow-frontend"},
    # Plain dicts are sent as-is, so keys must use the API's camelCase names
    spec={
        "podSelector": {"matchLabels": {"app": "backend"}},
        "ingress": [{
            "from": [{
                "podSelector": {"matchLabels": {"app": "frontend"}}
            }]
        }]
    }
)

client.NetworkingV1Api().create_namespaced_network_policy("default", network_policy)

10.3 Protecting sensitive data

from kubernetes import client
from kubernetes.client import V1Secret

# string_data accepts plain text and the API server stores it base64-encoded;
# the values here are examples, so load real credentials from a secure source
secret = V1Secret(
    metadata={"name": "db-secret"},
    string_data={
        "username": "admin",
        "password": "s3cretP@ss"
    }
)

client.CoreV1Api().create_namespaced_secret("default", secret)

# Mount the Secret in a Pod (dict keys use the API's camelCase names)
pod_spec = {
    "containers": [{
        "name": "app",
        "image": "my-app",
        "volumeMounts": [{
            "name": "secret-vol",
            "mountPath": "/etc/secrets",
            "readOnly": True
        }]
    }],
    "volumes": [{
        "name": "secret-vol",
        "secret": {"secretName": "db-secret"}
    }]
}

11. Debugging and testing

11.1 Unit tests

import unittest
from unittest.mock import MagicMock

class TestK8sOperations(unittest.TestCase):
    @staticmethod
    def _pod(name):
        # MagicMock(name=...) names the mock itself, so set the attribute afterwards
        pod = MagicMock()
        pod.metadata.name = name
        return pod

    def setUp(self):
        self.mock_api = MagicMock()
        self.pod_list = MagicMock()
        self.pod_list.items = [self._pod("pod-1"), self._pod("pod-2")]

    def test_list_pods(self):
        self.mock_api.list_namespaced_pod.return_value = self.pod_list
        pods = self.mock_api.list_namespaced_pod("default")
        self.assertEqual(len(pods.items), 2)
        self.assertEqual(pods.items[0].metadata.name, "pod-1")

if __name__ == "__main__":
    unittest.main()

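11.2 Integration test environment

import os
import shutil
import tempfile
from kubernetes import config

class TestCluster:
    def __enter__(self):
        self.temp_dir = tempfile.mkdtemp()
        self.kubeconfig = os.path.join(self.temp_dir, "kubeconfig")

        # Write a throwaway kubeconfig for the test
        with open(self.kubeconfig, "w") as f:
            f.write("""
apiVersion: v1
clusters:
- cluster: {server: 'https://test-cluster:6443'}
  name: test
contexts:
- context: {cluster: test, user: test}
  name: test
current-context: test
kind: Config
users:
- name: test
  user: {token: test-token}
""")

        # Exported for child processes such as kubectl; the previous value is restored on exit
        self.previous = os.environ.get("KUBECONFIG")
        os.environ["KUBECONFIG"] = self.kubeconfig
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        shutil.rmtree(self.temp_dir)
        if self.previous is None:
            os.environ.pop("KUBECONFIG", None)
        else:
            os.environ["KUBECONFIG"] = self.previous

# Usage example
with TestCluster() as cluster:
    # The client reads KUBECONFIG when it is imported, so pass the path explicitly
    config.load_kube_config(config_file=cluster.kubeconfig)
    # Run the test code here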

11.3 End-to-end tests

import pytest
from kubernetes import client, config

@pytest.fixture(scope="module")
def k8s_client():
    config.load_kube_config()
    return client.CoreV1Api()

def test_pod_lifecycle(k8s_client):
    # Manifest for the test Pod
    pod_manifest = {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "test-pod"},
        "spec": {
            "containers": [{
                "name": "test",
                "image": "busybox",
                "command": ["sleep", "3600"]
            }]
        }
    }

    # Create the Pod
    pod = k8s_client.create_namespaced_pod("default", pod_manifest)
    try:
        assert pod.metadata.name == "test-pod"

        # Verify the Pod status
        pod = k8s_client.read_namespaced_pod("test-pod", "default")
        assert pod.status.phase in ["Pending", "Running"]
    finally:
        # Clean up even when an assertion fails
        k8s_client.delete_namespaced_pod("test-pod", "default")

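12. Upgrades and compatibility

12.1 Client version policy

How Python Kubernetes client versions map to Kubernetes cluster versions:

Client version	Compatible Kubernetes version	Supported API version
28.x	1.28	v1.28
27.x	1.27	v1.27
26.x	1.26	v1.26

Upgrade guidance:

Production: keep the client version ≤ the cluster version
Development: a newer client is acceptable, but test API compatibility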

12.2 Migrating off deprecated APIs

# Old Deployment API (extensions/v1beta1, no longer served since Kubernetes 1.16)
# client.ExtensionsV1beta1Api().create_namespaced_deployment(...)

# Current Deployment API (apps/v1)
client.AppsV1Api().create_namespaced_deployment(...)

Common migration paths:

extensions/v1beta1 → apps/v1 (Deployment, DaemonSet, and so on)
networking.k8s.io/v1beta1 → networking.k8s.io/v1 (Ingress)
rbac.authorization.k8s.io/v1beta1 → rbac.authorization.k8s.io/v1 (RBAC resources)
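Before upgrading, it helps to scan existing manifests for these old versions. The sketch below flags them using the migration paths listed above; `find_deprecated` and the `REPLACEMENTS` table are illustrative and cover only a few kinds. It reports what to migrate but does not rewrite anything, because some kinds (Ingress, for example) changed their schema as well as their apiVersion.

```python
# Migration paths from the list above, keyed by (old apiVersion, kind)
REPLACEMENTS = {
    ("extensions/v1beta1", "Deployment"): "apps/v1",
    ("extensions/v1beta1", "DaemonSet"): "apps/v1",
    ("networking.k8s.io/v1beta1", "Ingress"): "networking.k8s.io/v1",
    ("rbac.authorization.k8s.io/v1beta1", "Role"): "rbac.authorization.k8s.io/v1",
    ("rbac.authorization.k8s.io/v1beta1", "RoleBinding"): "rbac.authorization.k8s.io/v1",
}

def find_deprecated(manifests):
    """Yield (name, old apiVersion, replacement) for manifests that need migrating."""
    for manifest in manifests:
        key = (manifest.get("apiVersion"), manifest.get("kind"))
        if key in REPLACEMENTS:
            yield manifest["metadata"]["name"], key[0], REPLACEMENTS[key]

# Hypothetical manifests, as yaml.safe_load would return them
manifests = [
    {"apiVersion": "extensions/v1beta1", "kind": "Deployment", "metadata": {"name": "web"}},
    {"apiVersion": "apps/v1", "kind": "Deployment", "metadata": {"name": "api"}},
]
for name, old, new in find_deprecated(manifests):
    print(f"{name}: {old} -> {new}")
```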

12.3 Supporting multiple API versions

def get_deployment(name, namespace):
    try:
        # Prefer the current API
        return client.AppsV1Api
