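While deploying an automated network management setup for a client recently, I found that although there is plenty of documentation on integrating Ansible with Huawei CE switches, hands-on work still runs into all kinds of pitfalls. They become more pronounced on non-standard environments (a Raspberry Pi, for example) or on particular Linux distributions. This article uses Ubuntu 20.04 LTS and documents the whole process, from preparing the system to running an Ansible playbook successfully, focusing on the typical problems that the official documentation does not mention but that you will almost certainly hit.
Ubuntu 20.04 ships with Python 3.8 preinstalled, which saves a fair amount of trouble. For Ansible to manage Huawei CE switches properly, though, you still need to pay close attention to version compatibility among a few key components.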
First, update the system packages and install the base dependencies:
sudo apt update && sudo apt upgrade -y
sudo apt install -y python3-pip python3-dev libssl-dev
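The Huawei CE modules rely on the ncclient library for NETCONF communication, and installing from the default pip index can run into SSL verification problems. I recommend using a mirror hosted in China and pinning the version:
pip3 install -i https://pypi.tuna.tsinghua.edu.cn/simple ncclient==0.6.13
Note: if you see a "Could not fetch URL" error, you may need to upgrade pip itself first:
python3 -m pip install --upgrade pip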
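Once ncclient is installed, one way to confirm that NETCONF is reachable independently of Ansible is to open a session directly. The following is a minimal sketch rather than part of the original setup: the helper names `build_connect_params` and `check_netconf`, the use of port 830, and the choice to disable host key verification are assumptions for illustration, and the latter is only appropriate in a lab.

```python
def build_connect_params(host, username, password, port=830, timeout=30):
    """Assemble keyword arguments for ncclient.manager.connect().

    Port 830 is the standard NETCONF-over-SSH port; NETCONF must be
    enabled on the switch for the connection to succeed.
    """
    return {
        "host": host,
        "port": port,
        "username": username,
        "password": password,
        "hostkey_verify": False,   # lab only: skips SSH host key checking
        "allow_agent": False,
        "look_for_keys": False,
        "timeout": timeout,
        "device_params": {"name": "huawei"},  # selects ncclient's Huawei device handler
    }


def check_netconf(host, username, password):
    """Open a NETCONF session and return the capabilities the switch advertises."""
    from ncclient import manager  # imported lazily so the helper above works without it

    with manager.connect(**build_connect_params(host, username, password)) as session:
        return sorted(session.server_capabilities)
```

Calling `check_netconf("192.168.1.100", "admin", "YourSecurePassword")` with your own address and credentials should return a list of capability URIs if the session opens; an exception at this stage points to a network, credential, or NETCONF configuration problem rather than an Ansible one.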
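Troubleshooting common problems:
ModuleNotFoundError: No module named 'paramiko': run pip3 install paramiko
ERROR: Could not build wheels for cryptography: install the build dependencies with sudo apt install build-essential libssl-dev libffi-dev
The Ubuntu repositories do provide an Ansible package, but it may be fairly old. I recommend installing a newer, pinned release with pip instead:
pip3 install ansible==6.7.0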
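Verify that the installation succeeded:
ansible --version
# The output should look similar to the following (ansible 6.x is built on ansible-core 2.13)
# ansible [core 2.13.x]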
# config file = None
# configured module search path = ['/home/user/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
# ansible python module location = /usr/local/lib/python3.8/dist-packages/ansible
# ansible collection location = /home/user/.ansible/collections:/usr/share/ansible/collections
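Beyond `ansible --version`, it can help to confirm that the Python packages discussed here are visible to the same interpreter Ansible runs under. This is a small sketch using only the standard library; the `REQUIRED` list and the function name `missing_modules` are assumptions chosen for this article, not anything Ansible provides.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that this interpreter cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Packages mentioned in this article; adjust the list to your own setup.
REQUIRED = ["ansible", "ncclient", "paramiko"]

missing = missing_modules(REQUIRED)
print("Python:", sys.version.split()[0])
print("Missing modules:", ", ".join(missing) if missing else "none")
```

Run it with the same `python3` that pip installed into; a module reported as missing usually means it was installed for a different interpreter or user.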
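The Huawei CE modules live in the community.network collection, which is bundled with the ansible package, so there is nothing to download separately. You can confirm this with:
ansible-doc -t module -l | grep ce_
# Should list all Huawei CE modules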
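Ansible looks for its configuration file in the following order of precedence:
ANSIBLE_CONFIG (environment variable, if set)
./ansible.cfg (current directory)
~/.ansible.cfg (home directory)
/etc/ansible/ansible.cfg (global configuration)
I recommend creating a dedicated ansible.cfg in the project directory:
[defaults]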
host_key_checking = False
interpreter_python = /usr/bin/python3
deprecation_warnings = False
[persistent_connection]
connect_timeout = 30
command_timeout = 30
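When a setting does not seem to take effect, the cause is often that a different ansible.cfg is being picked up. The lookup order can be sketched as follows; this is an illustration under simplifying assumptions (it ignores details such as Ansible's refusal to load a config from a world-writable directory), and `find_ansible_cfg` is a hypothetical helper name.

```python
import os
from pathlib import Path


def find_ansible_cfg(cwd, home, env=None):
    """Return the first existing config file in Ansible's search order, or None."""
    env = os.environ if env is None else env
    candidates = []
    if env.get("ANSIBLE_CONFIG"):
        candidates.append(Path(env["ANSIBLE_CONFIG"]))   # highest priority
    candidates += [
        Path(cwd) / "ansible.cfg",           # project directory
        Path(home) / ".ansible.cfg",         # per-user file (note the leading dot)
        Path("/etc/ansible/ansible.cfg"),    # system-wide default
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None


print(find_ansible_cfg(Path.cwd(), Path.home()))
```

The authoritative answer on a real system is the `config file =` line printed by `ansible --version`.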
Example hosts file (the inventory):
[cloudengine]
ce_switch1 ansible_host=192.168.1.100
ce_switch2 ansible_host=192.168.1.101
[cloudengine:vars]
ansible_connection=network_cli
ansible_network_os=ce
ansible_user=admin
ansible_password=YourSecurePassword
ansible_become=yes
ansible_become_method=enable
ansible_become_password=EnablePassword
Important security note: in a real environment, encrypt passwords with ansible-vault instead of storing them in plain text.
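To catch plaintext credentials before an inventory is committed, a simple scan can flag secret variables whose values look like literals. This is only a sketch: the list of keys, the function name `find_plaintext_secrets`, and the variable `vault_ce_enable_password` are assumptions, and a value is considered acceptable here merely because it is a Jinja2 reference that would typically point at a vaulted variable.

```python
import re

# Variables whose values should not be stored in plain text (an illustrative list).
SECRET_KEYS = ("ansible_password", "ansible_become_password")


def find_plaintext_secrets(inventory_text):
    """Return (line_number, key) pairs where a secret looks like a literal value."""
    findings = []
    for number, line in enumerate(inventory_text.splitlines(), start=1):
        for key in SECRET_KEYS:
            match = re.search(rf"\b{key}\s*=\s*(\S+)", line)
            # A value starting with "{{" is a variable reference, not a literal.
            if match and not match.group(1).startswith("{{"):
                findings.append((number, key))
    return findings


sample = """[cloudengine:vars]
ansible_user=admin
ansible_password=YourSecurePassword
ansible_become_password={{ vault_ce_enable_password }}
"""
print(find_plaintext_secrets(sample))   # [(3, 'ansible_password')]
```

A check like this could run as a pre-commit hook; it complements ansible-vault and does not replace it.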
Below is a complete example playbook that configures switch interfaces in bulk:
---
- name: Configure Huawei CE Switch Interfaces
  hosts: cloudengine
  gather_facts: no
  vars:
    interface_list:
      - GigabitEthernet1/0/1
      - GigabitEthernet1/0/2
      - GigabitEthernet1/0/3
  tasks:
    - name: Ensure interfaces are administratively up
      community.network.ce_interface:
        interface: "{{ item }}"
        admin_state: up
        state: present
      loop: "{{ interface_list }}"
      register: interface_result
      ignore_errors: yes
    - name: Display interface change results
      debug:
        var: interface_result.results
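The registered `interface_result.results` is a list with one entry per loop item, and because `ignore_errors` is set, failures are easy to overlook in the raw output. The sketch below groups items by outcome. The sample data is made up and trimmed to the `item`, `changed`, and `failed` keys; real entries carry more fields, and `summarize_results` is a hypothetical helper.

```python
def summarize_results(results):
    """Group loop items into changed / ok / failed based on per-item result keys."""
    summary = {"changed": [], "ok": [], "failed": []}
    for entry in results:
        if entry.get("failed"):
            summary["failed"].append(entry["item"])
        elif entry.get("changed"):
            summary["changed"].append(entry["item"])
        else:
            summary["ok"].append(entry["item"])
    return summary


# Made-up sample shaped like interface_result.results.
sample_results = [
    {"item": "GigabitEthernet1/0/1", "changed": True, "failed": False},
    {"item": "GigabitEthernet1/0/2", "changed": False, "failed": False},
    {"item": "GigabitEthernet1/0/3", "changed": False, "failed": True, "msg": "example error"},
]
print(summarize_results(sample_results))
```

The same grouping could be done inside the playbook with Jinja2 filters; Python is used here only to make the data shape explicit.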
When running the playbook, it helps to enable verbose output and write a log file:
ANSIBLE_LOG_PATH=./ansible.log ansible-playbook -v switch_config.yml
Handling typical errors:
Authentication failure:
FAILED! => {"msg": "Authentication failed."}
Check: the ansible_user and ansible_password values in the inventory.
Insufficient privileges:
ERROR: Failed to enter privileged mode
Solution: make sure ansible_become_password is correct.
Module not available:
ERROR! couldn't resolve module/action 'ce_interface'
Fix:
ansible-galaxy collection install community.network
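When reviewing ansible.log after a large run, the three error messages above can be matched automatically. This is a sketch under the assumption that the quoted fragments appear verbatim in the log line; the hints are this article's suggestions, and `suggest_fix` is a hypothetical helper, not an Ansible feature.

```python
# Error fragments quoted in this article, mapped to the first thing to check.
KNOWN_ERRORS = {
    "Authentication failed": "Check ansible_user and ansible_password in the inventory.",
    "Failed to enter privileged mode": "Check ansible_become_password.",
    "couldn't resolve module/action": "Run: ansible-galaxy collection install community.network",
}


def suggest_fix(log_line):
    """Return a hint for a recognized error message, or None otherwise."""
    for fragment, hint in KNOWN_ERRORS.items():
        if fragment in log_line:
            return hint
    return None


print(suggest_fix('FAILED! => {"msg": "Authentication failed."}'))
```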
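In large network environments, Ansible's execution efficiency needs particular attention. Here are a few practical tips:
Connection tuning:
# Add to ansible.cfg
[ssh_connection]
pipelining = True
scp_if_ssh = True
Controlling parallelism:
# Limit concurrency to 10 forks
ansible-playbook -f 10 site.yml
Using a strategy plugin:
# Set at the play level in the playbook
strategy: free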
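To get a feel for what the forks value does to run time, a back-of-the-envelope estimate is enough. The sketch below assumes hosts are handled in waves of at most `forks` and that every host takes the same time per task, which real runs will not match exactly; the numbers and the function name `estimate_runtime` are illustrative.

```python
import math


def estimate_runtime(host_count, forks, seconds_per_host):
    """Rough per-task estimate: number of waves and total seconds.

    Assumes each wave of up to `forks` hosts takes `seconds_per_host`.
    """
    waves = math.ceil(host_count / forks)
    return waves, waves * seconds_per_host


for forks in (5, 10, 20):
    waves, seconds = estimate_runtime(host_count=50, forks=forks, seconds_per_host=3)
    print(f"forks={forks}: {waves} waves, about {seconds}s per task")
```

With the `free` strategy, hosts do not wait for each other between tasks, so this kind of estimate only applies to the default linear behavior.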
For tasks that run frequently, you can enable fact caching. First install Redis:
sudo apt install redis-server
pip3 install redis
Then configure it in ansible.cfg:
[defaults]
fact_caching = redis
fact_caching_timeout = 3600
fact_caching_connection = localhost:6379:0
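The effect of `fact_caching_timeout` can be pictured as a cache whose entries expire after a fixed number of seconds. The toy class below illustrates that idea only; it is not how Ansible's Redis cache plugin is implemented, and `TTLFactCache` and the sample fact are hypothetical.

```python
import time


class TTLFactCache:
    """Toy in-memory cache that expires entries after a fixed timeout."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout = timeout_seconds
        self.clock = clock          # injectable so expiry can be tested without sleeping
        self._store = {}

    def set(self, host, facts):
        """Store facts for a host together with the time they were recorded."""
        self._store[host] = (self.clock(), facts)

    def get(self, host):
        """Return cached facts, or None if absent or older than the timeout."""
        entry = self._store.get(host)
        if entry is None:
            return None
        stored_at, facts = entry
        if self.clock() - stored_at > self.timeout:
            del self._store[host]   # expired: facts would be gathered again
            return None
        return facts


cache = TTLFactCache(timeout_seconds=3600)
cache.set("ce_switch1", {"hypothetical_fact": "value"})
print(cache.get("ce_switch1"))
```

A timeout of 3600 means facts older than one hour are treated as missing and gathered again on the next run.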
A solid logging setup is essential for troubleshooting. Below is an example playbook that collects logs:
- name: Collect Switch Logs
  hosts: cloudengine
  gather_facts: no
  tasks:
    - name: Get current configuration
      community.network.ce_config:
        backup: yes
        # Backup location and file name are set through backup_options
        backup_options:
          dir_path: "/tmp/backups"
          filename: "{{ inventory_hostname }}_config.cfg"
    - name: Capture interface status
      community.network.ce_command:
        commands: display interface brief
      register: interface_output
    - name: Save output to file
      copy:
        content: "{{ interface_output.stdout[0] }}"
        dest: "/tmp/logs/{{ inventory_hostname }}_interface.log"
      # Write on the control node; /tmp/logs must already exist
      delegate_to: localhost
You can schedule it to run automatically with cron:
# Run every Monday at 03:00
0 3 * * 1 /usr/bin/ansible-playbook /path/to/log_collection.yml
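Cron's day-of-week numbering is easy to get wrong (Sunday is 0, Monday is 1), so it can be worth checking an expression against a known date. The sketch below handles only plain numbers and `*` in the five fields; ranges, lists, and step values are deliberately left out, and `cron_matches` is a hypothetical helper.

```python
from datetime import datetime


def cron_matches(expression, moment):
    """Check a five-field cron expression (numbers and '*' only) against a datetime."""
    fields = expression.split()
    actual = (
        moment.minute,
        moment.hour,
        moment.day,
        moment.month,
        moment.isoweekday() % 7,   # cron counts Sunday as 0, Monday as 1
    )
    return all(f == "*" or int(f) == a for f, a in zip(fields, actual))


# 1 January 2024 was a Monday.
print(cron_matches("0 3 * * 1", datetime(2024, 1, 1, 3, 0)))   # True
print(cron_matches("0 3 * * 1", datetime(2024, 1, 2, 3, 0)))   # False
```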
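In real projects, I have found that Huawei CE switches limit the number of concurrent SSH connections. When managing more than 50 devices, I recommend:
Keeping the forks parameter below 20
Adding a pause between tasks:
- name: Pause between tasks
  pause:
    seconds: 2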
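The idea behind capping forks and pausing is to work through devices in bounded batches with a short gap in between. The sketch below shows that pattern in plain Python; it runs hosts sequentially and is not how Ansible schedules work internally. The 60-device host list, `run_in_batches`, and the `action` callback are assumptions for illustration; within a playbook, the play-level `serial` keyword offers comparable batching.

```python
import time


def run_in_batches(hosts, batch_size, pause_seconds, action, sleep=time.sleep):
    """Apply `action` to each host in batches, pausing between batches.

    `action` is any callable taking a host name; `sleep` is injectable for testing.
    """
    results = {}
    for start in range(0, len(hosts), batch_size):
        if start > 0:
            sleep(pause_seconds)        # give the devices time to release sessions
        for host in hosts[start:start + batch_size]:
            results[host] = action(host)
    return results


hosts = [f"ce_switch{n}" for n in range(1, 61)]   # hypothetical 60-device inventory
outcome = run_in_batches(hosts, batch_size=20, pause_seconds=0, action=lambda h: "ok")
print(len(outcome), "hosts processed")
```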