Bearing fault diagnosis is a key part of industrial equipment health management. Traditional vibration-signal analysis relies on hand-crafted feature extraction, whereas deep learning can learn signal features automatically and markedly improve diagnostic efficiency. This article walks through building a complete 1D-CNN model from scratch with TensorFlow 1.x and Keras, processing the Case Western Reserve University (CWRU) bearing vibration data stored in .mat format.
For a TensorFlow 1.x project, version compatibility needs special care. Creating an isolated Python environment with conda is recommended:
```bash
conda create -n tf1_env python=3.7
conda activate tf1_env
pip install tensorflow==1.15.0 keras==2.3.1 h5py==2.10.0 scipy==1.2.1
```
Key library versions:
| Library | Recommended version | Role |
|---|---|---|
| TensorFlow | 1.15.0 | Core deep learning framework |
| Keras | 2.3.1 | High-level model-building API |
| h5py | 2.10.0 | Saving .h5 models; reading v7.3 .mat files |
| scipy | 1.2.1 | Scientific computing; loadmat for .mat files |
Note: TensorFlow 1.x and 2.x differ substantially in API design; all code in this article targets the 1.x line.
The CWRU bearing dataset contains vibration signals recorded under several fault conditions, stored as .mat files. First, look at the data structure:
```python
from scipy.io import loadmat
import numpy as np

def load_mat_file(filepath):
    """Load a single .mat file and extract the vibration signal."""
    data = loadmat(filepath)
    for key in data.keys():
        if 'DE' in key:  # DE = drive-end vibration data
            return data[key].ravel()
    raise ValueError("No vibration signal found in file")
```
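In CWRU files the signal variables are typically named like `X105_DE_time` / `X105_FE_time` (the numeric prefix varies per recording), hence the substring match on `'DE'`. A stand-in dict makes the lookup testable without a real file (the key names here are illustrative):

```python
import numpy as np

# stand-in for scipy.io.loadmat output; key names follow the
# CWRU convention (e.g. 'X105_DE_time'), which varies per file
fake_mat = {
    '__header__': b'MATLAB 5.0',
    '__version__': '1.0',
    'X105_DE_time': np.random.randn(121265, 1),
    'X105_FE_time': np.random.randn(121265, 1),
}

def load_signal(data):
    for key in data:
        if 'DE' in key:
            return data[key].ravel()
    raise ValueError("no drive-end signal found")

print(load_signal(fake_mat).shape)  # flattened to 1-D
```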
Each file stores the signal under a variable whose name contains 'DE' (drive end); once extracted and flattened, it is a single long one-dimensional array.
The raw vibration signal is usually very long and must be cut into segments that a CNN can process:
```python
def slice_signal(signal, window_size=864, step=28):
    """Cut a long signal into fixed-length, overlapping segments."""
    slices = []
    # +1 so the last full window is not dropped
    for start in range(0, len(signal) - window_size + 1, step):
        slices.append(signal[start:start + window_size])
    return np.array(slices)
```
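The slicing arithmetic can be verified on a synthetic signal — the number of windows is `(len(signal) - window_size) // step + 1`:

```python
import numpy as np

def slice_signal(signal, window_size=864, step=28):
    # sliding window with heavy overlap, same logic as above
    return np.array([signal[s:s + window_size]
                     for s in range(0, len(signal) - window_size + 1, step)])

slices = slice_signal(np.arange(10000))
print(slices.shape)  # (10000 - 864) // 28 + 1 = 327 windows of 864 samples
```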
Data augmentation note: the small step (28 samples against an 864-sample window) yields heavily overlapping segments, greatly multiplying the number of training examples.
Stratified sampling keeps the class proportions consistent across splits:
```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import StratifiedShuffleSplit

def prepare_labels(filenames):
    """Derive class labels from file names."""
    labels = []
    for name in filenames:
        if 'Normal' in name: labels.append(0)
        elif 'IR' in name: labels.append(1)    # inner-race fault
        elif 'OR' in name: labels.append(2)    # outer-race fault
        elif 'Ball' in name: labels.append(3)  # rolling-element fault
    return np.array(labels).reshape(-1, 1)

# One-hot encoding (filenames: list of .mat paths gathered beforehand)
labels = prepare_labels(filenames)
encoder = OneHotEncoder(categories='auto')
one_hot_labels = encoder.fit_transform(labels).toarray()
```
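The file-name convention can be exercised with a few hypothetical names (real CWRU file names differ per recording), and the one-hot step is equivalent to indexing an identity matrix:

```python
import numpy as np

# hypothetical file names following the Normal/IR/OR/Ball convention
filenames = ['Normal_0.mat', 'IR007_0.mat', 'OR007_0.mat', 'Ball007_0.mat']

mapping = [('Normal', 0), ('IR', 1), ('OR', 2), ('Ball', 3)]
labels = np.array([next(v for k, v in mapping if k in n) for n in filenames])
one_hot = np.eye(4)[labels]  # same result as OneHotEncoder on these labels

print(labels)          # [0 1 2 3]
print(one_hot.shape)   # (4, 4)
```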
Suggested split ratios:
| Split | Ratio | Purpose |
|---|---|---|
| Training | 60% | Fitting model parameters |
| Validation | 20% | Hyperparameter tuning |
| Test | 20% | Final performance evaluation |
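StratifiedShuffleSplit does this for you; the mechanics reduce to splitting each class's indices separately. A NumPy sketch with synthetic, imbalanced labels:

```python
import numpy as np

rng = np.random.RandomState(42)
labels = np.repeat([0, 1, 2, 3], [400, 100, 100, 100])  # imbalanced classes

train_idx, val_idx, test_idx = [], [], []
for cls in np.unique(labels):
    idx = np.where(labels == cls)[0]
    rng.shuffle(idx)
    n_train, n_val = int(0.6 * len(idx)), int(0.2 * len(idx))
    train_idx.extend(idx[:n_train])
    val_idx.extend(idx[n_train:n_train + n_val])
    test_idx.extend(idx[n_train + n_val:])

# every class keeps its 60/20/20 proportions
print(len(train_idx), len(val_idx), len(test_idx))
```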
For one-dimensional time-series data such as vibration signals, 1D convolution captures local features effectively:
```python
from keras.layers import Conv1D, BatchNormalization, MaxPooling1D, Activation

def build_conv_block(input_layer, filters=64, kernel_size=3):
    """Convolution block: Conv1D + BatchNorm + ReLU + MaxPooling."""
    x = Conv1D(filters, kernel_size, padding='same')(input_layer)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = MaxPooling1D(pool_size=2)(x)
    return x
```
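A quick check of how the time dimension evolves through three such blocks — `padding='same'` leaves the length unchanged, and each pooling halves it:

```python
length = 864
shapes = []
for filters in (64, 128, 256):
    length //= 2  # 'same' conv keeps length; MaxPooling1D(2) halves it
    shapes.append((length, filters))
print(shapes)  # [(432, 64), (216, 128), (108, 256)]
```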
Key parameter choices: padding='same' keeps the sequence length unchanged through the convolution, BatchNormalization stabilizes training between the convolution and the ReLU, and pool_size=2 halves the time dimension after every block.
Stack three convolution blocks to form a deeper network:
```python
from keras.models import Model
from keras.layers import Input, Dense, Flatten, Dropout

def build_model(input_shape=(864, 1)):
    """Build the full CNN model."""
    inputs = Input(shape=input_shape)
    # Convolutional feature extractor
    x = build_conv_block(inputs, filters=64)
    x = build_conv_block(x, filters=128)
    x = build_conv_block(x, filters=256)
    # Classification head
    x = Flatten()(x)
    x = Dense(128, activation='relu')(x)
    x = Dropout(0.5)(x)
    outputs = Dense(4, activation='softmax')(x)  # 4 fault classes
    return Model(inputs, outputs)
```
The network structure, as printed by model.summary():
```
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         (None, 864, 1)            0
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 864, 64)           256
_________________________________________________________________
batch_normalization_1 (Batch (None, 864, 64)           256
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 432, 64)           0
_________________________________________________________________
conv1d_2 (Conv1D)            (None, 432, 128)          24704
_________________________________________________________________
... (middle layers omitted) ...
_________________________________________________________________
dense_2 (Dense)              (None, 4)                 516
=================================================================
Total params: 210,180
Trainable params: 209,924
Non-trainable params: 256
```
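The per-layer parameter counts in this summary can be reproduced from the standard layer formulas (with kernel_size=3, as in the code above):

```python
def conv1d_params(kernel_size, in_channels, filters):
    # weights: kernel_size * in_channels * filters, plus one bias per filter
    return kernel_size * in_channels * filters + filters

def bn_params(channels):
    # gamma + beta (trainable) and moving mean + variance (non-trainable)
    return 4 * channels

print(conv1d_params(3, 1, 64))    # conv1d_1: 256
print(bn_params(64))              # batch_normalization_1: 256
print(conv1d_params(3, 64, 128))  # conv1d_2: 24704
print(128 * 4 + 4)                # dense_2: 516
```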
Use a dynamic learning rate and model checkpoints:
```python
from keras.callbacks import ModelCheckpoint, ReduceLROnPlateau

def get_callbacks():
    """Configure training callbacks."""
    checkpoint = ModelCheckpoint(
        'best_model.h5',
        monitor='val_accuracy',
        save_best_only=True,
        mode='max'
    )
    lr_reducer = ReduceLROnPlateau(
        monitor='val_loss',
        factor=0.5,   # halve the learning rate
        patience=5,   # after 5 epochs without val_loss improvement
        verbose=1
    )
    return [checkpoint, lr_reducer]
```
Optimizer configuration:
```python
from keras.optimizers import Adam

model.compile(
    optimizer=Adam(lr=0.001),
    loss='categorical_crossentropy',
    metrics=['accuracy']
)
```
Start training and visualize the results:
```python
import matplotlib.pyplot as plt

history = model.fit(
    x_train, y_train,
    batch_size=64,
    epochs=100,
    validation_data=(x_val, y_val),
    callbacks=get_callbacks()
)

# Plot training curves
plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'], label='Train')
plt.plot(history.history['val_accuracy'], label='Validation')
plt.title('Accuracy over epochs')
plt.legend()
plt.subplot(1, 2, 2)
plt.plot(history.history['loss'], label='Train')
plt.plot(history.history['val_loss'], label='Validation')
plt.title('Loss over epochs')
plt.legend()
plt.show()
```
Typical training-curve behaviour: training and validation accuracy should climb together; a widening gap between them is the classic sign of overfitting, at which point the Dropout rate or the amount of data augmentation can be increased.
Problem 1: vanishing/exploding gradients. The BatchNormalization layer inside every convolution block already mitigates this; if it persists, gradient clipping (e.g. Adam(clipnorm=1.0)) is a common remedy.
Problem 2: class imbalance. Compute balanced class weights and pass them to model.fit(class_weight=class_weights):
```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# labels must be 1-D here, hence ravel()
class_weights = compute_class_weight('balanced', np.unique(labels), labels.ravel())
class_weights = dict(enumerate(class_weights))
```
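The 'balanced' heuristic computes n_samples / (n_classes * count_c) for each class c, which is easy to check by hand on synthetic labels:

```python
import numpy as np

# synthetic imbalanced label vector: 400 normal, 100 of each fault class
labels = np.repeat([0, 1, 2, 3], [400, 100, 100, 100])

counts = np.bincount(labels)
weights = len(labels) / (len(counts) * counts.astype(float))
print(dict(enumerate(weights)))  # minority classes get larger weights
```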
Problem 3: slow convergence. A short learning-rate warm-up often helps:
```python
from keras.callbacks import LearningRateScheduler

base_lr = 0.001

def warmup_scheduler(epoch, lr):
    # ramp from a fixed base rate, not the callback's running lr
    return base_lr * (epoch + 1) / 10 if epoch < 10 else base_lr

warmup = LearningRateScheduler(warmup_scheduler)  # add to the callback list
```
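Computing each epoch's rate from a fixed base value matters: reusing the callback's running lr would compound the factor across epochs. The resulting schedule, as a standalone check:

```python
base_lr = 0.001

def warmup(epoch):
    # linear ramp over the first 10 epochs, then hold at base_lr
    return base_lr * (epoch + 1) / 10 if epoch < 10 else base_lr

schedule = [warmup(e) for e in (0, 4, 9, 20)]
print(schedule)
```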
Beyond overall accuracy, per-class precision, recall, and F1-score deserve attention:
```python
from sklearn.metrics import classification_report

y_pred = model.predict(x_test)
y_pred_classes = np.argmax(y_pred, axis=1)
y_true = np.argmax(y_test, axis=1)

print(classification_report(
    y_true, y_pred_classes,
    target_names=['Normal', 'IR', 'OR', 'Ball']
))
```
Interpreting the key metrics: precision measures how trustworthy a predicted fault label is, while recall measures how many true faults of each type are caught; for condition monitoring, high recall on the fault classes usually matters most.
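classification_report derives these per-class numbers from the confusion matrix; computing them once by hand makes the report easier to read (toy 4-class matrix, rows = true class, columns = predicted):

```python
import numpy as np

cm = np.array([[50,  0,  0,  0],
               [ 2, 45,  3,  0],
               [ 0,  4, 46,  0],
               [ 0,  0,  0, 50]])

precision = np.diag(cm) / cm.sum(axis=0)  # of predictions for a class, fraction correct
recall    = np.diag(cm) / cm.sum(axis=1)  # of true samples of a class, fraction found
print(np.round(precision, 3))
print(np.round(recall, 3))
```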
With industrial deployment in mind, the model can be compressed, e.g. by magnitude-based weight pruning:
```python
import numpy as np

def prune_weights(model, pruning_percent=0.2):
    """Magnitude-based weight pruning."""
    weights = model.get_weights()
    for i in range(len(weights)):
        if len(weights[i].shape) > 1:  # skip bias vectors
            threshold = np.percentile(np.abs(weights[i]), pruning_percent * 100)
            mask = np.abs(weights[i]) > threshold
            weights[i] = weights[i] * mask  # zero out the smallest weights
    model.set_weights(weights)
```
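The effect of magnitude pruning is easy to verify on a random matrix — roughly pruning_percent of the entries go to zero:

```python
import numpy as np

rng = np.random.RandomState(0)
w = rng.randn(100, 100)  # stand-in for one layer's weight matrix

pruning_percent = 0.2
threshold = np.percentile(np.abs(w), pruning_percent * 100)
pruned = w * (np.abs(w) > threshold)

print(np.mean(pruned == 0))  # fraction of weights zeroed, ~0.2
```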
Other common compression techniques include weight quantization and knowledge distillation.
In real industrial deployments the model will encounter fault types absent from the training set; a confidence threshold on the softmax output can flag them:
```python
def detect_anomaly(signal, model, threshold=0.9):
    """Flag signals the model cannot classify with confidence."""
    prob = model.predict(signal.reshape(1, -1, 1))  # add batch and channel dims
    if np.max(prob) < threshold:
        return "Unknown Fault"
    return "Known Fault"
```
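The rejection rule depends only on the softmax vector, so it can be sanity-checked without a trained model (detect_anomaly_prob here is a stand-in that takes the probability vector directly):

```python
def detect_anomaly_prob(prob, threshold=0.9):
    # reject when even the top softmax score is below the threshold
    return "Unknown Fault" if max(prob) < threshold else "Known Fault"

print(detect_anomaly_prob([0.97, 0.01, 0.01, 0.01]))  # Known Fault
print(detect_anomaly_prob([0.40, 0.30, 0.20, 0.10]))  # Unknown Fault
```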