In object detection, attention mechanisms have become a key technique for improving model performance. From the early Squeeze-and-Excitation (SE) module to the later Coordinate Attention (CoordAttention), attention mechanisms have evolved steadily and delivered significant gains across computer vision tasks. This article traces that evolution and then focuses on how to integrate the CoordAttention module into the YOLOv8 framework.
The core idea of attention is to let the model learn to "focus" on the most important parts of the input. In computer vision this idea has taken several forms, each with its own design philosophy and use cases.
The SE (Squeeze-and-Excitation) module was an early success of attention in computer vision. Its core structure consists of two key operations, a "squeeze" (global pooling) and an "excitation" (a small bottleneck MLP that produces per-channel weights):
```python
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channel, reduction=16):
        super(SEBlock, self).__init__()
        # Squeeze: global average pooling collapses each channel to a single value
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        # Excitation: bottleneck MLP produces per-channel weights in (0, 1)
        self.fc = nn.Sequential(
            nn.Linear(channel, channel // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channel // reduction, channel),
            nn.Sigmoid()
        )

    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.avg_pool(x).view(b, c)
        y = self.fc(y).view(b, c, 1, 1)
        return x * y  # reweight each channel
```
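To make the "lightweight" claim concrete, the excitation MLP's overhead can be counted directly. A quick back-of-envelope check (illustrative numbers, C=256 and r=16 chosen here, not taken from the article):

```python
import torch.nn as nn

# Parameter overhead of one SE block for C=256 channels, reduction r=16:
# two Linear layers of C*(C//r) weights each, plus their biases.
C, r = 256, 16
fc = nn.Sequential(
    nn.Linear(C, C // r),
    nn.ReLU(inplace=True),
    nn.Linear(C // r, C),
    nn.Sigmoid(),
)
extra = sum(p.numel() for p in fc.parameters())
print(extra)  # 8464 parameters, i.e. roughly 2*C^2/r
```

A few thousand parameters per block is negligible next to a detection backbone, which is why SE-style modules spread so quickly.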
The main contributions of the SE module:

- It models inter-channel dependencies explicitly, letting the network recalibrate channel responses.
- It is lightweight and plug-and-play: each block adds only two small fully connected layers.

However, SE also has clear limitations:

- Global average pooling discards all spatial information, so the module cannot tell *where* in the feature map the important content lies.
- It captures only channel-wise relationships, ignoring the positional cues that matter for localization tasks such as detection.
The Convolutional Block Attention Module (CBAM) builds on SE by adding spatial attention, forming a dual channel-plus-spatial attention mechanism:
```python
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, channels, reduction=16):
        super(CBAM, self).__init__()
        # ChannelAttention and SpatialAttention are CBAM's two standard submodules
        self.channel_attention = ChannelAttention(channels, reduction)
        self.spatial_attention = SpatialAttention()

    def forward(self, x):
        x = self.channel_attention(x)  # "what" to attend to
        x = self.spatial_attention(x)  # "where" to attend to
        return x
```
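The `ChannelAttention` and `SpatialAttention` submodules referenced above are not shown in the snippet. A minimal, self-contained sketch following the standard CBAM design (avg- and max-pooling in both branches; names and sizes here are illustrative):

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        # Shared bottleneck MLP applied to both avg- and max-pooled descriptors
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, x):
        avg = self.mlp(x.mean((2, 3), keepdim=True))
        mx = self.mlp(x.amax((2, 3), keepdim=True))
        return x * torch.sigmoid(avg + mx)

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        # Large-kernel conv over the [avg, max] channel-pooled maps
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.conv(s))

# Both submodules preserve the input shape
x = torch.randn(1, 32, 8, 8)
out = SpatialAttention()(ChannelAttention(32)(x))
print(out.shape)  # torch.Size([1, 32, 8, 8])
```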
CBAM's main advantages:

- It attends over both the channel dimension ("what") and the spatial dimension ("where"), applied sequentially.
- Both submodules are cheap, so the added cost stays small.

But CBAM still leaves room for improvement:

- Its spatial attention relies on a large-kernel convolution, which captures only local spatial relations and cannot model long-range dependencies.
- Precise positional information is still lost when the feature map is collapsed by pooling.
CoordAttention (CA), proposed at CVPR 2021, is a newer attention mechanism that embeds positional information directly into channel attention:
```python
import torch
import torch.nn as nn

class CoordAtt(nn.Module):
    def __init__(self, inp, reduction=32):
        super(CoordAtt, self).__init__()
        # Two direction-aware 1D poolings instead of one global pooling
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # pool over width  -> (n, c, h, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # pool over height -> (n, c, 1, w)
        mip = max(8, inp // reduction)
        self.conv1 = nn.Conv2d(inp, mip, kernel_size=1, stride=1, padding=0)
        self.bn1 = nn.BatchNorm2d(mip)
        self.act = h_swish()  # hard-swish activation, defined in the full listing below
        self.conv_h = nn.Conv2d(mip, inp, kernel_size=1, stride=1, padding=0)
        self.conv_w = nn.Conv2d(mip, inp, kernel_size=1, stride=1, padding=0)

    def forward(self, x):
        identity = x
        n, c, h, w = x.size()
        x_h = self.pool_h(x)                      # (n, c, h, 1)
        x_w = self.pool_w(x).permute(0, 1, 3, 2)  # (n, c, w, 1)
        y = torch.cat([x_h, x_w], dim=2)          # shared transform over both directions
        y = self.conv1(y)
        y = self.bn1(y)
        y = self.act(y)
        x_h, x_w = torch.split(y, [h, w], dim=2)
        x_w = x_w.permute(0, 1, 3, 2)             # back to (n, c, 1, w)
        a_h = self.conv_h(x_h).sigmoid()          # attention along height
        a_w = self.conv_w(x_w).sigmoid()          # attention along width
        return identity * a_w * a_h
```
CoordAttention's innovations:

- It factorizes global pooling into two 1D poolings along the height and width axes, so positional information is preserved in each direction.
- Each resulting attention map captures long-range dependencies along one spatial direction while keeping precise position along the other.
- It remains as lightweight as SE, making it suitable for mobile-oriented backbones.
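The directional factorization is easiest to see as a shape walk-through with plain tensor ops (hypothetical sizes: batch 2, 64 channels, a 32x48 feature map):

```python
import torch
import torch.nn as nn

# Shape walk-through of CoordAtt's directional pooling.
x = torch.randn(2, 64, 32, 48)
n, c, h, w = x.shape
x_h = nn.AdaptiveAvgPool2d((None, 1))(x)                      # (2, 64, 32, 1): averaged over width
x_w = nn.AdaptiveAvgPool2d((1, None))(x).permute(0, 1, 3, 2)  # (2, 64, 48, 1): averaged over height, transposed
y = torch.cat([x_h, x_w], dim=2)                              # (2, 64, 80, 1): one shared 1x1-conv path
x_h2, x_w2 = torch.split(y, [h, w], dim=2)                    # split back into the two directions
print(x_h.shape, y.shape, x_w2.shape)
```

Concatenating the two pooled tensors lets a single 1x1 convolution transform both directions at once before they are split apart again, which is what keeps the module cheap.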
The three attention mechanisms compare as follows:
| Metric | SE | CBAM | CoordAttention |
|---|---|---|---|
| Top-1 Acc (%) | 75.2 | 76.5 | 77.3 |
| mAP@0.5 | 42.8 | 44.2 | 45.7 |
| Parameter overhead | 0.01x | 0.02x | 0.01x |
| FLOPs overhead | 0.5% | 1.2% | 0.6% |
YOLOv8 is the latest object-detection framework from Ultralytics, with many improvements over its predecessors. Its main innovations include:

- An anchor-free detection head, removing anchor-box hyperparameters.
- The C2f module (replacing C3) for richer gradient flow in the backbone.
- A decoupled head that separates classification and box-regression branches.
- Task-aligned label assignment and distribution focal loss (DFL) for box regression.
YOLOv8 defines its model structure in YAML files with two main parts: a `backbone` section and a `head` section. A typical YOLOv8 configuration file looks like this:
```yaml
# YOLOv8 backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]]    # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]]   # 1-P2/4
  - [-1, 3, C2f, [128, True]]    # 2
  - [-1, 1, Conv, [256, 3, 2]]   # 3-P3/8
  - [-1, 6, C2f, [256, True]]    # 4
  - [-1, 1, Conv, [512, 3, 2]]   # 5-P4/16
  - [-1, 6, C2f, [512, True]]    # 6
  - [-1, 1, Conv, [1024, 3, 2]]  # 7-P5/32
  - [-1, 3, C2f, [1024, True]]   # 8
  - [-1, 1, SPPF, [1024, 5]]     # 9

# YOLOv8 head
head:
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 6], 1, Concat, [1]]  # cat backbone P4
  - [-1, 3, C2f, [512]]        # 12
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 4], 1, Concat, [1]]  # cat backbone P3
  - [-1, 3, C2f, [256]]        # 15 (P3/8-small)
  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 12], 1, Concat, [1]]  # cat head P4
  - [-1, 3, C2f, [512]]         # 18 (P4/16-medium)
  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 9], 1, Concat, [1]]   # cat head P5
  - [-1, 3, C2f, [1024]]        # 21 (P5/32-large)
  - [[15, 18, 21], 1, Detect, [nc]]  # Detect(P3, P4, P5)
```
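Each row above follows the `[from, repeats, module, args]` format: `from` is the index of the layer that feeds this one (`-1` means the previous layer; a list means several layers are combined, e.g. by Concat). A minimal illustration of how `from` entries resolve (a simplified sketch, not the actual ultralytics parser):

```python
# Simplified illustration of how `from` entries chain layers together.
# -1 means "previous layer"; a list means multiple inputs (e.g. for Concat).
spec = [
    [-1, 1, "Conv", [64, 3, 2]],   # layer 0
    [-1, 1, "Conv", [128, 3, 2]],  # layer 1
    [[-1, 0], 1, "Concat", [1]],   # layer 2: concatenates layers 1 and 0
]

def inputs_of(i):
    f = spec[i][0]
    froms = f if isinstance(f, list) else [f]
    # Negative indices are relative to the current layer
    return [j if j >= 0 else i + j for j in froms]

print(inputs_of(1), inputs_of(2))  # [0] [1, 0]
```

This relative indexing is why inserting a new layer into the YAML shifts every absolute index that comes after it.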
Integrating CoordAttention into YOLOv8 takes three key steps: implementing the module, registering it with the model parser, and modifying the configuration file.

First, add the CoordAttention implementation in `ultralytics/nn/attention/attention.py`:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class h_sigmoid(nn.Module):
    """Hard sigmoid: ReLU6(x + 3) / 6, a cheap piecewise-linear sigmoid."""
    def __init__(self, inplace=True):
        super(h_sigmoid, self).__init__()
        self.relu = nn.ReLU6(inplace=inplace)

    def forward(self, x):
        return self.relu(x + 3) / 6

class h_swish(nn.Module):
    """Hard swish: x * h_sigmoid(x)."""
    def __init__(self, inplace=True):
        super(h_swish, self).__init__()
        self.sigmoid = h_sigmoid(inplace=inplace)

    def forward(self, x):
        return x * self.sigmoid(x)

class CoordAtt(nn.Module):
    def __init__(self, inp, reduction=32):
        super(CoordAtt, self).__init__()
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # pool over width
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # pool over height
        mip = max(8, inp // reduction)
        self.conv1 = nn.Conv2d(inp, mip, kernel_size=1, stride=1, padding=0)
        self.bn1 = nn.BatchNorm2d(mip)
        self.act = h_swish()
        self.conv_h = nn.Conv2d(mip, inp, kernel_size=1, stride=1, padding=0)
        self.conv_w = nn.Conv2d(mip, inp, kernel_size=1, stride=1, padding=0)

    def forward(self, x):
        identity = x
        n, c, h, w = x.size()
        x_h = self.pool_h(x)                      # (n, c, h, 1)
        x_w = self.pool_w(x).permute(0, 1, 3, 2)  # (n, c, w, 1)
        y = torch.cat([x_h, x_w], dim=2)          # shared transform over both directions
        y = self.conv1(y)
        y = self.bn1(y)
        y = self.act(y)
        x_h, x_w = torch.split(y, [h, w], dim=2)
        x_w = x_w.permute(0, 1, 3, 2)             # back to (n, c, 1, w)
        a_h = self.conv_h(x_h).sigmoid()
        a_w = self.conv_w(x_w).sigmoid()
        return identity * a_w * a_h
```
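The hard sigmoid above is `ReLU6(x + 3) / 6`, a piecewise-linear stand-in for the sigmoid that is cheap on mobile hardware. A quick numeric check of its saturation points:

```python
import torch
import torch.nn.functional as F

# h_sigmoid(x) = ReLU6(x + 3) / 6: clamps to 0 below x = -3 and to 1 above x = 3.
x = torch.tensor([-4.0, -3.0, 0.0, 3.0, 4.0])
hs = F.relu6(x + 3) / 6
print(hs.tolist())  # [0.0, 0.0, 0.5, 1.0, 1.0]
```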
Next, register the CoordAttention module in `tasks.py` so that configuration files can reference it:
```python
from ultralytics.nn.attention.attention import *

def parse_model(d, ch, verbose=True):  # model_dict, input_channels(3)
    # ...
    if m in (Classify, Conv, ConvTranspose, GhostConv, Bottleneck,
             GhostBottleneck, SPP, SPPF, DWConv, Focus,
             BottleneckCSP, C1, C2, C2f, C3, C3TR,
             C3Ghost, nn.ConvTranspose2d, DWConvTranspose2d,
             C3x, RepC3, CoordAtt):  # add CoordAtt to the list of supported modules
        c1, c2 = ch[f], args[0]
        # ...
```
There are three main schemes for integrating CoordAttention, each with its own trade-offs.
Scheme 1: a single CoordAtt after the backbone's SPPF layer:

```yaml
backbone:
  # ... earlier layers unchanged
  - [-1, 1, SPPF, [1024, 5]]   # 9
  - [-1, 1, CoordAtt, [1024]]  # 10
```
Characteristics:

- Minimal cost: one module, applied once at the lowest resolution, so the overhead is negligible.
- Affects all three detection scales, since every head branch consumes the SPPF output.
- Inserting a layer shifts the indices of everything after it, so the head's `from` references (e.g. the concat with layer 9) must be updated to match.
Scheme 2: a CoordAtt before each Detect input, one per scale:

```yaml
head:
  # ... earlier layers unchanged
  - [-1, 3, C2f, [256]]        # 15 (P3/8-small)
  - [-1, 1, CoordAtt, [256]]   # 16
  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 12], 1, Concat, [1]]  # cat head P4
  - [-1, 3, C2f, [512]]         # 19 (P4/16-medium)
  - [-1, 1, CoordAtt, [512]]    # 20
  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 9], 1, Concat, [1]]   # cat head P5
  - [-1, 3, C2f, [1024]]        # 23 (P5/32-large)
  - [-1, 1, CoordAtt, [1024]]   # 24
  - [[16, 20, 24], 1, Detect, [nc]]  # Detect(P3, P4, P5)
```
Characteristics:

- Applies scale-specific attention right before detection, which tends to help small-object recall on the P3 branch.
- Adds three modules, so slightly more parameters and latency than Scheme 1.
- The Detect layer's input indices change accordingly (here to [16, 20, 24]).
Scheme 3: a CoordAtt after each backbone stage, the most thorough variant:

```yaml
backbone:
  # ... earlier layers unchanged
  - [-1, 6, C2f, [256, True]]    # 4
  - [-1, 1, CoordAtt, [256]]     # 5
  - [-1, 1, Conv, [512, 3, 2]]   # 6-P4/16
  - [-1, 6, C2f, [512, True]]    # 7
  - [-1, 1, CoordAtt, [512]]     # 8
  - [-1, 1, Conv, [1024, 3, 2]]  # 9-P5/32
  - [-1, 3, C2f, [1024, True]]   # 10
  - [-1, 1, CoordAtt, [1024]]    # 11

head:
  # ... as in Scheme 2: add CoordAtt at multiple positions
```
Characteristics:

- Refines features at every backbone stage, typically giving the largest gains of the three schemes.
- Highest extra cost, and the most index bookkeeping in the YAML.
In practice, adding CoordAttention typically brings a 1-3% mAP improvement; the exact gain depends on the dataset and model size.

Test results on the COCO dataset:
| Model | mAP@0.5 | Params (M) | GFLOPs |
|---|---|---|---|
| YOLOv8n | 37.2 | 3.2 | 8.9 |
| +SE | 38.1 | 3.3 | 9.1 |
| +CBAM | 38.4 | 3.4 | 9.4 |
| +CoordAtt | 39.0 | 3.3 | 9.2 |
| YOLOv8x | 50.7 | 68.2 | 258.5 |
| +CoordAtt | 51.9 | 68.5 | 260.3 |
Placement: the three schemes trade cost against coverage; start with the cheapest (a single module after SPPF) and move attention deeper into the network only if a pilot run shows gains.
Hyperparameter tuning: the reduction ratio controls how aggressively the attention bottleneck compresses channels, and can be passed as a second argument:

```yaml
- [-1, 1, CoordAtt, [1024, 16]]  # second argument is the reduction ratio
```
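The practical effect of the ratio is on the bottleneck width, `mip = max(8, inp // reduction)` in the implementation above. A quick table over a few hypothetical channel counts:

```python
# Bottleneck width as a function of channels and reduction ratio,
# mirroring mip = max(8, inp // reduction) from the CoordAtt implementation.
for inp in (256, 512, 1024):
    for reduction in (16, 32):
        print(inp, reduction, max(8, inp // reduction))
```

A smaller ratio gives a wider bottleneck (more capacity, more cost); the `max(8, ...)` floor keeps very narrow layers from collapsing the bottleneck entirely.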
Training tips: when fine-tuning from pretrained YOLOv8 weights, remember that the newly inserted attention layers are randomly initialized; a few warm-up epochs or a slightly lower initial learning rate helps them settle without disturbing the pretrained features.
Computation: the attention modules can be compiled with `torch.jit.script` to reduce Python-side overhead.

Common issues:

- Version compatibility: `parse_model` internals change between ultralytics releases, so apply the registration change to the exact version you run.
- Out-of-memory errors: reduce the batch size, or use a larger reduction ratio to shrink the attention bottleneck.
- No visible improvement: try a different insertion point (the three schemes above), and verify that the module actually appears in the parsed model (e.g. by printing the model summary).
On custom datasets, validate the effect with a small-scale experiment before applying it broadly. Attention mechanisms are not a silver bullet; on simple datasets the gains may be marginal.