The RK3588, Rockchip's flagship SoC, pairs a 6 TOPS NPU with a quad-core A76 + quad-core A55 CPU, which has made it a popular choice for edge computing. The YOLOv5 + DeepSORT combination, with one model handling real-time object detection and the other multi-object tracking, performs impressively in scenarios like smart surveillance and unattended retail. Last year I first tried this setup on a campus security project: measured throughput on a 1080p stream reached 22 FPS, three times faster than our previous solution.
What makes this combination strong: YOLOv5 is fast enough for real-time detection on the NPU, while DeepSORT's appearance-based re-identification keeps track IDs stable through occlusions and crossings.

Note: choose the YOLOv5s variant rather than the larger m/l models; on RK3588 it gives the best accuracy-per-FPS. In my own comparison with yolov5m, accuracy improved only about 5% while the frame rate dropped 40%.
First, a one-stop round of dependency installation; these packages underpin everything that follows:
```bash
sudo apt-get update
sudo apt-get install -y build-essential cmake git libgtk2.0-dev \
    pkg-config libavcodec-dev libavformat-dev libswscale-dev \
    python3-dev python3-numpy libtbb2 libtbb-dev \
    libjpeg-dev libpng-dev libtiff-dev libdc1394-22-dev
```
If libjasper-dev fails to install, don't panic; use this workaround:
```bash
sudo add-apt-repository "deb http://mirrors.aliyun.com/ubuntu-ports focal main multiverse"
sudo apt update
sudo apt install libjasper1 libjasper-dev
```
FFmpeg is the heart of the video pipeline, and I recommend building it from source for best performance. On a factory deployment last year, the prebuilt package produced smeared, corrupted frames; a self-compiled build fixed it completely:
```bash
wget https://www.nasm.us/pub/nasm/releasebuilds/2.15.05/nasm-2.15.05.tar.gz
tar xvf nasm-2.15.05.tar.gz
cd nasm-2.15.05
./configure --prefix=/usr/local
make -j8 && sudo make install
```
```bash
git clone https://code.videolan.org/videolan/x264.git
cd x264
./configure --enable-shared --prefix=/usr/local
make -j8 && sudo make install
```
```bash
# run from inside the FFmpeg source tree
./configure --prefix=/usr/local --enable-shared --enable-gpl \
    --enable-libx264 --extra-cflags="-I/usr/local/include" \
    --extra-ldflags="-L/usr/local/lib"
make -j8 && sudo make install
```
Pitfall log: if you hit "libx264 not found" at runtime, remember to run:
```bash
sudo sh -c "echo /usr/local/lib > /etc/ld.so.conf.d/custom.conf"
sudo ldconfig
```
When building OpenCV on RK3588, make sure ARM NEON acceleration is enabled (the NPU itself is driven by the RKNN runtime, not by OpenCV). This is my cmake configuration:
```bash
cmake -D CMAKE_BUILD_TYPE=RELEASE \
      -D CMAKE_INSTALL_PREFIX=/usr/local \
      -D WITH_TBB=ON \
      -D WITH_V4L=ON \
      -D WITH_QT=OFF \
      -D WITH_OPENGL=ON \
      -D WITH_CUDA=OFF \
      -D OPENCV_EXTRA_MODULES_PATH=../../opencv_contrib/modules \
      -D BUILD_EXAMPLES=OFF \
      -D BUILD_opencv_python3=ON \
      -D BUILD_opencv_python2=OFF \
      -D WITH_FFMPEG=ON \
      -D ENABLE_NEON=ON \
      -D ENABLE_VFPV3=ON \
      -D WITH_LIBV4L=ON \
      -D OPENCV_ENABLE_NONFREE=ON \
      -D BUILD_TESTS=OFF ..
```
Key parameters:
- `WITH_FFMPEG=ON`: link video I/O against the FFmpeg we just built from source.
- `ENABLE_NEON=ON`: ARM SIMD vectorization, which matters far more here than any desktop flag.
- `OPENCV_EXTRA_MODULES_PATH` plus `OPENCV_ENABLE_NONFREE=ON`: pull in the opencv_contrib modules.
- `WITH_CUDA=OFF`: there is no NVIDIA GPU on this board.
A full OpenCV build on RK3588 takes about two hours; these tricks saved me roughly 40% of that:
```bash
sudo apt install ccache
export PATH="/usr/lib/ccache:$PATH"
```
```bash
sudo fallocate -l 4G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
```
```bash
make -j8 && sudo make install
```
Rockchip's official rknn-toolkit2 toolchain is the key to model conversion. Step by step:
```python
import torch

# fetch the pretrained yolov5s checkpoint via torch hub
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')
torch.save(model.state_dict(), 'yolov5s.pt')
```
```python
import torch

model = torch.hub.load('ultralytics/yolov5', 'yolov5s')
model.eval()
dummy_input = torch.randn(1, 3, 640, 640)
torch.onnx.export(model, dummy_input, "yolov5s.onnx",
                  input_names=["images"],
                  output_names=["output"],
                  dynamic_axes={"images": {0: "batch"}, "output": {0: "batch"}})
```
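The export above fixes a 640×640 input, so at inference time each frame must be letterboxed (scaled with preserved aspect ratio, then padded) before it reaches the NPU. Here is a minimal numpy-only sketch of that preprocessing; it uses nearest-neighbour sampling so it needs no cv2, whereas YOLOv5's own letterbox uses bilinear resize (the gray pad value 114 matches the YOLOv5 convention):

```python
import numpy as np

def letterbox(img: np.ndarray, new_size: int = 640, pad_value: int = 114):
    """Resize keeping aspect ratio, then pad to new_size x new_size.
    Returns the padded image, the scale factor, and the (left, top) offset
    needed later to map detections back to original coordinates."""
    h, w = img.shape[:2]
    scale = min(new_size / h, new_size / w)
    nh, nw = int(round(h * scale)), int(round(w * scale))
    # nearest-neighbour resize via index sampling (numpy-only stand-in for cv2.resize)
    ys = (np.arange(nh) / scale).astype(int).clip(0, h - 1)
    xs = (np.arange(nw) / scale).astype(int).clip(0, w - 1)
    resized = img[ys[:, None], xs[None, :]]
    # pad the borders with the fill value, centering the resized image
    out = np.full((new_size, new_size, img.shape[2]), pad_value, dtype=img.dtype)
    top, left = (new_size - nh) // 2, (new_size - nw) // 2
    out[top:top + nh, left:left + nw] = resized
    return out, scale, (left, top)

# a 1080p frame shrinks by 1/3 and gets padded top and bottom
frame = np.zeros((1080, 1920, 3), dtype=np.uint8)
padded, scale, (left, top) = letterbox(frame)
```

The returned scale and offset are what you divide/subtract by when mapping the model's boxes back onto the original frame.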
```python
from rknn.api import RKNN

rknn = RKNN()
rknn.config(target_platform='rk3588')
rknn.load_onnx(model='yolov5s.onnx')
rknn.build(do_quantization=True, dataset='./dataset.txt')
rknn.export_rknn('./yolov5s.rknn')
```
From testing: with 500+ calibration images that cover your deployment scenes, quantization accuracy loss stays within 1%.
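The dataset file passed to `rknn.build` is just a plain-text list of calibration image paths, one per line. A small sketch that generates it (the directory names and the throwaway demo at the bottom are illustrative only):

```python
import os
import tempfile

def write_calibration_list(image_dir, out_file, exts=(".jpg", ".png")):
    """Write one image path per line, the format rknn.build's dataset expects."""
    paths = sorted(
        os.path.join(image_dir, name)
        for name in os.listdir(image_dir)
        if name.lower().endswith(exts)
    )
    with open(out_file, "w") as fh:
        fh.write("\n".join(paths) + "\n")
    return len(paths)

# throwaway demo: two images plus one non-image file that must be skipped
tmp = tempfile.mkdtemp()
for name in ("a.jpg", "b.png", "notes.txt"):
    open(os.path.join(tmp, name), "w").close()
list_file = os.path.join(tmp, "dataset.txt")
n = write_calibration_list(tmp, list_file)
```

In practice, point it at a folder of a few hundred real frames sampled from your cameras rather than random internet images.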
DeepSORT needs two models: a detector (our YOLOv5 above) and an appearance-feature extractor for re-identification (the classic mars-small128).
Feature-extractor conversion example:
```python
from rknn.api import RKNN

rknn = RKNN()
rknn.config(target_platform='rk3588')
rknn.load_tensorflow(tf_pb='mars-small128.pb',
                     inputs=['images'],
                     outputs=['features'],
                     input_size_list=[[128, 64, 3]])
rknn.build(do_quantization=True, dataset='./dataset.txt')
rknn.export_rknn('mars-small128.rknn')
```
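DeepSORT matches detections to existing tracks partly by cosine distance between these 128-d embeddings. The metric itself is tiny; a sketch with toy vectors (the ~0.2 matching gate mentioned in the comment is a common default, not something this repo mandates):

```python
import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine distance (1 - cosine similarity) between row vectors.
    DeepSORT rejects detection/track pairs whose distance exceeds a gate
    (values around 0.2 are a typical default)."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - a @ b.T

# toy 128-d embeddings: track 0 matches the detection exactly, track 1 is orthogonal
det = np.zeros((1, 128)); det[0, 0] = 1.0
tracks = np.zeros((2, 128)); tracks[0, 0] = 1.0; tracks[1, 1] = 1.0
d = cosine_distance(tracks, det)
```

The resulting cost matrix is what gets fed to the Hungarian assignment step inside the tracker.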
Key tracker parameters (matching thresholds, track budget) usually need tuning per scene as well.
I recommend starting from Zhou's adapted port:
```bash
git clone https://github.com/Zhou-sx/yolov5_Deepsort_rknn.git
```
Key spots you must modify:
```cmake
set(OpenCV_DIR "/usr/local/lib/cmake/opencv4")
set(RKNN_RT_LIB "/usr/lib/librknnrt.so")
```
```cpp
// in videoio.cpp
video_probs.Video_fourcc = cv::VideoWriter::fourcc('M','J','P','G');
```
```cpp
// in common.h
#define YOLO_MODEL_PATH "/models/yolov5s.rknn"
#define DEEPSORT_MODEL_PATH "/models/mars-small128.rknn"
```
Three months of field use distilled into these optimizations:
```cpp
// pre-allocate a pool of frame buffers to avoid per-frame allocation
std::vector<cv::Mat> frame_pool(10);
for (auto& mat : frame_pool) {
    mat.create(1080, 1920, CV_8UC3);
}
```
```cpp
std::thread capture_thread([](){
    // dedicated to video capture
});
std::thread process_thread([](){
    // dedicated to inference
});
```
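The two threads above only sketch the split; what makes the pattern work is a bounded queue between them, which applies back-pressure so capture can never outrun inference. A Python model of the same producer/consumer structure (the integer frames and the `* 2` "inference" are stand-ins for real capture and detection):

```python
import queue
import threading

def run_pipeline(frames, maxsize=4):
    """Capture thread pushes frames; process thread pops and 'infers'.
    The bounded queue blocks the producer when the consumer falls behind."""
    q = queue.Queue(maxsize=maxsize)
    results = []
    SENTINEL = object()  # signals end-of-stream to the consumer

    def capture():
        for f in frames:      # stands in for `cap >> frame`
            q.put(f)          # blocks when the queue is full
        q.put(SENTINEL)

    def process():
        while True:
            f = q.get()
            if f is SENTINEL:
                break
            results.append(f * 2)  # stands in for detect + track

    t1 = threading.Thread(target=capture)
    t2 = threading.Thread(target=process)
    t1.start(); t2.start()
    t1.join(); t2.join()
    return results

out = run_pipeline(range(10))
```

Keeping the queue small (a handful of frames) also bounds latency: stale frames are never allowed to pile up.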
```bash
# note: `sudo echo ... > file` fails because the redirection runs without root
echo performance | sudo tee /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
```
Measured before/after:
| Metric | Before | After | Change |
|---|---|---|---|
| Frame rate | 15 FPS | 28 FPS | +86% |
| Latency | 210 ms | 90 ms | -57% |
| CPU load | 85% | 45% | -47% |
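For the record, the percentage column follows directly from the raw numbers: relative increase for frame rate, relative reduction for latency and CPU, truncated to whole percents:

```python
def pct_gain(before, after):
    """Relative increase, truncated to a whole percent."""
    return int((after - before) / before * 100)

def pct_drop(before, after):
    """Relative reduction, truncated to a whole percent."""
    return int((before - after) / before * 100)
```

So 15 to 28 FPS is an 86% gain, while 210 to 90 ms and 85% to 45% CPU are 57% and 47% reductions.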
Taking supermarket foot-traffic analysis as an example, the full deployment flow:
```cpp
cv::VideoCapture cap;
cap.open("rtsp://admin:password@192.168.1.64:554/stream1");
```
```cpp
YOLOv5 detector;
detector.load_model("yolov5s.rknn");
DeepSORT tracker;
tracker.init("mars-small128.rknn");
```
```cpp
while (true) {
    cv::Mat frame;
    cap >> frame;
    auto detections = detector.detect(frame);
    auto tracks = tracker.update(detections);
    for (auto& track : tracks) {
        draw_tracking_result(frame, track);
        count_people(track);  // foot-traffic counting
    }
    cv::imshow("Result", frame);
    if (cv::waitKey(1) == 27) break;  // exit on ESC
}
```
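`count_people` above is project-specific. A common implementation counts a track once when its centroid crosses a virtual line; this `LineCounter` sketch (the class name and the y = 500 line position are hypothetical) shows the idea:

```python
class LineCounter:
    """Counts tracks whose centroid crosses a horizontal line (y = line_y).
    Each crossing is detected by comparing a track's previous and current y."""
    def __init__(self, line_y: int):
        self.line_y = line_y
        self.last_y = {}    # track_id -> previous centroid y
        self.count_in = 0   # crossings moving downward (entering)
        self.count_out = 0  # crossings moving upward (leaving)

    def update(self, track_id: int, cy: int):
        prev = self.last_y.get(track_id)
        if prev is not None:
            if prev < self.line_y <= cy:
                self.count_in += 1
            elif prev >= self.line_y > cy:
                self.count_out += 1
        self.last_y[track_id] = cy

counter = LineCounter(line_y=500)
for y in (480, 495, 510):    # track 1 walks downward past the line
    counter.update(1, y)
for y in (520, 505, 490):    # track 2 walks upward across it
    counter.update(2, y)
```

Because the check uses the previous position per track ID, a person lingering on the line is not double-counted.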
Common problems and fixes:
```cpp
// dropped frames on unstable streams: enlarge the capture buffer
cap.set(cv::CAP_PROP_BUFFERSIZE, 5);
```
```cpp
// missed detections in low light: enable the detector's low-light mode
detector.set_param("low_light_mode", true);
```
When the system misbehaves, these debugging tools earn their keep:
```bash
watch -n 1 cat /sys/kernel/debug/rknpu/load
```
```bash
valgrind --tool=memcheck --leak-check=full ./yolov5_deepsort
```
```bash
sudo perf record -g ./yolov5_deepsort
sudo perf report
```
A typical issue I recently chased down for a customer:
```cpp
// reset the NPU context every 1000 frames
if (frame_count % 1000 == 0) {
    rknn_destroy(ctx);
    // rknn_init takes the context by pointer plus the model blob
    // (model_data/model_size come from the initial model load)
    rknn_init(&ctx, model_data, model_size, 0, NULL);
}
```
After going live, keep iterating:
```bash
# fine-tune with your own data
python train.py --data supermarket.yaml --weights yolov5s.pt \
    --epochs 50 --batch-size 16 --img 640
```
```bash
# collect false-detection samples
python detect.py --source bad_cases/ --save-txt \
    --weights best.pt
```
```bash
# let the small model learn from the large one
# (--teacher/--distill require a distillation-enabled fork; they are not in upstream yolov5)
python train.py --data supermarket.yaml \
    --weights yolov5s.pt \
    --teacher yolov5m.pt \
    --distill
```
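Since those flags assume a fork, it is worth knowing what they stand for: distillation adds a soft-target term that pulls the student's class distribution toward the teacher's temperature-softened one. A numpy sketch of that term (Hinton-style KL loss; names here are illustrative):

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-softened softmax; larger T flattens the distribution."""
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def distill_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on softened class scores, scaled by T^2
    so gradients stay comparable across temperatures."""
    p = softmax(teacher_logits, T)  # soft teacher targets
    q = softmax(student_logits, T)
    return float(T * T * np.sum(p * np.log(p / q)))

# identical logits cost nothing; disagreement is penalized
zero = distill_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1])
gap = distill_loss([0.0, 0.0, 3.0], [3.0, 0.0, 0.0])
```

In a real fork this term is added to the usual detection losses with a weighting factor, so the student still learns from ground-truth boxes as well.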