Today I'd like to share a hands-on project: building a lightweight YOLOv8 web inference platform in about 10 minutes. This approach addresses the main drawbacks of a conventional PyTorch deployment (large footprint, slow startup, high GPU memory usage) and is well suited to industrial rollouts, graduation-project demos, and similar scenarios.
As a developer who has spent years on the front line of AI deployment, I know the pain points of model deployment well. A conventional setup requires installing a multi-gigabyte PyTorch environment, demands substantial server resources, and starts slowly. This project instead combines FastAPI with ONNX Runtime, shrinking the environment to under 200MB and running quickly even on CPU: truly "traveling light".
In industrial settings we regularly run into the following dilemma.
A conventional PyTorch deployment has three major pain points:
- Huge environment: the PyTorch stack alone occupies several gigabytes of disk
- Slow startup: importing the framework and loading weights takes noticeably long
- High GPU memory usage: a dedicated GPU is often needed just for inference
The FastAPI + ONNX Runtime approach, by contrast, has clear advantages:
- Lightweight: the entire environment fits in under 200MB
- Fast startup and low latency, even on CPU
- Simple dependencies that are easy to install and ship
A clear directory layout is the foundation of a maintainable project. I organized it as follows:

```
YOLOv8-Web-Deploy/
├── core/                # Core inference logic
│   ├── __init__.py
│   └── inference.py     # Encapsulated YOLOv8 ONNX inference classes
├── models/              # ONNX model directory
├── static/              # Frontend static assets
│   └── index.html       # Web UI
├── app.py               # FastAPI entry point
├── requirements.txt     # Dependency list
└── start_app.bat        # One-click Windows startup script
```

This layout separates frontend from backend and encapsulates the core logic, which makes later extension and maintenance easier.
Why FastAPI:
- Native async support and good throughput for an I/O-bound inference API
- Automatic interactive API docs (Swagger UI) out of the box
- Type-validated request handling via Pydantic
Why ONNX Runtime:
- A small, framework-free runtime: no PyTorch needed at inference time
- Well-optimized CPU execution, with an optional GPU build
- Cross-platform support for Windows, Linux, and macOS
Unlike the conventional approach, we do not need to install PyTorch at all. Here is the trimmed-down requirements.txt:

```
fastapi>=0.95.0
uvicorn>=0.21.1
python-multipart>=0.0.6
onnxruntime>=1.14.0       # swap in onnxruntime-gpu if a GPU is available
opencv-python-headless>=4.7.0
numpy>=1.24.0
```

Install with:

```bash
pip install -r requirements.txt
```

Note: for GPU acceleration, install onnxruntime-gpu and set up a matching CUDA environment. Even CPU-only performance is sufficient for most scenarios.
Virtual environment (recommended): create an isolated environment with conda or venv

```bash
python -m venv yolov8_env
source yolov8_env/bin/activate   # Linux/macOS
yolov8_env\Scripts\activate      # Windows
```

Version compatibility: Python ≥ 3.7 is required; 3.8-3.10 is recommended
OpenCV choice: use the headless build to avoid GUI dependencies
app.py is the heart of the system and implements the following key functionality:
```python
from fastapi import FastAPI, UploadFile, File, Form, HTTPException
from fastapi.staticfiles import StaticFiles
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
import os
import cv2
import numpy as np

# The Seg/Cls variants are subclasses of the base class (see the extension section)
from core.inference import Yolov8OnnxDeploy, Yolov8SegOnnxDeploy, Yolov8ClsOnnxDeploy

app = FastAPI()

# CORS configuration
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

# Global model instance
detector = None
current_model = None
MODEL_DIR = "models"
os.makedirs(MODEL_DIR, exist_ok=True)

class ModelConfig(BaseModel):
    path: str
    task: str = "detect"  # detect / segment / classify

@app.post("/load-model")
async def load_model(config: ModelConfig):
    global detector, current_model
    try:
        if not os.path.exists(config.path):
            raise FileNotFoundError(f"Model not found: {config.path}")
        # Instantiate the right class for the task type
        if config.task == "segment":
            detector = Yolov8SegOnnxDeploy(config.path)
        elif config.task == "classify":
            detector = Yolov8ClsOnnxDeploy(config.path)
        else:
            detector = Yolov8OnnxDeploy(config.path)
        current_model = os.path.basename(config.path)
        return {
            "status": "success",
            "model": current_model,
            "input_size": [detector.net_w, detector.net_h]
        }
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

@app.post("/detect")
async def run_detection(
    file: UploadFile = File(...),
    conf: float = Form(0.5),
    iou: float = Form(0.45)
):
    if not detector:
        raise HTTPException(status_code=400, detail="Model not loaded")
    try:
        img_data = await file.read()
        img = cv2.imdecode(np.frombuffer(img_data, np.uint8), cv2.IMREAD_COLOR)
        if img is None:
            raise ValueError("Could not decode the uploaded image")
        results = detector.detect(img, conf_thres=conf, iou_thres=iou)
        return {"results": results}
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

# Static file serving (mounted last so the API routes take precedence)
app.mount("/", StaticFiles(directory="static", html=True), name="static")

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
```
core/inference.py encapsulates the YOLOv8 ONNX inference logic:
```python
import cv2
import numpy as np
import onnxruntime as ort

class Yolov8OnnxDeploy:
    def __init__(self, model_path):
        self.session = ort.InferenceSession(model_path)
        self.input_name = self.session.get_inputs()[0].name
        self.output_names = [output.name for output in self.session.get_outputs()]
        # Read the expected input size from the model (NCHW layout: [N, C, H, W])
        input_shape = self.session.get_inputs()[0].shape
        self.net_h, self.net_w = input_shape[2], input_shape[3]

    def preprocess(self, img):
        # Preprocessing: BGR -> RGB, resize, HWC -> NCHW, normalize to [0, 1]
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        img = cv2.resize(img, (self.net_w, self.net_h))
        img = img.transpose(2, 0, 1)   # HWC -> CHW
        img = np.expand_dims(img, 0)   # add the batch dimension
        img = img.astype(np.float32) / 255.0
        return img

    def postprocess(self, outputs, conf_thres, iou_thres):
        # Postprocessing: decode boxes, apply non-maximum suppression (NMS),
        # and return a list of detections (left as a stub here)
        pass

    def detect(self, img, conf_thres=0.5, iou_thres=0.45):
        input_tensor = self.preprocess(img)
        outputs = self.session.run(self.output_names, {self.input_name: input_tensor})
        return self.postprocess(outputs, conf_thres, iou_thres)
```
Tip: a complete postprocessing implementation has to handle the different task types (detection/segmentation/classification), so the real code is more involved. The official ONNX Runtime examples and YOLOv8's export code are good references.
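As a starting point, here is a minimal sketch of the detection postprocess. Two assumptions: the output is the standard YOLOv8 detection head of shape (1, 4+nc, N) carrying cx, cy, w, h plus per-class scores, and boxes come back in network-input coordinates (since our preprocess uses a plain resize, scale them to the original image size before drawing).

```python
import numpy as np

def nms(boxes, scores, iou_thres):
    """Pure-NumPy non-maximum suppression over (x1, y1, x2, y2) boxes."""
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]          # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # Intersection of the best box with all remaining boxes
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_thres]  # drop overlapping boxes
    return keep

def postprocess_detect(outputs, conf_thres=0.5, iou_thres=0.45):
    """Decode a (1, 4+nc, N) YOLOv8 detection output into a result list."""
    pred = outputs[0][0].T                   # -> (N, 4+nc)
    scores = pred[:, 4:].max(axis=1)
    class_ids = pred[:, 4:].argmax(axis=1)
    mask = scores > conf_thres
    pred, scores, class_ids = pred[mask], scores[mask], class_ids[mask]
    cx, cy, w, h = pred[:, 0], pred[:, 1], pred[:, 2], pred[:, 3]
    boxes = np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)
    keep = nms(boxes, scores, iou_thres)
    return [{"box": boxes[i].tolist(),
             "confidence": float(scores[i]),
             "class_id": int(class_ids[i])} for i in keep]
```

This drops into `Yolov8OnnxDeploy.postprocess` almost directly, but treat it as a sketch: the real YOLOv8 postprocess also handles letterbox padding and per-class NMS.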
static/index.html provides a basic UI:
```html
<!DOCTYPE html>
<html>
<head>
    <title>YOLOv8 Web Demo</title>
    <style>
        body { font-family: Arial, sans-serif; max-width: 800px; margin: 0 auto; padding: 20px; }
        .container { display: flex; gap: 20px; }
        .upload-area { border: 2px dashed #ccc; padding: 20px; text-align: center; }
        #preview { max-width: 100%; }
        .results { margin-top: 20px; }
    </style>
</head>
<body>
    <h1>YOLOv8 Object Detection Platform</h1>
    <div class="container">
        <div>
            <h2>Model Management</h2>
            <input type="file" id="modelFile" accept=".onnx">
            <button onclick="loadModel()">Load model</button>
            <div>Current model: <span id="currentModel">not loaded</span></div>
        </div>
        <div>
            <h2>Image Detection</h2>
            <div class="upload-area" ondragover="event.preventDefault()" ondrop="dropHandler(event)">
                <p>Drag an image here or click to upload</p>
                <input type="file" id="imageFile" accept="image/*" onchange="previewImage()">
                <img id="preview" style="display: none;">
            </div>
            <div>
                <label>Confidence threshold: <input type="range" id="confThres" min="0" max="1" step="0.05" value="0.5"></label>
                <label>IoU threshold: <input type="range" id="iouThres" min="0" max="1" step="0.05" value="0.45"></label>
                <button onclick="runDetection()">Run detection</button>
            </div>
        </div>
    </div>
    <div class="results" id="results"></div>
    <script>
        // Frontend interaction logic
        async function loadModel() {
            const fileInput = document.getElementById('modelFile');
            if (!fileInput.files.length) return alert('Please choose a model file');
            const formData = new FormData();
            formData.append('file', fileInput.files[0]);
            try {
                // Step 1: upload the .onnx file (the /upload-model route
                // from the model-management extension)
                const uploadResp = await fetch('/upload-model', {
                    method: 'POST',
                    body: formData
                });
                const uploaded = await uploadResp.json();
                // Step 2: ask the backend to load the uploaded model
                const response = await fetch('/load-model', {
                    method: 'POST',
                    headers: { 'Content-Type': 'application/json' },
                    body: JSON.stringify({ path: uploaded.path, task: 'detect' })
                });
                const data = await response.json();
                if (!response.ok) throw new Error(data.detail);
                document.getElementById('currentModel').textContent = data.model;
                alert('Model loaded');
            } catch (error) {
                alert('Failed to load model: ' + error.message);
            }
        }
        function previewImage() {
            const fileInput = document.getElementById('imageFile');
            const preview = document.getElementById('preview');
            if (fileInput.files && fileInput.files[0]) {
                const reader = new FileReader();
                reader.onload = (e) => {
                    preview.src = e.target.result;
                    preview.style.display = 'block';
                };
                reader.readAsDataURL(fileInput.files[0]);
            }
        }
        function dropHandler(e) {
            e.preventDefault();
            if (e.dataTransfer.files.length) {
                document.getElementById('imageFile').files = e.dataTransfer.files;
                previewImage();
            }
        }
        async function runDetection() {
            const fileInput = document.getElementById('imageFile');
            if (!fileInput.files.length) return alert('Please choose an image');
            const conf = document.getElementById('confThres').value;
            const iou = document.getElementById('iouThres').value;
            const formData = new FormData();
            formData.append('file', fileInput.files[0]);
            formData.append('conf', conf);
            formData.append('iou', iou);
            try {
                const response = await fetch('/detect', {
                    method: 'POST',
                    body: formData
                });
                const data = await response.json();
                displayResults(data.results);
            } catch (error) {
                alert('Detection failed: ' + error.message);
            }
        }
        function displayResults(results) {
            // Render the raw detection results as JSON
            const resultsDiv = document.getElementById('results');
            resultsDiv.innerHTML = `<pre>${JSON.stringify(results, null, 2)}</pre>`;
        }
    </script>
</body>
</html>
```
start_app.bat makes launching even easier:

```bat
@echo off
echo Starting the YOLOv8 web inference platform...
echo Please wait for the service to come up...
:: Open the browser
start http://localhost:8000
:: Start the FastAPI service
python app.py
pause
```
Linux/macOS users can create a matching start_app.sh:

```bash
#!/bin/bash
echo "Starting the YOLOv8 web inference platform..."
xdg-open http://localhost:8000 &   # use `open` instead of `xdg-open` on macOS
python app.py
```
Export an ONNX model with YOLOv8's official command-line tool:

```bash
yolo export model=yolov8n.pt format=onnx opset=12
```

Key parameters:

- opset=12: the ONNX operator-set version to target
- imgsz=640: the input size (640x640 by default)
- dynamic=False: whether to export with dynamic dimensions

Quantized compression: use ONNX Runtime's quantization tooling to shrink the model file:
```python
from onnxruntime.quantization import quantize_dynamic
quantize_dynamic("yolov8n.onnx", "yolov8n_quant.onnx")
```
Graph optimization: enable ONNX Runtime's graph optimizer:

```python
sess_options = ort.SessionOptions()
sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
session = ort.InferenceSession("model.onnx", sess_options)
```
Fixed input/output shapes: exporting with a fixed input size improves performance
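To see why the fixed shape matters, note that the input size directly determines how many raw predictions the exported graph emits. Assuming the standard three-stride (8/16/32) YOLOv8 detection head, a quick check:

```python
def num_predictions(imgsz=640, strides=(8, 16, 32)):
    """A standard YOLOv8 head emits one prediction per feature-map cell
    across its three output strides."""
    return sum((imgsz // s) ** 2 for s in strides)

# 640x640 -> 80*80 + 40*40 + 20*20 = 8400 raw predictions,
# matching the (1, 84, 8400) output shape of the exported COCO model
```

With `dynamic=True` this count would vary at runtime, which prevents some shape-dependent graph optimizations.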
For production, I recommend the following setup.

Use an ASGI server with multiple workers:

```bash
uvicorn app:app --host 0.0.0.0 --port 8000 --workers 4
```

Note that each worker is a separate process with its own global `detector`, so a model loaded through /load-model exists only in the worker that handled that request; with multiple workers, load the model at startup instead.

Enable HTTPS:

```bash
uvicorn app:app --ssl-keyfile key.pem --ssl-certfile cert.pem
```
Reverse proxy (Nginx example):

```nginx
server {
    listen 80;
    server_name yourdomain.com;
    # Raise the body-size limit (default 1M) so .onnx model uploads are not rejected
    client_max_body_size 200M;
    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```
Symptom: loading an ONNX model fails with "Invalid ONNX model"

Diagnosis:

```python
import onnx
onnx.checker.check_model("model.onnx")
```

Likely causes:
- The export was interrupted, or the file was corrupted in transfer
- The model's opset is newer than the installed onnxruntime supports
- The selected file is not actually an ONNX model

Fixes:
- Re-export the model, pinning a compatible opset (e.g. opset=12)
- Upgrade onnxruntime to a version that supports the model's opset
- Verify the file's size or checksum after copying it to the server
Use ONNX Runtime's built-in profiler:

```python
sess_options = ort.SessionOptions()
sess_options.enable_profiling = True
session = ort.InferenceSession("model.onnx", sess_options)
# ... run some inferences, then finish profiling:
profile_path = session.end_profiling()  # writes a JSON timeline and returns its path
```
Each task gets its own inference class by subclassing the base:

```python
class Yolov8SegOnnxDeploy(Yolov8OnnxDeploy):
    def postprocess(self, outputs, conf_thres, iou_thres):
        # Segmentation postprocessing (boxes plus mask prototypes)
        pass

class Yolov8ClsOnnxDeploy(Yolov8OnnxDeploy):
    def postprocess(self, outputs, conf_thres, iou_thres):
        # Classification postprocessing (class probabilities)
        pass
```
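For the classification stub, the postprocess can be quite small. A hedged sketch (the class and helper below are illustrative, not the original code; it also drops the IoU threshold, which classification does not use, and assumes the exported head emits raw logits, so check whether your export already applies softmax):

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D logit vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

class Yolov8ClsPostprocess:
    """Illustrative classification postprocess: softmax, then top-k classes."""
    def postprocess(self, outputs, top_k=5):
        probs = softmax(outputs[0].reshape(-1))   # (num_classes,)
        order = probs.argsort()[::-1][:top_k]     # indices of the k best classes
        return [{"class_id": int(i), "confidence": float(probs[i])} for i in order]
```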
Expose simple model-management endpoints:

```python
import shutil

@app.get("/models")
async def list_models():
    return {"models": os.listdir("models")}

@app.post("/upload-model")
async def upload_model(file: UploadFile = File(...)):
    file_path = f"models/{file.filename}"
    with open(file_path, "wb") as f:
        shutil.copyfileobj(file.file, f)
    return {"status": "success", "path": file_path}
```
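One caveat: the handler above trusts file.filename, so a crafted name could write outside the models/ directory. A small hardening sketch (safe_model_path is my own helper, not part of the original code):

```python
import os

def safe_model_path(filename, model_dir="models"):
    """Reduce a client-supplied filename to its base name to block
    path-traversal uploads like '../../app.py', and accept only .onnx files."""
    name = os.path.basename(filename.replace("\\", "/"))
    if not name.lower().endswith(".onnx"):
        raise ValueError("only .onnx files are accepted")
    return os.path.join(model_dir, name)
```

In upload_model, `file_path = safe_model_path(file.filename)` would replace the f-string.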
Improve the frontend's result rendering:

```javascript
function drawResults(image, results) {
    const canvas = document.createElement('canvas');
    canvas.width = image.width;
    canvas.height = image.height;
    const ctx = canvas.getContext('2d');
    ctx.drawImage(image, 0, 0);
    // Draw each detection box with its class label and confidence
    results.forEach(obj => {
        ctx.strokeStyle = 'red';
        ctx.lineWidth = 2;
        ctx.strokeRect(obj.x, obj.y, obj.width, obj.height);
        ctx.fillStyle = 'red';
        ctx.font = '16px Arial';
        ctx.fillText(`${obj.class} (${obj.confidence.toFixed(2)})`, obj.x, obj.y - 5);
    });
    return canvas;
}
```
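One step drawResults leaves implicit: because preprocess stretches the input to the network size, the backend's boxes are in that 640x640 coordinate space and must be rescaled to the displayed image before drawing. In Python terms (the frontend would do the same arithmetic in JS; scale_box is an illustrative helper):

```python
def scale_box(box, net_size, img_size):
    """Map an (x, y, w, h) box from network-input coordinates back to the
    original image's coordinate space."""
    sx = img_size[0] / net_size[0]   # horizontal scale factor
    sy = img_size[1] / net_size[1]   # vertical scale factor
    x, y, w, h = box
    return (x * sx, y * sy, w * sx, h * sy)
```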
Industrial scenarios: production-line defect inspection, safety-gear monitoring, and similar on-site detection tasks
Education: graduation-project demos, course assignments, and quick teaching prototypes
Edge devices: CPU-only industrial PCs and other resource-constrained hosts where a full PyTorch stack is impractical
I have applied this approach successfully in several real projects. Moving from our original heavyweight PyTorch deployments to this lightweight setup improved deployment efficiency roughly tenfold and cut resource consumption sharply. It shines especially on resource-constrained edge devices.