Uploading large Word lesson-plan files is one of the most common pain points in education-sector information systems. Back-end data from a provincial online education platform shows that teachers perform an average of 3.2 uploads per week of composite documents exceeding 500 MB, covering lesson plans with mixed text and images, embedded multimedia resources, and annotation metadata. Traditional form-based upload falls critically short in this scenario, which motivates the chunked-upload design below.
Browser-side chunking uses `Blob.prototype.slice`, with SparkMD5 generating the file fingerprint. The key parameters are designed as follows:
```javascript
// Chunk size adapts to network conditions; downlink is reported in Mbit/s,
// so take 80% of one second's worth of bandwidth, in bytes, floored at 1 MB
const CHUNK_SIZE = navigator.connection
  ? Math.max(1024 * 1024, Math.floor(navigator.connection.downlink * 0.8 * 1024 * 1024 / 8))
  : 2 * 1024 * 1024; // default 2 MB

// Compute the file's MD5 fingerprint; intended to run inside a Web Worker
// so hashing does not block the main thread
const calculateHash = (file) => {
  return new Promise((resolve, reject) => {
    const spark = new SparkMD5.ArrayBuffer()
    const reader = new FileReader()
    reader.onload = e => {
      spark.append(e.target.result)
      resolve(spark.end())
    }
    reader.onerror = reject
    reader.readAsArrayBuffer(file)
  })
}
```
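The `CHUNK_SIZE` constant can then drive a simple chunking helper. The following is an illustrative sketch, assuming a `createChunks` name and `{number, blob}` chunk shape that are not part of the original code:

```javascript
// Hypothetical helper: split a File/Blob into fixed-size pieces with Blob.slice
function createChunks(file, chunkSize) {
  const chunks = []
  for (let offset = 0; offset < file.size; offset += chunkSize) {
    chunks.push({
      number: chunks.length,                        // 0-based chunk index
      blob: file.slice(offset, offset + chunkSize)  // slice never reads past EOF
    })
  }
  return chunks
}
```

The last chunk is simply whatever remains, so no padding logic is needed.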
The Java back end uses the Spring WebFlux reactive programming model to handle concurrent upload requests. Core components include:
```java
@PostMapping("/upload/chunk")
public Mono<ResponseEntity<UploadResult>> uploadChunk(
        @RequestPart("file") FilePart filePart,  // multipart file parts need @RequestPart in WebFlux
        @RequestParam("chunkNumber") int chunkNumber,
        @RequestParam("totalChunks") int totalChunks,
        @RequestParam("identifier") String identifier) {
    return filePart.transferTo(getChunkPath(identifier, chunkNumber))
            .thenReturn(ResponseEntity.ok(new UploadResult(chunkNumber, true)));
}
```
```java
public void mergeFiles(String identifier, String fileName) throws IOException {
    Path output = Paths.get(uploadDir, fileName);
    try (OutputStream os = Files.newOutputStream(output, StandardOpenOption.CREATE)) {
        // A plain loop keeps checked IOExceptions out of lambda bodies
        for (int i = 0; i < getTotalChunks(identifier); i++) {
            Path chunkPath = getChunkPath(identifier, i);
            if (Files.notExists(chunkPath)) {
                // Silently skipping a chunk would corrupt the merged file
                throw new IOException("Missing chunk " + i + " for upload " + identifier);
            }
            Files.copy(chunkPath, os);
            deleteQuietly(chunkPath);  // reclaim temp space as we go
        }
    }
}
```
The upload lifecycle is managed with a state-machine pattern. The key transitions are:

```
[READY] → [HASHING] → [UPLOADING] → [MERGING] → [DONE]
                         ↓    ↑
                       [RETRYING]
```

A failure during upload drops into the retry state, which re-enters the upload phase until the retry budget is exhausted.
Implementation skeleton:
```javascript
class Uploader {
  constructor(file) {
    this.file = file        // keep a reference; start() hashes and uploads it
    this.state = 'READY'
    this.retryCount = 0
  }
  async start() {
    try {
      this.state = 'HASHING'
      this.fileHash = await calculateHash(this.file)
      this.state = 'UPLOADING'
      await this.uploadChunks()
      this.state = 'MERGING'
      await this.mergeRequest()
      this.state = 'DONE'
    } catch (e) {
      this.handleError(e)   // transitions to RETRYING or surfaces the failure
    }
  }
}
```
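The `[RETRYING]` transition can be factored into a small helper. A minimal sketch, where the `withRetry` name and the default retry limit are assumptions, not part of the original design:

```javascript
// Retry an async step up to maxRetries times before giving up
async function withRetry(step, maxRetries = 3) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await step(attempt)
    } catch (e) {
      if (attempt >= maxRetries) throw e  // out of retries: surface the error
    }
  }
}
```

`handleError` could then wrap the failing phase in `withRetry` instead of tracking `retryCount` by hand.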
A local chunk cache is built on IndexedDB:
```javascript
// The composite key [fileHash, chunkNumber] lets a resumed upload look up
// exactly the chunks it has already staged
const dbPromise = idb.openDB('upload-manager', 1, {
  upgrade(db) {
    db.createObjectStore('chunks', {
      keyPath: ['fileHash', 'chunkNumber']
    })
  }
})

async function saveChunk(chunk) {
  const db = await dbPromise
  await db.put('chunks', {
    fileHash: chunk.fileHash,
    chunkNumber: chunk.number,
    blob: chunk.blob
  })
}
```
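With that composite key, resuming an interrupted upload reduces to computing which chunk numbers are still missing. A sketch, with `pendingChunks` as an assumed helper name:

```javascript
// Given the chunk numbers already persisted (from IndexedDB or a server
// handshake), return the 0-based indices that still need uploading
function pendingChunks(totalChunks, uploadedNumbers) {
  const done = new Set(uploadedNumbers)
  return Array.from({ length: totalChunks }, (_, i) => i).filter(i => !done.has(i))
}
```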
Concurrent requests are capped with a limiter (often described as a token bucket, though the code below is strictly a counting semaphore):
```javascript
class ConcurrentController {
  constructor(max) {
    this.max = max
    this.queue = []     // resolvers for callers waiting on a slot
    this.active = 0
  }
  async acquire() {
    if (this.active < this.max) {
      this.active++
      return
    }
    return new Promise(resolve => {
      this.queue.push(resolve)
    })
  }
  release() {
    if (this.queue.length) {
      // Hand the slot directly to a waiter; `active` stays unchanged,
      // otherwise the count would drift below the real concurrency
      const next = this.queue.shift()
      next()
    } else {
      this.active--
    }
  }
}
```
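One way to wire such a limiter into the upload loop; the function names here are illustrative, not from the original code:

```javascript
// Every chunk upload first acquires a slot and always releases it,
// even when the individual upload fails
async function uploadAll(chunks, uploadOne, limiter) {
  await Promise.all(chunks.map(async (chunk) => {
    await limiter.acquire()
    try {
      await uploadOne(chunk)
    } finally {
      limiter.release()
    }
  }))
}
```

The `try/finally` is the important part: a failed chunk must still free its slot, or the queue deadlocks.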
Text-like chunks are gzip-compressed in the browser:
```javascript
async function compressChunk(blob) {
  if (!blob.type.includes('text')) return blob  // binary formats rarely benefit
  const cs = new CompressionStream('gzip')
  const compressedStream = blob.stream().pipeThrough(cs)
  return new Response(compressedStream).blob()
}
```
The server validates each chunk before accepting it:
```java
public boolean validateChunk(Path chunkPath, long expectedSize) {
    try {
        if (Files.size(chunkPath) != expectedSize) {
            return false;
        }
        // Files.readAllBytes has no offset/length overload; read the first
        // four bytes through a stream instead
        byte[] head = new byte[4];
        try (InputStream in = Files.newInputStream(chunkPath)) {
            if (in.read(head) < head.length) {
                return false;
            }
        }
        return !isForbiddenHeader(head);
    } catch (IOException e) {
        return false;
    }
}

private boolean isForbiddenHeader(byte[] head) {
    // Reject well-known executable signatures
    byte[][] forbidden = {
        {0x4D, 0x5A},        // "MZ": Windows PE/EXE
        {0x23, 0x21},        // "#!": shell script
        {0x7F, 0x45, 0x4C}   // 0x7F "EL": ELF binary
    };
    return Arrays.stream(forbidden)
            .anyMatch(pattern -> startsWith(head, pattern));
}

private boolean startsWith(byte[] data, byte[] prefix) {
    if (data.length < prefix.length) return false;
    for (int i = 0; i < prefix.length; i++) {
        if (data[i] != prefix[i]) return false;
    }
    return true;
}
```
Every chunk request is bound to a JWT:
```java
@PostMapping("/upload/chunk")
public Mono<ResponseEntity<?>> uploadChunk(
        @RequestHeader("Authorization") String authHeader,
        @RequestPart("file") FilePart filePart /* other parameters elided */) {
    // The header carries "Bearer <token>"; strip the scheme before decoding
    String token = authHeader.replaceFirst("^Bearer ", "");
    return jwtDecoder.decode(token)
            .flatMap(claims -> {
                if (!hasUploadPermission(claims)) {
                    return Mono.error(new AccessDeniedException("upload not permitted"));
                }
                return processUpload(filePart);
            });
}
```
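On the client side, each chunk POST has to carry the multipart fields the controller expects plus the Authorization header. A sketch, where the `buildChunkForm` helper name is an assumption:

```javascript
// Assemble the multipart body matching the server's part/parameter names
function buildChunkForm(chunk, identifier, totalChunks) {
  const form = new FormData()
  form.append('file', chunk.blob)
  form.append('chunkNumber', String(chunk.number))
  form.append('totalChunks', String(totalChunks))
  form.append('identifier', identifier)
  return form
}
```

The form would then be sent with `fetch('/upload/chunk', { method: 'POST', headers: { Authorization: 'Bearer ' + token }, body: form })`.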
Key parameters must be adjusted at the load-balancer layer:
```nginx
proxy_connect_timeout 600s;
proxy_send_timeout    600s;
proxy_read_timeout    600s;
client_max_body_size  0;   # disable the request-body size limit
```
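Since every request body is bounded by the client's chunk size, an alternative to disabling the check outright is to cap the body at a generous multiple of the largest expected chunk; the `16m` value here is an assumption, not from the original configuration:

```nginx
# Bound the body instead of disabling the check entirely
client_max_body_size 16m;
```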
针对海量小分片场景,采用分层存储策略:
Key monitoring dimensions and how they are collected:

| Metric | Collection method | Alert threshold |
|---|---|---|
| Chunk upload success rate | Prometheus counter | <99% (5-minute window) |
| Merge operation latency | Micrometer Timer | >30 s (P99) |
| Storage utilization | JMX DiskUsage | >85% |
| Concurrent upload connections | Netty channel metrics | >5000 |
Fallback strategy for legacy browsers such as IE:
```javascript
function getUploader(file) {
  if (window.File && window.Blob && window.FileReader) {
    return new ModernUploader(file)
  }
  return new LegacyFormUploader(file)  // fall back to a traditional form upload
}
```
The core of the fallback is the feature detection shown above. Separately, chunk size can adapt to changing network conditions by listening for `connection` change events:
```javascript
if (navigator.connection) {  // Network Information API is not universally available
  navigator.connection.addEventListener('change', () => {
    const newChunkSize = calculateOptimalChunkSize()
    uploader.adjustChunkSize(newChunkSize)
  })
}
```
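`calculateOptimalChunkSize` is referenced but never shown. A plausible sketch consistent with the 80%-of-bandwidth rule mentioned at the end of this article; the 1 MB–8 MB clamp is an assumption:

```javascript
// downlinkMbps comes from navigator.connection.downlink (reported in Mbit/s)
function calculateOptimalChunkSize(downlinkMbps) {
  const bytesPerSecond = (downlinkMbps * 1024 * 1024) / 8
  const target = Math.floor(bytesPerSecond * 0.8)  // 80% of one second of bandwidth
  return Math.min(8 * 1024 * 1024, Math.max(1024 * 1024, target))
}
```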
Uploads can survive page closure by using the Background Fetch API:
```javascript
// Requires an async context and a registered service worker
if ('BackgroundFetchManager' in window) {
  const swReg = await navigator.serviceWorker.ready
  const bgFetch = await swReg.backgroundFetch.fetch('upload-1', chunks, {
    title: 'Uploading lesson plan',
    icons: [{ src: '/icon.png', sizes: '72x72' }]
  })
}
```
A thumbnail is generated automatically after the upload completes:
```java
public void generateThumbnail(Path docPath) throws Exception {
    ProcessBuilder pb = new ProcessBuilder(
        "libreoffice",
        "--headless",
        "--convert-to", "png:writer_png_Export",
        "--outdir", tempDir.toString(),
        docPath.toString()
    );
    Process p = pb.start();
    // Kill the converter if it hangs instead of leaking the process
    if (!p.waitFor(30, TimeUnit.SECONDS)) {
        p.destroyForcibly();
        throw new IllegalStateException("Thumbnail conversion timed out: " + docPath);
    }
}
```
Document-change visualization is built on the git-diff principle:
```javascript
// Uses the jsdiff library; parseDiff is an application-level helper that
// turns the unified diff into render-ready hunks
function highlightChanges(oldDoc, newDoc) {
  const diff = Diff.createTwoFilesPatch(
    'old', 'new',
    oldDoc.textContent,
    newDoc.textContent
  );
  return parseDiff(diff);
}
```
In production we found that setting the chunk size to 80% of the measured bandwidth (obtained via the Network Information API) improved transfer efficiency by roughly 35%. Note, however, that iOS Safari's implementation of `Blob.slice` has quirks that require additional compatibility handling.