While refactoring an aging backend service recently, I ran into a scenario that required downloading files over HTTP. As a long-time C++ developer, I decided to implement the module myself rather than pull in a bulky third-party library. This post walks through a complete pure-C++ implementation of HTTP file download, with the core code and the pitfalls to avoid.
This approach is a particularly good fit for the following scenarios:
Why not simply use a mature library such as cURL? In our specific situation:
Implementing file download requires understanding a few key protocol details:
First, establish a TCP connection using the Berkeley sockets API:

```cpp
int sock = socket(AF_INET, SOCK_STREAM, 0);
sockaddr_in server_addr{};
server_addr.sin_family = AF_INET;
server_addr.sin_port = htons(80);
inet_pton(AF_INET, "<target IP>", &server_addr.sin_addr);
connect(sock, (sockaddr*)&server_addr, sizeof(server_addr));
```
Note: production code needs thorough error handling on every one of these calls; it is omitted here for brevity.
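The snippet above hardcodes an IP address; in practice the host from the URL has to be resolved first. A minimal sketch using `getaddrinfo` (the helper name `resolve_ipv4` is mine, not from the original code):

```cpp
#include <cassert>
#include <string>
#include <netdb.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

// Resolve a hostname to a dotted-quad IPv4 string; empty string on failure.
std::string resolve_ipv4(const std::string& host) {
    addrinfo hints{};
    hints.ai_family = AF_INET;        // IPv4 only, matching the example above
    hints.ai_socktype = SOCK_STREAM;  // TCP
    addrinfo* res = nullptr;
    if (getaddrinfo(host.c_str(), nullptr, &hints, &res) != 0) return "";
    char buf[INET_ADDRSTRLEN];
    auto* sin = reinterpret_cast<sockaddr_in*>(res->ai_addr);
    inet_ntop(AF_INET, &sin->sin_addr, buf, sizeof(buf));
    freeaddrinfo(res);
    return buf;
}
```

The resulting string can then be passed straight to `inet_pton` in the connect code above.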
Construct a spec-compliant GET request:

```cpp
std::string request = "GET " + url_path + " HTTP/1.1\r\n"
                      "Host: " + host + "\r\n"
                      "Connection: close\r\n"
                      "User-Agent: MyDownloader/1.0\r\n"
                      "\r\n";
```
Parse the status code and content length from the response headers:

```cpp
// Read the response headers
std::string response;
char buffer[1024];
ssize_t n;
while ((n = read(sock, buffer, sizeof(buffer))) > 0) {
    response.append(buffer, n);  // append exactly n bytes; buffer is not NUL-terminated
    if (response.find("\r\n\r\n") != std::string::npos)
        break;
}
// Parse the status code (e.g. "HTTP/1.1 200 OK")
size_t code_pos = response.find(' ');
int status_code = std::stoi(response.substr(code_pos + 1, 3));
// Extract the content length
size_t cl_pos = response.find("Content-Length: ");
size_t content_length = 0;
if (cl_pos != std::string::npos) {
    content_length = std::stoul(response.substr(cl_pos + 16));
}
```
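One subtlety with the loop above: the final `read()` usually pulls in some body bytes together with the headers, and those must not be discarded. A small helper (hypothetical, not part of the original code) to split the buffer:

```cpp
#include <cassert>
#include <string>
#include <utility>

// Split a raw response buffer at the first blank line; the second element
// holds any body bytes that arrived together with the headers.
std::pair<std::string, std::string> split_header_body(const std::string& raw) {
    size_t pos = raw.find("\r\n\r\n");
    if (pos == std::string::npos) return {raw, ""};  // headers incomplete
    return {raw.substr(0, pos + 4), raw.substr(pos + 4)};
}
```

The second element should be written to the output file before the main receive loop starts, and counted toward the running total.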
The core download loop:

```cpp
FILE* file = fopen("output.bin", "wb");
char buf[4096];
// Any body bytes already read along with the headers should be written
// first and counted toward total_received.
size_t total_received = 0;
while (total_received < content_length) {
    ssize_t received = recv(sock, buf, sizeof(buf), 0);
    if (received <= 0) break;  // connection closed or error
    fwrite(buf, 1, received, file);
    total_received += received;
    // Optional progress callback
    if (progress_callback) {
        progress_callback(total_received, content_length);
    }
}
fclose(file);
```
For responses with `Transfer-Encoding: chunked`:

```cpp
while (true) {
    // Read the chunk-size line (hexadecimal)
    std::string chunk_size_line;
    read_line(sock, chunk_size_line);
    unsigned long chunk_size = std::stoul(chunk_size_line, nullptr, 16);
    if (chunk_size == 0) break;  // final chunk
    // Read the chunk payload
    while (chunk_size > 0) {
        size_t to_read = std::min<size_t>(chunk_size, sizeof(buf));
        ssize_t received = recv(sock, buf, to_read, 0);
        if (received <= 0) break;  // connection closed or error
        fwrite(buf, 1, received, file);
        chunk_size -= received;
    }
    // Skip the trailing CRLF after each chunk
    char crlf[2];
    recv(sock, crlf, 2, 0);
}
```
Resumable downloads are implemented via the Range header:

```cpp
std::string request = "GET " + path + " HTTP/1.1\r\n"
                      "Host: " + host + "\r\n"
                      "Range: bytes=" + std::to_string(existing_size) + "-\r\n"
                      "\r\n";
```
Split the file into several ranges and download them in parallel:

```cpp
struct DownloadRange {
    size_t start;
    size_t end;           // inclusive
    std::string temp_file;
};

std::vector<DownloadRange> split_ranges(size_t total_size, int threads) {
    std::vector<DownloadRange> ranges;
    size_t chunk_size = total_size / threads;
    for (int i = 0; i < threads; ++i) {
        size_t start = i * chunk_size;
        // The last range absorbs the remainder
        size_t end = (i == threads - 1) ? total_size - 1 : start + chunk_size - 1;
        ranges.push_back({start, end, "part_" + std::to_string(i) + ".tmp"});
    }
    return ranges;
}
```
Tune buffer sizes to the network environment:

```cpp
// Enlarge the socket receive buffer
int buf_size = 256 * 1024; // 256 KB
setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &buf_size, sizeof(buf_size));
```
Zero-copy I/O (Linux-specific): the obvious candidate is sendfile(), but its source fd must support mmap, so it cannot read *from* a socket; it is the right tool when serving a file *to* a socket. For socket-to-file zero copy, splice() through a pipe works instead:

```cpp
#include <fcntl.h>
// splice(2) moves data kernel-side through an intermediate pipe
int out_fd = open("output.bin", O_WRONLY | O_CREAT | O_TRUNC, 0644);
int pipefd[2];
pipe(pipefd);
ssize_t n;
while ((n = splice(sock, nullptr, pipefd[1], nullptr, 65536, SPLICE_F_MOVE)) > 0) {
    splice(pipefd[0], nullptr, out_fd, nullptr, n, SPLICE_F_MOVE);
}
```
Handling strategy by HTTP status code:
| Status code | Handling |
|---|---|
| 200 OK | Process normally |
| 206 Partial Content | Resumed-download response |
| 301/302 | Follow the redirect |
| 404 | Fail with an error |
| 500 | Retry or abort |
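For the 301/302 rows, following the redirect means pulling the target URL out of the Location header and issuing a fresh request (capping the hop count, commonly at around 5, guards against redirect loops). A minimal extraction helper (illustrative, not from the original code):

```cpp
#include <cassert>
#include <string>

// Return the Location header value, or an empty string if absent.
std::string parse_location(const std::string& headers) {
    const std::string key = "Location: ";
    size_t pos = headers.find(key);
    if (pos == std::string::npos) return "";
    size_t end = headers.find("\r\n", pos);
    return headers.substr(pos + key.size(), end - pos - key.size());
}
```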
A robust retry mechanism:

```cpp
for (int retry = 0; retry < max_retries; ++retry) {
    try {
        download_file(url);
        break;  // success
    } catch (const NetworkException& e) {
        if (retry == max_retries - 1) throw;  // out of retries, give up
        // Exponential backoff: 1s, 2s, 4s, ...
        std::this_thread::sleep_for(std::chrono::seconds(1 << retry));
    }
}
```
Example project layout:

```
http_downloader/
├── include/
│   ├── http_client.h
│   └── progress_listener.h
├── src/
│   ├── http_client.cpp
│   └── main.cpp
└── CMakeLists.txt
```
Core interface design:

```cpp
class HttpClient {
public:
    void download(const std::string& url,
                  const std::string& save_path,
                  ProgressListener* listener = nullptr);
    void set_timeout(int milliseconds);
    void set_max_retries(int count);
};
```
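Internally, `download()` first has to break the URL into host, port, and path before it can connect. One way to sketch that step (the `ParsedUrl` type and its defaults are my assumptions; no scheme validation or percent-decoding):

```cpp
#include <cassert>
#include <string>

struct ParsedUrl {
    std::string host;
    std::string port;
    std::string path;
};

// Split "http://host[:port]/path" into components; port defaults to 80.
ParsedUrl parse_url(const std::string& url) {
    ParsedUrl out{"", "80", "/"};
    size_t start = url.find("://");
    start = (start == std::string::npos) ? 0 : start + 3;
    size_t slash = url.find('/', start);
    std::string authority = url.substr(start, slash - start);
    if (slash != std::string::npos) out.path = url.substr(slash);
    size_t colon = authority.find(':');
    if (colon == std::string::npos) {
        out.host = authority;
    } else {
        out.host = authority.substr(0, colon);
        out.port = authority.substr(colon + 1);
    }
    return out;
}
```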
Real-world cases we hit:
Bottlenecks found in testing:
Adding a security layer with OpenSSL:

```cpp
SSL_CTX* ctx = SSL_CTX_new(TLS_client_method());
SSL* ssl = SSL_new(ctx);
SSL_set_fd(ssl, sock);
SSL_connect(ssl);
// From here on, use SSL_read/SSL_write in place of recv/send.
// Production code should also verify the server certificate
// (SSL_CTX_set_verify plus hostname checking).
```
Event-driven I/O with libuv:

```cpp
uv_tcp_t socket;
uv_tcp_init(loop, &socket);
uv_tcp_connect(&connect_req, &socket, addr, on_connect);
```
It took me about three weeks to take this implementation from a first simple version to something production-ready, most of it spent on edge-case testing and performance tuning. The biggest lesson: in network programming, error-handling code often outweighs the happy path by a factor of three. Once the core functionality works, start building a thorough test suite right away, especially one that simulates all kinds of network failures.