As a branch of machine learning, deep learning has made breakthrough progress in recent years in fields such as computer vision and natural language processing. To understand how it works, we need to start from its most basic data structure: the tensor.

A tensor is the core data structure of deep learning frameworks and can be thought of as a multi-dimensional array. In PyTorch, a tensor not only stores data but also supports the operations deep learning requires, such as automatic differentiation.
Tensors are classified by their number of dimensions:

```python
import torch

# Create tensors of different dimensionality
scalar = torch.tensor(3.14)        # 0-D (scalar)
vector = torch.arange(5)           # 1-D
matrix = torch.ones((2, 3))        # 2-D
tensor_3d = torch.rand((2, 3, 4))  # 3-D
```
PyTorch offers several ways to create tensors:

```python
# From a Python list
data = [[1, 2], [3, 4]]
x = torch.tensor(data)

# Tensors of a given shape
zeros = torch.zeros((2, 3))   # all zeros
ones = torch.ones((2, 3))     # all ones
rand = torch.rand((2, 3))     # uniform random numbers in [0, 1)
randn = torch.randn((2, 3))   # standard normal random numbers

# NumPy-style constructors
arange = torch.arange(0, 10, 2)     # 0 to 10 (exclusive), step 2
linspace = torch.linspace(0, 1, 5)  # 5 evenly spaced points from 0 to 1
```
Reshaping tensors is a common operation in deep learning:

```python
x = torch.arange(12)
print(x.shape)  # torch.Size([12])

# Change the shape
y = x.reshape(3, 4)  # 3 rows, 4 columns
print(y.shape)  # torch.Size([3, 4])

# Infer one dimension automatically
z = x.reshape(-1, 3)  # -1 tells PyTorch to compute this dimension
print(z.shape)  # torch.Size([4, 3])
```
Notes:

- `reshape` does not modify the original tensor; for a contiguous tensor it returns a view that shares the same storage (otherwise it returns a copy)
- The total number of elements must stay the same
- For contiguous tensors, `reshape` uses almost no extra memory
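The view semantics noted above can be checked directly: since the reshaped tensor shares storage with the original, a write through one is visible through the other. A minimal sketch:

```python
import torch

x = torch.arange(6)
y = x.reshape(2, 3)  # contiguous input, so this is a view
y[0, 0] = 99         # write through the view...
print(x[0])          # ...and the original sees it: tensor(99)
```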
Tensors support NumPy-style indexing and slicing:

```python
x = torch.arange(12).reshape(3, 4)

# Basic indexing
print(x[1])      # second row
print(x[:, 2])   # third column
print(x[1, 2])   # element at row 2, column 3

# Slicing
print(x[0:2])    # rows 1-2
print(x[::2])    # every other row
print(x[:, 1:3]) # columns 2-3 of every row

# Advanced indexing
indices = torch.tensor([0, 2])
print(x[indices])  # rows 1 and 3
```
PyTorch supports a wide range of mathematical operations:

```python
x = torch.tensor([1.0, 2, 4, 8])
y = torch.tensor([2, 2, 2, 2])

# Element-wise operations
print(x + y)   # addition
print(x - y)   # subtraction
print(x * y)   # multiplication
print(x / y)   # division
print(x ** y)  # exponentiation

# Matrix multiplication
A = torch.rand(2, 3)
B = torch.rand(3, 4)
print(torch.mm(A, B))  # equivalently: A @ B
```
Broadcasting lets tensors of different shapes participate in the same operation:

```python
a = torch.arange(3).reshape((3, 1))
b = torch.arange(2).reshape((1, 2))
print(a + b)  # automatically expanded to a 3x2 matrix
```

Broadcasting rules:

- Shapes are compared from the trailing dimension backwards; a missing dimension is treated as size 1
- Two dimensions are compatible if they are equal, or if one of them is 1
- A size-1 dimension is (logically) repeated to match the size of the other tensor
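As a quick check of how broadcasting aligns shapes, here is a small sketch with one compatible and one incompatible pair:

```python
import torch

a = torch.ones((3, 1))
b = torch.ones((2,))  # treated as shape (1, 2)
print((a + b).shape)  # torch.Size([3, 2])

# Incompatible: trailing dimensions are 3 vs 2, and neither is 1
try:
    torch.ones((4, 3)) + torch.ones((4, 2))
except RuntimeError as e:
    print("broadcast error:", e)
```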
Statistical reductions over tensors:

```python
x = torch.arange(12).reshape(3, 4).float()

print(x.sum())   # sum of all elements
print(x.mean())  # mean of all elements
print(x.max())   # maximum
print(x.min())   # minimum
print(x.std())   # standard deviation

# Reductions along a dimension
print(x.sum(dim=0))   # sum over rows, i.e. down each column (result shape [4])
print(x.mean(dim=1))  # mean over columns, i.e. along each row (result shape [3])
```
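One detail worth knowing about reductions is the `keepdim` argument: it keeps the reduced dimension as size 1, so the result can broadcast back against the original tensor, for example for row-wise normalization:

```python
import torch

x = torch.arange(12).reshape(3, 4).float()
row_sums = x.sum(dim=1, keepdim=True)  # shape (3, 1) instead of (3,)
normalized = x / row_sums              # broadcasts: each row now sums to ~1
print(normalized.sum(dim=1))
```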
In real deep learning projects, data preprocessing often takes up most of the time, and good preprocessing can significantly improve model performance.

```python
import os
import pandas as pd

# Create a directory and a CSV file
os.makedirs(os.path.join('.', 'data'), exist_ok=True)
data_file = os.path.join('.', 'data', 'house_tiny.csv')

# Write sample data
with open(data_file, 'w') as f:
    f.write('NumRooms,Alley,Price\n')  # column names
    f.write('NA,Pave,127500\n')        # data rows
    f.write('2,NA,106000\n')
    f.write('4,NA,178100\n')
    f.write('NA,NA,140000\n')
```

```python
data = pd.read_csv(data_file)
print(data)
```
Output:

```
   NumRooms Alley   Price
0       NaN  Pave  127500
1       2.0   NaN  106000
2       4.0   NaN  178100
3       NaN   NaN  140000
```
For numeric features, a common strategy is to fill missing values with the mean:

```python
inputs = data.iloc[:, 0:2]
outputs = data.iloc[:, 2]

# Fill missing values with the column mean
inputs['NumRooms'] = inputs['NumRooms'].fillna(inputs['NumRooms'].mean())
print(inputs)
```
Output:

```
   NumRooms Alley
0       3.0  Pave
1       2.0   NaN
2       4.0   NaN
3       3.0   NaN
```
For categorical features, NaN can be treated as a category of its own:

```python
inputs['Alley'] = inputs['Alley'].fillna('Unknown')
print(inputs)
```

Or use one-hot encoding:

```python
inputs = pd.get_dummies(inputs, dummy_na=True)
print(inputs)
```
Output:

```
   NumRooms  Alley_Pave  Alley_nan
0       3.0           1          0
1       2.0           0          1
2       4.0           0          1
3       3.0           0          1
```
Finally, convert the processed data to PyTorch tensors:

```python
# Note: with pandas >= 2.0, get_dummies produces bool columns;
# cast to float first so torch.tensor receives a numeric array
X = torch.tensor(inputs.astype(float).values)
y = torch.tensor(outputs.values)
print(X)
print(y)
```

Output:

```
tensor([[3., 1., 0.],
        [2., 0., 1.],
        [4., 0., 1.],
        [3., 0., 1.]], dtype=torch.float64)
tensor([127500, 106000, 178100, 140000])
```
Deep learning has produced remarkable results across many domains; below are a few representative applications.

The ImageNet dataset is the standard benchmark for image classification, with 1000 classes and over a million images. On this benchmark, deep learning models have reached error rates below those typically reported for humans.
```python
# Image classification with a pretrained model
import torchvision.models as models
import torchvision.transforms as transforms
from PIL import Image

# Load a pretrained model
model = models.resnet50(pretrained=True)
model.eval()

# Image preprocessing
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])

# Load and preprocess an image
img = Image.open("image.jpg").convert("RGB")  # convert handles grayscale/RGBA inputs
img_t = transform(img)
batch_t = torch.unsqueeze(img_t, 0)  # add a batch dimension

# Predict
out = model(batch_t)
_, index = torch.max(out, 1)  # index of the highest-scoring class
```
Object detection must not only recognize the objects in an image but also locate them; segmentation goes further and assigns a class label to every pixel.

```python
# Instance segmentation with Mask R-CNN
model = models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()

# Detection models expect a list of [0, 1] tensors without ImageNet normalization
img_tensor = transforms.ToTensor()(img)
predictions = model([img_tensor])
```
Modern language models can generate high-quality text:

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')

input_text = "Deep learning is"  # GPT-2 is trained on English text
input_ids = tokenizer.encode(input_text, return_tensors='pt')

# Generate text
output = model.generate(input_ids, max_length=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
Neural machine translation systems now approach human-level quality for many language pairs:

```python
from transformers import MarianMTModel, MarianTokenizer

src_text = "Hello, how are you?"
model_name = 'Helsinki-NLP/opus-mt-en-zh'
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

translated = model.generate(**tokenizer(src_text, return_tensors="pt", padding=True))
print(tokenizer.decode(translated[0], skip_special_tokens=True))
```
GANs and diffusion models can generate photorealistic images:

```python
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
image = pipe("a photograph of an astronaut riding a horse").images[0]
image.save("astronaut_rides_horse.png")
```
Text-to-speech systems can produce natural-sounding voices:

```python
from transformers import VitsModel, VitsTokenizer

model = VitsModel.from_pretrained("facebook/mms-tts-eng")
tokenizer = VitsTokenizer.from_pretrained("facebook/mms-tts-eng")

text = "Hello, this is a text to speech example."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    output = model(**inputs).waveform
```
PyTorch's automatic differentiation system (autograd) is the core of neural network training:

```python
x = torch.tensor(2.0, requires_grad=True)
y = x ** 2 + 3 * x + 1
y.backward()   # compute dy/dx
print(x.grad)  # tensor(7.) — dy/dx = 2x + 3 = 7 at x = 2
```
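To show why autograd matters for training, here is a minimal gradient-descent loop built on nothing else (the quadratic and the learning rate are arbitrary illustration choices): minimizing f(x) = (x - 3)^2 converges to x ≈ 3.

```python
import torch

x = torch.tensor(0.0, requires_grad=True)
lr = 0.1
for _ in range(100):
    loss = (x - 3) ** 2
    loss.backward()           # accumulate d(loss)/dx into x.grad
    with torch.no_grad():
        x -= lr * x.grad      # one gradient-descent step
    x.grad.zero_()            # clear the gradient for the next iteration
print(x.item())               # close to 3.0
```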
GPUs can dramatically accelerate deep learning computations:

```python
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Move the model and data to the GPU
model = models.resnet18().to(device)
inputs = torch.randn(1, 3, 224, 224).to(device)

# Run the computation on the GPU
output = model(inputs)
```
PyTorch's `nn.Module` makes it easy to define your own network:

```python
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1)  # flatten everything except the batch dimension
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()
```
Saving and loading model weights:

```python
# Save
torch.save(net.state_dict(), 'model.pth')

# Load
model = Net()  # the same architecture must be defined first
model.load_state_dict(torch.load('model.pth'))
model.eval()
```
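For resuming interrupted training, it is common to save a full checkpoint rather than just the weights. A minimal sketch — the dictionary keys here are an arbitrary convention, not a fixed API:

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Save model weights, optimizer state, and progress together
checkpoint = {
    'epoch': 5,
    'model_state': model.state_dict(),
    'optimizer_state': optimizer.state_dict(),
}
torch.save(checkpoint, 'checkpoint.pth')

# Restore everything to resume training where it left off
ckpt = torch.load('checkpoint.pth')
model.load_state_dict(ckpt['model_state'])
optimizer.load_state_dict(ckpt['optimizer_state'])
start_epoch = ckpt['epoch'] + 1
```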
Augmenting data on the fly during training improves generalization:

```python
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
```
Adjusting the learning rate during training can improve results:

```python
from torch.optim import lr_scheduler

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

for epoch in range(100):
    train(...)
    validate(...)
    scheduler.step()  # multiply the lr by 0.1 every 30 epochs
```
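The schedule can be inspected without any training. In this self-contained sketch a smaller `step_size` is used so the decay is visible after a few steps:

```python
import torch
from torch.optim import lr_scheduler

param = torch.zeros(1, requires_grad=True)
opt = torch.optim.SGD([param], lr=0.1)
sched = lr_scheduler.StepLR(opt, step_size=2, gamma=0.1)

for epoch in range(6):
    opt.step()  # (training would happen here)
    print(epoch, sched.get_last_lr()[0])
    sched.step()
# lr stays at 0.1 for 2 epochs, then drops to 0.01, then 0.001
```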
Early stopping prevents overfitting:

```python
best_loss = float('inf')
patience = 5
counter = 0

for epoch in range(100):
    train_loss = train(...)
    val_loss = validate(...)
    if val_loss < best_loss:
        best_loss = val_loss
        counter = 0
        torch.save(model.state_dict(), 'best_model.pth')
    else:
        counter += 1
        if counter >= patience:
            print("Early stopping")
            break
```
Training in FP16 (mixed precision) speeds up training and reduces memory usage:

```python
from torch.cuda.amp import GradScaler, autocast

scaler = GradScaler()
for epoch in range(100):
    for inputs, labels in train_loader:
        optimizer.zero_grad()
        with autocast():
            outputs = model(inputs)
            loss = criterion(outputs, labels)
        scaler.scale(loss).backward()  # scale the loss to avoid FP16 underflow
        scaler.step(optimizer)
        scaler.update()
```
TensorBoard makes it easy to monitor training:

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter()
for epoch in range(100):
    train_loss = train(...)
    writer.add_scalar('Loss/train', train_loss, epoch)
    if epoch % 10 == 0:
        # Visualize the model graph
        writer.add_graph(model, torch.randn(1, 3, 224, 224))
```
High-level frameworks can simplify the training loop:

```python
import pytorch_lightning as pl

class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.model = Net()

    def forward(self, x):
        return self.model(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        loss = F.cross_entropy(y_hat, y)
        self.log('train_loss', loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=0.02)

trainer = pl.Trainer(max_epochs=10)
model = LitModel()
trainer.fit(model, train_loader)
```
Exporting a model to a portable format:

```python
dummy_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy_input, "model.onnx",
                  input_names=["input"], output_names=["output"],
                  dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}})
```
Proper initialization is crucial for training:

```python
def init_weights(m):
    if isinstance(m, nn.Conv2d):
        nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
    elif isinstance(m, nn.BatchNorm2d):
        nn.init.constant_(m.weight, 1)
        nn.init.constant_(m.bias, 0)

model.apply(init_weights)
```
Batch normalization speeds up training and stabilizes the model:

```python
class NetWithBN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.bn1 = nn.BatchNorm2d(6)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.bn2 = nn.BatchNorm2d(16)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = F.relu(self.bn1(self.conv1(x)))
        x = F.max_pool2d(x, 2)
        x = F.relu(self.bn2(self.conv2(x)))
        x = F.max_pool2d(x, 2)
        x = torch.flatten(x, 1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
```
Residual connections make it possible to build much deeper networks:

```python
class ResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)

        # Projection shortcut when the shape changes
        self.shortcut = nn.Sequential()
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1,
                          stride=stride, bias=False),
                nn.BatchNorm2d(out_channels)
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out += self.shortcut(x)  # the skip connection
        out = F.relu(out)
        return out
```
TorchScript converts a model into a standalone, serializable program:

```python
scripted_model = torch.jit.script(model)
scripted_model.save('model_scripted.pt')

# Loading does not require the original class definition
loaded_model = torch.jit.load('model_scripted.pt')
```
ONNX Runtime provides efficient inference:

```python
import onnxruntime as ort

ort_session = ort.InferenceSession("model.onnx")
# input_array: a NumPy array matching the exported input shape
inputs = {ort_session.get_inputs()[0].name: input_array}
outputs = ort_session.run(None, inputs)
```
TensorRT can optimize models for NVIDIA GPUs (this sketch assumes the torch2trt library):

```python
from torch2trt import torch2trt, TRTModule

# Convert the model
trt_model = torch2trt(model, [dummy_input])

# Save, and load into a TRTModule
torch.save(trt_model.state_dict(), 'model_trt.pth')
trt_model = TRTModule()
trt_model.load_state_dict(torch.load('model_trt.pth'))
```
As models keep growing in size, distributed training is becoming increasingly important:

```python
# Distributed data parallelism: one process per GPU (e.g. launched with torchrun)
import torch.distributed as dist

dist.init_process_group(backend="nccl")
model = nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])
```
Self-supervised learning reduces the dependence on labeled data:

```python
# Contrastive learning sketch (augment is pseudocode for a random augmentation)
positive_pair = augment(image)        # two augmented views of the same image
negative_pair = augment(other_image)
features = model(torch.cat([positive_pair, negative_pair]))
# ...compute the contrastive loss
```
Neural architecture search automates model design:

```python
from torchvision.models import efficientnet

# Use a predefined EfficientNet variant (itself found via architecture search)
model = efficientnet.efficientnet_b0(pretrained=True)
```
Deploying models on mobile devices:

```python
# Quantize the model to shrink its size
quantized_model = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8)
```