Python File Operations: A Comprehensive Guide from Basics to Advanced Practice

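1. Getting Started with Python File Operations

For a Python developer, file handling is one of the most basic and most important everyday skills. Processing log files, reading configuration files, and persisting data all depend on it. Python's file API is complete enough to cover most file-processing needs.

1.1 The Basic Workflow

File operations follow the standard "open, operate, close" pattern. In Python, the built-in open() function opens a file and returns a file object, whose methods are then used for reading and writing.

The basic steps are:

- Call open() to open the file and obtain a file object
- Call the file object's read and write methods
- Call close() to close the file and release system resources

Important: always close a file when you are done with it, preferably with a with statement (see Section 5), which does this automatically. Otherwise you risk leaking file handles or losing data that is still sitting in the write buffer. Python's garbage collector will eventually close abandoned file objects, but you cannot rely on it to do so promptly.

1.2 open() in Detail

open() is the entry point for file operations in Python. Its full signature is:

```python
open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
```

For beginners, the most commonly used parameters are:

- file: the file path (a string or a path-like object)
- mode: the mode in which the file is opened (string)
- encoding: the text encoding (string, text mode only)

A typical call looks like this:

```python
f = open('example.txt', 'r', encoding='utf-8')
```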

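2. File Modes in Depth

2.1 Basic Modes

Python supports several file modes, selected with the mode parameter. The most common are:

Mode	Description	If the file does not exist	If the file exists
'r'	Read only (default)	Raises FileNotFoundError	Reads from the beginning
'w'	Write	Creates a new file	Truncates it to zero length first
'a'	Append	Creates a new file	Writes go to the end of the file
'x'	Exclusive creation	Creates a new file	Raises FileExistsError

Each mode has its typical use:

- 'r' when you only need to read the file
- 'w' when the file should be emptied and rewritten
- 'a' when content should be added at the end of the file
- 'x' when you must make sure you never overwrite an existing file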
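To make the table concrete, here is a small self-contained sketch (the file name modes.txt is arbitrary) that writes with 'w', rewrites with 'w', appends with 'a', and then tries 'x' on the file that now exists:

```python
import os
import tempfile

def demo_modes(directory):
    """Show how 'w', 'a' and 'x' treat an existing file."""
    path = os.path.join(directory, 'modes.txt')

    with open(path, 'w', encoding='utf-8') as f:   # creates the file
        f.write('first\n')
    with open(path, 'w', encoding='utf-8') as f:   # truncates, then writes
        f.write('second\n')
    with open(path, 'a', encoding='utf-8') as f:   # appends at the end
        f.write('third\n')

    try:
        with open(path, 'x', encoding='utf-8'):    # file exists -> error
            pass
        x_result = 'created'
    except FileExistsError:
        x_result = 'FileExistsError'

    with open(path, 'r', encoding='utf-8') as f:
        return f.read(), x_result

with tempfile.TemporaryDirectory() as tmp:
    content, x_result = demo_modes(tmp)

print(repr(content))  # 'second\nthird\n'
print(x_result)       # FileExistsError
```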

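2.2 Combined and Binary Modes

Adding '+' to a basic mode opens the file for both reading and writing:

Mode	Description
'r+'	Read and write; the file must exist, and the pointer starts at the beginning
'w+'	Read and write; the file is truncated first
'a+'	Read and write; the pointer starts at the end, and writes go to the end of the file

For binary files, add 'b' to the mode string:

- 'rb': binary read
- 'wb': binary write
- 'ab': binary append

Binary mode is typically used for non-text files such as images, audio, and video. In binary mode, read methods return bytes instead of str, and no decoding or newline translation takes place.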
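One practical difference between text and binary mode is newline handling. The sketch below, which uses only the standard library, writes a Windows-style line ending and reads it back both ways; with the default newline=None, text mode translates '\r\n' to '\n', while binary mode returns the bytes untouched:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'crlf.txt')

    # Write raw bytes containing a Windows-style line ending
    with open(path, 'wb') as f:
        f.write(b'line1\r\nline2\n')

    # Binary mode: bytes come back exactly as stored
    with open(path, 'rb') as f:
        raw = f.read()

    # Text mode (newline=None, the default): '\r\n' is translated to '\n'
    with open(path, 'r', encoding='utf-8') as f:
        text = f.read()

print(type(raw).__name__, raw)           # bytes b'line1\r\nline2\n'
print(type(text).__name__, repr(text))   # str 'line1\nline2\n'
```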

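2.3 Encodings and Best Practices

Encoding problems are one of the most common pitfalls when working with text files. Always pass the encoding argument explicitly instead of relying on the platform's default encoding, which depends on the locale (on Chinese-language Windows, for example, it is typically cp936/GBK rather than UTF-8).

Recommended:

```python
# Specify UTF-8 explicitly
with open('file.txt', 'r', encoding='utf-8') as f:
    content = f.read()
```

Avoid:

```python
# Relies on the platform default encoding, which can cause cross-platform problems
f = open('file.txt', 'r')
```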

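3. Reading and Writing Files in Practice

3.1 Reading Files

Python offers several read methods for different situations:

read(): read the entire content at once

```python
with open('file.txt', 'r', encoding='utf-8') as f:
    content = f.read()  # the whole file as a single string
```

read(size): read up to size units; in text mode size counts characters, in binary mode it counts bytes

```python
with open('file.txt', 'r', encoding='utf-8') as f:
    chunk = f.read(1024)  # up to 1024 characters ('' at end of file)
```

readline(): read one line at a time

```python
with open('file.txt', 'r', encoding='utf-8') as f:
    line = f.readline()  # one line including its '\n' ('' at end of file)
```

readlines(): read all lines into a list

```python
with open('file.txt', 'r', encoding='utf-8') as f:
    lines = f.readlines()  # a list of all lines, each keeping its line ending
```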
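Because text-mode read(size) counts characters while binary-mode read(size) counts bytes, the two can return quite different amounts of data for non-ASCII text. A quick check with a four-character Chinese string, which takes 12 bytes in UTF-8:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'utf8.txt')
    with open(path, 'w', encoding='utf-8') as f:
        f.write('你好世界')  # 4 characters, 12 bytes in UTF-8

    with open(path, 'r', encoding='utf-8') as f:
        text_chunk = f.read(2)   # text mode: 2 characters
    with open(path, 'rb') as f:
        byte_chunk = f.read(2)   # binary mode: 2 bytes (part of one character)

print(text_chunk)  # 你好
print(byte_chunk)  # b'\xe4\xbd'
```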

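3.2 Reading Large Files Efficiently

Reading a large file in one call can consume a lot of memory. Iterate over the file object instead and process it line by line:

```python
with open('large_file.txt', 'r', encoding='utf-8') as f:
    for line in f:       # file objects are iterable, yielding one line at a time
        process(line)    # process() stands for your own handling code
```

This approach is memory-friendly because only the current line (plus an internal buffer) is held in memory, never the whole file.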

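3.3 Writing Files

Writing is done mainly with write() and writelines():

write(): write a string

```python
with open('output.txt', 'w', encoding='utf-8') as f:
    f.write('Hello, World!\n')  # write one line
```

writelines(): write an iterable of strings

```python
lines = ['Line 1\n', 'Line 2\n', 'Line 3\n']
with open('output.txt', 'w', encoding='utf-8') as f:
    f.writelines(lines)  # write several lines
```

Note: neither write() nor writelines() adds line separators; you must include '\n' yourself.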
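Two related details, shown in a short sketch: in text mode write() returns the number of characters written, and print() with its file argument appends the line separator for you:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'out.txt')
    with open(path, 'w', encoding='utf-8') as f:
        written = f.write('abc\n')             # number of characters written
        print('def', 'ghi', sep=',', file=f)   # print() adds '\n' by default
    with open(path, 'r', encoding='utf-8') as f:
        content = f.read()

print(written)        # 4
print(repr(content))  # 'abc\ndef,ghi\n'
```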

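4. File Pointers and Random Access

4.1 File Pointer Basics

The file pointer (the current position) marks where the next read or write takes place, and every read or write moves it. Understanding how it behaves is essential for correct file handling.

The initial position depends on the mode:
- 'r'/'r+': the beginning of the file
- 'a'/'a+': the end of the file
- 'w'/'w+': the beginning of the file (after it has been truncated)

After each read or write, the pointer is left just past the data that was read or written.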
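The initial positions listed above can be checked with tell(). This sketch uses binary mode, where tell() is a plain byte offset, on a 10-byte file; the 'w' modes are left out because they would truncate the file:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'pos.bin')
    with open(path, 'wb') as f:
        f.write(b'0123456789')  # 10 bytes

    positions = {}
    for mode in ('rb', 'r+b', 'ab', 'a+b'):
        with open(path, mode) as f:
            positions[mode] = f.tell()   # position right after opening

    with open(path, 'rb') as f:
        f.read(4)
        after_read = f.tell()            # moved past the 4 bytes just read

print(positions)   # {'rb': 0, 'r+b': 0, 'ab': 10, 'a+b': 10}
print(after_read)  # 4
```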

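4.2 seek() and tell()

tell(): return the current position

```python
with open('file.txt', 'r', encoding='utf-8') as f:
    print(f.tell())  # current position
    f.read(10)
    print(f.tell())  # position after reading
```

seek(offset, whence): move the pointer

- offset: the offset (in bytes, for binary files)
- whence: the reference point (0 = start of file, 1 = current position, 2 = end of file; also available as os.SEEK_SET, os.SEEK_CUR and os.SEEK_END)

```python
with open('file.txt', 'rb') as f:
    f.seek(10)     # move to byte offset 10
    f.seek(-5, 2)  # move to 5 bytes before the end of the file
```

Note: in text mode, tell() returns an opaque number rather than a byte count, and seek() only accepts an offset previously returned by tell(), or 0; any other offset gives undefined behaviour. In addition, a nonzero offset relative to the current position or the end (whence=1 or 2) raises io.UnsupportedOperation.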
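The text-mode restrictions can be seen in practice: a position obtained from tell() works as a bookmark, while a nonzero end-relative seek is rejected. A minimal sketch:

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'seek.txt')
    with open(path, 'w', encoding='utf-8') as f:
        f.write('first line\nsecond line\n')

    with open(path, 'r', encoding='utf-8') as f:
        f.readline()
        bookmark = f.tell()      # opaque cookie for the start of line 2
        rest = f.read()
        f.seek(bookmark)         # allowed: a value returned by tell()
        again = f.readline()
        try:
            f.seek(-3, io.SEEK_END)  # nonzero end-relative seek in text mode
            error = None
        except io.UnsupportedOperation as exc:
            error = exc

print(repr(again))           # 'second line\n'
print(type(error).__name__)  # UnsupportedOperation
```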

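4.3 A Practical Example

Example: reading the last N lines of a file. The function reads backwards from the end in fixed-size blocks and prepends each block to the data collected so far, so lines that straddle a block boundary stay intact and the lines keep their original order.

```python
import os

def tail(filename, n=10, block_size=1024, encoding='utf-8'):
    """Return the last n lines of a file."""
    if n <= 0:
        return []
    with open(filename, 'rb') as f:
        # Move to the end of the file
        f.seek(0, os.SEEK_END)
        pos = f.tell()

        # Read backwards block by block until the data holds more than n
        # newlines (enough for n complete lines) or the start is reached
        data = b''
        while pos > 0 and data.count(b'\n') <= n:
            read_size = min(block_size, pos)
            pos -= read_size
            f.seek(pos, os.SEEK_SET)
            data = f.read(read_size) + data

    return [line.decode(encoding) for line in data.splitlines(True)[-n:]]
```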

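5. Context Managers and Resource Management

5.1 Advantages of the with Statement

With the traditional approach you have to close the file yourself, which is easy to forget or get wrong:

```python
f = open('file.txt', 'r', encoding='utf-8')
try:
    content = f.read()
finally:
    f.close()  # ensure the file is closed no matter what
```

The with statement manages the resource automatically:

```python
with open('file.txt', 'r', encoding='utf-8') as f:
    content = f.read()
# the file is closed automatically once the with block is left
```

Advantages of the with statement:

- More concise code
- The file is guaranteed to be closed
- The file is closed correctly even if an exception occurs

5.2 How Context Managers Work

The with statement relies on the context manager protocol: any object that implements __enter__() and __exit__() can be used as a context manager.

File objects implement both methods:

- __enter__(): returns the file object itself
- __exit__(): closes the file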
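Any class can take part in the protocol. The ManagedFile class below is a hypothetical, minimal re-implementation of what file objects already do, mainly to show when __enter__() and __exit__() run:

```python
import os
import tempfile

class ManagedFile:
    """A minimal hand-written context manager around open()."""

    def __init__(self, path, mode='r', encoding='utf-8'):
        self.path = path
        self.mode = mode
        self.encoding = encoding
        self.file = None

    def __enter__(self):
        # Runs at the start of the with block; the return value is bound by 'as'
        self.file = open(self.path, self.mode, encoding=self.encoding)
        return self.file

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs when the block is left, normally or through an exception
        self.file.close()
        return False  # False: let any exception propagate

# Usage: the file is closed even when the block raises
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.txt')
    with ManagedFile(path, 'w') as f:
        f.write('hello')
    try:
        with ManagedFile(path, 'r') as g:
            raise ValueError('simulated failure')
    except ValueError:
        pass

print(f.closed, g.closed)  # True True
```

Returning a true value from __exit__() would suppress the exception; for resource wrappers like this one, returning False is normally the right choice.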

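5.3 Managing Several Resources at Once

A single with statement can manage several resources:

```python
with open('input.txt', 'r', encoding='utf-8') as fin, open('output.txt', 'w', encoding='utf-8') as fout:
    for line in fin:
        fout.write(line.upper())
```

This form is both concise and safe: both files are guaranteed to be closed.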
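When the number of files is only known at runtime, a fixed with statement no longer fits. A minimal sketch using contextlib.ExitStack from the standard library (merge_files and the part file names are illustrative):

```python
import os
import tempfile
from contextlib import ExitStack

def merge_files(input_paths, output_path):
    """Concatenate any number of files; ExitStack closes every one of them."""
    with ExitStack() as stack:
        inputs = [stack.enter_context(open(p, 'r', encoding='utf-8'))
                  for p in input_paths]
        out = stack.enter_context(open(output_path, 'w', encoding='utf-8'))
        for f in inputs:
            out.write(f.read())
    # every file registered with the stack is closed at this point

with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for i in range(3):
        p = os.path.join(tmp, f'part{i}.txt')
        with open(p, 'w', encoding='utf-8') as f:
            f.write(f'part {i}\n')
        paths.append(p)
    merged_path = os.path.join(tmp, 'merged.txt')
    merge_files(paths, merged_path)
    with open(merged_path, encoding='utf-8') as f:
        merged = f.read()

print(repr(merged))  # 'part 0\npart 1\npart 2\n'
```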

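6. File System Operations with the os Module

6.1 Common File Operations

Python's os module provides a rich set of file system operations:

Renaming a file

```python
import os
os.rename('old.txt', 'new.txt')
```

Deleting a file

```python
os.remove('file.txt')
```

Checking whether a file exists

```python
if os.path.exists('file.txt'):
    print("The file exists")
```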
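A common follow-up to renaming is replacing a file safely. On Windows, os.rename() raises FileExistsError when the destination exists, whereas os.replace() overwrites it on every platform, and on POSIX systems the rename is atomic. A sketch of the write-to-a-temporary-file-then-replace pattern, where atomic_write_text is an illustrative name rather than a standard function:

```python
import os
import tempfile

def atomic_write_text(path, text, encoding='utf-8'):
    """Write text so that readers see either the old or the new content.

    Sketch: write to a temporary file in the same directory, flush it to
    disk, then move it into place with os.replace().
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
    try:
        with os.fdopen(fd, 'w', encoding=encoding) as tmp:
            tmp.write(text)
            tmp.flush()
            os.fsync(tmp.fileno())   # make sure the data reaches the disk
        os.replace(tmp_path, path)   # overwrite the target in one step
    except BaseException:
        os.remove(tmp_path)          # clean up the temporary file on failure
        raise

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, 'config.txt')
    atomic_write_text(target, 'version=1\n')
    atomic_write_text(target, 'version=2\n')   # replaces the existing file
    with open(target, encoding='utf-8') as f:
        final = f.read()
    leftovers = os.listdir(tmp)

print(repr(final))  # 'version=2\n'
print(leftovers)    # ['config.txt']
```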

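6.2 Directory Operations

```python
os.mkdir('new_dir')                            # create a single directory
os.makedirs('path/to/new_dir', exist_ok=True)  # create nested directories; no error if they exist
```

```python
os.rmdir('empty_dir')  # remove an empty directory (shutil.rmtree() removes a whole tree)
```

```python
files = os.listdir('.')  # list the entries of the current directory
```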
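To walk an entire directory tree rather than a single level, os.walk() yields a (dirpath, dirnames, filenames) tuple for every directory. A sketch that totals file sizes (directory_size is an illustrative helper):

```python
import os
import tempfile

def directory_size(root):
    """Total size in bytes of all files below root."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total

with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, 'a', 'b'))
    with open(os.path.join(tmp, 'top.bin'), 'wb') as f:
        f.write(b'x' * 100)
    with open(os.path.join(tmp, 'a', 'b', 'deep.bin'), 'wb') as f:
        f.write(b'y' * 23)
    size = directory_size(tmp)

print(size)  # 123
```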

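6.3 Path Operations

The os.path module provides path functions that follow the conventions of the current platform:

Joining paths

```python
full_path = os.path.join('dir', 'subdir', 'file.txt')  # 'dir/subdir/file.txt' on POSIX
```

Getting an absolute path

```python
abs_path = os.path.abspath('file.txt')  # resolved against the current working directory
```

Splitting a path

```python
dirname = os.path.dirname('/path/to/file.txt')    # '/path/to'
basename = os.path.basename('/path/to/file.txt')  # 'file.txt'
```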

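7. Advanced Techniques

7.1 Temporary Files

The tempfile module makes it easy to create temporary files:

```python
import os
import tempfile

# Create a named temporary file
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'Hello, World!')
    tmp_path = tmp.name  # path of the temporary file

# With the default delete=True the file would be removed when the with block
# ends; because delete=False is used here, it persists and must be removed manually
os.remove(tmp_path)
```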
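When a task needs several scratch files, tempfile.TemporaryDirectory() is often simpler: it creates a private directory and removes it, together with everything inside, when the with block ends. A short sketch:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Create as many files as needed inside the private directory
    report = os.path.join(tmp_dir, 'report.txt')
    with open(report, 'w', encoding='utf-8') as f:
        f.write('temporary data\n')
    existed_inside = os.path.exists(report)

# The directory and everything in it are gone once the block ends
print(existed_inside, os.path.exists(tmp_dir))  # True False
```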

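7.2 Memory-Mapped Files

For large files, the mmap module maps the file into memory so that it can be accessed like a bytes object. The operating system loads pages on demand, which can speed up random access and avoids copying the whole file into memory:

```python
import mmap

with open('large_file.bin', 'r+b') as f:
    # Map the whole file (length 0); an empty file cannot be mapped
    with mmap.mmap(f.fileno(), 0) as mm:
        # Access the file as if it were memory
        print(mm[:10])  # first 10 bytes
```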
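mmap objects also support bytes-style searching, so a pattern can be located in a large file without reading it into a Python bytes object first. A read-only sketch (find_offset is an illustrative helper, and the test file layout is made up for the example):

```python
import mmap
import os
import tempfile

def find_offset(path, needle):
    """Return the byte offset of needle in the file, or -1 (read-only mmap)."""
    with open(path, 'rb') as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm.find(needle)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'data.bin')
    with open(path, 'wb') as f:
        f.write(b'header' + b'\x00' * 1000 + b'NEEDLE' + b'\x00' * 10)
    offset = find_offset(path, b'NEEDLE')
    missing = find_offset(path, b'absent')

print(offset, missing)  # 1006 -1
```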

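7.3 File Locking

When several processes (or threads) write to the same file, a file lock can coordinate access. The fcntl module is Unix-only, and flock() locks are advisory: they only restrain other code that also calls flock():

```python
import fcntl

with open('shared_file.txt', 'a', encoding='utf-8') as f:
    # Acquire an exclusive lock (blocks until it is available)
    fcntl.flock(f, fcntl.LOCK_EX)
    try:
        # Work that requires mutual exclusion
        f.write('Exclusive access\n')
        f.flush()  # push buffered data to the file before releasing the lock
    finally:
        # Release the lock
        fcntl.flock(f, fcntl.LOCK_UN)
```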

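8. Common Problems and Solutions

8.1 Handling Encodings

Handling files in different encodings

```python
# Try UTF-8 first, then fall back to other encodings
encodings = ['utf-8', 'gbk', 'iso-8859-1']
for enc in encodings:
    try:
        with open('file.txt', 'r', encoding=enc) as f:
            content = f.read()
        break
    except UnicodeDecodeError:
        continue
else:
    raise ValueError("Unable to decode the file")
```

Note that 'iso-8859-1' maps every possible byte to a character, so it never raises UnicodeDecodeError: as the last entry it acts as a catch-all, and with it in the list the else branch is never reached. Likewise, a file that decodes without errors as GBK is not necessarily GBK, so the result can still be garbled.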
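A related case: some Windows programs save UTF-8 files with a byte order mark (BOM). Decoding with 'utf-8' keeps the BOM as the character '\ufeff' at the start of the text, while the 'utf-8-sig' codec strips it. A quick sketch:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'bom.txt')
    with open(path, 'wb') as f:
        f.write(b'\xef\xbb\xbfhello')   # UTF-8 BOM followed by text

    with open(path, 'r', encoding='utf-8') as f:
        plain = f.read()                 # BOM kept as '\ufeff'
    with open(path, 'r', encoding='utf-8-sig') as f:
        sig = f.read()                   # BOM stripped

print(repr(plain))  # '\ufeffhello'
print(repr(sig))    # 'hello'
```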

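Ignoring decode errors

```python
with open('file.txt', 'r', encoding='utf-8', errors='ignore') as f:
    content = f.read()  # undecodable bytes are silently dropped
```

errors='ignore' loses data without any warning; errors='replace', which substitutes the replacement character U+FFFD, at least leaves a visible marker where decoding failed.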
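The error handlers differ in what they leave behind. A comparison on a byte string that is not valid UTF-8 (the byte 0xff never occurs in UTF-8):

```python
raw = b'abc\xff'  # invalid UTF-8 at the last byte

results = {
    handler: raw.decode('utf-8', errors=handler)
    for handler in ('ignore', 'replace', 'backslashreplace')
}

for handler, text in results.items():
    print(f'{handler:>16}: {text!r}')
```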

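8.2 Optimizing Large-File Processing

Reading a large file in chunks

```python
CHUNK_SIZE = 1024 * 1024  # 1 MB

with open('large_file.bin', 'rb') as f:
    while True:
        chunk = f.read(CHUNK_SIZE)
        if not chunk:         # b'' means end of file
            break
        process_chunk(chunk)  # process_chunk() stands for your own handling code
```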
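The same loop can be written with the two-argument form of iter(), which keeps calling a function until it returns a sentinel value. A sketch that hashes a file of any size chunk by chunk (file_sha256 is an illustrative helper; Python 3.11+ also provides hashlib.file_digest() for this job):

```python
import hashlib
import os
import tempfile
from functools import partial

def file_sha256(path, chunk_size=1024 * 1024):
    """Hash a file while holding only one chunk in memory."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        # iter(callable, sentinel) calls f.read(chunk_size) until it returns b''
        for chunk in iter(partial(f.read, chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'blob.bin')
    payload = os.urandom(3 * 1024 + 7)
    with open(path, 'wb') as f:
        f.write(payload)
    # A tiny chunk size makes the loop run several times
    chunked = file_sha256(path, chunk_size=1024)

print(chunked == hashlib.sha256(payload).hexdigest())  # True
```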

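Processing lines with a generator

```python
def read_lines(filename):
    with open(filename, 'r', encoding='utf-8') as f:
        for line in f:
            yield line.strip()  # removes surrounding whitespace, including '\n'

# Process the file line by line through the generator
for line in read_lines('large_file.txt'):
    process(line)  # process() stands for your own handling code
```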

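8.3 Cross-Platform Path Handling

Using the pathlib module (Python 3.4+)

```python
from pathlib import Path

# Create a Path object
p = Path('dir/file.txt')

# Write content (creating the parent directory first if needed)
p.parent.mkdir(parents=True, exist_ok=True)
p.write_text('Hello, World!', encoding='utf-8')

# Read content
content = p.read_text(encoding='utf-8')
```

Path operations

```python
p = Path('dir/subdir')
new_p = p / 'file.txt'  # join paths with the / operator

if new_p.exists():
    print(f"File size: {new_p.stat().st_size} bytes")
```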
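Path objects also expose the individual parts of a path. The sketch below uses PurePosixPath so that it behaves identically on every operating system; a concrete Path offers the same attributes:

```python
from pathlib import PurePosixPath

p = PurePosixPath('data/archive/report.tar.gz')

parts = {
    'name': p.name,          # final component
    'stem': p.stem,          # name without the last suffix
    'suffix': p.suffix,      # last suffix only
    'suffixes': p.suffixes,  # all suffixes
    'parent': str(p.parent),
    'renamed': str(p.with_suffix('.zip')),  # replaces only the last suffix
}
for key, value in parts.items():
    print(f'{key:>8}: {value}')
```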

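9. Performance Optimization and Best Practices

9.1 Choosing a Buffering Strategy

The buffering parameter of open() controls buffering:

- 0: unbuffered (binary mode only)
- 1: line buffering (text mode only)
- an integer > 1: the buffer size in bytes
- -1: the default buffering (a size based on the device's block size, falling back to io.DEFAULT_BUFFER_SIZE)

Optimization tip: a larger buffer helps when a file is consumed in many small reads; it makes no difference to a single f.read() of the whole file.

```python
# Each system call fills the 1 MB buffer, and the 4 KB reads are served from it
with open('large_file.bin', 'rb', buffering=1024 * 1024) as f:
    while record := f.read(4096):
        process_record(record)  # process_record() stands for your own handling code
```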
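The buffering argument also determines which class open() returns. A sketch that checks this for a small file and also shows that unbuffered text I/O is rejected with ValueError:

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'buf.bin')
    with open(path, 'wb') as f:
        f.write(b'data')

    with open(path, 'rb', buffering=0) as f:
        unbuffered_type = type(f)   # raw FileIO: every read is a system call
    with open(path, 'rb') as f:
        buffered_type = type(f)     # BufferedReader wrapping a FileIO
    with open(path, 'r', encoding='utf-8') as f:
        text_type = type(f)         # TextIOWrapper on top of a BufferedReader

    try:
        open(path, 'r', buffering=0, encoding='utf-8')  # not allowed
        text_unbuffered_error = None
    except ValueError as exc:
        text_unbuffered_error = exc

print(unbuffered_type.__name__, buffered_type.__name__, text_type.__name__)
print(text_unbuffered_error)
```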

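9.2 Optimizing Batch Operations

Batching writes

```python
data = range(100_000)  # sample data for illustration

# One write() call per item
with open('file.txt', 'w', encoding='utf-8') as f:
    for item in data:
        f.write(str(item) + '\n')

# All items passed in a single writelines() call
with open('file.txt', 'w', encoding='utf-8') as f:
    f.writelines(f"{item}\n" for item in data)
```

Because file objects are already buffered, neither version makes a system call per item, and writelines() still calls write() for each element internally, so the second form is mainly more concise. If the data fits in memory, building one string with ''.join() and calling write() once does cut the per-call overhead; measure before assuming a large speedup.

Using an in-memory buffer

```python
from io import StringIO

buffer = StringIO()
buffer.write('Hello, ')
buffer.write('World!')

with open('file.txt', 'w', encoding='utf-8') as f:
    f.write(buffer.getvalue())
```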

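9.3 Best Practices

- Always manage file resources with the with statement
- Specify the encoding explicitly (especially for text files)
- Process large files iteratively instead of reading them in one go
- Use os.path or pathlib for path manipulation to stay cross-platform
- Add exception handling around critical file operations
- Back up important files regularly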

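10. Hands-On Project: A Log File Analyzer

10.1 Requirements

Build a log analysis tool that can:

- Read log files
- Count errors and warnings
- Extract the log entries from a specific time period
- Produce a summary report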

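10.2 Implementation

The implementation counts errors and warnings separately, tracks the earliest and latest timestamps, and accepts an optional time window (start/end) for extracting a specific period.

```python
import re
from collections import Counter
from datetime import datetime

LEVEL_PATTERN = re.compile(r'\b(ERROR|WARN(?:ING)?)\b', re.IGNORECASE)
DATE_PATTERN = re.compile(r'\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}')

def analyze_log_file(log_path, start=None, end=None):
    """Analyze a log file and return a summary report.

    start/end are optional datetime bounds; when either is given, only lines
    whose timestamp falls inside [start, end] are considered.
    """
    counts = Counter()
    problem_lines = []
    first_date = last_date = None

    with open(log_path, 'r', encoding='utf-8') as f:
        for line in f:
            # Extract the timestamp, if the line has one
            date = None
            match = DATE_PATTERN.search(line)
            if match:
                try:
                    date = datetime.strptime(match.group(), '%Y-%m-%d %H:%M:%S')
                except ValueError:
                    pass  # looks like a date but is not a valid one

            # Optional time window: skip lines outside it (or without a date)
            if start is not None or end is not None:
                if date is None:
                    continue
                if (start is not None and date < start) or (end is not None and date > end):
                    continue

            # Track the earliest and latest timestamps seen
            if date is not None:
                first_date = date if first_date is None else min(first_date, date)
                last_date = date if last_date is None else max(last_date, date)

            # Count errors and warnings separately
            level = LEVEL_PATTERN.search(line)
            if level:
                if level.group(1).upper() == 'ERROR':
                    counts['errors'] += 1
                else:
                    counts['warnings'] += 1
                problem_lines.append(line.strip())

    # time_span is (None, None) if no timestamps were found
    return {
        'total_errors': counts['errors'],
        'total_warnings': counts['warnings'],
        'time_span': (first_date, last_date),
        'sample_errors': problem_lines[:5],  # first 5 error/warning lines
    }

# Example usage
report = analyze_log_file('app.log')
print(f"Errors: {report['total_errors']}, warnings: {report['total_warnings']}")
print(f"Time span: {report['time_span'][0]} to {report['time_span'][1]}")
print("Sample error/warning lines:")
for line in report['sample_errors']:
    print(f"- {line}")
```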

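10.3 Possible Extensions

- Add command-line argument parsing
- Support batch processing of multiple log files
- Handle rotated log files
- Generate visual reports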
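The first two extensions can be sketched with the standard-library argparse module. The option names --start and --end are illustrative choices, and datetime.fromisoformat() converts their values:

```python
import argparse
from datetime import datetime

def build_parser():
    """Command-line interface sketch for the log analyzer (hypothetical options)."""
    parser = argparse.ArgumentParser(
        description='Summarize errors and warnings in log files.')
    parser.add_argument('logs', nargs='+', help='one or more log files to analyze')
    parser.add_argument('--start', type=datetime.fromisoformat,
                        help="only entries at or after this time, e.g. '2024-01-01 00:00:00'")
    parser.add_argument('--end', type=datetime.fromisoformat,
                        help='only entries at or before this time')
    return parser

# An explicit argument list for demonstration; a real script calls parse_args()
args = build_parser().parse_args(
    ['app.log', 'worker.log', '--start', '2024-01-01 00:00:00'])
print(args.logs)   # ['app.log', 'worker.log']
print(args.start)  # 2024-01-01 00:00:00
print(args.end)    # None
```

In a real script, loop over args.logs and call analyze_log_file() on each path to process several files in one run.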

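11. Security Considerations

11.1 Input Validation

File paths supplied by users must be validated. A plain startswith() check on os.path.abspath() is not enough: '/var/data_evil/x' also starts with '/var/data', and abspath() neither resolves symbolic links nor interprets a relative path against the allowed directory.

```python
import os

BASE_DIR = os.path.realpath('/var/data')  # the only directory users may access

def safe_open_file(path):
    """Open a user-supplied path only if it resolves inside BASE_DIR."""
    # Interpret the path relative to BASE_DIR and resolve '..' and symlinks
    full_path = os.path.realpath(os.path.join(BASE_DIR, path))

    # commonpath() compares whole path components, not string prefixes
    if os.path.commonpath([BASE_DIR, full_path]) != BASE_DIR:
        raise ValueError("Illegal file path")

    # No separate os.path.exists() check: open() raises FileNotFoundError
    # itself, which also avoids a check-then-use race (see 11.2)
    return open(full_path, 'r', encoding='utf-8')
```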

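11.2 Preventing Race Conditions

A common race condition in file handling is the time-of-check to time-of-use (TOCTOU) problem:

```python
# Unsafe
if os.path.exists('file.txt'):
    # the file may be deleted or modified between the check and the open
    with open('file.txt', 'r', encoding='utf-8') as f:
        content = f.read()

# Safe: just attempt the operation and handle the failure
try:
    with open('file.txt', 'r', encoding='utf-8') as f:
        content = f.read()
except FileNotFoundError:
    handle_missing_file()  # handle_missing_file() stands for your own fallback logic
```

Creating temporary files securely:

```python
import os
import tempfile

# mkstemp() creates the file without races, readable and writable only by the current user
fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, 'w', encoding='utf-8') as tmp:
        tmp.write('sensitive data')
    # process the file here
finally:
    os.unlink(path)  # make sure the file is deleted
```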

11.3 Permission Management

Checking file permissions:

```python
import os
import stat

file_stat = os.stat('file.txt')
if file_stat.st_mode & stat.S_IROTH:
    print("The file is readable by others")
```
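To display permission bits in the familiar ls -l form, stat.filemode() converts an st_mode value into a string. The sketch below feeds it hand-built mode values (illustrative, so no real files are needed); in practice you would pass os.stat(path).st_mode:

```python
import stat

# Hand-built st_mode values: file type bits combined with permission bits
modes = {
    'private file': stat.S_IFREG | 0o600,
    'shared file': stat.S_IFREG | 0o644,
    'directory': stat.S_IFDIR | 0o755,
}
for label, mode in modes.items():
    readable_by_others = bool(mode & stat.S_IROTH)
    print(f'{label:>12}: {stat.filemode(mode)}  others can read: {readable_by_others}')
```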

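Setting safe permissions:

Creating the file with open() and calling os.chmod() afterwards leaves a short window in which the file already holds the data but still has the default permissions. On POSIX systems, passing the mode to os.open() restricts the permissions from the moment the file is created:

```python
import os

# Create a file readable and writable only by its owner (-rw-------);
# the mode applies only when the file is newly created
fd = os.open('secure.txt', os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
with os.fdopen(fd, 'w', encoding='utf-8') as f:
    f.write('sensitive data')
```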

12. Modern File Handling (Python 3.10+)

12.1 Advanced pathlib Usage

pathlib offers a more object-oriented way of working with paths:

```python
from pathlib import Path

# Create the directory if it does not exist
data_dir = Path('data')
data_dir.mkdir(exist_ok=True)

# Create a file and write to it
log_file = data_dir / 'app.log'
log_file.write_text('log content...', encoding='utf-8')

# Find files recursively
for py_file in Path('.').rglob('*.py'):
    print(f"Found Python file: {py_file}")
```

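12.2 Simplifying Code with the Walrus Operator

The walrus operator (:=), available since Python 3.8, can simplify some file-handling code:

```python
# Read line by line until readline() returns '' at end of file
with open('file.txt', 'r', encoding='utf-8') as f:
    while (line := f.readline()):
        process(line)  # process() stands for your own handling code

# Collect the stripped lines that contain a given string
with open('log.txt', 'r', encoding='utf-8') as f:
    matches = [stripped for line in f if 'ERROR' in (stripped := line.strip())]
```

An assignment expression cannot rebind the comprehension's own loop variable (writing line := line.strip() inside the comprehension is a SyntaxError), so the stripped value gets its own name.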

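12.3 Dispatching on File Type with Structural Pattern Matching

Structural pattern matching (Python 3.10+) can dispatch on the file type cleanly:

```python
from pathlib import Path

def handle_file(file_path):
    # process_text_file() and the other handlers stand for your own functions
    match Path(file_path).suffix.lower():
        case '.txt':
            process_text_file(file_path)
        case '.csv':
            process_csv_file(file_path)
        case '.json':
            process_json_file(file_path)
        case _:
            raise ValueError(f"Unsupported file type: {file_path}")
```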

13. Benchmarking File Operations

13.1 Comparing Read Methods

```python
import timeit

def test_read():
    with open('large_file.txt', 'r', encoding='utf-8') as f:
        content = f.read()

def test_readlines():
    with open('large_file.txt', 'r', encoding='utf-8') as f:
        lines = f.readlines()

def test_iter():
    with open('large_file.txt', 'r', encoding='utf-8') as f:
        for line in f:
            pass

# Run the benchmarks (timings depend on file size, hardware, and the OS page cache)
print("read():", timeit.timeit(test_read, number=10))
print("readlines():", timeit.timeit(test_readlines, number=10))
print("iteration:", timeit.timeit(test_iter, number=10))
```

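13.2 The Effect of Buffering

```python
import timeit

def test_no_buffering():
    # buffering=0: every read(1024) is a separate system call
    with open('file.bin', 'rb', buffering=0) as f:
        while f.read(1024):
            pass

def test_buffered():
    # default buffering: small reads are served from an internal buffer
    with open('file.bin', 'rb') as f:
        while f.read(1024):
            pass

print("unbuffered:", timeit.timeit(test_no_buffering, number=10))
print("buffered:", timeit.timeit(test_buffered, number=10))
```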

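13.3 Optimization Tips

- For sequential reading of large files, iterating over the file uses the least memory
- For random access or small files, reading everything at once may be faster
- A larger buffer can improve I/O performance when data is read in many small pieces
- Binary mode is usually faster than text mode because it skips decoding and encoding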

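14. File Handling in Data Processing

14.1 Working with CSV Files

Reading and writing CSV with the csv module

```python
import csv

# Read CSV (the csv documentation recommends opening files with newline='')
with open('data.csv', 'r', newline='', encoding='utf-8') as f:
    reader = csv.DictReader(f)
    for row in reader:
        print(row['name'], row['value'])

# Write CSV (without newline='', extra blank lines appear on Windows)
data = [['Alice', 30, 'Beijing'], ['Bob', 25, 'Shanghai']]  # sample rows
with open('output.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['Name', 'Age', 'City'])
    writer.writerows(data)
```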
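The reason for newline='' shows up with fields that contain commas or line breaks: csv quotes such fields when writing, and the reader can only restore embedded newlines correctly when the file was opened with newline=''. A round-trip sketch with DictWriter and DictReader (the sample rows are made up):

```python
import csv
import os
import tempfile

rows = [
    {'name': 'Alice', 'value': '3.5'},
    {'name': 'Bob, Jr.', 'value': '7'},          # comma inside a field gets quoted
    {'name': 'Carol', 'value': 'line1\nline2'},  # embedded newline survives
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'rows.csv')
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=['name', 'value'])
        writer.writeheader()
        writer.writerows(rows)
    with open(path, 'r', newline='', encoding='utf-8') as f:
        loaded = list(csv.DictReader(f))

print(loaded == rows)  # True
```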

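Processing large CSV files with pandas (a third-party library)

```python
import pandas as pd

# Read a large CSV in chunks; each chunk is a DataFrame
chunk_size = 10000
for chunk in pd.read_csv('large.csv', chunksize=chunk_size):
    process(chunk)  # process() stands for your own handling code
```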

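14.2 Working with JSON Files

Reading and writing JSON

```python
import json

# Read JSON
with open('data.json', 'r', encoding='utf-8') as f:
    data = json.load(f)

# Write JSON (ensure_ascii=False keeps non-ASCII text such as Chinese readable)
with open('output.json', 'w', encoding='utf-8') as f:
    json.dump(data, f, indent=2, ensure_ascii=False)
```

Streaming large JSON files (ijson is a third-party package)

```python
import ijson

with open('large.json', 'rb') as f:
    # Stream the elements of a top-level JSON array one by one
    for item in ijson.items(f, 'item'):
        process(item)  # process() stands for your own handling code
```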
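If you control the file format, a standard-library alternative to streaming one huge JSON array is JSON Lines: one JSON document per line, which the json module can process line by line. A sketch (write_jsonl and read_jsonl are illustrative helpers):

```python
import json
import os
import tempfile

def write_jsonl(path, records):
    """Write one JSON document per line (the JSON Lines convention)."""
    with open(path, 'w', encoding='utf-8') as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + '\n')

def read_jsonl(path):
    """Yield records one at a time, so memory use stays flat."""
    with open(path, 'r', encoding='utf-8') as f:
        for line in f:
            if line.strip():  # skip blank lines
                yield json.loads(line)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'events.jsonl')
    write_jsonl(path, [{'id': 1, 'msg': 'café'}, {'id': 2, 'msg': 'stop'}])
    events = list(read_jsonl(path))

print(events)  # [{'id': 1, 'msg': 'café'}, {'id': 2, 'msg': 'stop'}]
```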

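14.3 Binary Serialization

Serializing Python objects with pickle

```python
import pickle

data = {'a': [1, 2, 3], 'b': 'string'}

# Serialize to a file
with open('data.pkl', 'wb') as f:
    pickle.dump(data, f)

# Deserialize from the file
with open('data.pkl', 'rb') as f:
    loaded = pickle.load(f)
```

Warning: never unpickle data from an untrusted source; loading a pickle can execute arbitrary code.

Compressed storage (smaller files, at the cost of extra CPU time)

```python
import gzip
import pickle

data = {'a': [1, 2, 3], 'b': 'string'}

# Compress while pickling, using the highest available protocol
with gzip.open('data.pkl.gz', 'wb') as f:
    pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)

# Read it back
with gzip.open('data.pkl.gz', 'rb') as f:
    loaded = pickle.load(f)
```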

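15. Common Pitfalls and Solutions

15.1 File Handle Leaks

Symptoms:

- The program opens files but never closes them
- The number of open files keeps growing while the program runs
- Eventually it fails with a "Too many open files" error

Solutions:

- Always use the with statement
- Make sure every code path closes its files
- Use resource-tracking tools to detect leaks; for example, running with python -X dev shows a ResourceWarning for each file object that is garbage-collected without having been closed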

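15.2 Inconsistent Encodings

Symptoms:

- UnicodeDecodeError when reading a file
- Files written on one system show up as garbled text on another

Solutions:

- Specify the encoding explicitly
- Use UTF-8 consistently when processing files
- Add encoding detection logic, for example with the third-party chardet package:

```python
import chardet

def detect_encoding(file_path, sample_size=64 * 1024):
    """Guess a file's encoding from a sample of its bytes (may return None)."""
    with open(file_path, 'rb') as f:
        raw = f.read(sample_size)  # a larger sample gives a more reliable guess
    return chardet.detect(raw)['encoding']
```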

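15.3 Cross-Platform Path Problems

Symptoms:

- Code developed on Windows fails on Linux
- Inconsistent path separators (\ vs /)

Solutions:

- Use os.path or pathlib for path operations
- Avoid hard-coding path separators
- Test on every target platform

```python
from pathlib import Path

# A cross-platform path
config_path = Path('config') / 'app.ini'
```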

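15.4 File Permission Problems

Symptoms:

- Files cannot be read or written
- "Permission denied" errors

Solutions:

- Check and set appropriate file permissions
- Run the program as a user with the right permissions
- Handle permission exceptions

```python
try:
    with open('/etc/config', 'r', encoding='utf-8') as f:
        content = f.read()
except PermissionError:
    print("No permission to read this file")
```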

16. Debugging File Operations

16.1 Logging File Operations

Detailed logging helps when debugging file-handling problems:

```python
import logging

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger('file_ops')

def safe_read_file(path):
    logger.debug("Trying to open file: %s", path)
    try:
        with open(path, 'r', encoding='utf-8') as f:
            content = f.read()
        # len() of a str counts characters, not bytes
        logger.debug("Read %d characters", len(content))
        return content
    except Exception as e:
        logger.error("Failed to read file: %s", e)
        raise
```

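16.2 Tracking File Descriptors

On Unix-like systems you can inspect the file descriptors a program has open:

```bash
# List the files opened by a process
lsof -p <pid>

# Trace file-related system calls (modern glibc opens files with openat)
strace -f -e trace=open,openat,close,read,write python script.py
```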

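16.3 Using a File Wrapper

A debugging wrapper around a file object:

```python
class DebugFile:
    """Wrap a file object and log reads, writes and closing."""

    def __init__(self, file_obj):
        self.file = file_obj

    def __getattr__(self, name):
        # Delegate all other attributes to the wrapped file object
        return getattr(self.file, name)

    # Special methods are looked up on the class, not through __getattr__,
    # so the with statement needs these two defined explicitly
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False  # do not suppress exceptions

    def read(self, size=-1):
        print(f"read(size={size})")
        return self.file.read(size)

    def write(self, data):
        print(f"write {len(data)} characters/bytes")
        return self.file.write(data)

    def close(self):
        print("close file")
        return self.file.close()

# Usage example
with DebugFile(open('debug.txt', 'w', encoding='utf-8')) as f:
    f.write('debug info')
```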

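17. File Handling in Web Development

17.1 Handling File Uploads

Handling a file upload with a web framework (Flask in this example):

```python
from pathlib import Path

from flask import Flask, request
from werkzeug.utils import secure_filename

app = Flask(__name__)

@app.route('/upload', methods=['POST'])
def upload_file():
    if 'file' not in request.files:
        return 'No file part in the request', 400

    file = request.files['file']
    if file.filename == '':
        return 'No file selected', 400

    # Sanitize the client-supplied name before using it as a path
    filename = secure_filename(file.filename)
    if not filename:
        return 'Invalid file name', 400

    upload_path = Path('uploads') / filename
    upload_path.parent.mkdir(exist_ok=True)
    file.save(upload_path)

    return 'File uploaded successfully', 200
```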

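17.2 Serving Static Files

Serving static files safely:

```python
from pathlib import Path

from flask import abort, send_from_directory

STATIC_DIR = Path('static').resolve()

@app.route('/static/<path:filename>')
def serve_static(filename):
    # Security check: the resolved path must stay inside STATIC_DIR
    # (files in subdirectories are allowed)
    target = (STATIC_DIR / filename).resolve()
    if not target.is_relative_to(STATIC_DIR):  # Path.is_relative_to needs Python 3.9+
        abort(403)

    # send_from_directory() also joins the path safely and returns 404 for missing files
    return send_from_directory(STATIC_DIR, filename)
```

Note that Flask already serves a folder named static at /static/ by default; for a custom view like this one to take effect, create the app with static_folder=None or use a different URL path.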

17.3 Log File Rotation

Rotating log files automatically:

```python
import logging
from logging.handlers import RotatingFileHandler

# Rotate at 10 MB per file and keep 5 backups
handler = RotatingFileHandler(
    'app.log', maxBytes=10*1024*1024, backupCount=5, encoding='utf-8'
)
handler.setFormatter(logging.Formatter(
    '%(asctime)s - %(levelname)s - %(message)s'
))

logger = logging.getLogger()
logger.addHandler(handler)
logger.setLevel(logging.INFO)
```

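18. File Handling and Concurrency

18.1 File Operations from Multiple Threads

Thread-safe file writes:

```python
from threading import Lock

write_lock = Lock()

def thread_safe_write(filename, content):
    # The lock serializes writers inside this process only;
    # it does not protect against other processes
    with write_lock:
        with open(filename, 'a', encoding='utf-8') as f:
            f.write(content + '\n')
```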

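18.2 File Operations from Multiple Processes

Coordinating writes from several processes through a queue, so that only one process ever touches the file:

```python
from multiprocessing import Process, Queue

def writer_process(filename, queue):
    with open(filename, 'a', encoding='utf-8') as f:
        while True:
            line = queue.get()
            if line is None:  # sentinel: stop writing
                break
            f.write(line + '\n')

def main():
    queue = Queue()
    writer = Process(target=writer_process, args=('output.txt', queue))
    writer.start()

    # Multiple producer processes would put data here...
    queue.put("data 1")
    queue.put("data 2")

    # Tell the writer to finish
    queue.put(None)
    writer.join()

# Required where processes are started with "spawn" (Windows, macOS)
if __name__ == '__main__':
    main()
```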

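18.3 Asynchronous File I/O

Asynchronous file operations with the third-party aiofiles package, which delegates the blocking file calls to a thread pool:

```python
import asyncio

import aiofiles

async def async_write():
    async with aiofiles.open('async.txt', 'w', encoding='utf-8') as f:
        await f.write('asynchronously written content')

async def async_read():
    async with aiofiles.open('async.txt', 'r', encoding='utf-8') as f:
        content = await f.read()
        print(content)

asyncio.run(async_write())
asyncio.run(async_read())
```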
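Without a third-party dependency, asyncio.to_thread() (Python 3.9+) runs ordinary blocking file functions in a worker thread so that the event loop stays responsive. A sketch with illustrative helper names:

```python
import asyncio
import os
import tempfile

def write_text(path, text):
    with open(path, 'w', encoding='utf-8') as f:
        f.write(text)

def read_text(path):
    with open(path, 'r', encoding='utf-8') as f:
        return f.read()

async def main(directory):
    paths = [os.path.join(directory, f'file{i}.txt') for i in range(3)]
    # Run the blocking writes in worker threads without blocking the event loop
    await asyncio.gather(*(asyncio.to_thread(write_text, p, f'content {i}')
                           for i, p in enumerate(paths)))
    # gather() returns results in the order the awaitables were passed
    return await asyncio.gather(*(asyncio.to_thread(read_text, p) for p in paths))

with tempfile.TemporaryDirectory() as tmp:
    contents = asyncio.run(main(tmp))

print(contents)  # ['content 0', 'content 1', 'content 2']
```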

19. Integrating Files with Databases

19.1 Bulk-Importing a File into a Database

Importing a CSV file into a database efficiently (the CSV is assumed to have name and value columns):

```python
import csv
import sqlite3

def import_csv_to_db(csv_path, db_path):
    conn = sqlite3.connect(db_path)
    try:
        cursor = conn.cursor()

        # Create the table
        cursor.execute('''
            CREATE TABLE IF NOT EXISTS data (
                id INTEGER PRIMARY KEY,
                name TEXT,
                value REAL
            )
        ''')

        # Bulk insert
        with open(csv_path, 'r', newline='', encoding='utf-8') as f:
            reader = csv.DictReader(f)
            to_insert = [(row['name'], float(row['value'])) for row in reader]

        cursor.executemany(
            'INSERT INTO data (name, value) VALUES (?, ?)',
            to_insert
        )
        conn.commit()
    finally:
        conn.close()
```

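19.2 Exporting a Database Table to a File

Exporting query results to CSV:

```python
import csv
import sqlite3

def export_db_to_csv(db_path, table_name, output_path):
    # Table names cannot be passed as ? parameters, so accept only simple
    # identifiers and quote them; never interpolate untrusted input directly
    if not table_name.isidentifier():
        raise ValueError(f"Invalid table name: {table_name!r}")

    conn = sqlite3.connect(db_path)
    try:
        cursor = conn.cursor()
        # Fetch the data
        cursor.execute(f'SELECT * FROM "{table_name}"')
        rows = cursor.fetchall()
        columns = [desc[0] for desc in cursor.description]
    finally:
        conn.close()

    # Write the CSV
    with open(output_path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(columns)
        writer.writerows(rows)
```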

19.3 Processing a Large File in Batches into a Database

Processing a large file and storing the results in a database:

```python
import sqlite3

def process_large_file(file_path, db_path, batch_size=1000):
    conn = sqlite3.connect(db_path)
    cursor = conn.cursor()
    cursor.execute('CREATE TABLE IF NOT EXISTS records (data TEXT)')

    batch = []
    with open(file_path, 'r', encoding='utf-8') as f:
        for line in f:
            # process_line() stands for your own per-line transformation
            processed = process_line(line.strip())
            batch.append((processed,))

            if len(batch) >= batch_size:
                cursor.executemany(
                    'INSERT INTO records VALUES (?)',
                    batch
                )
                conn.commit()
                batch = []

    # Insert the remaining records
    if batch:
        cursor.executemany('INSERT INTO records VALUES (?)', batch)
        conn.commit()

    conn.close()
```

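20. Future Trends in File Handling

20.1 Cloud Storage Integration

Modern Python applications increasingly integrate with cloud storage, for example Google Cloud Storage through the google-cloud-storage client library:

```python
from google.cloud import storage

def upload_to_gcs(bucket_name, source_path, destination_name):
    client = storage.Client()  # uses the environment's default credentials
    bucket = client.bucket(bucket_name)
    blob = bucket.blob(destination_name)

    blob.upload_from_filename(source_path)
    print(f"File uploaded to gs://{bucket_name}/{destination_name}")
```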

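20.2 In-Memory Files

StringIO and BytesIO from the io module are file-like objects that live entirely in memory (they are not a file system). They can replace temporary files when data never needs to reach the disk:

```python
from io import StringIO, BytesIO

# In-memory text file
text_buffer = StringIO()
text_buffer.write('text held in memory')
print(text_buffer.getvalue())

# In-memory binary file
binary_buffer = BytesIO()
binary_buffer.write(b'\x00\x01\x02')
print(binary_buffer.getvalue())
```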
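Because these objects behave like real files, they also keep a current position. A sketch showing why seek(0) is needed before reading back what was just written, while getvalue() ignores the position:

```python
import io

buffer = io.BytesIO()
buffer.write(b'hello world')

# After writing, the position is at the end, so read() returns nothing
at_end = buffer.read()

# Rewind before reading, exactly as with a real file
buffer.seek(0)
from_start = buffer.read()

# getvalue() returns the whole buffer regardless of the position
whole = buffer.getvalue()

print(at_end, from_start, whole)  # b'' b'hello world' b'hello world'
```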

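20.3 Asynchronous File System Access

The standard library's asyncio does not provide asynchronous file I/O of its own; third-party libraries such as anyio fill the gap by running the blocking file operations in worker threads:

```python
import anyio

async def async_file_ops():
    # Asynchronous file operations with anyio
    async with await anyio.open_file('async.txt', 'w') as f:
        await f.write('asynchronously written content')

    async with await anyio.open_file('async.txt', 'r') as f:
        content = await f.read()
        print(content)

anyio.run(async_file_ops)
```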

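20.4 Recommended Tools

Higher-level file libraries:

- shutil: high-level file operations (copying, moving, removing directory trees, and so on)
- glob: pathname pattern matching
- fnmatch: Unix shell-style filename pattern matching

File monitoring:

- watchdog: monitoring file system events (third-party)
- inotify (Linux): kernel-level file event notifications

Compressed files:

- gzip/zlib/bz2/lzma: standard-library compression modules
- zipfile: ZIP archive handling
- tarfile: TAR archive handling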
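A short sketch combining two of the standard-library tools above: shutil.copytree() copies a directory tree, and zipfile writes and reads a compressed archive (the directory and file names are illustrative):

```python
import os
import shutil
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'src')
    os.makedirs(os.path.join(src, 'docs'))
    with open(os.path.join(src, 'docs', 'readme.txt'), 'w', encoding='utf-8') as f:
        f.write('hello')

    # shutil: copy a whole directory tree (the destination must not exist yet)
    dst = os.path.join(tmp, 'backup')
    shutil.copytree(src, dst)

    # zipfile: write a compressed archive, then read one member back
    archive = os.path.join(tmp, 'bundle.zip')
    with zipfile.ZipFile(archive, 'w', compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(os.path.join(dst, 'docs', 'readme.txt'), arcname='docs/readme.txt')
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()
        text = zf.read('docs/readme.txt').decode('utf-8')

print(names, text)  # ['docs/readme.txt'] hello
```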

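Mastering file operations is a fundamental skill for every Python developer. From basic reading and writing to advanced file system operations and integration with other technologies, file handling runs through the entire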