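Data analysts often run into an awkward situation: carefully processed results turn into a pile of hard-to-read numbers once they are handed to non-technical colleagues. The pandas to_html and to_excel methods address exactly this pain point. They turn a DataFrame into a web page or a spreadsheet report that the rest of the team can actually read. Below we walk through a complete e-commerce user behavior analysis example to show how roughly five lines of core code can produce a professional deliverable.
Suppose we have a cleaned e-commerce user behavior dataset containing key metrics such as user ID, visit date, session duration, and conversion rate. The raw DataFrame holds all the information, but it is not very readable for the operations team: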
```python
import pandas as pd

df = pd.read_csv('user_behavior.csv')
print(df.head(3))
```
```
   user_id        date  duration  conversion
0    10001  2023-05-01       327        0.12
1    10002  2023-05-01       455        0.18
2    10003  2023-05-02       112        0.05
```
A single call is enough to generate a shareable web version of the data:
```python
df.to_html('report_basic.html',
           float_format='{:.1%}'.format,  # show conversion as a percentage (applies to every float column)
           index=False)                   # hide the default index
```
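Because float_format is applied to every float column, a second float column would also be rendered as a percentage. The following is a minimal sketch of per-column formatting with the formatters parameter. The sample frame and its avg_duration column are made up for illustration and are not part of the dataset above.

```python
import pandas as pd

# Made-up sample data; avg_duration is a hypothetical extra float column
sample = pd.DataFrame({
    'user_id': [10001, 10002],
    'avg_duration': [327.5, 455.25],
    'conversion': [0.12, 0.18],
})

# formatters maps column names to one-argument functions, so only
# conversion is rendered as a percentage
html = sample.to_html(
    index=False,
    formatters={
        'conversion': '{:.1%}'.format,
        'avg_duration': '{:.0f} s'.format,
    },
)
print(html)
```

When no file path is passed, to_html returns the markup as a string, which is convenient for embedding the table in a larger page.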
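The generated HTML table is fully functional but plainly styled. The classes parameter attaches the CSS classes of a framework such as Bootstrap to the table (the page that embeds the table still has to load the Bootstrap stylesheet):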
```python
df.to_html('report_styled.html',
           classes=['table', 'table-striped', 'table-hover'],
           border=0,
           float_format='{:.1%}'.format)
```
To make the report look more professional, we can add custom CSS and interactive elements. Here is a complete example:
```python
# Generate HTML with styling and interaction
html_template = """
<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8">
    <title>User Behavior Analysis Report</title>
    <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.1.3/dist/css/bootstrap.min.css" rel="stylesheet">
    <style>
        .highlight { background-color: #ffecb3 !important; }
        .positive { color: #28a745; }
        .negative { color: #dc3545; }
    </style>
</head>
<body>
    <div class="container mt-4">
        <h2 class="mb-4">May User Behavior Analysis</h2>
        __TABLE__
    </div>
    <script src="https://code.jquery.com/jquery-3.6.0.min.js"></script>
    <script>
        // :contains() is a substring match on the cell text
        $(document).ready(function(){
            $('td:contains("0.0%")').addClass('negative');
            $('td:contains("0.1")').addClass('positive');
        });
    </script>
</body>
</html>
"""

# str.replace is used instead of the % operator: the literal "%" inside the
# jQuery selector would be parsed as a format specifier and raise ValueError.
table_html = df.to_html(classes='table', index=False)
with open('report_pro.html', 'w', encoding='utf-8') as f:
    f.write(html_template.replace('__TABLE__', table_html))
```
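This report combines Bootstrap table styling, custom CSS classes, and a small jQuery script that colors cells according to their text.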
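Matching cells by substring in the browser is fragile, so one alternative is to decide the CSS class in Python while the table is rendered. The sketch below assumes a made-up sample frame and a hypothetical 10% threshold; it reuses the positive and negative class names from the stylesheet above.

```python
import pandas as pd

# Made-up sample data for illustration
sample = pd.DataFrame({
    'user_id': [10001, 10002, 10003],
    'conversion': [0.12, 0.18, 0.05],
})

THRESHOLD = 0.10  # hypothetical cut-off between "positive" and "negative"

def color_conversion(value):
    """Wrap the formatted value in a span carrying the CSS class."""
    css_class = 'positive' if value >= THRESHOLD else 'negative'
    return f'<span class="{css_class}">{value:.1%}</span>'

# escape=False keeps the generated <span> tags as markup instead of
# converting them to &lt;span&gt; text
table_html = sample.to_html(
    classes='table',
    index=False,
    formatters={'conversion': color_conversion},
    escape=False,
)
print(table_html)
```

Note that escape=False disables escaping for every cell, so this approach is only appropriate when the data itself is trusted.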
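Compared with HTML, an Excel report is a better fit when recipients need to analyze the data further. pandas can write a basic workbook through openpyxl in a single call: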
```python
df.to_excel('report_basic.xlsx',
            sheet_name='User Behavior',
            float_format='%.2f')  # keep two decimal places
```
With the openpyxl engine we can add advanced features such as conditional formatting and charts:
```python
from openpyxl.chart import BarChart, Reference
from openpyxl.formatting.rule import ColorScaleRule

with pd.ExcelWriter('report_advanced.xlsx', engine='openpyxl') as writer:
    # index=False keeps the layout at A=user_id, B=date, C=duration,
    # D=conversion, which is what the cell references below assume
    df.to_excel(writer, sheet_name='Raw Data', index=False)

    # Get the worksheet object for formatting
    worksheet = writer.sheets['Raw Data']

    # Conditional formatting: red-to-green color scale on the conversion column
    rule = ColorScaleRule(start_type='min', start_color='FF0000',
                          end_type='max', end_color='00FF00')
    worksheet.conditional_formatting.add('D2:D100', rule)

    # Bar chart of duration and conversion for the first nine data rows
    chart = BarChart()
    data = Reference(worksheet, min_col=3, max_col=4, min_row=1, max_row=10)
    chart.add_data(data, titles_from_data=True)
    worksheet.add_chart(chart, "F2")
```
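The resulting Excel file contains the raw data sheet, a color-scale conditional format on column D, and an embedded bar chart.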
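A hard-coded range such as D2:D100 silently goes stale when the number of rows or the column order changes. As a sketch, the range can be derived from the DataFrame instead. The helper name column_range and its has_index flag are illustrative assumptions, and the helper assumes a single header row and a single-level index.

```python
import pandas as pd
from openpyxl.utils import get_column_letter

def column_range(df, column, has_index=False):
    """Return the A1-style range covering the data cells of `column`.

    Assumes one header row; has_index=True accounts for the extra
    leading column that to_excel writes when index=True.
    """
    offset = 1 if has_index else 0
    col_number = df.columns.get_loc(column) + 1 + offset
    letter = get_column_letter(col_number)
    return f'{letter}2:{letter}{len(df) + 1}'

# Made-up sample data mirroring the column layout used in this article
sample = pd.DataFrame({
    'user_id': [10001, 10002, 10003],
    'date': ['2023-05-01', '2023-05-01', '2023-05-02'],
    'duration': [327, 455, 112],
    'conversion': [0.12, 0.18, 0.05],
})

print(column_range(sample, 'conversion'))                  # written with index=False
print(column_range(sample, 'conversion', has_index=True))  # written with index=True
```

The returned string can be passed directly to worksheet.conditional_formatting.add in place of the literal range.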
Let's walk through a complete example that covers the full workflow from raw data to deliverables:
```python
# Read the raw data
raw_df = pd.read_csv('user_logs.csv')

# Clean the data: one row per user per day
clean_df = (raw_df
    .dropna(subset=['user_id', 'timestamp'])
    .assign(date=lambda x: pd.to_datetime(x['timestamp']).dt.date)
    .groupby(['date', 'user_id'])
    .agg({'duration': 'sum', 'purchase': 'max'})
    .reset_index()
    # conversion here is purchases per 1000 seconds of session time
    .assign(conversion=lambda x: x['purchase'] / x['duration'] * 1000)
)
```
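To see what this pipeline produces, here is a sketch that runs the same steps on a tiny in-memory frame. The rows are invented purely for illustration; only the column names come from the code above.

```python
import pandas as pd

# Invented log rows: u1 has two sessions on the same day,
# and the last row has no user_id and should be dropped
raw_df = pd.DataFrame({
    'user_id': ['u1', 'u1', 'u2', None],
    'timestamp': ['2023-05-01 09:00', '2023-05-01 18:30',
                  '2023-05-01 10:15', '2023-05-02 08:00'],
    'duration': [100, 150, 200, 50],
    'purchase': [0, 1, 0, 1],
})

clean_df = (raw_df
    .dropna(subset=['user_id', 'timestamp'])
    .assign(date=lambda x: pd.to_datetime(x['timestamp']).dt.date)
    .groupby(['date', 'user_id'])
    .agg({'duration': 'sum', 'purchase': 'max'})
    .reset_index()
    .assign(conversion=lambda x: x['purchase'] / x['duration'] * 1000)
)

print(clean_df)
```

The two u1 sessions collapse into a single row with a summed duration of 250 seconds, and the row without a user_id disappears. A session with a duration of zero would yield an infinite or NaN conversion, so real data may need an extra filter before the division.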
```python
# Generate one daily report per date
for date, group in clean_df.groupby('date'):
    html_report = f"""
<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8">
    <title>{date} User Behavior Daily Report</title>
    <style>
        .summary {{
            background: #f8f9fa;
            padding: 15px;
            margin-bottom: 20px;
            border-radius: 5px;
        }}
        .metric {{
            font-size: 1.2em;
            font-weight: bold;
        }}
    </style>
</head>
<body>
    <div class="summary">
        <h2>{date} Key Metrics</h2>
        <div>Active users: <span class="metric">{len(group)}</span></div>
        <div>Average duration: <span class="metric">{group['duration'].mean():.0f}</span> s</div>
        <div>Conversion: <span class="metric">{group['conversion'].mean():.2f}%</span></div>
    </div>
    {group.to_html(classes='table', index=False)}
</body>
</html>
"""
    with open(f'report_{date}.html', 'w', encoding='utf-8') as f:
        f.write(html_report)
```
```python
from openpyxl.chart import BarChart, Reference

# Build the weekly summary
weekly_df = (clean_df
    .groupby(pd.to_datetime(clean_df['date']).dt.to_period('W'))
    .agg({'user_id': 'count', 'duration': 'mean', 'conversion': 'mean'})
    .rename(columns={'user_id': 'active_users'})
)

# Create a formatted Excel weekly report
with pd.ExcelWriter('weekly_report.xlsx', engine='openpyxl') as writer:
    weekly_df.to_excel(writer, sheet_name='Weekly Summary')

    # Get the worksheet object
    worksheet = writer.sheets['Weekly Summary']
    last_row = len(weekly_df) + 1

    # Two-decimal number format for the duration and conversion columns (C, D)
    for row in worksheet.iter_rows(min_row=2, max_row=last_row, min_col=3, max_col=4):
        for cell in row:
            cell.number_format = '0.00'

    # openpyxl does not provide a sparkline API, so a compact bar chart of
    # active users stands in for an in-cell sparkline here. XlsxWriter's
    # worksheet.add_sparkline is an option when true sparklines are required.
    chart = BarChart()
    chart.height, chart.width = 5, 10
    data = Reference(worksheet, min_col=2, min_row=1, max_row=last_row)
    chart.add_data(data, titles_from_data=True)
    worksheet.add_chart(chart, 'F2')
```
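Two details of the weekly grouping are easy to miss: to_period('W') uses weeks that run Monday through Sunday by default, and aggregating user_id with count tallies user-day rows rather than distinct users. The sketch below illustrates both on invented rows.

```python
import pandas as pd

# Invented user-day rows: u1 is active on two days of the first week
daily = pd.DataFrame({
    'date': ['2023-05-01', '2023-05-02', '2023-05-07', '2023-05-08'],
    'user_id': ['u1', 'u1', 'u2', 'u1'],
})

# 2023-05-01 is a Monday and 2023-05-07 a Sunday, so both fall in one week
weeks = pd.to_datetime(daily['date']).dt.to_period('W')

weekly = daily.groupby(weeks)['user_id'].agg(['count', 'nunique'])
labels = [str(period) for period in weekly.index]

print(labels)
print(weekly)
```

If the report is meant to show distinct active users per week, nunique is likely the aggregation to use; count fits when user-days are the intended metric.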
When the DataFrame is large, the following optimizations can help:
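```python
# Process large data in chunks, one worksheet per chunk
chunk_size = 10000
with pd.ExcelWriter('large_report.xlsx', engine='openpyxl') as writer:
    for i, chunk in enumerate(pd.read_csv('large_data.csv', chunksize=chunk_size)):
        chunk.to_excel(writer, sheet_name=f'Part {i+1}', index=False)

# Use an efficient columnar format for intermediate files
# (requires pyarrow or fastparquet; usually much faster to read and write than CSV)
df.to_parquet('temp.parquet')
```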
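Large frames are also a problem on the HTML side, since a table with thousands of rows is slow to open in a browser. A minimal sketch, using an invented 1000-row frame, of producing a truncated preview with the max_rows parameter of to_html:

```python
import pandas as pd

# Invented data: 1000 rows is enough to show the size difference
big = pd.DataFrame({'user_id': range(1000), 'duration': range(1000)})

full_html = big.to_html(index=False)

# max_rows keeps the head and tail of the frame and elides the middle
preview_html = big.to_html(index=False, max_rows=10)

print(len(full_html), len(preview_html))
```

The preview works well as an email body or landing page, with the complete data offered as an Excel or Parquet download.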
Combine the export with a scheduled job to generate the daily report automatically:
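```python
import time

import pandas as pd
import schedule  # third-party package: pip install schedule

def generate_daily_report():
    today = pd.Timestamp.now().date()
    df = get_todays_data()  # user-defined data loading function
    # the reports/ directory must already exist
    df.to_html(f'reports/daily_{today}.html')

# Generate the report every day at 6 p.m.
schedule.every().day.at("18:00").do(generate_daily_report)

while True:
    schedule.run_pending()
    time.sleep(60)
```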
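A scheduled job fails if the output directory does not exist at write time. The sketch below builds a dated output path and creates the folder on demand. The helper name daily_report_path and the per-month folder layout are assumptions made for illustration, and a temporary directory stands in for the real reports folder.

```python
import tempfile
from datetime import date
from pathlib import Path

def daily_report_path(base_dir, day):
    """Return base_dir/YYYY-MM/daily_YYYY-MM-DD.html, creating the folder."""
    folder = Path(base_dir) / f'{day:%Y-%m}'
    folder.mkdir(parents=True, exist_ok=True)
    return folder / f'daily_{day.isoformat()}.html'

# Demonstration in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    path = daily_report_path(tmp, date(2023, 5, 1))
    path.write_text('<p>placeholder report</p>', encoding='utf-8')
    created = path.exists()

print(path.name, created)
```

Inside the scheduled function, the returned path can be passed straight to df.to_html.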
Pandas reports can also be integrated with other tools:
```python
# Email integration
import smtplib
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart

def send_report_email():
    msg = MIMEMultipart()
    msg['Subject'] = 'Daily Data Report'
    msg['From'] = 'data@company.com'
    msg['To'] = 'team@company.com'
    # Read the report with an explicit encoding and close the file afterwards
    with open('report.html', encoding='utf-8') as f:
        msg.attach(MIMEText(f.read(), 'html', 'utf-8'))
    with smtplib.SMTP('smtp.example.com') as server:
        server.sendmail('data@company.com', 'team@company.com', msg.as_string())

# Web framework integration
from flask import Flask, send_file

app = Flask(__name__)

@app.route('/report')
def download_report():
    df.to_excel('temp_report.xlsx')
    return send_file('temp_report.xlsx', as_attachment=True)
```
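When the email should carry both an HTML body and the Excel file, the standard library's EmailMessage class handles the multipart structure. This is a sketch under assumptions: the function name build_report_message is illustrative, the addresses are the placeholders used above, and the attachment bytes are a stand-in for a real workbook.

```python
from email.message import EmailMessage

XLSX_SUBTYPE = 'vnd.openxmlformats-officedocument.spreadsheetml.sheet'

def build_report_message(html_body, xlsx_bytes, sender, recipient):
    """Build a message with a plain-text fallback, an HTML body, and an attachment."""
    msg = EmailMessage()
    msg['Subject'] = 'Daily Data Report'
    msg['From'] = sender
    msg['To'] = recipient
    # Plain-text fallback for clients that do not render HTML
    msg.set_content('This report is best viewed in an HTML-capable mail client.')
    msg.add_alternative(html_body, subtype='html')
    msg.add_attachment(
        xlsx_bytes,
        maintype='application',
        subtype=XLSX_SUBTYPE,
        filename='report.xlsx',
    )
    return msg

# Placeholder inputs; real code would read report.html and the .xlsx file
message = build_report_message(
    '<h1>Daily Report</h1>',
    b'placeholder bytes, not a real workbook',
    'data@company.com',
    'team@company.com',
)
attached_names = [part.get_filename() for part in message.walk() if part.get_filename()]
print(message.get_content_type(), attached_names)
```

The resulting object can be handed to smtplib's send_message method on an open SMTP connection, which takes the sender and recipient from the message headers.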