
Analyzing Spotify Listening Data with Python: From API Access to Visualization

云海天狼

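1. Project Overview: The Secret Garden of Music Data

As a long-time Spotify user, I get the platform's year-end listening report every year, but it always felt like the tip of the iceberg. Only when I started analyzing my own listening data with Python last year did I realize how much is hidden in the play history: how my most-played genres shift over time, how my taste differs by time of day, even play-by-play behavior down to the minute. The project needs no special hardware. A computer that runs Python and an authorized Spotify account are enough to get started.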

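2. Core Tools and Tech Stack

2.1 Connecting to the Spotify Web API

Spotify offers a comprehensive Web API for developers, and its OAuth 2.0 authorization flow gives secure access to user data. I use Spotipy, a widely used community-maintained Python wrapper for the Spotify Web API (not an official Spotify library). It handles the OAuth flow, token refresh and HTTP calls, and by my estimate cuts about 70% of the code compared with calling the endpoints directly. Installation is a single command: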

```bash
pip install spotipy
```

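Authorization requires creating an app in the Spotify Developer Dashboard to obtain a CLIENT_ID and CLIENT_SECRET, and registering a redirect URI there that exactly matches the one your code uses. A practical tip: keep the credentials in environment variables instead of in the source code. This is safer and makes it easy to switch between environments:

```python
import os
import spotipy
from spotipy.oauth2 import SpotifyOAuth

# Set the credentials in your shell, not in the code, e.g.:
#   export SPOTIPY_CLIENT_ID='your_client_id'
#   export SPOTIPY_CLIENT_SECRET='your_client_secret'
#   export SPOTIPY_REDIRECT_URI='http://127.0.0.1:8888/callback'
# (Spotify now requires a loopback IP such as 127.0.0.1 instead of 'localhost'.)
# SpotifyOAuth reads these SPOTIPY_* variables automatically; fail early if one is missing.
for var in ('SPOTIPY_CLIENT_ID', 'SPOTIPY_CLIENT_SECRET', 'SPOTIPY_REDIRECT_URI'):
    if var not in os.environ:
        raise RuntimeError(f'Missing environment variable {var}')

scope = "user-library-read user-top-read user-read-recently-played"
sp = spotipy.Spotify(auth_manager=SpotifyOAuth(scope=scope))
```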

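2.2 Choosing the Analysis Tools

Pandas is the first choice for structured data, and together with Matplotlib and Seaborn it produces polished charts quickly. For time-series exploration I especially recommend Plotly's interactive charts, which let you zoom in on the plays in a specific period. The basic setup:

```python
import numpy as np  # used by the radar chart later
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

plt.style.use('ggplot')  # a cleaner default chart style
```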

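3. Data Collection and Cleaning in Practice

3.1 Building a Complete Listening History

The Spotify API only returns your 50 most recently played tracks, so a complete history has to be accumulated by saving data regularly. I wrote a script that syncs automatically. I run it weekly, but any plays older than the latest 50 at sync time are lost, so heavy listeners should sync more often (for example hourly):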

```python
def get_recent_tracks(limit=50):
    """Fetch up to 50 recently played tracks (the API maximum) as a DataFrame."""
    results = sp.current_user_recently_played(limit=min(limit, 50))
    tracks = []
    for item in results['items']:
        track = item['track']
        tracks.append({
            'played_at': item['played_at'],  # ISO 8601 timestamp in UTC
            'name': track['name'],
            'artist': ', '.join(a['name'] for a in track['artists']),
            'duration_ms': track['duration_ms'],
            'popularity': track['popularity'],
            'uri': track['uri'],
        })
    return pd.DataFrame(tracks)
```
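Since only the last 50 plays are reachable, each sync can ask only for plays newer than the ones already stored. The recently-played endpoint accepts an `after` cursor (a Unix timestamp in milliseconds), which Spotipy exposes as the `after` argument of `current_user_recently_played`. A minimal sketch of computing that cursor, assuming the stored history is a DataFrame with the ISO 8601 `played_at` strings produced by `get_recent_tracks`; `latest_cursor_ms` is a hypothetical helper name:

```python
import pandas as pd

def latest_cursor_ms(history):
    """Return the newest 'played_at' in `history` as Unix milliseconds,
    or None when nothing has been stored yet (hypothetical helper)."""
    if history.empty:
        return None
    latest = pd.to_datetime(history["played_at"], utc=True).max()
    # round() guards against float error such as ...122.9999 for .123 s
    return int(round(latest.timestamp() * 1000))
```

Usage: `sp.current_user_recently_played(limit=50, after=latest_cursor_ms(history))`. On the first run the helper returns `None`, which simply fetches the latest 50 plays.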

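3.2 Data Enrichment and Feature Engineering

The raw records contain only basic metadata; a second round of requests adds each track's audio features. Note that in November 2024 Spotify stopped granting newly created apps access to the Audio Features endpoint, so if your app was registered after that, these calls will likely fail with HTTP 403 and the feature-based analyses below will not work: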

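```python
FEATURE_COLS = ['danceability', 'energy', 'valence', 'tempo', 'key', 'mode']

def add_audio_features(df, batch_size=100):
    """Merge audio features into df (one row per play)."""
    uris = list(df['uri'].unique())
    features = []
    # The endpoint accepts up to 100 tracks per request, so query in batches
    for i in range(0, len(uris), batch_size):
        try:
            results = sp.audio_features(uris[i:i + batch_size]) or []
        except spotipy.SpotifyException as e:  # e.g. 403 for apps without access
            print(f'Skipping batch starting at {i}: {e}')
            continue
        # Tracks without an analysis come back as None
        features.extend(f for f in results if f)

    # Keep only the needed columns: the response also contains duration_ms,
    # which would clash with df's own column in the merge
    feat_df = pd.DataFrame(features, columns=['uri'] + FEATURE_COLS)
    return df.merge(feat_df, on='uri', how='left')
```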

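Key audio features:

danceability: 0 to 1; higher values are more suitable for dancing
energy: 0 to 1; a perceptual measure of intensity and activity
valence: 0 to 1; higher values sound more positive (happy, cheerful), lower values more negative (sad, angry)
tempo: estimated tempo in beats per minute (BPM)
key/mode: key is the pitch class (0 = C, 1 = C♯/D♭, and so on; -1 if no key was detected); mode is 1 for major and 0 for minor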
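To make key and mode readable in tables and charts, the two integers can be combined into a label. A small sketch; `key_name` and `PITCH_CLASSES` are hypothetical names, and sharps are used for the black keys (so 1 maps to "C#" rather than "Db"):

```python
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def key_name(key, mode):
    """Turn Spotify's integer key (pitch class, -1 = not detected) and mode
    (1 = major, 0 = minor) into a label such as 'A minor'."""
    try:
        k = int(key)  # NaN (after a left merge) or None cannot be converted
    except (TypeError, ValueError):
        return "unknown"
    if not 0 <= k <= 11:
        return "unknown"
    return f"{PITCH_CLASSES[k]} {'major' if mode == 1 else 'minor'}"
```

Usage: `df['key_name'] = [key_name(k, m) for k, m in zip(df['key'], df['mode'])]`. Tracks that got no features in the merge have NaN and are labeled "unknown".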

4. In-Depth Analysis and Visualization

4.1 Listening-Time Patterns

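```python
# Parse timestamps. Spotify reports played_at in UTC, so convert to your
# own time zone before extracting hours ('Asia/Shanghai' is just an example)
df['played_at'] = pd.to_datetime(df['played_at'], utc=True).dt.tz_convert('Asia/Shanghai')
df['hour'] = df['played_at'].dt.hour
df['day_of_week'] = df['played_at'].dt.day_name()

# Bar chart of plays per hour of day (hours with no plays shown as 0)
hourly_counts = df.groupby('hour').size().reindex(range(24), fill_value=0)
plt.figure(figsize=(12, 6))
sns.barplot(x=hourly_counts.index, y=hourly_counts.values)
plt.title('Plays by Hour of Day')
plt.xlabel('Hour')
plt.ylabel('Plays')
plt.show()
```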
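The hourly bar chart ignores the day of the week. A heatmap that crosses weekday with hour shows, for instance, whether weekend mornings look different from weekday commutes. A sketch that builds the 7 × 24 count matrix, assuming the `hour` and `day_of_week` columns above and English day names (the pandas default for `day_name()`); `weekday_hour_matrix` is a hypothetical helper:

```python
import pandas as pd

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

def weekday_hour_matrix(df):
    """Count plays per (weekday, hour) cell: 7 rows (Mon-Sun) x 24 columns (0-23)."""
    counts = pd.crosstab(df["day_of_week"], df["hour"])
    # Make sure every weekday and every hour appears, even with zero plays
    return counts.reindex(index=WEEKDAYS, columns=range(24), fill_value=0)
```

Usage: `sns.heatmap(weekday_hour_matrix(df), cmap='YlGnBu')`.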

4.2 Shifts in Musical Taste

A time-based rolling window shows how the audio features drift over time:

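```python
# 30-day rolling mean of the audio features (requires add_audio_features).
# The API returns the newest plays first; time-based rolling needs a sorted index.
df = df.set_index('played_at').sort_index()
rolling_features = df[['danceability', 'energy', 'valence']].rolling('30D').mean()

# Plot the trend of each feature
plt.figure(figsize=(14, 8))
for col in rolling_features.columns:
    sns.lineplot(x=rolling_features.index, y=rolling_features[col], label=col)
plt.title('Audio Feature Trends')
plt.ylabel('Feature value')
plt.xlabel('Date')
plt.show()
```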
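The rolling mean is smooth but hard to read off month by month. As a complement, calendar-month averages are easy to compare. A sketch, assuming `df` has the sorted `DatetimeIndex` and merged feature columns from above; `monthly_feature_means` is a hypothetical name. Each play counts once, so a track on repeat weighs more than one heard a single time:

```python
import pandas as pd

def monthly_feature_means(df, cols=("danceability", "energy", "valence")):
    """Average each audio feature per calendar month ('MS' = month start).
    Assumes df has a DatetimeIndex of play times."""
    return df[list(cols)].sort_index().resample("MS").mean()
```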

5. Advanced Techniques

5.1 Artist Network Graph

Build an artist network from how often artists are played on the same day:

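```python
from itertools import combinations
import networkx as nx

# Collect pairs of artists played on the same day (uses the DatetimeIndex set above)
artist_pairs = []
for _, group in df.groupby(pd.Grouper(freq='D')):
    artists = list(group['artist'].unique())
    artist_pairs.extend(combinations(artists, 2))

# Undirected graph; edge weight = number of days the two artists co-occurred
G = nx.Graph()
for pair in artist_pairs:
    if G.has_edge(*pair):
        G.edges[pair]['weight'] += 1
    else:
        G.add_edge(*pair, weight=1)

# Draw the network
plt.figure(figsize=(16, 12))
pos = nx.spring_layout(G, k=0.3, seed=42)  # fixed seed for a reproducible layout
nx.draw_networkx_nodes(G, pos, node_size=50)
nx.draw_networkx_edges(G, pos, alpha=0.2)
nx.draw_networkx_labels(G, pos, font_size=8)
plt.axis('off')
plt.show()
```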
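On a long history the graph gets very dense, because every pair of artists heard on the same day gets an edge. One way to thin it, sketched here as an addition to the original approach, is to count co-occurrences first and keep only pairs seen on at least `min_weight` days; `count_cooccurrences` is a hypothetical helper and the default threshold of 2 is arbitrary:

```python
from collections import Counter
from itertools import combinations

def count_cooccurrences(daily_artists, min_weight=2):
    """daily_artists: iterable of artist lists, one list per day.
    Returns a Counter mapping (artist_a, artist_b) -> number of days both were played,
    keeping only pairs seen on at least `min_weight` days."""
    counts = Counter()
    for artists in daily_artists:
        # sorted(set(...)) makes ('A', 'B') and ('B', 'A') the same key
        counts.update(combinations(sorted(set(artists)), 2))
    return Counter({pair: w for pair, w in counts.items() if w >= min_weight})
```

Usage: build `daily = [list(g['artist'].unique()) for _, g in df.groupby(pd.Grouper(freq='D'))]`, then `G.add_weighted_edges_from((a, b, w) for (a, b), w in count_cooccurrences(daily).items())` on a fresh `nx.Graph()`.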

5.2 Forecasting Play Counts

Use Prophet to forecast future listening:

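```python
from prophet import Prophet

# Daily play counts as a time series (days with no plays count as 0)
play_counts = df.resample('D').size().reset_index()
play_counts.columns = ['ds', 'y']
# Prophet rejects time-zone-aware timestamps, so drop the zone (keeps local time)
play_counts['ds'] = play_counts['ds'].dt.tz_localize(None)

# Fit the model
model = Prophet(seasonality_mode='multiplicative')
model.fit(play_counts)

# Forecast the next 30 days
future = model.make_future_dataframe(periods=30)
forecast = model.predict(future)

# Plot the forecast
fig = model.plot(forecast)
```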
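Whether Prophet is worth it is easier to judge against a simple reference. A common one, not part of the original workflow, is a seasonal-naive baseline that predicts each future day as the average count of the same weekday in the history. A sketch, assuming `play_counts` has the `ds`/`y` columns prepared above; `weekday_baseline` is a hypothetical name:

```python
import pandas as pd

def weekday_baseline(play_counts, periods=30):
    """Seasonal-naive baseline: forecast each future day as the mean play count
    of the same weekday in the history. play_counts has 'ds' (dates) and 'y'."""
    ds = pd.to_datetime(play_counts["ds"])
    weekday_mean = play_counts["y"].groupby(ds.dt.dayofweek).mean()
    future = pd.date_range(ds.max() + pd.Timedelta(days=1), periods=periods, freq="D")
    # Look up the historical mean for each future day's weekday (0 = Monday)
    return pd.DataFrame({"ds": future,
                         "yhat": weekday_mean.reindex(future.dayofweek).to_numpy()})
```

If Prophet's forecast is not clearly better than this on held-out days, the extra model adds little.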

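6. Practical Tips and Pitfalls

Optimizing API calls:

Caching: Spotipy's cache_path option (on SpotifyOAuth) caches only the OAuth token, not API responses, and sp._session is a private requests.Session with no caching switch. To cache responses, pass a caching session (for example requests_cache.CachedSession) through the requests_session argument of spotipy.Spotify
Fetch audio features in batches (up to 100 tracks per request)
Throttle requests sensibly: Spotify computes its rate limit over a rolling 30-second window and does not publish a fixed number; about 5 requests per second has worked well for me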
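A client-side throttle for the roughly 5 requests per second suggested above could look like the sketch below. It spaces calls evenly; the `Throttle` class and its default rate are assumptions, not Spotify limits, and the injectable `clock`/`sleep` arguments exist only to make it testable:

```python
import time

class Throttle:
    """Allow at most `rate` calls per second by sleeping between calls.
    The default of 5/s mirrors the suggestion above, not an official limit."""

    def __init__(self, rate=5.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / rate
        self.clock = clock
        self.sleep = sleep
        self.last_call = None

    def wait(self):
        now = self.clock()
        if self.last_call is not None:
            remaining = self.min_interval - (now - self.last_call)
            if remaining > 0:
                self.sleep(remaining)
                now += remaining
        self.last_call = now
```

Usage: create `throttle = Throttle(5)` once and call `throttle.wait()` before each `sp.` request.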

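Data storage:

```python
# Persist plays in SQLite
import sqlite3

# Save the raw batch from get_recent_tracks(), where played_at is still a column
# (after set_index('played_at') it would be dropped by index=False)
new_plays = get_recent_tracks()
conn = sqlite3.connect('spotify_data.db')
new_plays.to_sql('play_history', conn, if_exists='append', index=False)
conn.close()
```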
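Because consecutive syncs overlap, appending every batch with `if_exists='append'` stores the same plays more than once. One way around this, assuming a single user so that `played_at` uniquely identifies a play, is to make `played_at` the primary key and use SQLite's `INSERT OR IGNORE`. A sketch using the standard `sqlite3` module; the table schema and `save_plays` are hypothetical, and the table must be created by this function (an existing `play_history` created by `to_sql` has no primary key):

```python
import sqlite3

def save_plays(conn, rows):
    """Insert plays, silently skipping ones already stored.
    rows: iterable of dicts with the keys produced by get_recent_tracks()."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS play_history (
               played_at TEXT PRIMARY KEY, name TEXT, artist TEXT,
               duration_ms INTEGER, popularity INTEGER, uri TEXT)"""
    )
    # Named placeholders are filled from each dict; duplicates hit the PRIMARY KEY and are ignored
    conn.executemany(
        "INSERT OR IGNORE INTO play_history VALUES "
        "(:played_at, :name, :artist, :duration_ms, :popularity, :uri)",
        list(rows),
    )
    conn.commit()
```

Usage: `save_plays(conn, get_recent_tracks().to_dict('records'))`.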

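Handling common errors:

401 Unauthorized: check whether the access token has expired (tokens are valid for one hour); SpotifyOAuth normally refreshes it automatically using the cached refresh token
429 Too Many Requests: retry with exponential backoff, waiting at least as long as the Retry-After header asks
Missing data: wrap per-item processing in try/except so one bad item is skipped instead of aborting the whole run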
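A sketch of the exponential backoff mentioned above. It retries only errors that carry `http_status == 429` (as spotipy's `SpotifyException` does), honors a `Retry-After` header when present, and otherwise waits 1 s, 2 s, 4 s, and so on, plus random jitter. As far as I know, Spotipy's own HTTP session already retries some 429 responses by default, so this matters mainly when those retries run out or are disabled; `call_with_backoff` is a hypothetical helper:

```python
import random
import time

def call_with_backoff(fn, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on HTTP 429 retry with exponential backoff plus jitter."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception as e:
            # Only rate-limit errors are retried; anything else propagates
            if getattr(e, "http_status", None) != 429 or attempt == max_retries:
                raise
            headers = getattr(e, "headers", None) or {}
            retry_after = headers.get("Retry-After")
            if retry_after is not None:
                delay = float(retry_after)  # the server says how long to wait
            else:
                delay = base_delay * 2 ** attempt + random.uniform(0, 1)
            sleep(delay)
```

Usage: `recent = call_with_backoff(lambda: sp.current_user_recently_played(limit=50))`.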

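Chart styling tips:

```python
# Radar chart of audio features
import numpy as np
import matplotlib.pyplot as plt

def create_radar_chart(features):
    """features: dict of feature name -> value in [0, 1] (e.g. mean danceability/energy/valence)."""
    categories = list(features.keys())
    values = list(features.values())

    angles = np.linspace(0, 2 * np.pi, len(categories), endpoint=False)
    # Close the polygon by repeating the first point; angles is a NumPy array,
    # so `+=` would add element-wise instead of appending
    values = values + values[:1]
    angles = np.concatenate([angles, angles[:1]])

    fig, ax = plt.subplots(figsize=(8, 8), subplot_kw=dict(polar=True))
    ax.plot(angles, values, color='skyblue')
    ax.fill(angles, values, color='skyblue', alpha=0.6)
    ax.set_xticks(angles[:-1])
    ax.set_xticklabels(categories)
    ax.set_ylim(0, 1)  # features outside 0-1 (e.g. tempo) must be rescaled first
    return fig
```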
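The radar's radial axis only makes sense if every feature is on the same 0 to 1 scale, which tempo (BPM) is not. A sketch of min-max scaling before plotting; `min_max_scale` is a hypothetical helper, and the scaled values describe where a feature sits within your own listening range rather than on Spotify's absolute scale:

```python
import pandas as pd

def min_max_scale(df, cols):
    """Rescale each column to [0, 1]; constant columns become 0."""
    sub = df[cols].astype(float)
    span = (sub.max() - sub.min()).replace(0, 1)  # avoid dividing by zero
    return (sub - sub.min()) / span
```

Usage: `create_radar_chart(min_max_scale(df, ['danceability', 'energy', 'valence', 'tempo']).mean().to_dict())`.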

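7. Extension Ideas

A live listening dashboard:

Build a web app with Dash or Streamlit
Add automatic refresh (e.g. every 10 minutes)
Integrate the Spotify Web Playback SDK for playback control (it requires a Spotify Premium account)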

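A music recommender:

```python
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

FEATURES = ['danceability', 'energy', 'valence', 'tempo']

# One row per track with complete features (otherwise repeat plays become "neighbors")
tracks = df.drop_duplicates('uri').dropna(subset=FEATURES).reset_index(drop=True)
# Standardize so tempo (roughly 60-200 BPM) does not dominate the 0-1 features
scaler = StandardScaler().fit(tracks[FEATURES])
knn = NearestNeighbors(n_neighbors=5).fit(scaler.transform(tracks[FEATURES]))

# Find the tracks in your history most similar to a given track
# (a track already in your history returns itself first)
def find_similar_tracks(track_id):
    track_features = sp.audio_features([track_id])[0]
    input_features = pd.DataFrame([track_features])[FEATURES]
    distances, indices = knn.kneighbors(scaler.transform(input_features))
    return tracks.iloc[indices[0]][['name', 'artist']]
```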

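Cross-platform data integration:

Sync data from other platforms such as Last.fm
Use the TheAudioDB API to add album information
Build a personal music knowledge graph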