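This scenic-spot visitor-traffic prediction and visual analysis system, built with Python and Django, is a hands-on project of mine in tourism big data. It uses a machine learning algorithm to predict visitor traffic at scenic spots and presents the results through visualizations, giving tourism managers a data-driven decision-support tool.
Core features include:
For the tech stack, I chose Django as the back-end framework because its MVT architecture and built-in ORM make it quick to build data-heavy applications. The front-end visualizations use ECharts, and the machine learning part is implemented with scikit-learn.
Back-end architecture:
Machine learning:
Front-end technology:
The system mainly uses the following database tables: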
CREATE TABLE `tourist` (
  `id` int NOT NULL AUTO_INCREMENT,
  `city` varchar(50) NOT NULL COMMENT 'City',
  `name` varchar(100) NOT NULL COMMENT 'Scenic spot name',
  `level` varchar(10) DEFAULT NULL COMMENT 'Scenic spot rating level',
  `score` decimal(3,1) DEFAULT NULL COMMENT 'Review score',
  `price` decimal(10,2) DEFAULT NULL COMMENT 'Price',
  `sales` int DEFAULT NULL COMMENT 'Visitor traffic',
  `address` varchar(255) DEFAULT NULL COMMENT 'Address',
  `describe` text COMMENT 'Description',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
CREATE TABLE `user` (
  `id` int NOT NULL AUTO_INCREMENT,
  `username` varchar(50) NOT NULL,
  `password` varchar(255) NOT NULL,
  `email` varchar(100) DEFAULT NULL,
  `phone` varchar(20) DEFAULT NULL,
  `address` varchar(255) DEFAULT NULL,
  `gender` varchar(10) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `username` (`username`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
The core code of the prediction feature is as follows:
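import pandas as pd
from django.conf import settings
from django.http import JsonResponse
from django.shortcuts import render
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sqlalchemy import create_engine


def predict(request):
    if request.method == 'POST':
        # Database connection settings (custom keys defined in settings.py)
        db_host = settings.DATABASE_HOST
        db_username = settings.DATABASE_USER
        db_password = settings.DATABASE_PSW
        db_port = settings.DATABASE_PORT
        db_name = settings.DATABASE_NAME
        # Create the database connection
        engine = create_engine(
            f'mysql+pymysql://{db_username}:{db_password}@{db_host}:{db_port}/{db_name}'
        )
        # Load the table and keep only the columns the model uses
        data = pd.read_sql('tourist', con=engine)
        data = data[['level', 'score', 'price', 'sales']].copy()
        data['level'] = data['level'].fillna('0A')
        # Coerce numeric columns and drop rows with missing values
        numeric_cols = ['score', 'price', 'sales']
        data[numeric_cols] = data[numeric_cols].apply(pd.to_numeric, errors='coerce')
        data = data.dropna()
        # Encode the level label as an integer
        label_encoder = LabelEncoder()
        data['level'] = label_encoder.fit_transform(data['level'].astype(str))
        # Standardize the features
        scaler = StandardScaler()
        features = ['level', 'score', 'price']
        X = data[features]
        y = data['sales']
        X_scaled = scaler.fit_transform(X)
        # Train the model (note: it is refit on every request)
        model = LinearRegression()
        model.fit(X_scaled, y)
        try:
            # Parse user input inside the try block so that missing or
            # malformed values return a 400 instead of a server error
            level = request.POST.get('level')
            price = float(request.POST.get('price'))
            score = float(request.POST.get('score'))
            new_data = pd.DataFrame({
                'level': [level],
                'score': [score],
                'price': [price]
            })
            # transform() raises ValueError for a level not seen during fitting
            new_data['level'] = label_encoder.transform(new_data['level'])
            X_new = scaler.transform(new_data[features])
            # Convert the NumPy scalar to a plain int so it serializes to JSON
            y_pred = int(round(float(model.predict(X_new)[0])))
            return JsonResponse({'prediction': y_pred})
        except Exception as e:
            return JsonResponse({'error': str(e)}, status=400)
    return render(request, 'predict.html')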
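The preprocessing steps in the view are easier to follow with a small worked example. The sketch below reproduces, using only the standard library, what `LabelEncoder` and `StandardScaler` do: labels are mapped to their position in sorted order, and a column is shifted to zero mean and divided by its population standard deviation. The helper names and sample values are made up for illustration and are not taken from the project.

```python
from statistics import mean, pstdev


def label_encode(values):
    """Map each label to its index in the sorted set of labels."""
    classes = sorted(set(values))
    index = {label: i for i, label in enumerate(classes)}
    return [index[v] for v in values], classes


def standardize(column):
    """Scale a column to zero mean and unit variance (population std, ddof=0)."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        # A constant column carries no information; just center it.
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


# Made-up sample rows, for illustration only.
levels = ["5A", "4A", "0A", "4A"]
prices = [120.0, 60.0, 0.0, 60.0]

encoded, classes = label_encode(levels)
scaled_prices = standardize(prices)

print(classes)        # labels in sorted order
print(encoded)        # integer code for each row
print(scaled_prices)  # zero-mean, unit-variance prices
```

Because the codes follow sort order, the `'0A'` placeholder used for missing levels receives the smallest code. The linear model then treats the code as an ordinal number, which only makes sense if the level labels happen to sort by rank.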
Key code for the ECharts visualizations:
// Bar chart of the scenic-spot score distribution
function initScoreChart() {
    const chartDom = document.getElementById('score-chart');
    const myChart = echarts.init(chartDom);
    $.get('/api/score-distribution/', function(data) {
        const option = {
            title: { text: 'Score Distribution' },
            tooltip: {},
            xAxis: {
                data: data.score_ranges,
                name: 'Score range'
            },
            yAxis: { name: 'Number of spots' },
            series: [{
                name: 'Count',
                type: 'bar',
                data: data.counts,
                itemStyle: {
                    // Cycle through a fixed palette, one color per bar
                    color: function(params) {
                        const colorList = ['#c23531','#2f4554','#61a0a8','#d48265','#91c7ae'];
                        return colorList[params.dataIndex % colorList.length];
                    }
                }
            }]
        };
        myChart.setOption(option);
    });
}
// Heatmap of visitor traffic by city
function initCityHeatmap() {
    const chartDom = document.getElementById('city-heatmap');
    const myChart = echarts.init(chartDom);
    $.get('/api/city-traffic/', function(data) {
        const option = {
            title: { text: 'Traffic by City' },
            tooltip: {
                position: 'top'
            },
            // A heatmap series needs a coordinate system. Category axes are
            // assumed here; x_labels and y_labels are hypothetical response fields.
            xAxis: { type: 'category', data: data.x_labels },
            yAxis: { type: 'category', data: data.y_labels },
            visualMap: {
                min: 0,
                max: data.max_value,
                calculable: true,
                orient: 'horizontal',
                left: 'center',
                bottom: '5%'
            },
            series: [{
                name: 'Traffic intensity',
                type: 'heatmap',
                data: data.heat_data,
                label: {
                    show: true
                },
                emphasis: {
                    itemStyle: {
                        shadowBlur: 10,
                        shadowColor: 'rgba(0, 0, 0, 0.5)'
                    }
                }
            }]
        };
        myChart.setOption(option);
    });
}
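The bar chart reads `score_ranges` and `counts` from `/api/score-distribution/`, but the endpoint itself is not shown. As a rough sketch of how that payload could be built on the Python side, the function below bins a list of scores into ranges. The function name, the bin edges, and the sample scores are assumptions made for illustration; in a Django view the returned dict would typically be wrapped in `JsonResponse`.

```python
def score_distribution(scores, edges=(0.0, 1.0, 2.0, 3.0, 4.0, 5.0)):
    """Count scores per [lo, hi) range; the last range also includes its upper edge."""
    labels = [f"{lo:g}-{hi:g}" for lo, hi in zip(edges, edges[1:])]
    counts = [0] * len(labels)
    for s in scores:
        if s is None or not (edges[0] <= s <= edges[-1]):
            continue  # skip missing or out-of-range scores
        # Put the score in the first bin whose upper edge is above it;
        # a score equal to the top edge falls into the last bin.
        for i, hi in enumerate(edges[1:]):
            if s < hi or i == len(labels) - 1:
                counts[i] += 1
                break
    return {"score_ranges": labels, "counts": counts}


# Made-up scores, for illustration only.
payload = score_distribution([4.5, 4.8, 3.9, 5.0, None, 2.0])
print(payload)
```

The two lists have the same length and order, so they can be passed straight to the `xAxis.data` and `series[0].data` fields used by the chart.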
For traffic prediction, we use a multiple linear regression model:
traffic = β₀ + β₁ × level + β₂ × score + β₃ × price + ε
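For reference, scikit-learn's `LinearRegression` fits this model by ordinary least squares: it picks the coefficients that minimize the sum of squared residuals over the n training rows. The notation below is introduced here and does not appear in the original: x_{i1}, x_{i2}, x_{i3} stand for the (standardized) level, score, and price of row i, and y_i for its observed traffic.

```latex
\hat{\beta} = \arg\min_{\beta_0,\dots,\beta_3} \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_{i1} - \beta_2 x_{i2} - \beta_3 x_{i3} \right)^2
```

Since the features are standardized before fitting, each fitted coefficient describes the change in predicted traffic for a one-standard-deviation change in that feature.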
Model training process:
In practice, we found the model performs best in the following scenarios:
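# Bad: N+1 query problem (one extra query per spot to load its city)
spots = Tourist.objects.all()
for spot in spots:
    print(spot.city.name)
# Better: fetch the related city in the same query with select_related
# (this assumes city is a ForeignKey, unlike the varchar column in the schema above)
spots = Tourist.objects.select_related('city').all()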
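from django.core.cache import cache

# Cache the result of an expensive query for one hour
def get_traffic_data():
    data = cache.get('traffic_data')
    if data is None:  # an empty result is still a valid cache hit
        data = expensive_db_query()  # placeholder for the real query
        cache.set('traffic_data', data, timeout=3600)
    return data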
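from celery import shared_task

# Run slow report generation in the background with Celery
@shared_task
def generate_report(report_id):
    report = Report.objects.get(id=report_id)  # Report is the project's own model (import not shown)
    # ... time-consuming report generation goes here ...
    report.status = 'completed'
    report.save()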
Recommended deployment setup:
Deployment steps:
sudo apt update
sudo apt install python3-pip python3-dev libmysqlclient-dev nginx
[Unit]
Description=Gunicorn service
After=network.target
[Service]
User=ubuntu
Group=www-data
WorkingDirectory=/home/ubuntu/project
ExecStart=/home/ubuntu/venv/bin/gunicorn --workers 3 --bind unix:/tmp/project.sock project.wsgi:application
[Install]
WantedBy=multi-user.target
server {
    listen 80;
    server_name your_domain.com;

    location / {
        include proxy_params;
        proxy_pass http://unix:/tmp/project.sock;
    }

    location /static/ {
        alias /home/ubuntu/project/static/;
    }
}
Recommended metrics to monitor:
A monitoring stack can be built with Prometheus + Grafana:
# Example prometheus.yml configuration
scrape_configs:
  - job_name: 'django'
    metrics_path: '/metrics'
    static_configs:
      - targets: ['localhost:8000']
Symptoms:
Solutions:
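import pandas as pd

# Add a holiday flag as a feature
def add_holiday_feature(df):
    holidays = ['2023-01-01', '2023-05-01', ...]  # list truncated in the original
    # Compare as ISO date strings so this works whether df['date'] holds strings or datetimes
    dates = pd.to_datetime(df['date']).dt.strftime('%Y-%m-%d')
    df['is_holiday'] = dates.isin(holidays).astype(int)
    return df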
from sklearn.ensemble import RandomForestRegressor

model = RandomForestRegressor(
    n_estimators=100,
    max_depth=10,
    random_state=42
)
Optimization options:
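# Pseudocode: shard the table by the first letter of the city name.
# Django evaluates Meta once, when the class is defined, so `city` is not
# available here; a working version would build one model class per shard.
class Tourist(models.Model):
    class Meta:
        db_table = 'tourist_%s' % city[0].lower()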
# Use iterator() to stream a large queryset instead of loading it all into memory
for spot in Tourist.objects.all().iterator():
    process(spot)
ALTER TABLE tourist ADD INDEX idx_city_sales (city, sales);
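from sklearn.neighbors import NearestNeighbors

def recommend_spots(spot_id, n=5):
    spot = Tourist.objects.get(id=spot_id)  # raises DoesNotExist for an unknown id
    # get_features_matrix() is the author's helper (not shown); its rows are
    # assumed here to be ordered by Tourist id, ascending
    ids = list(Tourist.objects.order_by('id').values_list('id', flat=True))
    features = get_features_matrix()
    nn = NearestNeighbors(n_neighbors=n + 1).fit(features)
    row = ids.index(spot.id)  # a database id is not a row index, so map it
    distances, indices = nn.kneighbors([features[row]])
    # Drop the first neighbor (the spot itself) and map rows back to ids
    return [ids[i] for i in indices[0][1:]]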
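To show the idea behind the recommendation without any dependencies, here is a minimal nearest-neighbor sketch using only the standard library. It keys the features by database id, which sidesteps the question of how ids map to matrix rows. The function name, the feature layout (encoded level, score, price), and the sample values are hypothetical.

```python
import math


def recommend_similar(spot_id, spots, n=5):
    """Return the ids of the n spots closest to spot_id in feature space.

    `spots` maps a spot id to a numeric feature tuple. Features should be
    scaled beforehand so that no single column dominates the distance.
    """
    target = spots[spot_id]
    distances = [
        (math.dist(target, features), other_id)
        for other_id, features in spots.items()
        if other_id != spot_id  # never recommend the spot itself
    ]
    distances.sort()  # nearest first
    return [other_id for _, other_id in distances[:n]]


# Made-up, already-scaled features keyed by id (ids need not be contiguous).
spots = {
    3: (1.0, 0.9, 0.2),
    7: (1.0, 0.8, 0.3),
    12: (0.0, 0.1, 0.9),
    20: (0.9, 0.9, 0.25),
}
print(recommend_similar(3, spots, n=2))
```

This brute-force scan is linear in the number of spots per query, which is fine for small tables; `NearestNeighbors` becomes worthwhile once the data grows.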
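Every step of this project, from choosing the tech stack to implementing the features, was carefully thought through. The biggest challenge during development was balancing model accuracy against system performance. With sound architecture and continuous optimization, the final system meets the business requirements while keeping the user experience smooth.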