I built this project last year for a local university: a graduation employment service platform whose core goal was to solve two pain points, scattered job-hunting information for students and inefficient campus recruiting for employers. The platform is built on a Python + Flask stack and implements resume submission, job search, employer recruiting management, and data analytics. After three months of development and two months of trial operation, the system now reliably serves 2,000+ students and more than 30 partner companies.
Flask was chosen for three reasons. First, the business logic of a university employment system is fairly well defined but its requirements change often, and Flask's lightweight nature makes rapid iteration easy. Second, the university IT department's expertise is mainly in Python, which lowers long-term maintenance cost. Third, with a sensible architecture Flask can comfortably support a user base in the tens of thousands (measured at a stable 300+ QPS on a 2-core/4 GB server).
The system defines three classes of roles and their core interaction flows:

One point deserves emphasis: student and company accounts must be strictly isolated; this is the foundation of system security. We use a `role` field to distinguish user types, and every API endpoint must verify the caller's role.
| Requirement | MySQL approach | MongoDB approach |
|---|---|---|
| Resume storage | large text fields + file-system paths | structured resume data stored directly |
| Complex queries | good join performance | needs carefully designed indexes |
| Scalability | mature sharding/partitioning tooling | horizontal scaling built in |

Final choice: a hybrid scheme, with MySQL as the primary store and resume attachments in MongoDB, balancing performance and flexibility.
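To make the hybrid scheme concrete, here is a hypothetical sketch of the write path: the relational row keeps only metadata plus a reference to the MongoDB document holding the structured resume body. The function and field names (`split_resume`, `mongo_doc_id`) are illustrative, not the production schema.

```python
# Hypothetical sketch of the hybrid write path: MySQL keeps metadata and a
# reference; MongoDB keeps the full structured resume document.
import uuid

def split_resume(student_id: int, resume: dict) -> tuple[dict, dict]:
    """Split one resume submission into a MySQL row and a MongoDB document."""
    mongo_id = uuid.uuid4().hex
    mysql_row = {
        "student_id": student_id,
        "title": resume.get("title", ""),
        "mongo_doc_id": mongo_id,   # foreign reference into MongoDB
    }
    mongo_doc = {"_id": mongo_id, **resume}
    return mysql_row, mongo_doc

row, doc = split_resume(42, {"title": "CV", "education": [{"school": "X"}]})
```

The relational side stays small and joinable, while arbitrary resume structure lives in the document store.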
```python
# Implemented with the Flask-Login extension
from functools import wraps
from flask import abort
from flask_login import UserMixin, current_user

class User(UserMixin):
    def __init__(self, id, role):
        self.id = id
        self.role = role

@login_manager.user_loader
def load_user(user_id):
    # UserModel is the SQLAlchemy model backing the users table
    user = db.session.query(UserModel).get(user_id)
    return User(user.id, user.role) if user else None

# Role-check decorator
def role_required(role):
    def decorator(f):
        @wraps(f)
        def decorated_function(*args, **kwargs):
            if current_user.role != role:
                abort(403)
            return f(*args, **kwargs)
        return decorated_function
    return decorator
```
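To show how the decorator composes with a view, here is a self-contained demo. It stubs out `current_user` and `abort` (which in the real app come from Flask-Login and Flask) so the rejection path can be exercised without a running server; `post_job` is an illustrative endpoint.

```python
from functools import wraps

class StubUser:                        # stand-in for Flask-Login's current_user
    def __init__(self, role):
        self.role = role

current_user = StubUser("student")

def abort(code):                       # stand-in for flask.abort
    raise PermissionError(code)

def role_required(role):
    def decorator(f):
        @wraps(f)
        def decorated_function(*args, **kwargs):
            if current_user.role != role:
                abort(403)
            return f(*args, **kwargs)
        return decorated_function
    return decorator

@role_required("company")
def post_job():                        # a company-only endpoint
    return "job posted"

# A student hitting a company-only endpoint is rejected with 403
denied = False
try:
    post_job()
except PermissionError as exc:
    denied = (exc.args[0] == 403)

# The same call succeeds once the role matches
current_user.role = "company"
granted = post_job()
```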
```python
# Search service built on Elasticsearch
from elasticsearch import Elasticsearch

es = Elasticsearch()

def build_search_index():
    mappings = {
        "properties": {
            "skills": {"type": "keyword"},
            "education": {
                "type": "nested",
                "properties": {
                    "school": {"type": "text"},
                    "major": {"type": "keyword"}
                }
            }
        }
    }
    es.indices.create(index='resumes', body={"mappings": mappings})

# Example of a compound-condition query
def search_resumes(keywords, min_gpa=None, required_skills=None):
    query = {
        "bool": {
            "must": [{"match": {"content": keywords}}],
            "filter": []
        }
    }
    if min_gpa:
        query["bool"]["filter"].append(
            {"range": {"gpa": {"gte": min_gpa}}})
    if required_skills:  # None (the default) skips the skills filter
        query["bool"]["filter"].append(
            {"terms": {"skills": required_skills}})
    return es.search(index='resumes', body={"query": query})
```
Employment-trend analysis uses Pandas + Matplotlib:
```python
import pandas as pd
import matplotlib.pyplot as plt

def generate_employment_report():
    # Read the raw data from the database
    df = pd.read_sql("""
        SELECT j.industry, COUNT(a.id) AS applications
        FROM jobs j LEFT JOIN applications a ON j.id = a.job_id
        GROUP BY j.industry
    """, db.engine)
    # Render the chart
    plt.figure(figsize=(12, 6))
    df.set_index('industry')['applications'].sort_values().plot(
        kind='barh',
        title='Application volume by industry'
    )
    plt.tight_layout()
    return plt.gcf()
```
The architecture we ultimately settled on:
```text
                 +-------------------+
                 | Alibaba Cloud SLB |
                 +---------+---------+
                           |
          +----------------+----------------+
          |                                 |
  +-------+-------+                +--------+--------+
  |  Nginx (2C4G) |                |  Nginx (2C4G)   |
  +-------+-------+                +--------+--------+
          |                                 |
  +-------+-------+                +--------+--------+
  |   Gunicorn    |                |    Gunicorn     |
  |  (4 workers)  |                |   (4 workers)   |
  +-------+-------+                +--------+--------+
          |                                 |
  +-------+-------+                +--------+--------+
  | MySQL primary |                |  MySQL replica  |
  +-------+-------+                +--------+--------+
          |                                 |
  +-------+-------+                +--------+--------+
  |  Redis cache  |                | MongoDB cluster |
  +---------------+                +-----------------+
```
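Each app node runs Gunicorn behind Nginx. A minimal `gunicorn.conf.py` matching the 4-worker layout above might look like this; the bind address and log settings are deployment assumptions, not values from the actual system:

```python
# gunicorn.conf.py -- sketch matching the 2C4G nodes in the diagram
bind = "127.0.0.1:8000"   # Nginx proxies to this address (assumed)
workers = 4               # per the diagram's 4 workers per node
worker_class = "sync"     # Gunicorn's default worker type
timeout = 30              # kill workers stuck longer than 30 s
accesslog = "-"           # request log to stdout (assumed)
```

Started as `gunicorn -c gunicorn.conf.py app:app`, this keeps process management out of the application code.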
At the database level:

```sql
-- MySQL has no INCLUDE clause (that is SQL Server / PostgreSQL syntax);
-- a composite index covers the query by listing the extra columns directly
CREATE INDEX idx_job_search
    ON jobs (title, company_id, status, salary_min, salary_max, location);
```
At the application level:

```python
# Flask-Caching: serve the hot-jobs list from cache for 5 minutes
@cache.cached(timeout=300, key_prefix='hot_jobs')
def get_hot_jobs():
    return Job.query.order_by(Job.view_count.desc()).limit(10).all()
```
Frontend optimization:

Initially we received file uploads directly in Flask, which caused out-of-memory failures under higher concurrency. The final solution had two parts:
```nginx
# The upload_* directives come from the third-party nginx-upload-module
client_max_body_size 20M;
location /upload {
    upload_pass @flask;
    upload_store /tmp/uploads;
    upload_set_form_field $upload_field_name.name "$upload_file_name";
    upload_set_form_field $upload_field_name.path "$upload_tmp_path";
}
```
```javascript
async function uploadInChunks(file) {   // async: the body uses await
    const chunkSize = 5 * 1024 * 1024;  // 5 MB
    let offset = 0;
    while (offset < file.size) {
        const chunk = file.slice(offset, offset + chunkSize);
        const formData = new FormData();
        formData.append('chunk', chunk);
        formData.append('offset', offset);
        await axios.post('/upload', formData);
        offset += chunkSize;
    }
}
```
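The server side then has to stitch the chunks back together by their offsets. This pure-Python sketch captures that reassembly logic; in the real app each POST would append to a temp file keyed by an upload id, and the function name `assemble_chunks` is illustrative.

```python
# Hypothetical server-side counterpart to uploadInChunks: rebuild the file
# from (offset -> chunk) pairs. An in-memory dict stands in for temp storage.

def assemble_chunks(chunks: dict[int, bytes], total_size: int) -> bytes:
    """chunks maps byte offset -> chunk content, as sent by the client."""
    data = bytearray(total_size)
    for offset, chunk in chunks.items():
        data[offset:offset + len(chunk)] = chunk
    return bytes(data)

parts = {0: b"hello ", 6: b"world"}    # chunks may arrive in any order
assert assemble_chunks(parts, 11) == b"hello world"
```

Because each chunk carries its offset, the chunks can be written independently and out of order.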
Resume submission suffered from overselling: the remaining-positions check raced under concurrent requests. We solved it with a pessimistic row lock (`SELECT ... FOR UPDATE`):
```python
def apply_job(job_id):
    # with_for_update() issues SELECT ... FOR UPDATE, holding a row lock
    # until this transaction commits or rolls back
    job = Job.query.filter_by(id=job_id).with_for_update().first()
    if job is None or job.remaining_positions <= 0:
        raise BusinessError("No positions remaining")
    try:
        job.remaining_positions -= 1
        db.session.add(Application(...))
        db.session.commit()
    except Exception:
        db.session.rollback()
        raise
```
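A genuinely optimistic alternative avoids holding a row lock: push the check into the `UPDATE` itself and inspect the affected-row count. This sketch uses an in-memory SQLite database to stand in for MySQL; the table and function names are illustrative.

```python
# Optimistic variant: the WHERE clause is the check, rowcount is the verdict.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, remaining INTEGER)")
conn.execute("INSERT INTO jobs VALUES (1, 1)")   # one position left

def apply_job(conn, job_id):
    cur = conn.execute(
        "UPDATE jobs SET remaining = remaining - 1 "
        "WHERE id = ? AND remaining > 0", (job_id,))
    conn.commit()
    if cur.rowcount == 0:            # lost the race, or already full
        raise RuntimeError("position already filled")

apply_job(conn, 1)                   # first application succeeds

sold_out = False
try:
    apply_job(conn, 1)               # second one finds remaining == 0
except RuntimeError:
    sold_out = True
```

The decrement and the check happen in one atomic statement, so no two requests can both take the last position.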
Status changes are pushed to users in real time over WebSocket:
```python
# Flask-SocketIO integration
from flask_socketio import join_room

@socketio.on('connect')
def handle_connect():
    if current_user.is_authenticated:
        join_room(f'user_{current_user.id}')

def notify_user(user_id, message):
    # socketio.emit works outside a Socket.IO event handler,
    # unlike the bare emit() helper
    socketio.emit('notification',
                  {'type': 'application_update', 'data': message},
                  room=f'user_{user_id}')
```
A simple matcher based on TF-IDF and Word2Vec:
```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from gensim.models import Word2Vec

class ResumeMatcher:
    def __init__(self):
        self.tfidf = TfidfVectorizer()
        self.w2v = Word2Vec.load('word2vec.model')

    def fit(self, job_descriptions):
        self.tfidf.fit(job_descriptions)

    def predict(self, resume_text, job_description):
        # TF-IDF similarity
        tfidf_sim = cosine_similarity(
            self.tfidf.transform([resume_text]),
            self.tfidf.transform([job_description])
        )[0][0]
        # Word-vector similarity; guard against empty word lists,
        # which would make n_similarity raise
        resume_words = [w for w in resume_text.split() if w in self.w2v.wv]
        job_words = [w for w in job_description.split() if w in self.w2v.wv]
        w2v_sim = (self.w2v.wv.n_similarity(resume_words, job_words)
                   if resume_words and job_words else 0.0)
        return 0.6 * tfidf_sim + 0.4 * w2v_sim
```
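The TF-IDF half of the matcher ultimately reduces to cosine similarity between term vectors. This dependency-free sketch over raw term counts shows the core computation (the real class additionally applies IDF weighting via sklearn's `TfidfVectorizer`):

```python
# Cosine similarity over bag-of-words term counts: the geometric core of
# the matcher's TF-IDF score, minus the IDF weighting.
from collections import Counter
import math

def cosine_sim(a: str, b: str) -> float:
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

# Two skill lists sharing 2 of 3 terms score 2/3
score = cosine_sim("python flask mysql", "python mysql redis")
```

Identical texts score 1.0 and disjoint texts score 0.0, which is why the blended score stays in a predictable range.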
From technology selection to launch, the project took three months. My biggest takeaway: university information systems must balance technical sophistication against operational simplicity. For example, we dropped a Kubernetes deployment in favor of plain cloud servers, not for lack of ability, but to fit the operational habits of the university's IT department.