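When I first encountered SQL window functions, the OVER, PARTITION BY, and ORDER BY syntax left me completely confused. Only when real business problems required moving averages, rankings, and running totals did I understand how powerful window functions are. This article records the key points I learned and the pitfalls I hit while learning window functions from scratch. It is aimed at developers who already know basic SQL and want to move into data analysis.
The essential difference between a window function and a traditional aggregate function is that a window function does not collapse multiple rows into one; it returns a computed result for every row. That makes analyses such as "each employee's salary rank within their department" or "rolling sales over the last 30 days" straightforward to express. In my experience, mastering window functions noticeably raises the level of SQL you can write.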
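To make the "no collapsing" point concrete, here is a minimal sketch that runs the same SUM twice, once with GROUP BY and once as a window function. It assumes Python's built-in sqlite3 module with SQLite 3.25 or later (the first release with window functions); the employees rows are made up for illustration.

```python
import sqlite3

# Hypothetical sample data: table contents are invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (employee_name TEXT, department TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        ('Ann', 'eng', 100), ('Bob', 'eng', 80), ('Cid', 'ops', 60);
""")

# GROUP BY collapses each department into a single output row.
grouped = conn.execute("""
    SELECT department, SUM(salary)
    FROM employees
    GROUP BY department
    ORDER BY department
""").fetchall()

# The same SUM as a window function keeps every input row
# and attaches the department total to each of them.
windowed = conn.execute("""
    SELECT employee_name, department,
           SUM(salary) OVER (PARTITION BY department) AS dept_total
    FROM employees
    ORDER BY employee_name
""").fetchall()

print(grouped)   # one row per department
print(windowed)  # one row per employee
```

The grouped query returns two rows while the windowed query returns three, which is exactly the property the later cases rely on.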
The typical syntax of a window function is:
    function_name(column) OVER (
        [PARTITION BY partition_columns]
        [ORDER BY sort_columns [ASC | DESC]]
        [ROWS | RANGE frame_definition]
    )
Take the classic case of ranking salaries within each department:
    SELECT
        employee_name,
        department,
        salary,
        RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank
    FROM employees;
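The key points here:
PARTITION BY department computes the ranking separately within each department.
ORDER BY salary DESC sorts each department's salaries in descending order.
RANK() is the ranking function; employees with the same salary receive the same rank.
Note: support for window functions differs slightly between databases. MySQL 8.0+, PostgreSQL, Oracle, and SQL Server all support them, but syntax details may vary.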
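In my experience, window functions fall into these categories:
Ranking functions:
ROW_NUMBER(): consecutive numbering (1, 2, 3, ...)
RANK(): equal values share a rank, and the following rank is skipped (1, 2, 2, 4, ...)
DENSE_RANK(): equal values share a rank, with no gaps (1, 2, 2, 3, ...)
Distribution functions:
PERCENT_RANK(): relative rank, as a value from 0 to 1
CUME_DIST(): cumulative distribution value
Offset and value functions:
LAG(column, offset): the value from the row that is offset rows before the current row
LEAD(column, offset): the value from the row that is offset rows after the current row
FIRST_VALUE(column): the value from the first row of the window frame
LAST_VALUE(column): the value from the last row of the window frame
Aggregate functions:
SUM(), AVG(), COUNT(), MIN(), MAX(), and other aggregates used together with a window
Suppose we have an e-commerce order table orders with the fields order_id, user_id, order_date, and amount. Here are a few practical cases: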
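Before the numbered cases, a quick warm-up on the offset functions from the list above. This is a minimal sketch assuming Python's built-in sqlite3 module (SQLite 3.25+) and a three-row orders table invented for illustration. It also shows a pitfall I ran into: with ORDER BY and no explicit frame, the frame ends at the current row, so LAST_VALUE simply returns the current row's value unless the frame is widened.

```python
import sqlite3

# Hypothetical sample data for one user.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, user_id INTEGER, order_date TEXT, amount INTEGER);
    INSERT INTO orders VALUES
        (1, 1, '2024-01-01', 10),
        (2, 1, '2024-01-02', 20),
        (3, 1, '2024-01-03', 30);
""")

offset_rows = conn.execute("""
    SELECT
        order_date,
        amount,
        LAG(amount, 1)  OVER (PARTITION BY user_id ORDER BY order_date) AS prev_amount,
        LEAD(amount, 1) OVER (PARTITION BY user_id ORDER BY order_date) AS next_amount,
        -- Default frame stops at the current row, so this is just the current amount.
        LAST_VALUE(amount) OVER (PARTITION BY user_id ORDER BY order_date) AS last_default,
        -- Widening the frame to the whole partition gives the true last value.
        LAST_VALUE(amount) OVER (
            PARTITION BY user_id ORDER BY order_date
            ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
        ) AS last_full
    FROM orders
    ORDER BY order_date
""").fetchall()

for row in offset_rows:
    print(row)
```

LAG and LEAD return NULL (None in Python) at the edges of the partition, where there is no previous or next row.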
Case 1: running total of each user's spending
    SELECT
        user_id,
        order_date,
        amount,
        SUM(amount) OVER (
            PARTITION BY user_id
            ORDER BY order_date
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS running_total
    FROM orders;
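The explicit ROWS frame in Case 1 matters when a user has several orders on the same date. The sketch below, assuming Python's built-in sqlite3 module (SQLite 3.25+) and made-up orders, compares the default frame (which treats rows with equal ORDER BY values as peers and includes all of them) with a ROWS frame. I added order_id as a tie-breaker so the row order within a day is deterministic; that tie-breaker is my assumption, not part of the original query.

```python
import sqlite3

# Hypothetical sample data: orders 2 and 3 share the same order_date.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, user_id INTEGER, order_date TEXT, amount INTEGER);
    INSERT INTO orders VALUES
        (1, 1, '2024-01-01', 10),
        (2, 1, '2024-01-02', 20),
        (3, 1, '2024-01-02', 30),
        (4, 1, '2024-01-03', 40);
""")

totals = conn.execute("""
    SELECT
        order_id,
        amount,
        -- No frame given: defaults to RANGE ... CURRENT ROW, which includes same-date peers.
        SUM(amount) OVER (PARTITION BY user_id ORDER BY order_date) AS range_total,
        -- Explicit ROWS frame with a tie-breaker: strictly one row at a time.
        SUM(amount) OVER (
            PARTITION BY user_id
            ORDER BY order_date, order_id
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS rows_total
    FROM orders
    ORDER BY order_id
""").fetchall()

for row in totals:
    print(row)
```

For the two orders on 2024-01-02, the default frame reports the same total twice, while the ROWS frame grows order by order.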
Case 2: find each user's highest-amount order
    WITH ranked_orders AS (
        SELECT
            *,
            -- RANK() keeps ties, so a user with two equally large orders gets both back
            RANK() OVER (PARTITION BY user_id ORDER BY amount DESC) AS rn
        FROM orders
    )
    SELECT * FROM ranked_orders WHERE rn = 1;
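One thing to watch in Case 2: RANK() returns every tied order, so "the top order per user" can be more than one row. If exactly one row per user is needed, ROW_NUMBER() with a tie-breaker is an alternative. The sketch below assumes Python's built-in sqlite3 module (SQLite 3.25+) and invented data; breaking ties by order_id is my own choice for illustration.

```python
import sqlite3

# Hypothetical sample data: user 1 has two orders tied at the top amount.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, user_id INTEGER, amount INTEGER);
    INSERT INTO orders VALUES (1, 1, 50), (2, 1, 50), (3, 1, 20), (4, 2, 70);
""")

# RANK(): both tied orders of user 1 get rank 1.
with_rank = conn.execute("""
    WITH ranked_orders AS (
        SELECT *, RANK() OVER (PARTITION BY user_id ORDER BY amount DESC) AS rn
        FROM orders
    )
    SELECT user_id, order_id FROM ranked_orders WHERE rn = 1
    ORDER BY user_id, order_id
""").fetchall()

# ROW_NUMBER() with a tie-breaker: exactly one row per user.
with_row_number = conn.execute("""
    WITH ranked_orders AS (
        SELECT *, ROW_NUMBER() OVER (
            PARTITION BY user_id ORDER BY amount DESC, order_id
        ) AS rn
        FROM orders
    )
    SELECT user_id, order_id FROM ranked_orders WHERE rn = 1
    ORDER BY user_id
""").fetchall()

print(with_rank)
print(with_row_number)
```

Which behavior is right depends on the question being asked: RANK() for "all orders that share the top amount", ROW_NUMBER() for "one representative order".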
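Case 3: 7-day moving average of sales
    SELECT
        sales_date,
        daily_sales,
        AVG(daily_sales) OVER (
            ORDER BY sales_date
            ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
        ) AS moving_avg_7day  -- renamed: an unquoted alias starting with a digit is rejected by several databases
    FROM daily_sales_data;
The key here is ROWS BETWEEN 6 PRECEDING AND CURRENT ROW, which sets the averaging window to the current row plus the 6 rows before it. That equals 7 calendar days only when the table holds exactly one row per day.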
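Because a ROWS frame counts rows rather than days, gaps in the data silently stretch the window. The sketch below uses a shorter 3-row window to keep the data small and adds a COUNT over the same frame to show how many rows actually fed each average. It assumes Python's built-in sqlite3 module (SQLite 3.25+); the table contents are invented, with 2024-01-03 deliberately missing.

```python
import sqlite3

# Hypothetical sample data: there is no row for 2024-01-03.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE daily_sales_data (sales_date TEXT, daily_sales INTEGER);
    INSERT INTO daily_sales_data VALUES
        ('2024-01-01', 10), ('2024-01-02', 20), ('2024-01-04', 60);
""")

moving = conn.execute("""
    SELECT
        sales_date,
        AVG(daily_sales) OVER (
            ORDER BY sales_date
            ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
        ) AS moving_avg_3row,
        -- Same frame, counting rows: exposes partial windows at the start.
        COUNT(*) OVER (
            ORDER BY sales_date
            ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
        ) AS rows_in_window
    FROM daily_sales_data
    ORDER BY sales_date
""").fetchall()

for row in moving:
    print(row)
```

The first two rows average fewer than three values, and the 2024-01-04 row averages three rows that span four calendar days. If that matters for the analysis, one option is to fill in missing dates before applying the window.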
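The window frame defines the range of rows a function computes over. Common forms are:
ROWS BETWEEN N PRECEDING AND M FOLLOWING
RANGE BETWEEN INTERVAL '3' DAY PRECEDING AND CURRENT ROW
Practical case: the difference between each employee's salary and the department average. With PARTITION BY alone and no ORDER BY, the frame is the whole partition.
    SELECT
        employee_name,
        department,
        salary,
        salary - AVG(salary) OVER (PARTITION BY department) AS diff_from_avg
    FROM employees;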
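Optimization example (a named window defined once and reused by several functions):
    SELECT
        product_id,
        date,
        sales,
        AVG(sales) OVER w AS avg_sales,
        SUM(sales) OVER w AS total_sales
    FROM sales_data
    WINDOW w AS (
        PARTITION BY product_id
        ORDER BY date
        ROWS BETWEEN 30 PRECEDING AND CURRENT ROW  -- 31 rows: the current row plus the 30 before it
    );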
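To see a named window in action on a small scale, here is a sketch with a 3-row frame (2 PRECEDING plus the current row) shared by three functions. It assumes Python's built-in sqlite3 module (SQLite 3.25+, which accepts the WINDOW clause); the sales_data rows and the sale_date column name are invented for illustration.

```python
import sqlite3

# Hypothetical sample data for two products.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_data (product_id INTEGER, sale_date TEXT, sales INTEGER);
    INSERT INTO sales_data VALUES
        (1, '2024-01-01', 10), (1, '2024-01-02', 20),
        (1, '2024-01-03', 30), (1, '2024-01-04', 40),
        (2, '2024-01-01', 5),  (2, '2024-01-02', 15);
""")

named = conn.execute("""
    SELECT
        product_id,
        sale_date,
        COUNT(*)   OVER w AS rows_in_frame,
        SUM(sales) OVER w AS total_sales,
        AVG(sales) OVER w AS avg_sales
    FROM sales_data
    WINDOW w AS (
        PARTITION BY product_id
        ORDER BY sale_date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    )
    ORDER BY product_id, sale_date
""").fetchall()

for row in named:
    print(row)
```

The frame never holds more than N + 1 rows (here 3), and it restarts for each product_id because of PARTITION BY.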
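The difference between ROW_NUMBER, RANK, and DENSE_RANK is one of the easiest points to confuse. This example separates them clearly:
    SELECT
        score,
        ROW_NUMBER() OVER (ORDER BY score DESC) AS row_num,
        RANK() OVER (ORDER BY score DESC) AS rank_val,
        DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rank_val
    FROM test_scores;
For the scores [100, 100, 90, 80], the result will be:
    score  row_num  rank_val  dense_rank_val
    100    1        1         1
    100    2        1         1
    90     3        3         2
    80     4        4         3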
Window functions handle NULL values differently depending on the function:
Example:
    SELECT
        product,
        sales,
        RANK() OVER (ORDER BY COALESCE(sales, 0) DESC) AS sales_rank
    FROM products;
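A small sketch of what the COALESCE approach does in practice, assuming Python's built-in sqlite3 module (SQLite 3.25+) and an invented products table. It also shows that aggregate window functions skip NULLs: COUNT(sales) and AVG(sales) ignore the NULL row, while COUNT(*) does not.

```python
import sqlite3

# Hypothetical sample data: product 'b' has no sales figure.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product TEXT, sales INTEGER);
    INSERT INTO products VALUES ('a', 30), ('b', NULL), ('c', 0), ('d', 15);
""")

null_rows = conn.execute("""
    SELECT
        product,
        sales,
        RANK() OVER (ORDER BY COALESCE(sales, 0) DESC) AS sales_rank,
        COUNT(sales) OVER () AS non_null_count,  -- NULL is not counted
        COUNT(*)     OVER () AS row_count,       -- every row is counted
        AVG(sales)   OVER () AS avg_sales        -- NULL is left out of the average
    FROM products
    ORDER BY product
""").fetchall()

for row in null_rows:
    print(row)
```

Note the side effect: after COALESCE, the product with unknown sales ties with the product whose sales are genuinely 0. Whether that is acceptable depends on what NULL means in the data.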
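SQL's logical execution order affects window function results: FROM and JOIN come first, then WHERE, GROUP BY, and HAVING, then window functions, and only after that DISTINCT, ORDER BY, and LIMIT.
This means a WHERE condition is applied before window functions are evaluated. To compute the window function first and filter on its result afterwards, you must use a subquery or a CTE.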
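The effect of that ordering is easy to see with a ranking. In this sketch (Python's built-in sqlite3 module, SQLite 3.25+, invented employees rows), the same filter gives different ranks depending on whether it runs before or after the window function.

```python
import sqlite3

# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (employee_name TEXT, salary INTEGER);
    INSERT INTO employees VALUES ('Ann', 100), ('Bob', 80), ('Cid', 60);
""")

# WHERE runs first: Ann is removed before ranking, so Bob becomes rank 1.
filtered_first = conn.execute("""
    SELECT employee_name, RANK() OVER (ORDER BY salary DESC) AS rnk
    FROM employees
    WHERE salary < 100
    ORDER BY rnk
""").fetchall()

# CTE ranks all rows first; the outer WHERE filters afterwards,
# so Bob keeps his company-wide rank of 2.
ranked_first = conn.execute("""
    WITH ranked AS (
        SELECT employee_name, salary,
               RANK() OVER (ORDER BY salary DESC) AS rnk
        FROM employees
    )
    SELECT employee_name, rnk
    FROM ranked
    WHERE salary < 100
    ORDER BY rnk
""").fetchall()

print(filtered_first)
print(ranked_first)
```

Neither result is wrong in itself; they answer different questions, so it pays to decide which one is intended before writing the query.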
Computing next-day retention is a classic case:
    WITH user_first_activity AS (
        SELECT
            user_id,
            MIN(activity_date) AS first_date
        FROM user_activities
        GROUP BY user_id
    ),
    user_activity_status AS (
        SELECT
            a.first_date,
            COUNT(DISTINCT a.user_id) AS new_users,
            -- a user counts as retained if they were active on the day after their first activity
            COUNT(DISTINCT CASE WHEN b.activity_date = a.first_date + INTERVAL '1 day' THEN b.user_id END) AS retained_users
        FROM user_first_activity a
        LEFT JOIN user_activities b ON a.user_id = b.user_id
        GROUP BY a.first_date
    )
    SELECT
        first_date,
        new_users,
        retained_users,
        ROUND(retained_users * 100.0 / new_users, 2) AS retention_rate
    FROM user_activity_status
    ORDER BY first_date;
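The query above relies on a self-join rather than a window function. As an alternative sketch, the same next-day retention can be expressed with ROW_NUMBER and LEAD. This version assumes Python's built-in sqlite3 module (SQLite 3.25+), so the date arithmetic uses SQLite's date(..., '+1 day') instead of INTERVAL; the activity rows are invented, and the daily and sequenced CTE names are my own.

```python
import sqlite3

# Hypothetical sample data; user 1 has a duplicate row on the first day.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_activities (user_id INTEGER, activity_date TEXT);
    INSERT INTO user_activities VALUES
        (1, '2024-01-01'), (1, '2024-01-01'), (1, '2024-01-02'), (1, '2024-01-05'),
        (2, '2024-01-01'), (2, '2024-01-03'),
        (3, '2024-01-02'), (3, '2024-01-03');
""")

retention = conn.execute("""
    WITH daily AS (
        -- One row per user per active day, so LEAD sees distinct dates.
        SELECT DISTINCT user_id, activity_date FROM user_activities
    ),
    sequenced AS (
        SELECT
            user_id,
            activity_date,
            ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY activity_date) AS visit_no,
            LEAD(activity_date) OVER (PARTITION BY user_id ORDER BY activity_date) AS next_date
        FROM daily
    )
    SELECT
        activity_date AS first_date,
        COUNT(*) AS new_users,
        SUM(CASE WHEN next_date = date(activity_date, '+1 day') THEN 1 ELSE 0 END) AS retained_users,
        ROUND(SUM(CASE WHEN next_date = date(activity_date, '+1 day') THEN 1 ELSE 0 END)
              * 100.0 / COUNT(*), 2) AS retention_rate
    FROM sequenced
    WHERE visit_no = 1          -- keep only each user's first active day
    GROUP BY activity_date
    ORDER BY activity_date
""").fetchall()

for row in retention:
    print(row)
```

The window functions live in a CTE and the visit_no filter sits in the outer query, in line with the execution-order rule described earlier.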
Funnel conversion rates can be computed in the same spirit. Note that the query below relies on conditional aggregation rather than a window function:
    WITH funnel_steps AS (
        SELECT
            user_id,
            SUM(CASE WHEN event_type = 'view' THEN 1 ELSE 0 END) AS viewed,
            SUM(CASE WHEN event_type = 'cart' THEN 1 ELSE 0 END) AS carted,
            SUM(CASE WHEN event_type = 'purchase' THEN 1 ELSE 0 END) AS purchased
        FROM user_events
        GROUP BY user_id
    )
    SELECT
        COUNT(user_id) AS total_users,
        SUM(viewed) AS total_views,
        SUM(carted) AS total_carts,
        SUM(purchased) AS total_purchases,
        -- NULLIF avoids division by zero when a step has no events
        ROUND(SUM(carted) * 100.0 / NULLIF(SUM(viewed), 0), 2) AS view_to_cart_rate,
        ROUND(SUM(purchased) * 100.0 / NULLIF(SUM(carted), 0), 2) AS cart_to_purchase_rate
    FROM funnel_steps;
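Window function implementations differ in small ways between databases:
MySQL:
Certain advanced uses of RANGE intervals
PostgreSQL:
SQL Server:
TOP N WITH TIES combined with window functions
Oracle:
Example of a compatible way to write a query: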
    -- Moving average written to work across databases
    SELECT
        date,
        value,
        AVG(value) OVER (
            ORDER BY date
            ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
        ) AS moving_avg
    FROM time_series_data;
When a window function's result is not what I expected, my debugging steps are:
For complex window function queries, I recommend:
Example:
    WITH monthly_sales AS (
        SELECT
            product_id,
            DATE_TRUNC('month', order_date) AS month,  -- PostgreSQL-style month truncation
            SUM(amount) AS monthly_amount
        FROM orders
        GROUP BY product_id, DATE_TRUNC('month', order_date)
    ),
    sales_growth AS (
        SELECT
            product_id,
            month,
            monthly_amount,
            LAG(monthly_amount, 1) OVER (PARTITION BY product_id ORDER BY month) AS prev_amount,
            -- growth = (current - previous) / previous; NULLIF guards against division by zero
            (monthly_amount - LAG(monthly_amount, 1) OVER (
                PARTITION BY product_id ORDER BY month
            )) * 100.0 / NULLIF(LAG(monthly_amount, 1) OVER (
                PARTITION BY product_id ORDER BY month
            ), 0) AS growth_rate
        FROM monthly_sales
    )
    SELECT * FROM sales_growth WHERE growth_rate IS NOT NULL;
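Here is a runnable sketch of the same staged approach, computing month-over-month growth as (current - previous) / previous. It assumes Python's built-in sqlite3 module (SQLite 3.25+), so strftime('%Y-%m', ...) stands in for DATE_TRUNC, and it adds one more CTE so LAG is written only once. The orders rows and the with_prev name are invented for illustration.

```python
import sqlite3

# Hypothetical sample data for one product across three months.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (product_id INTEGER, order_date TEXT, amount INTEGER);
    INSERT INTO orders VALUES
        (1, '2024-01-05', 60), (1, '2024-01-20', 40),
        (1, '2024-02-10', 150),
        (1, '2024-03-03', 75);
""")

growth = conn.execute("""
    WITH monthly_sales AS (
        -- Stage 1: aggregate to one row per product per month.
        SELECT product_id,
               strftime('%Y-%m', order_date) AS month,
               SUM(amount) AS monthly_amount
        FROM orders
        GROUP BY product_id, strftime('%Y-%m', order_date)
    ),
    with_prev AS (
        -- Stage 2: attach the previous month's amount with LAG.
        SELECT product_id, month, monthly_amount,
               LAG(monthly_amount, 1) OVER (
                   PARTITION BY product_id ORDER BY month
               ) AS prev_amount
        FROM monthly_sales
    )
    -- Stage 3: derive the growth rate; the first month has no previous value.
    SELECT product_id, month, monthly_amount, prev_amount,
           (monthly_amount - prev_amount) * 100.0 / NULLIF(prev_amount, 0) AS growth_rate
    FROM with_prev
    WHERE prev_amount IS NOT NULL
    ORDER BY product_id, month
""").fetchall()

for row in growth:
    print(row)
```

Each CTE can be selected from on its own while debugging, which makes it easier to spot the stage where a number first goes wrong.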
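Once you have mastered window functions, you will find that many analysis tasks that used to take several queries or application-side code can be done in a single SQL statement. That improves productivity and reduces the data shipped between the database and the application, which often improves performance as well.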