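A window function is a powerful SQL analysis tool: it computes over a specific "window" of rows while keeping the detail of every original row. Unlike a traditional GROUP BY aggregate, a window function does not collapse rows; it attaches the computed result to each row.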
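How traditional GROUP BY aggregation works: rows that share a grouping key are collapsed into one output row per group, so row-level detail is no longer available.
Typical characteristics of window functions: every input row is kept, and the value computed over that row's window is returned as an extra column on the row.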
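From practical experience: in e-commerce analytics, GROUP BY suits questions such as total sales per day, while window functions suit questions such as how each order ranks among a user's orders, or what share of the user's spend it represents.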
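The contrast can be seen on a tiny data set. The sketch below is illustrative only: `sample_orders` and its three rows are hypothetical, and the inline `UNION ALL` data is there just so the queries run without an existing table (tested mentally against MySQL 8 style syntax; other dialects should behave the same for these constructs).

```sql
-- GROUP BY: one output row per user, order detail is gone.
WITH sample_orders AS (            -- hypothetical sample data
    SELECT 1 AS order_id, 101 AS user_id, 50 AS pay_amount
    UNION ALL SELECT 2, 101, 150
    UNION ALL SELECT 3, 102, 80
)
SELECT user_id, SUM(pay_amount) AS user_gmv
FROM sample_orders
GROUP BY user_id
ORDER BY user_id;
-- 101 | 200
-- 102 |  80

-- Window function: every order row survives, with the total attached.
WITH sample_orders AS (            -- same hypothetical sample data
    SELECT 1 AS order_id, 101 AS user_id, 50 AS pay_amount
    UNION ALL SELECT 2, 101, 150
    UNION ALL SELECT 3, 102, 80
)
SELECT
    order_id,
    user_id,
    pay_amount,
    SUM(pay_amount) OVER (PARTITION BY user_id) AS user_gmv
FROM sample_orders
ORDER BY order_id;
-- 1 | 101 |  50 | 200
-- 2 | 101 | 150 | 200
-- 3 | 102 |  80 |  80
```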
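The full window function syntax has three core parts: the partition, the ordering, and the frame:
```sql
function_name(arguments) OVER (
    PARTITION BY partition_columns
    ORDER BY sort_columns
    [frame_clause]
)
```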
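Basic properties of ROW_NUMBER(): it assigns consecutive integers starting at 1 within each partition, in ORDER BY order, and never repeats a number, even when rows tie.
Typical use case: numbering each user's orders in chronological order:
```sql
SELECT
    user_id,
    order_id,
    created_at,
    ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at) AS order_seq
FROM orders;
```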
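RANK(): tied rows share the same rank, and the ranks that follow are skipped (1, 1, 3).
DENSE_RANK(): tied rows share the same rank, and no ranks are skipped (1, 1, 2).
Side-by-side comparison:
```sql
SELECT
    product_id,
    sales_amount,
    RANK() OVER (ORDER BY sales_amount DESC) AS rank_val,
    DENSE_RANK() OVER (ORDER BY sales_amount DESC) AS dense_rank_val
FROM products;
```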
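To make the difference between the three ranking functions concrete, here is a small sketch on made-up data. The `sample_products` rows are hypothetical; products A and B tie on purpose.

```sql
WITH sample_products AS (          -- hypothetical sample data, A and B tie
    SELECT 'A' AS product_id, 500 AS sales_amount
    UNION ALL SELECT 'B', 500
    UNION ALL SELECT 'C', 300
    UNION ALL SELECT 'D', 100
)
SELECT
    product_id,
    sales_amount,
    -- product_id is added as a tie-breaker so row_num is deterministic
    ROW_NUMBER() OVER (ORDER BY sales_amount DESC, product_id) AS row_num,
    RANK()       OVER (ORDER BY sales_amount DESC) AS rank_val,
    DENSE_RANK() OVER (ORDER BY sales_amount DESC) AS dense_rank_val
FROM sample_products
ORDER BY sales_amount DESC, product_id;
-- product_id | sales_amount | row_num | rank_val | dense_rank_val
-- A          | 500          | 1       | 1        | 1
-- B          | 500          | 2       | 1        | 1
-- C          | 300          | 3       | 3        | 2
-- D          | 100          | 4       | 4        | 3
```

Product C shows the difference: RANK() jumps to 3 because two rows are ahead of it, while DENSE_RANK() continues with 2.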
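Get each user's three highest-paid orders (ties allowed):
```sql
SELECT *
FROM (
    SELECT
        o.*,
        RANK() OVER (PARTITION BY o.user_id ORDER BY o.pay_amount DESC) AS rnk
    FROM orders o  -- the alias o is required by o.* above
    WHERE o.paid_at IS NOT NULL
) t
WHERE t.rnk <= 3;
```
Note: because of ties, RANK() can return more than N rows per user. If exactly N rows are required, use ROW_NUMBER() instead.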
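A strict top-N variant might look like the sketch below. It assumes that `order_id` is an acceptable tie-breaker (that choice is mine, not the article's) and uses hypothetical `sample_orders` data in which three orders tie at 200.

```sql
WITH sample_orders AS (            -- hypothetical sample data for one user
    SELECT 1 AS order_id, 101 AS user_id, 300 AS pay_amount
    UNION ALL SELECT 2, 101, 200
    UNION ALL SELECT 3, 101, 200
    UNION ALL SELECT 4, 101, 200
    UNION ALL SELECT 5, 101, 50
),
ranked AS (
    SELECT
        order_id,
        user_id,
        pay_amount,
        RANK() OVER (PARTITION BY user_id ORDER BY pay_amount DESC) AS rnk,
        -- order_id breaks ties, so exactly one row gets each number
        ROW_NUMBER() OVER (
            PARTITION BY user_id
            ORDER BY pay_amount DESC, order_id
        ) AS rn
    FROM sample_orders
)
SELECT order_id, pay_amount, rnk, rn
FROM ranked
WHERE rn <= 3
ORDER BY rn;
-- order_id | pay_amount | rnk | rn
-- 1        | 300        | 1   | 1
-- 2        | 200        | 2   | 2
-- 3        | 200        | 2   | 3
-- Filtering on rnk <= 3 instead would also return order 4 (rnk = 2).
```

Without a tie-breaker in the ORDER BY, ROW_NUMBER() still returns exactly three rows, but which of the tied orders make the cut is not guaranteed.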
Compute each user's total GMV while keeping order-level detail:
```sql
SELECT
    user_id,
    order_id,
    pay_amount,
    SUM(pay_amount) OVER (PARTITION BY user_id) AS user_gmv,
    pay_amount / SUM(pay_amount) OVER (PARTITION BY user_id) AS amount_ratio
FROM orders
WHERE paid_at IS NOT NULL;
```
Compute each user's cumulative spend in chronological order:
```sql
SELECT
    user_id,
    order_id,
    created_at,
    pay_amount,
    SUM(pay_amount) OVER (
        PARTITION BY user_id
        ORDER BY created_at
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_total
FROM orders
WHERE paid_at IS NOT NULL;
```
Compute, for each order, the average amount of that order and the user's two preceding orders (a rolling 3-order average):
```sql
SELECT
    user_id,
    order_id,
    created_at,
    pay_amount,
    AVG(pay_amount) OVER (
        PARTITION BY user_id
        ORDER BY created_at
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS avg_last_3
FROM orders
WHERE paid_at IS NOT NULL;
```
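The frame clause defines which rows, relative to the current row, a window function computes over. Its main syntax elements:
ROWS vs RANGE: ROWS counts physical rows before and after the current row; RANGE works on ORDER BY values, so rows with equal sort values (peers) always enter the frame together.
Frame boundaries: UNBOUNDED PRECEDING, n PRECEDING, CURRENT ROW, n FOLLOWING, UNBOUNDED FOLLOWING.
Common combinations:
```sql
-- From the partition start to the current row
-- (with ORDER BY and no frame clause, the default is the RANGE form of this frame)
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
-- The last 3 rows (including the current row)
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
-- 1 row before and 1 row after (3 rows in total)
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
-- The whole partition
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
```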
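The ROWS/RANGE distinction only becomes visible when the ORDER BY column contains ties. The following sketch uses hypothetical `sample_sales` data in which two sales fall on the same `sale_day`; all names are illustrative.

```sql
WITH sample_sales AS (             -- hypothetical data: sales 1 and 2 tie on sale_day
    SELECT 1 AS sale_id, 1 AS sale_day, 10 AS amount
    UNION ALL SELECT 2, 1, 20
    UNION ALL SELECT 3, 2, 30
)
SELECT
    sale_id,
    sale_day,
    amount,
    SUM(amount) OVER (
        ORDER BY sale_day
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS rows_total,
    SUM(amount) OVER (
        ORDER BY sale_day
        RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS range_total
FROM sample_sales
ORDER BY sale_day, sale_id;
-- range_total: 30, 30, 60
--   Both day-1 rows are peers, so each of them sees 10 + 20.
-- rows_total: the two day-1 rows get different values, one partial sum
--   and 30; which row gets which is not guaranteed because they tie on
--   sale_day. Sale 3 gets 60.
```

This is why a running total written with ORDER BY alone (and therefore the default RANGE frame) can show repeated values on tied rows, while an explicit ROWS frame does not.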
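Window function results cannot be filtered directly in WHERE, because window functions are evaluated after WHERE, GROUP BY, and HAVING. Wrap the query in a subquery or CTE and filter in the outer query:
```sql
-- Using a CTE
WITH ranked_orders AS (
    SELECT
        o.*,
        ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) AS rn
    FROM orders o
)
SELECT * FROM ranked_orders WHERE rn = 1;

-- Using a subquery
SELECT * FROM (
    SELECT
        o.*,
        ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) AS rn
    FROM orders o
) t WHERE t.rn = 1;
```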
A single query can contain several window functions, each with its own window definition:
```sql
SELECT
    user_id,
    order_id,
    created_at,
    pay_amount,
    ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at) AS order_seq,
    SUM(pay_amount) OVER (PARTITION BY user_id) AS user_gmv,
    pay_amount / SUM(pay_amount) OVER (PARTITION BY user_id) AS amount_pct,
    AVG(pay_amount) OVER (
        PARTITION BY user_id
        ORDER BY created_at
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS avg_last_3
FROM orders
WHERE paid_at IS NOT NULL;
```
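When several window functions share the same partition, one option is to name the window once with a `WINDOW` clause. This is a sketch rather than a recommendation: the clause is available in MySQL 8, PostgreSQL, and SQLite, but support in other engines varies by version, and the `sample_orders` data below is hypothetical.

```sql
WITH sample_orders AS (            -- hypothetical sample data
    SELECT 1 AS order_id, 101 AS user_id, '2024-01-01' AS created_at, 50 AS pay_amount
    UNION ALL SELECT 2, 101, '2024-01-05', 150
    UNION ALL SELECT 3, 102, '2024-01-02', 80
)
SELECT
    user_id,
    order_id,
    pay_amount,
    -- extend the named window with an ORDER BY for sequencing
    ROW_NUMBER() OVER (w ORDER BY created_at) AS order_seq,
    -- reuse the named window as-is for the per-user total
    SUM(pay_amount) OVER w AS user_gmv
FROM sample_orders
WINDOW w AS (PARTITION BY user_id)
ORDER BY user_id, order_id;
-- user_id | order_id | pay_amount | order_seq | user_gmv
-- 101     | 1        |  50        | 1         | 200
-- 101     | 2        | 150        | 2         | 200
-- 102     | 3        |  80        | 1         |  80
```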
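Compute the gap between a user's consecutive visits and flag where a new session starts:
```sql
WITH user_visits AS (
    SELECT
        user_id,
        visit_time,
        LAG(visit_time) OVER (PARTITION BY user_id ORDER BY visit_time) AS prev_visit
    FROM user_activity
)
SELECT
    user_id,
    visit_time,
    prev_visit,
    -- TIMESTAMPDIFF is MySQL syntax; other dialects use different date-difference functions
    TIMESTAMPDIFF(MINUTE, prev_visit, visit_time) AS mins_since_last,
    -- a new session starts on the first visit or after more than 30 idle minutes
    CASE WHEN TIMESTAMPDIFF(MINUTE, prev_visit, visit_time) > 30 OR prev_visit IS NULL
        THEN 1 ELSE 0 END AS is_new_session
FROM user_visits;
```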
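The query above only flags session starts. One common way to turn the flag into a per-user session number is a running sum over it. The sketch below assumes MySQL 8 (because of `TIMESTAMPDIFF`) and uses hypothetical `sample_activity` rows; `session_id` is a name introduced here for illustration.

```sql
WITH sample_activity AS (          -- hypothetical visits for one user
    SELECT 1 AS user_id, CAST('2024-01-01 10:00:00' AS DATETIME) AS visit_time
    UNION ALL SELECT 1, CAST('2024-01-01 10:10:00' AS DATETIME)
    UNION ALL SELECT 1, CAST('2024-01-01 11:00:00' AS DATETIME)
    UNION ALL SELECT 1, CAST('2024-01-01 11:05:00' AS DATETIME)
),
user_visits AS (
    SELECT
        user_id,
        visit_time,
        LAG(visit_time) OVER (PARTITION BY user_id ORDER BY visit_time) AS prev_visit
    FROM sample_activity
),
flagged AS (
    SELECT
        user_id,
        visit_time,
        CASE WHEN prev_visit IS NULL
               OR TIMESTAMPDIFF(MINUTE, prev_visit, visit_time) > 30
             THEN 1 ELSE 0 END AS is_new_session
    FROM user_visits
)
SELECT
    user_id,
    visit_time,
    is_new_session,
    -- running count of session starts = session number within the user
    SUM(is_new_session) OVER (
        PARTITION BY user_id
        ORDER BY visit_time
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS session_id
FROM flagged
ORDER BY user_id, visit_time;
-- visit_time          | is_new_session | session_id
-- 2024-01-01 10:00:00 | 1              | 1
-- 2024-01-01 10:10:00 | 0              | 1
-- 2024-01-01 11:00:00 | 1              | 2
-- 2024-01-01 11:05:00 | 0              | 2
```

The flag has to be computed in an earlier CTE because window functions cannot be nested inside one another.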
Compute the conversion rate between funnel stages:
```sql
WITH funnel_steps AS (
    -- one row per user with a 0/1 flag for each stage reached
    SELECT
        user_id,
        MAX(CASE WHEN event_type = 'view' THEN 1 ELSE 0 END) AS viewed,
        MAX(CASE WHEN event_type = 'cart' THEN 1 ELSE 0 END) AS carted,
        MAX(CASE WHEN event_type = 'purchase' THEN 1 ELSE 0 END) AS purchased
    FROM user_events
    GROUP BY user_id
),
funnel_counts AS (
    SELECT
        SUM(viewed) AS total_views,
        SUM(carted) AS total_carts,
        SUM(purchased) AS total_purchases
    FROM funnel_steps
)
SELECT
    total_views,
    total_carts,
    total_purchases,
    -- NULLIF avoids a division-by-zero error when a stage has no users;
    -- in dialects with integer division (e.g. PostgreSQL), multiply by 1.0 first
    total_carts / NULLIF(total_views, 0) AS view_to_cart_rate,
    total_purchases / NULLIF(total_carts, 0) AS cart_to_purchase_rate
FROM funnel_counts;
```
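Compute a 7-day moving average and the year-over-year change:
```sql
WITH daily_metrics AS (
    SELECT
        report_date,
        SUM(sales) AS daily_sales
    FROM sales_data
    GROUP BY report_date
)
SELECT
    report_date,
    daily_sales,
    -- both windows count rows, so they assume exactly one row per calendar day
    AVG(daily_sales) OVER (
        ORDER BY report_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS weekly_moving_avg,
    -- 364 rows back = 52 weeks, so weekdays line up; subtract 1 to get the change rather than the ratio
    daily_sales / LAG(daily_sales, 364) OVER (ORDER BY report_date) - 1 AS yoy_change
FROM daily_metrics
ORDER BY report_date;
```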
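Row-based frames quietly stretch across calendar gaps when some days have no data. If that matters, a date-based RANGE frame is one alternative. The sketch below assumes MySQL 8 interval syntax (PostgreSQL would write `INTERVAL '6 days'`) and uses hypothetical `sample_daily` data with a gap between 2 and 10 January.

```sql
WITH sample_daily AS (             -- hypothetical data with missing days
    SELECT CAST('2024-01-01' AS DATE) AS report_date, 70 AS daily_sales
    UNION ALL SELECT CAST('2024-01-02' AS DATE), 140
    UNION ALL SELECT CAST('2024-01-10' AS DATE), 210
)
SELECT
    report_date,
    daily_sales,
    -- counts the previous 6 ROWS, whatever their dates are
    AVG(daily_sales) OVER (
        ORDER BY report_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS rows_avg,
    -- counts only rows dated within the previous 6 DAYS
    AVG(daily_sales) OVER (
        ORDER BY report_date
        RANGE BETWEEN INTERVAL 6 DAY PRECEDING AND CURRENT ROW
    ) AS range_avg
FROM sample_daily
ORDER BY report_date;
-- 2024-01-01: rows_avg = 70,  range_avg = 70
-- 2024-01-02: rows_avg = 105, range_avg = 105
-- 2024-01-10: rows_avg = 140 (averages all three rows),
--             range_avg = 210 (only 4 to 10 January is in range)
```

The same caveat applies to `LAG(daily_sales, 364)`: with missing days, the row 364 positions back is no longer the date 364 days earlier, so a join on the shifted date may be the safer choice.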
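Window functions are one of the most useful tools for data analysis in SQL, and mastering them makes data processing considerably more efficient. In practice, start with simple scenarios and work up to more complex analysis patterns. In my experience, sensible use of window functions often reduces a task that used to need several processing steps to a single query, while keeping the code readable and maintainable.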