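Window functions are among SQL's most powerful tools, and the three ranking functions ROW_NUMBER, RANK, and DENSE_RANK come up constantly in real business queries. Many developers, in interviews or on the job, need precise control over ranking logic yet remain unsure how the three differ. This article works through a complete e-commerce order analysis example to make their differences and use cases clear.
Before examining the three ranking functions, we need to pin down what a window function is. A window function is a special kind of SQL function that computes over a set of related rows while leaving the original rows intact. Unlike an aggregate function used with GROUP BY, it does not collapse multiple rows into one; it returns a value for every row.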
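To make the contrast with aggregation concrete, here is a minimal sketch that runs both styles of query through Python's built-in `sqlite3` module. The choice of SQLite and the four-row, three-column stand-in table are assumptions made for illustration; the article itself does not name a database engine.

```python
import sqlite3

# In-memory SQLite database (window functions need SQLite 3.25 or newer).
conn = sqlite3.connect(":memory:")
# Hypothetical stand-in table: four orders with only the columns needed here.
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "华东", 1500.0), (2, "华北", 800.0), (3, "华东", 1200.0), (4, "华南", 1500.0)],
)

# Aggregate with GROUP BY: the four orders collapse into one row per region.
grouped = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region"
).fetchall()

# Window function: every order row is kept and carries its region's total.
windowed = conn.execute(
    """
    SELECT order_id, region, amount,
           SUM(amount) OVER (PARTITION BY region) AS region_total
    FROM orders
    ORDER BY order_id
    """
).fetchall()

print(len(grouped), "rows after GROUP BY;", len(windowed), "rows with OVER")
for row in windowed:
    print(row)
```

The GROUP BY query returns three rows (one per region), while the OVER query still returns all four orders, each annotated with the total for its region.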
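The basic syntax of a window function is:

```sql
function_name() OVER (
    [PARTITION BY column1, column2, ...]
    [ORDER BY column [ASC | DESC]]
    [frame_clause]
)
```

where:

- PARTITION BY: defines the window's partitions, similar to GROUP BY
- ORDER BY: defines the ordering of rows within each window
- frame_clause: defines the window frame, that is, the range of rows considered in the computation

Window functions are evaluated after WHERE, GROUP BY, and HAVING, but before the query's final ORDER BY. This means a window function only sees the rows that survive filtering and grouping, and its result cannot be referenced in the WHERE, GROUP BY, or HAVING clause of the same query level. It can be used in the SELECT list and in the final ORDER BY.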
To see how the three functions differ, we will build a sample dataset for e-commerce order analysis. Assume an orders table with the following columns:

```sql
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    product_id INT,
    order_date DATE,
    amount DECIMAL(10,2),
    region VARCHAR(50)
);

-- Insert sample data
INSERT INTO orders VALUES
(1, 101, 1001, '2023-01-15', 1500.00, '华东'),
(2, 102, 1002, '2023-01-16', 800.00, '华北'),
(3, 101, 1003, '2023-01-17', 1200.00, '华东'),
(4, 103, 1001, '2023-01-18', 1500.00, '华南'),
(5, 104, 1004, '2023-01-19', 2000.00, '华东'),
(6, 102, 1005, '2023-01-20', 800.00, '华北'),
(7, 105, 1006, '2023-01-21', 3000.00, '华南'),
(8, 103, 1007, '2023-01-22', 2500.00, '华南'),
(9, 104, 1008, '2023-01-23', 1800.00, '华东'),
(10, 105, 1009, '2023-01-24', 1200.00, '华南');
```

The dataset contains 10 orders from 5 customers, covering 9 products across 3 regions: 华东 (East China), 华北 (North China), and 华南 (South China). We will use it to demonstrate how the three ranking functions behave.
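ROW_NUMBER() is the simplest ranking function: it assigns each row a unique sequential number, and rows with equal sort values still receive different numbers.

```sql
SELECT
    order_id,
    customer_id,
    amount,
    ROW_NUMBER() OVER (ORDER BY amount DESC) AS row_num
FROM orders;
```

Sample result (the relative order of tied rows, such as orders 1 and 4, is not guaranteed):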
| order_id | customer_id | amount | row_num |
|---|---|---|---|
| 7 | 105 | 3000.00 | 1 |
| 8 | 103 | 2500.00 | 2 |
| 5 | 104 | 2000.00 | 3 |
| 9 | 104 | 1800.00 | 4 |
| 1 | 101 | 1500.00 | 5 |
| 4 | 103 | 1500.00 | 6 |
| 3 | 101 | 1200.00 | 7 |
| 10 | 105 | 1200.00 | 8 |
| 2 | 102 | 800.00 | 9 |
| 6 | 102 | 800.00 | 10 |
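Key characteristics:

- Numbers are unique and consecutive, here 1 through 10.
- Rows with equal amounts (orders 1 and 4, for example) still get different numbers, and which of them comes first is up to the database unless the ORDER BY includes a tiebreaker such as order_id.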
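Because ROW_NUMBER() numbers tied rows in no guaranteed order, a reproducible numbering needs a unique column as a second sort key. The sketch below illustrates this with Python's built-in `sqlite3` module; using SQLite and trimming the orders table to two columns are assumptions made for illustration.

```python
import sqlite3

# Trimmed copy of the article's orders table: (order_id, amount) only.
ORDERS = [(1, 1500.0), (2, 800.0), (3, 1200.0), (4, 1500.0), (5, 2000.0),
          (6, 800.0), (7, 3000.0), (8, 2500.0), (9, 1800.0), (10, 1200.0)]

conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25+
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", ORDERS)

# order_id is unique, so adding it as a second sort key makes the
# numbering of tied amounts reproducible from run to run.
numbered = conn.execute(
    """
    SELECT order_id,
           ROW_NUMBER() OVER (ORDER BY amount DESC, order_id ASC) AS row_num
    FROM orders
    ORDER BY row_num
    """
).fetchall()

for order_id, row_num in numbered:
    print(order_id, row_num)
```

With the tiebreaker in place, order 1 is always numbered before order 4, order 3 before order 10, and order 2 before order 6.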
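RANK() gives rows with equal sort values the same rank, then skips the rank numbers that the tied rows would otherwise have occupied.

```sql
SELECT
    order_id,
    customer_id,
    amount,
    RANK() OVER (ORDER BY amount DESC) AS rank_val
FROM orders;
```

Sample result: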
| order_id | customer_id | amount | rank_val |
|---|---|---|---|
| 7 | 105 | 3000.00 | 1 |
| 8 | 103 | 2500.00 | 2 |
| 5 | 104 | 2000.00 | 3 |
| 9 | 104 | 1800.00 | 4 |
| 1 | 101 | 1500.00 | 5 |
| 4 | 103 | 1500.00 | 5 |
| 3 | 101 | 1200.00 | 7 |
| 10 | 105 | 1200.00 | 7 |
| 2 | 102 | 800.00 | 9 |
| 6 | 102 | 800.00 | 9 |
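Key characteristics:

- Rows with equal amounts share the same rank.
- After a tie, the next rank skips ahead by the number of tied rows: two orders share rank 5, so the next rank is 7 and rank 6 is never used.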
DENSE_RANK() is similar to RANK(), but it does not skip rank numbers after a tie.

```sql
SELECT
    order_id,
    customer_id,
    amount,
    DENSE_RANK() OVER (ORDER BY amount DESC) AS dense_rank_val
FROM orders;
```

Sample result:
| order_id | customer_id | amount | dense_rank_val |
|---|---|---|---|
| 7 | 105 | 3000.00 | 1 |
| 8 | 103 | 2500.00 | 2 |
| 5 | 104 | 2000.00 | 3 |
| 9 | 104 | 1800.00 | 4 |
| 1 | 101 | 1500.00 | 5 |
| 4 | 103 | 1500.00 | 5 |
| 3 | 101 | 1200.00 | 6 |
| 10 | 105 | 1200.00 | 6 |
| 2 | 102 | 800.00 | 7 |
| 6 | 102 | 800.00 | 7 |
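Key characteristics:

- Rows with equal amounts share the same rank.
- Ranks are consecutive with no gaps, so the highest rank equals the number of distinct amounts (7 here).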
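Putting the three functions side by side in one query makes the differences easier to see. The following is a minimal sketch using Python's built-in `sqlite3` module; SQLite, the trimmed two-column table, and the `order_id` tiebreaker on ROW_NUMBER() are assumptions added so the output is reproducible.

```python
import sqlite3

# Trimmed copy of the article's orders table: (order_id, amount) only.
ORDERS = [(1, 1500.0), (2, 800.0), (3, 1200.0), (4, 1500.0), (5, 2000.0),
          (6, 800.0), (7, 3000.0), (8, 2500.0), (9, 1800.0), (10, 1200.0)]

conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25+
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", ORDERS)

# All three ranking functions over the same ordering by amount.
# ROW_NUMBER gets order_id as a tiebreaker so its output is reproducible.
rows = conn.execute(
    """
    SELECT order_id, amount,
           ROW_NUMBER() OVER (ORDER BY amount DESC, order_id) AS row_num,
           RANK()       OVER (ORDER BY amount DESC)           AS rank_val,
           DENSE_RANK() OVER (ORDER BY amount DESC)           AS dense_rank_val
    FROM orders
    ORDER BY row_num
    """
).fetchall()

print("order_id amount row_num rank dense_rank")
for row in rows:
    print(*row)
```

For the two orders at 1200.00, the three columns read 7/7/6 and 8/7/6: ROW_NUMBER() keeps counting, RANK() resumes after the gap left by the tie at 1500.00, and DENSE_RANK() continues without a gap.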
The real power of window functions comes from ranking within partitions. Let's look at a few examples from practical business scenarios, starting with ranking orders inside each region.

```sql
SELECT
    order_id,
    region,
    amount,
    ROW_NUMBER() OVER (PARTITION BY region ORDER BY amount DESC) AS region_row_num,
    RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS region_rank,
    DENSE_RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS region_dense_rank
FROM orders;
```
Result (rows for the 华东 region only; this region has no tied amounts, so the three functions agree):
| order_id | region | amount | region_row_num | region_rank | region_dense_rank |
|---|---|---|---|---|---|
| 5 | 华东 | 2000.00 | 1 | 1 | 1 |
| 9 | 华东 | 1800.00 | 2 | 2 | 2 |
| 1 | 华东 | 1500.00 | 3 | 3 | 3 |
| 3 | 华东 | 1200.00 | 4 | 4 | 4 |
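Business value: ranking inside each region shows how an order compares with others in the same region, which a single global ranking would hide.

A second common scenario is finding each customer's largest order: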

```sql
SELECT * FROM (
    SELECT
        order_id,
        customer_id,
        amount,
        ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY amount DESC) AS cust_rank
    FROM orders
) ranked_orders
WHERE cust_rank = 1;
```

Result (customer 102's two orders tie at 800.00, so either order 2 or order 6 may be returned):
| order_id | customer_id | amount | cust_rank |
|---|---|---|---|
| 1 | 101 | 1500.00 | 1 |
| 2 | 102 | 800.00 | 1 |
| 8 | 103 | 2500.00 | 1 |
| 5 | 104 | 2000.00 | 1 |
| 7 | 105 | 3000.00 | 1 |
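Business value: this pattern returns exactly one row per customer, which suits reports such as each customer's largest order. Because ROW_NUMBER() breaks ties arbitrarily, switch to RANK() when every tied row should be kept.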
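How ties are handled is the main thing to decide in a "top row per group" query. The sketch below compares ROW_NUMBER() and RANK() for the per-customer query, again using Python's built-in `sqlite3` module; SQLite, the trimmed table, and the helper name `top_orders` are assumptions introduced for illustration.

```python
import sqlite3

# Trimmed copy of the article's orders table: (order_id, customer_id, amount).
ORDERS = [(1, 101, 1500.0), (2, 102, 800.0), (3, 101, 1200.0), (4, 103, 1500.0),
          (5, 104, 2000.0), (6, 102, 800.0), (7, 105, 3000.0), (8, 103, 2500.0),
          (9, 104, 1800.0), (10, 105, 1200.0)]

conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25+
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", ORDERS)


def top_orders(rank_fn):
    """Return (customer_id, order_id) pairs whose per-customer rank is 1."""
    # rank_fn is interpolated into the SQL text, so only allow known names.
    if rank_fn not in ("ROW_NUMBER", "RANK"):
        raise ValueError(f"unsupported ranking function: {rank_fn}")
    sql = f"""
        SELECT customer_id, order_id FROM (
            SELECT customer_id, order_id,
                   {rank_fn}() OVER (PARTITION BY customer_id
                                     ORDER BY amount DESC) AS cust_rank
            FROM orders
        ) ranked_orders
        WHERE cust_rank = 1
        ORDER BY customer_id, order_id
    """
    return conn.execute(sql).fetchall()


one_per_customer = top_orders("ROW_NUMBER")  # exactly one row per customer
with_ties = top_orders("RANK")               # keeps both of customer 102's orders
print(len(one_per_customer), "rows with ROW_NUMBER;", len(with_ties), "rows with RANK")
```

ROW_NUMBER() yields five rows, one per customer, while RANK() yields six because customer 102's two 800.00 orders both rank first.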
ROW_NUMBER() is a convenient way to implement paginated queries:

```sql
-- Fetch page 2 with 3 rows per page
SELECT * FROM (
    SELECT
        order_id,
        customer_id,
        amount,
        ROW_NUMBER() OVER (ORDER BY order_date DESC) AS row_num
    FROM orders
) paginated
WHERE row_num BETWEEN 4 AND 6;
```
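Performance tip: the database has to number the rows of the inner query before the outer filter applies, so the cost tends to grow with the page offset on large tables. An index on the ORDER BY column (order_date here) can help. If the sort column is not unique, add a unique tiebreaker to the ORDER BY so that pages do not overlap.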
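The page boundaries follow from the page number and page size: page p of size n covers row numbers (p - 1) * n + 1 through p * n. Here is a minimal sketch of that calculation wrapped around the query, using Python's built-in `sqlite3` module. SQLite, the trimmed table, the helper name `fetch_page`, and the extra `order_id` tiebreaker are assumptions made for illustration.

```python
import sqlite3

# Trimmed copy of the article's orders table: (order_id, order_date).
ORDERS = [(1, "2023-01-15"), (2, "2023-01-16"), (3, "2023-01-17"), (4, "2023-01-18"),
          (5, "2023-01-19"), (6, "2023-01-20"), (7, "2023-01-21"), (8, "2023-01-22"),
          (9, "2023-01-23"), (10, "2023-01-24")]

conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25+
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_date TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", ORDERS)


def fetch_page(page, page_size):
    """Return the order_ids on a 1-based page, newest orders first."""
    first = (page - 1) * page_size + 1  # first row number on the page
    last = page * page_size             # last row number on the page
    rows = conn.execute(
        """
        SELECT order_id FROM (
            SELECT order_id,
                   ROW_NUMBER() OVER (ORDER BY order_date DESC, order_id DESC) AS row_num
            FROM orders
        ) paginated
        WHERE row_num BETWEEN ? AND ?
        ORDER BY row_num
        """,
        (first, last),
    ).fetchall()
    return [order_id for (order_id,) in rows]


page_two = fetch_page(2, 3)   # row numbers 4 to 6
last_page = fetch_page(4, 3)  # row numbers 10 to 12; only one order remains
print(page_two, last_page)
```

The row-number bounds are passed as bound parameters rather than formatted into the SQL string, which keeps the query safe when the page number comes from user input.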
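Choose the ranking function according to the business requirement:
| Scenario | Recommended function | Reason |
|---|---|---|
| Pagination | ROW_NUMBER() | Requires a unique, deterministic position for every row |
| Competition ranking (ties allowed) | RANK() | Reflects actual placement, such as gold, silver, and bronze |
| Customer tiering (e.g. gold/silver/bronze customers) | DENSE_RANK() | Keeps tier numbers consecutive, which simplifies later analysis |
| Top N per group | Any of the three | Depends on how ties should be handled |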
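Window functions also let you define a more precise window frame:

```sql
-- Moving average over the current order and the two preceding orders
SELECT
    order_id,
    order_date,
    amount,
    AVG(amount) OVER (
        ORDER BY order_date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS moving_avg
FROM orders;
```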
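To check what the ROWS frame in the moving-average query actually covers, the sketch below runs it with Python's built-in `sqlite3` module and recomputes the same averages in plain Python. SQLite and the trimmed three-column table are assumptions made for illustration.

```python
import sqlite3

# Trimmed copy of the article's orders table: (order_id, order_date, amount),
# listed in order_date order.
ORDERS = [(1, "2023-01-15", 1500.0), (2, "2023-01-16", 800.0), (3, "2023-01-17", 1200.0),
          (4, "2023-01-18", 1500.0), (5, "2023-01-19", 2000.0), (6, "2023-01-20", 800.0),
          (7, "2023-01-21", 3000.0), (8, "2023-01-22", 2500.0), (9, "2023-01-23", 1800.0),
          (10, "2023-01-24", 1200.0)]

conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25+
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_date TEXT, amount REAL)"
)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", ORDERS)

moving = conn.execute(
    """
    SELECT order_id,
           AVG(amount) OVER (
               ORDER BY order_date
               ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
           ) AS moving_avg
    FROM orders
    ORDER BY order_date
    """
).fetchall()

# Recompute by hand: each frame holds the current row plus up to two rows
# before it, so the first two frames are shorter than three rows.
amounts = [amount for _, _, amount in ORDERS]
expected = []
for i in range(len(amounts)):
    frame = amounts[max(0, i - 2): i + 1]
    expected.append(sum(frame) / len(frame))

for (order_id, avg), exp in zip(moving, expected):
    print(order_id, round(avg, 2), round(exp, 2))
```

The first order's average is just its own amount (1500.00) and the second is the mean of two rows (1150.00); from the third order on, each average covers exactly three rows.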
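Window frame types:

- ROWS BETWEEN ... AND ...: defines the frame by physical row count
- RANGE BETWEEN ... AND ...: defines the frame by a logical range of values
- GROUPS BETWEEN ... AND ...: defines the frame by groups of rows that tie on the ORDER BY value (not supported by every database)

A common mistake is referencing a window function result in the WHERE clause: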

```sql
-- Incorrect
SELECT order_id, ROW_NUMBER() OVER () AS rn
FROM orders
WHERE rn <= 5; -- this raises an error

-- Correct: compute the window function in a subquery, then filter
SELECT * FROM (
    SELECT order_id, ROW_NUMBER() OVER () AS rn
    FROM orders
) t WHERE rn <= 5;
```
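The failure and the fix can both be observed directly. The sketch below uses Python's built-in `sqlite3` module, which is an assumption made for illustration; the exact error text differs between databases, so the code only checks that the first query is rejected. An `ORDER BY order_id` is added inside OVER so that the numbering is reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25+
# Trimmed copy of the article's orders table: only order_id is needed here.
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO orders VALUES (?)", [(i,) for i in range(1, 11)])

# Incorrect: WHERE is evaluated before the window function, so rn is not
# available to it and the statement is rejected.
bad_sql = """
    SELECT order_id, ROW_NUMBER() OVER (ORDER BY order_id) AS rn
    FROM orders
    WHERE rn <= 5
"""
try:
    conn.execute(bad_sql).fetchall()
    rejected = False
except sqlite3.Error as exc:
    rejected = True
    print("rejected:", exc)

# Correct: compute rn in a subquery, then filter on it in the outer query.
good_sql = """
    SELECT order_id FROM (
        SELECT order_id, ROW_NUMBER() OVER (ORDER BY order_id) AS rn
        FROM orders
    ) t
    WHERE rn <= 5
    ORDER BY rn
"""
first_five = [order_id for (order_id,) in conn.execute(good_sql)]
print(first_five)
```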
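Ignoring how NULL values sort: whether NULLs come first or last by default differs between databases, so specify NULLS FIRST or NULLS LAST explicitly where the database supports that syntax.

Performance problems: each OVER clause generally requires the rows to be partitioned and sorted, which can be costly on large tables. Filter rows as early as possible and consider indexing the PARTITION BY and ORDER BY columns.

Build the query step by step: run the inner SELECT on its own and check the window column before adding outer filters.

Use a CTE to improve readability: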

```sql
WITH ranked_orders AS (
    SELECT
        order_id,
        RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS rnk
    FROM orders
)
SELECT * FROM ranked_orders WHERE rnk <= 3;
```
Check the execution plan: use EXPLAIN to analyze the query's performance.