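In database queries, GROUP BY and window functions are two common ways to aggregate data, but they work differently and suit different situations. Developers often confuse them, especially when a result must show both detail rows and aggregate values.
The defining behavior of GROUP BY is that it collapses data: rows with the same grouping key are merged into one row, and aggregate functions such as SUM and COUNT are applied to each group. For example, SELECT sex, COUNT(*) FROM user GROUP BY sex partitions the rows by sex and returns one row per distinct sex value together with its row count.
A window function works differently: it keeps every original row and computes the aggregate over a "window" of related rows. For example, SELECT *, COUNT(*) OVER(PARTITION BY sex) FROM user returns every row of user with an extra column holding the number of rows that share that row's sex.
Key difference: GROUP BY reduces the number of result rows, while a window function keeps the original row count and adds aggregate columns.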
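To make the row-count difference concrete, here is a minimal sketch. The table name demo_user and its five rows are hypothetical; they are chosen only so that the totals match the results shown later in the article, and the syntax assumes a database with window-function support (for example MySQL 8.0+ or PostgreSQL).

```sql
-- Hypothetical demo table; the name and the rows are illustrative assumptions.
DROP TABLE IF EXISTS demo_user;
CREATE TABLE demo_user (
  id       INT PRIMARY KEY,
  phone_id INT,
  sex      VARCHAR(10)
);
INSERT INTO demo_user (id, phone_id, sex) VALUES
  (1, 1, '1'), (2, 2, '1'),
  (3, 3, '0'), (4, 5, '0'), (5, 6, '0');

-- GROUP BY: the five input rows collapse into two rows, one per sex.
SELECT sex, COUNT(*) AS user_count
FROM demo_user
GROUP BY sex;

-- Window function: all five rows come back, each carrying its group's count.
SELECT id, phone_id, sex,
       COUNT(*) OVER (PARTITION BY sex) AS user_count
FROM demo_user;
```

The first query returns two rows and the second returns five, which is the whole distinction in one place.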
Starting from the sample data, we first look at the most basic use of GROUP BY:
-- Raw data in the user table
SELECT * FROM user;
-- Total phone_id per sex
SELECT SUM(phone_id) AS total, sex FROM user GROUP BY sex;
This query returns:
+-----+---+
|total|sex|
+-----+---+
|3    |1  |
|14   |0  |
+-----+---+
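One technical detail worth noting: without an ORDER BY clause the order of the grouped rows is not guaranteed, which is why sex 1 can appear before sex 0 here.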
More complex grouped statistics can be built by combining several aggregate functions:
-- Count the users of each sex in two equivalent ways
SELECT
  SUM(1) AS total,        -- each row counts as 1, so the sum is the group's row count
  COUNT(*) AS count_all,  -- the standard way to count rows
  sex
FROM user
GROUP BY sex;
Result:
+-----+---------+---+
|total|count_all|sex|
+-----+---------+---+
|2    |2        |1  |
|3    |3        |0  |
+-----+---------+---+
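This illustrates a few important SQL concepts:
SUM(1) and COUNT(*) return the same value within each group, but they express different intent.
COUNT(*) counts rows, whereas COUNT(column) counts only the non-NULL values of that column.
The most complex query in the example shows how powerful conditional aggregation can be: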
SELECT
  CASE
    WHEN SUM(CASE WHEN sex = '1' THEN 1 ELSE 0 END) = COUNT(*) THEN 'lock'
    WHEN SUM(CASE WHEN sex = '1' THEN 1 ELSE 0 END) = 0 THEN 'unLock'
    ELSE 'partLock'
  END AS lockStatus
FROM user
GROUP BY sex;
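This query labels each group 'lock' when every row in it has sex = '1', 'unLock' when none does, and 'partLock' otherwise. Because the grouping key here is sex itself, every group is uniform, so 'partLock' can only occur when the same expression is grouped by a different column.
Tip: CASE WHEN inside an aggregate in a GROUP BY query expresses flexible per-group conditions, which is useful for data cleaning and for implementing business rules.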
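The same conditional aggregation only produces all three labels when the grouping column differs from the column being tested. The sketch below assumes a hypothetical table demo_member with a hypothetical grouping column team; neither appears in the original example.

```sql
-- Hypothetical table and column names, used only for illustration.
DROP TABLE IF EXISTS demo_member;
CREATE TABLE demo_member (
  id   INT PRIMARY KEY,
  team VARCHAR(10),   -- hypothetical grouping column
  sex  VARCHAR(10)
);
INSERT INTO demo_member (id, team, sex) VALUES
  (1, 'a', '1'), (2, 'a', '1'),   -- team a: every row has sex = '1'
  (3, 'b', '0'), (4, 'b', '0'),   -- team b: no row has sex = '1'
  (5, 'c', '1'), (6, 'c', '0');   -- team c: mixed

-- Same CASE logic as the article's query, grouped by team instead of sex.
SELECT
  team,
  CASE
    WHEN SUM(CASE WHEN sex = '1' THEN 1 ELSE 0 END) = COUNT(*) THEN 'lock'
    WHEN SUM(CASE WHEN sex = '1' THEN 1 ELSE 0 END) = 0        THEN 'unLock'
    ELSE 'partLock'
  END AS lockStatus
FROM demo_member
GROUP BY team;
```

With these rows, team a is labeled 'lock', team b 'unLock', and team c 'partLock'.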
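Window functions are powerful, but some situations call for an alternative, for example a database version that does not support them (such as MySQL before 8.0).
Joining the table to a grouped subquery of itself keeps the original rows while attaching the aggregate values: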
SELECT
  u.*,
  g.user_count,
  g.total_phone_id
FROM
  user u
JOIN (
  SELECT
    sex,
    COUNT(*) AS user_count,
    SUM(phone_id) AS total_phone_id
  FROM user
  GROUP BY sex
) g ON u.sex = g.sex;
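This produces an effect similar to the window function COUNT(*) OVER(PARTITION BY sex), with one caveat: rows whose sex is NULL are dropped by the inner join, because NULL = NULL does not evaluate to true, whereas a window function keeps them as a partition of their own.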
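One difference between the join approach and the window function shows up with NULL grouping keys. The sketch below uses a hypothetical table demo_user with one NULL sex value. The last query relies on the `<=>` NULL-safe equality operator, which is MySQL-specific; PostgreSQL would use IS NOT DISTINCT FROM instead.

```sql
-- Hypothetical demo table; the name and the rows are illustrative assumptions.
DROP TABLE IF EXISTS demo_user;
CREATE TABLE demo_user (
  id       INT PRIMARY KEY,
  phone_id INT,
  sex      VARCHAR(10)
);
INSERT INTO demo_user (id, phone_id, sex) VALUES
  (1, 1, '1'), (2, 2, '1'), (3, 3, NULL);

-- Inner join on u.sex = g.sex: row 3 is lost, since NULL = NULL is not true.
SELECT u.id, u.sex, g.user_count
FROM demo_user u
JOIN (
  SELECT sex, COUNT(*) AS user_count
  FROM demo_user
  GROUP BY sex
) g ON u.sex = g.sex;

-- Window function: row 3 is kept, and NULLs form their own partition.
SELECT id, sex, COUNT(*) OVER (PARTITION BY sex) AS user_count
FROM demo_user;

-- MySQL only: the NULL-safe operator makes the join keep row 3 as well.
SELECT u.id, u.sex, g.user_count
FROM demo_user u
JOIN (
  SELECT sex, COUNT(*) AS user_count
  FROM demo_user
  GROUP BY sex
) g ON u.sex <=> g.sex;
```

If the grouping column is declared NOT NULL, the plain equality join and the window function return the same rows.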
For a simple count, a correlated subquery is more concise:
SELECT
  *,
  (SELECT COUNT(*) FROM user u2 WHERE u2.sex = u1.sex) AS same_sex_count
FROM
  user u1;
Its advantage is brevity: no derived table or join is needed.
For GROUP BY to run efficiently, sensible index design is essential:
-- Indexes that support grouping on the user table
CREATE INDEX idx_user_sex ON user(sex);
CREATE INDEX idx_user_sex_phone ON user(sex, phone_id);
Use EXPLAIN to check how the query is executed:
EXPLAIN SELECT sex, COUNT(*) FROM user GROUP BY sex;
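In MySQL's output, pay attention to the key column (which index is used), the rows estimate, and Extra values such as Using temporary or Using filesort.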
When working with tables of a million rows or more:
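Possible causes of unexpected grouping results: values that differ only in letter case, and NULL values in the grouping column.
Solutions: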
-- Normalize letter case
SELECT LOWER(sex) AS norm_sex, COUNT(*)
FROM user
GROUP BY LOWER(sex);
-- Handle NULL values
SELECT COALESCE(sex, 'unknown') AS sex, COUNT(*)
FROM user
GROUP BY COALESCE(sex, 'unknown');
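NULL values affect the aggregates as well as the grouping key. The following sketch, on a hypothetical table demo_user with illustrative rows, puts COUNT(*) and COUNT(column) side by side in a NULL-aware grouping. The alias sex_label is an assumption, chosen so that the output column does not share a name with the underlying column.

```sql
-- Hypothetical demo table; the name and the rows are illustrative assumptions.
DROP TABLE IF EXISTS demo_user;
CREATE TABLE demo_user (
  id       INT PRIMARY KEY,
  phone_id INT,
  sex      VARCHAR(10)
);
INSERT INTO demo_user (id, phone_id, sex) VALUES
  (1, 1,    '1'),
  (2, NULL, '1'),    -- phone_id is missing
  (3, 3,    NULL),   -- sex is missing
  (4, 5,    NULL);   -- sex is missing

SELECT
  COALESCE(sex, 'unknown') AS sex_label,
  COUNT(*)                 AS all_rows,         -- counts every row in the group
  COUNT(phone_id)          AS rows_with_phone,  -- skips rows where phone_id is NULL
  SUM(phone_id)            AS total_phone       -- SUM also ignores NULLs
FROM demo_user
GROUP BY COALESCE(sex, 'unknown');
```

For the '1' group, all_rows is 2 while rows_with_phone is 1. For the 'unknown' group both counts are 2 and total_phone is 8.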
Typical scenarios:
How to handle them:
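-- Refresh table statistics
ANALYZE TABLE user;
-- Rebuild the index (MySQL has no REBUILD INDEX clause, so drop and re-add it in one statement)
ALTER TABLE user DROP INDEX idx_user_sex, ADD INDEX idx_user_sex (sex);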
Debugging tips:
-- Debug step by step with CTEs
WITH sex_counts AS (
  SELECT
    sex,
    COUNT(*) AS cnt,
    SUM(CASE WHEN sex = '1' THEN 1 ELSE 0 END) AS locked_cnt  -- computed per group, before any comparison
  FROM user
  GROUP BY sex
),
lock_status AS (
  SELECT
    sex,
    CASE
      WHEN cnt = locked_cnt THEN 'lock'
      -- other conditions
    END AS status
  FROM sex_counts
)
SELECT * FROM lock_status;
If the database supports them (MySQL 8.0+, PostgreSQL, SQL Server, and others), window functions are usually the most direct choice:
SELECT
  *,
  COUNT(*) OVER(PARTITION BY sex) AS same_sex_count,
  SUM(phone_id) OVER(PARTITION BY sex) AS total_phone_id
FROM user;
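Advantages: the detail rows are kept, and the aggregates are computed in one statement without a join or subquery.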
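When several window functions share the same partitioning, a named window avoids repeating the OVER clause. This is a sketch assuming MySQL 8.0+ or PostgreSQL, both of which accept the WINDOW clause. The table demo_user and its rows are hypothetical.

```sql
-- Hypothetical demo table; the name and the rows are illustrative assumptions.
DROP TABLE IF EXISTS demo_user;
CREATE TABLE demo_user (
  id       INT PRIMARY KEY,
  phone_id INT,
  sex      VARCHAR(10)
);
INSERT INTO demo_user (id, phone_id, sex) VALUES
  (1, 1, '1'), (2, 2, '1'),
  (3, 3, '0'), (4, 5, '0'), (5, 6, '0');

-- Both aggregates reuse the window named w instead of repeating PARTITION BY sex.
SELECT
  id,
  phone_id,
  sex,
  COUNT(*)      OVER w AS same_sex_count,
  SUM(phone_id) OVER w AS total_phone_id
FROM demo_user
WINDOW w AS (PARTITION BY sex);
```

Each row with sex '1' carries a count of 2 and a total of 3, and each row with sex '0' carries a count of 3 and a total of 14.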
Common table expressions (CTEs) can make complex queries much easier to maintain:
WITH user_stats AS (
  SELECT
    sex,
    COUNT(*) AS user_count,
    SUM(phone_id) AS total_phone
  FROM user
  GROUP BY sex
)
SELECT
  u.*,
  us.user_count,
  us.total_phone
FROM
  user u
JOIN
  user_stats us ON u.sex = us.sex;
PostgreSQL and some other databases support the more concise FILTER syntax:
SELECT
  sex,
  COUNT(*) FILTER (WHERE phone_id > 3) AS big_phone_users,
  COUNT(*) AS total_users
FROM user
GROUP BY sex;
This syntax is more readable than a CASE WHEN expression, but MySQL does not currently support it.
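On MySQL, the same result can be approximated with a CASE expression inside COUNT. The sketch below is one portable rewrite under that assumption, run against a hypothetical table demo_user with illustrative rows.

```sql
-- Hypothetical demo table; the name and the rows are illustrative assumptions.
DROP TABLE IF EXISTS demo_user;
CREATE TABLE demo_user (
  id       INT PRIMARY KEY,
  phone_id INT,
  sex      VARCHAR(10)
);
INSERT INTO demo_user (id, phone_id, sex) VALUES
  (1, 1, '1'), (2, 2, '1'),
  (3, 3, '0'), (4, 5, '0'), (5, 6, '0');

-- CASE yields NULL when the condition fails, and COUNT skips NULLs,
-- so only rows with phone_id > 3 are counted.
SELECT
  sex,
  COUNT(CASE WHEN phone_id > 3 THEN 1 END) AS big_phone_users,
  COUNT(*)                                 AS total_users
FROM demo_user
GROUP BY sex;
```

With these rows the '1' group has 0 big_phone_users out of 2, and the '0' group has 2 out of 3.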