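In SQL, UNION and UNION ALL are both operators for combining the result sets of multiple SELECT statements, but they differ in one key respect: what happens to duplicate rows. Let's start with a concrete case. We need to merge the employee lists of two departments: the sales department has 5 employees, the tech department has 7, and 2 people appear on both lists.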
With UNION ALL:
```sql
SELECT name FROM sales_dept
UNION ALL
SELECT name FROM tech_dept;
```
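The result contains all 12 rows (5 + 7). The original data is kept as-is, duplicates included. It is like stacking two Excel sheets on top of each other without removing anything.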
With UNION:
```sql
SELECT name FROM sales_dept
UNION
SELECT name FROM tech_dept;
```
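The database first combines the results and then performs an extra deduplication pass. In this example the query returns 10 rows, because each of the 2 people who belong to both departments appears only once. This is similar to Excel's "Remove Duplicates" feature.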
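To check these row counts, here is a minimal sketch that rebuilds the example in an in-memory SQLite database through Python's standard `sqlite3` module. SQLite is only a convenient stand-in engine, and the names are hypothetical. The one detail carried over from the text is that 2 of the 12 names appear in both departments.

```python
import sqlite3


def union_vs_union_all():
    """Return (row count with UNION ALL, row count with UNION)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales_dept (name TEXT)")
    conn.execute("CREATE TABLE tech_dept (name TEXT)")
    # Hypothetical names: Alice and Bob work in both departments.
    sales = ["Alice", "Bob", "Carol", "Dave", "Eve"]
    tech = ["Alice", "Bob", "Frank", "Grace", "Heidi", "Ivan", "Judy"]
    conn.executemany("INSERT INTO sales_dept VALUES (?)", [(n,) for n in sales])
    conn.executemany("INSERT INTO tech_dept VALUES (?)", [(n,) for n in tech])

    all_rows = conn.execute(
        "SELECT name FROM sales_dept UNION ALL SELECT name FROM tech_dept"
    ).fetchall()
    distinct_rows = conn.execute(
        "SELECT name FROM sales_dept UNION SELECT name FROM tech_dept"
    ).fetchall()
    conn.close()
    return len(all_rows), len(distinct_rows)


print(union_vs_union_all())  # (12, 10)
```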
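Key difference: UNION ALL keeps every row, duplicates included. UNION removes rows that are identical in every selected column, whether the duplicates come from different inputs or from the same input. This difference affects both query performance and the correctness of the result.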
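Running EXPLAIN on the two forms of the same query makes the cost visible. The UNION plan includes an extra deduplication step (typically a temporary table, a hash, or a sort) that the UNION ALL plan does not need.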
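SQLite's `EXPLAIN QUERY PLAN` offers a compact way to see this. The sketch below assumes SQLite as the engine and two hypothetical single-column tables, `t1` and `t2`. It checks whether each plan mentions a temporary B-tree, which is the structure SQLite uses to deduplicate a UNION. The exact plan wording varies between SQLite versions, so the code only searches for that phrase. MySQL and PostgreSQL print plans in their own formats.

```python
import sqlite3


def plan_details(sql):
    """Return the 'detail' column of each EXPLAIN QUERY PLAN row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t1 (a INTEGER)")
    conn.execute("CREATE TABLE t2 (a INTEGER)")
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    conn.close()
    # 'detail' is the last column in both the old and the new output format.
    return [row[-1] for row in rows]


union_plan = plan_details("SELECT a FROM t1 UNION SELECT a FROM t2")
union_all_plan = plan_details("SELECT a FROM t1 UNION ALL SELECT a FROM t2")

for line in union_plan:
    print("UNION     :", line)
for line in union_all_plan:
    print("UNION ALL :", line)

# Only the UNION plan is expected to mention a temporary B-tree for deduplication.
print(any("TEMP B-TREE" in d for d in union_plan))
print(any("TEMP B-TREE" in d for d in union_all_plan))
```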
Typical UNION ALL use cases:

```sql
-- Merge error logs from several servers
SELECT * FROM server1_logs WHERE level='ERROR'
UNION ALL
SELECT * FROM server2_logs WHERE level='ERROR'
```
```sql
-- Total sales amount (orders with identical amounts must all be counted)
SELECT SUM(amount) FROM (
    SELECT amount FROM online_orders
    UNION ALL
    SELECT amount FROM offline_orders
) combined
```
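UNION ALL matters here for correctness. If two different orders happen to have the same amount, UNION treats them as duplicate rows, and the total comes out too low. Here is a small sketch, again assuming SQLite as a stand-in engine and using hypothetical amounts. The operator is spliced into the SQL text, so only trusted constants should be passed in.

```python
import sqlite3


def total_amount(set_operator):
    """Sum order amounts from two channels combined with set_operator."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE online_orders (amount INTEGER)")
    conn.execute("CREATE TABLE offline_orders (amount INTEGER)")
    # Hypothetical data: one online and one offline order both cost 100.
    conn.executemany("INSERT INTO online_orders VALUES (?)", [(100,), (250,)])
    conn.executemany("INSERT INTO offline_orders VALUES (?)", [(100,), (50,)])
    sql = f"""
        SELECT SUM(amount) FROM (
            SELECT amount FROM online_orders
            {set_operator}
            SELECT amount FROM offline_orders
        ) AS combined
    """
    (total,) = conn.execute(sql).fetchone()
    conn.close()
    return total


print(total_amount("UNION ALL"))  # 500: every order is counted
print(total_amount("UNION"))      # 400: the second order of 100 is silently dropped
```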
Typical UNION use cases:

```sql
-- List every distinct city
SELECT city FROM customers
UNION
SELECT city FROM suppliers
```
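```sql
-- Merge user tags that are already deduplicated within each version
-- (the inner DISTINCTs are redundant for correctness: UNION deduplicates the combined result anyway)
SELECT DISTINCT tag FROM user_tags_v1
UNION
SELECT DISTINCT tag FROM user_tags_v2
```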
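```sql
-- Deduplicate each side first, then merge
-- Caution: a value present in both tables still appears twice, so this is NOT equivalent to UNION
SELECT DISTINCT col1 FROM table1
UNION ALL
SELECT DISTINCT col1 FROM table2
```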
```sql
-- Equivalent to UNION on col1, and possibly more efficient on some engines
SELECT col1 FROM (
    SELECT col1 FROM table1
    UNION ALL
    SELECT col1 FROM table2
) temp GROUP BY col1
```
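These three variants are easy to confuse. The minimal sketch below compares their results on SQLite using hypothetical values: `'a'` is duplicated inside `table1`, and `'b'` appears in both tables. The derived table is aliased `combined` rather than `temp`, to avoid any clash with SQLite's `temp` schema name.

```python
import sqlite3


def compare_dedup_strategies():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (col1 TEXT)")
    conn.execute("CREATE TABLE table2 (col1 TEXT)")
    # 'a' is duplicated inside table1; 'b' appears in both tables.
    conn.executemany("INSERT INTO table1 VALUES (?)", [("a",), ("a",), ("b",)])
    conn.executemany("INSERT INTO table2 VALUES (?)", [("b",), ("c",)])

    def rows(sql):
        # Sort in Python so the comparison does not depend on output order.
        return sorted(r[0] for r in conn.execute(sql))

    union = rows("SELECT col1 FROM table1 UNION SELECT col1 FROM table2")
    distinct_then_all = rows(
        "SELECT DISTINCT col1 FROM table1 UNION ALL SELECT DISTINCT col1 FROM table2"
    )
    all_then_group = rows(
        "SELECT col1 FROM (SELECT col1 FROM table1 UNION ALL "
        "SELECT col1 FROM table2) AS combined GROUP BY col1"
    )
    conn.close()
    return union, distinct_then_all, all_then_group


union, distinct_then_all, all_then_group = compare_dedup_strategies()
print(union)              # ['a', 'b', 'c']
print(distinct_then_all)  # ['a', 'b', 'b', 'c']: 'b' survives twice
print(all_then_group)     # ['a', 'b', 'c']: same as UNION
```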
When the combined columns have different data types:
```sql
-- Problematic: mismatched column types (an error in some databases, implicit conversion in others)
SELECT text_column FROM table1
UNION
SELECT numeric_column FROM table2
```
Solution: convert the columns explicitly to a common type:
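```sql
-- MySQL syntax. In PostgreSQL, CAST(... AS CHAR) means CHAR(1) and truncates the value;
-- use CAST(... AS TEXT) or VARCHAR there instead.
SELECT CAST(text_column AS CHAR) FROM table1
UNION
SELECT CAST(numeric_column AS CHAR) FROM table2
```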
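SQLite behaves differently again. It is dynamically typed, so mixing types raises no error. However, UNION compares values as-is, so the text `'1'` and the integer `1` count as different rows. The sketch below uses SQLite as a stand-in with hypothetical single-row tables and casts to SQLite's `TEXT` type. It shows how an explicit cast makes the deduplication behave as intended.

```python
import sqlite3


def union_mixed_types(cast_to_text):
    """UNION a TEXT column with an INTEGER column, optionally casting the latter."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (text_column TEXT)")
    conn.execute("CREATE TABLE table2 (numeric_column INTEGER)")
    conn.execute("INSERT INTO table1 VALUES ('1')")  # stored as text
    conn.execute("INSERT INTO table2 VALUES (1)")    # stored as integer
    second = "CAST(numeric_column AS TEXT)" if cast_to_text else "numeric_column"
    rows = conn.execute(
        f"SELECT text_column FROM table1 UNION SELECT {second} FROM table2"
    ).fetchall()
    conn.close()
    return rows


print(len(union_mixed_types(False)))  # 2: text '1' and integer 1 are not merged
print(union_mixed_types(True))        # [('1',)]: after the cast they deduplicate
```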
A common misconception:
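```sql
-- Often assumed to sort only the last SELECT. In fact, in standard SQL, MySQL and PostgreSQL,
-- a trailing ORDER BY / LIMIT applies to the entire UNION result.
SELECT * FROM table1
UNION
SELECT * FROM table2
ORDER BY column1 LIMIT 10
```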
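Writing the branches in parentheses makes the intent explicit. Parentheses are required only when ORDER BY or LIMIT should apply to a single branch, in which case the clause goes inside that branch's parentheses: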
```sql
-- Sorts and limits the combined result (same behavior as the unparenthesized form)
(SELECT * FROM table1)
UNION
(SELECT * FROM table2)
ORDER BY column1 LIMIT 10
```
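A quick way to confirm that a trailing ORDER BY/LIMIT acts on the combined result is shown below, assuming SQLite as the engine and hypothetical values. SQLite does not accept parenthesized branches in a compound SELECT, so the sketch uses the unparenthesized form. The three returned values come from both tables, sorted together.

```python
import sqlite3


def top_n_combined(n):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (column1 INTEGER)")
    conn.execute("CREATE TABLE table2 (column1 INTEGER)")
    conn.executemany("INSERT INTO table1 VALUES (?)", [(5,), (1,), (9,)])
    conn.executemany("INSERT INTO table2 VALUES (?)", [(3,), (7,), (2,)])
    # The ORDER BY and LIMIT below apply to the whole compound SELECT.
    rows = conn.execute(
        "SELECT column1 FROM table1 UNION SELECT column1 FROM table2 "
        "ORDER BY column1 LIMIT ?",
        (n,),
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]


print(top_n_combined(3))  # [1, 2, 3]: 1 comes from table1, 2 and 3 from table2
```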
Key points to watch:
When combining partitioned tables:
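```sql
-- Before: UNION spends work deduplicating, although yearly tables cannot share rows
SELECT * FROM orders_2023
UNION
SELECT * FROM orders_2022
-- After: query the partitioned view directly
SELECT * FROM all_orders
WHERE year IN (2022, 2023)
```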
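One way to provide such an `all_orders` object is a view that stacks the yearly tables with UNION ALL. Because each year lives in its own table, no deduplication is needed. The sketch below assumes SQLite and a hypothetical column layout `(id, year, amount)`. Whether the optimizer can skip tables that cannot match the `year` filter depends on the engine: native partitioned tables in MySQL and PostgreSQL, or partitioned views in SQL Server, are designed for that. The sketch makes no such claim for SQLite.

```python
import sqlite3


def orders_for_years(years):
    conn = sqlite3.connect(":memory:")
    for year in (2022, 2023):
        conn.execute(
            f"CREATE TABLE orders_{year} (id INTEGER, year INTEGER, amount INTEGER)"
        )
    conn.executemany(
        "INSERT INTO orders_2022 VALUES (?, ?, ?)", [(1, 2022, 100), (2, 2022, 80)]
    )
    conn.executemany("INSERT INTO orders_2023 VALUES (?, ?, ?)", [(3, 2023, 100)])
    # Hypothetical "partitioned view": disjoint yearly tables stacked with UNION ALL.
    conn.execute(
        """
        CREATE VIEW all_orders AS
        SELECT id, year, amount FROM orders_2022
        UNION ALL
        SELECT id, year, amount FROM orders_2023
        """
    )
    placeholders = ", ".join("?" for _ in years)
    rows = conn.execute(
        f"SELECT id FROM all_orders WHERE year IN ({placeholders}) ORDER BY id",
        list(years),
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]


print(orders_for_years([2022, 2023]))  # [1, 2, 3]
print(orders_for_years([2023]))        # [3]
```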
In a sharded environment:
Use UNION when you need to stack result sets vertically (append rows):
```sql
-- Vertically combine the results of different queries
SELECT product_id FROM inventory
UNION
SELECT item_id FROM warehouse
```
Use JOIN when you need to relate rows from different tables:
```sql
-- Horizontally relate data from two tables (adds columns)
SELECT * FROM orders
JOIN customers ON orders.cust_id = customers.id
```
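The difference in result shape is easy to see by counting rows and columns. Here is a minimal sketch on SQLite with hypothetical data, reusing the table names from the two queries. UNION yields one column and more rows. JOIN yields the columns of both tables side by side.

```python
import sqlite3


def result_shapes():
    """Return ((rows, cols) for the UNION query, (rows, cols) for the JOIN query)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE inventory (product_id INTEGER)")
    conn.execute("CREATE TABLE warehouse (item_id INTEGER)")
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    conn.execute("CREATE TABLE orders (order_id INTEGER, cust_id INTEGER)")
    conn.executemany("INSERT INTO inventory VALUES (?)", [(1,), (2,)])
    conn.executemany("INSERT INTO warehouse VALUES (?)", [(2,), (3,)])
    conn.executemany("INSERT INTO customers VALUES (?, ?)", [(10, "Ann"), (11, "Ben")])
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)", [(100, 10), (101, 11), (102, 10)]
    )

    cur = conn.execute(
        "SELECT product_id FROM inventory UNION SELECT item_id FROM warehouse"
    )
    union_shape = (len(cur.fetchall()), len(cur.description))

    cur = conn.execute(
        "SELECT * FROM orders JOIN customers ON orders.cust_id = customers.id"
    )
    join_shape = (len(cur.fetchall()), len(cur.description))
    conn.close()
    return union_shape, join_shape


print(result_shapes())  # ((3, 1), (3, 4))
```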
For complex UNION operations:
```sql
-- Split the work into steps for readability
CREATE TEMPORARY TABLE temp_results AS
SELECT col1 FROM table1 WHERE condition;
INSERT INTO temp_results
SELECT col1 FROM table2 WHERE condition;
-- Final step
SELECT DISTINCT * FROM temp_results;
```
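The same three steps can be sketched in SQLite, which spells the statement `CREATE TEMP TABLE ... AS SELECT`. In this sketch, a hypothetical `active = 1` filter stands in for the placeholder `condition`, and the data is made up. Deduplication happens once, at the end.

```python
import sqlite3


def stepwise_distinct():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (col1 TEXT, active INTEGER)")
    conn.execute("CREATE TABLE table2 (col1 TEXT, active INTEGER)")
    conn.executemany(
        "INSERT INTO table1 VALUES (?, ?)", [("x", 1), ("y", 1), ("z", 0)]
    )
    conn.executemany("INSERT INTO table2 VALUES (?, ?)", [("y", 1), ("w", 1)])
    # Step 1: materialize the first branch ('active = 1' replaces 'condition').
    conn.execute(
        "CREATE TEMP TABLE temp_results AS SELECT col1 FROM table1 WHERE active = 1"
    )
    # Step 2: append the second branch without deduplicating yet.
    conn.execute("INSERT INTO temp_results SELECT col1 FROM table2 WHERE active = 1")
    # Step 3: deduplicate once over the accumulated rows.
    rows = conn.execute(
        "SELECT DISTINCT col1 FROM temp_results ORDER BY col1"
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]


print(stepwise_distinct())  # ['w', 'x', 'y']
```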
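```sql
-- Oracle optimizer hint requesting a degree of parallelism of 4 for each branch;
-- other databases use different mechanisms, or ignore the hint or treat it as a comment
SELECT /*+ PARALLEL(4) */ col1 FROM tab1
UNION ALL
SELECT /*+ PARALLEL(4) */ col1 FROM tab2
```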
```sql
-- Combine the counts of completed orders and processed returns
SELECT
    'normal' AS order_type,
    COUNT(*) AS count
FROM orders
WHERE status = 'completed'
UNION ALL
SELECT
    'returned' AS order_type,
    COUNT(*) AS count
FROM returns
WHERE processed = true
```
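The string labels do more than make the output readable. Without them, two counts that happen to be equal form identical rows, and UNION would merge them into one. Here is a sketch on SQLite with hypothetical data: 2 completed orders and 2 processed returns. `processed` is stored as `1` because SQLite versions before 3.23 lack the `TRUE` keyword.

```python
import sqlite3


def order_counts(with_labels, set_operator):
    """Run the order/return count query, optionally without the label column."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, status TEXT)")
    conn.execute("CREATE TABLE returns (id INTEGER, processed INTEGER)")
    # Hypothetical data: 2 completed orders and 2 processed returns.
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, "completed"), (2, "completed"), (3, "pending")],
    )
    conn.executemany("INSERT INTO returns VALUES (?, ?)", [(1, 1), (2, 1)])
    normal = "'normal' AS order_type, " if with_labels else ""
    returned = "'returned' AS order_type, " if with_labels else ""
    sql = (
        f"SELECT {normal}COUNT(*) AS count FROM orders WHERE status = 'completed' "
        f"{set_operator} "
        f"SELECT {returned}COUNT(*) AS count FROM returns WHERE processed = 1"
    )
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


print(order_counts(True, "UNION ALL"))  # [('normal', 2), ('returned', 2)]
print(order_counts(False, "UNION"))     # [(2,)]: the two equal counts collapse
```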
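```sql
-- Merge users from every channel (deduplicated)
-- The outer GROUP BY email already removes duplicates, so UNION ALL here
-- would return the same result without the extra deduplication passes
SELECT
    email,
    MAX(register_date) AS last_date
FROM (
    SELECT email, reg_date AS register_date FROM web_users
    UNION
    SELECT email, create_time FROM app_users
    UNION
    SELECT email, signup_date FROM wechat_users
) combined
GROUP BY email
```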
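Because `MAX` ignores repeated values and `GROUP BY email` collapses each address to one row, swapping UNION for UNION ALL inside the subquery should not change the output. The sketch below checks this on SQLite with hypothetical users and ISO-format date strings, which sort correctly as text. It adds `ORDER BY email` only to make the output order deterministic.

```python
import sqlite3


def last_registration(set_operator):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE web_users (email TEXT, reg_date TEXT)")
    conn.execute("CREATE TABLE app_users (email TEXT, create_time TEXT)")
    conn.execute("CREATE TABLE wechat_users (email TEXT, signup_date TEXT)")
    conn.executemany(
        "INSERT INTO web_users VALUES (?, ?)",
        [("a@x.com", "2023-01-05"), ("b@x.com", "2023-02-01")],
    )
    conn.executemany(
        "INSERT INTO app_users VALUES (?, ?)",
        [("a@x.com", "2023-03-10"), ("a@x.com", "2023-01-05")],
    )
    conn.executemany(
        "INSERT INTO wechat_users VALUES (?, ?)", [("c@x.com", "2023-04-01")]
    )
    sql = f"""
        SELECT email, MAX(register_date) AS last_date
        FROM (
            SELECT email, reg_date AS register_date FROM web_users
            {set_operator}
            SELECT email, create_time FROM app_users
            {set_operator}
            SELECT email, signup_date FROM wechat_users
        ) AS combined
        GROUP BY email
        ORDER BY email
    """
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


print(last_registration("UNION ALL"))
print(last_registration("UNION") == last_registration("UNION ALL"))  # True
```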
```sql
-- Query the last 30 days of error logs (one table per day)
SELECT * FROM logs_20230801 WHERE level='ERROR'
UNION ALL
SELECT * FROM logs_20230802 WHERE level='ERROR'
-- ...tables for the remaining 28 days
ORDER BY timestamp DESC
```
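Writing 30 branches by hand is error-prone, so such queries are often generated. The sketch below does this in Python. The helpers `daily_tables` and `build_daily_error_query`, and the `(timestamp, level, message)` column layout, are hypothetical. Only the `logs_YYYYMMDD` naming pattern comes from the query above. A small SQLite run over 3 days checks that the generated SQL executes and that the final ORDER BY sorts across all tables.

```python
import sqlite3
from datetime import date, timedelta


def daily_tables(start, days):
    """Names of the per-day tables, e.g. logs_20230801, for `days` consecutive days."""
    return [f"logs_{start + timedelta(days=i):%Y%m%d}" for i in range(days)]


def build_daily_error_query(tables):
    """One SELECT per day, stacked with UNION ALL and sorted once at the end."""
    branches = [f"SELECT * FROM {t} WHERE level = 'ERROR'" for t in tables]
    return "\nUNION ALL\n".join(branches) + "\nORDER BY timestamp DESC"


def run_demo():
    tables = daily_tables(date(2023, 8, 1), 3)
    conn = sqlite3.connect(":memory:")
    for i, table in enumerate(tables):
        conn.execute(f"CREATE TABLE {table} (timestamp TEXT, level TEXT, message TEXT)")
        day = f"2023-08-0{i + 1}"
        conn.execute(f"INSERT INTO {table} VALUES (?, 'ERROR', 'e')", (day + " 10:00",))
        conn.execute(f"INSERT INTO {table} VALUES (?, 'INFO', 'i')", (day + " 11:00",))
    rows = conn.execute(build_daily_error_query(tables)).fetchall()
    conn.close()
    return [r[0] for r in rows]


print(build_daily_error_query(daily_tables(date(2023, 8, 1), 30)).count("UNION ALL"))  # 29
print(run_demo())  # ['2023-08-03 10:00', '2023-08-02 10:00', '2023-08-01 10:00']
```

For a rolling window, the start date could be computed as `date.today() - timedelta(days=29)`.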
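In a data warehouse project, I once handled a typical case: merging the user tables of 5 business systems. The initial UNION-based query took more than 15 minutes. Switching to the approach below brought it down to 23 seconds: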
```sql
-- Optimized approach
CREATE TABLE temp_users AS
SELECT DISTINCT user_id FROM system1.users WHERE is_active=1;
INSERT INTO temp_users
SELECT DISTINCT user_id FROM system2.customers WHERE status='active';
-- ...other systems
-- Final deduplication
SELECT user_id FROM temp_users GROUP BY user_id;
```
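The key point is that UNION and UNION ALL are not merely a syntactic difference. They correspond to different execution strategies and resource costs. Choosing between them based on what the business logic actually requires can yield large performance gains, sometimes an order of magnitude or more, as in the case above.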