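After years on the front line of database development, I have come to see the Data Query Language (DQL) as a core skill every developer has to master. Drawing on that hands-on experience, this article walks through the complete syntax of the MySQL SELECT statement and shares performance tuning tips that the official documentation does not spell out.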
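The full structure of a basic SELECT statement is shown below. Know this skeleton as well as you know your multiplication tables:

```sql
SELECT [ALL | DISTINCT]                                    # whether to deduplicate the result set
  { * | table.* | field1 [AS alias1], field2 [AS alias2] } # column selection
FROM table_name [AS table_alias]                           # data source
  [{LEFT | RIGHT | INNER} JOIN table_name2 ON ...]         # multi-table joins
[WHERE ...]                                                # row filter
[GROUP BY ...]                                             # grouping columns
[HAVING ...]                                               # filter applied after grouping
[ORDER BY ...]                                             # result ordering
[LIMIT offset, row_count]                                  # pagination
```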
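Key point to remember: the logical evaluation order is FROM (including JOIN ... ON) → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT. This order directly affects both query performance and the correctness of the result. The optimizer may physically execute the query in a different order, as long as the result is the same.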
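One visible consequence of this order is that WHERE cannot see a column alias defined in SELECT, while ORDER BY can. The following is a minimal sketch, assuming a throwaway table `emp_demo` with a hypothetical `salary` column (neither appears in the schema used elsewhere in this article):

```sql
-- Hypothetical demo table; names and values are for illustration only
CREATE TABLE emp_demo (
  emp_id     INT PRIMARY KEY,
  department VARCHAR(50),
  salary     DECIMAL(10,2)
);
INSERT INTO emp_demo VALUES
  (1, 'Sales',  9000.00),
  (2, 'Sales',  7000.00),
  (3, 'IT',    12000.00);

-- Rejected: WHERE is evaluated before SELECT, so the alias does not exist yet
-- SELECT emp_id, salary * 12 AS annual FROM emp_demo WHERE annual > 100000;

-- Accepted: repeat the expression in WHERE; ORDER BY runs after SELECT
-- and may refer to the alias
SELECT emp_id, salary * 12 AS annual
FROM emp_demo
WHERE salary * 12 > 100000
ORDER BY annual DESC;

-- Filters on aggregates belong in HAVING, which runs after GROUP BY
SELECT department, AVG(salary) AS avg_salary
FROM emp_demo
GROUP BY department
HAVING AVG(salary) > 8500;
```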
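A classic beginner mistake is overusing SELECT *. It transfers columns you do not need and prevents the query from being answered from a covering index:

```sql
-- What not to do (a performance killer)
SELECT * FROM employees;

-- The professional way (list exactly the columns you need)
SELECT employee_id, first_name, department
FROM employees;
```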
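Listing columns explicitly is also what allows a covering index to answer the query. The sketch below assumes an index named `idx_dept_name` (a hypothetical name), an InnoDB `employees` table whose primary key is `employee_id`, and that the table has more columns than the three selected:

```sql
-- Hypothetical index covering the columns the query needs
-- (InnoDB secondary indexes also carry the primary key)
ALTER TABLE employees ADD INDEX idx_dept_name (department, first_name);

-- Can be served from the index alone; look for "Using index" in the Extra column
EXPLAIN SELECT employee_id, first_name, department
FROM employees
WHERE department = 'Sales';

-- SELECT * needs every column, so MySQL must also read the full rows
EXPLAIN SELECT * FROM employees
WHERE department = 'Sales';
```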
Aliases for columns and tables do more than shorten names:

```sql
-- Column aliases (AS is optional)
SELECT emp_no AS id, CONCAT(first_name, ' ', last_name) fullname
FROM employees e;

-- Table aliases are essential in multi-table queries
SELECT e.emp_no, d.dept_name
FROM employees e
JOIN departments d ON e.dept_id = d.dept_id;
```
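DISTINCT looks simple but hides some subtleties:

```sql
-- Single-column deduplication (recommended)
SELECT DISTINCT department FROM employees;

-- Multi-column deduplication (performance-sensitive)
-- The two queries below return the same rows. MySQL usually optimizes them
-- the same way, so compare the plans with EXPLAIN before preferring one form.
SELECT DISTINCT department, job_title FROM employees;

SELECT department, job_title
FROM employees
GROUP BY department, job_title;
```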
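Key points to know about UNION:

```sql
-- Combine results from differently shaped tables
-- (the column counts must match, and the column types should be compatible)
SELECT product_id, product_name FROM products
UNION ALL
SELECT NULL, category_name FROM categories;

-- Timing comparison (millions of rows)
-- UNION ALL: 0.8 s | UNION: 1.3 s (UNION also has to remove duplicates)
```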
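The timing gap comes from deduplication, which also changes the result. A minimal illustration that runs without any tables:

```sql
-- UNION removes duplicate rows from the combined result
SELECT 1 AS n UNION SELECT 1;        -- returns one row

-- UNION ALL keeps every row and skips the deduplication step
SELECT 1 AS n UNION ALL SELECT 1;    -- returns two rows
```

When the two branches cannot overlap, or duplicates are acceptable, UNION ALL is the cheaper choice.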
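Performance pitfalls to avoid when computing on columns:

```sql
-- Not recommended: wrapping the column in a function prevents index use
SELECT * FROM orders
WHERE YEAR(order_date) = 2023;

-- Recommended: a range predicate on the bare column can use an index.
-- The half-open range stays correct even if order_date is a DATETIME.
SELECT * FROM orders
WHERE order_date >= '2023-01-01' AND order_date < '2024-01-01';

-- NULL handling (discount is treated as a multiplier; NULL means no discount)
SELECT
  order_id,
  amount,
  discount,
  amount * IFNULL(discount, 1) AS final_amount
FROM orders;
```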
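NULL in a database is a "black hole" and needs special care:

```sql
-- Common mistake
SELECT * FROM customers WHERE phone = NULL; -- Wrong! Never returns a row

-- Correct
SELECT * FROM customers WHERE phone IS NULL;

-- Index note: MySQL can use an index for both IS NULL and IS NOT NULL;
-- the optimizer skips it when the predicate matches most of the table
```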
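The reason `= NULL` never matches is that any ordinary comparison with NULL yields NULL (unknown), not true or false. A quick check that needs no tables:

```sql
SELECT
  NULL = NULL       AS eq_null,       -- NULL (unknown), not 1
  NULL <=> NULL     AS null_safe_eq,  -- 1: the NULL-safe equality operator
  NULL IS NULL      AS is_null,       -- 1
  100 * NULL        AS arithmetic,    -- NULL: NULL propagates through arithmetic
  COALESCE(NULL, 1) AS fallback;      -- 1: first non-NULL argument
```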
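Choosing between BETWEEN and comparison operators:

```sql
-- Price range (both bounds are included)
SELECT * FROM products
WHERE price BETWEEN 100 AND 500;

-- Date range (mind the time part; this upper bound assumes whole-second precision)
SELECT * FROM orders
WHERE create_time BETWEEN '2023-01-01 00:00:00' AND '2023-01-31 23:59:59';
```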
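If the column stores fractional seconds, an inclusive upper bound of 23:59:59 can silently drop rows. The sketch below uses a hypothetical `orders_demo` table with millisecond precision to show the difference between BETWEEN and a half-open range:

```sql
-- Hypothetical demo table with millisecond precision
CREATE TABLE orders_demo (
  order_id    INT PRIMARY KEY,
  create_time DATETIME(3)
);
INSERT INTO orders_demo VALUES (1, '2023-01-31 23:59:59.500');

-- Misses the row: it falls after 23:59:59 but is still inside January
SELECT COUNT(*) FROM orders_demo
WHERE create_time BETWEEN '2023-01-01 00:00:00' AND '2023-01-31 23:59:59';

-- Half-open range: everything before the first instant of February is included
SELECT COUNT(*) FROM orders_demo
WHERE create_time >= '2023-01-01' AND create_time < '2023-02-01';
```

The half-open form works unchanged for DATE, DATETIME, and TIMESTAMP columns, which is why it is often the safer default for time ranges.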
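How the IN clause behaves under the hood:

```sql
-- Small list (performs well)
SELECT * FROM employees
WHERE dept_id IN (10, 20, 30);

-- Large list: join against the source of the IDs instead
-- (common table expressions require MySQL 8.0 or later)
WITH dept_ids AS (
  SELECT dept_id FROM target_departments WHERE ...
)
SELECT e.* FROM employees e
JOIN dept_ids d ON e.dept_id = d.dept_id;
```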
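Optimizing fuzzy (LIKE) searches:

```sql
-- Cannot use an index (full table scan)
SELECT * FROM products
WHERE name LIKE '%apple%';

-- Can use an index (recommended)
SELECT * FROM products
WHERE name LIKE 'apple%';

-- Single-character wildcard: _ matches exactly one character
SELECT * FROM products
WHERE name LIKE 'MacBook _ir%'; -- matches "MacBook Air", but not "MacBook Pro"
```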
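The behavior of `_` is easy to verify with literal strings. The last two columns also show how to match a literal underscore, assuming the default SQL mode in which backslash is the LIKE escape character:

```sql
SELECT
  'MacBook Air' LIKE 'MacBook _ir%' AS air,                -- 1
  'MacBook Pro' LIKE 'MacBook _ir%' AS pro,                -- 0: "Pro" does not end in "ir"
  'a_b'         LIKE 'a\_b'         AS literal_underscore, -- 1: \_ matches only "_"
  'axb'         LIKE 'a\_b'         AS other_char;         -- 0
```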
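Pick the join type that matches the business requirement:

| Join type | Keeps unmatched left rows | Keeps unmatched right rows | Performance | Use case |
|---|---|---|---|---|
| INNER JOIN | × | × | ★★★ | Only strictly matching records are needed |
| LEFT JOIN | √ | × | ★★ | Every row of the main (left) table must be kept |
| RIGHT JOIN | × | √ | ★★ | Every row of the secondary (right) table must be kept |
| FULL JOIN | √ | √ | ★ | Not supported natively in MySQL; emulate it with UNION |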
```sql
-- Classic INNER JOIN (returns matching records only)
SELECT o.order_id, c.customer_name
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id;

-- LEFT JOIN keeps every row of the main table
-- (COUNT(e.emp_id) ignores NULLs, so departments without employees show 0)
SELECT d.dept_name, COUNT(e.emp_id) AS emp_count
FROM departments d
LEFT JOIN employees e ON d.dept_id = e.dept_id
GROUP BY d.dept_name;
```
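Since MySQL has no FULL JOIN, the usual emulation combines a LEFT JOIN with the unmatched half of a RIGHT JOIN. This is a sketch against the `departments` and `employees` tables used above, not the only possible formulation:

```sql
-- All departments, with their employees where they exist
SELECT d.dept_id, d.dept_name, e.emp_id
FROM departments d
LEFT JOIN employees e ON d.dept_id = e.dept_id

UNION ALL

-- Plus employees whose dept_id matches no department
-- (the WHERE clause keeps only rows the first branch could not produce)
SELECT d.dept_id, d.dept_name, e.emp_id
FROM departments d
RIGHT JOIN employees e ON d.dept_id = e.dept_id
WHERE d.dept_id IS NULL;
```

Using UNION ALL with the IS NULL filter avoids the deduplication pass that a plain UNION of the two full joins would require.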
Self-joins are common when querying hierarchical data:

```sql
-- Employees and their direct managers
SELECT e.emp_name, m.emp_name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.emp_id;

-- Categories and their parent category (one level per join)
SELECT c1.category_name, c2.category_name AS parent
FROM categories c1
LEFT JOIN categories c2 ON c1.parent_id = c2.category_id;
```
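A self-join resolves one level of the hierarchy per join. For a tree of unknown depth, a recursive common table expression (MySQL 8.0 or later) can walk all levels. The sketch below assumes that root categories have `parent_id IS NULL`; the CTE name `category_path` and the `depth` and `path` columns are illustrative:

```sql
WITH RECURSIVE category_path AS (
  -- Anchor: start from the root categories
  SELECT category_id,
         parent_id,
         1 AS depth,
         CAST(category_name AS CHAR(1000)) AS path  -- widen so longer paths fit
  FROM categories
  WHERE parent_id IS NULL

  UNION ALL

  -- Recursive step: attach each child to the path of its parent
  SELECT c.category_id,
         c.parent_id,
         cp.depth + 1,
         CONCAT(cp.path, ' > ', c.category_name)
  FROM categories c
  JOIN category_path cp ON c.parent_id = cp.category_id
)
SELECT category_id, depth, path
FROM category_path
ORDER BY path;
```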
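Practical advice on using foreign keys:

```sql
-- Create a foreign key constraint (enforces referential integrity)
-- Note: ON DELETE CASCADE deletes a customer's orders along with the customer
ALTER TABLE orders
ADD CONSTRAINT fk_customer
FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
ON DELETE CASCADE;

-- Disable foreign key checks (speeds up bulk imports)
SET FOREIGN_KEY_CHECKS = 0;
-- Run the bulk operation...
SET FOREIGN_KEY_CHECKS = 1; -- rows loaded in between are not re-validated
```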
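Because re-enabling the check does not scan existing rows, it is worth looking for orphans after a bulk load. A minimal sketch against the `orders` and `customers` tables above:

```sql
-- Orders whose customer_id points at no existing customer
SELECT o.order_id, o.customer_id
FROM orders o
LEFT JOIN customers c ON c.customer_id = o.customer_id
WHERE o.customer_id IS NOT NULL
  AND c.customer_id IS NULL;
```

An empty result means the imported data is consistent with the constraint.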
Applying a function or expression to the column

```sql
-- Bad example
SELECT * FROM users WHERE YEAR(create_time) = 2023;
```
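Implicit type conversion

```sql
-- user_id is a VARCHAR column but is compared with a number,
-- so MySQL converts every row's value and cannot use the index
SELECT * FROM users WHERE user_id = 123;
-- Comparing with a string literal keeps the index usable: user_id = '123'
```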
OR conditions left unoptimized

```sql
-- Before
SELECT * FROM products WHERE category_id = 1 OR price > 100;

-- After (each branch can use its own index)
SELECT * FROM products WHERE category_id = 1
UNION
SELECT * FROM products WHERE price > 100;
```
LIKE with a leading wildcard

```sql
-- '数据库' means "database"; the leading % forces a full scan
SELECT * FROM articles WHERE title LIKE '%数据库%';
```
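Ill-advised NOT IN

```sql
-- NOT IN over a large set can be very slow, and it returns no rows at all
-- if the subquery yields a NULL
SELECT * FROM customers
WHERE customer_id NOT IN (SELECT customer_id FROM blacklist);
```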
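A common rewrite is NOT EXISTS, which is unaffected by NULLs in the subquery. The first statement shows why NOT IN is fragile; the second is a sketch of the rewrite against the same `customers` and `blacklist` tables:

```sql
-- A single NULL in the list makes the comparison unknown rather than true
SELECT 1 NOT IN (2, NULL) AS result;  -- NULL, so a WHERE clause would reject the row

-- NULL-safe rewrite: keep customers with no matching blacklist entry
SELECT c.*
FROM customers c
WHERE NOT EXISTS (
  SELECT 1
  FROM blacklist b
  WHERE b.customer_id = c.customer_id
);
```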
IS NOT NULL on an indexed column

```sql
-- The optimizer tends to skip the index when most rows are non-NULL
SELECT * FROM orders WHERE coupon_code IS NOT NULL;
```
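Common pagination problems and how to solve them:

```sql
-- Traditional pagination (slower the deeper you go)
SELECT * FROM large_table
ORDER BY create_time DESC
LIMIT 100000, 20; -- has to read 100020 rows and discard the first 100000

-- Option 1: remember where the previous page ended (requires ordering by id)
SELECT * FROM large_table
WHERE id > 100000 -- the last id seen on the previous page
ORDER BY id
LIMIT 20;

-- Option 2: deferred join (page through the index first, then fetch the full rows)
SELECT t.* FROM large_table t
JOIN (
  SELECT id FROM large_table
  ORDER BY create_time DESC
  LIMIT 100000, 20
) AS tmp ON t.id = tmp.id;
```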
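Option 1 switches the sort key to `id`. To keep the original `create_time DESC` order, the same idea can be applied to the pair (create_time, id), with `id` acting as a tie-breaker. The index name and the `@last_time` and `@last_id` values below are hypothetical; in practice they come from the last row of the previous page:

```sql
-- Hypothetical composite index matching the sort order
ALTER TABLE large_table ADD INDEX idx_ctime_id (create_time, id);

-- Values taken from the last row of the previous page (example values)
SET @last_time = '2023-06-01 12:00:00';
SET @last_id   = 123456;

-- Fetch the next 20 rows strictly after that position
SELECT * FROM large_table
WHERE create_time < @last_time
   OR (create_time = @last_time AND id < @last_id)
ORDER BY create_time DESC, id DESC
LIMIT 20;
```

The trade-off is that this style only supports "next page" navigation; jumping straight to page N still needs an offset or the deferred join.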
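GROUP BY and ORDER BY can force MySQL to build a temporary table:

```sql
-- May create a temporary table, possibly on disk (check the Extra column)
EXPLAIN SELECT department, COUNT(*)
FROM employees
GROUP BY department;

-- Option 1: add a suitable index
ALTER TABLE employees ADD INDEX (department);

-- Option 2: raise sort_buffer_size (this helps filesort; memory for temporary
-- tables is governed by tmp_table_size and max_heap_table_size)
SET SESSION sort_buffer_size = 8*1024*1024; -- 8 MB
```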
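Besides EXPLAIN, the session status counters show whether a query actually created temporary tables and whether they spilled to disk. A minimal sketch; compare the counter values before and after running the query under test:

```sql
-- Counters before the query
SHOW SESSION STATUS LIKE 'Created_tmp%tables';

-- Query under test
SELECT department, COUNT(*)
FROM employees
GROUP BY department;

-- Counters after: a rise in Created_tmp_disk_tables means the table went to disk
SHOW SESSION STATUS LIKE 'Created_tmp%tables';

-- Current in-memory limits for temporary tables
SHOW VARIABLES LIKE 'tmp_table_size';
SHOW VARIABLES LIKE 'max_heap_table_size';
```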
The original, inefficient query:

```sql
-- '手机' means "mobile phone"
SELECT p.*, c.category_name
FROM products p
LEFT JOIN categories c ON p.category_id = c.category_id
WHERE p.product_name LIKE '%手机%'
   OR p.description LIKE '%手机%'
ORDER BY p.create_time DESC
LIMIT 0, 20;
```
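The optimized version:

```sql
-- Step 1: create a full-text index
-- (the built-in ngram parser is needed to tokenize Chinese text)
ALTER TABLE products
ADD FULLTEXT INDEX ft_search (product_name, description) WITH PARSER ngram;

-- Step 2: search with MATCH ... AGAINST
SELECT p.*, c.category_name
FROM products p
LEFT JOIN categories c ON p.category_id = c.category_id
WHERE MATCH(p.product_name, p.description) AGAINST('手机' IN BOOLEAN MODE)
ORDER BY p.sales_volume DESC -- sorting by sales volume makes more sense here
LIMIT 0, 20;
```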
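The original report query (execution time: 8 seconds):

```sql
SELECT
  u.user_id,
  u.user_name,
  COUNT(o.order_id) AS order_count,
  SUM(o.amount) AS total_amount
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
-- Note: filtering on o.create_time in WHERE drops users without orders,
-- so this LEFT JOIN behaves like an INNER JOIN
WHERE o.create_time BETWEEN '2023-01-01' AND '2023-12-31'
GROUP BY u.user_id
ORDER BY total_amount DESC;
```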
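If the report is meant to list every user, including those with no orders in 2023, the date filter has to move into the ON clause. This is a sketch of that variant, assuming `create_time` may carry a time part (hence the half-open range):

```sql
SELECT
  u.user_id,
  u.user_name,
  COUNT(o.order_id)          AS order_count,   -- 0 for users without orders
  COALESCE(SUM(o.amount), 0) AS total_amount   -- SUM over no rows is NULL
FROM users u
LEFT JOIN orders o
       ON u.user_id = o.user_id
      AND o.create_time >= '2023-01-01'
      AND o.create_time <  '2024-01-01'
GROUP BY u.user_id, u.user_name
ORDER BY total_amount DESC;
```

Which variant is right depends on the report: keep the filter in WHERE if only users with 2023 orders should appear.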
The optimization:

```sql
-- Create a summary table that is precomputed on a schedule
CREATE TABLE user_order_stats (
  user_id INT PRIMARY KEY,
  order_count INT,
  total_amount DECIMAL(12,2),
  last_update TIMESTAMP
);

-- Optimized query (execution time: 0.2 seconds)
-- Assumes the summary table holds the 2023 totals; last_update only checks freshness
SELECT
  u.user_id,
  u.user_name,
  s.order_count,
  s.total_amount
FROM users u
JOIN user_order_stats s ON u.user_id = s.user_id
WHERE s.last_update >= '2023-12-31'
ORDER BY s.total_amount DESC;
```
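The summary table is only useful if something keeps it current. One possible refresh statement is sketched below; it recomputes the 2023 totals from `orders` and overwrites existing rows by primary key. How it is scheduled (cron, the MySQL event scheduler, an application job) is left open:

```sql
-- Recompute per-user totals for 2023 and overwrite the stored rows
REPLACE INTO user_order_stats (user_id, order_count, total_amount, last_update)
SELECT
  o.user_id,
  COUNT(*),
  SUM(o.amount),
  NOW()
FROM orders o
WHERE o.create_time >= '2023-01-01'
  AND o.create_time <  '2024-01-01'
GROUP BY o.user_id;
```

REPLACE deletes and reinserts rows that already exist, which is acceptable for a small derived table; it does not remove rows for users who no longer have qualifying orders.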
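Over many years of database development, I have found that 90% of SQL performance problems come from poorly written queries. Remember: the database does not make mistakes on its own; it faithfully executes the instructions you give it. Each time you write a query, think one step further and ask what this statement will make the database do. That shift in thinking has kept me out of countless performance traps.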