核心概念
子查询和临时表是复杂查询的常用技术,但使用不当会导致严重的性能问题。
子查询分类
| 类型 | 说明 | 性能 |
|---|---|---|
| 标量子查询 | 返回单个值 | 较好 |
| 列子查询 | 返回单列多行 | 一般 |
| 行子查询 | 返回单行多列 | 较好 |
| 表子查询 | 返回多行多列 | 需注意 |
| 相关子查询 | 依赖外层查询 | 最差 |
临时表分类
| 类型 | 创建方式 | 特点 |
|---|---|---|
| 隐式临时表 | MySQL 自动创建 | USING temporary |
| 显式临时表 | CREATE TEMPORARY TABLE | 手动控制 |
| 内存临时表 | 小结果集 | 速度快 |
| 磁盘临时表 | 大结果集 | 受 tmp_table_size 限制 |
子查询优化
1. 相关子查询优化
问题: 相关子查询每行执行一次,性能极差。
-- 错误:相关子查询(每行执行一次)
SELECT u.*,
(SELECT COUNT(*) FROM orders WHERE user_id = u.id) AS order_count,
(SELECT SUM(amount) FROM orders WHERE user_id = u.id) AS total_amount,
(SELECT MAX(create_time) FROM orders WHERE user_id = u.id) AS last_order_time
FROM users u;
-- 10 万用户 × 3 次子查询 = 30 万次查询
-- 优化 1:JOIN 聚合
SELECT u.*,
COALESCE(t.order_count, 0) AS order_count,
COALESCE(t.total_amount, 0) AS total_amount,
t.last_order_time
FROM users u
LEFT JOIN (
SELECT user_id,
COUNT(*) AS order_count,
SUM(amount) AS total_amount,
MAX(create_time) AS last_order_time
FROM orders
GROUP BY user_id
) t ON u.id = t.user_id;
-- 1 次聚合查询
-- 优化 2:窗口函数(MySQL 8.0+)
SELECT u.*,
COUNT(o.id) OVER (PARTITION BY o.user_id) AS order_count,
SUM(o.amount) OVER (PARTITION BY o.user_id) AS total_amount,
MAX(o.create_time) OVER (PARTITION BY o.user_id) AS last_order_time
FROM users u
LEFT JOIN orders o ON u.id = o.user_id;
性能对比:
- 相关子查询:30000ms
- JOIN 聚合:50ms
- 提升 600 倍
2. IN 子查询优化
-- IN 子查询(MySQL 5.6+ 已优化)
SELECT * FROM users
WHERE id IN (SELECT user_id FROM orders WHERE amount > 100);
-- 优化 1:EXISTS(某些场景更优)
SELECT * FROM users u
WHERE EXISTS (
SELECT 1 FROM orders o WHERE o.user_id = u.id AND o.amount > 100
);
-- 优化 2:JOIN(通常最优)
SELECT DISTINCT u.* FROM users u
INNER JOIN orders o ON u.id = o.user_id
WHERE o.amount > 100;
-- 优化 3:半连接(MySQL 自动优化)
-- MySQL 优化器会自动将 IN 子查询转为半连接
执行计划对比:
EXPLAIN SELECT * FROM users
WHERE id IN (SELECT user_id FROM orders WHERE amount > 100);
-- type: index_subquery(MySQL 优化)
EXPLAIN SELECT DISTINCT u.* FROM users u
INNER JOIN orders o ON u.id = o.user_id
WHERE o.amount > 100;
-- type: ref(通常更快)
3. NOT IN 子查询优化
-- NOT IN(性能差,且有 NULL 陷阱)
SELECT * FROM users
WHERE id NOT IN (SELECT user_id FROM orders);
-- 问题:如果 orders.user_id 有 NULL,结果为空
-- 优化 1:NOT EXISTS(推荐)
SELECT * FROM users u
WHERE NOT EXISTS (
SELECT 1 FROM orders o WHERE o.user_id = u.id
);
-- 优化 2:LEFT JOIN(推荐)
SELECT u.* FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE o.user_id IS NULL;
-- 优化 3:NOT IN + NOT NULL
SELECT * FROM users
WHERE id NOT IN (
SELECT user_id FROM orders WHERE user_id IS NOT NULL
);
性能对比:
- NOT IN:10000ms(且有 NULL 风险)
- NOT EXISTS:50ms
- LEFT JOIN:30ms
4. ANY/SOME/ALL 子查询优化
-- ANY / SOME
SELECT * FROM products
WHERE price > ANY (SELECT price FROM products WHERE category_id = 1);
-- 优化:使用 MIN/MAX
SELECT * FROM products
WHERE price > (SELECT MIN(price) FROM products WHERE category_id = 1);
-- ALL
SELECT * FROM products
WHERE price > ALL (SELECT price FROM products WHERE category_id = 1);
-- 优化:使用 MAX
SELECT * FROM products
WHERE price > (SELECT MAX(price) FROM products WHERE category_id = 1);
5. 派生表优化
-- 派生表(FROM 子句中的子查询)
SELECT t.user_id, t.order_count
FROM (
SELECT user_id, COUNT(*) AS order_count
FROM orders
GROUP BY user_id
) t
WHERE t.order_count > 5;
-- MySQL 5.7+:派生表物化
-- 子查询结果物化为临时表,只执行一次
-- 优化:直接写查询
SELECT user_id, COUNT(*) AS order_count
FROM orders
GROUP BY user_id
HAVING COUNT(*) > 5;
临时表优化
1. 隐式临时表场景
-- 以下场景 MySQL 会自动创建临时表
-- 1. GROUP BY 无法使用索引
SELECT user_id, COUNT(*) FROM orders GROUP BY user_id;
-- Extra: Using temporary
-- 2. ORDER BY 与 GROUP BY 字段不同
SELECT user_id, COUNT(*) AS cnt
FROM orders
GROUP BY user_id
ORDER BY cnt DESC;
-- Extra: Using temporary; Using filesort
-- 3. DISTINCT
SELECT DISTINCT user_id FROM orders;
-- Extra: Using temporary
-- 4. UNION(非 UNION ALL)
SELECT user_id FROM orders
UNION
SELECT user_id FROM users;
-- Extra: Using temporary
-- 5. 子查询在某些版本
SELECT * FROM users
WHERE id IN (SELECT user_id FROM orders);
-- 可能创建临时表
2. 临时表性能影响
-- 查看临时表使用情况
SHOW STATUS LIKE 'Created_tmp%';
-- Created_tmp_tables: 创建的临时表总数
-- Created_tmp_disk_tables: 磁盘临时表数
-- Created_tmp_files: 创建的临时文件数
-- 理想状态:
-- Created_tmp_disk_tables / Created_tmp_tables < 0.1
-- 即磁盘临时表占比 < 10%
优化临时表配置:
-- 增大内存临时表限制
SET GLOBAL tmp_table_size = 256M;
SET GLOBAL max_heap_table_size = 256M;
-- 临时表存储引擎(MySQL 5.7+)
SET GLOBAL default_tmp_storage_engine = 'MEMORY';
-- 临时文件目录(使用 SSD)
-- my.cnf: tmpdir = /mnt/ssd/tmp
3. 显式临时表使用
场景: 复杂多步查询,中间结果复用。
-- 创建临时表
CREATE TEMPORARY TABLE tmp_user_stats (
user_id BIGINT PRIMARY KEY,
order_count INT,
total_amount DECIMAL(10, 2),
last_order_time DATETIME,
INDEX idx_count (order_count DESC)
) ENGINE=MEMORY;
-- 填充数据
INSERT INTO tmp_user_stats
SELECT user_id, COUNT(*), SUM(amount), MAX(create_time)
FROM orders
GROUP BY user_id;
-- 多次使用临时表
SELECT u.*, t.order_count, t.total_amount
FROM users u
INNER JOIN tmp_user_stats t ON u.id = t.user_id
WHERE t.order_count > 5;
SELECT u.*, t.order_count
FROM users u
INNER JOIN tmp_user_stats t ON u.id = t.user_id
WHERE t.total_amount > 1000;
-- 临时表自动销毁(会话结束)
-- 或手动销毁
DROP TEMPORARY TABLE tmp_user_stats;
适用场景:
- 复杂多步查询
- 中间结果复用多次
- 需要索引优化中间结果
4. 临时表 vs 派生表
-- 派生表(子查询)
SELECT t.user_id, t.order_count
FROM (
SELECT user_id, COUNT(*) AS order_count
FROM orders
GROUP BY user_id
) t
WHERE t.order_count > 5;
-- 每次执行都重新计算
-- 临时表
CREATE TEMPORARY TABLE tmp_stats AS
SELECT user_id, COUNT(*) AS order_count
FROM orders
GROUP BY user_id;
SELECT user_id, order_count FROM tmp_stats WHERE order_count > 5;
SELECT user_id, order_count FROM tmp_stats WHERE order_count > 10;
-- 物化一次,多次使用
选择原则:
- 使用 1 次:派生表
- 使用多次:临时表
物化视图优化
MySQL 8.0+ 物化视图
-- 创建物化视图
CREATE TABLE user_order_stats (
user_id BIGINT PRIMARY KEY,
order_count INT,
total_amount DECIMAL(10, 2),
last_order_time DATETIME,
INDEX idx_count (order_count DESC),
INDEX idx_amount (total_amount DESC)
) ENGINE=InnoDB;
-- 初始数据
INSERT INTO user_order_stats
SELECT user_id, COUNT(*), SUM(amount), MAX(create_time)
FROM orders
GROUP BY user_id;
-- 增量更新(使用 ON DUPLICATE KEY UPDATE)
INSERT INTO user_order_stats
SELECT user_id, 1 AS order_count, amount, create_time
FROM new_orders
ON DUPLICATE KEY UPDATE
order_count = order_count + 1,
total_amount = total_amount + VALUES(amount),
last_order_time = GREATEST(last_order_time, VALUES(create_time));
-- 查询物化视图
SELECT u.*, t.order_count, t.total_amount
FROM users u
INNER JOIN user_order_stats t ON u.id = t.user_id
WHERE t.order_count > 5
ORDER BY t.total_amount DESC
LIMIT 0, 20;
物化视图刷新策略
-- 1. 定时全量刷新
CREATE EVENT refresh_user_stats
ON SCHEDULE EVERY 1 HOUR
DO
TRUNCATE TABLE user_order_stats;
INSERT INTO user_order_stats
SELECT user_id, COUNT(*), SUM(amount), MAX(create_time)
FROM orders
GROUP BY user_id;
-- 2. 触发器增量刷新
DELIMITER $$
CREATE TRIGGER trg_order_after_insert
AFTER INSERT ON orders
FOR EACH ROW
BEGIN
INSERT INTO user_order_stats (user_id, order_count, total_amount, last_order_time)
VALUES (NEW.user_id, 1, NEW.amount, NEW.create_time)
ON DUPLICATE KEY UPDATE
order_count = order_count + 1,
total_amount = total_amount + NEW.amount,
last_order_time = GREATEST(last_order_time, NEW.create_time);
END$$
DELIMITER ;
-- 3. 应用层维护
-- 在订单创建/更新时同步更新物化视图
实战案例
案例 1:用户行为分析优化
原始 SQL:
SELECT u.id, u.name,
(SELECT COUNT(*) FROM orders WHERE user_id = u.id) AS order_count,
(SELECT SUM(amount) FROM orders WHERE user_id = u.id) AS total_amount,
(SELECT COUNT(DISTINCT product_id) FROM orders WHERE user_id = u.id) AS product_count,
(SELECT MAX(create_time) FROM orders WHERE user_id = u.id) AS last_order_time,
(SELECT COUNT(*) FROM comments WHERE user_id = u.id) AS comment_count
FROM users u
WHERE u.status = 1;
-- 5 个相关子查询,性能极差
优化方案:
-- 方案 1:JOIN 聚合
SELECT u.id, u.name,
COALESCE(o.order_count, 0) AS order_count,
COALESCE(o.total_amount, 0) AS total_amount,
COALESCE(o.product_count, 0) AS product_count,
o.last_order_time,
COALESCE(c.comment_count, 0) AS comment_count
FROM users u
LEFT JOIN (
SELECT user_id,
COUNT(*) AS order_count,
SUM(amount) AS total_amount,
COUNT(DISTINCT product_id) AS product_count,
MAX(create_time) AS last_order_time
FROM orders
GROUP BY user_id
) o ON u.id = o.user_id
LEFT JOIN (
SELECT user_id, COUNT(*) AS comment_count
FROM comments
GROUP BY user_id
) c ON u.id = c.user_id
WHERE u.status = 1;
-- 方案 2:物化视图
-- 创建 user_stats 表存储聚合数据
-- 定期刷新或增量更新
案例 2:复杂报表查询优化
原始 SQL:
-- 多步骤复杂查询
SELECT
DATE(o.create_time) AS date,
u.city,
COUNT(*) AS order_count,
SUM(o.amount) AS total_amount,
AVG(o.amount) AS avg_amount
FROM orders o
INNER JOIN users u ON o.user_id = u.id
WHERE o.create_time BETWEEN '2024-01-01' AND '2024-01-31'
GROUP BY DATE(o.create_time), u.city
ORDER BY date, total_amount DESC;
-- 问题:GROUP BY 表达式,无法使用索引
优化方案:
-- 方案 1:临时表
CREATE TEMPORARY TABLE tmp_orders (
order_date DATE,
city VARCHAR(50),
amount DECIMAL(10, 2),
INDEX idx_date_city (order_date, city)
);
INSERT INTO tmp_orders
SELECT DATE(o.create_time), u.city, o.amount
FROM orders o
INNER JOIN users u ON o.user_id = u.id
WHERE o.create_time BETWEEN '2024-01-01' AND '2024-01-31';
SELECT order_date AS date, city,
COUNT(*) AS order_count,
SUM(amount) AS total_amount,
AVG(amount) AS avg_amount
FROM tmp_orders
GROUP BY order_date, city
ORDER BY date, total_amount DESC;
-- 方案 2:添加计算列
ALTER TABLE orders ADD COLUMN order_date DATE;
UPDATE orders SET order_date = DATE(create_time);
CREATE INDEX idx_order_date ON orders(order_date);
SELECT order_date, u.city,
COUNT(*) AS order_count,
SUM(o.amount) AS total_amount,
AVG(o.amount) AS avg_amount
FROM orders o
INNER JOIN users u ON o.id = o.user_id
WHERE o.order_date BETWEEN '2024-01-01' AND '2024-01-31'
GROUP BY order_date, u.city
ORDER BY order_date, total_amount DESC;
案例 3:批量数据处理优化
问题: 批量更新 100 万条数据,逐条更新太慢。
-- 错误:逐条更新(100 万次)
UPDATE orders SET status = 1 WHERE id = 1;
UPDATE orders SET status = 1 WHERE id = 2;
...
-- 优化 1:临时表 + JOIN
CREATE TEMPORARY TABLE tmp_order_ids (
id BIGINT PRIMARY KEY
);
-- 批量插入 ID
INSERT INTO tmp_order_ids VALUES (1), (2), (3), ...;
-- JOIN 更新
UPDATE orders o
INNER JOIN tmp_order_ids t ON o.id = t.id
SET o.status = 1;
-- 优化 2:CASE 语句
UPDATE orders SET status = CASE id
WHEN 1 THEN 1
WHEN 2 THEN 1
WHEN 3 THEN 1
...
END
WHERE id IN (1, 2, 3, ...);
案例 4:去重查询优化
-- 错误:DISTINCT 性能差
SELECT DISTINCT user_id, product_id FROM orders;
-- Extra: Using temporary
-- 优化 1:GROUP BY
SELECT user_id, product_id FROM orders GROUP BY user_id, product_id;
-- 仍然使用临时表,但语义更清晰
-- 优化 2:索引
CREATE INDEX idx_user_product ON orders(user_id, product_id);
SELECT DISTINCT user_id, product_id FROM orders;
-- Extra: Using index(覆盖索引,无需临时表)
最佳实践总结
子查询优化原则
-
避免相关子查询
- 改为 JOIN 聚合
- 使用窗口函数(MySQL 8.0+)
-
IN 子查询
- MySQL 5.6+ 已优化
- 可考虑 EXISTS 或 JOIN
-
NOT IN 子查询
- 使用 NOT EXISTS
- 或 LEFT JOIN … IS NULL
-
派生表
- 确保有合适的索引
- 考虑物化为临时表
临时表使用指南
使用临时表场景:
1. 中间结果复用多次
2. 复杂多步查询
3. 需要索引优化中间结果
避免临时表场景:
1. 简单查询
2. 结果集很小
3. 只使用一次
配置优化
[mysqld]
# 临时表配置
tmp_table_size = 256M
max_heap_table_size = 256M
# 临时文件目录(使用 SSD)
tmpdir = /mnt/ssd/tmp
# 监控临时表使用
# SHOW STATUS LIKE 'Created_tmp%';
参考资料
- MySQL 官方文档 - 子查询优化
- MySQL 官方文档 - 临时表
- 《高性能 MySQL》第 4 章:查询性能优化