核心概念
JOIN 是数据库中最常用的操作之一,用于将多个表的数据按照关联条件组合在一起。JOIN 性能直接影响查询效率。
JOIN 类型
| JOIN 类型 | 说明 | 返回结果 |
|---|---|---|
| INNER JOIN | 内连接 | 只返回匹配的行 |
| LEFT JOIN | 左连接 | 返回左表所有行 + 右表匹配行 |
| RIGHT JOIN | 右连接 | 返回右表所有行 + 左表匹配行 |
| FULL JOIN | 全连接 | 返回所有匹配和不匹配的行(MySQL 不支持) |
| CROSS JOIN | 交叉连接 | 笛卡尔积 |
MySQL JOIN 实现原理
MySQL 只支持 Nested Loop Join 算法:
1. Simple Nested Loop Join (SNLJ)
2. Block Nested Loop Join (BNLJ)
3. Index Nested Loop Join (INLJ)
优化器选择策略:
- 有索引:INLJ(最优)
- 无索引:BNLJ
- 小表:SNLJ
JOIN 底层原理
Nested Loop Join 执行流程
FOR each row in outer_table
FOR each row in inner_table
IF rows match ON condition
RETURN joined row
Block Nested Loop Join
当连接字段无索引时使用:
1. 将外层表数据读入 join buffer
2. 扫描内层表,与 buffer 中数据匹配
3. 返回匹配结果
join_buffer_size 影响性能:
- buffer 越大,扫描内层表次数越少
- 默认 256KB,可调大到 4MB+
-- 查看 join buffer 大小
SHOW VARIABLES LIKE 'join_buffer_size';
-- 调整(全局)
SET GLOBAL join_buffer_size = 4194304; -- 4MB
-- 调整(会话)
SET SESSION join_buffer_size = 4194304;
Index Nested Loop Join
当连接字段有索引时使用:
1. 遍历外层表
2. 使用索引查找内层表匹配行
3. 返回结果
性能最优,避免全表扫描
EXPLAIN 分析 JOIN
关键字段解读
EXPLAIN SELECT * FROM orders o
INNER JOIN users u ON o.user_id = u.id
WHERE o.status = 1;
| 字段 | 说明 | 优化目标 |
|---|---|---|
| id | 执行顺序 | 相同 id 从上到下 |
| select_type | 查询类型 | SIMPLE 最优 |
| table | 表名 | - |
| type | 访问类型 | system > const > eq_ref > ref > range > index > ALL |
| possible_keys | 可能使用的索引 | - |
| key | 实际使用的索引 | 有值为优 |
| key_len | 索引长度 | 越短越好 |
| ref | 关联字段 | - |
| rows | 扫描行数 | 越少越好 |
| Extra | 额外信息 | Using index 最优 |
type 字段详解
-- system:常量表(最优)
EXPLAIN SELECT * FROM (SELECT 1 AS id) t;
-- const:主键或唯一索引等值查询
EXPLAIN SELECT * FROM users WHERE id = 1;
-- eq_ref:主键或唯一索引关联
EXPLAIN SELECT * FROM orders o
INNER JOIN users u ON o.user_id = u.id;
-- ref:非唯一索引关联
EXPLAIN SELECT * FROM orders WHERE user_id = 1;
-- range:范围查询
EXPLAIN SELECT * FROM orders WHERE id > 100;
-- index:索引全扫描
EXPLAIN SELECT id FROM users;
-- ALL:全表扫描(最差)
EXPLAIN SELECT * FROM users WHERE name = '张三';
Extra 字段关键信息
| Extra 信息 | 含义 | 优化建议 |
|---|---|---|
| Using index | 覆盖索引 | 无需优化 |
| Using where | 使用 WHERE 过滤 | 正常 |
| Using index condition | 索引下推 | 无需优化 |
| Using temporary | 使用临时表 | 需要优化 |
| Using filesort | 文件排序 | 需要优化 |
| Using join buffer | 使用连接缓冲 | 添加索引 |
JOIN 优化策略
策略 1:小表驱动大表
-- 原则:数据量小的表作为驱动表
-- 错误:大表驱动小表
-- orders(1000 万) LEFT JOIN users(10 万)
SELECT * FROM orders o
LEFT JOIN users u ON o.user_id = u.id
WHERE o.status = 1;
-- 正确:小表驱动大表
-- users(10 万) LEFT JOIN orders(1000 万)
SELECT * FROM users u
INNER JOIN orders o ON u.id = o.user_id
WHERE o.status = 1;
优化器自动调整:
-- MySQL 优化器会自动选择小表作为驱动表
-- 但建议显式写出正确顺序
-- 查看优化器选择
EXPLAIN SELECT * FROM orders o
INNER JOIN users u ON o.user_id = u.id;
策略 2:确保关联字段有索引
-- 错误:关联字段无索引
-- orders.user_id 无索引
SELECT * FROM orders o
INNER JOIN users u ON o.user_id = u.id;
-- type: ALL(全表扫描)
-- 正确:添加索引
CREATE INDEX idx_user_id ON orders(user_id);
SELECT * FROM orders o
INNER JOIN users u ON o.user_id = u.id;
-- type: ref(索引查找)
索引设计原则:
- 关联字段必须建索引
- 外键字段必须建索引
- 复合索引考虑最左前缀
策略 3:避免多表 JOIN
-- 错误:JOIN 表过多
SELECT o.*, u.name, p.title, c.name, s.status_name,
w.warehouse_name, sh.shipping_name
FROM orders o
JOIN users u ON o.user_id = u.id
JOIN products p ON o.product_id = p.id
JOIN categories c ON p.category_id = c.id
JOIN order_status s ON o.status = s.id
JOIN warehouses w ON o.warehouse_id = w.id
JOIN shipping sh ON o.shipping_id = sh.id
WHERE o.create_time > '2024-01-01';
-- 正确:拆分查询
-- 查询 1:订单
SELECT * FROM orders WHERE create_time > '2024-01-01';
-- 查询 2:批量获取关联数据
SELECT * FROM users WHERE id IN (...);
SELECT * FROM products WHERE id IN (...);
SELECT * FROM categories WHERE id IN (...);
...
-- 应用层组装
建议:
- JOIN 表数量 ≤ 3 个
- 超过 3 个考虑拆分
策略 4:ON 与 WHERE 条件放置
-- LEFT JOIN 场景
-- 错误:过滤条件放 WHERE(变成 INNER JOIN)
SELECT * FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE o.status = 1; -- 过滤了 users,LEFT JOIN 失效
-- 正确:过滤条件放 ON
SELECT * FROM users u
LEFT JOIN orders o ON u.id = o.user_id AND o.status = 1;
-- INNER JOIN 场景(ON 和 WHERE 等价)
SELECT * FROM users u
INNER JOIN orders o ON u.id = o.user_id
WHERE o.status = 1; -- 可以
策略 5:使用覆盖索引
-- 创建覆盖索引
CREATE INDEX idx_user_order ON orders(user_id, status, create_time);
-- 查询只使用索引列
SELECT user_id, status, create_time FROM orders
WHERE status = 1;
-- Extra: Using index(覆盖索引)
-- JOIN 场景
SELECT o.user_id, o.status, u.name FROM orders o
INNER JOIN users u ON o.user_id = u.id
WHERE o.status = 1;
-- orders 使用覆盖索引
策略 6:优化 JOIN 顺序
-- 三表 JOIN
-- 错误:优化器可能选择次优顺序
SELECT * FROM a
JOIN b ON a.id = b.a_id
JOIN c ON b.id = c.b_id
WHERE a.status = 1;
-- 正确:使用 STRAIGHT_JOIN 强制顺序
SELECT STRAIGHT_JOIN * FROM a
JOIN b ON a.id = b.a_id
JOIN c ON b.id = c.b_id
WHERE a.status = 1;
-- 强制 a → b → c 的顺序
使用场景:
- 确定优化器选择错误时
- 需要固定执行顺序时
典型 JOIN 场景优化
场景 1:分页 + JOIN
-- 错误:深度分页 + JOIN
SELECT o.*, u.name FROM orders o
LEFT JOIN users u ON o.user_id = u.id
ORDER BY o.create_time DESC
LIMIT 1000000, 10;
-- 优化 1:延迟关联
SELECT o.*, u.name FROM orders o
INNER JOIN (
SELECT id FROM orders
ORDER BY create_time DESC
LIMIT 1000000, 10
) tmp ON o.id = tmp.id
LEFT JOIN users u ON o.user_id = u.id;
-- 优化 2:游标分页
SELECT o.*, u.name FROM orders o
LEFT JOIN users u ON o.user_id = u.id
WHERE o.create_time < '2024-01-01 10:00:00'
ORDER BY o.create_time DESC
LIMIT 10;
场景 2:COUNT + JOIN
-- 错误:JOIN 后 COUNT
SELECT COUNT(*) FROM orders o
INNER JOIN users u ON o.user_id = u.id
WHERE u.status = 1;
-- 优化:子查询
SELECT COUNT(*) FROM orders
WHERE user_id IN (
SELECT id FROM users WHERE status = 1
);
-- 或:先过滤再 JOIN
SELECT COUNT(*) FROM (
SELECT o.id FROM orders o
INNER JOIN users u ON o.user_id = u.id
WHERE u.status = 1
) tmp;
场景 3:GROUP BY + JOIN
-- 错误:先 JOIN 再分组
SELECT u.id, u.name, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.name;
-- 优化:先分组再 JOIN
SELECT u.id, u.name, COALESCE(t.order_count, 0) AS order_count
FROM users u
LEFT JOIN (
SELECT user_id, COUNT(*) AS order_count
FROM orders
GROUP BY user_id
) t ON u.id = t.user_id;
场景 4:ORDER BY + JOIN
-- 错误:排序字段不在驱动表
SELECT o.*, u.name FROM orders o
LEFT JOIN users u ON o.user_id = u.id
ORDER BY u.name; -- filesort
-- 优化:排序字段在驱动表
SELECT o.*, u.name FROM users u
INNER JOIN orders o ON u.id = o.user_id
ORDER BY u.name; -- 使用 users 索引
场景 5:IN 子查询转 JOIN
-- 子查询(MySQL 5.6+ 已优化)
SELECT * FROM users
WHERE id IN (
SELECT user_id FROM orders WHERE amount > 100
);
-- JOIN(通常更快)
SELECT DISTINCT u.* FROM users u
INNER JOIN orders o ON u.id = o.user_id
WHERE o.amount > 100;
-- EXISTS(某些场景最优)
SELECT * FROM users u
WHERE EXISTS (
SELECT 1 FROM orders o WHERE o.user_id = u.id AND o.amount > 100
);
场景 6:LEFT JOIN 转 INNER JOIN
-- LEFT JOIN 但 WHERE 过滤右表(实际是 INNER JOIN)
SELECT * FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE o.id IS NOT NULL; -- 过滤了 NULL
-- 直接写 INNER JOIN
SELECT * FROM users u
INNER JOIN orders o ON u.id = o.user_id;
JOIN 性能对比
不同 JOIN 方式性能对比
-- 测试数据:users(10 万), orders(100 万)
-- 1. INNER JOIN(最优)
SELECT * FROM users u
INNER JOIN orders o ON u.id = o.user_id;
-- 耗时:0.5s
-- 2. LEFT JOIN
SELECT * FROM users u
LEFT JOIN orders o ON u.id = o.user_id;
-- 耗时:0.6s
-- 3. 子查询 IN
SELECT * FROM users
WHERE id IN (SELECT user_id FROM orders);
-- 耗时:2.5s
-- 4. EXISTS
SELECT * FROM users u
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id);
-- 耗时:1.8s
索引对 JOIN 性能影响
-- 无索引
SELECT * FROM orders o
INNER JOIN users u ON o.user_id = u.id;
-- type: ALL, 耗时:10s
-- 添加索引
CREATE INDEX idx_user_id ON orders(user_id);
SELECT * FROM orders o
INNER JOIN users u ON o.user_id = u.id;
-- type: ref, 耗时:0.5s
-- 性能提升:20 倍
实战案例
案例 1:电商订单查询优化
原始 SQL:
SELECT o.*, u.name, u.phone, p.title, p.price,
c.name AS category_name, s.status_name
FROM orders o
LEFT JOIN users u ON o.user_id = u.id
LEFT JOIN products p ON o.product_id = p.id
LEFT JOIN categories c ON p.category_id = c.id
LEFT JOIN order_status s ON o.status = s.id
WHERE o.create_time > '2024-01-01'
ORDER BY o.create_time DESC
LIMIT 0, 20;
问题分析:
- JOIN 表过多(5 个)
- SELECT 字段过多
- 深度分页问题
优化方案:
-- 方案 1:拆分查询
-- 查询 1:订单列表
SELECT id, user_id, product_id, status, create_time, amount
FROM orders
WHERE create_time > '2024-01-01'
ORDER BY create_time DESC
LIMIT 0, 20;
-- 查询 2:批量获取用户
SELECT id, name, phone FROM users WHERE id IN (...);
-- 查询 3:批量获取商品
SELECT id, title, price FROM products WHERE id IN (...);
-- 查询 4:批量获取分类
SELECT id, name FROM categories WHERE id IN (...);
-- 应用层组装
-- 方案 2:减少 JOIN
SELECT o.id, o.amount, o.create_time, u.name, p.title
FROM orders o
LEFT JOIN users u ON o.user_id = u.id
LEFT JOIN products p ON o.product_id = p.id
WHERE o.create_time > '2024-01-01'
ORDER BY o.create_time DESC
LIMIT 0, 20;
案例 2:用户行为分析优化
原始 SQL:
SELECT u.id, u.name, COUNT(o.id) AS order_count,
SUM(o.amount) AS total_amount,
MAX(o.create_time) AS last_order_time
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.status = 1
GROUP BY u.id, u.name
HAVING order_count > 5
ORDER BY total_amount DESC
LIMIT 0, 100;
优化方案:
-- 方案 1:子查询优化
SELECT u.id, u.name,
COALESCE(t.order_count, 0) AS order_count,
COALESCE(t.total_amount, 0) AS total_amount,
t.last_order_time
FROM users u
LEFT JOIN (
SELECT user_id,
COUNT(*) AS order_count,
SUM(amount) AS total_amount,
MAX(create_time) AS last_order_time
FROM orders
GROUP BY user_id
HAVING order_count > 5
) t ON u.id = t.user_id
WHERE u.status = 1
ORDER BY total_amount DESC
LIMIT 0, 100;
-- 方案 2:物化视图(MySQL 8.0+)
CREATE TABLE user_order_stats (
user_id BIGINT PRIMARY KEY,
order_count INT,
total_amount DECIMAL(10, 2),
last_order_time DATETIME
);
-- 定期刷新
INSERT INTO user_order_stats
SELECT user_id, COUNT(*), SUM(amount), MAX(create_time)
FROM orders
GROUP BY user_id
ON DUPLICATE KEY UPDATE
order_count = VALUES(order_count),
total_amount = VALUES(total_amount),
last_order_time = VALUES(last_order_time);
-- 查询
SELECT u.id, u.name, t.order_count, t.total_amount, t.last_order_time
FROM users u
INNER JOIN user_order_stats t ON u.id = t.user_id
WHERE u.status = 1 AND t.order_count > 5
ORDER BY t.total_amount DESC
LIMIT 0, 100;
案例 3:日志关联查询优化
问题: 关联 7 个表查询日志详情,耗时 30s+
原始 SQL:
SELECT l.*, u.name, r.path, m.method, p.name AS param_name,
h.header_name, resp.body
FROM logs l
LEFT JOIN users u ON l.user_id = u.id
LEFT JOIN routes r ON l.route_id = r.id
LEFT JOIN methods m ON l.method_id = m.id
LEFT JOIN params p ON l.id = p.log_id
LEFT JOIN headers h ON l.id = h.log_id
LEFT JOIN responses resp ON l.id = resp.log_id
WHERE l.create_time > '2024-01-01';
优化方案:
-- 方案:宽表 + 搜索引擎
-- 1. 创建日志宽表
CREATE TABLE logs_wide (
id BIGINT PRIMARY KEY,
user_id BIGINT,
user_name VARCHAR(50),
route_path VARCHAR(200),
method_name VARCHAR(10),
param_json JSON,
header_json JSON,
response_body TEXT,
create_time DATETIME,
INDEX idx_create_time (create_time),
INDEX idx_user_id (user_id)
);
-- 2. 异步写入宽表
-- 应用层组装后写入 logs_wide
-- 3. 查询宽表
SELECT * FROM logs_wide WHERE create_time > '2024-01-01';
-- 4. 复杂查询使用 Elasticsearch
-- 日志同步到 ES,使用 ES 查询
最佳实践总结
JOIN 优化清单
-
索引优化
- 关联字段有索引
- 使用覆盖索引
- 避免索引失效
-
表顺序优化
- 小表驱动大表
- 使用 STRAIGHT_JOIN 强制顺序(必要时)
-
查询结构优化
- JOIN 表 ≤ 3 个
- ON 和 WHERE 条件正确放置
- 避免 SELECT *
-
分页优化
- 使用延迟关联
- 游标分页代替 OFFSET
-
执行计划检查
- type 至少是 range
- 避免 Using temporary
- 避免 Using filesort
性能监控
-- 慢查询日志
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;
-- 查看慢查询
SELECT * FROM mysql.slow_log;
-- 实时性能监控
SHOW FULL PROCESSLIST;
SHOW ENGINE INNODB STATUS;
参考资料
- MySQL 官方文档 - JOIN 优化
- 《高性能 MySQL》第 4 章:查询性能优化
- 《MySQL 技术内幕:InnoDB 存储引擎》第 8 章