Skip to content
清晨的一缕阳光
返回

MySQL JOIN 优化实战

核心概念

JOIN 是数据库中最常用的操作之一,用于将多个表的数据按照关联条件组合在一起。JOIN 性能直接影响查询效率。

JOIN 类型

JOIN 类型说明返回结果
INNER JOIN内连接只返回匹配的行
LEFT JOIN左连接返回左表所有行 + 右表匹配行
RIGHT JOIN右连接返回右表所有行 + 左表匹配行
FULL JOIN全连接返回所有匹配和不匹配的行(MySQL 不支持)
CROSS JOIN交叉连接笛卡尔积

MySQL JOIN 实现原理

MySQL 只支持 Nested Loop Join 算法:
1. Simple Nested Loop Join (SNLJ)
2. Block Nested Loop Join (BNLJ)
3. Index Nested Loop Join (INLJ)

优化器选择策略:
- 有索引:INLJ(最优)
- 无索引:BNLJ
- 小表:SNLJ

JOIN 底层原理

Nested Loop Join 执行流程

FOR each row in outer_table
    FOR each row in inner_table
        IF rows match ON condition
            RETURN joined row

Block Nested Loop Join

当连接字段无索引时使用:
1. 将外层表数据读入 join buffer
2. 扫描内层表,与 buffer 中数据匹配
3. 返回匹配结果

join_buffer_size 影响性能:
- buffer 越大,扫描内层表次数越少
- 默认 256KB,可调大到 4MB+
-- 查看 join buffer 大小
SHOW VARIABLES LIKE 'join_buffer_size';

-- 调整(全局)
SET GLOBAL join_buffer_size = 4194304;  -- 4MB

-- 调整(会话)
SET SESSION join_buffer_size = 4194304;

Index Nested Loop Join

当连接字段有索引时使用:
1. 遍历外层表
2. 使用索引查找内层表匹配行
3. 返回结果

性能最优,避免全表扫描

EXPLAIN 分析 JOIN

关键字段解读

EXPLAIN SELECT * FROM orders o 
INNER JOIN users u ON o.user_id = u.id 
WHERE o.status = 1;
字段说明优化目标
id执行顺序相同 id 从上到下
select_type查询类型SIMPLE 最优
table表名-
type访问类型system > const > eq_ref > ref > range > index > ALL
possible_keys可能使用的索引-
key实际使用的索引有值为优
key_len索引长度越短越好
ref关联字段-
rows扫描行数越少越好
Extra额外信息Using index 最优

type 字段详解

-- system:常量表(最优)
EXPLAIN SELECT * FROM (SELECT 1 AS id) t;

-- const:主键或唯一索引等值查询
EXPLAIN SELECT * FROM users WHERE id = 1;

-- eq_ref:主键或唯一索引关联
EXPLAIN SELECT * FROM orders o 
INNER JOIN users u ON o.user_id = u.id;

-- ref:非唯一索引关联
EXPLAIN SELECT * FROM orders WHERE user_id = 1;

-- range:范围查询
EXPLAIN SELECT * FROM orders WHERE id > 100;

-- index:索引全扫描
EXPLAIN SELECT id FROM users;

-- ALL:全表扫描(最差)
EXPLAIN SELECT * FROM users WHERE name = '张三';

Extra 字段关键信息

Extra 信息含义优化建议
Using index覆盖索引无需优化
Using where使用 WHERE 过滤正常
Using index condition索引下推无需优化
Using temporary使用临时表需要优化
Using filesort文件排序需要优化
Using join buffer使用连接缓冲添加索引

JOIN 优化策略

策略 1:小表驱动大表

-- 原则:数据量小的表作为驱动表

-- 错误:大表驱动小表
-- orders(1000 万) LEFT JOIN users(10 万)
SELECT * FROM orders o 
LEFT JOIN users u ON o.user_id = u.id 
WHERE o.status = 1;

-- 正确:小表驱动大表
-- users(10 万) LEFT JOIN orders(1000 万)
SELECT * FROM users u 
INNER JOIN orders o ON u.id = o.user_id 
WHERE o.status = 1;

优化器自动调整:

-- MySQL 优化器会自动选择小表作为驱动表
-- 但建议显式写出正确顺序

-- 查看优化器选择
EXPLAIN SELECT * FROM orders o 
INNER JOIN users u ON o.user_id = u.id;

策略 2:确保关联字段有索引

-- 错误:关联字段无索引
-- orders.user_id 无索引
SELECT * FROM orders o 
INNER JOIN users u ON o.user_id = u.id;
-- type: ALL(全表扫描)

-- 正确:添加索引
CREATE INDEX idx_user_id ON orders(user_id);
SELECT * FROM orders o 
INNER JOIN users u ON o.user_id = u.id;
-- type: ref(索引查找)

索引设计原则:

策略 3:避免多表 JOIN

-- 错误:JOIN 表过多
SELECT o.*, u.name, p.title, c.name, s.status_name, 
       w.warehouse_name, sh.shipping_name
FROM orders o
JOIN users u ON o.user_id = u.id
JOIN products p ON o.product_id = p.id
JOIN categories c ON p.category_id = c.id
JOIN order_status s ON o.status = s.id
JOIN warehouses w ON o.warehouse_id = w.id
JOIN shipping sh ON o.shipping_id = sh.id
WHERE o.create_time > '2024-01-01';

-- 正确:拆分查询
-- 查询 1:订单
SELECT * FROM orders WHERE create_time > '2024-01-01';

-- 查询 2:批量获取关联数据
SELECT * FROM users WHERE id IN (...);
SELECT * FROM products WHERE id IN (...);
SELECT * FROM categories WHERE id IN (...);
...

-- 应用层组装

建议:

策略 4:ON 与 WHERE 条件放置

-- LEFT JOIN 场景

-- 错误:过滤条件放 WHERE(变成 INNER JOIN)
SELECT * FROM users u 
LEFT JOIN orders o ON u.id = o.user_id 
WHERE o.status = 1;  -- 过滤了 users,LEFT JOIN 失效

-- 正确:过滤条件放 ON
SELECT * FROM users u 
LEFT JOIN orders o ON u.id = o.user_id AND o.status = 1;

-- INNER JOIN 场景(ON 和 WHERE 等价)
SELECT * FROM users u 
INNER JOIN orders o ON u.id = o.user_id 
WHERE o.status = 1;  -- 可以

策略 5:使用覆盖索引

-- 创建覆盖索引
CREATE INDEX idx_user_order ON orders(user_id, status, create_time);

-- 查询只使用索引列
SELECT user_id, status, create_time FROM orders 
WHERE status = 1;
-- Extra: Using index(覆盖索引)

-- JOIN 场景
SELECT o.user_id, o.status, u.name FROM orders o
INNER JOIN users u ON o.user_id = u.id
WHERE o.status = 1;
-- orders 使用覆盖索引

策略 6:优化 JOIN 顺序

-- 三表 JOIN

-- 错误:优化器可能选择次优顺序
SELECT * FROM a 
JOIN b ON a.id = b.a_id 
JOIN c ON b.id = c.b_id 
WHERE a.status = 1;

-- 正确:使用 STRAIGHT_JOIN 强制顺序
SELECT STRAIGHT_JOIN * FROM a 
JOIN b ON a.id = b.a_id 
JOIN c ON b.id = c.b_id 
WHERE a.status = 1;
-- 强制 a → b → c 的顺序

使用场景:

典型 JOIN 场景优化

场景 1:分页 + JOIN

-- 错误:深度分页 + JOIN
SELECT o.*, u.name FROM orders o 
LEFT JOIN users u ON o.user_id = u.id 
ORDER BY o.create_time DESC 
LIMIT 1000000, 10;

-- 优化 1:延迟关联
SELECT o.*, u.name FROM orders o 
INNER JOIN (
    SELECT id FROM orders 
    ORDER BY create_time DESC 
    LIMIT 1000000, 10
) tmp ON o.id = tmp.id
LEFT JOIN users u ON o.user_id = u.id;

-- 优化 2:游标分页
SELECT o.*, u.name FROM orders o 
LEFT JOIN users u ON o.user_id = u.id 
WHERE o.create_time < '2024-01-01 10:00:00'
ORDER BY o.create_time DESC 
LIMIT 10;

场景 2:COUNT + JOIN

-- 错误:JOIN 后 COUNT
SELECT COUNT(*) FROM orders o 
INNER JOIN users u ON o.user_id = u.id 
WHERE u.status = 1;

-- 优化:子查询
SELECT COUNT(*) FROM orders 
WHERE user_id IN (
    SELECT id FROM users WHERE status = 1
);

-- 或:先过滤再 JOIN
SELECT COUNT(*) FROM (
    SELECT o.id FROM orders o
    INNER JOIN users u ON o.user_id = u.id
    WHERE u.status = 1
) tmp;

场景 3:GROUP BY + JOIN

-- 错误:先 JOIN 再分组
SELECT u.id, u.name, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.name;

-- 优化:先分组再 JOIN
SELECT u.id, u.name, COALESCE(t.order_count, 0) AS order_count
FROM users u
LEFT JOIN (
    SELECT user_id, COUNT(*) AS order_count
    FROM orders
    GROUP BY user_id
) t ON u.id = t.user_id;

场景 4:ORDER BY + JOIN

-- 错误:排序字段不在驱动表
SELECT o.*, u.name FROM orders o 
LEFT JOIN users u ON o.user_id = u.id 
ORDER BY u.name;  -- filesort

-- 优化:排序字段在驱动表
SELECT o.*, u.name FROM users u 
INNER JOIN orders o ON u.id = o.user_id 
ORDER BY u.name;  -- 使用 users 索引

场景 5:IN 子查询转 JOIN

-- 子查询(MySQL 5.6+ 已优化)
SELECT * FROM users 
WHERE id IN (
    SELECT user_id FROM orders WHERE amount > 100
);

-- JOIN(通常更快)
SELECT DISTINCT u.* FROM users u
INNER JOIN orders o ON u.id = o.user_id
WHERE o.amount > 100;

-- EXISTS(某些场景最优)
SELECT * FROM users u 
WHERE EXISTS (
    SELECT 1 FROM orders o WHERE o.user_id = u.id AND o.amount > 100
);

场景 6:LEFT JOIN 转 INNER JOIN

-- LEFT JOIN 但 WHERE 过滤右表(实际是 INNER JOIN)
SELECT * FROM users u 
LEFT JOIN orders o ON u.id = o.user_id 
WHERE o.id IS NOT NULL;  -- 过滤了 NULL

-- 直接写 INNER JOIN
SELECT * FROM users u 
INNER JOIN orders o ON u.id = o.user_id;

JOIN 性能对比

不同 JOIN 方式性能对比

-- 测试数据:users(10 万), orders(100 万)

-- 1. INNER JOIN(最优)
SELECT * FROM users u 
INNER JOIN orders o ON u.id = o.user_id;
-- 耗时:0.5s

-- 2. LEFT JOIN
SELECT * FROM users u 
LEFT JOIN orders o ON u.id = o.user_id;
-- 耗时:0.6s

-- 3. 子查询 IN
SELECT * FROM users 
WHERE id IN (SELECT user_id FROM orders);
-- 耗时:2.5s

-- 4. EXISTS
SELECT * FROM users u 
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id);
-- 耗时:1.8s

索引对 JOIN 性能影响

-- 无索引
SELECT * FROM orders o 
INNER JOIN users u ON o.user_id = u.id;
-- type: ALL, 耗时:10s

-- 添加索引
CREATE INDEX idx_user_id ON orders(user_id);
SELECT * FROM orders o 
INNER JOIN users u ON o.user_id = u.id;
-- type: ref, 耗时:0.5s

-- 性能提升:20 倍

实战案例

案例 1:电商订单查询优化

原始 SQL:

SELECT o.*, u.name, u.phone, p.title, p.price, 
       c.name AS category_name, s.status_name
FROM orders o
LEFT JOIN users u ON o.user_id = u.id
LEFT JOIN products p ON o.product_id = p.id
LEFT JOIN categories c ON p.category_id = c.id
LEFT JOIN order_status s ON o.status = s.id
WHERE o.create_time > '2024-01-01'
ORDER BY o.create_time DESC
LIMIT 0, 20;

问题分析:

  1. JOIN 表过多(5 个)
  2. SELECT 字段过多
  3. 深度分页问题

优化方案:

-- 方案 1:拆分查询
-- 查询 1:订单列表
SELECT id, user_id, product_id, status, create_time, amount
FROM orders 
WHERE create_time > '2024-01-01'
ORDER BY create_time DESC 
LIMIT 0, 20;

-- 查询 2:批量获取用户
SELECT id, name, phone FROM users WHERE id IN (...);

-- 查询 3:批量获取商品
SELECT id, title, price FROM products WHERE id IN (...);

-- 查询 4:批量获取分类
SELECT id, name FROM categories WHERE id IN (...);

-- 应用层组装

-- 方案 2:减少 JOIN
SELECT o.id, o.amount, o.create_time, u.name, p.title
FROM orders o
LEFT JOIN users u ON o.user_id = u.id
LEFT JOIN products p ON o.product_id = p.id
WHERE o.create_time > '2024-01-01'
ORDER BY o.create_time DESC 
LIMIT 0, 20;

案例 2:用户行为分析优化

原始 SQL:

SELECT u.id, u.name, COUNT(o.id) AS order_count, 
       SUM(o.amount) AS total_amount, 
       MAX(o.create_time) AS last_order_time
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.status = 1
GROUP BY u.id, u.name
HAVING order_count > 5
ORDER BY total_amount DESC
LIMIT 0, 100;

优化方案:

-- 方案 1:子查询优化
SELECT u.id, u.name, 
       COALESCE(t.order_count, 0) AS order_count,
       COALESCE(t.total_amount, 0) AS total_amount,
       t.last_order_time
FROM users u
LEFT JOIN (
    SELECT user_id, 
           COUNT(*) AS order_count,
           SUM(amount) AS total_amount,
           MAX(create_time) AS last_order_time
    FROM orders
    GROUP BY user_id
    HAVING order_count > 5
) t ON u.id = t.user_id
WHERE u.status = 1
ORDER BY total_amount DESC 
LIMIT 0, 100;

-- 方案 2:物化视图(MySQL 8.0+)
CREATE TABLE user_order_stats (
    user_id BIGINT PRIMARY KEY,
    order_count INT,
    total_amount DECIMAL(10, 2),
    last_order_time DATETIME
);

-- 定期刷新
INSERT INTO user_order_stats 
SELECT user_id, COUNT(*), SUM(amount), MAX(create_time)
FROM orders
GROUP BY user_id
ON DUPLICATE KEY UPDATE 
    order_count = VALUES(order_count),
    total_amount = VALUES(total_amount),
    last_order_time = VALUES(last_order_time);

-- 查询
SELECT u.id, u.name, t.order_count, t.total_amount, t.last_order_time
FROM users u
INNER JOIN user_order_stats t ON u.id = t.user_id
WHERE u.status = 1 AND t.order_count > 5
ORDER BY t.total_amount DESC 
LIMIT 0, 100;

案例 3:日志关联查询优化

问题: 关联 7 个表查询日志详情,耗时 30s+

原始 SQL:

SELECT l.*, u.name, r.path, m.method, p.name AS param_name, 
       h.header_name, resp.body
FROM logs l
LEFT JOIN users u ON l.user_id = u.id
LEFT JOIN routes r ON l.route_id = r.id
LEFT JOIN methods m ON l.method_id = m.id
LEFT JOIN params p ON l.id = p.log_id
LEFT JOIN headers h ON l.id = h.log_id
LEFT JOIN responses resp ON l.id = resp.log_id
WHERE l.create_time > '2024-01-01';

优化方案:

-- 方案:宽表 + 搜索引擎

-- 1. 创建日志宽表
CREATE TABLE logs_wide (
    id BIGINT PRIMARY KEY,
    user_id BIGINT,
    user_name VARCHAR(50),
    route_path VARCHAR(200),
    method_name VARCHAR(10),
    param_json JSON,
    header_json JSON,
    response_body TEXT,
    create_time DATETIME,
    INDEX idx_create_time (create_time),
    INDEX idx_user_id (user_id)
);

-- 2. 异步写入宽表
-- 应用层组装后写入 logs_wide

-- 3. 查询宽表
SELECT * FROM logs_wide WHERE create_time > '2024-01-01';

-- 4. 复杂查询使用 Elasticsearch
-- 日志同步到 ES,使用 ES 查询

最佳实践总结

JOIN 优化清单

  1. 索引优化

    • 关联字段有索引
    • 使用覆盖索引
    • 避免索引失效
  2. 表顺序优化

    • 小表驱动大表
    • 使用 STRAIGHT_JOIN 强制顺序(必要时)
  3. 查询结构优化

    • JOIN 表 ≤ 3 个
    • ON 和 WHERE 条件正确放置
    • 避免 SELECT *
  4. 分页优化

    • 使用延迟关联
    • 游标分页代替 OFFSET
  5. 执行计划检查

    • type 至少是 range
    • 避免 Using temporary
    • 避免 Using filesort

性能监控

-- 慢查询日志
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;

-- 查看慢查询
SELECT * FROM mysql.slow_log;

-- 实时性能监控
SHOW FULL PROCESSLIST;
SHOW ENGINE INNODB STATUS;

参考资料


分享这篇文章到:

上一篇文章
代理模式实战详解
下一篇文章
Spring Boot Actuator 监控端点