This article explains how to use EXPLAIN ANALYZE, a useful tool for improving queries in MySQL 8

Hello,
I'm Mandai, the Wild Team member of the development team.

This time, I'll introduce how to use EXPLAIN ANALYZE, a useful tool for improving queries in MySQL 8

When tuning queries in MySQL, it's common to use EXPLAIN, but EXPLAIN only displays the query execution plan and doesn't provide detailed information such as the actual query execution time or the number of rows

However, starting with MySQL 8, you can use EXPLAIN ANALYZE to find out things that EXPLAIN alone couldn't. Let's take a look!

What is EXPLAIN ANALYZE?

EXPLAIN ANALYZE is a new feature introduced in MySQL 8, and it's a very useful tool for query tuning

The crucial difference between EXPLAIN ANALYZE and EXPLAIN is that EXPLAIN displays the query execution plan based on optimizer statistics, while EXPLAIN ANALYZE actually executes the query, allowing you to accurately determine the query execution time

The crucial part is actually executing the query. Only by actually running it can detailed information such as the query execution time and the number of rows processed become reliable

Furthermore, because the execution time is clearly shown for each stage of the internal processing, developers can easily identify which part of the query is causing a bottleneck, which is a very welcome feature

Information provided by EXPLAIN ANALYZE

Using EXPLAIN ANALYZE will provide you with the following information:

explain analyze select ...(omitted)...\G *************************** 1. row *************************** EXPLAIN: -> Nested loop left join (cost=10.2 rows=2.13) (actual time=0.192..3.42 rows=63 loops=1) -> Nested loop left join (cost=8.37 rows=2.13) (actual time=0.172..3.26 rows=63 loops=1) -> Nested loop left join (cost=6.22 rows=0.208) (actual time=0.153..2.77 rows=63 loops=1) -> Nested loop left join (cost=6.06 rows=0.208) (actual time=0.151..2.75 rows=63 loops=1) -> Nested loop left join (cost=5.92 rows=0.0991) (actual time=0.123..0.626 rows=21 loops=1) -> Nested loop inner join (cost=5.86 rows=0.07) (actual time=0.0448..0.065 rows=7 loops=1) -> Nested loop inner join (cost=5.61 rows=0.7) (actual time=0.0398..0.0569 rows=7 loops=1) -> Filter: ((table1.id in ('52','48','58','201','59','45','54')) and (table1.cd1 is not null) and (table1.cd2 is not null)) (cost=3.16 rows=7) (actual time=0.0248..0.0383 rows=7 loops=1) -> Index range scan on table1 using PRIMARY over (id = 45) OR (id = 48) OR (5 more) (cost=3.16 rows=7) (actual time=0.0231..0.0311 rows=7 loops=1) -> Filter: (table2.flag1 = '1') (cost=0.251 rows=0.1) (actual time=0.0023..0.00235 rows=1 loops=7) -> Single-row index lookup on table2 using cd1 (cd1=table1.cd1) (cost=0.251 rows=1) (actual time=0.00199..0.002 rows=1 loops=7) -> Filter: (table3.flag1 = '1') (cost=0.264 rows=0.1) (actual time=902e-6..995e-6 rows=1 loops=7) -> Single-row index lookup on table3 using cd3 (cd3=table1.cd3) (cost=0.264 rows=1) (actual time=762e-6..789e-6 rows=1 loops=7) -> Filter: ((table4.type = '1') and (table4.date <= table1.date) and (table4.date = (select #2))) (cost=2.8 rows=1.42) (actual time=0.0509..0.0798 rows=3 loops=7) -> Index lookup on table4 using table5_idx2 (id3=table1.id3, date=(select #2)) (cost=2.8 rows=1.42) (actual time=0.0367..0.0376 rows=3 loops=7) -> Select #2 (subquery in condition; dependent) -> Aggregate: max(table7.date) (cost=1.17 rows=1) (actual time=0.0134..0.0134 rows=1 loops=35) -> Filter: (table7.type = '1') (cost=1.16 rows=0.0699) (actual time=0.00638..0.0121 rows=4 loops=35) -> Index lookup on table7 using table5_idx1 (id3=table1.id3), with index condition: (table7.date <= table1.date) (cost=1.16 rows=2.1) (actual time=0.00624..0.0116 rows=6.43 loops=35) -> Filter: ((table6.type = '2') and (table6.date >= table1.date) and (table6.date = (select #3))) (cost=3.27 rows=2.1) (actual time=0.0216..0.101 rows=3 loops=21) -> Index lookup on table6 using table5_idx1 (id3=table1.id3) (cost=3.27 rows=2.1) (actual time=0.00552..0.0129 rows=10 loops=21) -> Select #3 (subquery in condition; dependent) -> Aggregate: min(table8.date) (cost=1.17 rows=1) (actual time=0.0138..0.0138 rows=1 loops=126) -> Filter: (table8.type = '2') (cost=1.16 rows=0.0699) (actual time=0.00603..0.0124 rows=6 loops=126) -> Index lookup on table8 using table5_idx1 (id3=table1.id3), with index condition: (table8.date >= table1.date) (cost=1.16 rows=2.1) (actual time=0.00577..0.0118 rows=7.71 loops=126) -> Single-row index lookup on table9 using no (no=table6.data) (cost=1.19 rows=1) (actual time=89.6e-6..89.6e-6 rows=0 loops=63) -> Filter: (table10.type = '1') (cost=14.2 rows=10.2) (actual time=0.00681..0.00772 rows=1 loops=63) -> Index lookup on table10 using table10_idx1 (id3=table10.id3) (cost=14.2 rows=10.2) (actual time=0.00667..0.00735 rows=5 loops=63) -> Index lookup on table11 using id4 (id4=table11.id4) (cost=0.797 rows=1) (actual time=0.00203..0.00231 rows=1 loops=63) 1 row in set, 4 warnings (0.007 seconds)

It might seem like there are a lot of JOINs, but I intentionally created a complex query.
If you're used to seeing the output of EXPLAIN, you might find the output of EXPLAIN ANALYZE a little hard to read, but this output is packed with information that can help improve the performance of your query!

Conveniently, we had a query that mixed LEFT JOIN and INNER JOIN. Looking at it in light of MySQL's Nested Loop Join mechanism, we can see how the records from the LEFT JOIN table are placed outside the loop, and the records from the INNER JOIN table are placed inside the loop, and then the two tables are joined

Actually, the order of the INNER JOIN and LEFT JOIN in this query is mixed up, but the optimizer does its job properly, and the processing order is correct, starting with the table that was LEFT JOINed

cost , rows , actual time , and loops for each process . Since you can see the actual time taken to process the data and the number of rows processed, you can identify which part of the query is causing a bottleneck.

The interpretation of each value is as follows:

The number in parentheses on the left

  • cost : Cost estimated by the query optimizer
  • rows : The number of rows estimated by the optimizer.

The numbers in parentheses on the right

  • Actual time : The actual time taken. The time to the left of ".." is the time until the first line is returned, and the time to the right is the time until the last line is returned.
  • rows : The actual number of rows processed
  • loops : The number of times this process has been executed

For example, if we look at the very first line, Nested loop left join, it says (cost=10.2 rows=2.13) (actual time=0.192..3.42 rows=63 loops=1)

This means that the query optimizer estimated the cost of this process to be 10.2 ( cost=10.2 and predicted that 2.13 rows would be processed (
rows=2.13 In contrast, the actual time is 0.192..3.42 , meaning it took 0.192 milliseconds for the first row to be returned and 3.42 milliseconds for the last row to be returned.
The time spent processing the outermost last row can be considered roughly the same as the total execution time of the query, but this does not include the time spent returning the results over the network.
If you feel that the query execution time is long even though the EXPLAIN ANALYZE results are good, it is possible that the time spent returning the results over the network is a significant factor.

Furthermore, although 63 rows were actually processed (
rows=63) This suggests that the optimizer's statistics may be outdated, so it would be a good idea to run ANALYZE TABLE to update the statistics.

Incidentally, this EXPLAIN ANALYZE was run in my local environment, so the amount of data is not large.
Therefore, looking at this result alone, you might think it's incredibly fast! However, please note that you won't get accurate information unless you prepare a amount of data similar to that of a production environment.

For simple index checks, EXPLAIN is easier to read because it provides less information. However, when trying to improve query performance, such as when indexes aren't working as expected, using EXPLAIN ANALYZE to check detailed information like the actual query execution time and the number of rows processed can sometimes provide a clear solution

summary

This time, I introduced a method for improving queries using EXPLAIN ANALYZE, a new feature added in MySQL 8

Even though it's called "new," MySQL 8.0.18, which added EXPLAIN ANALYZE, was released in 2019, so it's not exactly "new" at all. However, it seems there are still many people who try to improve their queries but end up giving up after only looking at EXPLAIN and still not understanding anything, so I decided to write this article again

If the application specifications are not clearly defined beforehand, and columns are added retrospectively during development, you may encounter queries where indexes do not function effectively, or queries that use awkward JOINs to retrieve data

The point was that you might want to try using EXPLAIN ANALYZE when tuning such queries

That's all

If you found this article useful, please click [Like]!
0
Loading...
0 votes, average: 0.00 / 10
2
X Facebook Hatena Bookmark pocket

The person who wrote this article

About the author

Yoichi Bandai

My main job is developing web APIs for social games, but I'm also grateful to be able to do a variety of other work, including marketing.
My portrait rights within Beyond are CC0.