takin
05-08-2008, 09:02 AM
I've got a report that I've been loading into mysql to process. I've normalised the database (essential for another query), but that's made a key query extend from .09s (pre-normalisation) to 4min50s (post-normalisation). I'm sure there's a better way of writing the query or indexing tables, but explain means nothing to me.
The complexity comes from the fact that each line (call) of the original report links 2 sites (source and destination) from the same pool. I've pulled the sites out into another table and created a third (connections) to link the two properly, but I still need to know the type of systems the connection is between. I don't care too much about the direction so if one line of the result shows connections from type 1 to type 3 plus connections from type 3 to type 1, that's fine.
Here's the table structure:
Single table: id, date, mins, s_ip, s_type, d_ip, d_type
broken into
Calls: call_id, date, mins
Sites: ip, type
Connections: id, call_id, ip, direction
The query on a single table is
SELECT
YEAR(date), MONTH(date), s_type, d_type, COUNT(mins), SUM(mins)
FROM
pnraw
GROUP BY
YEAR(date), MONTH(date), s_type, d_type
Explain shows
+----+-------------+-------+------+---------------+------+---------+------+-------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+-------+---------------------------------+
| 1 | SIMPLE | pnraw | ALL | NULL | NULL | NULL | NULL | 43864 | Using temporary; Using filesort |
+----+-------------+-------+------+---------------+------+---------+------+-------+---------------------------------+
On the normalised database, this becomes
SELECT
YEAR(date), MONTH(date), s_type, d_type, COUNT(mins), SUM(mins)
FROM
calls
LEFT JOIN
(SELECT
call_id, type AS s_type
FROM
connections
LEFT JOIN
sites
ON
system=ip
WHERE
direction=1)
AS s
ON
calls.id=s.call_id
LEFT JOIN
(SELECT
call_id, type AS d_type
FROM
connections
LEFT JOIN
sites
ON
system=ip
WHERE
direction=2)
AS d
ON
calls.id=d.call_id
GROUP BY
YEAR(date), MONTH(date), s_type, d_type;
Explain says
+----+-------------+-------------+--------+---------------+---------+---------+-----------------------------+-------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+--------+---------------+---------+---------+-----------------------------+-------+---------------------------------+
| 1 | PRIMARY | calls | ALL | NULL | NULL | NULL | NULL | 42347 | Using temporary; Using filesort |
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 42347 | |
| 1 | PRIMARY | <derived3> | ALL | NULL | NULL | NULL | NULL | 42347 | |
| 3 | DERIVED | connections | index | NULL | call_id | 19 | NULL | 84694 | Using where; Using index |
| 3 | DERIVED | sites | eq_ref | PRIMARY | PRIMARY | 15 | pnreport.connections.system | 1 | |
| 2 | DERIVED | connections | index | NULL | call_id | 19 | NULL | 84694 | Using where; Using index |
| 2 | DERIVED | sites | eq_ref | PRIMARY | PRIMARY | 15 | pnreport.connections.system | 1 | |
+----+-------------+-------------+--------+---------------+---------+---------+-----------------------------+-------+---------------------------------+
Is there any way I can reduce the time this query takes in the normalised structure?
The complexity comes from the fact that each line (call) of the original report links 2 sites (source and destination) from the same pool. I've pulled the sites out into another table and created a third (connections) to link the two properly, but I still need to know the type of systems the connection is between. I don't care too much about the direction so if one line of the result shows connections from type 1 to type 3 plus connections from type 3 to type 1, that's fine.
Here's the table structure:
Single table: id, date, mins, s_ip, s_type, d_ip, d_type
broken into
Calls: call_id, date, mins
Sites: ip, type
Connections: id, call_id, ip, direction
The query on a single table is
SELECT
YEAR(date), MONTH(date), s_type, d_type, COUNT(mins), SUM(mins)
FROM
pnraw
GROUP BY
YEAR(date), MONTH(date), s_type, d_type
Explain shows
+----+-------------+-------+------+---------------+------+---------+------+-------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+-------+---------------------------------+
| 1 | SIMPLE | pnraw | ALL | NULL | NULL | NULL | NULL | 43864 | Using temporary; Using filesort |
+----+-------------+-------+------+---------------+------+---------+------+-------+---------------------------------+
On the normalised database, this becomes
SELECT
YEAR(date), MONTH(date), s_type, d_type, COUNT(mins), SUM(mins)
FROM
calls
LEFT JOIN
(SELECT
call_id, type AS s_type
FROM
connections
LEFT JOIN
sites
ON
system=ip
WHERE
direction=1)
AS s
ON
calls.id=s.call_id
LEFT JOIN
(SELECT
call_id, type AS d_type
FROM
connections
LEFT JOIN
sites
ON
system=ip
WHERE
direction=2)
AS d
ON
calls.id=d.call_id
GROUP BY
YEAR(date), MONTH(date), s_type, d_type;
Explain says
+----+-------------+-------------+--------+---------------+---------+---------+-----------------------------+-------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+--------+---------------+---------+---------+-----------------------------+-------+---------------------------------+
| 1 | PRIMARY | calls | ALL | NULL | NULL | NULL | NULL | 42347 | Using temporary; Using filesort |
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 42347 | |
| 1 | PRIMARY | <derived3> | ALL | NULL | NULL | NULL | NULL | 42347 | |
| 3 | DERIVED | connections | index | NULL | call_id | 19 | NULL | 84694 | Using where; Using index |
| 3 | DERIVED | sites | eq_ref | PRIMARY | PRIMARY | 15 | pnreport.connections.system | 1 | |
| 2 | DERIVED | connections | index | NULL | call_id | 19 | NULL | 84694 | Using where; Using index |
| 2 | DERIVED | sites | eq_ref | PRIMARY | PRIMARY | 15 | pnreport.connections.system | 1 | |
+----+-------------+-------------+--------+---------------+---------+---------+-----------------------------+-------+---------------------------------+
Is there any way I can reduce the time this query takes in the normalised structure?