Hello and welcome to our community! Is this your first visit?
Register
Enjoy an ad free experience by logging in. Not a member yet? Register.
Results 1 to 5 of 5
  1. #1
    Regular Coder
    Join Date
    Mar 2009
    Posts
    175
    Thanks
    3
    Thanked 1 Time in 1 Post

    Ideas on optimizing this query

    I have a query that is driving my cpu through the roof.
    Let me know if you have any ideas on how to optimize it:
    Code:
    SELECT DISTINCT relation.itemid, relation.catid, title, owner, image, background,
         icon
      FROM `jos_sobi2_cat_items_relations` AS relation LEFT JOIN `jos_sobi2_item` AS
         items ON relation.itemid = items.itemid
      WHERE (`published` = '1' AND (relation.catid > 1) AND (`publish_down` >
         '2012-01-04 16:24:13' OR `publish_down` = '0000-00-00 00:00:00') AND
         items.itemid IN(SELECT itemid FROM `jos_sobi2_item` WHERE (`published` = 1 AND
         (`publish_down` > '2012-01-04 16:24:13' OR `publish_down` = '0000-00-00
         00:00:00'))))
      GROUP BY items.itemid
      ORDER BY items.publish_up ASC
      LIMIT 72846, 9

  • #2
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    25,210
    Thanks
    75
    Thanked 4,343 Times in 4,309 Posts
    Tell us how bad the performance is if you just use
    Code:
    LIMIT 1
    instead of what you have there.

    But you know, it's pointless to use the GROUP BY when you are also using DISTINCT.

    And on top of that, unless you understand how MySQL treats a GROUP BY that does *NOT* include all non-aggregate fields in the SELECT, you really should avoid it.

    Let me give you an example:
    Code:
    mysql> select * from demo order by n1, n2, n3;
    +------+------+------+
    | n1   | n2   | n3   |
    +------+------+------+
    |  100 |  200 |  300 |
    |  100 |  200 |  301 |
    |  100 |  200 |  302 |
    |  100 |  201 |  301 |
    |  100 |  202 |  302 |
    +------+------+------+
    5 rows in set (0.01 sec)
    
    mysql> select n1, n2, n3 from demo group by n1;
    +------+------+------+
    | n1   | n2   | n3   |
    +------+------+------+
    |  100 |  200 |  300 |
    +------+------+------+
    1 row in set (0.01 sec)
    
    mysql> select n1, n2, n3 from demo group by n1, n2;
    +------+------+------+
    | n1   | n2   | n3   |
    +------+------+------+
    |  100 |  200 |  300 |
    |  100 |  201 |  301 |
    |  100 |  202 |  302 |
    +------+------+------+
    3 rows in set (0.00 sec)
    Do you see it? When you GROUP BY fewer fields than are in the SELECT, MySQL indeed makes sure that the field(s) you specify in GROUP BY have not duplicates.

    *BUT*!!!

    BUT then it chooses an ARBITRARY value for the fields you did *NOT* specify!

    In short, it LIES to you!

    No other database I know of allows you to omit non-aggregate SELECTed fields from the GROUP BY. MySQL's "quirk" here, *can* be a nice feature, if you understand what it is doing. But if you don't, MySQL will simply *HIDE* duplicate field values and you DO NOT KNOW which field value it will choose.

    Maybe a more practical example:
    Code:
    mysql> select * from demo;
    +-------+-------+----------+
    | first | last  | city     |
    +-------+-------+----------+
    | Tom   | Jones | Medville |
    | John  | Jones | Alamosa  |
    | Mike  | Jones | Xanadu   |
    | Tom   | Jones | Xanadu   |
    +-------+-------+----------+
    
    mysql> select first, last, city from demo group by last, first;
    +-------+-------+----------+
    | first | last  | city     |
    +-------+-------+----------+
    | John  | Jones | Alamosa  |
    | Mike  | Jones | Xanadu   |
    | Tom   | Jones | Medville |
    +-------+-------+----------+
    You see it? There are two TOM JONES, from different cities. But because the GROUP BY didn't specify CITY, MySQL arbitrarily chose just ONE of them.

    Is that the one you wanted???? How will you ever know?

    Now, the obvious answer to this is to *SPECIFY* the one you want. Say you always want the last one, alphabetically:
    Code:
    mysql> select first, last, MAX(city) from demo group by last, first;
    +-------+-------+-----------+
    | first | last  | MAX(city) |
    +-------+-------+-----------+
    | John  | Jones | Alamosa   |
    | Mike  | Jones | Xanadu    |
    | Tom   | Jones | Xanadu    |
    +-------+-------+-----------+
    You still only get one of the two possible cities, but at least now *YOU* control which one you get, not MySQL.
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • #3
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    25,210
    Thanks
    75
    Thanked 4,343 Times in 4,309 Posts
    Your query has more problems than that.

    Look here:
    http://www.codingforums.com/showthre...192#post818192

    You do *NOT* have a LEFT JOIN query.

    Because you used a field from the right-side table in the WHERE clause, your LEFT JOIN was converted to an INNER JOIN.

    You have the worst of both worlds: You don't get a LEFT JOIN so far as the results are concerned and yet you pay at least part of the performance penalty of a LEFT JOIN.

    Now, it looks to me like you don't *really* want a LEFT JOIN, anyway. Because it makes no sense at all to do a GROUP BY on a field from the right-side table of a LEFT JOIN. After all, if there is no match for the right-side table, that field in the GROUP BY will be NULL!

    I *suspect* that all you wanted was an INNER JOIN.

    And, especially if that is so, then you have really *WAY* over-complicated this query by the presence of that unneeded inner SELECT.

    What in the world is the purpose of that???

    All you do in that inner SELECT is *DUPLICATE* the conditions that are ALREADY PRESENT in the rest of the WHERE clause!

    So...assuming that you don't really want a LEFT JOIN (and I'm 98% sure you don't), it looks to me like you can just do this:
    Code:
    SELECT DISTINCT 
         relation.itemid, relation.catid, title, owner, image, background, icon
    FROM jos_sobi2_cat_items_relations AS relation,
         jos_sobi2_item AS items 
    ON relation.itemid = items.itemid
    WHERE items.published = 1 
      AND relation.catid > 1 
      AND ( items.publish_down>'2012-01-04 16:24:13' OR items.publish_down='0000-00-00 00:00:00' ) 
    ORDER BY items.publish_up ASC
    LIMIT 72846, 9
    Again, though, using LIMIT with a large starting OFFSET like that (72846) means that MySQL has to actually construct the full result set and then throw away that many records before it finally returns the answer to you.

    LIMIT is a really nice feature of MySQL, but it can be a real performance hog.
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • #4
    Regular Coder
    Join Date
    Mar 2009
    Posts
    175
    Thanks
    3
    Thanked 1 Time in 1 Post
    Thanks again for the advice.
    This was a query in a component I have to use for a Joomla project I'm working on, I'm leaning toward scrapping the whole thing eventually and building one from scratch without all of the bloat and crap and screwy database structure this has.

    I tried doing away with the left join like you mentioned but couldn't get it to work, what makes it a bit trickier is the query I posted is what I get in mtop, however the actual query is built dynamically over a couple of different files.

    After thinking about the whole thing for a bit though I didn't really see why there even needed to be a distinct or group by in this case so ended up removing them both out of the query. It seems to be working fine and no longer killing the server with removing duplicates. I'm sure I could still improve it but like I said I think over all it's going to be easier to start from scratch eventually so unless I find an issue with it I think I leave it there for now.

  • #5
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    25,210
    Thanks
    75
    Thanked 4,343 Times in 4,309 Posts
    Good choices. DISTINCT and GROUP BY are indeed performance killers.
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.


  •  

    Posting Permissions

    • You may not post new threads
    • You may not post replies
    • You may not post attachments
    • You may not edit your posts
    •