Hello and welcome to our community! Is this your first visit?
Register
Enjoy an ad free experience by logging in. Not a member yet? Register.
Results 1 to 10 of 10
  1. #1
    Regular Coder doubledee's Avatar
    Join Date
    Mar 2011
    Location
    Arizona
    Posts
    939
    Thanks
    21
    Thanked 0 Times in 0 Posts

    Query Creating Duplicate Data

    Ugh! I just finished a new Data Model, and thought I was all done, however, I am having trouble with one of my queries. (I sure hope I don't have to re-do things?!)


    Here is my Data Model...
    Code:
    SECTION -||-----|<- SECTION_DIMENSION ->|-------||- DIMENSION
    
    DIMENSION -||------|<- DIMENSION_SUBSECTION ->|------||- SUBSECTION
    
    SECTION_DIMENSION -||------|<- ARTICLE_PLACEMENT
    
    DIMENSION_SUBSECTION -||------|<- ARTICLE_PLACEMENT
    
    ARTICLE -||-------|<- ARTICLE_PLACEMENT

    All of the following queries have worked fine...

    - Names of Section/Dimension pairs
    - Names of Dimension/SubSection pairs
    - Names of Section/Dimension/SubSection pairs


    But when I try to get the "Names of Section/Dimension/SubSection/Article pairs", it seems like my query is multiplying the results set I'd expect...

    Here is my query, which is pretty ugly...
    Code:
    SELECT s.name AS Section, d.name AS Dimension, ss.name AS SubSection, a.heading AS Article
    FROM article_placement AS ap
    INNER JOIN section_dimension AS sd
    ON sd.section_slug = ap.sd_section_slug
    INNER JOIN section AS s
    ON s.slug = sd.section_slug
    INNER JOIN dimension_subsection AS ds
    ON ds.dimension_slug = ap.ds_dimension_slug
    INNER JOIN dimension AS d
    ON d.slug = ds.dimension_slug
    INNER JOIN subsection AS ss
    ON ss.slug = ds.subsection_slug
    INNER JOIN article AS a
    ON a.slug = ap.article_slug

    I am getting a result set of 204, and I should be getting like 12 records returned.

    I think all of the INNER JOINS are working against me...

    Sincerely,


    Debbie

  • #2
    Regular Coder doubledee's Avatar
    Join Date
    Mar 2011
    Location
    Arizona
    Posts
    939
    Thanks
    21
    Thanked 0 Times in 0 Posts
    It looks like I figured things out.

    I had forgotten that the join between DIMENSION_SUBSECTION and ARTICLE_PLACEMENT had *two* Foreign Keys/Joins.

    Here is my updated query which appears to be working so far...
    Code:
    SELECT s.name AS Section, d.name AS Dimension, ss.name AS SubSection, a.heading AS Article
    FROM section AS s
    INNER JOIN section_dimension AS sd
    ON s.slug = sd.section_slug
    INNER JOIN dimension AS d
    ON d.slug = sd.dimension_slug
    INNER JOIN dimension_subsection AS ds
    ON d.slug = ds.dimension_slug
    INNER JOIN subsection AS ss
    ON ss.slug = ds.subsection_slug
    INNER JOIN article_placement AS ap
    ON (sd.section_slug = ap.sd_section_slug)
    AND (ds.dimension_slug = ap.ds_dimension_slug)
    AND (ds.subsection_slug = ap.ds_subsection_slug)
    INNER JOIN article AS a
    ON a.slug = ap.article_slug
    ORDER BY s.sort, d.sort, ss.sort

    This sure is an ugly and complicated query.

    Is there something I could do to streamline it?

    And do I need to be concerned about a query with so many INNER JOINS?


    The good news is that I mainly did this query to make sure that I could get a result-set with all of these fields if need be, than needing it for regular use.

    But anything I could do to improve things, would be great.

    Sincerely,


    Debbie

  • #3
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    25,020
    Thanks
    75
    Thanked 4,323 Times in 4,289 Posts
    Do an "EXPLAIN" on that query and see what MySQL says. My guess is that you will be pleasantly surprised.
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • #4
    Regular Coder doubledee's Avatar
    Join Date
    Mar 2011
    Location
    Arizona
    Posts
    939
    Thanks
    21
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Old Pedant View Post
    Do an "EXPLAIN" on that query and see what MySQL says. My guess is that you will be pleasantly surprised.
    The is a link to do this in phpMyAdmin, and this what I got...

    Code:
    id 	select_type 	table 	type 	possible_keys 	key 	key_len 	ref 	rows 	Extra
    1 	SIMPLE 	s 	ALL 	PRIMARY 	NULL 	NULL 	NULL 	3 	Using temporary; Using filesort
    1 	SIMPLE 	sd 	ref 	PRIMARY,idx_section_slug,idx_dimension_slug 	idx_section_slug 	22 	doubledee.s.slug 	2 	Using index
    1 	SIMPLE 	d 	eq_ref 	PRIMARY 	PRIMARY 	32 	doubledee.sd.dimension_slug 	1 	 
    1 	SIMPLE 	ds 	ref 	PRIMARY,idx_dimension_slug,idx_subsection_slug 	PRIMARY 	32 	doubledee.d.slug 	1 	Using where; Using index
    1 	SIMPLE 	ss 	eq_ref 	PRIMARY 	PRIMARY 	32 	doubledee.ds.subsection_slug 	1 	 
    1 	SIMPLE 	ap 	ref 	PRIMARY,idx_article_slug,idx_ds_dimension_slug,idx_sd_section_slug,idx_ds_subsection_slug 	PRIMARY 	86 	doubledee.sd.section_slug,doubledee.sd.dimension_slug,doubledee.ss.slug 	1 	Using where; Using index
    1 	SIMPLE 	a 	eq_ref 	PRIMARY 	PRIMARY 	102 	doubledee.ap.article_slug 	1

    I don't know how to read it, or even how to easily post the results here...


    Debbie

  • #5
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    25,020
    Thanks
    75
    Thanked 4,323 Times in 4,289 Posts
    Here it is a bit more readable, with the ID and POSSIBLE_KEYS columns omitted:
    Code:
    seltype table type    key              key_len ref                  rows  Extra
    SIMPLE  s     ALL     NULL             NULL    NULL                 3     Using temporary; Using filesort
    SIMPLE  sd    ref     idx_section_slug 22      s.slug               2     Using index
    SIMPLE  d     eq_ref  PRIMARY          32      sd.dimension_slug    1      
    SIMPLE  ds    ref     PRIMARY          32      d.slug               1     Using where; Using index
    SIMPLE  ss    eq_ref  PRIMARY          32      ds.subsection_slug   1      
    SIMPLE  ap    ref     PRIMARY          86      sd.section_slug,sd.dimension_slug,ss.slug     1     Using where; Using index
    SIMPLE  a     eq_ref  PRIMARY          102     ap.article_slug      1
    POSSIBLE_KEYS just tells you which keys MySQL considered using. Not much help. KEY is the important column, along with REF and EXTRA.

    KEY means this is the key column(s) that MySQL actually chose to use, and REF are the REFerences it used.

    EXTRA tells you what other operations and/or decisions came into play.

    Suffice to say, that EXPLAIN looks wonderful.

    The only USING TEMPORARY you see there is the one you would expect: Your main table. MySQL needs to create a TEMPORARY TABLE to hold the results of all the actual SELECT so that it can then implement the ORDER BY that you requested. That can be a costly operation in a SELECT that returns thousands of rows, but it's no big deal when all the other operations result in only a few--or even a few hundred--rows.

    You can read up more on what EXPLAIN is telling you, but trust me: That you are seeing is telling you all good things. You query is going to be very efficient.
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • #6
    Regular Coder doubledee's Avatar
    Join Date
    Mar 2011
    Location
    Arizona
    Posts
    939
    Thanks
    21
    Thanked 0 Times in 0 Posts
    Old Pedant,

    I did read up on EXPLAIN in the MySQL manual.

    Most of it was Greek, but I was sort of able to figure out what the columns were trying to explain.


    Quote Originally Posted by Old Pedant View Post
    Suffice to say, that EXPLAIN looks wonderful.
    Thanks!


    Don't I need to be concerned about the NULL in the Key and Ref columns for my Section Table?

    And why do I see that? I believe my Section table is properly indexed...


    The only USING TEMPORARY you see there is the one you would expect: Your main table. MySQL needs to create a TEMPORARY TABLE to hold the results of all the actual SELECT so that it can then implement the ORDER BY that you requested. That can be a costly operation in a SELECT that returns thousands of rows, but it's no big deal when all the other operations result in only a few--or even a few hundred--rows.
    So what would you do on a larger result-set to tackle this issue?


    You can read up more on what EXPLAIN is telling you, but trust me: That you are seeing is telling you all good things. You query is going to be very efficient.
    That is good to hear. Lord knows I/we worked hard enough on things. (To the tune of 2+ weeks of my life!!)

    Thanks,


    Debbie

  • #7
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    25,020
    Thanks
    75
    Thanked 4,323 Times in 4,289 Posts
    Quote Originally Posted by doubledee View Post
    Don't I need to be concerned about the NULL in the Key and Ref columns for my Section Table?
    LOL! Look again at your query: NO PLACE in that query are you using anything that depends on the value of any field in your Section table *except* where you join to it.

    So there is no possible way that any index on Section *could* be used. If you had a WHERE condition that only selected some sections, then maybe.

    So what would you do on a larger result-set to tackle this [Using filesort] issue?
    Possibly nothing.

    Possibly make sure there is an index on the field you are doing the ORDER BY on.

    But if you then add in a LIMIT clause, there may be no way to avoid at least the "Using temporary". Then you may have to rethink how you are doing the query.
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • #8
    Regular Coder doubledee's Avatar
    Join Date
    Mar 2011
    Location
    Arizona
    Posts
    939
    Thanks
    21
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by doubledee
    Don't I need to be concerned about the NULL in the Key and Ref columns for my Section Table?

    Quote Originally Posted by Old Pedant View Post
    LOL! Look again at your query: NO PLACE in that query are you using anything that depends on the value of any field in your Section table *except* where you join to it.

    So there is no possible way that any index on Section *could* be used. If you had a WHERE condition that only selected some sections, then maybe.
    I figured it would be used here...
    Code:
    SELECT s.name AS Section, d.name AS Dimension, ss.name AS SubSection, a.heading AS Article
    FROM section AS s
    INNER JOIN section_dimension AS sd
    ON s.slug = sd.section_slug
    

    Quote Originally Posted by doubledee
    So what would you do on a larger result-set to tackle this [Using filesort] issue?

    Quote Originally Posted by Old Pedant View Post
    Possibly nothing.

    Possibly make sure there is an index on the field you are doing the ORDER BY on.

    But if you then add in a LIMIT clause, there may be no way to avoid at least the "Using temporary". Then you may have to rethink how you are doing the query.
    Is that a case where breaking things up into "Views" or "Materialized Views" (whatever those are) would help?

    Sincerely,


    Debbie

  • #9
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    25,020
    Thanks
    75
    Thanked 4,323 Times in 4,289 Posts
    Quote Originally Posted by doubledee View Post
    I figured it would be used here...
    Code:
    SELECT s.name AS Section, d.name AS Dimension, ss.name AS SubSection, a.heading AS Article
    FROM section AS s
    INNER JOIN section_dimension AS sd
    ON s.slug = sd.section_slug
    
    The primary key is used in the JOIN, but that is *NOT* what EXPLAIN is showing you: It's showing you the keys that are needed to satisfy your conditions (dependent tables in joins and where clauses, etc.).

    Since you put no restrictions on "s", MySQL has no choice but to test every record in "s". That is, there are no conditions on "s", per se.

    *******

    Is that a case where breaking things up into "Views" or "Materialized Views" (whatever those are) would help?
    Probably not. More likely what you would do is try to JOIN the main query to a secondary query, with the LIMIT applied to the secondary query. And in the secondary query, you try to keep the amount of data used (the size of the field, the joins, etc.) to a minimum.

    Simple minded example, with only one table:

    Say you had a table with huge records that included a DATETIME column, "eventtime". eventtime is indexed, but it's not a unique index so it's not the primary key. And now you want to do
    Code:
    SELECT * FROM bigtable ORDER BY eventtime LIMIT 1500,10
    As written, MySQL would have to select *ALL* the records, in eventtime order, dump them into a temp table, and then find records 1500 through 1509. Very possibly the temp table is so big that it has to be written to disk (it won't fit in MySQL's buffer memory.)

    Easy rewrite:
    Code:
    SELECT B.*
    FROM bigtable AS B
    INNER JOIN ( 
            SELECT pk, eventtime FROM bigtable
            ORDER BY eventtime LIMIT 1500,10 ) AS S
    ON B.pk = S.pk
    ORDER BY eventtime
    MySQL will still have to construct the temp table, but now that temp table will only have two very small fields in it and will most likely fit in buffer memory. Then it finds the 10 required little tiny records and easily makes the join from them back to the bigtable using the pk (primary key).
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • Users who have thanked Old Pedant for this post:

    doubledee (05-15-2013)

  • #10
    Regular Coder doubledee's Avatar
    Join Date
    Mar 2011
    Location
    Arizona
    Posts
    939
    Thanks
    21
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Old Pedant View Post
    Probably not. More likely what you would do is try to JOIN the main query to a secondary query, with the LIMIT applied to the secondary query. And in the secondary query, you try to keep the amount of data used (the size of the field, the joins, etc.) to a minimum.

    Simple minded example, with only one table:

    Say you had a table with huge records that included a DATETIME column, "eventtime". eventtime is indexed, but it's not a unique index so it's not the primary key. And now you want to do
    Code:
    SELECT * FROM bigtable ORDER BY eventtime LIMIT 1500,10
    As written, MySQL would have to select *ALL* the records, in eventtime order, dump them into a temp table, and then find records 1500 through 1509. Very possibly the temp table is so big that it has to be written to disk (it won't fit in MySQL's buffer memory.)

    Easy rewrite:
    Code:
    SELECT B.*
    FROM bigtable AS B
    INNER JOIN ( 
            SELECT pk, eventtime FROM bigtable
            ORDER BY eventtime LIMIT 1500,10 ) AS S
    ON B.pk = S.pk
    ORDER BY eventtime
    MySQL will still have to construct the temp table, but now that temp table will only have two very small fields in it and will most likely fit in buffer memory. Then it finds the 10 required little tiny records and easily makes the join from them back to the bigtable using the pk (primary key).
    Hey, that's a really neat query!

    A good technique to know for down the road.

    Thanks,


    Debbie
    Last edited by doubledee; 05-15-2013 at 09:29 PM.


  •  

    Posting Permissions

    • You may not post new threads
    • You may not post replies
    • You may not post attachments
    • You may not edit your posts
    •