Hello and welcome to our community! Is this your first visit?
Register
Enjoy an ad free experience by logging in. Not a member yet? Register.
Results 1 to 8 of 8
  1. #1
    Senior Coder doubledee's Avatar
    Join Date
    Mar 2011
    Location
    Arizona
    Posts
    1,064
    Thanks
    25
    Thanked 0 Times in 0 Posts

    Junction Table and (Physical) Primary Key

    This past Summer I came up with a new strategy for dealing with Keys in my tables. And while it seems to be working just fine, I want to backtrack and get an outside opinion on whether what I am doing makes sense.

    Specifically, I am interested in my approach and how it relates to Junction Tables.

    I have always had mixed feelings about Derived AutoIncrement Primary Keys.

    On one hand, they are good because the database ensures you always have a unique primary key.

    On the other hand, they tend to be meaningless, and while AutoIncrement ensures your PKs are unique, it does nothing to ensure you don't have this...

    Code:
    ID	User
    ---	-----
    1	DoubleDee
    2	DoubleDee
    3	DoubleDee
    4	DoubleDee

    As such, I started using this approach to get the best of both worlds.

    - Every table has an ID that is AutoIncrement
    - The ID field is indexed as Unique
    - Now I am guaranteed to get my ID=1, 2, 3, ...
    - Then I try and find something "physical" that naturally describes the record and *should* ensure uniqueness (e.g. Username, Email, etc.)

    This has served me well on single tables, but I'd like an outside opinion on how it impacts my Junction Tables...

    Imagine this scenario...
    Code:
    member
    - id (UK)
    - username (PK)
    Code:
    article
    - id (UK)
    - slug (PK)
    Code:
    comment
    - id (UK)
    - article_id (PK1)(FK)
    - member_id (PK2)(FK)
    - created_on (PK3)

    In the example above, I chose to use the ID's for linking to be more efficient.

    My question is whether I am gaining any great benefit (or causing any undo harm) in my Junction Table by having the ID be a UK, and then creating a Composite PK from the two Foreign Key fields along with the Timestamp??


    A lot of people would do this....
    Code:
    comment
    - id (PK)
    - article_id (FK)
    - member_id (FK)
    - created_on

    To me, it is better to create the Composite Key because it (physically) ensures unique records. (Although since this example uses a Timestamp, that is less relevant.)

    Follow me?

    What do you think?

    Sincerely,


    Debbie

  • #2
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    25,166
    Thanks
    75
    Thanked 4,338 Times in 4,304 Posts
    The only real disadvantage of a composite key in MySQL is that MySQL has some severe limitations on when it can use such a key.

    In particular, given your composite key of (article_id, member_id, created_on), MySQL would only be able to take advantage of that key (e.g., when one or more of those fields appeared in your WHERE clause) if the conditions in your query included a test for:
    article_id and member_id and create_on (in any order, but all three)
    article_id and member_id (ditto)
    article_id

    Any queries that used conditions other than those (e.g., a WHERE clause that used only member_id, or one that used article_id and create_on) would *NOT* use the index, so you'd be reduced to a full table scan.

    This is well documented in the MySQL docs and is a limitation that may not appear in other databases.
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • #3
    Senior Coder doubledee's Avatar
    Join Date
    Mar 2011
    Location
    Arizona
    Posts
    1,064
    Thanks
    25
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Old Pedant View Post
    The only real disadvantage of a composite key in MySQL is that MySQL has some severe limitations on when it can use such a key.
    I created the Composite Key for the purpose of being a CONSTRAINT, thus ensuring the table would only ever have one Article + Member + CreatedOn combination.

    I guess I was under the impression you should always do that on a Junction Table...

    (Again, I just get this funny feeling about blindly having a field labeled "id", making it a PK, and calling it quits. To me, an Index/Constraint help ensure you have "good" data in your tables to some degree - as I am trying to do.)


    Quote Originally Posted by Old Pedant View Post
    In particular, given your composite key of (article_id, member_id, created_on), MySQL would only be able to take advantage of that key (e.g., when one or more of those fields appeared in your WHERE clause) if the conditions in your query included a test for:
    article_id and member_id and create_on (in any order, but all three)
    article_id and member_id (ditto)
    article_id

    Any queries that used conditions other than those (e.g., a WHERE clause that used only member_id, or one that used article_id and create_on) would *NOT* use the index, so you'd be reduced to a full table scan.

    This is well documented in the MySQL docs and is a limitation that may not appear in other databases.
    My Comment table has the following indexes:

    1.) Name: Primary
    Type: Primary
    Fields: article_id + member_id + created_on


    2.) Name: index_u_id:
    Type: Unique
    Fields: id


    3.) Name: idx_article_id
    Type: Index
    Fields: article_id


    4.) Name: idx_member_id
    Type: Index
    Fields: member_id


    If I had queries that needed to look for all Comments for Article 123, or all Comments by DoubleDee, wouldn't Indexed #3 and #4 work?

    Sincerely,


    Debbie

  • #4
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    25,166
    Thanks
    75
    Thanked 4,338 Times in 4,304 Posts
    Oh, sure, with all those added indexes you are just fine.

    And you can actually DROP the idx_article_id since MySQL could and would use the PK for it, anyway.
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • #5
    Senior Coder doubledee's Avatar
    Join Date
    Mar 2011
    Location
    Arizona
    Posts
    1,064
    Thanks
    25
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Old Pedant View Post
    Oh, sure, with all those added indexes you are just fine.

    Sorry to *mushroom* this thread, BUT here are some issues that have me upset...


    Issue #1:
    What should be the Primary Key of any table?


    Should it always be the "ID" field with AutoIncrement set?

    Or should it be a "Natural" key if one exists?

    When you helped me with my Section/Subsection/Article stuff, I opted for making all of the PK's "natural" for readability during joins.

    (I'm not sure I feel so strongly in cases like my Comments table...)


    Issue #2:
    Does a Junction Table need a Composite Key on the Foreign Key fields to prevent dups?


    By having a Composite Primary Key on (article_id + member_id + created+on) I am able to prevent this scenario...

    Code:
    id	article_id	member_id	subscribed_on
    ---	-----------	----------	--------------
    1	5678		38		2013-09-19
    2	5678		38		2013-09-19
    3	5678		38		2013-09-19
    4	5678		38		2013-09-19
    (Maybe that is unlikely, but it just seems weak to have the ID be the PK and no other constraints in place to keep your data honest.)


    Issue #3:
    Some people have shot down using Unique Indexes because they allow NULLs, and thus are deemed "weak".


    My feeling was that if every table starts off with an "ID" field set to AutoIncrement and Null = No, then my UNIQUE Index could never be NULL and so it should serve as a good "pseudo" Primary Key.


    Issue #4:
    As time goes on, I am flipping more on what should be the Primary Key: Surrogate ID field vs. Some Natural Field


    I think that is what has me most confused and upset.

    For my Section/SubSection/Article thingy, I think making all of the "slug" fields (e.g. Section Slug = "Finance") the Primary Key was totally the right decision.

    But do I really want a Comment table which has a Composite Primary Key like this...
    Code:
    "postage-meters-can-save-you-time-and-money" + "DoubleDee" + "2013-09-19 15:29"



    Maybe just switching from this...
    Code:
    comment
    - id (UK)
    - article_id (PK1)(FK)
    - member_id (PK2)(FK)
    - subscribed_on (PK3)

    To this would be enough...
    Code:
    comment
    - id (PK)
    - article_id (UK1)(FK)
    - member_id (UK2)(FK)
    - subscribed_on (UK3)

    Regardless, I still think this way is a bad idea...
    Code:
    comment
    - id (PK)
    - article_id (FK)
    - member_id (FK)
    - subscribed_on


    Quote Originally Posted by Old Pedant View Post
    And you can actually DROP the idx_article_id since MySQL could and would use the PK for it, anyway.
    But in my original post, the Primary Key is...
    Code:
    - article_id (PK1)(FK)
    - member_id (PK2)(FK)
    - subscribed_on (PK3)

    Help!!!!!!!!!

    Sincerely,


    Debbie

  • #6
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    25,166
    Thanks
    75
    Thanked 4,338 Times in 4,304 Posts
    #2: Yes, of course, if you indeed don't want to allow duplicates.

    #3: Silly to worry about NULLs in unique keys if you simply disallow NULL in your fields, which in that above case you clearly will be doing (for all fields, presumably).

    Have to leave for the day. Will think on the others.
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • #7
    Master Coder felgall's Avatar
    Join Date
    Sep 2005
    Location
    Sydney, Australia
    Posts
    6,618
    Thanks
    0
    Thanked 645 Times in 635 Posts
    Any table that contains a derived key as well as a natural key is not fully normalised. Normalising it will get rid of the derived key.

    While there are sometimes reasons for using a derived key in addition to a natural one (eg the natural key is very large) but the decision to undo that part of the normalisation should be done after the logical database design is completed when applying those changes to the logical design needed to make your physical design practical. Where the logical design can be converted to a practical physical design without undoing this normalisation then the derived key should not be added. Everywhere that your physical design differs from a fully normalised logical design you should have a specific reason for that change.
    Stephen
    Learn Modern JavaScript - http://javascriptexample.net/
    Helping others to solve their computer problem at http://www.felgall.com/

    Don't forget to start your JavaScript code with "use strict"; which makes it easier to find errors in your code.

  • #8
    Senior Coder doubledee's Avatar
    Join Date
    Mar 2011
    Location
    Arizona
    Posts
    1,064
    Thanks
    25
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by felgall View Post
    Any table that contains a derived key as well as a natural key is not fully normalised. Normalising it will get rid of the derived key.

    While there are sometimes reasons for using a derived key in addition to a natural one (eg the natural key is very large) but the decision to undo that part of the normalisation should be done after the logical database design is completed when applying those changes to the logical design needed to make your physical design practical. Where the logical design can be converted to a practical physical design without undoing this normalisation then the derived key should not be added. Everywhere that your physical design differs from a fully normalised logical design you should have a specific reason for that change.
    The tables described in my posts above are in 3rd Normal Form as far as I can see.

    It is also entirely possible to use *just* Natural Keys for my Primary and Foreign Keys.

    So that meets what you are saying about the "Logical Model Design Phase".

    I am now in implementation, and I have chosen to add an "ID" field that is AutoIncrement, Null = No, and has a Unique Index to these tables because having such a field in each table...

    1.) Makes it easier to identify a particular record. (Why try and find an Article Slug that is a VARCHAR(100) when I can simply look for ArticleID=3975?!)

    2.) It makes joins much easier.

    3.) As things grow, it *may* have performance implications, although that is yet to be seen.

    4.) There is virtually *no overhead* in having a ubiquitous "ID" field with AutoIncrement and a Unique Index, so why not?!


    BTW, thanks for the thoughts, but you clearly must be from the "Natural Key Camp", because there are millions of DBA's out there that would vehemently disagree.

    They would say that you should always have an AutoIncrement ID field on every table - in addition to the Natural Keys. Further, they would insist that you *always* use a Surrogate Key to join tables!!

    To me, that is as extreme as saying to never use Surrogate Keys - which I am not implying you said.

    For me, I want to balance these two extremes, and ideally, have it both ways!

    Sincerely,


    Debbie


  •  

    Posting Permissions

    • You may not post new threads
    • You may not post replies
    • You may not post attachments
    • You may not edit your posts
    •