Hello and welcome to our community! Is this your first visit?
Register
Enjoy an ad free experience by logging in. Not a member yet? Register.
Results 1 to 9 of 9
  1. #1
    Senior Coder doubledee's Avatar
    Join Date
    Mar 2011
    Location
    Arizona
    Posts
    1,057
    Thanks
    25
    Thanked 0 Times in 0 Posts

    Logical vs Natural Keys

    How much would using Natural/Physical Keys versus Logical Keys for the Primary Key slow down MySQL?

    For example, I have a "Section" and a "Dimension" table which form a many-to-many relationship.

    Would it be a great sin if my junction table looked like this...
    Code:
    SECTION		DIMENSION
    --------	----------
    Finance		Business Structure
    "		Store Type
    "		Offering
    "		Accounting
    Legal		Business Structure
    "		Store Type
    "		Offering
    "		Featured Legal

    That certainly would be easier to read and use directly versus this...

    Code:
    SECTION		DIMENSION
    --------	----------
    1		1
    1		2
    1		3
    1		4
    2		1
    2		2
    2		3
    2		5

    BTW, if I was working with things like "Customer" and "Order" then I would always use Logical Keys. However, since the larger project I am working on right now deals with Look-Up tables and things which won't likely change, and which you'd want to use directly, I am leaning towards Natural/Physical Keys, however I don't want to choke MySQL!!

    Sincerely,


    Debbie

  • #2
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    25,155
    Thanks
    75
    Thanked 4,338 Times in 4,304 Posts
    Short answer: A relatively small amount, assuming that the fields are indexed.

    Longer answer: It depends on the size of the tables. The advantage of using an INT for a key is that it's only 4 bytes long. So on a many to many table such as you show there, each pairing is only 8 bytes. In a 64KB block (typical for most DBs nowadays, not sure if it's the number MySQL uses, but close enough) you get 8,000 pairs.

    With VARCHAR keys, as you show, it looks like 32 bytes per record will be close to right. So only 2,000 pairs per disk block.

    If your table only has, say, 100,000 records, then I doubt you'll be able to see the difference (between fetching 12 disk blocks and 48 disk blocks...though surely less than that if you are using indexes). If you have 10,000,000 records...well, yes, you will likely be able to see the difference.
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • #3
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    25,155
    Thanks
    75
    Thanked 4,338 Times in 4,304 Posts
    Oh, and I should point out that MANY database experts are strong advocates of using natural keys when possible. For the very reason you are noting.

    Most often, they are referring to PRIMARY keys. For example, it seems silly to have an INT primary key when you have (say) Social Security Number or maybe Telephone Number or (even more typical) Username in your table, where each of those must be UNIQUE anyway. Again, unless there is a performance issue (and it would apply to the foreign keys, as if those fields are UNIQUE KEYS you actually gain a little by making them PRIMARY KEYS), use the natural key.
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • #4
    Senior Coder doubledee's Avatar
    Join Date
    Mar 2011
    Location
    Arizona
    Posts
    1,057
    Thanks
    25
    Thanked 0 Times in 0 Posts
    Hi Old Pedant!!

    Thanks for the responses.

    A few things...

    1.) This thread is leading up to a *larger problem*

    Should I start a new thread?


    2.) I hear a lot of people say to NOT use Natural Keys, because they will almost always change?!

    But what I don't get is that there is "CASCADE ON UPDATE" in MySQL, so if I ever used a Natural Key for a PK/FK, and the PK changed, then it should be easy to update the FK, right? (And remember that this is for a business I am starting, and there is only me working on the website. So I don't have the coordination issues Amazon.com might...)


    3.) What do you think about this...

    A lot of people say to use a Derived PK because it never changes and thus protects Referential Integrity. True, but what about this scenario...

    Code:
    ID	SSN		NAME
    ---	----		-----
    1	999-99-9999	John Smith
    2	999-99-9999	John Smith
    3	999-99-9999	John Smith
    What good does your Derived PK do here?!

    And even if you added a Unique Key (UK) to each field, then you could have this issue...

    Monday
    Code:
    ID	SSN		NAME
    ---	----		-----
    1	777-77-7777	John Smith
    2	888-88-8888	Amy Ward
    3	999-99-9999	Fred Long

    Friday
    Code:
    ID	SSN		NAME
    ---	----		-----
    1	888-88-8888	Amy Ward
    2	777-77-7777	John Smith
    3	999-99-9999	Fred Long
    (Unlikely in a "controlled environment, but still...)


    Quote Originally Posted by Old Pedant View Post
    Oh, and I should point out that MANY database experts are strong advocates of using natural keys when possible. For the very reason you are noting.

    Most often, they are referring to PRIMARY keys. For example, it seems silly to have an INT primary key when you have (say) Social Security Number or maybe Telephone Number or (even more typical) Username in your table, where each of those must be UNIQUE anyway. Again, unless there is a performance issue (and it would apply to the foreign keys, as if those fields are UNIQUE KEYS you actually gain a little by making them PRIMARY KEYS), use the natural key.
    Okay, but what about in the Junction Table like I mentioned above?

    How evil (or not) is it to have a Junction Table like this...

    SECTION_DIMENSION junction table
    Code:
    SECTION		DIMENSION
    --------	----------
    Finance		Business Structure
    Finance		Store Type
    Finance		Offering
    Finance		Accounting
    Legal		Business Structure
    Legal		Store Type
    Legal		Offering
    Legal		Featured Legal
    Sincerely,


    Debbie

  • #5
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    25,155
    Thanks
    75
    Thanked 4,338 Times in 4,304 Posts
    Don't forget that CASCADE ON UPDATE (and DELETE ON UPDATE and referential integrity in general) only works on INNODB tables in MySQL. (A weird choice, in my opinion. One not made by any other DB I'm aware of.)

    I'll offer the same answer again: It depends on the number of records you expect to have. 100,000? Performance won't matter, so go with what makes sense. 10,000,000? Maybe time to do some benchmarks to see if it matters enough to you.

    And even if you added a Unique Key (UK) to each field, then you could have this issue...

    Monday
    Code:
    ID	SSN		NAME
    ---	----		-----
    1	777-77-7777	John Smith
    2	888-88-8888	Amy Ward
    3	999-99-9999	Fred Long
    <shrug>A database can't do *EVERYTHING* for you. You have to do some work yourself.</shrug>

    Anyway, how would you convert
    Code:
    1	777-77-7777	John Smith
    to
    Code:
    1	888-88-8888	Amy Ward
    in a single step? At the same time you are changing the other record(s)? At some point in the process, you *would* have a duplicate key in at least one of the fields, unless you went to great lengths to avoid it.

    In any case, my choice here would be to make the SSN be the primary key. SSNs are (supposed to be) immutable. Yes, you can change your name, but you can't (normally) change your SSN. In the rare case a person is assigned a new SSN, the old one is simply retired, not reused. And that's probably the database answer, too: In general, don't reuse a primary key for any reason. (Which is why so many people choose auto_increment for the primary key.)

    I, personally, have no problem with a many to many table that is just a pair of INTs. But I also don't have a problem with you desiring to avoid that. So long as the values in the many to many table refer to primary keys in the parent tables, I'm fine with it.
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • #6
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    25,155
    Thanks
    75
    Thanked 4,338 Times in 4,304 Posts
    One more comment:
    2.) I hear a lot of people say to NOT use Natural Keys, because they will almost always change?!
    Then you chose the wrong field for your primary key. Again, if you use a natural key, it should be one that is NOT likely to change. And I give you my examples of SSN and phone number and user name. Yes, they can change, but they are unlikely to. And you can build in coding protections for the rare case when they do.
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • #7
    Senior Coder doubledee's Avatar
    Join Date
    Mar 2011
    Location
    Arizona
    Posts
    1,057
    Thanks
    25
    Thanked 0 Times in 0 Posts
    Old Pedant,

    Before I go doing lots of typing for nothing, can I expand this thread to get to my *real* question/problem - which builds off of our conversation - or should I start a new thread?

    (Here is hoping you'll stick with me long enough to help me fix my dilemma!!)

    Sincerely,


    Debbie

  • #8
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    25,155
    Thanks
    75
    Thanked 4,338 Times in 4,304 Posts
    If it's relevant to this thread, keep it here. Else start a new one.

    A new one with a different title might also attract more answers from others, too.
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • #9
    Senior Coder doubledee's Avatar
    Join Date
    Mar 2011
    Location
    Arizona
    Posts
    1,057
    Thanks
    25
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Old Pedant View Post
    Don't forget that CASCADE ON UPDATE (and DELETE ON UPDATE and referential integrity in general) only works on INNODB tables in MySQL. (A weird choice, in my opinion. One not made by any other DB I'm aware of.)
    Yeah, I always use InnoDB for that reason.


    I'll offer the same answer again: It depends on the number of records you expect to have. 100,000? Performance won't matter, so go with what makes sense. 10,000,000? Maybe time to do some benchmarks to see if it matters enough to you.
    This will ultimately be for all of the Articles in my database.

    Since I am writing all of these myself, I'm thinking it'll take a while before I exceed 100,000!!


    <shrug>A database can't do *EVERYTHING* for you. You have to do some work yourself.</shrug>
    I know, but I was just pointing out that I think a lot of the Logical Key advocates get a false sense of security having an AutoNumber, when you really need other things to make the record truly unique...

    As mentioned above, these records have unique PK's, but from a physical standpoint, they are duplicates describing the same Person entity...

    Code:
    ID	SSN		NAME
    ---	----		-----
    1	999-99-9999	John Smith
    2	999-99-9999	John Smith
    3	999-99-9999	John Smith

    Anyway, how would you convert
    Code:
    1	777-77-7777	John Smith
    to
    Code:
    1	888-88-8888	Amy Ward
    in a single step? At the same time you are changing the other record(s)? At some point in the process, you *would* have a duplicate key in at least one of the fields, unless you went to great lengths to avoid it.
    True, but my point was, "Who cares if your AutoNumber is always unique is you have records that are really duplicates in being, or that can be tampered with to become duplicates or swaps?"

    (Remember, I am not using Oracle with user-roles and triggers and what-not. I am using MySQL and have phpMyAdmin, so I can change *anything* I want... )


    Personally, I think the best value an AutoNumber provides is that it is like a "timestamp" and ensure it can never be a duplicate. It also makes joining easier in some ways.

    However, to my earlier points, an AutoNumber does not ensure what records are actually describing are unique, and it can make reading a junction table harder.


    In any case, my choice here would be to make the SSN be the primary key. SSNs are (supposed to be) immutable. Yes, you can change your name, but you can't (normally) change your SSN. In the rare case a person is assigned a new SSN, the old one is simply retired, not reused. And that's probably the database answer, too: In general, don't reuse a primary key for any reason. (Which is why so many people choose auto_increment for the primary key.)
    Okay.


    I, personally, have no problem with a many to many table that is just a pair of INTs. But I also don't have a problem with you desiring to avoid that. So long as the values in the many to many table refer to primary keys in the parent tables, I'm fine with it.
    That's good to know. (It is always good to have options!)


    Okay, so I'll create a new thread which takes this topic and extends to to dealing with Many-to-Many-to-Many Junction Tables.

    (Please do stop by!!)

    Thanks,


    Debbie
    Last edited by doubledee; 05-06-2013 at 09:48 PM.


  •  

    Posting Permissions

    • You may not post new threads
    • You may not post replies
    • You may not post attachments
    • You may not edit your posts
    •