Hello and welcome to our community! Is this your first visit?
Register
Enjoy an ad free experience by logging in. Not a member yet? Register.
Results 1 to 6 of 6
  1. #1
    Regular Coder Stooshie's Avatar
    Join Date
    Mar 2008
    Location
    Dundee, Scotland
    Posts
    378
    Thanks
    9
    Thanked 39 Times in 39 Posts

    Question Many REAL columns with 100,000 rows

    Hi there,

    I am working on a system that will be analysing medical trial data. One table needs multiple columns(maybe up to 50) of type REAL and will have 10s of 1000s if not 100s of 1000s of rows.

    I know I can index real columns and I may need to. Is this going to make the DB vastly huge and unworkable for such large datasets?

    Regards,

    Andrew.
    Regards, Stooshie
    O

  • #2
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    25,020
    Thanks
    75
    Thanked 4,323 Times in 4,289 Posts
    50 type reals (at 8 bytes each) is 400 bytes.

    100,000 records at 400 bytes each is 40 MB.

    The largest MySQL table I've created (so far!) was ONE THOUSAND TIMES LARGER at 40 GIGABYTES. And, with proper indexes, I could get anything I needed from that table in about 100 milliseconds.

    Pardon me for laughing, but why would you possibly be worried abut such a very tiny database?
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • #3
    Regular Coder Stooshie's Avatar
    Join Date
    Mar 2008
    Location
    Dundee, Scotland
    Posts
    378
    Thanks
    9
    Thanked 39 Times in 39 Posts
    It's not the size of the DB itself I am worried about. It's the indexing of 50 or so REAL or DECIMAL columns in a DB with 100,000 rows that is making me wonder about the size issue.
    Regards, Stooshie
    O

  • #4
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    25,020
    Thanks
    75
    Thanked 4,323 Times in 4,289 Posts
    You need to index *ALL* the columns? WOW. I have to wonder if that's the best design then. Maybe an associated table, instead?

    Example:
    Code:
    CREATE maintable (
        id INT AUTO_INCREMENT PRIMARY KEY,
        ... various fields but not the REAL data ...
    );
    
    CREATE realdatatable (
        id INT REFERENCES maintable(id),
        flag INT, /* describes the type of the data
        data REAL,
        INDEX masterindex (flag, data)
    );
    Now you have only the one index. Granted, it's a big index, but that's not an issue for MySQL. So your queries would do something like
    Code:
    SELECT list, of, fields 
    FROM maintable AS M, realdatattable AS R
    WHERE M.id = R.id
    AND R.flag = 33
    AND R.data BETWEEN 3.771 AND 4.032
    Just a stab at a maybe solution. Dunno if it fits your model or not, at all.

    ********

    Having said all the above: Though 50 indexes seems excessive, MySQL should be able to handle them just fine. If you use EXPLAIN and find it using the wrong index in some cases, you can use "hints" in the SELECT (usually) to force it to use the right one(s).
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • #5
    Regular Coder Stooshie's Avatar
    Join Date
    Mar 2008
    Location
    Dundee, Scotland
    Posts
    378
    Thanks
    9
    Thanked 39 Times in 39 Posts
    Thanks for that.

    It's a DB of medical research data so each column of data needs searched on for CHI Squared tests etc...

    Andrew.
    Regards, Stooshie
    O

  • #6
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    25,020
    Thanks
    75
    Thanked 4,323 Times in 4,289 Posts
    Quote Originally Posted by Stooshie View Post
    It's a DB of medical research data so each column of data needs searched on for CHI Squared tests etc...
    Yes, but nobody said you have to *STORE* the data in 50 columns in a single table, did they?

    *DO* consider the idea of "tagged" data, instead.

    Not saying you have to end up choosing that scheme, but there are advantages to it.
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.


  •  

    Tags for this Thread

    Posting Permissions

    • You may not post new threads
    • You may not post replies
    • You may not post attachments
    • You may not edit your posts
    •