PDA

View Full Version : Possible To Have Too Much Data To Query?


Sussex_Chris
04-02-2010, 02:10 AM
I have got a simple query that selects an ID from a database:
$idsql = mysql_query("SELECT * FROM ids WHERE sex<4 ORDER BY RAND() LIMIT ".$max."");

My $max limit is set at 30 rows.

And then the num rows check:
if (mysql_num_rows($idsql)<1){echo "Out Of ID's<br />"; exit;}

And this just keeps on returning the following error:
Warning: mysql_num_rows(): supplied argument is not a valid MySQL result resource in.....

The thing is, when the DB has under 100K rows it is fine, as soon as I start going higher it starts getting this problem. Preferably I would like 1 million+ rows in the table but I can't even get my server to let me pull back 27 random records from a 200K row table.

SQL Structure:
CREATE TABLE `ids` (
`id` bigint(20) NOT NULL,
`sex` tinyint(1) NOT NULL,
`country` varchar(255) NOT NULL,
PRIMARY KEY (`fbid`),
) ENGINE=MyISAM DEFAULT CHARSET=utf8;


Any ideas why this error would happen?

Edit: Been doing a bit of research and I it is down to the ORDER BY RAND() query. The problem is, without this when I run multiple instances duplicate rows are returned, is there an alternative way to order this much data my random?

Old Pedant
04-02-2010, 05:15 AM
Usual advice we *ALWAYS* give:

GET YOUR PHP CODE *out* of the question.

Try the query with a MySQL tool of some kind.

See what happens.

SELECT * FROM ids WHERE sex<4 ORDER BY RAND() LIMIT 30;

Does it work? Does it take forever to run?

It could, because in order to execute that query, MySQL has to build a temporary table, assignin a RAND() value to each row. And then it has to sort that table by the RAND() value. And only then can it go get the 30 lowest RAND()-valued rows.

To see this, do

EXPLAIN SELECT * FROM ids WHERE sex<4 ORDER BY RAND() LIMIT 30;

This is a fine way to select a few records out of a few hundred. I would *expect* it to fall flat on its face if you have millions of records.

Sussex_Chris
04-03-2010, 08:26 AM
Thanks for your help Old_Pedant.

I managed to find this info from another site to get around my ORDER BY RAND() but still select random rows. I have tried this on a DB with 2M records and the results are almost instant:

$offset_result = mysql_query("SELECT FLOOR(RAND() * COUNT(*)) AS `offset` FROM `ids` WHERE sex<4");
$offset_row = mysql_fetch_object($offset_result);
$offset = $offset_row->offset;
$result = mysql_query("SELECT * FROM `ids` WHERE sex<4 LIMIT $offset, 1000");

Using this in conjunction with updating a new cell, 'status', I can use multiple scripts to pull out random records + check to see if the current ID is in use by updating the row's status at the beginning and end of each row query :)

Old Pedant
04-04-2010, 06:24 AM
Well, you do understand that this gets you to a random *starting* point, but then you are pulling the rows sequentially from there?

There are other ways to do randomizing, but they are all dependent on what your data looks like.