load data from huge file into an array as fast as possible
Hi,
I've got certain data which I can't store in a db (it is too much data, don't ask) so I'll store it in an optimized file structure. Now I want to read the data from that file and get it into a specific array format:
$file_array[$id_a][$id_b] = $value; with data being like:
The number of items in this array can reach 2 million! It currently takes about 4 seconds to load 1.6M items into the array from a file which is stored like:
Databases are supposed to load MUCH faster than flat files, since they're designed to retrieve data directly from the disk sector, unlike flat files, where reads have to go through certain OS restrictions. I'd suggest using a database and changing the column types so that it can hold large amounts of data.
Also, you might want to try splitting the data into separate tables. For example, a forum's 'subject' and 'message' data are broken into two separate tables to keep each forum table smaller.
Indexing the database tables might also help. Most current database systems have indexing features, which allow faster retrieval and more efficient sorting of data.
MySQL is capable of handling very large databases. phpBB's forums are an example: they have over 10 million threads, and the site isn't loading slowly at all.
Denormalized file structures can be much faster than a database when you don't care about things like transactions and atomicity. The overhead of those features dwarfs whatever operating-system costs the database manages to avoid.
IIRC phpBB archives threads to separate tables so only a small number of however many millions total are ever a factor in most queries.
Per the OP, please post how you're reading the data now and what the file structure is. Ideally you'd load the data into memory once and serve it many times, meaning load time wouldn't be so much a concern as random access time, which is probably acceptable. Maybe someone can clarify how you can share memory in PHP like you can in the mod_* extensions.
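For comparison while we wait, here is a rough sketch of a plain streaming loader. The file format is purely my assumption (one "id_a,id_b,value" record per line), since the real structure hasn't been posted, and load_pairs() is just a name I made up:

```php
<?php
// Sketch only. Assumed (hypothetical) format: one "id_a,id_b,value" record per line.
// This is NOT the OP's actual file structure, which has not been posted.
function load_pairs(string $path): array
{
    $data = [];
    $fh = fopen($path, 'rb');
    if ($fh === false) {
        throw new RuntimeException("cannot open $path");
    }
    // Read line by line so the raw file is never held in memory as a whole.
    while (($line = fgets($fh)) !== false) {
        $line = rtrim($line, "\r\n");
        if ($line === '') {
            continue; // skip blank lines
        }
        // Limit of 3 keeps any commas inside the value intact.
        [$a, $b, $value] = explode(',', $line, 3);
        $data[(int) $a][(int) $b] = $value;
    }
    fclose($fh);
    return $data;
}
```

You would call it as $file_array = load_pairs('data.txt'); with whatever path you use. If your current code does much more work per record than this, that is probably where the 4 seconds go.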
If this really needs to be fast you could write an Apache module in C to read the data and share it with PHP clients.
Since you don't indicate what this data is, how it is used, or how often it gets updated, we can only guess. My guess is that you only display, access, or process a small portion of it on any web page that you output to a browser. If so, you can save yourself a lot of loading time by using a database and only accessing the data you need on any particular page.
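A minimal sketch of what I mean, fetching one value by its two ids instead of loading everything. The table and column names are placeholders, and I'm using an in-memory SQLite database through PDO only so the example runs without a server; the same queries should work against MySQL with a different DSN:

```php
<?php
// Sketch only: "pairs", "id_a", "id_b" and "value" are hypothetical names.
// SQLite in memory stands in for whatever database you would really use.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// The composite primary key doubles as the index used for lookups.
$db->exec('CREATE TABLE pairs (
    id_a  INTEGER NOT NULL,
    id_b  INTEGER NOT NULL,
    value TEXT    NOT NULL,
    PRIMARY KEY (id_a, id_b)
)');

// Insert inside one transaction; committing per row is much slower for bulk loads.
$insert = $db->prepare('INSERT INTO pairs (id_a, id_b, value) VALUES (?, ?, ?)');
$db->beginTransaction();
foreach ([[1, 2, 'foo'], [1, 3, 'bar'], [7, 2, 'baz']] as $row) {
    $insert->execute($row);
}
$db->commit();

// Fetch a single value; returns null when the pair does not exist.
function lookup(PDO $db, int $a, int $b): ?string
{
    $stmt = $db->prepare('SELECT value FROM pairs WHERE id_a = ? AND id_b = ?');
    $stmt->execute([$a, $b]);
    $value = $stmt->fetchColumn();
    return $value === false ? null : $value;
}
```

With this, lookup($db, 1, 3) plays the role of $file_array[1][3], and a page that needs a handful of values never touches the other rows.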