Old 02-25-2007, 07:23 AM   PM User | #1
Rooseboom
New Coder

 
Join Date: Jul 2002
Posts: 15
Thanks: 0
Thanked 0 Times in 0 Posts
Rooseboom is an unknown quantity at this point
load data from huge file into an array as fast as possible

Hi,

I've got certain data which I can't store in a db (it is too much data, don't ask) so I'll store it in an optimized file structure. Now I want to read the data from that file and get it into a specific array format:

$file_array[$id_a][$id_b] = $value; with data being like:

$file_array[1023][50123435] = 10023;
$file_array[1023][50035768] = 00234;
$file_array[1023][50003452] = 00037;
$file_array[1023][50002345] = 00002;
$file_array[566978][50000343] = 023493;
$file_array[566978][50123435] = 004543;
$file_array[566978][50003452] = 000039;

The number of items in this array can reach 2 million. It currently takes about 4 seconds to load 1.6M items into the array from a file that is stored like this:

1023 => array ( '50123435' => '10023', '50035768' => '00234', '50003452' => '00037', '50002345' => '00002')
566978 => array ( '50000343' => '023493', '50123435' => '004543', '50003452' => '000039')

Is there a better way to store the data in the file from which I can fill the array more quickly (<1sec.)?

Thanks!
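
One option, sketched here under assumptions rather than measured: if both IDs and the values fit in unsigned 32-bit integers, and the zero padding of the values does not need to survive, the lists could be written as packed binary and read back with unpack(), which avoids parsing PHP array syntax. The record layout and the function names save_packed and load_packed are made up for illustration, and whether this gets 1.6M items under one second would have to be timed on the real data.

```php
<?php
// Hypothetical binary layout (an assumption, not the poster's actual format):
// for each $id_a: uint32 id_a, uint32 count, then count pairs of (uint32 id_b, uint32 value).
// All integers are little-endian ('V' in pack/unpack).

function save_packed(string $path, array $data): void
{
    $bin = '';
    foreach ($data as $idA => $items) {
        // Record header: the outer key and the number of pairs that follow.
        $bin .= pack('V2', $idA, count($items));
        foreach ($items as $idB => $value) {
            $bin .= pack('V2', $idB, $value);
        }
    }
    file_put_contents($path, $bin);
}

function load_packed(string $path): array
{
    // unpack() returns a 1-indexed array of integers.
    $ints = unpack('V*', file_get_contents($path));
    $n = count($ints);
    $result = [];
    $i = 1;
    while ($i <= $n) {
        $idA = $ints[$i++];
        $count = $ints[$i++];
        $items = [];
        for ($j = 0; $j < $count; $j++) {
            $items[$ints[$i]] = $ints[$i + 1];
            $i += 2;
        }
        $result[$idA] = $items;
    }
    return $result;
}
```

Usage would be save_packed('lists.bin', $file_array) when the data is written and $file_array = load_packed('lists.bin') per request; the file name is only an example.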
Old 02-27-2007, 03:04 PM   PM User | #2
kenetix
New Coder

 
Join Date: Feb 2005
Posts: 10
Thanks: 0
Thanked 0 Times in 0 Posts
kenetix is an unknown quantity at this point
Databases are supposed to load MUCH faster than flat files as they're designed to retrieve data directly from the disk sector unlike flat files where it has to go through certain OS restrictions. I'd suggest using databases, change the field variable types to allow you to input large amounts of data.

Also you might want to try splitting the data into separate tables, for example Forum 'subject' and forum 'message' are broken into 2 separate tables to decrease the table size within forum tables.

Indexing the database tables might also work, most current database systems have indexing features, which allow faster retrieval and more efficient sorting of data.

MySQL is capable of handling very large databases. phpBB's forums are an example: they have over 10 million threads, and the site isn't loading slowly at all.
__________________
Kenetix:: Achieving more than the ordinary.
http://www.kenetix.net
Old 02-27-2007, 03:23 PM   PM User | #3
ralph l mayo
Regular Coder

 
Join Date: Nov 2005
Posts: 951
Thanks: 1
Thanked 31 Times in 29 Posts
ralph l mayo is on a distinguished road
Denormalized file structures can be much faster than a database when you don't care about things like transactions and atomicity, the overhead of which dwarfs whatever operating-system costs the database doesn't also have to pay.

IIRC phpBB archives threads to separate tables so only a small number of however many millions total are ever a factor in most queries.

Per the OP, please post how you're reading the data now and what the file structure is. Ideally you'd load the data into memory once and serve it many times, meaning load time wouldn't be so much a concern as random access time, which is probably acceptable. Maybe someone can clarify how you can share memory in PHP like you can in the mod_* extensions.

If this really needs to be fast you could write an Apache module in C to read the data and share it with PHP clients.
Old 02-27-2007, 03:52 PM   PM User | #4
CFMaBiSmAd
Senior Coder

 
Join Date: Oct 2006
Location: Denver, Colorado USA
Posts: 2,712
Thanks: 2
Thanked 251 Times in 243 Posts
CFMaBiSmAd is a jewel in the rough
Since you don't indicate what this data is, how it is used, or how often it gets updated..., we can only guess, but I'll guess that you only display, access, or process a small select portion of it on any web page that you output to a browser? If so, you can save yourself a lot of loading time by using a database and only access the data you need on any particular page.
__________________
If you are learning PHP, developing PHP code, or debugging PHP code, do yourself a favor and check your web server log for errors and/or turn on full PHP error reporting in php.ini or in a .htaccess file to get PHP to help you.
Old 02-27-2007, 11:48 PM   PM User | #5
marek_mar
Sensei


 
Join Date: Aug 2003
Location: One step ahead of you.
Posts: 2,815
Thanks: 0
Thanked 3 Times in 3 Posts
marek_mar is on a distinguished road
Using a database with a properly structured and indexed table should do the trick.
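
As a rough illustration of what such a table could look like: the sketch below uses SQLite through PDO purely as a stand-in (the thread mentions MySQL), and the table name items, its columns, and the helper load_lists are all hypothetical. The composite primary key acts as the index, so a request only reads the rows for the IDs it asks for.

```php
<?php
// Sketch only: table and column names are hypothetical, and an in-memory
// SQLite database stands in for whatever database would actually be used.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// The composite primary key doubles as the index used for lookups by id_a.
$pdo->exec('CREATE TABLE items (
    id_a  INTEGER NOT NULL,
    id_b  INTEGER NOT NULL,
    value INTEGER NOT NULL,
    PRIMARY KEY (id_a, id_b)
)');

// A few rows taken from the sample data in the first post.
$insert = $pdo->prepare('INSERT INTO items (id_a, id_b, value) VALUES (?, ?, ?)');
foreach ([[1023, 50123435, 10023], [1023, 50003452, 37], [566978, 50123435, 4543]] as $row) {
    $insert->execute($row);
}

// Fetch only the requested lists and rebuild the $file_array[$id_a][$id_b] shape.
function load_lists(PDO $pdo, array $idsA): array
{
    $placeholders = implode(',', array_fill(0, count($idsA), '?'));
    $stmt = $pdo->prepare("SELECT id_a, id_b, value FROM items WHERE id_a IN ($placeholders)");
    $stmt->execute(array_values($idsA));

    $result = [];
    foreach ($stmt->fetchAll(PDO::FETCH_NUM) as [$idA, $idB, $value]) {
        $result[(int) $idA][(int) $idB] = (int) $value;
    }
    return $result;
}

$file_array = load_lists($pdo, [1023]);
```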
__________________
I'm not sure if this was any help, but I hope it didn't make you stupider.

Experience is something you get just after you really need it.
PHP Installation Guide Feedback welcome.
Old 02-27-2007, 11:55 PM   PM User | #6
GJay
Senior Coder

 
Join Date: Sep 2005
Posts: 1,791
Thanks: 5
Thanked 36 Times in 35 Posts
GJay is on a distinguished road
You can keep things in memory with something like memcached or the APC caching extension. With memcached:
PHP Code:
$memcache = new Memcache;
$memcache->connect('localhost', 11211) or die("Could not connect");

if (false === ($data = $memcache->get('big_data'))) {
    $data = get_lots_of_data(); // this takes a long time
    $memcache->set('big_data', $data);
}

or with apc:
PHP Code:
if (false === ($data = apc_fetch('big_data'))) {
    $data = get_lots_of_data(); // this takes a long time
    apc_store('big_data', $data);
}

http://php.net/memcache
http://php.net/apc
__________________
My thoughts on some things: http://codemeetsmusic.com
And my scrapbook of cool things: http://gjones.tumblr.com
Old 02-28-2007, 12:05 AM   PM User | #7
marek_mar
.. or with streams:
http://www.php.net/manual/en/wrappers.php.php
Old 04-19-2007, 04:22 PM   PM User | #8
Rooseboom
Sorry for my late response; I've been away, and the subject recently became a priority again.

In total, many gigabytes (>250GB) of these numbers are stored, and each request (several per second) asks for a different part of the total.

A single request can cover up to 2M items (the entries after each => array), represented like this:

1023 => array ( '50123435' => '10023', '50035768' => '00234', '50003452' => '00037', '50002345' => '00002')
566978 => array ( '50000343' => '023493', '50123435' => '004543', '50003452' => '000039')

This is just a very small sample.
I now turn them into an array to use for calculations:

$file_array[1023][50123435] = 10023;
$file_array[1023][50035768] = 00234;
$file_array[1023][50003452] = 00037;
$file_array[1023][50002345] = 00002;
$file_array[566978][50000343] = 023493;
$file_array[566978][50123435] = 004543;
$file_array[566978][50003452] = 000039;

Once the data is loaded into $file_array, some calculations are done: find which lists have the same items, then add their values:

both 1023 and 566978 have item 50123435 with total value 10023+004543
both 1023 and 566978 have item 50003452 with total value 00037+000039

Every item is evaluated like this, and only items that appear in all lists (there can be three, four, five, etc.) are finally sent back.
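
The calculation described above could be sketched roughly like this. The function name sum_common_items is made up, and the sample values are written as plain integers without the zero padding, since PHP reads an integer literal with a leading 0 as octal.

```php
<?php
// Sum the values of every item (id_b) that appears in all of the given lists.
// $lists has the same shape as $file_array: [$id_a => [$id_b => $value, ...], ...].
function sum_common_items(array $lists): array
{
    if ($lists === []) {
        return [];
    }

    // Keep only the id_b keys that exist in every list.
    $common = count($lists) > 1
        ? array_intersect_key(...array_values($lists))
        : reset($lists);

    $totals = [];
    foreach (array_keys($common) as $idB) {
        $totals[$idB] = 0;
        foreach ($lists as $items) {
            $totals[$idB] += $items[$idB];
        }
    }
    return $totals;
}

// Sample data from the post, with the leading zeros dropped.
$file_array = [
    1023   => [50123435 => 10023, 50035768 => 234, 50003452 => 37, 50002345 => 2],
    566978 => [50000343 => 23493, 50123435 => 4543, 50003452 => 39],
];

$totals = sum_common_items($file_array);
// $totals is [50123435 => 14566, 50003452 => 76]
```

array_intersect_key() compares keys only, so the work is roughly proportional to the number of items in the lists rather than to a nested scan over all of them.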