View Full Version : Searching millions of files.



mattis2k
06-29-2007, 11:49 PM
Hi,

I'm currently writing an email application. It reads a POP mailbox and stores each mail body on the file system.

There are currently nearly 1 million files on the file system, each named EMAIL_ID.dat.

EMAIL_ID is the same ID under which that mail's other details are stored in the database.

The reason I chose the file system to store the mail bodies is that the MySQL table was too big the other way: over 2 GB, and ORDER BY was very slow.

Does anyone have any ideas on how I can search the file system quickly, or whether it can be done better another way?

Cheers
Mart

rfresh
06-29-2007, 11:54 PM
Can you please clarify:

1. You are storing your email body text in the file system rather than in a database, correct?

2. When you had the body in a database, what field data type were you using?

3. Did you index that email body database field or not?

I have an idea, but I would like to know the answers to the above first, please.

digital-ether
06-30-2007, 08:26 AM
You will likely have to build a keyword index of the files in MySQL so that lookups are quick. Try looking at how one of the open source search engines indexes files like these to figure out how to create the index.
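As a rough illustration of that idea, here is a minimal sketch of a keyword (inverted) index in Python. Everything in it is an assumption made for the example: the table name `keyword_index`, the helper names, the tokenizing rule, and the use of an in-memory SQLite database as a stand-in for MySQL. In a real run the bodies would be read from the EMAIL_ID.dat files rather than from a sample dict.

```python
import re
import sqlite3


def tokenize(text):
    # Lowercase the text and return the distinct runs of 3+ letters/digits.
    return set(re.findall(r"[a-z0-9]{3,}", text.lower()))


def create_index(conn):
    # One row per (word, email) pair; the primary key lets lookups by word
    # use an index instead of scanning every mail body.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS keyword_index ("
        " word TEXT NOT NULL,"
        " email_id INTEGER NOT NULL,"
        " PRIMARY KEY (word, email_id))"
    )


def index_email(conn, email_id, body):
    # Record every distinct word of one mail body against its ID.
    conn.executemany(
        "INSERT OR IGNORE INTO keyword_index (word, email_id) VALUES (?, ?)",
        [(word, email_id) for word in tokenize(body)],
    )


def search(conn, query):
    # Return the IDs of mails that contain every word in the query.
    words = sorted(tokenize(query))
    if not words:
        return []
    placeholders = ",".join("?" * len(words))
    rows = conn.execute(
        "SELECT email_id FROM keyword_index"
        f" WHERE word IN ({placeholders})"
        " GROUP BY email_id HAVING COUNT(*) = ?"
        " ORDER BY email_id",
        words + [len(words)],
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for the MySQL database
    create_index(conn)
    sample_bodies = {  # hypothetical contents of 1.dat, 2.dat, 3.dat
        1: "Invoice for June attached",
        2: "Meeting moved to June 30",
        3: "Your invoice is overdue",
    }
    for email_id, body in sample_bodies.items():
        index_email(conn, email_id, body)
    print(search(conn, "invoice"))       # [1, 3]
    print(search(conn, "june invoice"))  # [1]
```

A search then only touches the small index table and returns EMAIL_IDs, so the matching .dat files can be opened afterwards. A real search engine would also handle stop words, stemming, and ranking, which this sketch leaves out.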

mattis2k
06-30-2007, 08:47 AM
Hi,

Thanks for the reply. I inherited this system. In the previous setup the email body was stored as a TEXT field, with no index on it.

I'm open to using the database to store the bodies, but I'm not sure whether a database with large TEXT fields is efficient.

mattis2k
07-02-2007, 09:14 PM
Anyone?


