Creating a Simple Filename Search Engine

02-23-2004, 07:36 AM
I need to come up with a solution for searching a directory of documents. I will potentially have thousands of PDF files following the naming convention customer#_filedescription.pdf. These will be images that are batch-scanned offsite. The scanning company does not offer a management system other than the search function built into Windows, which is a hack at best and becomes troublesome as the file count grows.

We have looked into commercial offerings that are wonderful at document management, but they are very labor-intensive when it comes to entering the documents, and extremely expensive. I then found KnowledgeTree and Terracotta. Both work great at finding documents, but they only upload one file at a time. They also seem to be overkill for our document management needs.

What I would like to implement is a simple system that searches a directory and its subdirectories and displays hyperlinks to the matching files on a results page. Since there will be 4-6 files per customer, I want to be able to search by customer number alone. Most search engine scripts I have tried only search the contents of the PDF files, not the file names. I also set up ftpsearch, but I couldn't get it to return any files on the results page (it always returned "not find!"), and I couldn't find support for the code in English anywhere. I now realize that I need to build the system myself to get the results I want.
Looking at the different ways to accomplish this, it seems the best route is to set up a MySQL database accessed through PHP scripts. My only problem is that I have never done any PHP coding beyond making a few changes to other people's scripts.
So my questions are:
1. Is the PHP/mySQL approach the way to go?
2. Where do I start? (good books/tutorials)

Any help is appreciated.


02-23-2004, 10:43 AM
Welcome here.

Good question. Now, I don't see a problem in finding out whether a file exists, since that is precisely what file_exists() does (see its page in the PHP manual).

Of course, you will need to loop through all directories and run the check for each one. You can list the contents of a directory with scandir() (http://be2.php.net/manual/en/function.scandir.php). Keep in mind that file_exists() needs the complete filename; since you only know the customer number, you would in practice compare the start of each filename in the listing with "customer#_". I'm sure that if you look around www.php.net, phpclasses.org or www.hotscripts.com, you will find a ready-made class or example script that scans a complete directory tree for a file. I suppose it will not be a very performant operation, though.
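As a rough illustration of the tree scan, here is a sketch in Python rather than PHP (os.walk plays the role of a recursive scandir()). The function name find_customer_files is hypothetical, and it assumes every document is a PDF named <customer>_<description>.pdf as described in the question. It matches on the customer-number prefix instead of an exact filename.

```python
import os


def find_customer_files(root, customer_number):
    """Walk root and all of its subdirectories and return the paths of PDFs
    named <customer_number>_<description>.pdf (assumed naming convention)."""
    prefix = f"{customer_number}_"
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Prefix match: the description part of the name is not known.
            # The trailing underscore keeps "123" from matching "1234_...".
            if name.startswith(prefix) and name.lower().endswith(".pdf"):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```

For example, find_customer_files("/path/to/scans", "10452") lists every scan for customer 10452. Each call re-reads the whole tree, which is exactly the cost the next paragraph proposes to avoid.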

So I think it would be better to set up a script that updates a database whenever new files are scanned (or when you upload the scanned files, or whatever fits your workflow). For example, a script that runs overnight (triggered by cron or a scheduled task) and checks each file's creation date and whether the file has already been recorded in the database. In the database, you record the filename and the path. You then run your "does the file exist" searches against the database, which will return the matching records with their complete paths in a fraction of a second. Converting a path into a link is then really easy.
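A minimal sketch of the index-then-search idea, again in Python, with SQLite standing in for MySQL so it runs without a database server. The table layout, the function names and the base_url mapping from file paths to web addresses are all hypothetical. For brevity it only checks whether a path is already recorded and skips the creation-date check.

```python
import html
import os
import sqlite3
from urllib.parse import quote


def create_index(conn):
    """Create the (hypothetical) documents table; path is unique per file."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        " path TEXT PRIMARY KEY,"
        " filename TEXT NOT NULL,"
        " customer TEXT NOT NULL)"
    )
    # Index the customer column so lookups by customer number stay fast.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_customer ON documents (customer)")


def update_index(conn, root):
    """Record every <customer>_<description>.pdf under root that is not yet
    in the table (the overnight job). Returns how many files were added."""
    added = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(".pdf") or "_" not in name:
                continue  # not in the customer#_description.pdf format
            path = os.path.join(dirpath, name)
            if conn.execute("SELECT 1 FROM documents WHERE path = ?", (path,)).fetchone():
                continue  # already recorded on an earlier run
            customer = name.split("_", 1)[0]
            conn.execute(
                "INSERT INTO documents (path, filename, customer) VALUES (?, ?, ?)",
                (path, name, customer),
            )
            added += 1
    conn.commit()
    return added


def search(conn, customer):
    """Return the stored paths for one customer number, ordered by filename."""
    rows = conn.execute(
        "SELECT path FROM documents WHERE customer = ? ORDER BY filename",
        (customer,),
    ).fetchall()
    return [row[0] for row in rows]


def to_link(path, root, base_url):
    """Turn a stored path into an HTML link, assuming files under root are
    served by the web server at base_url (hypothetical mapping)."""
    rel = os.path.relpath(path, root).replace(os.sep, "/")
    name = html.escape(os.path.basename(path))
    return f'<a href="{base_url}/{quote(rel)}">{name}</a>'
```

The nightly job would call update_index(), and the results page would call something like [to_link(p, root, base_url) for p in search(conn, "10452")]. Because the lookup hits an indexed column, it no longer depends on how many files are on disk.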
Collecting the file paths will be a resource-consuming process, though, so you'll need a script that runs overnight with a long timeout.

Alternatively, you could upload all files to the server and create the new record in the database immediately as each file is uploaded.
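A sketch of the record-on-upload variant under similar assumptions (Python, SQLite in place of MySQL, hypothetical table and function names). The file is written and its row inserted in the same call, so nothing has to rescan the tree afterwards. How the upload itself reaches the server (web form, FTP, etc.) is left out.

```python
import os
import sqlite3

# Hypothetical table layout: one row per stored scan.
SCHEMA = (
    "CREATE TABLE IF NOT EXISTS documents ("
    " path TEXT PRIMARY KEY,"
    " filename TEXT NOT NULL,"
    " customer TEXT NOT NULL)"
)


def open_index(db_path):
    """Open (or create) the index database and make sure the table exists."""
    conn = sqlite3.connect(db_path)
    conn.execute(SCHEMA)
    return conn


def store_upload(conn, upload_dir, filename, data):
    """Save one uploaded scan and record it right away. Returns the stored path."""
    name = os.path.basename(filename)  # keep only the final name component
    if "_" not in name or not name.lower().endswith(".pdf"):
        raise ValueError(f"expected <customer>_<description>.pdf, got {name!r}")
    customer = name.split("_", 1)[0]
    path = os.path.join(upload_dir, name)
    with open(path, "wb") as f:
        f.write(data)
    # The connection context manager commits on success and rolls back on error.
    # INSERT OR REPLACE: re-uploading the same file just refreshes its row.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO documents (path, filename, customer) VALUES (?, ?, ?)",
            (path, name, customer),
        )
    return path
```

Rejecting badly named files at upload time also keeps the customer-number search reliable, since every row is guaranteed to follow the naming convention.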

Post back if you need more info.