using scandir to list specific pages in site

09-01-2007, 04:41 AM
I need to create a PHP script that generates a list of all pages in a site whose filenames end in _es (e.g. myfile_es.html), _ne, _fr, _de or _it.

These are foreign-language translation pages, and I need to generate the index dynamically so that it always stays up to date without manual maintenance.

As part of this PHP script, I also need to parse the HTML of each page that meets the criteria (filename ending in _es, etc.) and extract the text between the <title></title> tags, to use as a description so users know what is at each link.

My initial research suggests that scandir is the place to start. Any suggestions or ideas that point me in a good direction are much needed and appreciated.

Mahalo (thanks)!

09-01-2007, 06:37 AM
Use glob() (http://us2.php.net/manual/en/function.glob.php) to build the list of files. Isolating the text between the <title> tags is a little harder; I imagine you'll want a regular expression for that, though regex isn't my strong suit.
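A minimal sketch of the glob() suggestion for a single directory, assuming the translated pages are .html files named like myfile_es.html. The function name findTranslatedPages() is hypothetical. Because the directory is included in the pattern, glob() returns paths rather than bare file names. The suffix check uses preg_match() instead of the GLOB_BRACE flag, since php.net notes that GLOB_BRACE is unavailable on some non-GNU systems.

```php
<?php
// Hypothetical helper: list translated pages in ONE directory (no recursion).
// Assumption: translations are .html files ending in _es, _ne, _fr, _de or _it.
function findTranslatedPages(string $dir): array
{
    $suffixPattern = '/_(es|ne|fr|de|it)\.html$/';
    $pages = [];

    // Putting $dir in the pattern makes glob() return "dir/file" paths.
    // glob() can return false on error, so fall back to an empty array.
    $candidates = glob(rtrim($dir, '/') . '/*.html') ?: [];

    foreach ($candidates as $path) {
        // Match the suffix against the bare file name only.
        if (preg_match($suffixPattern, basename($path)) === 1) {
            $pages[] = $path;
        }
    }
    return $pages;
}

// Usage: list the translations sitting next to this script.
// print_r(findTranslatedPages(__DIR__));
```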

09-01-2007, 11:52 PM
Thanks, Fumigator. I have a couple of questions about glob() that the php.net function reference doesn't answer for me:

1) Does glob() search through every folder of the site it is run from (i.e. public_html and everything below it)?

2) Does the array glob() returns include each file's path, or just the file name?

Ultimately I need to create a hyperlink to each of these files, so I need both the path and the file name.
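Once you have a file's full filesystem path, one way to turn it into a hyperlink is to strip the web root off the front and URL-encode what remains. This is a sketch under assumptions: makeLink() is a hypothetical helper, $docRoot is assumed to be the directory that maps to the site's "/" (often $_SERVER['DOCUMENT_ROOT']), and paths are assumed to use "/" separators.

```php
<?php
// Hypothetical helper: turn a file path under the web root into an <a> tag.
// Assumption: $docRoot is the filesystem directory served as the site's "/".
function makeLink(string $path, string $docRoot, string $label): string
{
    $docRoot = rtrim($docRoot, '/');
    if (strpos($path, $docRoot . '/') !== 0) {
        throw new InvalidArgumentException("$path is not under $docRoot");
    }

    // Drop the root, then URL-encode each segment separately
    // so the "/" separators are kept intact.
    $relative = substr($path, strlen($docRoot) + 1);
    $segments = array_map('rawurlencode', explode('/', $relative));
    $url = '/' . implode('/', $segments);

    // Escape both the URL and the visible text for safe HTML output.
    return '<a href="' . htmlspecialchars($url, ENT_QUOTES) . '">'
        . htmlspecialchars($label, ENT_QUOTES) . '</a>';
}

// Usage:
// echo makeLink('/home/site/public_html/fr/menu_fr.html',
//               '/home/site/public_html', 'Menu');
```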

Also, I assume ereg is the regex function you mean?

If so, it seems I would need to use file_get_contents() to read each matching file and then ereg() to find the <title></title> tags.

Does that seem like a sound approach?
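A rough sketch of the file_get_contents() plus regex approach. ereg() was later deprecated (PHP 5.3) and removed (PHP 7.0), so this swaps in preg_match() from the PCRE extension. getPageTitle() is a hypothetical name, and a regex only copes with reasonably well-formed pages; DOMDocument::loadHTML() is a sturdier option if the markup is messy.

```php
<?php
// Hypothetical helper: pull the <title> text out of an HTML file.
// Assumption: pages are small enough to read whole with file_get_contents().
function getPageTitle(string $path): ?string
{
    $html = @file_get_contents($path);
    if ($html === false) {
        return null; // missing or unreadable file
    }

    // i: also match <TITLE>; s: let "." cross newlines;
    // (.*?) is lazy, so matching stops at the first </title>.
    if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $m) !== 1) {
        return null; // page has no title tag
    }

    // Collapse runs of whitespace, then decode entities such as &amp;.
    $title = trim(preg_replace('/\s+/', ' ', $m[1]));
    return html_entity_decode($title, ENT_QUOTES, 'UTF-8');
}
```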

09-02-2007, 04:42 AM
I've set up a test script and found that glob() neither includes file path info nor scans for files outside the directory it is executed from.

Is there a simple way to scan ALL directories and sub-directories in the site for files matching my pattern AND get each file's path as well as its name?
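One option for walking every sub-directory is PHP's SPL RecursiveDirectoryIterator wrapped in a RecursiveIteratorIterator, swapped in here for the scandir()/glob() calls discussed in the thread. Each item it yields is an SplFileInfo, which exposes the containing directory (getPath()), the bare name (getFilename()) and the full path (getPathname()). A minimal sketch, assuming the _es/_ne/_fr/_de/_it + .html naming from the first post; findTranslatedPagesRecursive() is a hypothetical name, and the root you pass in (for example $_SERVER['DOCUMENT_ROOT']) is up to you.

```php
<?php
// Hypothetical helper: walk $root and every sub-directory beneath it,
// returning each translated page's directory, file name and full path.
// Assumption: translations are .html files ending in _es, _ne, _fr, _de or _it.
function findTranslatedPagesRecursive(string $root): array
{
    $suffixPattern = '/_(es|ne|fr|de|it)\.html$/';
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS),
        RecursiveIteratorIterator::LEAVES_ONLY,
        // Skip sub-directories that cannot be opened instead of throwing.
        RecursiveIteratorIterator::CATCH_GET_CHILD
    );

    $pages = [];
    foreach ($iterator as $file) { // $file is an SplFileInfo
        if ($file->isFile() && preg_match($suffixPattern, $file->getFilename()) === 1) {
            $pages[] = [
                'dir'  => $file->getPath(),     // directory containing the file
                'name' => $file->getFilename(), // bare file name
                'path' => $file->getPathname(), // directory + separator + name
            ];
        }
    }

    // Directory order is not guaranteed; sort by path for a stable index.
    usort($pages, fn($a, $b) => strcmp($a['path'], $b['path']));
    return $pages;
}
```

Scanning the whole tree on every page view may get slow on a large site; caching the resulting list is one way around that.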