03-16-2009, 03:31 AM
Hi all,

First off, is cURL the best method for web scraping? I've heard a bit about DOMDocument's loadHTMLFile(); is that just as good? What about file_get_contents()?

Now to my main question: is there a way to selectively grab content from an HTML file instead of grabbing the entire contents and then extracting information with preg_match()? For example, load an HTML file until the first preg_match() hit and then stop loading. Weird, I know, but I was just wondering if it could be done.
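To make the idea concrete, here is a rough sketch of what I have in mind. I'm assuming the page can be read as a stream in chunks with fread(), and fetchUntilMatch() is just a name I made up, not a built-in. The demo reads from an in-memory stream so it runs without a network; for a real page I'd presumably pass in a handle from fopen() on the URL, which needs allow_url_fopen enabled. The pattern has to include its closing delimiter (here `</title>`) and be non-greedy, otherwise it could match on a half-loaded buffer.

```php
<?php
// Hypothetical helper (not part of PHP): read a stream chunk by chunk and
// stop as soon as $pattern matches the text buffered so far.
function fetchUntilMatch($handle, string $pattern, int $chunkSize = 4096): ?array
{
    $buffer = '';
    while (!feof($handle)) {
        $chunk = fread($handle, $chunkSize);
        if ($chunk === false || $chunk === '') {
            break; // read error or nothing left to read
        }
        $buffer .= $chunk;
        // Re-check the whole buffer so a match split across chunks is still found.
        if (preg_match($pattern, $buffer, $matches) === 1) {
            return $matches; // no further chunks are read after this point
        }
    }
    return null; // reached the end of the stream without a match
}

// Demo on an in-memory stream standing in for a remote page.
$handle = fopen('php://memory', 'r+');
fwrite($handle, '<html><head><title>Example</title></head><body>...lots more...</body></html>');
rewind($handle);

// Tiny chunk size only so the early stop is visible on such a short input.
$m = fetchUntilMatch($handle, '/<title>(.*?)<\/title>/is', 16);
fclose($handle);

echo $m[1], "\n"; // prints "Example"
```

I realise the buffer gets re-scanned on every chunk, so this is probably only worthwhile when the part I want sits near the top of a large page.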

Finally, I've been getting into making simple little web applications for myself. The most recent one I've been thinking of doing grabs movie information (RT ratings, IMDb plot summary, Apple trailer, etc.) and displays it all in one place. Would something like this be ethically sound? If so, are there better ways to scrape that information than the methods above? In other words, are those the fastest and simplest options?