
View Full Version : Saving the contents of URL and searching the file



john13
05-24-2011, 03:38 PM
Hello, I'm looking for a way to dump the contents of a URL (all of the HTML, as plain text) to a text file, and then search that file for the number of times a certain string occurs.

Currently, I'm using the following:



$url = "www.url.com";
$url_contents = file_get_contents($url);

$datadump = 'datadump.txt';


if (!$handle = fopen($datadump, 'a')) {
echo "Cannot open file ($datadump)";
exit;
}
if (fwrite($handle, $url_contents) === FALSE) {
echo "Cannot write to file ($datadump)";
exit;
}
fclose($handle);


This only saves around 21 KB of data per run, which is the first problem. It's not the same data repeated; it seems to come through in 21 KB chunks.

For the searching, I tried dumping the URL contents to a string and then searching it like this:



$url = "www.url.com";
$url_contents = file_get_contents($url);

$count = preg_match_all("/string/", $url_contents, $matches);
print $count . "<br />";


However, this returns the same count I get when I manually search the file created by the first piece of code for the same string. That leads me to assume the string ends up the same size as the file, and that I'm doing something wrong.

I'm very open to suggestions if there's a faster or more efficient way of doing this; I just can't seem to find one.

gvre
05-24-2011, 10:07 PM
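Try this:

<?php
$url = "http://www.domain.com/"; // include the scheme; without it PHP treats the value as a local file path
$content = @file_get_contents($url);
if ($content === false || $content === '')
exit; // the fetch failed or returned nothing
$datadump = 'datadump.txt';
$pattern = '#\bstring\b#si'; // OR #string#si for non word boundary matches
$count = preg_match_all($pattern, $content, $m); // number of matches found
file_put_contents($datadump, $content); // overwrites the file rather than appending to it
echo $count;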
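
If you only need a count of a literal string and not a real pattern, a plain substr_count() may be enough and avoids the regex engine. Here is a rough sketch of both approaches wrapped in one helper. The function name count_occurrences and its $wholeWord parameter are made up for this example, and it assumes PHP 7 or later because of the type declarations. It takes the already-fetched content as a string, so it does not touch the network.

```php
<?php
// Hypothetical helper (not part of PHP): count how often $needle occurs in $haystack.
function count_occurrences(string $haystack, string $needle, bool $wholeWord = false): int
{
    if ($needle === '') {
        return 0; // substr_count() does not accept an empty needle
    }
    if (!$wholeWord) {
        // Case-insensitive literal count: lowercase both sides, then count
        // non-overlapping occurrences.
        return substr_count(strtolower($haystack), strtolower($needle));
    }
    // Whole-word count: escape the needle so regex metacharacters are literal.
    $pattern = '#\b' . preg_quote($needle, '#') . '\b#i';
    return (int) preg_match_all($pattern, $haystack);
}

$html = '<p>A string, two strings, and a STRING.</p>';
echo count_occurrences($html, 'string'), "\n";       // prints 3
echo count_occurrences($html, 'string', true), "\n"; // prints 2
```

The second call skips "strings" because the \b after the needle requires a word boundary there. In real use you would pass in the result of file_get_contents() after checking that it is not false.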


