  1. #1
    New to the CF scene
    Join Date
    May 2011
    Posts
    1
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Saving the contents of URL and searching the file

    Hello, I'm looking for a way to dump the contents of a URL (all of the HTML, as plain text) to a text file, and then search that file for the number of times a certain string occurs.

    Currently, I'm using the following:

    PHP Code:
    $url = "www.url.com";
    $url_contents = file_get_contents($url);

    $datadump = 'datadump.txt';

    if (!$handle = fopen($datadump, 'a')) {
        echo "Cannot open file ($datadump)";
        exit;
    }
    if (fwrite($handle, $url_contents) === FALSE) {
        echo "Cannot write to file ($datadump)";
        exit;
    }

    This only saves around 21 KB of data per run (it's not the same data repeated; it's parsing in 21 KB chunks), which is the first problem.

    For the searching, I tried dumping the URL contents to a string and then searching it like this:

    PHP Code:
    $url = "www.url.com";
    $url_contents = file_get_contents($url);

    $count = preg_match_all("/string/", $url_contents, $matches);
    print $count . "<br />";
    However, this returns the same count I get when I manually search the file created by the first piece of code for the same string, which leads me to assume that it ends up with the same amount of data and that I'm doing something wrong.

    I'm very open to suggestions if there's a faster or more efficient way of doing this; I just can't seem to find one.

  • #2
    Regular Coder
    Join Date
    May 2011
    Posts
    240
    Thanks
    1
    Thanked 56 Times in 55 Posts
    Try this
    PHP Code:
    <?php
    $url = "http://www.domain.com/";
    $content = @file_get_contents($url);
    if (!$content)
            exit;
    $datadump = 'datadump.txt';
    $pattern = '#\bstring\b#si'; // OR #string#si for non word boundary matches
    $count = preg_match_all($pattern, $content, $m);
    file_put_contents($datadump, $content);
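
The snippet above computes $count but never prints it, and it uses a regex even though the search term is a fixed string. As a rough sketch only (count_occurrences and dump_and_count are made-up helper names, not part of either post), the same fetch, save, and count steps could be wrapped up with substr_count(), assuming a plain case-insensitive literal match is all that is needed:

```php
<?php
// Hypothetical helper: count non-overlapping literal occurrences of $needle.
function count_occurrences(string $content, string $needle, bool $caseInsensitive = true): int
{
    if ($needle === '') {
        return 0; // substr_count() does not accept an empty needle
    }
    if ($caseInsensitive) {
        // Lowercase both sides so the literal comparison ignores case
        $content = strtolower($content);
        $needle  = strtolower($needle);
    }
    return substr_count($content, $needle);
}

// Hypothetical helper: fetch $url, save the raw response to $file,
// and return the number of matches, or null if fetching or writing fails.
function dump_and_count(string $url, string $file, string $needle): ?int
{
    $content = file_get_contents($url);
    if ($content === false) {
        return null; // the fetch failed
    }
    // file_put_contents() overwrites by default; pass FILE_APPEND to append
    if (file_put_contents($file, $content) === false) {
        return null; // the write failed
    }
    return count_occurrences($content, $needle);
}

// Usage on an in-memory string, so no network access is needed:
echo count_occurrences('<p>String and string</p>', 'string'); // prints 2
```

Note that file_put_contents() replaces the file on each run, whereas the fopen($datadump, 'a') call in the first post appends to it, so the two approaches will not leave the same file behind after several runs.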

