
Thread: webscraper

  1. #1
    New Coder
    Join Date
    Feb 2007
    Posts
    43
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Question webscraper

    Hi,
    I want to build a PHP program that crawls the sitemaps of different websites and stores the data (such as real estate property listings) in my own database. Currently, I'm only able to store the links. Please help!

  • #2
    Master Coder mlseim's Avatar
    Join Date
    Jun 2003
    Location
    Cottage Grove, Minnesota
    Posts
    9,397
    Thanks
    8
    Thanked 1,078 Times in 1,069 Posts
    It involves using PHP and a ton of scripting to parse each domain,
    because each domain is different. This will be a big scripting project,
    and will take you a lot of time to do. You might want to hire someone.

  • #3
    New Coder
    Join Date
    Oct 2007
    Posts
    84
    Thanks
    0
    Thanked 8 Times in 8 Posts
    Before you move any further, you need to check with your state's real estate commission and your local association of Realtors on the rules for IDX/ILD in that particular area.

    As for the problem at hand, you can run through the remote file line by line looking for traits of the information you want to import. But it's not worth the time involved when you can get an FTP pull of the information every night from your local board of Realtors.
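
    For anyone who does want to pull fields out of a fetched page, here is a minimal sketch. It swaps the line-by-line scan for PHP's DOM parser with XPath, which tends to cope better with markup that spans lines. The markup, the class names (listing, address, price) and the function name extractListings are all assumptions made up for illustration; a real site will need its own queries.

```php
<?php
// Hypothetical page fragment: the structure and class names are assumptions.
$html = <<<'HTML'
<div class="listing">
  <span class="address">12 Elm St</span>
  <span class="price">$250,000</span>
</div>
<div class="listing">
  <span class="address">9 Oak Ave</span>
  <span class="price">$310,000</span>
</div>
HTML;

// Returns one associative array per listing block found in the HTML.
function extractListings(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings quietly.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $listings = [];
    foreach ($xpath->query('//div[@class="listing"]') as $node) {
        $listings[] = [
            // string(...) yields the text of the first matching node, or ''.
            'address' => trim($xpath->evaluate('string(.//span[@class="address"])', $node)),
            'price'   => trim($xpath->evaluate('string(.//span[@class="price"])', $node)),
        ];
    }
    return $listings;
}

print_r(extractListings($html));
```

    Each domain would need its own set of XPath queries, which is why this gets expensive across many sites.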

  • #4
    Master Coder mlseim's Avatar
    Join Date
    Jun 2003
    Location
    Cottage Grove, Minnesota
    Posts
    9,397
    Thanks
    8
    Thanked 1,078 Times in 1,069 Posts
    Even if you have to pay a monthly or yearly fee for FTP access to
    a realtor database ... I think it would be worth it. That would be the
    best solution to this. The information would all be in one place and
    accessible in an easy way.

  • #5
    New Coder
    Join Date
    Feb 2007
    Posts
    43
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Hi! Now I need to scrape data from one site only. But the site contains a huge amount of data.

  • #6
    New Coder
    Join Date
    Feb 2007
    Posts
    43
    Thanks
    0
    Thanked 0 Times in 0 Posts
    I wrote code to navigate each link in the sitemap, open those pages, read and parse the page structure, and then insert the required data into my database. But this is crap, since it takes a hell of a lot of time. Any suggestions for an easier and faster method are most welcome.
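
    The first step described here, collecting the page links from the sitemap, might look roughly like the sketch below. It assumes the site publishes a standard sitemaps.org XML sitemap; the sample XML, the example.com URLs and the function name sitemapUrls are hypothetical and not taken from the original code.

```php
<?php
// Namespace defined by the sitemaps.org protocol.
const SITEMAP_NS = 'http://www.sitemaps.org/schemas/sitemap/0.9';

// Returns every <loc> value found in a sitemap XML string.
function sitemapUrls(string $xml): array
{
    $doc = simplexml_load_string($xml);
    if ($doc === false) {
        return []; // not parseable as XML
    }
    $urls = [];
    foreach ($doc->children(SITEMAP_NS) as $entry) {
        $loc = trim((string) $entry->children(SITEMAP_NS)->loc);
        if ($loc !== '') {
            $urls[] = $loc;
        }
    }
    return $urls;
}

// Hypothetical sitemap used only to exercise the function.
$sample = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/listing/1</loc></url>
  <url><loc>http://example.com/listing/2</loc></url>
</urlset>
XML;

print_r(sitemapUrls($sample));
```

    This step is cheap; most of the time is likely spent downloading the pages one after another.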

  • #7
    New Coder
    Join Date
    Feb 2007
    Posts
    43
    Thanks
    0
    Thanked 0 Times in 0 Posts
    OK, I have used cURL. But still, the site I want to scrape has hundreds of links. And when I open all those links and parse the HTML, it's the same problem again: the time.
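
    One thing that may help with the waiting is fetching several pages at once through PHP's curl_multi functions instead of one request at a time. The sketch below is only an illustration under assumptions: the function names fetchAll and fetchInBatches, the batch size of 10 and the 10-second timeout are made up, and whether parallel requests are acceptable depends on the target site.

```php
<?php
// Fetches all given URLs concurrently; returns [url => body or null on failure].
function fetchAll(array $urls, int $timeout = 10): array
{
    $multi = curl_multi_init();
    $handles = [];
    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
        curl_multi_add_handle($multi, $ch);
        $handles[$url] = $ch;
    }

    // Drive all transfers until none are still running.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running && curl_multi_select($multi, 1.0) === -1) {
            usleep(1000); // avoid a busy loop if select cannot wait
        }
    } while ($running && $status === CURLM_OK);

    $pages = [];
    foreach ($handles as $url => $ch) {
        $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        // Keep the body only for 2xx responses.
        $pages[$url] = ($code >= 200 && $code < 300) ? curl_multi_getcontent($ch) : null;
        curl_multi_remove_handle($multi, $ch);
    }
    // Handles are released once nothing references them any more.
    return $pages;
}

// Splits a long URL list into small batches so the remote server
// is not hit with hundreds of simultaneous requests.
function fetchInBatches(array $urls, int $batchSize = 10): array
{
    $pages = [];
    foreach (array_chunk($urls, $batchSize) as $batch) {
        $pages += fetchAll($batch);
    }
    return $pages;
}
```

    A call such as fetchInBatches($links, 10) would return the page bodies keyed by URL, ready for parsing. The parsing and database inserts still run one after another, so this only shortens the download part.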

  • #8
    Super Moderator Inigoesdr's Avatar
    Join Date
    Mar 2007
    Location
    Florida, USA
    Posts
    3,638
    Thanks
    2
    Thanked 404 Times in 396 Posts
    PHP isn't really the most efficient tool to scrape large sites with (or many small sites), and you can't do anything about the time it takes to scrape it other than caching images and other large items.

  • #9
    New Coder
    Join Date
    Feb 2007
    Posts
    43
    Thanks
    0
    Thanked 0 Times in 0 Posts
    So what is the solution to my problem? Is there any solution?

