  • #1
    New to the CF scene
    Join Date
    Dec 2008
    Posts
    1
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Custom Spider/Scraper - Help!

    Hey!

    I'm new here, so a quick introduction: I'm from Toronto, Canada, and run a small finance-focused website.

    The problem: many financial institutions publish their data online and update it on a daily basis. There are over 60 institutions, and following each one is very challenging. I want to create a summary page with financial data from those institutions: release a spider once a day, collect their updates, and then post them all together on the website.

    Obviously copy and paste is off the table, since it takes at least 1.5 hours to go through all the lenders and get their data. The only possible solution seems to be a custom spider that crawls specific fields (div tags, table cells), extracts the data and compiles it into one file. The question is: do you know of any software that can do this? I know there are plenty of scrapers out there, but the spider must be able to extract data from specified table cells and, in some cases, div tags.
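A minimal sketch of the kind of extractor described above, using only Python's standard-library `html.parser`. It assumes well-formed, non-nested tables, and that for each site you have already noted which table/row/column holds each value and which `<div>` ids to read. The class `CellAndDivExtractor`, the function `extract` and the sample page are hypothetical illustrations, not any existing product. Locations that are not found come back as `None`, which makes a changed page layout easy to spot.

```python
from html.parser import HTMLParser


class CellAndDivExtractor(HTMLParser):
    """Collects text from table cells, keyed by (table, row, column),
    and from <div> elements that carry an id attribute.

    Assumes simple, non-nested tables with closed </td> tags; indices are 0-based.
    """

    def __init__(self):
        super().__init__()
        self.cells = {}    # (table_index, row_index, col_index) -> text
        self.divs = {}     # div id -> text
        self._table = -1
        self._row = -1
        self._col = -1
        self._open = []    # stack of (kind, key, text_parts) for open cells/divs

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._table += 1
            self._row = -1
        elif tag == "tr":
            self._row += 1
            self._col = -1
        elif tag in ("td", "th"):
            self._col += 1
            key = (self._table, self._row, self._col)
            self._open.append(("cell", key, []))
        elif tag == "div":
            self._open.append(("div", dict(attrs).get("id"), []))

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            kind = "cell"
        elif tag == "div":
            kind = "div"
        else:
            return
        # Close the innermost open element of the matching kind.
        for i in range(len(self._open) - 1, -1, -1):
            if self._open[i][0] == kind:
                _, key, parts = self._open.pop(i)
                text = " ".join("".join(parts).split())  # collapse whitespace
                if kind == "cell":
                    self.cells[key] = text
                elif key is not None:
                    self.divs[key] = text
                break

    def handle_data(self, data):
        # Text belongs to every cell/div that is currently open.
        for _, _, parts in self._open:
            parts.append(data)


def extract(html, cell_targets, div_targets):
    """Return {field_name: text or None} for the requested locations.

    cell_targets: {field_name: (table_index, row_index, col_index)}
    div_targets:  {field_name: div_id}
    """
    parser = CellAndDivExtractor()
    parser.feed(html)
    parser.close()
    result = {name: parser.cells.get(loc) for name, loc in cell_targets.items()}
    result.update({name: parser.divs.get(div_id) for name, div_id in div_targets.items()})
    return result


# Made-up sample page standing in for one institution's rate table.
SAMPLE = """
<div id="updated">Rates as of Dec 15</div>
<table>
  <tr><th>Term</th><th>Rate</th></tr>
  <tr><td>1 year</td><td>3.25<b>%</b></td></tr>
  <tr><td>5 year</td><td>5.49%</td></tr>
</table>
"""

if __name__ == "__main__":
    print(extract(SAMPLE, {"one_year": (0, 1, 1), "five_year": (0, 2, 1)},
                  {"as_of": "updated"}))
```

For a live page you would fetch the HTML first, for example with `urllib.request.urlopen(url).read().decode("utf-8")` (assuming the site serves UTF-8), and keep one `cell_targets`/`div_targets` pair per institution.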

    I can't go to a data extraction company since they charge too much (do they?). Please let me know if you're aware of any applications that meet these requirements.

    Any help, guys? Thanks!

  • #2
    New to the CF scene
    Join Date
    Nov 2008
    Posts
    9
    Thanks
    0
    Thanked 0 Times in 0 Posts
    You're right, the problem can't be solved easily. Collecting data from each source is not the hard part; bringing it all into one format is a custom job. I'm looking for such a service too.
    As I imagine it, this could be an online service that is told which pages to grab, automatically cuts the required info from them, and sends the results by email, for example.
    Maybe someone has already found one?
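To bring the results into one format and mail them out, as suggested above, one option is to normalize each scraped string into a number, write everything to a single CSV, and attach it to an email. This is a sketch under assumptions: the values are percentages such as `3.25%`, and the function names, subject line and example.com addresses are hypothetical. It uses only the standard `csv`, `re` and `email.message` modules.

```python
import csv
import io
import re
from email.message import EmailMessage

# First number in the string, e.g. "3.25" in "3.25 %" (thousands commas removed first).
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def parse_rate(text):
    """Convert a scraped string such as '3.25%' to a float, or None if it has no number."""
    if text is None:
        return None
    match = NUMBER.search(text.replace(",", ""))
    return float(match.group()) if match else None


def to_csv(rows):
    """rows: iterable of (source, field, raw_text) tuples -> one CSV document."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["source", "field", "raw", "value"])
    for source, field, raw in rows:
        value = parse_rate(raw)
        writer.writerow([source, field, (raw or "").strip(),
                         "" if value is None else f"{value:.2f}"])
    return buf.getvalue()


def build_digest(csv_text, sender, recipient):
    """Wrap the CSV in an email with a short text body and a CSV attachment."""
    msg = EmailMessage()
    msg["Subject"] = "Daily rate summary"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("Today's extracted values are attached as rates.csv.")
    msg.add_attachment(csv_text, subtype="csv", filename="rates.csv")
    return msg


if __name__ == "__main__":
    # Made-up rows standing in for values scraped from two institutions.
    table = to_csv([("Bank A", "1 year", "3.25%"), ("Bank B", "1 year", "n/a")])
    print(build_digest(table, "scraper@example.com", "me@example.com"))
```

Actually sending the message would then be a matter of `smtplib.SMTP(host).send_message(msg)` against whatever mail server you have access to; keeping the raw text next to the parsed value makes it easy to see when a conversion went wrong.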

  • #3
    Senior Coder
    Join Date
    Oct 2008
    Location
    Long Beach
    Posts
    1,196
    Thanks
    36
    Thanked 164 Times in 164 Posts
    I could build one for your specific needs without much trouble. All this project really needs is a one-time effort of visiting each of your pages and locating exactly where the data is displayed on each one.

    After that, the only maintenance required would be to check your results every so often to make sure the source website(s) haven't changed the location/method of output.
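The periodic check described above can be partly automated. A minimal sketch, assuming the scraper's results for one day are collected as a nested dict `{source: {field: extracted_text_or_None}}`: it flags values that are missing, are not a plain rate, or fall outside a plausibility range. The function name and the 0 to 25% bounds are hypothetical placeholders to adjust for your data.

```python
import re

# A rate such as "3.25", "3.25%" or " 3.25 % "; anything else is treated as suspicious.
RATE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*%?\s*$")


def find_suspect_fields(results, low=0.0, high=25.0):
    """Flag extracted values that suggest a source page has changed.

    results: {source: {field: extracted_text_or_None}}
    low/high: plausibility bounds in percent (placeholder values).
    Returns a list of (source, field, reason) tuples, sorted by source and field.
    """
    problems = []
    for source, fields in sorted(results.items()):
        for field, text in sorted(fields.items()):
            if text is None or not text.strip():
                problems.append((source, field, "missing (layout may have changed)"))
                continue
            match = RATE.match(text)
            if match is None:
                problems.append((source, field, f"not a rate: {text!r}"))
                continue
            value = float(match.group(1))
            if not low <= value <= high:
                problems.append((source, field, f"out of range: {value}"))
    return problems


if __name__ == "__main__":
    # Made-up results for illustration only.
    todays_results = {
        "Bank A": {"1 year": "3.25%"},
        "Bank B": {"1 year": None},
        "Bank C": {"1 year": "Call for rates"},
    }
    for problem in find_suspect_fields(todays_results):
        print(problem)
```

Running this after each daily crawl and only looking at the sources it reports keeps the manual checking to the pages that actually need attention.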

    Feel free to email me if you want a faster response.
    Feel free to e-mail me if I forget to respond ;)
    ohsosexybrit@gmail.com

  • #4
    Master Coder
    Join Date
    Dec 2007
    Posts
    6,682
    Thanks
    436
    Thanked 890 Times in 879 Posts
    Quote Originally Posted by IvanSEO
    ...the requirement for a spider is to be able to extract data from specified table cells and in some cases div tags.
    Can you post a link to one of the sites, or something closer to what you need to extract?

    Regards

