  1. #1
    New to the CF scene
    Join Date
    Nov 2012
    Posts
    1
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Question: Advice - best way to use web scraping for a web project

    I am looking for a bit of help on this project idea I have decided to launch.

    1) Project Details:
    The web project I have in mind is a website that will show the largest online database of cosmetics.

    Basically, any product of any cosmetic brand will be on the website. (Or at least that is the objective here).

    To achieve this, I was thinking of using web scraping to collect the info on each product and save it in a database.
    I have no idea how this can be technically implemented.
    I wanted to use Joomla as the CMS and combine it with the web scraping feature so the site can be updated on a weekly or monthly basis.

    Questions:
    1. Is that technically possible?
    2. Is it complicated to develop?
    3. Is it legally possible?


    Advice would be welcome here!
    Perhaps there is another way to do it that I haven’t thought about??


    In addition to this, there will be additional features such as:
    • User accounts
    • Add comments
    • Social media feature
    • Comparative feature
      Etc…



    2) Payment method/ details (Paypal, check? Timeline?):
    I have not thought about how much it would cost to do it.
    I am still trying to find the best way to do it.
    It is a serious project and I have a budget for it.

  • #2
    UE Antagonizer Fumigator's Avatar
    Join Date
    Dec 2005
    Location
    Utah, USA, Northwestern hemisphere, Earth, Solar System, Milky Way Galaxy, Alpha Quadrant
    Posts
    7,691
    Thanks
    42
    Thanked 637 Times in 625 Posts
    Technically possible? Definitely.
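
    To give a rough idea of what "technically possible" looks like, here is a minimal Python sketch of the collect-and-save step. It is only an illustration: the sample markup, the class names (brand, name, price), and the table layout are all made up, and a real scraper would fetch pages over HTTP and need a separate parser for each site.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical markup: real product pages will differ from site to site.
SAMPLE_PAGE = """
<div class="product">
  <span class="brand">ExampleBrand</span>
  <span class="name">Hydrating Lip Balm</span>
  <span class="price">4.99</span>
</div>
"""


class ProductParser(HTMLParser):
    """Collects the text of <span> tags whose class is a known field."""

    FIELDS = {"brand", "name", "price"}

    def __init__(self):
        super().__init__()
        self.current = None   # field currently being read, if any
        self.product = {}     # field name -> text

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "span" and css_class in self.FIELDS:
            self.current = css_class

    def handle_data(self, data):
        if self.current and data.strip():
            self.product[self.current] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self.current = None


def scrape_product(html):
    """Parse one product page and return its fields as a dict."""
    parser = ProductParser()
    parser.feed(html)
    return parser.product


def save_product(conn, product):
    """Insert the product, replacing any earlier row for the same brand/name."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS products "
        "(brand TEXT, name TEXT, price REAL, UNIQUE (brand, name))"
    )
    conn.execute(
        "INSERT OR REPLACE INTO products (brand, name, price) VALUES (?, ?, ?)",
        (product["brand"], product["name"], float(product["price"])),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # an on-disk path would be used in practice
save_product(conn, scrape_product(SAMPLE_PAGE))
rows = conn.execute("SELECT brand, name, price FROM products").fetchall()
```

    Running the same save weekly or monthly would refresh existing rows rather than duplicate them, because of the UNIQUE constraint on brand and name.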

    Complicated? That really depends on the details: How many sites need to be scraped? How much information is being gathered? How will you aggregate data from all these different sites? You'll want to complete a detailed analysis of your project before writing a single line of code.
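
    The aggregation question is where much of the complexity tends to hide, since every site describes the same product differently. A small Python sketch of one possible approach: convert each site's records into a common shape, then group them by brand and name. The two "sites" and their field names (title, maker, price_cents, and so on) are invented for the example.

```python
def normalize_site_a(record):
    """Hypothetical site A: price is a string such as "$4.99"."""
    return {
        "brand": record["brand"].strip(),
        "name": record["title"].strip(),
        "price": float(record["price"].lstrip("$")),
        "source": "site_a",
    }


def normalize_site_b(record):
    """Hypothetical site B: price is an integer number of cents."""
    return {
        "brand": record["maker"].strip(),
        "name": record["product_name"].strip(),
        "price": record["price_cents"] / 100,
        "source": "site_b",
    }


def aggregate(records):
    """Group normalized records by a case-insensitive (brand, name) key."""
    merged = {}
    for record in records:
        key = (record["brand"].lower(), record["name"].lower())
        merged.setdefault(key, []).append(record)
    return merged


records = [
    normalize_site_a(
        {"brand": " ExampleBrand ", "title": "Hydrating Lip Balm", "price": "$4.99"}
    ),
    normalize_site_b(
        {"maker": "EXAMPLEBRAND", "product_name": "hydrating lip balm", "price_cents": 525}
    ),
]
catalog = aggregate(records)
```

    Matching on a lowercased brand and name is a deliberately naive rule. Real product names vary far more than this, which is one reason to analyze the target sites before committing to a design.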

    Legal? Probably not... you'll have to look up the terms and conditions of every site you scrape to see if they're cool with you lifting their data. If in doubt, send an email to the website owner and ask. If your site will be competing with them, obviously they will have a problem with you using their data, and they may take legal action to shut you down-- especially if you're successful/popular. This is not something you want to ignore; if you don't obtain written permission up-front, you may go through a lot of work only to be shut down overnight. And believe me when I say, hosting companies do not mess around with sites they're hosting that may be in a legal grey area-- they will shut you down first and ask questions later.

    If you get a green light to scrape, then you should put together a business plan, assuming you intend to make money on this website. A good business plan will help you determine where the money will come from, where your visitors (customers) will come from, how you will market the site, who you will market the site to, how soon you will expect to be profitable, etc. etc. etc.

  • #3
    Regular Coder
    Join Date
    Apr 2012
    Location
    St. Louis, MO
    Posts
    985
    Thanks
    7
    Thanked 101 Times in 101 Posts
    Not to mention the legal issues that could be involved. Written permission or no.

    PLUS, whenever a website design is changed, or even just a different attribute is used for the mask in the scraping, your site breaks. Every time.
    ^_^

    If anyone knows of a website that can offer ColdFusion help that isn't controlled by neurotic, pedantic jerks* (stackoverflow.com), please PM me with a link.
    *
    The neurotic, pedantic jerks are not the owners; just the people who are in control of the "popularity contest".

  • #4
    UE Antagonizer Fumigator's Avatar
    Join Date
    Dec 2005
    Location
    Utah, USA, Northwestern hemisphere, Earth, Solar System, Milky Way Galaxy, Alpha Quadrant
    Posts
    7,691
    Thanks
    42
    Thanked 637 Times in 625 Posts
    I'm no lawyer but I would be comfortable scraping a site if I had a letter from the site owner giving me permission to scrape it. As long as that website has original data! I guess if that site was illegally obtaining their data, and I knew that, then permission wouldn't be enough to keep me out of trouble. It's definitely a sticky issue.

    As for the guy's website breaking every time the sites being scraped get updated-- let's just clarify what would actually break, and that is the scrape-and-update process. The website itself will be fine, it's just the data won't be updated until the scraper is fixed. Important distinction.
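
    To make that distinction concrete, here is a rough Python sketch of an update job that fails without touching the data the site is already serving. The table, the field names, and the ScrapeError exception are hypothetical; the idea is simply that a failed scrape is rolled back and the old rows stay in place.

```python
import sqlite3


class ScrapeError(Exception):
    """Raised when a scraped page no longer has the expected layout."""


def parse_price(fields):
    """Return the price, or fail loudly if the expected field is missing."""
    if "price" not in fields:
        raise ScrapeError(f"no 'price' field, got: {sorted(fields)}")
    return float(fields["price"])


def refresh_prices(conn, scraped_pages):
    """Apply all updates in one transaction; keep the old data on failure."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for product_id, fields in scraped_pages:
                conn.execute(
                    "UPDATE products SET price = ? WHERE id = ?",
                    (parse_price(fields), product_id),
                )
        return True
    except ScrapeError as exc:
        # Only the update job failed; the site keeps serving the old rows.
        print(f"scrape failed, keeping existing data: {exc}")
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products (id, name, price) VALUES (?, ?, ?)",
    [(1, "Hydrating Lip Balm", 4.99), (2, "Night Cream", 12.5)],
)
conn.commit()

# The second page has a changed layout: "price" was renamed to "cost".
ok = refresh_prices(conn, [(1, {"price": "5.49"}), (2, {"cost": "9.99"})])
prices_after_failure = conn.execute(
    "SELECT price FROM products ORDER BY id"
).fetchall()
```

    Here the first update is rolled back along with the failed one, so visitors keep seeing the last complete set of prices until the scraper is fixed.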

  • #5
    Mega-ultimate member
    Join Date
    Jun 2002
    Location
    Winona, MN - The land of 10,000 lakes
    Posts
    1,855
    Thanks
    1
    Thanked 45 Times in 42 Posts
    I'm no lawyer either, but I see nothing wrong with "searching the web for the best prices and displaying the results in a better format for the end user". Call it scraping if you want, but Google and all major search engines do it. Once they're under the gun for copyright infringement, then I'll worry.

