Advice - best way to use web scraping for a web project
I am looking for a bit of help with this project idea I have decided to launch.
1) Project Details:
The web project I have in mind is a website that will host the largest online database of cosmetics.
Basically, every product from every cosmetics brand will be on the website (or at least that is the objective).
To achieve this, I was thinking of using web scraping to collect the information on each product and save it in a database.
I have no idea how this can be technically implemented.
I wanted to use Joomla as the CMS and combine it with web scraping so the site can be updated on a weekly/monthly basis.
Advice would be welcome here!
Perhaps there is another way to do it that I haven't thought about?
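For illustration, the scrape-and-store step described above could look roughly like the sketch below. It is plain Python rather than anything Joomla-specific, and it assumes a hypothetical page markup in which product fields carry the class names `product-name`, `brand` and `price`; every real site would need its own mapping. Fetching pages over the network is left out so the sketch runs offline, and the SQLite table only stands in for whatever database the site actually reads from.

```python
import sqlite3
from html.parser import HTMLParser


class ProductParser(HTMLParser):
    """Collect text from elements whose CSS class names a product field.

    The class names are hypothetical; each real site uses its own markup.
    """

    FIELDS = {"product-name": "name", "brand": "brand", "price": "price"}

    def __init__(self):
        super().__init__()
        self.current = None  # field currently being read, if any
        self.data = {}

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for cls in classes:
            if cls in self.FIELDS:
                self.current = self.FIELDS[cls]

    def handle_endtag(self, tag):
        self.current = None

    def handle_data(self, data):
        if self.current and data.strip():
            self.data[self.current] = self.data.get(self.current, "") + data.strip()


def scrape_page(html):
    """Return a dict of the product fields found in one page of HTML."""
    parser = ProductParser()
    parser.feed(html)
    parser.close()
    return parser.data


def init_db(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS products ("
        "source_url TEXT PRIMARY KEY, name TEXT, brand TEXT, price TEXT)"
    )


def save_product(conn, source_url, product):
    # INSERT OR REPLACE keeps one row per source page, so re-running the
    # scrape updates existing rows instead of duplicating them.
    conn.execute(
        "INSERT OR REPLACE INTO products VALUES (?, ?, ?, ?)",
        (source_url, product.get("name"), product.get("brand"), product.get("price")),
    )
    conn.commit()


if __name__ == "__main__":
    # In a real run this HTML would be fetched from each product page.
    sample = ('<div><h1 class="product-name">Hydrating Cream</h1>'
              '<span class="brand">ExampleBrand</span>'
              '<span class="price">12.50</span></div>')
    conn = sqlite3.connect(":memory:")
    init_db(conn)
    save_product(conn, "https://example.com/p/1", scrape_page(sample))
    print(conn.execute("SELECT name, brand, price FROM products").fetchall())
```

The weekly/monthly refresh would then just be a scheduled job (for example a cron entry) that re-runs the script; because rows are keyed by source URL, each run overwrites the previous values for the same page.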
In addition to this, there will be additional features such as:
2) Payment method/details (PayPal, check? Timeline?):
I have not thought about how much it would cost to do it.
I am still trying to find the best way to do it.
It is a serious project and I have a budget for it.
Technically possible? Definitely.
Complicated? That really depends on the details: How many sites need to be scraped? How much information is being gathered? How will you aggregate data from all these different sites? You'll want to complete a detailed analysis of your project before writing a single line of code.
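To make the aggregation question concrete: different sites label the same data differently, so each source needs a mapping onto one common record, plus some rule for deciding that two records describe the same product. The sketch below assumes two hypothetical sources (`site_a`, `site_b`) with made-up field names and uses a deliberately naive match on brand and name; it is an illustration of the problem, not a recommended matching strategy.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Product:
    brand: str
    name: str
    price: Optional[float]
    source: str


# Hypothetical per-site field names: each source labels the same data differently.
FIELD_MAPS = {
    "site_a": {"brand": "brand", "name": "title", "price": "price_eur"},
    "site_b": {"brand": "manufacturer", "name": "product_name", "price": "cost"},
}


def normalize(record: dict, source: str) -> Product:
    """Map one raw record from a given source onto the common Product shape."""
    m = FIELD_MAPS[source]
    raw_price = record.get(m["price"])
    price = float(raw_price) if raw_price not in (None, "") else None
    return Product(record[m["brand"]].strip(), record[m["name"]].strip(), price, source)


def match_key(p: Product) -> Tuple[str, str]:
    # Naive matching: case-insensitive brand + name with whitespace collapsed.
    return (p.brand.casefold(), " ".join(p.name.casefold().split()))


def aggregate(records_by_source: Dict[str, List[dict]]) -> Dict[Tuple[str, str], List[Product]]:
    """Group normalized products from all sources under a shared match key."""
    merged: Dict[Tuple[str, str], List[Product]] = {}
    for source, records in records_by_source.items():
        for record in records:
            product = normalize(record, source)
            merged.setdefault(match_key(product), []).append(product)
    return merged
```

Name-based matching will miss products that are spelled differently on each site; a shared product identifier, where the sites expose one, would be a sturdier key.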
Legal? Probably not... you'll have to look up the terms and conditions of every site you scrape to see whether they're cool with you lifting their data. If in doubt, email the website owner and ask. If your site will be competing with them, they will obviously have a problem with you using their data, and they may take legal action to shut you down, especially if you're successful/popular. This is not something you want to ignore: if you don't obtain written permission up front, you may go through a lot of work only to be shut down overnight. And believe me when I say, hosting companies do not mess around with sites they host that may be in a legal grey area; they will shut you down first and ask questions later.
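One technical check worth adding alongside reading the terms and conditions is a site's robots.txt file. To be clear, robots.txt is not the terms and conditions and is not legal permission; it only states which paths the site asks crawlers to stay away from. Python's standard library can evaluate it, as in this minimal sketch, where `CosmeticsBot` is a hypothetical crawler name and the rules are parsed from a string so the example runs offline.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules permit user_agent to fetch url.

    Parsing a string keeps the example offline; against a live site you would
    call set_url("https://<site>/robots.txt") and read() instead.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = "User-agent: *\nDisallow: /private/\n"
    # "CosmeticsBot" is a hypothetical crawler name.
    print(allowed_by_robots(rules, "CosmeticsBot", "https://example.com/products/1"))
    print(allowed_by_robots(rules, "CosmeticsBot", "https://example.com/private/x"))
```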
If you get a green light to scrape, then you should put together a business plan, assuming you intend to make money on this website. A good business plan will help you determine where the money will come from, where your visitors (customers) will come from, how you will market the site, who you will market the site to, how soon you will expect to be profitable, etc. etc. etc.
Not to mention the legal issues that could be involved. Written permission or no.
PLUS, whenever a website's design changes, or even just a different attribute is used for the selector (the "mask") your scraper relies on, your site breaks. Every time.
I'm no lawyer, but I would be comfortable scraping a site if I had a letter from the site owner giving me permission to scrape it. As long as that website has original data! I guess if that site was illegally obtaining its data, and I knew that, then permission wouldn't be enough to keep me out of trouble. It's definitely a sticky issue.
As for the guy's website breaking every time the scraped sites get updated, let's clarify what would actually break: the scrape-and-update process. The website itself will be fine; the data just won't be updated until the scraper is fixed. Important distinction.
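That distinction can be built into the update step itself: if a scrape comes back without the fields you expect (a typical symptom of a changed layout), keep serving the last good record and log a warning instead of overwriting it with empty values. A minimal sketch, assuming a simple in-memory catalogue keyed by URL and a hypothetical set of required fields:

```python
import logging

# Hypothetical fields every product record must have.
REQUIRED_FIELDS = ("name", "brand", "price")


class ScrapeLayoutError(Exception):
    """Raised when a scraped page is missing fields we expect (hypothetical)."""


def validate(product: dict) -> dict:
    missing = [f for f in REQUIRED_FIELDS if not product.get(f)]
    if missing:
        raise ScrapeLayoutError("missing fields: " + ", ".join(missing))
    return product


def update_catalogue(catalogue: dict, url: str, scraped: dict) -> bool:
    """Store the new scrape only if it looks complete.

    Returns True if the record was updated, False if the old data was kept.
    """
    try:
        catalogue[url] = validate(scraped)
        return True
    except ScrapeLayoutError as exc:
        # Keep serving the last good data; the warning tells someone the
        # scraper for this site needs fixing.
        logging.warning("keeping stale data for %s: %s", url, exc)
        return False
```

The site keeps showing slightly stale data while the scraper is broken, which matches the point above: only the update process fails, not the website.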
I'm no lawyer either, but I see nothing wrong with "searching the web for the best prices and displaying the results in a better format for the end user". Call it scraping if you want, but Google and all the major search engines do it. Once they're under the gun for copyright infringement, then I'll worry ;)