Advice - best way to use web scraping for web project
I am looking for a bit of help on this project idea I have decided to launch.
1) Project Details:
The web project I have in mind is a website that will show the largest online database of cosmetics.
Basically, any product of any cosmetic brand will be on the website. (Or at least that is the objective here).
To reach this, I was thinking of using web scraping to collect the info on each product and save it in a database.
I have no idea how this can be technically implemented.
I wanted to use Joomla as the CMS and combine it with the web scraping feature so the site can be updated on a weekly or monthly basis.
Questions:
Is that technically possible?
Is it complicated to develop?
Is it legally possible?
Advice would be welcome here!
Perhaps there is another way to do it that I haven’t thought about??
In addition to this, there will be other features such as:
User accounts
Add comments
Social media feature
Comparative feature
Etc…
2) Payment method/ details (Paypal, check? Timeline?):
I have not thought about how much it would cost to do it.
I am still trying to find the best way to do it.
It is a serious project and I have a budget for it.
Location: Utah, USA, Northwestern hemisphere, Earth, Solar System, Milky Way Galaxy, Alpha Quadrant
Posts: 7,687
Thanks: 42
Thanked 637 Times in 625 Posts
Technically possible? Definitely.
Complicated? That really depends on the details: How many sites need to be scraped? How much information is being gathered? How will you aggregate data from all these different sites? You'll want to complete a detailed analysis of your project before writing a single line of code.
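To give a rough idea of what "scrape it and save it in a database" might look like, here is a minimal sketch using only the Python standard library. The HTML snippet, the class names (`product`, `brand`, `name`), the `ProductParser` and `save_products` names, and the table layout are all made up for illustration; every real site would need its own parser, and a Joomla site would normally use MySQL rather than SQLite.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical markup. A real site will look different, and each
# scraped site would need its own parsing rules.
SAMPLE_HTML = """
<div class="product"><span class="brand">Acme</span><span class="name">Rose Lip Balm</span></div>
<div class="product"><span class="brand">Acme</span><span class="name">Night Cream</span></div>
"""


class ProductParser(HTMLParser):
    """Collects (brand, name) pairs from the hypothetical markup above."""

    def __init__(self):
        super().__init__()
        self.products = []
        self._field = None   # which span we are currently inside, if any
        self._current = {}   # fields gathered for the product being read

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "span" and cls in ("brand", "name"):
            self._field = cls

    def handle_data(self, data):
        if self._field:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "div" and {"brand", "name"} <= self._current.keys():
            self.products.append((self._current["brand"], self._current["name"]))
            self._current = {}


def save_products(conn, source, products):
    """Store products, remembering which site each one came from."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS products (
               source TEXT, brand TEXT, name TEXT,
               UNIQUE (source, brand, name))"""
    )
    # INSERT OR IGNORE keeps a weekly/monthly re-run from creating duplicates.
    conn.executemany(
        "INSERT OR IGNORE INTO products (source, brand, name) VALUES (?, ?, ?)",
        [(source, brand, name) for brand, name in products],
    )
    conn.commit()


parser = ProductParser()
parser.feed(SAMPLE_HTML)

conn = sqlite3.connect(":memory:")
save_products(conn, "example-site", parser.products)
save_products(conn, "example-site", parser.products)  # simulated second run

rows = conn.execute("SELECT brand, name FROM products ORDER BY name").fetchall()
print(rows)
```

Keeping a `source` column is one simple way to approach the aggregation question: you always know which site a row came from, so you can merge or de-duplicate across sites later.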
Legal? Probably not... you'll have to look up the terms and conditions of every site you scrape to see if they're cool with you lifting their data. If in doubt, send an email to the website owner and ask. If your site will be competing with them, obviously they will have a problem with you using their data, and they may take legal action to shut you down-- especially if you're successful/popular. This is not something you want to ignore; if you don't obtain written permission up-front, you may go through a lot of work only to be shut down overnight. And believe me when I say, hosting companies do not mess around with sites they're hosting that may be in a legal grey area-- they will shut you down first and ask questions later.
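One related thing you can check in code is a site's robots.txt file, which states what the owner is willing to let automated crawlers fetch. To be clear, this is an addition of mine and not a substitute for reading the terms and conditions or getting written permission; it is only a first technical signal. A small sketch with Python's standard `urllib.robotparser`, where the robots.txt content, the bot name `cosmetics-bot`, and the URLs are invented for the example:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. In practice you would download it
# from the site you intend to scrape.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# "cosmetics-bot" is a made-up user agent name for this sketch.
allowed = rp.can_fetch("cosmetics-bot", "https://example.com/products/1")
blocked = rp.can_fetch("cosmetics-bot", "https://example.com/private/x")
print(allowed, blocked)
```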
If you get a green light to scrape, then you should put together a business plan, assuming you intend to make money on this website. A good business plan will help you determine where the money will come from, where your visitors (customers) will come from, how you will market the site, who you will market it to, how soon you expect to be profitable, and so on.
Not to mention the legal issues that could be involved. Written permission or no.
PLUS, whenever a website's design is changed, or even just a different attribute is used for the mask in the scraping, your site breaks. Every time.
Location: Utah, USA, Northwestern hemisphere, Earth, Solar System, Milky Way Galaxy, Alpha Quadrant
Posts: 7,687
Thanks: 42
Thanked 637 Times in 625 Posts
I'm no lawyer but I would be comfortable scraping a site if I had a letter from the site owner giving me permission to scrape it. As long as that website has original data! I guess if that site was illegally obtaining their data, and I knew that, then permission wouldn't be enough to keep me out of trouble. It's definitely a sticky issue.
As for the guy's website breaking every time the sites being scraped get updated-- let's just clarify what would actually break, and that is the scrape-and-update process. The website itself will be fine, it's just the data won't be updated until the scraper is fixed. Important distinction.
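A rough sketch of that distinction in Python, assuming a made-up input format (`price:12.50` lines) and hypothetical function names `extract_prices` and `update_catalog`: when extraction fails or comes back empty, the update step logs a warning and leaves the previously stored data alone, so the site keeps serving the last good copy.

```python
import logging

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("scraper")


def extract_prices(page):
    """Hypothetical extractor: expects lines such as 'price:12.50'."""
    prices = []
    for line in page.splitlines():
        if line.startswith("price:"):
            prices.append(float(line.split(":", 1)[1]))
    return prices


def update_catalog(catalog, site, page):
    """Replace a site's stored data only when extraction found something."""
    try:
        prices = extract_prices(page)
    except ValueError as exc:
        log.warning("scrape of %s failed: %s", site, exc)
        return False
    if not prices:
        # Most likely the source site changed its layout.
        log.warning("scrape of %s found nothing; keeping old data", site)
        return False
    catalog[site] = prices
    return True


catalog = {}
ok_first = update_catalog(catalog, "example-site", "price:12.50\nprice:8.00")
# Simulate a redesign: the page no longer matches what the extractor expects.
ok_second = update_catalog(catalog, "example-site", "<b>$12.50</b>")
print(ok_first, ok_second, catalog)
```

The second call returns False and the catalog still holds the prices from the first run, which is the "data goes stale, site stays up" behavior described above.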
I'm no lawyer either, but I see nothing wrong with "searching the web for the best prices and displaying the results in a better format for the end user". Call it scraping if you want, but Google and all the major search engines do it. Once they're under the gun for copyright infringement, then I'll worry.