CodingForums.com > Web Projects and Services Marketplace > Web Projects > Large Projects (new web application, complex features etc)

11-30-2012, 12:02 PM #1
kahina
New to the CF scene

 
Join Date: Nov 2012
Posts: 1
Thanks: 0
Thanked 0 Times in 0 Posts
Advice - best way to use web scraping for a web project

I am looking for a bit of help on this project idea I have decided to launch.

1) Project Details:
The web project I have in mind is a website that will host the largest online database of cosmetics.

Basically, every product of every cosmetics brand will be on the website (or at least, that is the objective).

To achieve this, I was thinking of using web scraping to collect the information on each product and save it in a database.
I have no idea how this can be technically implemented.
I wanted to use Joomla as the CMS and combine it with the web scraping feature so the site can be updated on a weekly or monthly basis.
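Here is a minimal sketch of the scrape-and-store idea. It is written in Python rather than inside Joomla, which is PHP. It parses product entries out of an HTML page with the standard-library `html.parser`, then saves them to an SQLite table. Several pieces are hypothetical: the page markup (`div.product` with `span.brand` / `span.name`), the names `ProductParser`, `scrape_page` and `save_products`, and the table layout. A real source site will have its own markup. The download step (for example with `urllib.request.urlopen`) is left out so the example runs offline on a sample string.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical markup standing in for a downloaded product listing page.
SAMPLE_PAGE = """
<div class="product"><span class="brand">Acme</span><span class="name">Matte Lipstick</span></div>
<div class="product"><span class="brand">Acme</span><span class="name">Volume Mascara</span></div>
"""


class ProductParser(HTMLParser):
    """Collects (brand, name) pairs from <span class="brand|name"> inside each <div>."""

    def __init__(self):
        super().__init__()
        self.products = []
        self._field = None   # which field the currently open <span> holds, if any
        self._current = {}   # fields gathered for the product being parsed

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "span" and cls in ("brand", "name"):
            self._field = cls

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "div" and self._current:
            # A closing </div> ends one product entry.
            self.products.append((self._current.get("brand"), self._current.get("name")))
            self._current = {}

    def handle_data(self, data):
        if self._field:
            self._current[self._field] = data.strip()


def scrape_page(html):
    parser = ProductParser()
    parser.feed(html)
    parser.close()
    return parser.products


def save_products(conn, products):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS products ("
        " brand TEXT NOT NULL, name TEXT NOT NULL, PRIMARY KEY (brand, name))"
    )
    # INSERT OR IGNORE lets a weekly re-run skip products that are already stored.
    conn.executemany("INSERT OR IGNORE INTO products VALUES (?, ?)", products)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    save_products(conn, scrape_page(SAMPLE_PAGE))
    print(conn.execute("SELECT COUNT(*) FROM products").fetchone()[0])  # prints 2
```

A scheduler such as cron could run a script like this once a week to refresh the database the CMS reads from. The `(brand, name)` primary key combined with `INSERT OR IGNORE` means re-running it does not create duplicates.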

Questions:
  1. Is it technically possible?
  2. Is it complicated to develop?
  3. Is it legal?

Advice would be welcome here!
Perhaps there is another way to do it that I haven’t thought about?


In addition to this, there will be additional features such as:
  • User accounts
  • Add comments
  • Social media features
  • Product comparison feature
    Etc…


2) Payment method/details (PayPal, check? Timeline?):
I have not thought about how much it would cost to do it.
I am still trying to find the best way to do it.
It is a serious project and I have a budget for it.
11-30-2012, 02:40 PM #2
Fumigator
UE Antagonizer


 
 
Join Date: Dec 2005
Location: Utah, USA, Northwestern hemisphere, Earth, Solar System, Milky Way Galaxy, Alpha Quadrant
Posts: 7,686
Thanks: 42
Thanked 637 Times in 625 Posts
Technically possible? Definitely.

Complicated? That really depends on the details: How many sites need to be scraped? How much information is being gathered? How will you aggregate data from all these different sites? You'll want to complete a detailed analysis of your project before writing a single line of code.
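To illustrate the aggregation question, here is one common approach. It is assumed here, not prescribed in the thread. Each site's field names are mapped onto a single common schema, and the fields used to match the same product across sites are normalized. The site names, field names and mappings below are hypothetical.

```python
# Hypothetical per-site field mappings: each source names the same facts differently.
FIELD_MAPS = {
    "site_a": {"brand": "brand", "product": "name", "price_usd": "price"},
    "site_b": {"maker": "brand", "title": "name", "cost": "price"},
}


def normalize(source, record):
    """Rename a raw scraped record's fields to the common schema."""
    mapping = FIELD_MAPS[source]
    out = {common: record[raw] for raw, common in mapping.items() if raw in record}
    # Normalize the matching keys: trim, collapse inner whitespace, lower-case.
    out["brand"] = out["brand"].strip().lower()
    out["name"] = " ".join(out["name"].split()).lower()
    out["price"] = float(out["price"])
    return out


def merge(records):
    """Group normalized records by (brand, name), keeping each source's price."""
    merged = {}
    for source, raw in records:
        rec = normalize(source, raw)
        key = (rec["brand"], rec["name"])
        merged.setdefault(key, {})[source] = rec["price"]
    return merged
```

Matching on lower-cased brand and name is deliberately naive. Real product names vary by shade, size and wording across retailers. Deciding when two listings are the same product is a large part of the analysis suggested above.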

Legal? Probably not... you'll have to look up the terms and conditions of every site you scrape to see if they're cool with you lifting their data. If in doubt, send an email to the website owner and ask. If your site will be competing with them, obviously they will have a problem with you using their data, and they may take legal action to shut you down-- especially if you're successful/popular. This is not something you want to ignore; if you don't obtain written permission up-front, you may go through a lot of work only to be shut down overnight. And believe me when I say, hosting companies do not mess around with sites they're hosting that may be in a legal grey area-- they will shut you down first and ask questions later.
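A site's `robots.txt` file is one machine-readable signal of what its owner allows crawlers to fetch, and Python's standard `urllib.robotparser` can check it. It is not legal permission. It does not replace reading the terms and conditions or getting written permission; it is only a technical courtesy. The robots.txt content, the `CosmeticsBot` user-agent and the example.com URLs below are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. Against a live site you would instead call
# parser.set_url("https://example.com/robots.txt") followed by parser.read().
ROBOTS_TXT = """\
User-agent: *
Disallow: /account/
Crawl-delay: 10
"""


def build_parser(robots_txt):
    """Parse robots.txt text already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    print(rp.can_fetch("CosmeticsBot", "https://example.com/products/lipstick"))  # True
    print(rp.can_fetch("CosmeticsBot", "https://example.com/account/orders"))     # False
    print(rp.crawl_delay("CosmeticsBot"))  # 10 (seconds to wait between requests)
```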

If you get a green light to scrape, then you should put together a business plan, assuming you intend to make money on this website. A good business plan will help you determine where the money will come from, where your visitors (customers) will come from, how you will market the site, who you will market the site to, how soon you will expect to be profitable, etc. etc. etc.
11-30-2012, 03:02 PM #3
WolfShade
Regular Coder

 
Join Date: Apr 2012
Location: St. Louis, MO, USA
Posts: 945
Thanks: 7
Thanked 97 Times in 97 Posts
Not to mention the legal issues that could be involved. Written permission or no.

PLUS, whenever a website's design changes, or even just an attribute your scraper uses as the mask for the scraping, your site breaks. Every time.
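One way to soften this is to have the scraper validate its own output. This is an assumption, not something proposed in the thread. The scraper fails loudly when a page no longer matches the expected structure, instead of quietly storing empty or partial records. The `LayoutChanged` exception, the required field names and the minimum-count threshold below are hypothetical.

```python
class LayoutChanged(Exception):
    """Raised when scraped output no longer matches the structure the scraper expects."""


REQUIRED_FIELDS = ("brand", "name", "price")  # hypothetical common schema


def validate(records, min_expected=1):
    """Reject a scrape result that looks like the source site's layout changed."""
    if len(records) < min_expected:
        # Zero matches usually means the selector/mask no longer finds anything.
        raise LayoutChanged(f"expected at least {min_expected} products, got {len(records)}")
    for i, rec in enumerate(records):
        missing = [f for f in REQUIRED_FIELDS if not rec.get(f)]
        if missing:
            raise LayoutChanged(f"record {i} is missing {missing}")
    return records
```

This does not stop the breakage. It only makes it visible immediately, so someone knows the scraper needs fixing.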
__________________
^_^

If anyone knows of a website that can offer ColdFusion help that isn't controlled by neurotic, pedantic jerks* (stackoverflow.com), please PM me with a link.
*
The neurotic, pedantic jerks are not the owners; just the people who are in control of the "popularity contest".
11-30-2012, 04:36 PM #4
Fumigator
UE Antagonizer


 
 
Join Date: Dec 2005
Location: Utah, USA, Northwestern hemisphere, Earth, Solar System, Milky Way Galaxy, Alpha Quadrant
Posts: 7,686
Thanks: 42
Thanked 637 Times in 625 Posts
I'm no lawyer but I would be comfortable scraping a site if I had a letter from the site owner giving me permission to scrape it. As long as that website has original data! I guess if that site was illegally obtaining their data, and I knew that, then permission wouldn't be enough to keep me out of trouble. It's definitely a sticky issue.

As for the guy's website breaking every time the sites being scraped get updated-- let's just clarify what would actually break, and that is the scrape-and-update process. The website itself will be fine; it's just that the data won't be updated until the scraper is fixed. Important distinction.
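The distinction can be made concrete in how the update job is written. This is a minimal sketch, assuming the product data lives in an SQLite table named `products` with `(brand, name)` columns. If the scraper raises, the job leaves the live table alone. The replacement itself runs inside a single transaction, so a failure partway through rolls back. `refresh_products` and its `fetch_records` callable are hypothetical names.

```python
import sqlite3


def refresh_products(conn, fetch_records):
    """Replace the products table with a fresh scrape, or keep the old data.

    fetch_records is any callable returning a list of (brand, name) tuples;
    it is expected to raise if the source layout changed.
    """
    try:
        records = fetch_records()
    except Exception as exc:
        # The scraper broke: report it and leave the live data untouched.
        print(f"scrape failed, keeping previous data: {exc}")
        return False
    with conn:  # one transaction: commits on success, rolls back if an error escapes
        conn.execute("DELETE FROM products")
        conn.executemany("INSERT INTO products VALUES (?, ?)", records)
    return True


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (brand TEXT, name TEXT)")
    refresh_products(conn, lambda: [("Acme", "Volume Mascara")])

    def broken_scraper():
        raise ValueError("product selector not found")  # simulated layout change

    refresh_products(conn, broken_scraper)
    print(conn.execute("SELECT name FROM products").fetchall())  # [('Volume Mascara',)]
```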
11-30-2012, 10:30 PM #5
bcarl314
Mega-ultimate member


 
Join Date: Jun 2002
Location: Winona, MN - The land of 10,000 lakes
Posts: 1,855
Thanks: 1
Thanked 45 Times in 42 Posts
I'm no lawyer either, but I see nothing wrong with "searching the web for the best prices and displaying the results in a better format for the end user". Call it scraping if you want, but Google and all the major search engines do it. Once they're under the gun for copyright infringement, then I'll worry.
All times are GMT +1.

