Results 1 to 10 of 10
  1. #1
    New Coder
    Join Date
    Apr 2012
    Posts
    91
    Thanks
    7
    Thanked 0 Times in 0 Posts

    What to use for Web scraping?

    I need to scrape a Chinese website. The API looks dodgy: my virus checker won't let me go to the website, as it says it's flagged as unsafe.

    I searched on Google and found one article that comes 1st or 2nd in results.

    At the top of the list is Goutte.
    Great I thought. I'll use that.
    I can't find any YouTube videos on it.
    Not a single one (there might be some foreign language ones + I only looked on the first page of results).

    Hmm, odd. But then I looked at the 2nd, 3rd and 4th suggested libraries, and I can't seem to find YouTube videos for those either.

    The YouTube videos I did find seem to use cURL?

    Just want a few recommendations.

    Thanks.

  2. #2
    Master Coder Dormilich's Avatar
    Join Date
    Jan 2010
    Location
    Behind the Wall
    Posts
    5,846
    Thanks
    26
    Thanked 609 Times in 602 Posts
    Originally posted by OM2:
    I can't seem to find YouTube videos.
    I wouldn't recommend YouTube as a primary documentation source... Instead, check out the project itself first: https://github.com/FriendsOfPhp/Goutte
    The computer is always right. The computer is always right. The computer is always right. Take it from someone who has programmed for over ten years: not once has the computational mechanism of the machine malfunctioned.
    André Behrens, NY Times Software Developer

  3. #3
    New Coder
    Join Date
    Apr 2012
    Posts
    91
    Thanks
    7
    Thanked 0 Times in 0 Posts
    @Dormilich YouTube is great, I think, for seeing someone walk through it.
    Would you yourself recommend using Goutte?
    This was all I was after - just a solid recommendation of what to use.
    Let me know.
    Thanks.

  4. #4
    Master Coder Dormilich's Avatar
    Join Date
    Jan 2010
    Location
    Behind the Wall
    Posts
    5,846
    Thanks
    26
    Thanked 609 Times in 602 Posts
    I never needed web scraping, so I can't tell.

  5. #5
    Supreme Master coder!
    Join Date
    Jun 2003
    Location
    Cottage Grove, Minnesota
    Posts
    10,390
    Thanks
    10
    Thanked 1,191 Times in 1,181 Posts
    Can you describe what you need to scrape and even post a link to it?
    I realize you think it might not be "safe", but is it a website that is Rated-G?

    cURL, which you mentioned, is what PHP commonly uses to make HTTP requests, including calls to APIs. Do you have to log in to the site and use the API?

    If I knew (and could see) the information you are trying to scrape, I might have a better answer (or not).

  6. #6
    New Coder
    Join Date
    Apr 2012
    Posts
    91
    Thanks
    7
    Thanked 0 Times in 0 Posts
    @mlseim see these 3 pages:

    https://detail.1688.com/offer/598596300863.html
    https://detail.1688.com/offer/596432525660.html
    https://detail.1688.com/offer/596794895300.html

    On the first, you can choose size and colour.
    On the second, you can choose colour (one size fits all).
    On the third, there is no choice (one size fits all).

    From these, I need: the pics from the gallery and the page, plus the variations (sizes and colours) if they are there.
    I need to create a WooCommerce product from them.
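
    For the extraction step itself, a minimal Python sketch might look like this, assuming the image and colour data is present in the page's static HTML. The sample markup, the `data-colour` attribute, the `example.com` URLs and the names `ProductParser` and `parse_product` are all hypothetical; the real 1688.com pages may well load this data with JavaScript, in which case this approach would not see it.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical sample markup; the real product pages are not reproduced here
# and may build their galleries with JavaScript instead of plain <img> tags.
SAMPLE_HTML = """
<div class="gallery">
  <img src="/img/front.jpg" alt="front">
  <img src="https://cdn.example.com/img/back.jpg" alt="back">
</div>
<ul class="colours"><li data-colour="red">Red</li><li data-colour="blue">Blue</li></ul>
"""


class ProductParser(HTMLParser):
    """Collect image URLs and colour options from product-page HTML."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.images = []
        self.colours = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            # Resolve relative paths against the page URL.
            self.images.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "li" and "data-colour" in attrs:
            # Assumed attribute name; adjust to whatever the page really uses.
            self.colours.append(attrs["data-colour"])


def parse_product(html, base_url):
    """Return the images and colours found in one page of HTML."""
    parser = ProductParser(base_url)
    parser.feed(html)
    return {"images": parser.images, "colours": parser.colours}


if __name__ == "__main__":
    print(parse_product(SAMPLE_HTML, "https://shop.example.com/offer/1.html"))
```

    The resulting dictionary could then be mapped onto whatever fields the WooCommerce import expects; that mapping is left out here because it depends on how the products are created.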

    Any ideas?

    (Guys... I've been trying to reply to this thread for ages - I think there's a bug on the website - the reply box kept disappearing.)

  7. #7
    New Coder
    Join Date
    Apr 2012
    Posts
    91
    Thanks
    7
    Thanked 0 Times in 0 Posts
    Anyone?

  8. #8
    Senior Coder djm0219's Avatar
    Join Date
    Aug 2003
    Location
    North Carolina
    Posts
    1,561
    Thanks
    5
    Thanked 250 Times in 247 Posts
    Which API that you mentioned in your first post looks "dodgy"? Any API is likely to be more accurate and reliable than trying to scrape a site like that. It's also likely that scraping is not something the site approves of, which is probably why they have provided an API. Using their API, such as it is, is likely your best course of action. If you run into problems with it, I would let them know and ask them to fix it.
    Dave .... HostMonster for all of your hosting needs

  9. #9
    Supreme Master coder!
    Join Date
    Jun 2003
    Location
    Cottage Grove, Minnesota
    Posts
    10,390
    Thanks
    10
    Thanked 1,191 Times in 1,181 Posts
    There's no way to scrape that. I see it's a sister site of Alibaba (Alibaba's online wholesale marketplace in China).
    You will have to find out if they offer an API, or even an XML/RSS feed, if that's at all possible. I doubt it, though.

  10. #10
    New Coder
    Join Date
    Dec 2014
    Posts
    28
    Thanks
    6
    Thanked 0 Times in 0 Posts
    Hello. If you would like to scrape anything, it can easily be done with Python or PHP. If you want to go the Python way, just Google it. If you would like to go the PHP way, Google "PHP DOM parser", download the files and get started.

    Alternatively, the best way to scrape anything and store it is to download and install the webscraper.io extension in Google Chrome. There are lots of videos you can use as a guide, and it will scrape anything on the net from any website. Once you have successfully scraped the data, download the CSV file, then search Google for a CSV to JSON converter and convert it. Once you have that, search for a JSON to SQL converter, convert the file to .sql, then simply import it into your database and you will be all set.
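
    The CSV to JSON to database steps can also be done in a few lines of code rather than with online converters. The following is only a rough Python sketch: the sample columns and the function names `csv_to_rows` and `load_rows` are made up for illustration, and it loads into an in-memory SQLite database instead of producing a .sql file, which is a substitution rather than the exact route described above.

```python
import csv
import io
import json
import sqlite3

# Hypothetical export; a real scraper's CSV will have its own columns.
SAMPLE_CSV = "title,colour,size\nT-shirt,red,M\nT-shirt,blue,L\n"


def csv_to_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def load_rows(conn, rows, table="products"):
    """Create a table from the row keys and insert every row as text."""
    columns = list(rows[0].keys())
    col_defs = ", ".join(f'"{c}" TEXT' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_defs})')
    conn.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})',
        [tuple(row[c] for c in columns) for row in rows],
    )
    conn.commit()


if __name__ == "__main__":
    rows = csv_to_rows(SAMPLE_CSV)
    # The intermediate JSON form, shown only to mirror the steps above.
    print(json.dumps(rows, indent=2))
    conn = sqlite3.connect(":memory:")
    load_rows(conn, rows)
    print(conn.execute("SELECT COUNT(*) FROM products").fetchone()[0])
```

    Every column is stored as TEXT to keep the sketch short; a real import would probably want proper column types and some validation of the scraped values.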


 
