AI Web Scraping: Scrape Ecommerce Website with Auto-detection

We have created a series of web scraping tutorials to help you get started quickly with our latest version, Octoparse 8. By the end of the series, you will be able to build a crawler from scratch and pull data from the websites you need.

In this lesson, we walk through scraping eCommerce data with the auto-detect algorithm in Octoparse 8. It is a no-code example, so you can follow along and build the same crawler for practice.

Many websites share similar layouts. For example, an eBay results page contains many items arranged in a list.

Octoparse’s new auto-detect algorithm is designed to scrape these kinds of pages. It detects listing data (including text elements and links), “Next page” buttons, and “Load more” buttons, scrolls down the page where needed, and then generates the scraping task automatically.
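To give a feel for what detecting listing data can mean, here is a minimal sketch of one common heuristic: count elements that share the same parent path, tag, and class, and treat the most repeated one as the list row. This is an illustration only and is not Octoparse's actual algorithm. The class name, function name, and sample HTML are all hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser

# Tags that never have a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class RepeatedItemFinder(HTMLParser):
    """Counts how often each (parent path, tag, class) signature occurs."""

    def __init__(self):
        super().__init__()
        self.stack = []          # open tags from the root to the current element
        self.counts = Counter()  # signature -> number of occurrences

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class") or ""
        self.counts[(tuple(self.stack), tag, css_class)] += 1
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def most_repeated_signature(html):
    """Return ((parent_path, tag, class), count) for the likeliest list row.

    Ties are broken in favour of the shallowest element, so the row itself
    wins over the fields nested inside it. Assumes the HTML has at least one tag.
    """
    finder = RepeatedItemFinder()
    finder.feed(html)
    finder.close()
    return max(finder.counts.items(), key=lambda kv: (kv[1], -len(kv[0][0])))


# Hypothetical listing page with three products and a "Next" link.
SAMPLE_HTML = """
<ul id="results">
  <li class="item"><a href="/p/1">Phone</a><span class="price">$10</span></li>
  <li class="item"><a href="/p/2">Case</a><span class="price">$5</span></li>
  <li class="item"><a href="/p/3">Cable</a><span class="price">$3</span></li>
</ul>
<a class="next" href="?page=2">Next</a>
"""

signature, count = most_repeated_signature(SAMPLE_HTML)
print(signature, count)
```

On the sample page, the heuristic picks the li elements with class "item" as the repeated row. A real detector would need to handle far messier markup than this.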

Step 1: Create a new task

Enter the example URL into the search box. Click “Start” to create a new task.

Step 2: Get data via auto-detect

Octoparse will load the URL in its built-in browser and start the auto-detect process. Wait until the process is complete and more information appears on the “Tips” panel.

Step 3: Check the data

Once the auto-detection is complete, follow the instructions provided on “Tips” and check your data in the preview section. You can rename the data fields or remove those that are not needed. The detected data will also be highlighted on the webpage for you. 
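If you later post-process exported data in a script, renaming and removing fields looks roughly like the sketch below. It assumes each row is a dictionary. The field names (Field1, Field2, Field3) and the helper name are hypothetical and are not taken from Octoparse.

```python
def tidy_rows(rows, rename, drop):
    """Rename the keys listed in `rename` and remove the keys listed in `drop`."""
    return [
        {rename.get(key, key): value
         for key, value in row.items()
         if key not in drop}
        for row in rows
    ]


# Hypothetical rows as they might look right after detection.
rows = [
    {"Field1": "Phone", "Field2": "$10", "Field3": "ad-banner"},
    {"Field1": "Case", "Field2": "$5", "Field3": "ad-banner"},
]

cleaned = tidy_rows(rows,
                    rename={"Field1": "title", "Field2": "price"},
                    drop={"Field3"})
print(cleaned)
```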

Step 4: Confirm your options

Now, go to “Tips” and check your options. Octoparse offers a set of options based on the type of data detected. In this example, list data is detected, so the following options are provided:

Option 1: Scrape the data in the list 

This option is selected by default because Octoparse assumes this is what you need.

Option 2: Click the “Next” button to capture multiple pages

Octoparse has detected a “Next” button on the page. Check this option if you want Octoparse to click the “Next” button to scrape data from more pages.

To find out if the button detected is the correct one, click “Check” and see if it gets highlighted on the webpage. If you need to re-select the “Next” button, click “Edit” and follow the instructions on “Tips”. 

As a third option, Octoparse asks whether you want to click the detected links and scrape more information from the detail pages. Check this option if that is what you need.

To confirm that the links are the ones you’d like to click through, click “Check” to highlight them on the web page.

In this case, we only want to scrape the list information across all pages, so we check the first and second options.
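Combining the first two options amounts to a simple loop: scrape the list on the current page, follow “Next”, and repeat until there is no next page. The sketch below shows that loop under stated assumptions. It is not the workflow Octoparse generates. The fetch_page callback, the in-memory FAKE_SITE, and the URLs are hypothetical stand-ins so the example runs without network access.

```python
def scrape_all_pages(fetch_page, start_url, max_pages=50):
    """Collect list items page by page until no "Next" link remains.

    fetch_page(url) must return (items, next_url), with next_url set to
    None on the last page. max_pages and the seen set guard against loops.
    """
    rows = []
    seen = set()
    url = start_url
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        items, next_url = fetch_page(url)
        rows.extend(items)
        url = next_url
    return rows


# Hypothetical two-page listing, keyed by URL.
FAKE_SITE = {
    "/list?page=1": (["Phone", "Case"], "/list?page=2"),
    "/list?page=2": (["Cable"], None),
}

all_rows = scrape_all_pages(lambda url: FAKE_SITE[url], "/list?page=1")
print(all_rows)
```

The page cap and the visited-URL check are there because some sites link the last page back to itself, which would otherwise keep the loop running.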

Step 5: Save task settings

Octoparse will generate a workflow automatically based on the data detected and the saved settings. You can choose to run the task now or edit the workflow manually.

If everything looks good, save the task and run it to get your data.

Don’t forget to practice with the HelloWorld test site. If you encounter any difficulties, feel free to submit a ticket or email us at support@octoparse.com. To learn how to optimize your task, check out lesson 2.