logo
languageENdown
menu

How to Scrape Websites Using Google Sheets Effortless

10 min read

Have you ever thought that Google Sheets can do web scraping for you? The truth is, as a powerful cloud-based tool, Google Sheets can scrape data from websites. With its powerful functions, like IMPORTXML and IMPORTHTML, users can extract data directly from websites, simplifying data collection and analysis tasks. With the scraped data, you can share it directly with your team or friends or integrate it with other third-party tools.

In this guide, we’ll walk you through the simple process of web scraping with Google Sheets and show you how to use these powerful functions to collect data from various websites.

Introduction to Web Scraping with Google Sheets

Google Sheets offers an easy and accessible way for both beginners and professionals to scrape data from websites. Whether you want to pull specific data points or collect tables and lists, Google Sheets provides built-in tools that can help automate the process without needing to write complex code.

The main functions used for web scraping in Google Sheets are IMPORTXML and IMPORTHTML, both of which allow you to import data from websites by specifying the URL and relevant data elements (like XPath or HTML table).

In the following parts, we’ll guide you through using these functions to scrape data and discuss how they can simplify your data extraction process.

5 Methods of Google Sheets for Web Scraping

Method 1: Using ImportXML in Google Spreadsheets

ImportXML is a function in Google Spreadsheets that allows you to import data from structured sources like XML, HTML, CSV, TSV, and RSS feeds using XPath queries.

Here’s what it looks like:

=IMPORTXML(URL, xpath_query, locale)
  • URL: The URL of the webpage.
  • xpath_query: An XPath query to select the desired data.
  • locale: A language and region locale code to use when parsing the data. If unspecified, the document locale is used.

Below, I will show you an example of how to scrap the titles of the latest articles from the Octoparse Blog.

To do this, first, you need to create a new spreadsheet or open an existing one. Then click on the cell where you want the imported data to appear and enter the URL and the XPath of the element that you want to extract.

You would need to get the Xpath of the titles of the articles by inspecting(right click and you will see inspect) the target element in the website.

Here,

  1. URL: https://www.octoparse.com/blog
  2. XPath Query: //h2

//h2 means the exact Xpath of all the blog titles on page one of the Octoparse blog.

So the complete function would be:

=IMPORTXML("https://www.octoparse.com/blog", "//h2")

Enter the function in the cell where you want the imported data to appear. You might also need to click access data to allow Google Sheets to import data from the outside.

After a few seconds of loading, you will get all the titles of page one blog.

Using ImportXML in Google Spreadsheets

1. You can also replace the URL with a cell as long as you put the URL into the cell. Here, see A3.
2. Remember to add quotation marks when you put the Xpath and the URL. If you replace them with cell content, then no quotation marks are needed.
3. If there are already quotation marks, the quotation marks inside the Xpath need to be single. For example, “//div[@class=’author’]/span”. Wrong spelling like “//div[@class=”author”]/span” leads to an error.
4. You can ignore the “Locale” if you want the document locale.
If you are not familiar with XPath, find the XPath cheat sheet.

You can continue to fill the sheet with other data by following the same steps above.

Example Google sheet: https://docs.google.com/spreadsheets/d/18Tkp6rP9p1Az_AlXyOJlrqVxwESpTDOf7MSIweZnOD0/edit#gid=0

Using ImportXML in Google Spreadsheets

Method 2: Using ImportHTML in Google Spreadsheets

The IMPORTHTML function in Google Sheets is designed to fetch data from tables and lists within HTML pages. To use it, you would need the target website’s URL and the query, which is either “table” or “list”.

The function would look like this:

=IMPORTHTML(url, query, index)
  • url: The URL of the webpage containing the data.
  • query: “table” or “list”, depending on what you want to extract.
  • index: The position of the table or list on the webpage (starting from 1).

In this example, I will scrape a table from one of Octoparse’s blogs.

To begin with, first create a new sheet and then open the target URL.

Using ImportHTML in Google Spreadsheets

Here, you can see the HTML that our target table is labeled <table>; If you want to scrape list, then the label would be <list>.

To scrape the table from the blog, we need to enter the function IMPORTHTML into the cell where we want the imported data to appear.

Enter:

=IMPORTHTML("https://www.octoparse.com/blog/top-web-crawler-tools-comparison", "table", 1)

Then you will have the table loaded.

Using ImportHTML in Google Spreadsheets

You can continue scraping by changing the table index to 2, so you will get the second table in the blog.

Using ImportHTML in Google Spreadsheets

Tips:

1. You will need to add double quotation marks to the query.
2. You can replace the URL with a cell where the URL is populated.

Method 3: Using ImportDATA in Google Spreadsheets

The IMPORTDATA function imports data from a given URL in CSV (Comma-Separated Values) or TSV (Tab-Separated Values) format directly into Google Sheets.

All you need is just one url of your target website. Simple, is it?

Let’s say we need to scrape data from the website: https://catalog.data.gov/dataset/?res_format=CSV

On the website, we can see the CSV file through the link: https://data.wa.gov/api/views/f6w7-q2d2/rows.csv?accessType=DOWNLOAD (hover to the button CSV and right click to copy the address)

get the CSV file for your IMPORTDATA in google sheets.

To import the data into your Google Sheets, first, you would need to create a new sheet like what we did in the last two methods. And use the function below:

=IMPORTDATA(A3,”,”)

A3: is where your .csv file link is located

“,”: separates the data fields in the imported file

Then you will get the imported data.

Using ImportDATA in Google Spreadsheets

Method 4: Using ImportFeed in Google Spreadsheets

The IMPORTFEED function fetches RSS or ATOM feed data from a given URL.

It looks like this:

=IMPORTFEED(url, [query], [headers], [num_items])

  • url: The URL of the RSS or ATOM feed.
  • query: [Optional] A query to fetch specific items. Default is "" (empty string), which fetches all items.
  • headers: [Optional] Whether to include headers. Default is TRUE.
  • num_items: [Optional] The number of items to fetch. Default is 20.

To do this, we will need a podcast’s RSS feed: https://rss.com/podcasts/sportsify/

Then enter the function:

=IMPORTFEED(A4,””,””,10)

Here I have input the RSS feed link in A4. And I want to pull 10 feeds out.

And you will get the result real quick.

Using ImportFeed in Google Spreadsheets

Method 5: Using ImportRange in Google Spreadsheets

The IMPORTRANGE function in Google Sheets allows you to import data from one Google Sheets spreadsheet into another. This is particularly useful for consolidating data from multiple sources or sharing data across different sheets while keeping them dynamically linked.

Here’s what it looks like:

=IMPORTRANGE(spreadsheet_url, range_string)

  • spreadsheet_url: The URL of the spreadsheet from which you want to import data. This should be in quotation marks.
  • range_string: A string that specifies the range of cells to import. This is typically in the format "SheetName!A1:D10".

In this example, I need to import data from another spreadsheet(https://docs.google.com/spreadsheets/d/1pJGLJr2o3PxRUe9HFOcAlAoRQnIsai55KvEpzPY2FSg/edit#gid=0) to the one I am editing.

To do this, enter the function:

=IMPORTRANGE(A3, “sheet1!a1:b”)

  • A3: is where the source data is stored.
  • sheet1!a1:b: I need the data from sheet1 of the target sheet. And the data range is from A1 to B column.

Then, you can get the data after it finishes loading.

Using ImportRange in Google Spreadsheets

Limitations of Google Sheets in Web Scraping

Although these 5 functions simplify our way of importing data from websites to Google Sheets, there are many limitations.

Data Volume

First, these functions cannot import large volumes of data which involve pagination. As you can see in the IMPORTXML example, we can only scrape one page of data. If you need all the pages, you will need to do it by changing the URL continuously.

Dynamic Content

The above-mentioned functions can only scrape static and structured data. While dynamic data loaded with Javascript and AJAX is out of their reach.

Rate Limits

If you request the data too frequently, you may trigger Google’s rate limiting, preventing you from further data extraction temporarily.

Constant Maintenance

Since some functions rely on the HTML of the website to locate the data, once the structure of the website changes, your data will be affected and become invalid.

Alternative: Scrape Data Without Any Coding

If you need more than just a table or list of data, and you don’t want to handle script writing, then a no-code web scraping software like Octoparse might be your next perfect solution. It provides the auto-detecting function to help you recognize the data fields automatically, and its advanced functions help you make more customization.

Turn website data into structured Excel, CSV, Google Sheets, and your database directly.

Scrape data easily with auto-detecting functions, no coding skills are required.

Preset scraping templates for hot websites to get data in clicks.

Never get blocked with IP proxies and advanced API.

Cloud service to schedule data scraping at any time you want.

Octoparse also gives preset data scraping templates for popular websites like Amazon, eBay, Google Maps, LinkedIn, etc. You can preview the data sample first and start scraping within a few parameters and clicks. Try the Google Maps online scraping templates below:

https://www.octoparse.com/template/google-maps-contact-scraper

Other features you can get from Octoparse custom scraping:

  • Custom scrapers service and data service.
  • Anti-blocking settings: CAPTCHAs, IP rotation, User agents, intervals, and Log-in.
  • Complex site handled: infinite scrolling, dropdown, hover, retry, AJAX loading.
  • Cloud extraction and data storage.
  • Free plan and 14-day free trial for Standard and Professional plans.
  • Pay-as-you-go residential proxies; 99 Million+ IPs from 155 countries.
  • Highly responsive support team: 24/7 via live chat in intercom
  • Third-party app integration via Octoparse RPA; API access.

Reading Octoparse Help Center if you still have any questions on scraping website data. If you’re looking for a data service for your project, Octoparse data service is a good choice. We work closely with you to understand your data requirements and make sure we deliver what you desire.

Final Thoughts

Google Sheets provides an easy and effective way for beginners to perform basic web scraping tasks without needing complex programming skills. By using the IMPORTXML and IMPORTHTML functions, you can quickly extract structured data from websites and import it into your spreadsheet for analysis. While these functions are perfect for small-scale scraping, more complex tasks may require advanced tools like Octoparse to ensure efficiency and accuracy. Whether you’re scraping product prices, market trends, or other useful data, Google Sheets offers a great starting point for simple web scraping.

Get Web Data in Clicks
Easily scrape data from any website without coding.
Free Download

Hot posts

Explore topics

image
Get web automation tips right into your inbox
Subscribe to get Octoparse monthly newsletters about web scraping solutions, product updates, etc.

Get started with Octoparse today

Free Download

Related Articles