Scraping Data With Google Scripts
Waltir | 2019-08-31
Extract data from websites easily using a Google Script
Have you ever needed to extract some data from a webpage to insert into a Google Sheet? This simple lightweight parsing library makes it easy to complete such tasks.
Enable the Library
- Open a new Google Sheet
- Click the “Tools” button
- Click the “Script Editor” button
- Once the Script Editor is open click the “Resources” button
- Next you’ll want to click the “Libraries” button
- Enter the following project key in the Libraries field “ MEl2mrtYovcKce9Wise5nA3hyU9YQlBn2” and click the “Add” button
Run the test
The test function can be called if you wish to see a working example. Scraper.test()
This test simply makes a request to https://www.npmjs.com and extracts various datapoints located in the request response.
Extract data from your own site
We use the Scraper.getUrl()
function to make a request to a website. Next we use the Scraper.parse()
function to extract the data we want. The .parse function requires a few parameters. (data, starting point, ending point)
function myFunction() {
var site = Scraper.getUrl("https://www.npmjs.com")
var json = {
url: 'https://www.npmjs.com',
title: Scraper.parse(site, '<title data-react-helmet="true">', '</title>'),
og_image: Scraper.parse(site, 'property="og:image" content="', '"'),
content1: Scraper.parse(site, '<span class="_99e3854f f6 fw4 link dn dib-ns tl dim mr4 nowrap">', '</span>'),
content2: Scraper.parse(site, '<span class="f4 lh-copy fw4">','</span>')
}
Logger.log(json)
}