Scraping Data With Google Scripts

Waltir | 2019-08-31

Extract data from websites easily using a Google Script

Have you ever needed to extract some data from a webpage to insert into a Google Sheet? This simple lightweight parsing library makes it easy to complete such tasks.

giphy

Enable the Library

  1. Open a new Google Sheet
  2. Click the “Tools” button
  3. Click the “Script Editor” button
  4. Once the Script Editor is open click the “Resources” button
  5. Next you’ll want to click the “Libraries” button
  6. Enter the following project key in the Libraries field “ MEl2mrtYovcKce9Wise5nA3hyU9YQlBn2” and click the “Add” button

Run the test

The test function can be called if you wish to see a working example. Scraper.test() This test simply makes a request to https://www.npmjs.com and extracts various datapoints located in the request response.

Extract data from your own site

We use the Scraper.getUrl() function to make a request to a website. Next we use the Scraper.parse() function to extract the data we want. The .parse function requires a few parameters. (data, starting point, ending point)

function myFunction() {


var site = Scraper.getUrl("https://www.npmjs.com")


var json = {


url: 'https://www.npmjs.com',


title: Scraper.parse(site, '<title data-react-helmet="true">', '</title>'),


og_image: Scraper.parse(site, 'property="og:image" content="', '"'),


content1: Scraper.parse(site, '<span class="_99e3854f f6 fw4 link dn dib-ns tl dim mr4 nowrap">', '</span>'),


content2: Scraper.parse(site, '<span class="f4 lh-copy fw4">','</span>')


}


Logger.log(json)


}

giphy