Retrieving All Values from All Sheets from URL of 2PACX- of Web Published Google Spreadsheet using Python

Gists

This is a sample script for retrieving all values from all sheets from URL of 2PACX- of Web Published Google Spreadsheet using Python.

In this post, it supposes that the Google Spreadsheet has already been published for Web. Ref

Flow

The flow of this method is as follows.

  1. Retrieve XLSX data from the URL of web published Google Spreadsheet as BytesIO data.
    • The URL is like https://docs.google.com/spreadsheets/d/e/2PACX-###/pubhtml.
  2. XLSX data is parsed with openpyxl.
  3. Retrieve all values from all sheets.

Sample script

Please set spreadsheetUrl.

import openpyxl
import re
import requests
from io import BytesIO

spreadsheetUrl = "https://docs.google.com/spreadsheets/d/e/2PACX-###/pubhtml"

regex = re.compile(
    'https:\/\/docs\.google\.com\/spreadsheets\/d\/e\/2PACX-.+\/pub(html)?')
if re.match(regex, spreadsheetUrl):
    temp = spreadsheetUrl.split('/')
    temp.pop()
    url = "/".join(temp) + "/pub?output=xlsx"
    res = requests.get(url)
    wb = openpyxl.load_workbook(
        filename=BytesIO(res.content), data_only=False)
    sheets = wb.sheetnames
    r = []
    for sheetName in sheets:
        sheet = wb[sheetName]
        temp = {'sheetName': sheetName, 'values': []}
        for cells in sheet.rows:
            column = []
            for cell in cells:
                column.append(cell.value)
            temp['values'].append(column)
        r.append(temp)
    print(r)

Result

When above script is run, the following sample value is returned.

[
  {
    "sheetName": "Sheet1",
    "values": [["a1", "b1", "c1"], ["a2", "b2", "c2"], , ,]
  },
  {
    "sheetName": "Sheet2",
    "values": [["a1", "b1", "c1"], ["a2", "b2", "c2"], , ,]
  },
  ,
  ,
]

Reference

openpyxl

 Share!