This is a sample script for retrieving all values from all sheets from URL of 2PACX- of Web Published Google Spreadsheet using Python.
In this post, it supposes that the Google Spreadsheet has already been published for Web. Ref
Flow
The flow of this method is as follows.
- Retrieve XLSX data from the URL of web published Google Spreadsheet as
BytesIO
data.- The URL is like
https://docs.google.com/spreadsheets/d/e/2PACX-###/pubhtml
.
- The URL is like
- XLSX data is parsed with openpyxl.
- Retrieve all values from all sheets.
Sample script
Please set spreadsheetUrl
.
import openpyxl
import re
import requests
from io import BytesIO
spreadsheetUrl = "https://docs.google.com/spreadsheets/d/e/2PACX-###/pubhtml"
regex = re.compile(
'https:\/\/docs\.google\.com\/spreadsheets\/d\/e\/2PACX-.+\/pub(html)?')
if re.match(regex, spreadsheetUrl):
temp = spreadsheetUrl.split('/')
temp.pop()
url = "/".join(temp) + "/pub?output=xlsx"
res = requests.get(url)
wb = openpyxl.load_workbook(
filename=BytesIO(res.content), data_only=False)
sheets = wb.sheetnames
r = []
for sheetName in sheets:
sheet = wb[sheetName]
temp = {'sheetName': sheetName, 'values': []}
for cells in sheet.rows:
column = []
for cell in cells:
column.append(cell.value)
temp['values'].append(column)
r.append(temp)
print(r)
Result
When above script is run, the following sample value is returned.
[
{
"sheetName": "Sheet1",
"values": [["a1", "b1", "c1"], ["a2", "b2", "c2"], , ,]
},
{
"sheetName": "Sheet2",
"values": [["a1", "b1", "c1"], ["a2", "b2", "c2"], , ,]
},
,
,
]