Exporting Google Docs Files in PDF format with Batch Requests using Google Apps Script

08 Mar 2023, 11:23

GasTips

This is a sample script for exporting Google Docs files (Spreadsheets, Documents, and so on) in PDF format with batch requests using Google Apps Script.

I have published a report “Efficient File Management using Batch Requests with Google Apps Script”. In this report, I mentioned how to use the batch requests using Google Apps Script.

In this post, I would like to introduce the method for retrieving binary data using this method.

When the batch request is run, the following response is returned.

--batch_###
Content-Type: application/http
Content-ID: response-1

HTTP/1.1 200 OK
Content-Disposition: attachment
Content-Type: application/zip
Date: ###
Expires: ###
Cache-Control: private, max-age=0
Content-Length: 1000

### data ###
--batch_###--

When the response value is JSON data and text data, this data can be simply retrieved by getContentText() from the response of UrlFetchApp. But, when the response value is the binary data, the data retrieved by getContentText() cannot be decoded because of the issue of character code. In that case, it is required to retrieve the data as binary data. In this post, I would like to introduce the method for retrieving binary data from the response value of the batch request.

Sample script

Before you use this script, please enable Drive API at Advanced Google services. And, please set your file IDs and folder ID to main().

/**
 * Split byteArray by a search data.
 * @param {Array} baseData Input byteArray of base data.
 * @param {Array} searchData Input byteArray of search data using split.
 * @return {Array} An array including byteArray.
 */
function splitByteArrayBySearchData_(baseData, searchData) {
  if (!Array.isArray(baseData) || !Array.isArray(searchData)) {
    throw new Error("Please give byte array.");
  }
  const search = searchData.join("");
  const bLen = searchData.length;
  const res = [];
  let idx = 0;
  do {
    idx = baseData.findIndex(
      (_, i, a) => [...Array(bLen)].map((_, j) => a[j + i]).join("") == search
    );
    if (idx != -1) {
      res.push(baseData.splice(0, idx));
      baseData.splice(0, bLen);
    } else {
      res.push(baseData.splice(0));
    }
  } while (idx != -1);
  return res;
}

/**
 * Ref: https://cloud.google.com/blog/topics/developers-practitioners/efficient-file-management-using-batch-requests-google-apps-script
 * Create a request body of batch requests and request it.
 *
 * @param {Object} object Object for creating request body of batch requests.
 * @returns {Object} UrlFetchApp.HTTPResponse
 */
function batchRequests_(object) {
  const { batchPath, requests } = object;
  const boundary = "sampleBoundary12345";
  const lb = "\r\n";
  const payload = requests.reduce((r, e, i, a) => {
    r += `Content-Type: application/http${lb}`;
    r += `Content-ID: ${i + 1}${lb}${lb}`;
    r += `${e.method} ${e.endpoint}${lb}`;
    r += e.requestBody
      ? `Content-Type: application/json; charset=utf-8" ${lb}${lb}`
      : lb;
    r += e.requestBody ? `${JSON.stringify(e.requestBody)}${lb}` : "";
    r += `--${boundary}${i == a.length - 1 ? "--" : ""}${lb}`;
    return r;
  }, `--${boundary}${lb}`);
  const params = {
    muteHttpExceptions: true,
    method: "post",
    contentType: `multipart/mixed; boundary=${boundary}`,
    headers: { Authorization: "Bearer " + ScriptApp.getOAuthToken() },
    payload,
  };
  return UrlFetchApp.fetch(`https://www.googleapis.com/${batchPath}`, params);
}

// Please run this script.
function main() {
  const folderId = "###"; // Please set folder ID you want to put the files.
  const fileIds = ["###SpreadsheetID1###", "###DocumentID1###"]; // Please set Spreadsheet ID and Document IDs you want to export as PDF format.

  // Run batch requests. In this batch requests, as a sample, Google Docs files (Spreadsheet, Document and so on) are exported as PDF format.
  const requests = fileIds.map((id) => ({
    method: "GET",
    endpoint: `https://www.googleapis.com/drive/v3/files/${id}/export?mimeType=application/pdf`,
  }));
  const object = { batchPath: "batch/drive/v3", requests };
  const res = batchRequests_(object);
  if (res.getResponseCode() != 200) {
    throw new Error(res.getContentText());
  }

  // Parse data as binary data, and create the data as Blob.
  const check = res.getContentText().match(/--batch.*/);
  if (!check) {
    throw new Error("Valid response value is not returned.");
  }
  const search = check[0];
  const baseData = res.getContent();
  const searchData = Utilities.newBlob(search).getBytes();
  const res1 = splitByteArrayBySearchData_(baseData, searchData);
  res1.shift();
  res1.pop();
  const blobs = res1.map((e, i) => {
    const rrr = splitByteArrayBySearchData_(e, [13, 10, 13, 10]);
    const data = rrr.pop();
    const metadata = Utilities.newBlob(rrr.flat()).getDataAsString();
    const dataSize = Number(metadata.match(/Content-Length:(.*)/)[1]);
    return Utilities.newBlob(data.splice(0, dataSize)).setName(
      `sampleName${i + 1}`
    );
  });

  // Create blobs as the files in Google Drive.
  const folder = DriveApp.getFolderById(folderId);
  blobs.forEach((b) => {
    if (b) {
      console.log({ filename: b.getName(), fileSize: b.getBytes().length });
      folder.createFile(b);
    }
  });
}

When this script is run, Google Docs files (Spreadsheets, Documents, and so on) are exported in PDF format with the batch requests.
As a reference sample situation, in order to export 100 Google Document files as PDF files, when I tested this sample script, the processing time was about 150 seconds and no error occurred. And, I confirmed that 100 valid PDF files were created in my Google Drive.

References

I answered this method to this thread on Stackoverflow.
Also, I implemented this method to a Google Apps Script library of BatchRequest.

Exporting Google Docs Files in PDF format with Batch Requests using Google Apps Script

Sample script

References

Share!