Google Documents has no built-in method for retrieving the number of lines in a document, so I thought of this workaround. If the end of each line can be detected, the number of lines can be counted. So I tried adding an end marker to each line using OCR.
In Google Documents, when a sentence is wider than the page, it automatically wraps to the next line, but this automatic line break contains no \r\n or \n. When a user inserts a line break with the Enter key, that line break does contain \r\n or \n. As a result, the text data retrieved from the document contains only the line breaks entered by users. I thought that OCR might be usable in this situation. The flow is as follows.
- Convert Google Document to PDF.
- Convert PDF to text data using OCR.
  - I selected “ocr.space” for OCR.
    - If you already know another OCR API, you can try that instead.
    - When I used the OCR of Drive API, the line breaks \r\n or \n were not added to the converted text data, so I used ocr.space, which can add the line breaks.
- Count \n in the converted text data.
  - This number means the number of lines.
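The last step of the flow can be sketched as a small helper. This is a minimal sketch, not part of the original script: `countLineBreaks` is a hypothetical name, and both sample strings are made up to illustrate the difference between the text held by the document and the text an OCR service might return.

```javascript
// Hypothetical helper: count the "\n" characters in a string.
// "\r\n" is counted once because it contains exactly one "\n".
function countLineBreaks(text) {
  var matches = text.match(/\n/g); // null when there is no "\n"
  return matches ? matches.length : 0;
}

// Made-up example: one paragraph that wraps onto three visual lines,
// followed by a second paragraph.
// Text as stored in the document: only the break typed with the Enter key is present.
var documentText = "This long sentence wraps automatically in the editor.\nSecond paragraph.";
// Text as an OCR service might return it: every visual line ends with "\r\n".
var ocrText = "This long sentence\r\nwraps automatically\r\nin the editor.\r\nSecond paragraph.\r\n";

var hardBreaks = countLineBreaks(documentText); // breaks entered by the user
var visualLines = countLineBreaks(ocrText);     // breaks seen on the rendered page
```

With these sample strings the document text yields a smaller count than the OCR text, which is the gap the OCR step is meant to close.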
The sample script for the above flow is as follows. Before using it, please get your API key at “ocr.space”. When you submit your information and email address in the form, you will receive an email containing the API key. Please set it in this sample script. Also, please check the quota of the API. I tested this using the Free plan.
Sample script :
// API key issued by ocr.space.
var apikey = "### Your API key for using ocr.space ###";
var id = DocumentApp.getActiveDocument().getId();
// Export the active document as a PDF.
var url = "https://docs.google.com/feeds/download/documents/export/Export?id=" + id + "&format=pdf&access_token=" + ScriptApp.getOAuthToken();
var blob = UrlFetchApp.fetch(url).getBlob();
// Send the PDF to ocr.space and parse the JSON response.
var options = {method: "POST", headers: {apikey: apikey}, payload: {file: blob}};
var ocrRes = JSON.parse(UrlFetchApp.fetch("https://api.ocr.space/Parse/Image", options).getContentText());
// Count "\n" in the text of the first parsed result; match() returns null when nothing matches.
var result = ocrRes.ParsedResults.map(function(e){return (e.ParsedText.match(/\n/g) || []).length})[0];
Logger.log(result);
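The sample script reads only the first element of `ParsedResults`. Assuming ocr.space returns one element per page and reports failures through an `IsErroredOnProcessing` flag with an `ErrorMessage` (both are assumptions about the response format and should be checked against the ocr.space documentation), a multi-page document could be handled by something like the hypothetical `countLinesInOcrResponse` below, which sums the counts of all elements. The sample response is made up for illustration.

```javascript
// Hypothetical helper: sum the "\n" counts over every element of ParsedResults.
// Assumed response shape: {IsErroredOnProcessing, ErrorMessage, ParsedResults: [{ParsedText}]}.
function countLinesInOcrResponse(ocrRes) {
  if (ocrRes.IsErroredOnProcessing) {
    throw new Error("OCR failed: " + ocrRes.ErrorMessage);
  }
  return (ocrRes.ParsedResults || []).reduce(function(total, page) {
    var matches = (page.ParsedText || "").match(/\n/g); // null when there is no "\n"
    return total + (matches ? matches.length : 0);
  }, 0);
}

// Made-up response with two pages, for illustration only.
var sampleResponse = {
  IsErroredOnProcessing: false,
  ParsedResults: [
    {ParsedText: "line 1\r\nline 2\r\n"},
    {ParsedText: "line 3\r\n"}
  ]
};
var totalLines = countLinesInOcrResponse(sampleResponse);
```

In the sample script, the parsed response `ocrRes` could be passed to this helper in place of the `map(...)[0]` expression.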
Note :
- Even if the last line of the document has no \r\n or \n, the converted text data has \r\n at the end of every line.
- In this case, the precision of OCR is not important. The important point is to retrieve the line breaks.