This sample script performs OCR and imports resultant text to a cell using custom function on Spreadsheet.
Drive API has a function to do OCR. It was used for this sample.
I think that this method is one of various ideas.
Problem
When OCR is performed and imported the result to a cell on spreadsheet, there are some limitations. DriveApp
, UrlFetchApp
, setFormula()
cannot be used for custom functions.
Solution
In order to avoid these limitations, I used Web Apps. From previous research, it has been found that Web Apps can avoid various limitations. Also in the case of this situation, Web Apps could avoid the above limitations.
To use this sample script, please deploy Web Apps as follows.
On the Script Editor,
- File
- -> Manage Versions
- -> Save New Version
- Publish
- -> Deploy as Web App
- -> At Execute the app as, select “your account”
- -> At Who has access to the app, select “Anyone, even anonymous”
- -> Click “Deploy”
- -> Copy “Current web app URL”
- -> Click “OK”
When it deploys Web Apps, the approval required authorization can be done, simultaneously.
Sample Script :
Please copy and paste this script to a bound script of spreadsheet.
var folderId = "### Folder ID ###";
function OCR(filename) {
var ac = SpreadsheetApp.getActiveSheet().getActiveCell();
var q1 = "?file=" + filename;
var q2 = "&row=" + ac.getRow();
var q3 = "&col=" + ac.getColumn();
var url = ScriptApp.getService().getUrl() + q1 + q2 + q3;
UrlFetchApp.fetch(url);
}
function doGet(e) {
var res = doOCR(e.parameters.file);
SpreadsheetApp
.getActiveSheet()
.getRange(e.parameters.row, e.parameters.col, 1, 2)
.setFormulas([[
'=T("' + e.parameters.file + '")',
'=T("' + res + '")'
]]);
}
function doOCR(src) {
var srcfile = DriveApp.getFilesByName(src).next();
var fileid = Drive.Files.insert(
{
title: srcfile.getName(),
mimeType: Drive.Files.get(srcfile.getId(), {fields: "mimeType"}).mimeType,
parents: [{"id": srcfile.getParents().next().getId()}]
},
srcfile.getBlob(),
{
ocr: true,
ocrLanguage: 'en'
}
).getId();
var textdata = UrlFetchApp.fetch(
"https://www.googleapis.com/drive/v3/files/" +
fileid +
"/export?mimeType=text/plain",
{
headers : {Authorization: "Bearer " + ScriptApp.getOAuthToken()},
muteHttpExceptions : true
}
).getContentText();
Drive.Files.remove(fileid);
return textdata.replace("________________", "");
}
The text data obtained by OCR is included a text "________________"
. I don’t know the reason. But this can be remove using replace()
.
Flow of Script :
OCR()
- Input
=OCR("sample.png")
in cellA1
."sample.png"
is a source image file. Also PDF files can be used for a source file."sample.png"
is as follows.
- Using
fetch()
, sends data of"sample.png"
and the inputted coordinate todoGet()
.
doGet()
- Using
doGet()
, get the data. - The source file is converted to a Google Document using
Drive API
. - The created Document is converted to text data using
Drive API
. - Imports the text data by
setFormula()
.setFormula()
is used for overwriting=OCR("sample.png")
.
Result :
By inputting =OCR("sample.png")
in cell A1
as a custom function, following result can be obtained.
Note :
- When the custom function
OCR()
is used, loading time is about 40 seconds. (I don’t know whether this occurs only my environment.) - Permissions of the created image are ANYONE_WITH_LINK, VIEW.
Related Tips
- Embedding a Chart to a Cell using Custom Function on Spreadsheet
- Embedding a Map to a Cell using Custom Function on Spreadsheet