Creating Image Bot using Gemini with Google Apps Script

Gists

Abstract

New Gemini API opens doors for developers to integrate its AI power into apps, potentially impacting education, healthcare, and business. The latest Gemini 1.5 brings even more features. This report showcases an image bot using Gemini as one example of its diverse applications. Showcasing its diverse application potential across various fields.

Introduction

The recent release of Gemini as an accessible API on Vertex AI and Google AI Studio empowers developers to integrate its vast capabilities into their applications, potentially revolutionizing fields like education, healthcare, and business. Adding even more powerful features with the recently announced Gemini 1.5, this tool promises even greater impact. Ref and Ref I believe Gemini significantly expands the potential for diverse applications across various fields. To showcase its potential, this report introduces an image bot using Gemini with Google Apps Script and Google Drive. This is just one example of the many compelling use cases developers can build with Gemini.

The repository of this application is here.

Specification of this web application

  • User can use this library by a prompt.
  • Images can be searched from own Google Drive by the semantic search.
  • Local images can be uploaded for searching with this application.

Demonstration

Here, I would like to show you a demonstration of this web application. The following steps are tested.

  1. Give a question What is Google Apps Script?. By this, the description of Google Apps Script is returned.
  2. Give a question Show image of dog.. By this, an image including dog is returned.
  3. Give a question Upload image files.. By this, a file input tag is shown. And, 3 image files are uploaded.
  4. Give a question Show image of whale.. By this, one of uploaded images is returned.
  5. Give a question Show images of dogs.. By this, several images including dogs are returned.

Principle

This image illustrates the workflow of this image bot.

As the prerequisites, it supposes that some images are put into a folder of your Google Drive, and the descriptions of each image have been put into a corpus.

  1. User gives a question.
  2. Google Apps Script sends a request to the Gemini API.
  3. The Gemini API identifies a suitable function with the arguments within the Google Apps Script project.
  4. Google Apps Script executes the chosen function with the given arguments. And, Google Apps Script analyzes the corpus of image descriptions using semantic search to find relevant images.
  5. The function running in Google Apps Script returns a response value.
  6. Google Apps Script transmits the response value back to the Gemini API.
  7. Google Apps Script merges the function’s response with Gemini’s insights, generating a comprehensive answer.
  8. The final response is delivered back to the user through the image bot interface.

As additional information, this image bot application leverages function calling with Gemini to analyze and process images efficiently. As outlined in my previous report “Guide to Function Calling with Gemini and Google Apps Script: https://medium.com/google-cloud/guide-to-function-calling-with-gemini-and-google-apps-script-0e058d472f45", this approach allows seamless integration of Gemini’s capabilities within the Google Apps Script environment, enabling powerful image processing functionality.

In the next section, I would like to introduce the usage of this application.

Usage

1. Install library

I’m concerned that the Google Apps Script project in this case might be a bit complex. To simplify testing, I’ve prepared it as a Google Apps Script library. You can use it directly within your own project or copy and paste the scripts into your Google Apps Script project if you prefer not to use the library.

Here’s how to use the sample as a library. Install the library using the following steps.

  1. Create a GAS project.

    • You can use this library for the GAS project of both the standalone and container-bound script types.
  2. Install this library.

    • Library’s project key is 1qdq-ey6H3-HkG0AF4Tp4ZVEAUx-YryHXB5ZPW7oE-BeXN-wH2-b1PT_Q.

2. Linking Google Cloud Platform Project to Google Apps Script Project for New IDE

In this case, you can see how to do this at my repository.

Also, please enable Generative Language API at the API console.

After the above setting, the following sample script can be used.

3. Scopes

This library uses the following 2 scopes.

  • https://www.googleapis.com/auth/script.external_request
  • https://www.googleapis.com/auth/generative-language.retriever
  • https://www.googleapis.com/auth/drive

When this library is installed, the above scopes are automatically installed to your Google Apps Script project.

5. Prepare a folder

Please create a new folder in your Google Drive.

If you have already had some images, please put them into this folder.

4. Script

Please copy and paste the following script to the script editor of your Google Apps Script project. And, please set the folder ID of your created folder to folderId in the function init_.

function init_() {
  // Please set the folder ID of the folder including images.
  const folderId = "###";

  const newCorpusName = {
    name: "corpora/sample-corpus-1",
    displayName: "sample corpus 1",
  };
  const newDocumentName = {
    name: `${newCorpusName.name}/documents/sample-document-1`,
    displayName: "sample document 1",
  };
  return { folderId, newCorpusName, newDocumentName };
}

// This is a main function. This is used from the library.
const doGet = (e) => ImageBotApp.main({ ...e, ...init_() });

// This is used for putting the existing images in the folder to the corpus.
function putImageDescriptionsToCorpus() {
  const res = ImageBotApp.putImageDescriptionsToCorpus(init_());
  console.log(JSON.stringify(res.map((r) => JSON.parse(r.getContentText()))));
}

5. Deploy Web Apps

This library is used with Web Apps. So, please deploy Web Apps as follows. The detailed information can be seen in the official document.

Please set this using the script editor of your Google Apps Script project.

  1. On the script editor, at the top right of the script editor, please click “click Deploy” -> “New deployment”.
  2. Please click “Select type” -> “Web App”.
  3. Please input the information about the Web App in the fields under “Deployment configuration”.
  4. Please select “Me” for “Execute as”.
  5. Please select “Anyone” for “Who has access to the app:”.
  6. Please click “Deploy” button.
  7. On the script editor, at the top right of the script editor, please click “click Deploy” -> “Test deployments”.
  8. Copy Web Apps URL. It’s like https://script.google.com/macros/s/###/exec.

6. Testing

Before proceeding, if your folder contains images, please run the putImageDescriptionsToCorpus function. This generates descriptions for your images and adds them to the corpus.

To use this library:

  • Open the Web Apps URL in your browser. You’ll see a demonstration image at the top of the page, just like in this post.

Currently, you can search for and upload images as a sample.

Note

  • This is a simple application using Gemini. However, I believe that it will help users understanding Gemini.

 Share!