Semantic Search using Gemini Pro API with Google Apps Script

Gists

Abstract

Gemini API unlocks semantic search for Google Apps Script, boosting its power beyond automation. This report explores the result of attempting the semantic search using Gemini Pro API with Google Apps Script.

Introduction

The recent release of the LLM model Gemini as an API on Vertex AI and Google AI Studio opens a world of possibilities. Ref and Ref I believe Gemini API significantly expands the potential of Google Apps Script and paves the way for diverse applications. In this report, I present a result for attempting the semantic search using Gemini Pro API with Google Apps Script.

Google Apps Script is one of the powerful automation tools for achieving the automation process. Also, Google Apps Script can manage Google Docs (Documents, Spreadsheets, Slides, Forms, and so on) even when users are away from their computers. In the current stage, there are no built-in methods for the semantic search, yet. When Google Apps Script can be used for achieving the semantic search, it is considered that achieving this will expand the management of Google Docs and the development of various applications created by Google Apps Script.

This report delves into the implementation of the semantic search by leveraging Gemini API with Google Apps Script.

Usage

In order to test this script, please do the following flow.

1. Create an API key

Please access https://makersuite.google.com/app/apikey and create your API key. At that time, please enable Generative Language API at the API console. This API key is used for this sample script.

This official document can be also seen. Ref.

2. Create a Google Apps Script project

Please create a standalone Google Apps Script project. Of course, this script can be also used with the container-bound script.

And, please open the script editor of the Google Apps Script project.

3. Sample scripts

The base script is as follows. Please copy and paste the following script to the script editor, set your API key to the function getEmbedding_, and save the script.

In this sample, in order to evaluate the similarity of embedding values, the cosine similarity is calculated.

/**
 * ### Description
 * Calculate cosine similarity from 2 arrays.
 *
 * @param {Array} array1 1-dimensional array.
 * @param {Array} array2 1-dimensional array.
 * @return {Number} Calculated result of cosine similarity.
 */
function cosineSimilarity_(array1, array2) {
  const dotProduct = array1.reduce((t, e, i) => (t += e * array2[i]), 0);
  const magnitudes = [array1, array2]
    .map((e) => Math.sqrt(e.reduce((t, f) => (t += f * f), 0)))
    .reduce((t, f) => (t *= f), 1);
  return dotProduct / magnitudes;
}

/**
 * ### Description
 * Request "Method: models.embedContent" of Gemini Pro API.
 *
 * @param {Object} requests Request body for Method: models.batchEmbedContents.
 * @return {Number} Embedding generated from the input texts.
 */
function getEmbedding_(requests) {
  const apiKey = "###"; // Please set your API key.

  const url = `https://generativelanguage.googleapis.com/v1/models/embedding-001:batchEmbedContents?key=${apiKey}`;
  const options = {
    payload: JSON.stringify({ requests }),
    contentType: "application/json",
  };
  const res = UrlFetchApp.fetch(url, options);
  return (obj = JSON.parse(res.getContentText()));
}

A. Retrieving most similar text using a search text

Please copy and paste the following script to the script editor. This script uses the above functions getEmbedding_ and cosineSimilarity_.

When you run this script, please run sample1. In this sample, the value of searchText is searched from 9 texts in texts.

function sample1() {
  // This is a search text.
  const searchText = "car";

  // These sample texts are generated by Gemini Pro API.
  const texts = [
    {
      title: "car",
      text: "A machine with four wheels that is powered by an engine and is used to transport people or goods. It has a steering wheel, a braking system, and an acceleration system.",
    },
    {
      title: "plane",
      text: "A machine that flies through the air, powered by engines. It has wings that generate lift, and a tail that provides stability. It can carry people or cargo.",
    },
    {
      title: "dog",
      text: "A four-legged mammal that is domesticated and used as a companion or working animal. It has a keen sense of smell and hearing, and is often used for hunting, herding, or guarding.",
    },
    {
      title: "cat",
      text: "A small, furry mammal that is often kept as a pet. It is known for its independent nature, sharp claws, and hunting skills.",
    },
    {
      title: "bird",
      text: "A feathered creature that flies through the air. It has a beak, wings, and feathers. It builds nests and lays eggs.",
    },
    {
      title: "fish",
      text: "A cold-blooded vertebrate that lives in water. It has gills for breathing, fins for swimming, and scales for protection.",
    },
    {
      title: "human",
      text: "A bipedal, intelligent animal that is capable of complex thought and emotion. It is the only species on Earth that can use language, art, and technology.",
    },
    {
      title: "plant",
      text: "An organism that is not an animal and does not move. It has cells with cell walls and uses photosynthesis to make its own food.",
    },
    {
      title: "cherry blossoms",
      text: "A tree that blooms with delicate, colorful flowers in the spring. It is a symbol of beauty, renewal, and hope in many cultures.",
    },
  ];

  const r1 = {
    model: "models/embedding-001",
    content: { parts: [{ text: searchText }] },
    taskType: "RETRIEVAL_QUERY",
  };
  const r2 = texts.map(({ title, text }) => ({
    model: "models/embedding-001",
    content: { parts: [{ text }] },
    taskType: "RETRIEVAL_DOCUMENT",
    title,
  }));
  const obj = getEmbedding_([r1, ...r2]);
  const [ar1, ...ar2] = obj.embeddings.map((e) => e.values);
  const cs = ar2
    .map((ar, i) => ({ cs: cosineSimilarity_(ar1, ar), text: texts[i] }))
    .sort((a, b) => (a.cs < b.cs ? 1 : -1));
  const result = cs[0].text;

  console.log(result);
}
  • When this script is run with seachTest of "car", "plane", "dog", "cat", "bird", "fish", "human", "plant", "cherry blossoms", it was confirmed that the correct texts related from the inputted texts are returned.
  • As in other search text, when “toy poodle” is used as seachTest, { title: "dog", text: "A four-legged mammal that is domesticated and used as a companion or working animal. It has a keen sense of smell and hearing, and is often used for hunting, herding, or guarding." } is returned. Also, when crow is used, { title: 'bird', text: 'A feathered creature that flies through the air. It has a beak, wings, and feathers. It builds nests and lays eggs.' } is returned.

It seems that correct values are returned by the search text with the semantic search. Also, you can see that the words of "car", "plane", "dog", "cat", "bird", "fish", "human", "plant", "cherry blossoms" are not included in texts.

In this sample script, the following flow is run.

  1. Retrieve embedding values of searchText and texts using “Method: models.batchEmbedContents”.
  2. Calculate the value of cosine similarity between the embedding value of searchText and the embedding values of texts.
  3. Most similar text for searchText is retrieved using the calculation results.

When “Method: models.batchEmbedContents” is used, the embedding values of all texts can be retrieved by one API call.

B. Highlighting paragraph in Google Documents

Please copy and paste the following script to the script editor. This script uses the above functions getEmbedding_ and cosineSimilarity_.

function sample2() {
  // This is a search text.
  const searchText = "dog";

  // Please set your document ID.
  const documentId = "###";

  const body = DocumentApp.openById(documentId).getBody();
  const texts = body.getParagraphs().reduce((ar, p) => {
    const text = p.getText().trim();
    if (text) {
      ar.push(text);
    }
    return ar;
  }, []);
  const r1 = {
    model: "models/embedding-001",
    content: { parts: [{ text: searchText }] },
    taskType: "RETRIEVAL_QUERY",
  };
  const r2 = texts.map((text) => ({
    model: "models/embedding-001",
    content: { parts: [{ text }] },
    taskType: "RETRIEVAL_DOCUMENT",
  }));
  const obj = getEmbedding_([r1, ...r2]);
  const [ar1, ...ar2] = obj.embeddings.map((e) => e.values);
  const cs = ar2
    .map((ar, i) => ({ cs: cosineSimilarity_(ar1, ar), text: texts[i] }))
    .sort((a, b) => (a.cs < b.cs ? 1 : -1));
  const result = cs[0].text;
  body.findText(result).getElement().asText().setBackgroundColor("#f4cccc");
}

This sample script runs the following flow.

  1. Retrieve paragraphs from a Google Document.
  2. Search a paragraph using a search text.
  3. Highlight the searched paragraph.

When this script is run, the following result is obtained.

You can see that the paragraph of A four-legged mammal that is domesticated and used as a companion or working animal. It has a keen sense of smell and hearing, and is often used for hunting, herding, or guarding. is highlighted.

C. Semantic search of image files on Google Drive

Please copy and paste the following script to the script editor. This script uses the above functions getEmbedding_ and cosineSimilarity_.

Unfortunately, in the current stage (January 25, 2024), when the value of embedding of an image is tried to be retrieved, an error of embedContent only supports `text` content. occurs. I believe that this will be resolved in the future update. From this situation, as a workaround for the semantic search of the image files on Google Drive, I would like to introduce a simple sample script.

// Please set the folder ID of folder including image files.
const folderId = "###";

// First, please run this function. This function set the explanation of the image to the description of the file.
function sample_C1() {
  const apiKey = "###"; // Please set your API key.

  const files = DriveApp.getFolderById(folderId).getFiles();
  const token = ScriptApp.getOAuthToken();
  while (files.hasNext()) {
    const file = files.next();
    const bytes = UrlFetchApp.fetch(`https://drive.google.com/thumbnail?sz=w1000&id=${file.getId()}`, { headers: { authorization: "Bearer " + token } }).getContent();
    const base64 = Utilities.base64Encode(bytes);
    const url = `https://generativelanguage.googleapis.com/v1/models/gemini-pro-vision:generateContent?key=${apiKey}`;
    const payload = { contents: [{ parts: [{ text: "What is this picture?" }, { inline_data: { mime_type: "image/png", data: base64 } }] }] };
    const options = { payload: JSON.stringify(payload), contentType: "application/json" };
    const res = UrlFetchApp.fetch(url, options);
    const obj = JSON.parse(res.getContentText());
    const r = obj.candidates[0].content.parts[0].text;
    console.log(r);
    file.setDescription(r.trim());
  }
}

// After you ran "sample_C1", please run this function.
function sample_C2() {
  // This is a search text.
  const searchText = "dog";

  const descriptions = [];
  const files = DriveApp.getFolderById(folderId).getFiles();
  while (files.hasNext()) {
    const file = files.next();
    const description = file.getDescription().trim();
    if (description) {
      descriptions.push({ filename: file.getName(), fileId: file.getId(), text: description, request: { model: "models/embedding-001", content: { parts: [{ text: description }] }, taskType: "RETRIEVAL_DOCUMENT" } });
    }
  }
  const r1 = { model: "models/embedding-001", content: { parts: [{ text: searchText }] }, taskType: "RETRIEVAL_QUERY" };
  const obj = getEmbedding_([r1, ...descriptions.map(({ request }) => request)]);
  const [ar1, ...ar2] = obj.embeddings.map(e => e.values);
  const cs = ar2.map((ar, i) => ({ cs: cosineSimilarity_(ar1, ar), file: descriptions[i] })).sort((a, b) => a.cs < b.cs ? 1 : -1);
  const result = cs[0].file.filename;
  console.log(result);
}

This script runs the following flow.

  1. Run sample_C1. The explanations of the image files in the folder are set as the description of the file.
  2. Run sample_C2. Semantic search is run using the descriptions of the image files.
  3. Return the filename as the search result. If you want to see the file ID, you can obtain it with console.log(cs[0].file).

Note

  • In the current stage (January 25, 2024), when the value of embedding of an image is tried to be retrieved, an error of embedContent only supports `text` content. occurs. I believe that this will be resolved in the future update.

References

 Share!