Pseudo Function Calling for Gemini API Through Prompt Engineering


Abstract

This report explores “pseudo function calling” with the Gemini API, implemented through prompt engineering with JSON schemas, as a way to bypass the model dependency of the native function calling feature.

Introduction

Large Language Models (LLMs) like Gemini and ChatGPT offer powerful functionalities, but their capabilities can be further extended through function calling. This feature allows the LLM to execute pre-defined functions with arguments generated based on the user’s prompt. This unlocks a wide range of applications, as demonstrated in these resources (see References).

However, the current implementation of function calling with the Gemini API exhibits model dependency. While models like gemini-1.5-pro-002 support this functionality, others like gemini-1.5-flash-002 currently do not. This limitation is expected to be addressed in future updates.

In a recent publication, “Harnessing Gemini’s Power: A Guide to Generating Content from Structured Data,” we explored controlling the JSON output format of Gemini by using a JSON schema in both the prompt and the “response_schema” parameter (see References). This finding led us to hypothesize that a pseudo function calling mechanism could be achieved through careful prompt engineering with a JSON schema. This report delves into this concept and explores how to leverage pseudo function calling for enhanced LLM interactions.

Pseudo-Function Calling Flow Diagram

  1. User Input: A user enters a question.
  2. Prompt Creation: A prompt is generated, incorporating the inputted question and a JSON schema for selecting functions.
  3. Content Generation: Content is generated based on the created prompt.
  4. JSON Data Extraction: The generated content is analyzed to extract JSON data with a structure matching the function calling format. This data provides the function name and arguments.
  5. Function Execution: The selected function is executed using the extracted arguments (a minimal dispatcher sketch follows this list).
  6. Result Return: The function’s result is returned to the user.
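
Steps 4 to 6 are not implemented in the sample scripts below, which stop at printing the extracted function name and arguments. As a minimal sketch of how the extracted JSON could be dispatched, the following snippet maps the function names to placeholder Python callables; the function bodies and the run_function_call helper are hypothetical and not part of the original scripts.

# Minimal dispatcher sketch for steps 4 to 6. The function bodies are
# placeholders; replace them with real implementations.

def find_movies(location=None, description=None):
    return f"Movies in {location} matching '{description}'"  # Placeholder result.

def find_theaters(location=None, movie=None):
    return f"Theaters in {location} showing '{movie}'"  # Placeholder result.

def get_showtimes(location=None, movie=None, theater=None, date=None):
    return f"Showtimes for '{movie}' at {theater} in {location} on {date}"  # Placeholder result.

# Map the function names used in the schemas to Python callables.
functions = {
    "find_movies": find_movies,
    "find_theaters": find_theaters,
    "get_showtimes": get_showtimes,
}

def run_function_call(res):
    # "res" is the dictionary extracted in step 4, e.g.
    # {"name": "find_theaters", "args": {"movie": "Barbie", "location": "Mountain View, CA"}}.
    name = res.get("name")
    args = res.get("args", {})
    if name not in functions:
        raise ValueError(f"Unknown function: {name}")
    return functions[name](**args)  # Steps 5 and 6: execute the function and return its result.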

Usage

1. Create an API key

Please access https://ai.google.dev/gemini-api/docs/api-key and create your API key. At that time, please enable the Generative Language API in the API console. This API key is used in the following scripts.

The official documentation can also be consulted (see References).
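
As a quick check that the API key and the Generative Language API are set up correctly, the available models can be listed. This snippet is not part of the original samples; it only uses the google.generativeai package shown below.

import google.generativeai as genai

apiKey = "###"  # Please set your API key for using Gemini API.

genai.configure(api_key=apiKey)

# List the models available to this API key. If the key is invalid or the
# Generative Language API is not enabled, this call raises an error.
for m in genai.list_models():
    print(m.name)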

2. Sample scripts

In this report, Python is used to demonstrate pseudo function calling.

Sample 1: Current Function Call

In this sample, the native function calling feature is used by passing function_declarations through the tools property.

import google.generativeai as genai

apiKey = "###"  # Please set your API key for using Gemini API.

# This is from https://ai.google.dev/gemini-api/docs/function-calling
function_declarations = {
    "function_declarations": [
        {
            "name": "find_movies",
            "description": "find movie titles currently playing in theaters based on any description, genre, title words, etc.",
            "parameters": {
                "type_": "OBJECT",
                "properties": {
                    "location": {
                        "type_": "STRING",
                        "description": "The city and state, e.g. San Francisco, CA or a zip code e.g. 95616",
                    },
                    "description": {
                        "type_": "STRING",
                        "description": "Any kind of description including category or genre, title words, attributes, etc.",
                    },
                },
                "required": ["description"],
            },
        },
        {
            "name": "find_theaters",
            "description": "find theaters based on location and optionally movie title which is currently playing in theaters",
            "parameters": {
                "type_": "OBJECT",
                "properties": {
                    "location": {
                        "type_": "STRING",
                        "description": "The city and state, e.g. San Francisco, CA or a zip code e.g. 95616",
                    },
                    "movie": {"type_": "STRING", "description": "Any movie title"},
                },
                "required": ["location"],
            },
        },
        {
            "name": "get_showtimes",
            "description": "Find the start times for movies playing in a specific theater",
            "parameters": {
                "type_": "OBJECT",
                "properties": {
                    "location": {
                        "type_": "STRING",
                        "description": "The city and state, e.g. San Francisco, CA or a zip code e.g. 95616",
                    },
                    "movie": {"type_": "STRING", "description": "Any movie title"},
                    "theater": {
                        "type_": "STRING",
                        "description": "Name of the theater",
                    },
                    "date": {
                        "type_": "STRING",
                        "description": "Date for requested showtime",
                    },
                },
                "required": ["location", "movie", "theater", "date"],
            },
        },
    ]
}
prompt = "Which theaters in Mountain View show Barbie movie?"  # or prompt = "What movies are showing in North Seattle tonight?"

genai.configure(api_key=apiKey)
model = genai.GenerativeModel("gemini-1.5-flash-latest", tools=genai.protos.Tool(function_declarations))
chat = model.start_chat()
response = chat.send_message(prompt)

# Retrieve the function call part from the response and convert it to a Python dictionary.
function_call = response.candidates[0].content.parts[0].function_call
res = function_call.__class__.to_dict(function_call)
print(res)

When “Which theaters in Mountain View show Barbie movie?” is used as the prompt, the following result is obtained.

{
  "name": "find_theaters",
  "args": { "movie": "Barbie", "location": "Mountain View, CA" }
}

When “What movies are showing in North Seattle tonight?” is used as the prompt, the following result is obtained.

{
  "name": "find_movies",
  "args": { "description": "movie", "location": "North Seattle" }
}
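
This dictionary has the same {"name": ..., "args": ...} shape consumed by the dispatcher sketch shown after the flow list above, so a helper like the hypothetical run_function_call(res) from that sketch could execute the selected function with these arguments.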

Sample 2: Pseudo Function Call

In this sample, pseudo function calling is achieved by embedding the function definitions in the prompt as JSON schemas.

import google.generativeai as genai
import json

apiKey = "###"  # Please set your API key for using Gemini API.

# These schemas are from https://ai.google.dev/gemini-api/docs/function-calling.
# The inner "type_": "STRING" keys are carried over unchanged from the function-declaration format.
json_schema = [
    {
        "description": "find movie titles currently playing in theaters based on any description, genre, title words, etc.",
        "type": "object",
        "properties": {
            "name": {
                "type": "string",
                "description": "Value is always 'find_movies'.",
                "const": "find_movies",
            },
            "args": {
                "type": "object",
                "properties": {
                    "location": {
                        "type_": "STRING",
                        "description": "The city and state, e.g. San Francisco, CA or a zip code e.g. 95616",
                    },
                    "description": {
                        "type_": "STRING",
                        "description": "Any kind of description including category or genre, title words, attributes, etc.",
                    },
                },
                "required": ["description"],
            },
        },
    },
    {
        "description": "find theaters based on location and optionally movie title which is currently playing in theaters",
        "type": "object",
        "properties": {
            "name": {
                "type": "string",
                "description": "Value is always 'find_theaters'.",
                "const": "find_theaters",
            },
            "args": {
                "type": "object",
                "properties": {
                    "location": {
                        "type_": "STRING",
                        "description": "The city and state, e.g. San Francisco, CA or a zip code e.g. 95616",
                    },
                    "movie": {"type_": "STRING", "description": "Any movie title"},
                },
                "required": ["location"],
            },
        },
    },
    {
        "description": "Find the start times for movies playing in a specific theater",
        "type": "object",
        "properties": {
            "name": {
                "type": "string",
                "description": "Value is always 'get_showtimes'.",
                "const": "get_showtimes",
            },
            "args": {
                "type": "object",
                "properties": {
                    "location": {
                        "type_": "STRING",
                        "description": "The city and state, e.g. San Francisco, CA or a zip code e.g. 95616",
                    },
                    "movie": {"type_": "STRING", "description": "Any movie title"},
                    "theater": {
                        "type_": "STRING",
                        "description": "Name of the theater",
                    },
                    "date": {
                        "type_": "STRING",
                        "description": "Date for requested showtime",
                    },
                },
                "required": ["location", "movie", "theater", "date"],
            },
        },
    },
]
# Build two instruction lines for each schema and flatten them into a single list.
ar = sum(
    [
        [
            f"If the result is related to '{o['description']}', return the result by following 'json_schema{i + 1}'.",
            f"<json_schema{i + 1}>{json.dumps(o)}</json_schema{i + 1}>",
        ]
        for i, o in enumerate(json_schema)
    ],
    [],
)
prompt = "\n".join(
    [
        "Which theaters in Mountain View show Barbie movie?", # or "What movies are showing in North Seattle tonight?",
        *ar,
    ]
)

genai.configure(api_key=apiKey)
model = genai.GenerativeModel(
    "gemini-1.5-flash-latest",
    generation_config={"response_mime_type": "application/json"},
)
chat = model.start_chat()
response = chat.send_message(prompt)

# Parse the returned JSON text into a Python dictionary.
res = json.loads(response.candidates[0].content.parts[0].text)
print(res)

When “Which theaters in Mountain View show Barbie movie?” is used as the prompt, the following result is obtained.

{
  "name": "find_theaters",
  "args": { "location": "Mountain View, CA", "movie": "Barbie" }
}

When “What movies are showing in North Seattle tonight?” is used as the prompt, the following result is obtained.

{
  "name": "find_movies",
  "args": {
    "location": "North Seattle",
    "description": "movies showing tonight"
  }
}
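
Since the pseudo approach relies on the model following the prompt rather than on an enforced tool schema, it can be worth validating the returned JSON before executing a function. The following is a minimal sketch using the third-party jsonschema package (an assumption; it is not used in the original scripts) to check which of the embedded schemas the response satisfies. Here, json_schema and res are the variables from Sample 2.

# Minimal validation sketch, assuming the third-party "jsonschema" package is
# installed (pip install jsonschema). "json_schema" and "res" are from Sample 2.
from jsonschema import validate, ValidationError

def match_schema(res, schemas):
    # Return the index of the first schema that the response satisfies, or None.
    for i, schema in enumerate(schemas):
        try:
            validate(instance=res, schema=schema)
            return i
        except ValidationError:
            continue
    return None

idx = match_schema(res, json_schema)
if idx is None:
    print("The response did not match any of the expected schemas.")
else:
    print(f"The response matches json_schema{idx + 1}: {json_schema[idx]['description']}")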

Result

From these results, it was found that even when the JSON schemas are provided in the prompt, the same result as the native function call can be obtained.

However, there are limitations to the native function call at the current stage. For example, the model models/gemini-1.5-flash-002 cannot be used with function calls. In contrast, this pseudo function call can be used to overcome this limitation.

Additionally, this result demonstrates that the JSON output format can be controlled through prompt engineering.

References
