Taming the Wild Output: Effective Control of Gemini API Response Formats with response_schema


Abstract

The Gemini API traditionally required specific prompts for desired output formats. This report explores two new GenerationConfig properties: “response_mime_type” and “response_schema”. These allow developers to directly specify formats like JSON, enhancing control and predictability. We analyze and compare the effectiveness of both properties for controlling Gemini API output formats.

Introduction

One of the key challenges when working with the Gemini API is ensuring the output data is delivered in the format your application requires. Traditionally, the response format heavily relied on the specific prompt you provided. For example, retrieving data as a structured JSON object necessitated including a “Return JSON” prompt within your input text. This approach could be cumbersome and error-prone if the desired format wasn’t explicitly requested.

To address this limitation, a new property called “response_mime_type” was recently introduced as part of GenerationConfig (Ref and Ref). This property, used together with a JSON schema, lets you explicitly specify the desired output format, such as JSON (Ref). This enhancement significantly improves the controllability and predictability of the Gemini API’s response format.

Further expanding on output format control, a new property named “response_schema” (both “response_schema” and “responseSchema” are accepted) has recently been added to GenerationConfig (Ref). According to the official documentation, it offers an alternative approach for controlling the output format.
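As a small illustration of the two accepted spellings, the following sketch builds the same generationConfig twice. The schema fragment and variable names are hypothetical and exist only for this example; “responseMimeType” is assumed here as the camelCase counterpart of “response_mime_type”.

```javascript
// Hypothetical schema fragment used only for this illustration.
const exampleSchema = { type: "array", items: { type: "string" } };

// snake_case spelling, as used in the sample scripts of this report.
const snakeCaseConfig = {
  response_mime_type: "application/json",
  response_schema: exampleSchema,
};

// camelCase spelling of the same settings (assumed to be treated identically).
const camelCaseConfig = {
  responseMimeType: "application/json",
  responseSchema: exampleSchema,
};
```

Either object would be placed under generationConfig in the request body.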

This report builds on my previous work on specifying output types for the Gemini API using Google Apps Script. Here, we compare the results of passing a schema through “response_schema” with the results of embedding a JSON schema in the prompt.

Usage

To test the sample scripts, follow these steps.

1. Create an API key

Access https://ai.google.dev/gemini-api/docs/api-key and create your API key. When you do, enable the Generative Language API in the API console. This API key is used in the sample scripts.

See also the official documentation. Ref.

2. Create a Google Apps Script project

This report uses Google Apps Script, but the method introduced here can also be used in other languages.

To test the following sample scripts, create a standalone Google Apps Script project. The scripts also work in a container-bound script.

Then open the script editor of the Google Apps Script project.
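The sample scripts below hardcode the API key as "###". As an alternative, the key could be kept in the script properties of the project. The following is a minimal sketch under that assumption; the function name getApiKey and the property name "GEMINI_API_KEY" are hypothetical and not part of the original scripts.

```javascript
// Read the API key from a properties store instead of hardcoding it.
// "GEMINI_API_KEY" is a hypothetical property name chosen for this sketch.
function getApiKey(store) {
  const key = store.getProperty("GEMINI_API_KEY");
  if (!key) {
    throw new Error('Set the script property "GEMINI_API_KEY" first.');
  }
  return key;
}

// In Apps Script, the store would be the script properties:
// const apiKey = getApiKey(PropertiesService.getScriptProperties());
```

Passing the store as an argument keeps the function easy to try outside Apps Script with a stand-in object.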

3. Sample JSON schema

The sample JSON schema used in this report is as follows. Both of the following sample scripts use it.

const sampleSchema = {
  description:
    "List 5 popular cookie recipes by including the following properties. Don't change properties and don't add other properties.",
  type: "array",
  items: {
    description: "Cookie recipes.",
    type: "object",
    properties: {
      recipe_name: {
        description: "Names of recipe.",
        type: "string",
      },
      materials: {
        description: "Requirement materials for running the recipe.",
        type: "array",
        items: {
          description: "Don't add other properties.",
          type: "object",
          properties: {
            material: {
              description: "Requirement material for running the recipe.",
              type: "string",
            },
            amount: {
              description:
                "Requirement amount of material for running the recipe. Unit is grams.",
              type: "number",
            },
            cost: {
              description:
                "Cost of requirement material for running the recipe. Unit is dollar.",
              type: "number",
            },
          },
          required: ["material", "amount", "cost"],
        },
      },
      total_cost: {
        description: "Total cost of materials.",
        type: "number",
      },
    },
    required: ["recipe_name", "materials", "total_cost"],
  },
};
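To judge whether a response actually follows this schema, a hand-written check can be useful. The following is a minimal sketch, not a general JSON Schema validator: the function name checkRecipes is hypothetical, and the checks are written by hand from the required properties and types in sampleSchema.

```javascript
// Hand-written check for the structure that sampleSchema describes.
// Returns a list of problems; an empty list means the value matches.
function checkRecipes(value) {
  if (!Array.isArray(value)) return ["top level is not an array"];
  const problems = [];
  const isObject = (v) => v !== null && typeof v === "object" && !Array.isArray(v);
  value.forEach((recipe, i) => {
    if (!isObject(recipe)) {
      problems.push(`[${i}] is not an object`);
      return;
    }
    if (typeof recipe.recipe_name !== "string") {
      problems.push(`[${i}].recipe_name is not a string`);
    }
    if (typeof recipe.total_cost !== "number") {
      problems.push(`[${i}].total_cost is not a number`);
    }
    if (!Array.isArray(recipe.materials)) {
      problems.push(`[${i}].materials is not an array`);
      return;
    }
    recipe.materials.forEach((m, j) => {
      if (!isObject(m)) {
        problems.push(`[${i}].materials[${j}] is not an object`);
        return;
      }
      if (typeof m.material !== "string") {
        problems.push(`[${i}].materials[${j}].material is not a string`);
      }
      ["amount", "cost"].forEach((key) => {
        if (typeof m[key] !== "number") {
          problems.push(`[${i}].materials[${j}].${key} is not a number`);
        }
      });
    });
  });
  return problems;
}
```

In the sample scripts, this could be called as checkRecipes(JSON.parse(text)) on the text returned by the API. It does not report extra properties; it only checks the required ones.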

A. Use response_mime_type and JSON schema

This is the same approach as in my previous report.

function sampleA() {
  const apiKey = "###"; // Please set your API key.
  const model = "models/gemini-1.5-pro-latest";
  const version = "v1beta";

  const prompt = `Follow JSON schema.<JSONSchema>${JSON.stringify(
    sampleSchema
  )}</JSONSchema>`;

  const url = `https://generativelanguage.googleapis.com/${version}/${model}:generateContent?key=${apiKey}`;
  const payload = {
    contents: [{ parts: [{ text: prompt }] }],
    generationConfig: { response_mime_type: "application/json" },
  };
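  const options = {
    method: "post", // Stated explicitly; generateContent expects a POST request.
    payload: JSON.stringify(payload),
    contentType: "application/json",
    muteHttpExceptions: true,
  };
  const res = UrlFetchApp.fetch(url, options);
  const obj = JSON.parse(res.getContentText());
  // With muteHttpExceptions, an API error returns a body without "candidates",
  // so check each level before reading the text.
  const candidate = (obj.candidates || [])[0];
  const parts = (candidate && candidate.content && candidate.content.parts) || [];
  if (parts.length > 0) {
    console.log(parts[0].text);
  } else {
    console.log("No response.");
  }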
}

When this script is run, the following result is obtained.

[
  {
    "recipe_name": "Chocolate Chip Cookies",
    "materials": [{ "material": "Flour", "amount": 250, "cost": 2.0 }, { "material": "Sugar", "amount": 150, "cost": 1.5 }, { "material": "Butter", "amount": 125, "cost": 3.0 }, { "material": "Chocolate Chips", "amount": 200, "cost": 4.0 }, { "material": "Eggs", "amount": 2, "cost": 1.0 }],
    "total_cost": 11.5
  },
  {
    "recipe_name": "Peanut Butter Cookies",
    "materials": [{ "material": "Flour", "amount": 200, "cost": 1.8 }, { "material": "Sugar", "amount": 100, "cost": 1.0 }, { "material": "Peanut Butter", "amount": 150, "cost": 3.5 }, { "material": "Eggs", "amount": 1, "cost": 0.5 }],
    "total_cost": 6.8
  },
  {
    "recipe_name": "Oatmeal Raisin Cookies",
    "materials": [{ "material": "Flour", "amount": 175, "cost": 1.6 }, { "material": "Sugar", "amount": 125, "cost": 1.25 }, { "material": "Butter", "amount": 100, "cost": 2.5 }, { "material": "Oatmeal", "amount": 150, "cost": 2.0 }, { "material": "Raisins", "amount": 100, "cost": 2.5 }],
    "total_cost": 9.85
  },
  {
    "recipe_name": "Sugar Cookies",
    "materials": [{ "material": "Flour", "amount": 260, "cost": 2.1 }, { "material": "Sugar", "amount": 175, "cost": 1.75 }, { "material": "Butter", "amount": 125, "cost": 3.0 }, { "material": "Eggs", "amount": 1, "cost": 0.5 }],
    "total_cost": 7.35
  },
  {
    "recipe_name": "Snickerdoodle Cookies",
    "materials": [{ "material": "Flour", "amount": 250, "cost": 2.0 }, { "material": "Sugar", "amount": 150, "cost": 1.5 }, { "material": "Butter", "amount": 125, "cost": 3.0 }, { "material": "Cinnamon", "amount": 10, "cost": 1.0 }],
    "total_cost": 7.5
  }
]

In this sample script, the prompt is simply Follow JSON schema.<JSONSchema>${JSON.stringify(sampleSchema)}</JSONSchema>. The result had the expected JSON structure on every run, which suggests that the Gemini API correctly understands the JSON schema. The full JSON Schema specification is richer than this sample, so when a JSON schema is used in the prompt, it should also be possible to specify, for example, values to exclude.
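As an illustration of excluded values, JSON Schema provides the keywords "not" and "enum", which can be combined to rule out specific values. The following sketch builds a prompt in the same form as sampleA from such a schema. The helper buildSchemaPrompt and the schema exclusionSchema are hypothetical, and whether the Gemini API honors the exclusion was not tested in this report.

```javascript
// Hypothetical variant of the recipe_name property that excludes one value,
// using the standard JSON Schema keywords "not" and "enum".
const recipeNameWithExclusion = {
  description: "Names of recipe.",
  type: "string",
  not: { enum: ["Chocolate Chip Cookies"] },
};

// Hypothetical reduced schema that only carries the recipe name.
const exclusionSchema = {
  type: "array",
  items: {
    type: "object",
    properties: { recipe_name: recipeNameWithExclusion },
    required: ["recipe_name"],
  },
};

// Build the same kind of prompt as sampleA from any schema object.
function buildSchemaPrompt(schema) {
  return `Follow JSON schema.<JSONSchema>${JSON.stringify(schema)}</JSONSchema>`;
}

const exclusionPrompt = buildSchemaPrompt(exclusionSchema);
```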

B. Use response_mime_type and “response_schema”

function sampleB() {
  const apiKey = "###"; // Please set your API key.
  const model = "models/gemini-1.5-pro-latest";
  const version = "v1beta";

  const prompt =
    "List 5 popular cookie recipes by including the following properties. Requirement materials for running the recipe. Requirement material for running the recipe. Requirement amount of material for running the recipe. Unit is grams. Cost of requirement material for running the recipe. Unit is dollar. Total cost of materials.";

  const url = `https://generativelanguage.googleapis.com/${version}/${model}:generateContent?key=${apiKey}`;
  const payload = {
    contents: [{ parts: [{ text: prompt }] }],
    generationConfig: {
      response_mime_type: "application/json",
      response_schema: sampleSchema,
    },
  };
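  const options = {
    method: "post", // Stated explicitly; generateContent expects a POST request.
    payload: JSON.stringify(payload),
    contentType: "application/json",
    muteHttpExceptions: true,
  };
  const res = UrlFetchApp.fetch(url, options);
  const obj = JSON.parse(res.getContentText());
  // With muteHttpExceptions, an API error returns a body without "candidates",
  // so check each level before reading the text.
  const candidate = (obj.candidates || [])[0];
  const parts = (candidate && candidate.content && candidate.content.parts) || [];
  if (parts.length > 0) {
    console.log(JSON.stringify(JSON.parse(parts[0].text)));
  } else {
    console.log("No response.");
  }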
}

When this script is run, the following result is obtained.

[
  {
    "cookie_name": "Chocolate Chip Cookies",
    "materials": [{ "material": "Flour", "amount": "250", "unit": "grams", "cost": "1.50" }, { "material": "Sugar", "amount": "200", "unit": "grams", "cost": "1.00" }, { "material": "Butter", "amount": "150", "unit": "grams", "cost": "2.00" }, { "material": "Eggs", "amount": "2", "unit": "pieces", "cost": "1.00" }, { "material": "Chocolate Chips", "amount": "200", "unit": "grams", "cost": "2.50" }],
    "total_cost": "8.00"
  },
  {
    "cookie_name": "Peanut Butter Cookies",
    "materials": [{ "material": "Flour", "amount": "200", "unit": "grams", "cost": "1.20" }, { "material": "Sugar", "amount": "150", "unit": "grams", "cost": "0.75" }, { "material": "Peanut Butter", "amount": "200", "unit": "grams", "cost": "3.00" }, { "material": "Eggs", "amount": "1", "unit": "pieces", "cost": "0.50" }],
    "total_cost": "5.45"
  },
  {
    "cookie_name": "Oatmeal Raisin Cookies",
    "materials": [{ "material": "Flour", "amount": "150", "unit": "grams", "cost": "0.90" }, { "material": "Sugar", "amount": "100", "unit": "grams", "cost": "0.50" }, { "material": "Butter", "amount": "100", "unit": "grams", "cost": "1.33" }, { "material": "Eggs", "amount": "2", "unit": "pieces", "cost": "1.00" }, { "material": "Oats", "amount": "100", "unit": "grams", "cost": "1.00" }, { "material": "Raisins", "amount": "50", "unit": "grams", "cost": "0.75" }],
    "total_cost": "5.48"
  },
  {
    "cookie_name": "Sugar Cookies",
    "materials": [{ "material": "Flour", "amount": "300", "unit": "grams", "cost": "1.80" }, { "material": "Sugar", "amount": "250", "unit": "grams", "cost": "1.25" }, { "material": "Butter", "amount": "200", "unit": "grams", "cost": "2.66" }, { "material": "Eggs", "amount": "1", "unit": "pieces", "cost": "0.50" }],
    "total_cost": "6.21"
  },
  {
    "cookie_name": "Gingerbread Cookies",
    "materials": [{ "material": "Flour", "amount": "250", "unit": "grams", "cost": "1.50" }, { "material": "Brown Sugar", "amount": "150", "unit": "grams", "cost": "0.75" }, { "material": "Butter", "amount": "100", "unit": "grams", "cost": "1.33" }, { "material": "Molasses", "amount": "100", "unit": "grams", "cost": "1.50" }, { "material": "Ginger", "amount": "5", "unit": "grams", "cost": "0.20" }, { "material": "Cinnamon", "amount": "5", "unit": "grams", "cost": "0.15" }],
    "total_cost": "5.43"
  }
]

In this sample script, the prompt is List 5 popular cookie recipes by including the following properties. Requirement materials for running the recipe. Requirement material for running the recipe. Requirement amount of material for running the recipe. Unit is grams. Cost of requirement material for running the recipe. Unit is dollar. Total cost of materials. With this approach, the JSON structure sometimes changed from run to run, and unexpected property names were used. In the result shown above, “recipe_name” became “cookie_name”, a “unit” property was added, and the numeric values were returned as strings. In other runs, a property “instructions” was added with the cooking steps as its value, or the result was not an array but an object like {"popular_cookie_recipes":[{"recipe_name":,,,]}}. In addition, when Follow response schema. is used as the prompt, completely unexpected answers are returned: the output is still JSON, but it contains unexpected properties.
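If such output has to be used anyway, some of the deviations described above can be repaired after parsing. The following is a post-processing sketch, not part of the original experiments: the function names are hypothetical, and it only handles the wrapped array, numeric strings, and extra properties. It does not guess renamed properties such as “cookie_name”; a missing “recipe_name” becomes null.

```javascript
// Return the array itself, or the single array found inside a wrapper object.
function unwrapArray(value) {
  if (Array.isArray(value)) return value;
  if (value !== null && typeof value === "object") {
    const arrays = Object.values(value).filter(Array.isArray);
    if (arrays.length === 1) return arrays[0];
  }
  return [];
}

// Convert numeric strings such as "1.50" to numbers; return null otherwise.
function toNumber(v) {
  if (typeof v === "string" && v.trim() === "") return null;
  const n = typeof v === "string" ? Number(v) : v;
  return typeof n === "number" && Number.isFinite(n) ? n : null;
}

// Keep only the properties named in sampleSchema and coerce their types.
function normalizeRecipes(value) {
  const asObject = (v) => (v !== null && typeof v === "object" ? v : {});
  return unwrapArray(value).map((item) => {
    const recipe = asObject(item);
    const materials = Array.isArray(recipe.materials) ? recipe.materials : [];
    return {
      recipe_name: typeof recipe.recipe_name === "string" ? recipe.recipe_name : null,
      materials: materials.map((entry) => {
        const m = asObject(entry);
        return {
          material: typeof m.material === "string" ? m.material : null,
          amount: toNumber(m.amount),
          cost: toNumber(m.cost),
        };
      }),
      total_cost: toNumber(recipe.total_cost),
    };
  });
}
```

Dropping unknown properties is a design choice of this sketch; an application that wants the extra information, such as “unit”, would keep them instead.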

Summary

From these results, it was found that when you want to always obtain the output values in a consistent JSON format, the combination of “response_mime_type” set to “application/json” and JSON schema is a suitable approach. In this case, the JSON schema is used in the prompt to define the desired structure of the output.

On the other hand, if you want to obtain the output values with a JSON structure that might include more information than expected, the combination of “response_mime_type” set to “application/json” and “response_schema” is more suitable. In this scenario, the JSON schema is provided to the “response_schema” parameter.

The Gemini API is under active development, and it is expected that future updates will provide more accurate control over the output format.
