Analyzing Google Apps Script from Stackoverflow

Gists

Analyzing Google Apps Script from Stackoverflow

Abstract

The report details a novel Gemini API method to analyze big data beyond AI context window limits, which was validated with Stack Overflow data for insights into Google Apps Script’s potential.

Introduction

Generative AI models face significant limitations when processing massive datasets, primarily due to the constraints imposed by their fixed context windows. Current methods thus struggle to analyze the entirety of big data within a single API call, preventing comprehensive analysis. To address this challenge, I have developed and published a detailed report presenting a novel approach using the Gemini API for comprehensive big data analysis, designed to operate effectively beyond typical model context window limits. Ref

This report introduces and validates the results achieved by applying this Gemini API approach to a substantial real-world dataset. The dataset analyzed comprises questions, answers, and comments retrieved directly from Stack Overflow, demonstrating the method’s applicability to large-scale, complex textual data. The analysis results provide insights into the current situation and future potential of Google Apps Script.

Workflow

Analyzing Google Apps Script from Stackoverflow

The above workflow image is described as follows:

  1. An initial Input prompt is provided.
  2. Big data is retrieved from Stack Overflow.
  3. The retrieved big data is split into individual Questions including Answers and Comments. These are then organized into a list.
  4. Inside Class AnalyzeBigData, the data is split into chunks. Each chunk’s token count is kept below the input token limitation of the Gemini API.
  5. Generate a summary for each chunk using the given prompt. This step utilizes Gemini to process each chunk.
  6. If the Number of chunks is greater than 1, the generated summaries are processed again by the AnalyzeBigData class. This indicates a recursive or iterative process.
  7. If the number of chunks is 1, the final result is returned. This is the termination condition for the process.

Big data

The data for this report was sourced from Stack Overflow, specifically focusing on the google-apps-script tag. Utilizing the StackExchange API, a comprehensive dataset was retrieved, encompassing all questions posted within the specified timeframe, alongside all associated answers and comments. The data collection included essential metadata for each post, such as creation date, score, view count, and relationships between questions, answers, and comments.

The chosen timeframe for data retrieval is from January 1, 2020, through April 24, 2025. This period was selected to align with ongoing annual trend analysis and to specifically investigate recent shifts in community activity. As previously documented in reports such as “Trend of google-apps-script Tag on Stackoverflow 2025”, Fig. 1 illustrates a decrease in the number of new questions tagged google-apps-script in recent years. This decline is understood to be significantly influenced by the increasing use of AI tools for coding assistance. Therefore, collecting data from 2020 allows for an analysis that covers the period leading up to and during the more prominent rise of AI tools, enabling a focused examination of their potential impact on the google-apps-script community on Stack Overflow and providing insights into the future trajectory of this technology under these changing circumstances.

The data for all questions from 2020 to April 24, 2025, on Stack Overflow is as follows.

Script

In this case, in order to obtain the final result, I used the following 2 scripts.

  1. The original JSON data was split into chunks, and JSON data was generated from each chunk.
  2. A summary was generated from the generated JSON data.

When you test these scripts, first, please get analyze_big_data_by_Gemini.py from my repository. https://github.com/tanaikech/analyze_big_data_by_Gemini

Also, before you test these scripts, it is required to retrieve the data from Stackoverflow. The JSON schema of the data can be seen at “json_schema” of “Script 1”.

Script 1

In this script, the big data retrieved from Stackoverflow is split into chunks, and the summaries are generated from each chunk. And, the result is saved as a file generated_contents.txt.

In this case, both the start and end data are JSON data.

from analyze_big_data_by_Gemini import AnalyzeBigData
import json
import os

api_key = "###"  # Your API key.
filename = "sample_data_file.txt"
file_path = os.path.join("./", filename)

data = []
with open(file_path, "r", encoding="utf-8") as f:
    content = f.read()
    data = json.loads(content)

json_schema = {
    "title": "Stack Overflow Data Schema",
    "description": "Schema for Stack Overflow question and answer data",
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "data": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "title": {
                            "type": "string",
                            "description": "Title of the Stack Overflow question",
                        },
                        "link": {
                            "type": "string",
                            "format": "url",
                            "description": "Link to the Stack Overflow question",
                        },
                        "date": {
                            "type": "string",
                            "format": "date-time",
                            "description": "Date and time the question was posted",
                        },
                        "question": {
                            "type": "string",
                            "description": "Content of the Stack Overflow question. This is a markdown format.",
                        },
                        "comments": {
                            "type": "array",
                            "description": "Comments on the question",
                            "items": {"type": "string"},
                        },
                        "answers": {
                            "type": "array",
                            "description": "Answers to the question",
                            "items": {
                                "type": "object",
                                "properties": {
                                    "date": {
                                        "type": "string",
                                        "format": "date-time",
                                        "description": "Date and time the answer was posted",
                                    },
                                    "is_accepted": {
                                        "type": "boolean",
                                        "description": "Indicates if the answer is accepted",
                                    },
                                    "answer": {
                                        "type": "string",
                                        "description": "Content of the answer. This is a markdown format.",
                                    },
                                    "comments": {
                                        "type": "array",
                                        "description": "Comments on the answer. This is a markdown format.",
                                        "items": {"type": "string"},
                                    },
                                },
                                "required": [
                                    "date",
                                    "is_accepted",
                                    "answer",
                                    "comments",
                                ],
                            },
                        },
                    },
                    "required": [
                        "title",
                        "link",
                        "date",
                        "question",
                        "comments",
                        "answers",
                    ],
                },
            }
        },
        "required": ["data"],
    },
}
prompt = [
    "There is JSON data in the uploaded file of the attachment. This JSON data is retrieved from Stackoverflow.",
    "The JSON schema of the JSON data is JSONSchema.",
    f"<JSONSchema>${json.dumps(json_schema)}</JSONSchema>",
    "Understand the given whole data, including questions, answers to each question, comments to each question, and comments to each answer, and summarize it to investigate a trend of Google Apps Script.",
    "<IncludePoints>",
    "- Relationship between Google Apps Script and AI.",
    "- Activity of users with Google Apps Script on Stackoverflow.",
    "- Highly interested in fields with Google Apps Script.",
    "- The reason why the number of questions and answers decreases every year after 2020.",
    "- Future potential of Google Apps Script.",
    "- Future challenges of Google Apps Script.",
    "- Strategy for increasing the users of Google Apps Script on Stackoverflow.",
    "- Future role of Stackoverflow to the users of Google Apps Script.",
    "</IncludePoints>",
    "Include the above points in the property of 'content' in the response schema.",
]
response_schema = {
    "description": "JSON schema for the generated content.",
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "date": {
                "type": "object",
                "properties": {
                    "start": {
                        "type": "string",
                        "description": "Start date of the question in the data.",
                    },
                    "end": {
                        "type": "string",
                        "description": "End date of the question in the data.",
                    },
                },
                "required": ["start", "end"],
            },
            "content": {"type": "string", "description": "Generated summary."},
        },
        "required": ["date", "content"],
    },
}
object = {
    "api_key": api_key,
    "data": data,
    "prompt": "\n".join(prompt),
    "response_mime_type": "application/json",
    "response_schema": response_schema,
}
res = AnalyzeBigData().run(object)
with open("./generated_contents.txt", "w", encoding="utf-8") as f:
    json.dump(res, f, indent=2)

Script 2

In this script, the file generated_contents.txt created by “Sample 1” is split into chunks, and a summary is generated from all chunks. And, the result is saved as a file result.txt. This is the final result.

In this case, the start and end data are JSON data and text data, respectively.

from analyze_big_data_by_Gemini import AnalyzeBigData
import json
import os

api_key = "###"  # Your API key.
filename = "generated_contents.txt"
file_path = os.path.join("./", filename)

data = []
with open(file_path, "r", encoding="utf-8") as f:
    content = f.read()
    data = json.loads(content)

json_schema = {
    "description": "JSON schema for the generated content.",
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "date": {
                "type": "object",
                "properties": {
                    "start": {
                        "type": "string",
                        "description": "Start date of the question in the data.",
                    },
                    "end": {
                        "type": "string",
                        "description": "End date of the question in the data.",
                    },
                },
                "required": ["start", "end"],
            },
            "content": {"type": "string", "description": "Generated summary."},
        },
        "required": ["date", "content"],
    },
}
prompt = [
    "There is JSON data in the uploaded file of the attachment. This JSON data is retrieved from Stackoverflow.",
    "The JSON schema of the JSON data is JSONSchema.",
    f"<JSONSchema>${json.dumps(json_schema)}</JSONSchema>",
    "Understand the given whole data, including questions, answers to each question, comments to each question, and comments to each answer, and summarize it to investigate a trend of Google Apps Script.",
    "<IncludePoints>",
    "- Relationship between Google Apps Script and AI.",
    "- Activity of users with Google Apps Script on Stackoverflow.",
    "- Highly interested in fields with Google Apps Script.",
    "- The reason why the number of questions and answers decreases every year after 2020.",
    "- Future potential of Google Apps Script.",
    "- Future challenges of Google Apps Script.",
    "- Strategy for increasing the users of Google Apps Script on Stackoverflow.",
    "- Future role of Stackoverflow to the users of Google Apps Script.",
    "</IncludePoints>",
    "Include the above points in the result.",
]
object = {
    "api_key": api_key,
    "data": data,
    "prompt": "\n".join(prompt),
    "limit_length": 1000000,
}
res = AnalyzeBigData().run(object)
if len(res) == 1:
    with open("./result.txt", "w", encoding="utf-8") as f:
        f.write(res[0])

Result

When the summary is generated from this big data with the above workflow using these 2 scripts, the following result was obtained. It’s content in the file result.txt. You can see that the questions related to Google Apps Script from January 1, 2020, to April 24, 2025, are reflected in the result.

Generated summary

Summary

 Share!