Benchmark: Process cost for Parsing XML data using Google Apps Script

Gists

Introduction

In order to retrieve the values from XML data, when XML data is parsed using Google Apps Script, there are several methods for parsing the data. Class XmlService, which is a built-in Class for managing XML data, might be the first way to come up with it. At Stackoverflow, it is posted questions that XML data is often parsed using Class XmlService. It is considered that Class XmlService is suitable for managing XML data.

But, when I have created applications using XML data with Google Apps Script before, I have felt that the process of Class XmlService might be high. So, in this report, the process cost for parsing XML data using Google Apps Script. As the result, it was found that after the V8 runtime had been released, the methods except for Class XmlService in order to parse XML data can be also used.

Experimental procedure

In order to measure the process cost for parsing XML data, I used the following sample situation.

  1. Prepare multiple sample XML data.

    • In this report, 5 XML data are used.
    • You can see the sample XML data at “Appendix” section at the bottom of this report.
  2. Parse XML data and retrieve values using Google Apps Script.

    • As the methods for parsing and retrieving values from XML data, I measured the process costs of 3 sample scripts. You can see the sample scripts for testing at “Appendix” section at the bottom of this report.

      1. Using Class XmlService.
      2. Using cheerio for Google Apps Script. Ref1
      3. Using regex.

By the way, at GAS, the processing time is not stable as you know. So the average value for more than 100 times measurements was used for each data point which is shown in the figure. At this time, the fluctuation of the average values was less than 1 %. I worry that each detailed-data point in my environment might be different from that of other user’s environments. But I think that the trend of this result can be used.

Results and discussions

Fig 1. Process costs for parsing XML data.

Figure 1 shows the process costs for parsing XML data using Google Apps Script. This figure is the semi-log chart. In this case, the vertical axis is the log scale.

From Fig. 1, it was found that the process cost for parsing XML data using the regex is the lowest of all. When the script of cheerio is seen, the regex is used. But, “cheerio” has various methods and it can be used for various XML data. By this, it is considered that the process cost of “cheerio” is a bit higher than that of the simple script using the regex.

About Class XmlService, the process cost is much higher than those of “cheerio” and the simple script using regex. Here, I thought that in this case, it is required to consider the process of the scripts. By releasing V8 runtime, the process cost of Google Apps Script had been much reduced. Ref2 and Ref3 The scripts of “cheerio” and the simple script using regex are run with the script editor enabling V8 runtime. These scripts are benefitted from the V8 runtime. On the other hand, in the case of Class XmlService, the script is run on the internal serverside. By this, the script might not be benefitted from the V8 runtime. I thought that this might be the reason for this large difference in the process cost. Namely, it is considered that Class XmlService might be existing outside of the V8 runtime.

From these results, it was found that when the V8 runtime is enabled, in order to parse the XML data, when the script benefitting the V8 runtime is used without using Class XmlService, the process cost can be much reduced than the script using Class XmlService.

But, I think that Class XmlService is very useful for managing XML data. For example, Class XmlService can not only parse and read XML data but also update XML data and create new XML data. By this, I would like to propose reducing the process cost of Class XmlService. So, I reported this to the Google issue tracker. Ref4

Summary

In this report, the process costs for parsing XML data using Google Apps Script. From this report, the following result was obtained.

  1. When the V8 runtime is enabled, in order to parse the XML data, when the script benefitting the V8 runtime is used without using Class XmlService, the process cost can be much reduced than the script using Class XmlService.

  2. In the current stage, the process cost of Class XmlService is much higher than those of “cheerio” and the simple script using regex.

The process cost of this report might be modified by future updates of Google.

Appendix

Sample XML data

Sample XML data using in this report. When the number of x-axis in the figure is increased, the number of item is increased like <item id="sampleId1">,,,</item>. The number of sampleName is constant.

<?xml version="1.0"?>
<main1>
    <main2>
        <contents>
            <item id="sampleId1">
                <sampleName1>sample value 1</sampleName1>
                <sampleName2>sample value 2</sampleName2>
                <sampleName3>sample value 3</sampleName3>
                <sampleName4>sample value 4</sampleName4>
                <sampleName5>sample value 5</sampleName5>
                <sampleName6>sample value 6</sampleName6>
                <sampleName7>sample value 7</sampleName7>
                <sampleName8>sample value 8</sampleName8>
                <sampleName9>sample value 9</sampleName9>
                <sampleName10>sample value 10</sampleName10>
            </item>
            <item id="sampleId2">
                <sampleName1>sample value 1</sampleName1>
                <sampleName2>sample value 2</sampleName2>
                <sampleName3>sample value 3</sampleName3>
                <sampleName4>sample value 4</sampleName4>
                <sampleName5>sample value 5</sampleName5>
                <sampleName6>sample value 6</sampleName6>
                <sampleName7>sample value 7</sampleName7>
                <sampleName8>sample value 8</sampleName8>
                <sampleName9>sample value 9</sampleName9>
                <sampleName10>sample value 10</sampleName10>
            </item>
            <item id="sampleId3">
                <sampleName1>sample value 1</sampleName1>
                <sampleName2>sample value 2</sampleName2>
                <sampleName3>sample value 3</sampleName3>
                <sampleName4>sample value 4</sampleName4>
                <sampleName5>sample value 5</sampleName5>
                <sampleName6>sample value 6</sampleName6>
                <sampleName7>sample value 7</sampleName7>
                <sampleName8>sample value 8</sampleName8>
                <sampleName9>sample value 9</sampleName9>
                <sampleName10>sample value 10</sampleName10>
            </item>
            ,
            ,
            ,
            ,
            ,
        </contents>
    </main2>
</main1>

Sample scripts

Using XmlService

// Using XmlService
function sample1_(xml) {
  const root = XmlService.parse(xml).getRootElement();
  const contents = root
    .getChild("main2", root.getNamespace())
    .getChild("contents", root.getNamespace())
    .getChildren();
  const h = ["id", ...contents[0].getChildren().map((e) => e.getName())];
  const v = contents.map((e) => [
    e.getAttribute("id").getValue(),
    ...e.getChildren().map((f) => f.getValue()),
  ]);
  return [h, ...v];
}

Using cheerio

function sample2_(xml) {
  const $ = Cheerio.load(xml);
  const contents = $("contents").children().toArray();
  const h = [
    "id",
    ...$(contents[0])
      .children()
      .toArray()
      .map((f) => f.name),
  ];
  const v = contents.map((e) => [
    $(e).attr("id"),
    ...$(e)
      .children()
      .toArray()
      .map((f) => $(f).text()),
  ]);
  return [h, ...v];
}

Using Regex

function sample3_(xml) {
  const items = [...xml.matchAll(/<item(.+?)>(.+?)<\/item>/g)];
  const h = [
    "id",
    ...[...items[0][2].matchAll(/<(.+?)>(.+?)<\//g)].map(([, g1]) => g1),
  ];
  const v = items.map((e) => [
    e[1].split('"')[1],
    ...[...e[2].matchAll(/<(.+?)>(.+?)<\//g)].map(([, , g2]) => g2),
  ]);
  return [h, ...v];
}

Sample output values

When the above sample scripts are used for the above sample XML data, the following same result is obtained.

[
  [
    "id",
    "sampleName1",
    "sampleName2",
    "sampleName3",
    "sampleName4",
    "sampleName5",
    "sampleName6",
    "sampleName7",
    "sampleName8",
    "sampleName9",
    "sampleName10"
  ],
  [
    "sampleId1",
    "sample value 1",
    "sample value 2",
    "sample value 3",
    "sample value 4",
    "sample value 5",
    "sample value 6",
    "sample value 7",
    "sample value 8",
    "sample value 9",
    "sample value 10"
  ],
  [
    "sampleId2",
    "sample value 1",
    "sample value 2",
    "sample value 3",
    "sample value 4",
    "sample value 5",
    "sample value 6",
    "sample value 7",
    "sample value 8",
    "sample value 9",
    "sample value 10"
  ],
  [
    "sampleId3",
    "sample value 1",
    "sample value 2",
    "sample value 3",
    "sample value 4",
    "sample value 5",
    "sample value 6",
    "sample value 7",
    "sample value 8",
    "sample value 9",
    "sample value 10"
  ],
  ,
  ,
  ,
]

Reference

 Share!