Unicode normalization using Google Apps Script

Overview

This is a script for converting strings from NFD (Normalization Form Decomposition) to NFC (Normalization Form Composition) using Google Apps Script.

Description

Here, I would like to introduce a script for the unicode normalization using Google Apps Script. There are the characters with which is the voiced dot and the characters with which is the semi-voiced dot in Japanese language. When these are used for some applications, there are 2 kinds of usages for the character. For example, when for (\u306f) HA with the voiced dot, there are and ば. These unicodes are \u3070 and \u306f\u3099. Namely, there are the case which displayed 1 character as 2 characters. In most cases, the characters like \u3070 are used. This called NFC (Normalization Form Composition). But we sometimes meet the characters like \u306f\u3099. This called NFD (Normalization Form Decomposition). When the document including such characters which are displayed as 2 characters is converted to PDF file, each character is separated like は ゙. So users often want to convert the characters constructed by 2 characters to the single characters. Recently, String.prototype.normalize was added at ES2015. But ES2015 cannot be used at Google Apps Script yet. And although I had looked for the scripts like this for GAS, unfortunately, I couldn’t find. So I created this script.

The detail information and how to get this are https://github.com/tanaikech/ConvertNFDtoNFC.

 Share!