From what I have read on this topic, JSON is supposed to be written with surrogate pairs automatically.
That has not been my experience, though.
I ran the code below with Node.js 6.9.2, but some characters in the output file are still not encoded as surrogate pairs.
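```javascript
const fs = require('fs')

// fs.readFile is asynchronous and returns undefined, so its result is not assigned.
fs.readFile('raw.json', 'utf8', (err, data) => {
  if (err) {
    throw err
  }
  // data is the file's text: parse it first so JSON.stringify re-serializes the
  // JSON value instead of wrapping the whole file in one quoted string.
  const output = JSON.stringify(JSON.parse(data))
  fs.writeFile('final.json', output, 'utf8', (err) => {
    if (err) {
      throw err
    }
    console.log('done')
  })
})
```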
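For reference, here is a minimal check of how `JSON.stringify` treats non-ASCII text (the sample values are made up). Per the ECMAScript specification it escapes only `"`, `\` and control characters below U+0020 (newer engines also escape lone surrogates); every other character is copied into the output unchanged.

```javascript
// Serialize an object containing a CJK character and an emoji.
const value = { title: '題', emoji: '😀' };
const json = JSON.stringify(value);

console.log(json);                 // {"title":"題","emoji":"😀"}
console.log(json.includes('\\u')); // false: no \uXXXX escapes were emitted
```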
My text editor has good Unicode support and uses a font with glyphs for every character, and it displays non-ASCII characters such as "題" in `raw.json`.
After the script writes `final.json`, those characters are still there as-is; they are not converted into surrogate pairs.
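One detail that may matter here: surrogate pairs only exist for code points above U+FFFF. "題" is U+984C, which fits in a single UTF-16 code unit, so even an encoder that escapes everything would write it as one `\u984c` escape rather than as a pair. The comparison below uses an emoji, chosen only as an example of a character outside the Basic Multilingual Plane.

```javascript
// Compare a BMP character with one outside the BMP (an astral-plane character).
const bmp = '題';
const astral = '😀';

// '題' is U+984C: one UTF-16 code unit, so no surrogate pair is involved.
console.log(bmp.length, bmp.codePointAt(0).toString(16)); // 1 984c

// '😀' is U+1F600: stored in UTF-16 as a surrogate pair (two code units).
console.log(astral.length); // 2
console.log(astral.charCodeAt(0).toString(16), astral.charCodeAt(1).toString(16)); // d83d de00
```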
I also tried changing the output file's encoding from `utf8` to `utf16le`, but that did not help either.
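A likely reason the `utf16le` attempt changed nothing visible: the encoding argument of `fs.writeFile` only decides how characters are turned into bytes on disk; it never rewrites a character into an ASCII `\uXXXX` escape. A small sketch using `Buffer` (the sample object is made up):

```javascript
// Same JSON text, encoded to bytes two different ways.
const text = JSON.stringify({ title: '題' });
const utf8Bytes = Buffer.from(text, 'utf8');
const utf16Bytes = Buffer.from(text, 'utf16le');

// Only the byte representation of '題' differs between the two encodings.
console.log(Buffer.from('題', 'utf8').toString('hex'));    // e9a18c
console.log(Buffer.from('題', 'utf16le').toString('hex')); // 4c98

// Decoding either buffer returns the original text, which still has no \u escape.
console.log(utf8Bytes.toString('utf8') === text);     // true
console.log(utf16Bytes.toString('utf16le') === text); // true
console.log(text.includes('\\u'));                    // false
```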
Is there a way to force JSON encoding to use surrogate pairs?
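`JSON.stringify` has no option for ASCII-only output, so one common workaround is to post-process the serialized string and replace every non-ASCII UTF-16 code unit with a `\uXXXX` escape. This is a sketch, not a built-in feature; `toAsciiJson` is a hypothetical helper name, and it assumes the input is something `JSON.stringify` accepts.

```javascript
// Hypothetical helper: escape every UTF-16 code unit from U+007F upward as \uXXXX.
// The regex has no 'u' flag, so it matches single code units; a character outside
// the BMP (e.g. '😀') is therefore written as two escapes, i.e. a surrogate pair.
function toAsciiJson(value) {
  return JSON.stringify(value).replace(/[\u007f-\uffff]/g, (ch) =>
    '\\u' + ('0000' + ch.charCodeAt(0).toString(16)).slice(-4)
  );
}

console.log(toAsciiJson({ title: '題', emoji: '😀' }));
// {"title":"\u984c","emoji":"\ud83d\ude00"}
```

In the script from the question, this would mean passing the parsed file contents to `toAsciiJson` in place of the `JSON.stringify` call. Non-ASCII characters can only appear inside JSON strings, so replacing them with escapes keeps the output valid JSON, and `JSON.parse` turns it back into the same value.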