Decoding UTF-16 should give you a string of abstract characters, not a conversion to UTF-8. JavaScript strings happen to be stored internally as UTF-16 (or UCS-2 in older engines), but once the data is decoded you can manipulate characters without worrying about encodings.
Note that simply removing the null bytes is not a way to decode UTF-16. It only works for the first 256 Unicode code points, where the high byte of every code unit is zero. Any other character, such as an em dash or a smart quote, comes out garbled.
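To make the difference concrete, here is a minimal sketch that runs both approaches on the same bytes. The byte string is an illustrative value (the UTF-16LE encoding of “hi” in smart quotes), written as a binary string with one character per byte, which is the shape `atob` returns.

```javascript
// UTF-16LE bytes for U+201C, "h", "i", U+201D (illustrative input)
var bytes = "\x1C\x20h\x00i\x00\x1D\x20";

// Naive approach: drop every null byte and keep the rest as characters
var stripped = bytes.replace(/\x00/g, "");

// Proper approach: combine each low/high byte pair into one 16-bit code unit
var units = [];
for (var i = 0; i < bytes.length; i += 2) {
    units.push(bytes.charCodeAt(i) | (bytes.charCodeAt(i + 1) << 8));
}
var decoded = String.fromCharCode.apply(String, units);

console.log(JSON.stringify(stripped)); // quote bytes survive as control characters and spaces
console.log(decoded);                  // the smart quotes are intact
```

The letters survive null stripping only because their high byte is zero. The two bytes of each smart quote are non-zero, so they are left behind as two unrelated characters.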
The example in the code snippet appears to be UTF-16LE, in which the low byte of each 16-bit code unit comes first.
// Simple decoder function; assumes valid input with an even number of bytes
function decodeUTF16LE(binaryStr) {
    var cp = [];
    for (var i = 0; i < binaryStr.length; i += 2) {
        // Combine the low byte and the high byte into one 16-bit code unit
        cp.push(
            binaryStr.charCodeAt(i) |
            (binaryStr.charCodeAt(i + 1) << 8)
        );
    }
    return String.fromCharCode.apply(String, cp);
}
var base64decode = atob; // Native base64 decoder, available in browsers such as Chrome and Firefox
var base64 = "VABlAHMAdABpAG4AZwA";
var binaryStr = base64decode(base64);
var result = decodeUTF16LE(binaryStr);
//"Testing"
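One caveat with the decoder above: `String.fromCharCode.apply` passes every code unit as a separate argument, and engines limit how many arguments a call may take, so very large inputs can throw a `RangeError`. A hedged sketch of a batched variant follows. The name `decodeUTF16LEChunked` and the `CHUNK_UNITS` value are my own illustrative choices, not documented limits.

```javascript
// Hypothetical batch size; chosen for illustration, not taken from any engine's documentation
var CHUNK_UNITS = 8192;

// Same decoding as decodeUTF16LE, but converts the code units in batches
function decodeUTF16LEChunked(binaryStr) {
    var parts = [];
    var cp = [];
    for (var i = 0; i < binaryStr.length; i += 2) {
        cp.push(binaryStr.charCodeAt(i) | (binaryStr.charCodeAt(i + 1) << 8));
        if (cp.length === CHUNK_UNITS) {
            parts.push(String.fromCharCode.apply(String, cp));
            cp = [];
        }
    }
    // Flush whatever is left in the final, partially filled batch
    if (cp.length > 0) {
        parts.push(String.fromCharCode.apply(String, cp));
    }
    return parts.join("");
}
```

Splitting a surrogate pair across two batches is harmless here, because the batches are plain sequences of code units and joining them puts the pair back together.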
The same function also handles characters outside the first 256 code points, such as smart quotes, without any changes:
var base64 = "HCBoAGUAbABsAG8AHSA=";
var binaryStr = base64decode(base64);
var result = decodeUTF16LE(binaryStr);
//"“hello”"
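If the runtime provides `TextDecoder`, the byte-pair loop can be delegated to it. The following is a sketch under that assumption. The function name `decodeUTF16LEWithTextDecoder` is hypothetical, and the input is again a binary string with one character per byte.

```javascript
// Assumes a runtime that provides TextDecoder with the "utf-16le" label
function decodeUTF16LEWithTextDecoder(binaryStr) {
    // Copy each byte of the binary string into a typed array
    var bytes = new Uint8Array(binaryStr.length);
    for (var i = 0; i < binaryStr.length; i++) {
        bytes[i] = binaryStr.charCodeAt(i);
    }
    return new TextDecoder("utf-16le").decode(bytes);
}

// Same bytes as the smart-quote example, written out as escapes
var quoted = decodeUTF16LEWithTextDecoder("\x1C\x20h\x00e\x00l\x00l\x00o\x00\x1D\x20");
console.log(quoted);
```

Unlike the hand-written decoder, `TextDecoder` by default drops a leading byte order mark and replaces malformed sequences such as lone surrogates with U+FFFD, so the two can differ on invalid input.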