Characters as users perceive them, known as graphemes, can consist of multiple code points in Unicode.
According to Unicode® Standard Annex #29:
Users may perceive a character as a single unit of writing in a language, but it could actually be represented by several Unicode code points. This concept is called a user-perceived character to avoid confusion with the computer's use of the term character. For example, "G" + grave-accent forms a user-perceived character which consists of two Unicode code points. These characters are approximated by grapheme clusters that can be determined programmatically.
Is there a regular expression (in JavaScript) that will match a single grapheme cluster at the start of a string? For example:
"한bar".match(/*?*/)[0] === "한"
"நிbaz".match(/*?*/)[0] === "நி"
"aa".match(/*?*/)[0] === "a"
"\r\n".match(/*?*/)[0] === "\r\n"
"💆♂️foo".match(/*?*/)[0] === "💆♂️"
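For comparison, here is a minimal sketch of the behaviour I am after, written with `Intl.Segmenter` instead of a regular expression. It assumes a runtime that ships `Intl.Segmenter` with grapheme granularity (recent browsers and Node.js versions do). `firstGrapheme` is a hypothetical helper name, and the emoji case is written with explicit escapes on the assumption that it is the ZWJ sequence U+1F486 U+200D U+2642 U+FE0F.

```javascript
// Hypothetical helper (not a regex): returns the first extended grapheme
// cluster of `text`, or "" when `text` is empty.
function firstGrapheme(text) {
  // Grapheme granularity follows the UAX #29 extended grapheme cluster rules.
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  // segment() returns an iterable of { segment, index, input } records;
  // only the first record is needed here.
  for (const { segment } of segmenter.segment(text)) {
    return segment;
  }
  return "";
}

// The cases from the question, each expected to log `true`.
console.log(firstGrapheme("한bar") === "한");
console.log(firstGrapheme("\u0BA8\u0BBFbaz") === "\u0BA8\u0BBF"); // Tamil NA + vowel sign I
console.log(firstGrapheme("aa") === "a");
console.log(firstGrapheme("\r\n") === "\r\n");
// Assumed ZWJ sequence: person getting massage + ZWJ + male sign + VS16.
console.log(
  firstGrapheme("\u{1F486}\u200D\u2642\uFE0Ffoo") === "\u{1F486}\u200D\u2642\uFE0F"
);
```

This does what the examples describe, but it is an API call rather than a pattern, so it cannot be combined with other regex syntax. That is why I am still asking whether a regular expression exists.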