I'm building a proof-of-concept conjugation practice app in Vue.js. When you type an answer for a conjugation, the app compares the input text against the expected form using String.prototype.startsWith(). The problem comes up when Unicode characters are involved: in most cases, the characters you type do not match those stored in the database. The mismatch shows up even in a simple Node CLI session, where the "ț" I type is a different character from the "ţ" in the database.
Here are the typed input and the stored comparison value, each with its Unicode escape:
input: anunț // anun\u021B
comparison: anunţ // anun\u0163
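The difference can be checked directly by listing the code points of each string. This is only a small sketch; `codePoints` is a hypothetical helper name, not part of any library.

```javascript
// Hypothetical helper: list each character's code point as U+XXXX.
function codePoints(str) {
  return Array.from(str, (ch) =>
    'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0')
  );
}

const input = 'anun\u021B';      // typed: t with comma below
const comparison = 'anun\u0163'; // stored: t with cedilla

console.log(codePoints(input).slice(-1));      // [ 'U+021B' ]
console.log(codePoints(comparison).slice(-1)); // [ 'U+0163' ]
```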
I have experimented with methods like .normalize(), but neither the typed string nor the comparison string seems to be affected by it:
> var input = 'anunț'
> var comparison = 'anunţ'
> input === comparison
false
> input.normalize() === comparison
false
> input.normalize() === comparison.normalize()
false
> input === comparison.normalize()
false
// ...and so on with the NFC, NFD, NFKC, and NFKD forms
> input.normalize()
'anunț'
> comparison.normalize()
'anunţ'
// I've also tried .normalize() with the string decoded into Unicode
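A possible explanation for these results, as a sketch: normalization only unifies sequences that Unicode defines as equivalent, and these two letters are distinct characters (comma below versus cedilla). Decomposing with NFD makes that visible, because each letter breaks down into "t" plus a different combining mark. `escapeAll` is a hypothetical helper used only for printing.

```javascript
const input = 'anun\u021B';      // ț, t with comma below
const comparison = 'anun\u0163'; // ţ, t with cedilla

// Hypothetical helper: escape every character as \uXXXX (BMP characters only).
function escapeAll(str) {
  return Array.from(str, (ch) =>
    '\\u' + ch.charCodeAt(0).toString(16).toUpperCase().padStart(4, '0')
  ).join('');
}

// After NFD, the last two code units are "t" plus a combining mark.
console.log(escapeAll(input.normalize('NFD').slice(-2)));      // \u0074\u0326
console.log(escapeAll(comparison.normalize('NFD').slice(-2))); // \u0074\u0327
```

Since U+0326 (combining comma below) and U+0327 (combining cedilla) are different marks, no normalization form makes the two strings equal.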
I also tried converting the strings to their Unicode representation and manually replacing one set of characters with the other, but this approach has limitations. For example, it is hard to get a positive comparison until the entire string has been entered.
My next step was to look into regex comparisons, although I suspect that leads down another complicated path.
Stripped of all those attempts, this is the core logic I am aiming for:
if (this.conjugation.startsWith(this.input)) {
  this.status = "correct";
} else {
  this.status = "incorrect";
}
if (conjugation === val) {
  // okay, we are done
}
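One direction that might work, sketched under stated assumptions: fold the cedilla forms onto the comma-below forms on both sides before comparing. The names `foldRomanian` and `checkAnswer` are hypothetical, and the mapping assumes only the four Romanian s/t letters are affected; it is not a general Unicode solution.

```javascript
// Assumption: only the Romanian s/t cedilla vs. comma-below pairs need folding.
const CEDILLA_TO_COMMA = {
  '\u015E': '\u0218', // Ş -> Ș
  '\u015F': '\u0219', // ş -> ș
  '\u0162': '\u021A', // Ţ -> Ț
  '\u0163': '\u021B', // ţ -> ț
};

// Hypothetical helper: compose first (NFC), then map cedilla forms
// to their comma-below counterparts.
function foldRomanian(str) {
  return str
    .normalize('NFC')
    .replace(/[\u015E\u015F\u0162\u0163]/g, (ch) => CEDILLA_TO_COMMA[ch]);
}

// Hypothetical helper mirroring the core logic above.
function checkAnswer(conjugation, input) {
  const expected = foldRomanian(conjugation);
  const typed = foldRomanian(input);
  return {
    status: expected.startsWith(typed) ? 'correct' : 'incorrect',
    done: expected === typed,
  };
}

console.log(checkAnswer('anun\u0163', 'anun\u021B')); // { status: 'correct', done: true }
console.log(checkAnswer('anun\u0163', 'anu'));        // { status: 'correct', done: false }
```

Because the mapping works one character at a time, a partially typed answer should still compare as a valid prefix.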
Any suggestions on how I could get past this? For now I am only testing with Romanian verbs, so the characters seem to fall within the following Unicode ranges: \u0000-\u007F, \u0100-\u017F, \u0180-\u024F