To make spell checking in JavaScript accurate, I need to normalize the text by collapsing extra whitespace before checking for typos. However, the original text must stay intact, so the typo indexes found in the normalized text have to be adjusted afterwards to point at the right positions in the original text.
Normalizing the text also lets me cache the results. Texts that differ only in spacing normalize to the same string, so I can retrieve their typos from the cache instead of running the spell check again each time.
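Because the normalized text is identical for inputs that differ only in spacing, it can serve directly as a cache key. A minimal sketch, assuming a hypothetical `checkSpelling(text)` function (a stand-in for whatever spell checker is actually used) that returns typo records for the text it receives; `createCachedChecker` is likewise a hypothetical name:

```javascript
// Wrap a spell checker so its results are cached by normalized text.
// `checkSpelling` is a hypothetical stand-in: (text) => array of typo records.
function createCachedChecker(checkSpelling) {
  const cache = new Map(); // normalized text -> typo records

  return function check(text) {
    // Collapse runs of spaces, as in step 1 below.
    const normalized = text.replaceAll(/ +/g, ' ');
    if (!cache.has(normalized)) {
      cache.set(normalized, checkSpelling(normalized));
    }
    return cache.get(normalized);
  };
}

// Usage with a toy checker that only flags "lik" and counts its calls.
let calls = 0;
const check = createCachedChecker((text) => {
  calls++;
  return [...text.matchAll(/lik/g)].map((m) => ({ word: m[0], index: m.index }));
});
check('I  lik cat.');
check('I lik   cat.'); // same normalized text, so served from the cache
```

The cached indexes refer to the normalized text, which is why they still have to be mapped back to the original afterwards.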
The process involves three main steps:
- Replace each run of consecutive spaces with a single space.
// "I     lik cat." -> "I lik cat."
// The g flag is required: replaceAll throws a TypeError for a non-global regex.
// Use /\s+/g instead to also collapse tabs and newlines.
const regex = / +/g;
"I     lik cat.".replaceAll(regex, ' ');
- Find the starting index of each word (here, the typo "lik") in the normalized text.
// In the normalized text, "lik" starts at index 2.
// matchAll also requires a global (g) regex.
const regexp = /lik/g;
const str = 'I lik cat.';
const startingIndexes = [...str.matchAll(regexp)].map((match) => match.index);
// startingIndexes -> [2]
- Map the results back to the original text by readjusting the starting indexes accordingly.
"I lik cat." -> "I     lik cat.", starting index 2 -> 6
I am still exploring efficient ways to do this last step, i.e. to map an index in the normalized text back to the corresponding position in the original text.
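Since the original string is kept intact, nothing has to be reverted as such; only the indexes need translating. One possible approach, sketched here under the assumption that only runs of spaces are collapsed (exactly what the `/ +/g` regex in step 1 does), is to record just the points where the offset between normalized and original positions changes and binary-search them. That needs one entry per collapsed run rather than one per character. `buildShiftTable`, `toOriginalIndex`, and `findInOriginal` are hypothetical names, not an existing API:

```javascript
// Record where the shift between normalized and original positions changes:
// starts[k] is a normalized index, shifts[k] is how many spaces were dropped
// before it. Same rule as replaceAll(/ +/g, ' '): keep the first space of
// each run and drop the rest.
function buildShiftTable(original) {
  const starts = [];
  const shifts = [];
  let removed = 0; // spaces dropped so far
  let kept = 0; // characters kept so far (= current normalized index)
  for (let i = 0; i < original.length; i++) {
    if (original[i] === ' ' && i > 0 && original[i - 1] === ' ') {
      removed++;
      continue;
    }
    // First kept character after a newly collapsed run: the shift grows.
    if (removed > 0 && (shifts.length === 0 || shifts[shifts.length - 1] !== removed)) {
      starts.push(kept);
      shifts.push(removed);
    }
    kept++;
  }
  return { starts, shifts };
}

// Binary-search the last entry with starts[k] <= normalizedIndex; add its shift.
function toOriginalIndex(table, normalizedIndex) {
  let lo = 0;
  let hi = table.starts.length - 1;
  let shift = 0;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (table.starts[mid] <= normalizedIndex) {
      shift = table.shifts[mid];
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return normalizedIndex + shift;
}

// Steps 1-3 together: normalize, match a global regex, map each start back.
function findInOriginal(original, regexp) {
  const normalized = original.replaceAll(/ +/g, ' ');
  const table = buildShiftTable(original);
  return [...normalized.matchAll(regexp)].map((m) => toOriginalIndex(table, m.index));
}

findInOriginal('I     lik cat.', /lik/g); // [6]
```

Building the table is one pass over the text, and each lookup is O(log k) for k collapsed runs. If the table is cached alongside the spell-check results, note that it depends on the original text, not the normalized one: two originals with different spacing can share typo results but need different tables.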