Suppose you have the following regular expression:
\w+
In this case, the string "helloworld" would be accepted:
helloworld
However, "héllowörld" would not pass the test:
héllowörld
The regex will stop at é (and also break at ö), even though to a human, héllowörld doesn't seem too far-fetched as a complete word.
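The behavior described here can be reproduced with a minimal Python sketch. It assumes an engine whose \w is ASCII-only. Python's re module matches Unicode letters by default for str patterns, so the re.ASCII flag is used to mimic the ASCII-only behavior. The name WORD is purely illustrative.

```python
import re

# re.ASCII restricts \w to [a-zA-Z0-9_], mimicking an ASCII-only regex engine.
# WORD is a hypothetical name chosen for this sketch.
WORD = re.compile(r"\w+", re.ASCII)

plain = "helloworld"
# Escapes are used so the precomposed code points é (U+00E9) and ö (U+00F6) are unambiguous.
accented = "h\u00e9llow\u00f6rld"

# The plain string is matched as a single word.
print(WORD.findall(plain))

# The accented string is split into fragments at é and at ö.
print(WORD.findall(accented))
```

Under these assumptions, the first call yields one match covering the whole string, while the second yields three fragments with é and ö left out.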
Is there a way to make \w include special word characters such as é and ö? Or must every special Latin character be added to the pattern by hand, like so:
[\wéèåöä...........]+
Finding and adding every possible special Latin character by hand does not seem like an efficient or maintainable solution.
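To make the drawback concrete, here is a small Python sketch of the hand-written character class, again assuming an ASCII-only \w (forced with re.ASCII). Only the five characters actually listed in the pattern are included, since the rest are elided. MANUAL is a hypothetical name, and "smørrebrød" is just an example word containing a letter that is not in the list.

```python
import re

# Hand-maintained class: ASCII \w plus the five characters listed in the pattern.
# re.ASCII only narrows \w; the literal non-ASCII characters still match themselves.
MANUAL = re.compile(r"[\wéèåöä]+", re.ASCII)

# Every character in this word is covered by the class, so the whole string matches.
print(MANUAL.fullmatch("héllowörld") is not None)

# ø was never added to the class, so this word is rejected.
print(MANUAL.fullmatch("smørrebrød") is not None)
```

The second check fails only because ø is missing from the list, which is exactly the maintenance problem: every character that was not anticipated has to be discovered and added later.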
What other alternatives are available?