I am writing a JSON validator from scratch and am stuck on the string component. My original plan was to write a regex that follows the string diagram on JSON.org:
https://i.sstatic.net/JKW9V.gif
Here is the regex I have come up with so far:
/^\"((?=\\)\\(\"|\/|\\|b|f|n|r|t|u[0-9a-f]{4}))*\"$/
This regex matches the empty string and strings made up entirely of escape sequences (a backslash followed by one of the allowed characters). My problem is how to incorporate Unicode characters.
Is there a regex pattern that matches any Unicode character except ", \, or control characters? Would it also match a newline or a horizontal tab?
I noticed that while the regex matches the string "\t", it does not match "    " (four spaces meant to signify a tab). I could extend the regex to handle this case, but my hunch is that the horizontal tab is itself a Unicode character.
Credit goes to Jaeger Kor for updating my regex to the following:
/^\"((?=\\)\\(\"|\/|\\|b|f|n|r|t|u[0-9a-f]{4})|[^\\"]*)*\"$/
This revised regex seems to work, but should I check for control characters separately? Or is that unnecessary, given that regular-expressions.info classifies them as non-printable characters? The input being validated always comes from a textarea.
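To make the control-character question concrete, here is a minimal sketch, assuming the flavor is JavaScript (the patterns are written as /.../ literals). The helper parsesAsJsonString and the sample names are hypothetical and only for illustration. JSON.parse serves as a reference for what counts as a valid JSON string.

```javascript
// The revised pattern from above, unchanged.
const revised = /^\"((?=\\)\\(\"|\/|\\|b|f|n|r|t|u[0-9a-f]{4})|[^\\"]*)*\"$/;

// Hypothetical helper: true when JSON.parse accepts the text as a JSON string.
function parsesAsJsonString(text) {
  try {
    return typeof JSON.parse(text) === "string";
  } catch (err) {
    return false;
  }
}

// Each sample is a complete JSON string token, quotes included.
const samples = {
  escapedTab: '"a\\tb"', // backslash followed by t (two characters)
  rawTab: '"a\tb"',      // an actual U+0009 character
  rawNewline: '"a\nb"',  // an actual U+000A character
};

for (const [name, text] of Object.entries(samples)) {
  // Columns: sample name, regex verdict, JSON.parse verdict.
  console.log(name, revised.test(text), parsesAsJsonString(text));
}
// escapedTab true true
// rawTab true false
// rawNewline true false
```

In this sketch the class [^\\"] also matches raw tabs and newlines, so the revised pattern accepts input that JSON.parse rejects. That suggests the control characters do need to be excluded explicitly, especially since a textarea can contain raw newlines.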
As an update, here is the finalized regex for reference:
/^("(((?=\\)\\(["\\\/bfnrt]|u[0-9a-fA-F]{4}))|[^"\\\0-\x1F\x7F]+)*")$/
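For anyone who wants to try it, here is a small sketch of how the final pattern could be exercised in JavaScript. The function name isJsonString and the test cases are my own illustrative choices, not part of any library, and the inputs are kept short on purpose.

```javascript
// The final pattern from above, unchanged.
const jsonString = /^("(((?=\\)\\(["\\\/bfnrt]|u[0-9a-fA-F]{4}))|[^"\\\0-\x1F\x7F]+)*")$/;

// Hypothetical wrapper: true when the whole text is one JSON string token.
function isJsonString(text) {
  return jsonString.test(text);
}

// [input, expected verdict]; inputs include the surrounding quotes.
const cases = [
  ['""', true],          // empty string
  ['"hello"', true],     // plain characters
  ['"a\\tb"', true],     // escaped tab: backslash followed by t
  ['"\\u00E9"', true],   // \u escape with four hex digits
  ['"é"', true],         // raw non-ASCII character
  ['"a\tb"', false],     // raw tab (U+0009) is a control character
  ['"a\nb"', false],     // raw newline (U+000A) is a control character
  ['"a"b"', false],      // unescaped quote inside the string
  ['"\\x41"', false],    // \x is not a JSON escape
  ['"\\u00g9"', false],  // invalid hex digit in \u escape
];

for (const [text, expected] of cases) {
  const verdict = isJsonString(text) === expected ? "ok" : "MISMATCH";
  console.log(JSON.stringify(text), verdict);
}
```

One difference worth flagging: the pattern also rejects a raw U+007F (DEL), whereas JSON.parse accepts it, since RFC 8259 only requires escaping U+0000 through U+001F. Whether to keep \x7F in the excluded set depends on how strictly the validator should follow the specification.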