When parsing CSV files, I encountered values that include strings representing JSON objects, along with boolean, normal strings, and other data. The CSV file has a header, and as I loop through the non-header rows, I utilize Javascript's split method with a specific regex pattern to extract the value from each 'cell' of the CSV row:
let currentLine = lines[i].split(/,(?=(?:(?:[^"]*"){2})*[^"]*$)/)
However, the current method fails to properly separate lines containing valid JSON strings. Some strings are captured correctly, but others like the one below are truncated in strange places, disrupting the parsing of the entire CSV file:
'{"an object" : [{"sub-object 1": {
"description": "sub-object is for blah blah",
"nestedArray": ["param1","param2","param3"]}},
{"sub-object 2": {
"description": "sub-object is for blah blah",
"nestedArray": ["param1","param2","param3"]}},
{"sub-object 3": {
"description": "sub-object is for blah blah",
"nestedArray": ["param1","param2","param3"]}}
]
}'
Upon inspection, the above appears to be valid JSON (strangely, validates it when pasted without the single quotes - the reason behind this behavior is unclear). Any suggestions on how to correctly handle this parsing issue? While I initially thought it would be straightforward, embedding JSON within CSV values is proving to be a challenge. I'm unsure if there is a standard or recommended approach to address this.
EDIT: To clarify, if anyone has a regex pattern that can correctly capture the JSON string within a CSV file as shown above, I would greatly appreciate it if you could share it with me