While parsing CSV files may seem straightforward at first glance, it can quickly become complex depending on the source of your CSV data (whether it's user-generated, standardized from an API, etc.). Typical complications include:
- Dealing with headers
- Varying numbers of headers and data columns
- Different delimiters (e.g., Germans using ; instead of ,)
- Number formatting differences (e.g., Germans using , as the decimal separator)
- Quoted data that may contain delimiters
- Whitespace handling
- Different line endings
- ...
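To illustrate the quoted-data item in the list above, here is a minimal sketch of how a naive split miscounts columns when a quoted field contains the delimiter. The sample line and the variable names are made up for illustration:

```javascript
// Hypothetical sample line: the second field is quoted and contains a comma
const quotedLine = 'Alice,"Berlin, Germany",42';

// A naive split treats the comma inside the quotes as a delimiter
const naiveFields = quotedLine.split(',');

console.log(naiveFields.length); // 4 instead of the intended 3
console.log(naiveFields);        // [ 'Alice', '"Berlin', ' Germany"', '42' ]
```

A quote-aware parser would be expected to return three fields here, with the quotes removed from the second one.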
That is why there are numerous CSV parsers available on npm (https://www.npmjs.com/search?q=csv). Some focus on speed like https://www.npmjs.com/package/fast-csv, while others prioritize convenience such as https://www.npmjs.com/package/papaparse.
Most of these parsers return the data row by row, which suits streams that deliver a file line by line rather than column by column.
Here is a code snippet to organize your data column-wise:
const input = `header_1,header_2
value 1,value 2
value 3,value 4`
// Separate rows based on known line ending \n
const rows = input.split('\n');
// Extract the header row
const header = rows.shift();
// Determine number of columns by splitting the header using , delimiter
const numberOfColumns = header.split(',').length;
// Initialize 2D array with one empty array per column
const columnData = [...Array(numberOfColumns)].map(() => []);
for (let i = 0; i < rows.length; i++) {
    const row = rows[i];
    const rowData = row.split(',');
    // Assuming equal number of columns in header and data rows
    for (let j = 0; j < numberOfColumns; j++) {
        columnData[j].push(rowData[j]);
    }
}
console.log("columnData = " + JSON.stringify(columnData, null, 4));
Output will be:
columnData = [
    [
        "value 1",
        "value 3"
    ],
    [
        "value 2",
        "value 4"
    ]
]
Note: This example does not cover tasks such as removing whitespace or converting numerical values.
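As a rough sketch of what such a cleanup step could look like, the helper below trims each cell and converts it to a number when possible. The name cleanCell and the sample values are hypothetical, and the sketch assumes a decimal point: a value such as "1,5" with a German decimal comma stays a string.

```javascript
// Hypothetical helper: trim a raw cell and convert it to a number when it parses as one
function cleanCell(rawValue) {
    const trimmed = rawValue.trim();
    // Keep empty cells as strings; Number('') would otherwise yield 0
    if (trimmed === '') {
        return trimmed;
    }
    const asNumber = Number(trimmed);
    // Fall back to the trimmed string when the cell is not numeric
    return Number.isNaN(asNumber) ? trimmed : asNumber;
}

// Made-up column data for illustration
const rawColumn = [' 1 ', '2.5', ' value 3'];
const cleanedColumn = rawColumn.map(cleanCell);
console.log(cleanedColumn); // [ 1, 2.5, 'value 3' ]
```

Note that Number() also accepts forms such as "1e3" or "0x10", so stricter validation may be needed depending on the data.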
For easier handling, you can utilize papaparse to parse data row by row and then apply a nested for loop to arrange the data into columns:
const Papa = require('papaparse');
// Example data with ; delimiter and a quoted value
const input = `header_1;header_2
1;"2"
3;4`;
// Papaparse options configuration
const parseOptions = {
    quoteChar: '"',
    delimiter: ';',
    skipEmptyLines: true,
    dynamicTyping: true,
};
const parseResult = Papa.parse(input, parseOptions);
const parsedData = parseResult.data;
// Extract the header row
const header = parsedData.shift();
const numberOfColumns = header.length;
// Initialize 2D array with one empty array per column
const columnData = [...Array(numberOfColumns)].map(() => []);
for (let i = 0; i < parsedData.length; i++) {
    const rowData = parsedData[i];
    for (let j = 0; j < numberOfColumns; j++) {
        columnData[j].push(rowData[j]);
    }
}
console.log("columnData = " + JSON.stringify(columnData, null, 4));
Output will be:
columnData = [
    [
        1,
        3
    ],
    [
        2,
        4
    ]
]
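The nested for loop used in both examples can also be written more compactly. The following is only a sketch of one alternative; the helper name rowsToColumns and the sample rows are hypothetical, and it takes already parsed rows (arrays of cells) as input:

```javascript
// Hypothetical helper: turn an array of rows into an array of columns
function rowsToColumns(rows, numberOfColumns) {
    // Build one array per column, collecting the cell at that index from every row
    return Array.from({ length: numberOfColumns }, (_, columnIndex) =>
        rows.map(row => row[columnIndex])
    );
}

// Made-up parsed rows for illustration
const sampleRows = [
    [1, 2],
    [3, 4],
];
console.log(JSON.stringify(rowsToColumns(sampleRows, 2))); // [[1,3],[2,4]]
```

As with the loops above, this assumes every row has at least numberOfColumns cells; missing cells in shorter rows end up as undefined.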