I am currently sending a large JSON string to a Node.js Express endpoint that is set up like this:
import express from 'express';
import cors from 'cors';
import bodyParser from 'body-parser';
import { MongoClient } from 'mongodb';

const app = express();
// Parse JSON request bodies up to 4 MB.
const jsonParser = bodyParser.json({ limit: '4mb' });
const databaseUri = '<db connection string>';
const databaseClient = new MongoClient(databaseUri);

app.use(cors());

app.post('/fillDatabase', jsonParser, async (request, response) => {
    // The parsed body is expected to be an array of subjects.
    const subjects = request.body;
    console.log(`Filling database with ${subjects.length} subjects`);
    const database = databaseClient.db('wanikani_db');
    const subjectsTable = database.collection('subjects');
    // Insert all subjects and log the outcome.
    const result = await subjectsTable.insertMany(subjects).then((response) => {
        console.log('inserted ' + response.insertedCount + ' subjects');
    }).catch(() => {
        console.log('failed to insert subjects');
    }).finally(() => {
        console.log('finally after inserting subjects');
    });
});
After posting the JSON string, the function in the endpoint runs successfully to completion, but the request does not finish. The server prints the following error message and stack trace:
SyntaxError: Bad control character in string literal in JSON at position 289
    at JSON.parse (<anonymous>)
    at parse (/Users/maxc/Documents/repos/wanikani-flashcards/backend/node_modules/body-parser/lib/types/json.js:92:19)
    at /Users/maxc/Documents/repos/wanikani-flashcards/backend/node_modules/body-parser/lib/read.js:128:18
    at AsyncResource.runInAsyncScope (node:async_hooks:206:9)
    at invokeCallback (/Users/maxc/Documents/repos/wanikani-flashcards/backend/node_modules/raw-body/index.js:238:16)
    at done (/Users/maxc/Documents/repos/wanikani-flashcards/backend/node_modules/raw-body/index.js:227:7)
    at IncomingMessage.onEnd (/Users/maxc/Documents/repos/wanikani-flashcards/backend/node_modules/raw-body/index.js:287:7)
    at IncomingMessage.emit (node:events:514:28)
    at endReadableNT (node:internal/streams/readable:1376:12)
    at process.processTicksAndRejections (node:internal/process/task_queues:82:21)
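For reference, here is a minimal sketch of the kind of input that makes `JSON.parse` throw this class of error. It assumes nothing about the actual payload; `describeParse` and the three sample strings are hypothetical and only illustrate that JSON requires control characters (U+0000 to U+001F) inside string literals to be escaped, while a character such as 一 is accepted as-is.

```javascript
// Hypothetical helper (not from the original code): report whether a text
// parses as JSON, returning 'ok' or the name of the thrown error.
function describeParse(text) {
  try {
    JSON.parse(text);
    return 'ok';
  } catch (error) {
    return error.name;
  }
}

// A non-ASCII character inside a string literal is valid JSON.
const withKanji = '{"characters": "一"}';

// Here \n in the JavaScript source becomes a real line feed (U+000A) inside
// the JSON string literal, which JSON does not allow unescaped.
const withRawNewline = '{"characters": "a\nb"}';

// The escaped form (backslash followed by n in the JSON text) is valid.
const withEscapedNewline = '{"characters": "a\\nb"}';

console.log(describeParse(withKanji));
console.log(describeParse(withRawNewline));
console.log(describeParse(withEscapedNewline));
```

Whether the posted payload contains such a raw control character is not established here; the sketch only shows what the parser objects to in general.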
The JSON object causing the issue can be found here. The problematic character is located at data -> characters.
The hexadecimal representation of that character is as follows:
000001c0: 2020 2020 2020 2020 2020 2022 6368 6172             "char
000001d0: 6163 7465 7273 223a 2022 e4b8 8022 2c0a  acters": "...",.
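One thing that may be worth checking: a hexdump reports byte offsets, whereas the position in the `JSON.parse` error is, as far as I understand, an index into the decoded JavaScript string. In the dump above the bytes `e4 b8 80` start at offset 0x1da. The two numberings drift apart as soon as the text contains multi-byte UTF-8 characters. The following is a small sketch of the conversion; `utf8ByteOffset` and `sample` are hypothetical names used for illustration only.

```javascript
// Hypothetical helper: convert an index into a JavaScript string (counted in
// UTF-16 code units) into the byte offset of the same spot in UTF-8.
// Assumes the index does not fall in the middle of a surrogate pair.
function utf8ByteOffset(text, index) {
  return new TextEncoder().encode(text.slice(0, index)).length;
}

// 一 occupies one string index but three bytes in UTF-8.
const sample = '{"characters": "一", "level": 1}';
const kanjiIndex = sample.indexOf('一');
const levelIndex = sample.indexOf('level');

console.log(kanjiIndex, utf8ByteOffset(sample, kanjiIndex));
console.log(levelIndex, utf8ByteOffset(sample, levelIndex));
```

Before the first multi-byte character both numbers agree; after it, the byte offset runs ahead of the string index.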
Position 289 contains the Japanese character 一. Any suggestions on how to handle this specific character and others like it?
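To double-check what actually sits at a reported position, a scan along the following lines could be run over the raw request text before it is parsed. This is only a sketch: `findRawControlChars` is a hypothetical helper, not part of body-parser or Express, and it assumes the input is otherwise well-formed JSON text.

```javascript
// Hypothetical helper: list every unescaped control character
// (U+0000 to U+001F) that appears inside a JSON string literal.
// Indexes are positions in the JavaScript string, not byte offsets.
function findRawControlChars(jsonText) {
  const hits = [];
  let inString = false;
  let escaped = false;
  for (let i = 0; i < jsonText.length; i++) {
    const ch = jsonText[i];
    if (!inString) {
      // Outside a string literal, only an opening quote matters.
      if (ch === '"') inString = true;
      continue;
    }
    if (escaped) {
      // The character after a backslash is part of an escape sequence.
      escaped = false;
      continue;
    }
    if (ch === '\\') {
      escaped = true;
      continue;
    }
    if (ch === '"') {
      inString = false;
      continue;
    }
    const code = ch.charCodeAt(0);
    if (code < 0x20) {
      const hex = code.toString(16).toUpperCase().padStart(4, '0');
      hits.push({ index: i, codePoint: 'U+' + hex });
    }
  }
  return hits;
}

// A kanji alone produces no hits; a raw line feed inside a string does.
const cleanHits = findRawControlChars('{"characters": "一"}');
const dirtyHits = findRawControlChars('{"a": "x\ny"}');
console.log(cleanHits, dirtyHits);
```

If the scan reports a hit at the position named in the error, the character to deal with is that control character rather than the neighbouring text.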