I've been tasked with cleaning up data in a MongoDB collection that contains addresses and generic customer contact information.
The data sometimes contains carriage returns and line feeds, which cause problems when loading it into a MySQL table. My solution uses JavaScript to run replace(/\n/g, '') on specific fields. However, even after applying this code, the data dump still looks messy, as shown below:
"_id"|"UserID"|"PhoneNumber"|"Source"|"PrivateLabelID"|"OptOut"|"Blocked"|"Deleted"|"Note"|"CreatedAt"|"UpdatedAt"|"FirstName"|"LastName"|"Email"|"Custom1"|"Custom2"|"Custom3"|"Custom4"|"Custom5"|"GroupIDs"
"5e37169df3369f47583355dc"|"127342"|"8645169963"|"1"|"1"|"undefined"|"undefined"|"undefined"|"Timothy.. I mainly buy in the SW area of Florida. Please send me what you have"|"1580668573"|"1580668573"|"Lee"|"Burnside"|"[email protected]"|"undefined"|"undefined"|"undefined"|"undefined"|"undefined"|"[object Object]"
"5e3712c6958b2b1896070f2b"|"127342"|"8452063505"|"1"|"1"|"undefined"|"undefined"|"undefined"|"Yes I am looking in the lower to central Florida market. Multi family units."|"1580667590"|"1580667591"|"Daniel "|"Lepore"|"[email protected]"|"undefined"|"undefined"|"undefined"|"undefined"|"undefined"|"[object Object]"
"5e37107f61befe0bea740cfa"|"127342"|"3867770002"|"1"|"1"|"undefined"|"undefined"|"undefined"|"He's with Habib
His last name is not Thompson that Habib name"|"1580667007"|"1580667007"|"Thompson"|""|""|"undefined"|"undefined"|"undefined"|"undefined"|"undefined"|"[object Object]"
"5e370e08853f2702e40828fa"|"127342"|"4073712312"|"1"|"1"|"undefined"|"undefined"|"undefined"|"Indeed we are looking for Buy, Fix and Sell and strong rentals including duplexes, triplexes etc.
"|"1580666376"|"1580666376"|"Gisela "|"Escobar"|"[email protected]"|"undefined"|"undefined"|"undefined"|"undefined"|"undefined"|"[object Object]"
"5e3709f351798f62ea228e08"|"127342"|"4077774697"|"1"|"1"|"undefined"|"undefined"|"undefined"|"Yes I am buying in that area or any area in Florida if the numbers are right
only in Flipping houses
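For reference, the field-level cleanup described at the top could be sketched roughly as below. The helper name stripLineBreaks is hypothetical, and mapping missing values to an empty string is an assumption about the desired output, not what the current export does. Unlike a pattern that matches only \n, this one also catches carriage returns (\r).

```javascript
// Hypothetical helper (not from the original export script):
// collapse any run of CR/LF characters in one field value into a single space.
function stripLineBreaks(value) {
  // Assumption: missing values should become an empty string
  // instead of the literal text "undefined".
  if (value === undefined || value === null) return '';
  return String(value).replace(/[\r\n]+/g, ' ').trim();
}

// Example with a made-up document shaped like the records above.
const doc = { Note: "He's with Habib\nHis last name is not Thompson" };
console.log(stripLineBreaks(doc.Note));
```

Note that String.prototype.replace returns a new string, so the result has to be assigned back or written out for the cleanup to take effect.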
The biggest issue lies within the "Note" field. When I run cat --show-all filename, LF "$" characters appear at the end of each record, as well as inside the "Note" field itself.
I tried tr '\n' ' ' <filename, but this removed every LF character, including the ones that terminate each record. Is there a way to remove only the LF characters inside the "Note" field?
PS: View the original data file (consisting of 9 lines) to verify the issues.
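One possible way to repair the dump after the fact, sketched in JavaScript. It assumes that every field is wrapped in double quotes and that no field value contains a double quote of its own, so a complete record always holds an even number of " characters. The function name joinBrokenRecords is hypothetical. Physical lines are glued together with a space until the quote count becomes even, which removes the LF inside "Note" but keeps the LF between records.

```javascript
// Hypothetical helper: rebuild logical records from a dump in which
// quoted fields may contain embedded line breaks.
// Assumption: field values never contain a double quote themselves.
function joinBrokenRecords(text) {
  const records = [];
  let buffer = null; // holds a record that is still "open"

  for (const line of text.split(/\r?\n/)) {
    // Skip blank lines that sit between complete records.
    if (buffer === null && line === '') continue;

    // Either start a new record or continue the open one.
    buffer = buffer === null ? line : buffer + ' ' + line;

    // An even number of quotes means every quoted field is closed.
    const quoteCount = (buffer.match(/"/g) || []).length;
    if (quoteCount % 2 === 0) {
      records.push(buffer);
      buffer = null;
    }
  }

  // A truncated final record is kept as-is rather than dropped.
  if (buffer !== null) records.push(buffer);
  return records;
}

// Example with a shortened, made-up record in the same layout.
const sample = '"a"|"He\'s with Habib\nHis last name"|"1"\n"b"|"x"|"2"\n';
console.log(joinBrokenRecords(sample).join('\n'));
```

In Node.js the input could be read with fs.readFileSync(path, 'utf8') and the joined records written back with fs.writeFileSync. If a field can contain a literal double quote, the even-count check no longer holds and a real CSV parser would be the safer choice.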