Delete line breaks only within a particular field, leaving the rest of the text

I've been tasked with cleaning up data in a MongoDB collection that contains addresses and generic customer contact information.

The data sometimes includes carriage returns, which cause problems when loading the data into a MySQL table. My solution uses JavaScript to run replace(/\n/g, '') on specific fields. However, even after applying this code, the data dump is still messy, as shown below:

"_id"|"UserID"|"PhoneNumber"|"Source"|"PrivateLabelID"|"OptOut"|"Blocked"|"Deleted"|"Note"|"CreatedAt"|"UpdatedAt"|"FirstName"|"LastName"|"Email"|"Custom1"|"Custom2"|"Custom3"|"Custom4"|"Custom5"|"GroupIDs"
"5e37169df3369f47583355dc"|"127342"|"8645169963"|"1"|"1"|"undefined"|"undefined"|"undefined"|"Timothy.. I mainly buy in the SW area of Florida. Please send me what you have"|"1580668573"|"1580668573"|"Lee"|"Burnside"|"[email protected]"|"undefined"|"undefined"|"undefined"|"undefined"|"undefined"|"[object Object]"
"5e3712c6958b2b1896070f2b"|"127342"|"8452063505"|"1"|"1"|"undefined"|"undefined"|"undefined"|"Yes I am looking in the lower to central Florida market. Multi family units."|"1580667590"|"1580667591"|"Daniel "|"Lepore"|"[email protected]"|"undefined"|"undefined"|"undefined"|"undefined"|"undefined"|"[object Object]"
"5e37107f61befe0bea740cfa"|"127342"|"3867770002"|"1"|"1"|"undefined"|"undefined"|"undefined"|"He's with Habib
His last name is not Thompson that Habib name"|"1580667007"|"1580667007"|"Thompson"|""|""|"undefined"|"undefined"|"undefined"|"undefined"|"undefined"|"[object Object]"
"5e370e08853f2702e40828fa"|"127342"|"4073712312"|"1"|"1"|"undefined"|"undefined"|"undefined"|"Indeed we are looking for Buy, Fix and Sell and strong rentals including duplexes, triplexes etc.
"|"1580666376"|"1580666376"|"Gisela "|"Escobar"|"[email protected]"|"undefined"|"undefined"|"undefined"|"undefined"|"undefined"|"[object Object]"
"5e3709f351798f62ea228e08"|"127342"|"4077774697"|"1"|"1"|"undefined"|"undefined"|"undefined"|"Yes I am buying in that area or any area in Florida if the numbers are right
only in Flipping houses

The biggest issue lies within the "Note" field. When running cat --show-all filename, LF "$" characters appear at the end of each record, as well as inside the "Note" field itself.

I attempted to use tr '\n' ' ' <filename, but this removed all LF characters. Is there a method to solely eliminate LF characters within the "Note" field?

PS: View the original data file (consisting of 9 lines) to verify the issues.

Answer №1

It seems like you are attempting to eliminate the occurrence of \n unless it is preceded by a quotation mark.

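sed ':a;N;$!ba;s/\([^"]\)\n/\1 /g' filename.txt

The :a;N;$!ba prefix (GNU sed) reads the whole file into the pattern space so that \n can be matched. The \(...\) group and \1 put back the character before the newline; matching [^"]\n alone would delete that character along with the newline.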
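Since the question already uses JavaScript for cleanup, the same idea can be sketched there with a lookbehind. The function name joinUnquotedBreaks is hypothetical, and the sketch assumes every complete record ends with a closing quote immediately before its line feed, as in the sample dump.

```javascript
// Hypothetical helper: replace every LF that is NOT preceded by a double quote
// with a space. Assumes each complete record ends in `"` followed by LF.
function joinUnquotedBreaks(text) {
  return text.replace(/(?<!")\n/g, " ");
}

// Two records; the first has a line break inside its second field.
const sample = '"1"|"He\'s with Habib\nHis last name"|"x"\n"2"|"ok"|"y"\n';
console.log(joinUnquotedBreaks(sample));
```

A line break that falls directly after a quote inside a field (for example a note that begins with a newline) would be left in place by this rule, so it is a heuristic rather than a full parser.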

Answer №2

Criteria:

  • When examining the file using od, any embedded return/linefeed character appears as a single \n
  • The file may contain multiple instances of the embedded \n
  • After removing the embedded \n(s), each line should consist of 20 fields separated by |

To demonstrate, I'll use a smaller file with only 6 fields per line. The trailing comments are for documentation only and are not part of the data file:

$ cat abc.dat
f1|f2|f3|f4                    # part1 of line1
f4|f5|f6                       # part2 of line1
g1|g2|g3|g4 g4|g5|g6           # line2
h1|h2|h3|h4 h4|h5|h6           # line3
i1|i2|i3|i4                    # part1 of line4
f4|i5|i6                       # part2 of line4
j1|j2|j3|j4 j4|j5|j6           # line5
k1|k2|                         # part1 of line6
k3|k4 k4|k5                    # part2 of line6
|k6                            # part3 of line6
l1|l2|l3|l4 l4|l5|l6           # line7

Rather than trying to locate and remove the embedded \n, the approach is to concatenate lines (with a space in between) until a record has 6 fields, then add our own \n at the end.

An awk solution (comments included for reference):

awk -F"|" '
BEGIN          { prevNF=0    }                             # set initial value of previous NF to 0
(NF+prevNF)==6 { printf "%s\n",$0 ; prevNF=0 ; next      } # if total fields equals 6, print current line with \n and move to next line
               { printf "%s " ,$0 ; prevNF=(prevNF+NF-1) } # otherwise, print line ending with a space and update previous NF
END            { printf "\n" }                             # add final \n for new line
' abc.dat

Executing the above command on my abc.dat produces:

f1|f2|f3|f4 f4|f5|f6
g1|g2|g3|g4 g4|g5|g6
h1|h2|h3|h4 h4|h5|h6
i1|i2|i3|i4 f4|i5|i6
j1|j2|j3|j4 j4|j5|j6
k1|k2| k3|k4 k4|k5 |k6
l1|l2|l3|l4 l4|l5|l6
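
The same join-until-complete logic can be sketched in JavaScript, with the field count passed as a parameter so that the real 20-field file could be handled by passing 20. The name rebuildRecords is hypothetical, and the sketch assumes no field value contains a literal | character.

```javascript
// Hypothetical helper mirroring the awk logic: glue physical lines together
// (with a space) until a record has `fieldCount` pipe-separated fields.
// Assumes no field value contains a literal "|".
function rebuildRecords(text, fieldCount) {
  const records = [];
  let pending = null; // partial record collected so far, or null
  for (const line of text.split("\n")) {
    if (pending === null && line === "") continue; // skip blank lines between records
    pending = pending === null ? line : pending + " " + line;
    if (pending.split("|").length >= fieldCount) {
      records.push(pending);
      pending = null;
    }
  }
  // Keep a trailing incomplete record rather than silently dropping it.
  if (pending !== null) records.push(pending);
  return records;
}

// Last two records of abc.dat: one split over three lines, one intact.
const broken = "k1|k2|\nk3|k4 k4|k5\n|k6\nl1|l2|l3|l4 l4|l5|l6\n";
console.log(rebuildRecords(broken, 6).join("\n"));
```

Unlike the awk version, this sketch keeps an incomplete final record visible instead of only terminating it with a newline, which makes a truncated export easier to spot.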

Answer №3

Consider utilizing MongoDB aggregation to implement the change before exporting.

If preserving the original data is important, you can create a new collection with the modified data:

db.inputCollection.aggregate([
  {$addFields: {Note: {$reduce: {
    input: {$split: ["$Note", "\n"]},
    initialValue: "",
    in: {$concat: ["$$value", "$$this"]}
  }}}},
  {$out: "outputCollection"}
])

Breaking it down:

  • $addFields adds new fields to each document, replacing existing ones of the same name. Here it assigns the Note field the result of the $reduce operation.
  • The input of $reduce is the array produced by splitting the Note field at every newline character: {$split:["$Note","\n"]}
  • The in part of $reduce is the function applied to each element, {$concat:["$$value","$$this"]}, which appends the current element to the accumulated value. The net effect is the same as .split("\n").join("") in JavaScript.
  • $out saves the output in the specified collection for easy export.
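
The equivalence between the $split / $reduce / $concat stage and split-then-join can be checked in plain JavaScript. This is only an analogue run outside MongoDB; the sample note is abridged from the dump in the question.

```javascript
// Plain-JavaScript analogue of the $split / $reduce / $concat stage.
const note = "He's with Habib\nHis last name is not Thompson";

// $split: ["$Note", "\n"]
const parts = note.split("\n");

// $reduce with initialValue "" and in: { $concat: ["$$value", "$$this"] }
const reduced = parts.reduce((value, current) => value + current, "");

// The shorthand form of the same operation.
const joined = note.split("\n").join("");

console.log(reduced === joined); // true
console.log(reduced);
```

Note that removing the newline outright glues the neighbouring words together ("HabibHis"). If a space is preferred, the separator has to be added between elements rather than simply concatenating them.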

If altering the original collection is permissible, you can update that field like this:

db.inputCollection.find({Note:/\n/},{Note:1}).forEach(function(d){  
  db.inputCollection.update({_id:d._id}, {$set:{Note:d.Note.replace(/\n/g, '')}})
})
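
The question mentions carriage returns, while the regex above removes only \n. If some notes were entered with Windows-style line endings (an assumption, not something the sample dump confirms), a stray \r would survive. A minimal sketch of a helper that strips both, with the hypothetical name stripLineBreaks:

```javascript
// Hypothetical helper: strip CRLF, lone CR, and lone LF from a field value.
// Non-string values (e.g. a missing Note) are returned unchanged.
function stripLineBreaks(value) {
  return typeof value === "string" ? value.replace(/[\r\n]+/g, "") : value;
}

console.log(JSON.stringify(stripLineBreaks("line one\r\nline two\n")));
```

It could stand in for d.Note.replace(/\n/g, '') inside the forEach callback; the find filter would then also need to match \r for such documents to be selected.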
