What is the best way to eliminate duplicate entries in MongoDB with a specific condition?

Question

{
"_id" : ObjectId("5d3acf79ea99ef80dca9bcca"), 
"memberId" : "123",
"generatedId" : "00000d2f-9922-457a-be23-731f5fefeb14",
"memberType" : "premium"
},

{
"_id" : ObjectId("5e01554cea99eff7f98d7eed"), 
"memberId" : "123",
"generatedId" : "34jkd2092sdlk02kl23kl2309k2309kr",
"memberType" : "premium"
}

I have a collection of 1 million documents in this format and need to remove duplicates based on the "memberId" field. Specifically, among documents that share a memberId, I want to delete those whose "generatedId" does not contain a hyphen ("-"). In the example above, the second document should be removed because its generatedId has no hyphen. Any suggestions on how to achieve this?

javascript arrays mongodb mongodb-query aggregation-framework

Answer 1

The right approach depends on your data, in particular on whether every duplicated memberId has at least one document with a hyphenated generatedId.

One approach is to group the documents by memberId to find duplicates, then select the entries whose generatedId contains no hyphen ("-"). Deleting those entries cleans up the dataset.

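const result = await Collection.aggregate([
{
    // Collect the _id and generatedId of every document per memberId.
    $group: {
        _id: "$memberId",
        count: { $sum: 1 },
        docs: { $push: { _id: "$_id", generatedId: "$generatedId" } },
    },
},
// Keep only memberIds that occur more than once.
{ $match: { count: { $gt: 1 } } },
// Go back to one entry per document so each generatedId can be tested.
{ $unwind: "$docs" },
// Keep documents whose generatedId contains no hyphen.
// (MongoDB regexes do not support the "g" flag, so it is dropped.)
{ $match: { "docs.generatedId": { $regex: /^[^-]*$/ } } },
// Return the _id of each candidate document along with its memberId.
{ $project: { _id: "$docs._id", memberId: "$_id" } },
]);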

The result identifies the entries that are duplicated by memberId and whose generatedId contains no hyphen. These are the candidates for deletion.

Important Note: Be cautious when deleting data. For some duplicated memberIds, none of the generatedIds may contain a hyphen, so deleting every match would remove all documents for that member. Always back up your data before making any significant changes.
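To make the caution above concrete, here is a minimal sketch of how the deletion list could be chosen in application code. It assumes each group has the shape { _id: memberId, docs: [{ _id, generatedId }, ...] }, which is an assumption about the $group output and not necessarily what your pipeline returns. The helper names idsToDelete and deleteInBatches are hypothetical, and the batch size of 1000 is an arbitrary choice. deleteInBatches assumes a collection object whose deleteMany resolves to a result with a deletedCount field, as in the MongoDB Node.js driver.

```javascript
// Hypothetical helper: pick the _ids to delete from grouped duplicates.
// Assumed input shape: [{ _id: memberId, docs: [{ _id, generatedId }, ...] }, ...]
function idsToDelete(groups) {
  const ids = [];
  for (const group of groups) {
    // A memberId with a single document is not a duplicate.
    if (group.docs.length < 2) continue;

    const withoutHyphen = group.docs.filter(
      (doc) => !String(doc.generatedId).includes("-")
    );

    // If no copy has a hyphen, keep the first one so the member is not wiped out.
    const doomed =
      withoutHyphen.length === group.docs.length
        ? withoutHyphen.slice(1)
        : withoutHyphen;

    for (const doc of doomed) ids.push(doc._id);
  }
  return ids;
}

// Hypothetical helper: delete by _id in batches so each command stays small.
async function deleteInBatches(collection, ids, batchSize = 1000) {
  let deleted = 0;
  for (let i = 0; i < ids.length; i += batchSize) {
    const batch = ids.slice(i, i + batchSize);
    const res = await collection.deleteMany({ _id: { $in: batch } });
    deleted += res.deletedCount;
  }
  return deleted;
}
```

Running idsToDelete first and inspecting its output before calling deleteInBatches gives a chance to review what would be removed. On a collection of this size, the grouping stage may also need the allowDiskUse option, depending on the server version and its memory limits.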

Answer 2

db.collection.aggregate([
  // Find all records with a "-" in the generatedId field
  { "$match": { "generatedId": { "$regex": "[-]" } } },

  // Group them by memberId
  { "$group": { "_id": "$memberId" } }
])
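The pipeline above returns the memberIds that have at least one hyphenated generatedId, but it does not delete anything. One way to finish the job, sketched under assumptions, is to turn that result into a filter for deleteMany. The helper name buildDeleteFilter is hypothetical, and the input is assumed to be the pipeline output as an array of { _id: memberId } objects.

```javascript
// Hypothetical helper: build a deleteMany filter from the pipeline output.
// Assumed input shape: [{ _id: memberId }, ...]
function buildDeleteFilter(groups) {
  return {
    // Only members known to have at least one hyphenated generatedId.
    memberId: { $in: groups.map((group) => group._id) },
    // Only documents whose generatedId contains no hyphen.
    // Note: $not also matches documents where generatedId is missing.
    generatedId: { $not: /-/ },
  };
}

// Possible usage in the mongo shell (not executed here):
// const groups = db.collection.aggregate([...]).toArray();
// db.collection.deleteMany(buildDeleteFilter(groups));
```

Because the filter only touches members that already have a hyphenated document, each of those members keeps at least one document. Members whose duplicates all lack a hyphen are left untouched and would need separate handling. With many memberIds, the $in list may have to be split into chunks to stay under the document size limit.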
