Eliminating redundant subdocuments in MongoDB

The two documents below are samples from a single collection that holds thousands of documents with the same schema.

Document 1:

{
    pageNumber: 0,
    results: [
        {
            jobkey: "AAA"
        },
        {
            jobkey: "BBB"
        },
        {
            jobkey: "CCC"
        }
    ]
}

Document 2:

{
    pageNumber: 0,
    results: [
        {
            jobkey: "RRR"
        },
        {
            jobkey: "VVV"
        },
        {                  // This entire object needs to be removed
            jobkey: "AAA"  // Duplicate jobkey value of document 1
                           // Remaining objects in the array should stay
        }
    ]
}

In this structure, each document contains a key called "results" that holds an array of objects. These objects consist of a jobkey and its corresponding value. Importantly, no two jobkeys within the same results array can have the same value.

The Problem:

If the same jobkey value appears more than once, whether in different documents or within the same document, I need all but one occurrence deleted from the database. This can happen when the same jobkey ends up in several results arrays.

I have tried both the mongo shell and Mongoose, but have not found a way to remove these duplicate values.

Answer №1

This notion of "duplicates" is somewhat unusual, because the values are stored in separate documents. Enforcing it in future operations means querying the entire collection each time to check whether a value already exists before adding it to a target document.

To detect and eliminate these so-called "duplicates," you may need to execute an operation like the following:

db.collection.aggregate([
    // Keep only documents whose results array has at least one element
    { "$match": { "results.0": { "$exists": true } } },

    // Unwind the array
    { "$unwind": "$results" },

    // Group keys by count and store corresponding doc _id's
    { "$group": {
        "_id": "$results.jobkey",
        "_ids": { "$push": "$_id" },
        "count": { "$sum": 1 }
    }},

    // Keep only jobkey values that occur more than once
    { "$match": { "count": { "$gt": 1 } } }
]).forEach(function(doc) {
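    // Drop the first _id: that document keeps its copy of the jobkey
    doc._ids.shift();
    // Pull the matching array element from every remaining document
    db.collection.update(
        { "_id": { "$in": doc._ids } },
        { "$pull": { "results": { "jobkey": doc._id } } },
        { "multi": true }
    );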
})

In essence, this builds a list of jobkey values that occur more than once, then loops over that list and removes each value from the results array of every document except the first one that contains it.

Note that this treats the "first" document containing a duplicated value as the one that keeps it. If you have other criteria for deciding which document should keep the value, add a $sort stage before $group.
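As a rough illustration of the $sort suggestion, the pipeline could be written as below. Sorting on _id in descending order is only an assumption for the example: with default ObjectId values it would make the most recently created document the "first" one in each group. Substitute whatever field actually expresses your retention rule.

```javascript
// Sketch only: the sort key (_id descending) is an assumed retention rule.
const pipeline = [
    // Keep only documents whose results array has at least one element
    { "$match": { "results.0": { "$exists": true } } },

    // One output document per array element
    { "$unwind": "$results" },

    // Decide which document comes "first" within each jobkey group
    { "$sort": { "_id": -1 } },

    // Collect the _id of every document that contains each jobkey
    { "$group": {
        "_id": "$results.jobkey",
        "_ids": { "$push": "$_id" },
        "count": { "$sum": 1 }
    }},

    // Keep only jobkey values that occur more than once
    { "$match": { "count": { "$gt": 1 } } }
];

// In the shell, pass it to db.collection.aggregate(pipeline) and iterate
// the results with the same forEach/update logic as before.
```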

The $group stage keeps a list of _id values because every document after the "first" one needs an update. The second $match then discards any grouped key that occurs only once.

While iterating over these results, .shift() drops the "first" document from each list, since it is treated as the original. The .update() call then targets the remaining documents in the list and uses $pull to remove the array elements matching that jobkey value from all of them.
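To make the flow easier to follow without a database, here is a minimal in-memory sketch of the same group, shift and pull logic in plain JavaScript. The function name removeDuplicateJobkeys and the numeric _id values are made up for illustration; the array simply stands in for the collection.

```javascript
// In-memory imitation of the aggregate + $pull approach (no MongoDB involved).
function removeDuplicateJobkeys(docs) {
    // $unwind + $group: map each jobkey to the _ids of documents containing it
    const idsByJobkey = new Map();
    for (const doc of docs) {
        for (const item of doc.results) {
            if (!idsByJobkey.has(item.jobkey)) {
                idsByJobkey.set(item.jobkey, []);
            }
            idsByJobkey.get(item.jobkey).push(doc._id);
        }
    }

    for (const [jobkey, ids] of idsByJobkey) {
        if (ids.length < 2) continue;            // second $match: count > 1
        const targets = new Set(ids.slice(1));   // shift(): first document keeps its entry
        for (const doc of docs) {
            if (targets.has(doc._id)) {
                // $pull: remove every element with this jobkey
                doc.results = doc.results.filter(function (item) {
                    return item.jobkey !== jobkey;
                });
            }
        }
    }
    return docs;
}

const sample = [
    { _id: 1, results: [{ jobkey: "AAA" }, { jobkey: "BBB" }, { jobkey: "CCC" }] },
    { _id: 2, results: [{ jobkey: "RRR" }, { jobkey: "VVV" }, { jobkey: "AAA" }] }
];
removeDuplicateJobkeys(sample);
// sample[1].results now holds only RRR and VVV; sample[0] is unchanged
```

One consequence worth knowing: like $pull, the filter removes every matching element in a targeted document. If a jobkey were repeated inside a single document, that document's _id would appear twice in the list, and all of its copies would be removed rather than all but one.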

If your goal is for sub-document elements to have unique jobkey values, consider storing them as documents in a separate collection and referencing them from the parent array. In that separate collection, a unique index on jobkey prevents duplicate values from being inserted.
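A minimal sketch of that constraint follows, assuming a hypothetical collection named "jobs" that holds one document per jobkey. The helper name ensureUniqueJobkeyIndex is also invented for the example; createIndex with the unique option is the standard shell call.

```javascript
// Assumption: each job is stored once in a separate "jobs" collection and
// the page documents only hold references to those job documents.
function ensureUniqueJobkeyIndex(db) {
    // With { unique: true } the server rejects an insert or update that
    // would give two documents in "jobs" the same jobkey value.
    return db.jobs.createIndex({ jobkey: 1 }, { unique: true });
}

// In the shell: ensureUniqueJobkeyIndex(db)
```

Existing duplicates have to be cleaned up before the index is created, because building a unique index fails while duplicate values are still present.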
