Delete Entries in MongoDB Collection According to Unique User Pairs

I have a collection of messages stored in MongoDB and I need to keep only the latest 500 records for each pair of users. Users are identified by their sentBy and sentTo attributes.

/* 1 */
{
    "_id" : ObjectId("5f1c1b00c62e9b9aafbe1d6c"),
    "sentAt" : ISODate("2020-07-25T11:44:00.004Z"),
    "readAt" : ISODate("1970-01-01T00:00:00.000Z"),
    "msgBody" : "dummy text",
    "msgType" : "text",
    "sentBy" : ObjectId("54d6732319f899c704b21ef7"),
    "sentTo" : ObjectId("54d6732319f899c704b21ef5"),
}

// More document examples here...

/* and more... assume it's over 10,000 documents */

I'm thinking of an algorithm that involves:

  • Grouping based on user pairs using the OR operator
  • Sorting the records in descending order based on timestamp
  • Limited to 500 records per user pair
  • Extract the array of _id values to be preserved
  • Use these IDs in a new query with .deleteMany() and $nin condition

I've been struggling with this issue and would greatly appreciate any help or guidance. Thank you so much :)

Answer №1

Based on the size of the collection, two different approaches could be considered:

  1. If the scale is manageable and you can group the entire collection within a reasonable timeframe, a similar method to your suggestion can be implemented:
db.collection.aggregate([
    {
        $sort: {
            sentAt: 1
        }
    },
    {
        $group: {
            _id: {
                $cond: [
                    {$gt: ["$sentBy", "$sentTo"]},
                    ["$sendBy", "$sentTo"],
                    ["$sentTo", "$sendBy"],
                ]
            },
            roots: {$push: "$$ROOT"}
        }
    },
    {
        $project: {
            roots: {$slice: ["$roots", -500]}
        }
    },
    {
        $unwind: "$roots"
    },
    {
        $replaceRoot: {
            newRoot: "$roots"
        }
    },
    {
        $out: "this_collection"
    }
])

The sorting stage should precede other operations as you cannot sort an inner array post grouping. The $cond in the group stage emulates the logic of the $or operator. Instead of fetching the result and using deleteMany with $nin, the $out operator can be used to rewrite the current collection.

  1. If the scale is too large for the previous approach, iterating user by user and following your initial suggestion would be more efficient. Here's a simple example:

let userIds = await db.collection.distinct("sentBy");

let done = [1];
for (let i = 0; i < userIds.length; i++) {
    
    let matches = await db.collection.aggregate([
        {
            $match: {
                $and: [
                    {
                        $or: [
                            {
                                "sentTo": userIds[i]
                            },
                            {
                                "sendBy": userIds[i]
                            }
                        ]
                    },
                    {  
                        $or: [
                            {
                                sendTo: {$nin: done}
                            },
                            {
                                sendBy: {$nin: done}
                            }
                        ]   
                    }
                ]
            }
        },
        {
            $sort: {
                sentAt: 1
            }
        },
        {
            $group: {
                _id: {
                    $cond: [
                        {$eq: ["$sentBy", userIds[i]]},
                        "$sendTo",
                        "$sentBy"
                    ]
                },
                roots: {$push: "$$ROOT"}
            }
        },
        {
            $project: {
                roots: {$slice: ["$roots", -500]}
            }
        },
        {
            $unwind: "$roots"
        },
        {
            $group: {
                _id: null,
                keepers: {$push: "$roots._id"}
            }
        }
    ]).toArray();
    
    if (matches.length) {
        await db.collection.deleteMany(
            {
                $and: [
                    {
                        $or: [
                            {
                                "sentTo": userIds[i]
                            },
                            {
                                "sendBy": userIds[i]
                            }
                        ]
                    },
                    {  
                        $or: [
                            {
                                sendTo: {$nin: done}
                            },
                            {
                                sendBy: {$nin: done}
                            }
                        ]
                    },
                    {
                        _id: {$nin: matches[0].keepers}
                    }
                ]
            }
        )
    }
    
    done.push(userIds[i])
}

Similar questions

If you have not found the answer to your question or you are interested in this topic, then look at other similar questions below or use the search

What could be the reason for the page scrolling upwards when clicking on href="#"?

I am encountering an issue with a hyperlink <a href="#" id="someID">Link</a> that is located further down the page. This link is used to trigger an Ajax request, but whenever it is clicked, the page automatically scrolls back to the top. I have ...

I can't seem to get anything to show up on my React

After starting to work with routing on react.JS, I encountered a challenge where the simple products.jsx file is supposed to display a simple message upon clicking, but it's not showing anything. Here is the code from Index.JS import React from &apos ...

Information backed by the HTML5 Local Storage feature

Is it possible to use a Local Storage object to store another Local Storage object? Thank you in advance. ...

What is the best way to ensure that JavaScript form errors disappear as soon as new input is entered?

Below is the code snippet: var nameInput = formHandle.f_Name; var idInput = formHandle.f_Id; // VALIDATING NAME if(nameInput.value === ""){ nameMsg = document.getElementById("nameErr"); nameMsg.style.background ...

Updating/Timer feature for a single feed out of two -- Combining data with $q.all()

I'm revisiting a question I previously asked here. The approach I took involved using the $q.all() method to resolve multiple http calls and then filtering and merging the data. Everything was working fine, but now I want to refresh one of the feeds ...

Having trouble with your Ajax post request?

I am currently working on creating a form that allows users to input information and submit it without the page refreshing. The processing of the form data will occur after the user clicks the submit button. To achieve this, I am utilizing jQuery and Ajax ...

Error: The options object provided for CSS Loader is not valid and does not match the API schema. Please make sure to provide the correct options when

Summary My Nuxt.js project was created using the command yarn create nuxt-app in SPA mode. However, I encountered an error after installing Storybook where running yarn dev resulted in failure to start the demo page. ERROR Failed to compile with 1 errors ...

Unable to locate the Chart object within the chartjs-plugin-labels.js file

Hello there, I am currently working on an Angular project where I want to incorporate a chart plugin. To achieve this, I executed the following commands: npm install angular2-chartjs npm install chartjs-plugin-labels Following that, I imported it into my ...

A method for highlighting duplicate rows in a DataTable by formatting them with the same color

I am currently utilizing the DataTable plugin to display rows in a table. I would like to highlight duplicate rows in the same color scheme. Can someone please assist me with this issue? In the example below, the entry for Black Winters is duplicated. I ai ...

Navigating through various div elements in Javascript and sending parameters to a script

Context In my project, I am using PHP to generate a series of voting sections. Each section follows the same format except for a unique number assigned to it, which increases with each iteration of the PHP loop. To keep track of the unique numbers, I uti ...

Transform an array of integers into a mapping of values and corresponding IDs for selected values

I've been able to successfully load values from my API into react-select for single select, but I'm encountering some issues with multi-select. const fetch_companies = () => { API.get("/companies") ... data = data.map(({ id: ...

Extracting information from a JSON file using the array name provided in the URL

Currently, I am facing an issue with printing data in JSON format from my 'data.json' file. In my PHP file (alldata.php), I have successfully managed to obtain all the data (arrays) and print them out in a pretty format. However, my challenge lie ...

Masonry layout organizes images in a vertical stack, with one image per row

After implementing Masonry on my website, I encountered an issue where all the images stack vertically in a single row once the page loads. Instead, I would like them to stack both horizontally and vertically. You can view the problem on My Jsfiddle. This ...

Generate a JavaScript File/Blob from an image's URI

Can I generate a File or Blob object from an image URI? I am utilizing the Cordova Image Picker plugin in my mobile application to retrieve photo URIs like this: "file:///data/user/0/..../image.jpg" However, I am struggling to create a File or Blob objec ...

Dealing with problematic hover behaviors in Cypress: A guide

I encountered an issue with Cypress hover functionality while trying to access a sub menu that appears after hovering over a main menu item. The error message I received was This element is not visible because it has CSS property: position: fixed and it&ap ...

The antithesis of jQuery's .parents() selector

I am currently developing a userscript for a webpage with an old-fashioned design consisting mainly of tables. My goal is to access a long table of fields so that they can be filled in automatically by the script. To simplify, the structure is as follows: ...

What is the most foolproof method for detecting when a checkbox has been marked as checked, regardless of the circumstances?

Is there a way to detect changes in the checked value of a checkbox when it is changed by a script that I do not control? I have tried using `addEventListener()` and jQuery's `on()` method, but they do not work in all cases. <input type="checkbox" ...

HTML/PHP/JS - Submit Form and Upload File Simultaneously

I need to create a form with a Photo Upload Input that allows users to upload pictures before submitting the form for faster processing. The form should also support uploading multiple files, display a progress bar, and provide a preview of the uploaded im ...

Cease making commitments or declarations

I am currently working on validating an image using an API. I have implemented promises and want the execution to stop and call a function if the API check fails. Here is my code: The function for calling the promises: checkPhotos(options,formData, "fr ...

Utilizing Reactjs to efficiently transmit name and value to Material UI autocomplete

I am trying to customize the Material UI Autocomplete component. I need to pass name, value and onchange similarly to how it is done for TextField. Can someone please help me achieve this? My current code is not functioning as expected. < ...