Utilizing aggregation techniques or Map Reduce processes to generate standardized 'Distinct Paying Customers for each Vendor' data

I am currently working on generating a report for the number of Unique Paying Users Per Vendor using either Map Reduce or the Aggregation Framework in MongoDB. The challenge lies in normalizing the totals so that each user contributes a total of 1 across all vendors they have purchased from. For instance,

{
   "account": "abc",
   "vendor": "amazon",
},
{
   "account": "abc",
   "vendor": "overstock",
},
{
   "account": "ccc",
   "vendor": "overstock",
}

would result in

{
   "vendor": "amazon",
   "total" : 0.5
},
{ 
   "vendor": "overstock",
   "total": 1.5
}

In this scenario, the user 'abc' made two purchases and contributed equally to both vendors. Additionally, the sum of vendor totals equals the number of unique paying users.

Initially, I approached this aggregation process with four steps:

1. Start by storing the number of purchases per vendor for each user.
2. Calculate the total purchases for each user and distribute these among the respective vendors.
3. Merge the normalized purchase data for each user into a final vendor map through addition.

While effective with smaller datasets, this method proves to be slow and memory-intensive when dealing with larger sets.

Utilizing the Aggregation framework, I managed to calculate the total users but struggled with normalizing them effectively.

agg = this.db.aggregate(
[
    {
        $group :
        {
            _id :
            {
                vendor : '$vendor',
                user : '$account'
            },
            total :
            {
                $sum : 1
            }
        }
    }
]);

var transformed = {};
for( var index in agg.result)
{
    var entry = agg.result[index];

    var vendor= entry._id.vendor;
    if(!transformed[vendor])
    {
        transformed[vendor] = 0;
    }
    transformed[vendor] += 1;
}

I'm seeking guidance on restructuring this query to properly normalize the users' totals.

Answer №1

When it comes to analyzing data using MongoDB's .aggregate() or .mapReduce() methods, there are multiple approaches to consider. The efficiency of these approaches will depend on the size of your dataset.

If you choose to use the aggregate method, you will need to calculate totals per "vendor" and overall totals per user to determine percentages. This involves grouping operations and possibly creating arrays using the $unwind operator:

db.collection.aggregate([
    { "$group": {
        "_id": { "account": "$account", "vendor": "$vendor" },
        "count": { "$sum": 1 }
    }},
    { "$group": {
        "_id": "$_id.account",
        "purch": { "$push": { "vendor": "$_id.vendor", "count": "$count" } },
        "total": { "$sum": "$count" },
    }},
    { "$unwind": "$purch" },
    { "$project": {
        "vendor": "$purch.vendor",
        "total": { 
            "$divide": [ "$purch.count", "$total" ]
        }
    }},
    { "$group": {
        "_id": "$vendor",
        "total": { "$sum": "$total" }
    }}
])

Alternatively, the mapReduce approach involves two steps: first reducing responses by vendor for each user, then further reducing them down to vendor totals:

db.collection.mapReduce(
    function () {
        emit(
            this.account,
            {
                "data": [{
                    "vendor": this.vendor,
                    "count": 1,
                }],
                "total": 1,
                "seen": false
            }
        );
    },
    function (key,values) {

        var reduced = { data: [], total: 0, seen: true };

        values.forEach(function(value) {
            value.data.forEach(function(data) {
                var index = -1;
                for (var i = 0; i <=reduced.data.length-1; i++) {

                    if ( reduced.data[i].vendor == data.vendor ) {
                        index = i;
                        break;
                    }
                }

                if ( index == -1 ) {
                    reduced.data.push(data);
                } else {
                    if (!value.seen)
                        reduced.data[index].count += data.count;
                }
            });
        });

        reduced.data.map(function(x) {
            reduced.total += x.count;
        });

        return reduced;
    },
    { 
        "out": { "replace": "output" },
        "finalize": function (key,value) {

            var result = {
                data: []
            };

            result.data = value.data.map(function(x) {
                var res = { };
                res["vendor"] = x.vendor;
                res["total"] = x.count / value.total;
                return res;
            });

            return result;
        }
    }
)

The second part involves outputting the results:

db.output.mapReduce(
    function () {
        this.value.data.forEach(function(data){
            emit( data.vendor, data.total );
        });
    },
    function(key,values) {
        return Array.sum( values );
    },
    { "out": { "inline": 1 } }
)

Considering the size of your data, the mapReduce approach may be slower as it requires additional processing and output to a collection. On the other hand, the aggregation framework approach typically runs faster, but could face slowdowns depending on the size of the vendor array per user.

Answer №2

In response to Neil Lunn's previous answer, I had a moment of realization yesterday that the aggregation process would need to be multi-step if using map reduce. I appreciate your approach of utilizing map reduce to write to a collection, especially in dealing with larger datasets. I'm also intrigued by the potential performance boost from trying out the .aggregate() method.

After some experimentation, I settled on the following solution for our dataset:

1. Utilize the aggregation framework to calculate purchases per account.
2. Convert the results into a map for quicker access.
3. Implement map reduce on the collection, leveraging the 'scope' field to pass in the account total map created in step 2.

The code snippet resembles the following:

var agg = this.db.aggregate([
    {
        $group: {
            _id: {
                user: '$account'
            },
            total: {
                $sum: 1
            }
        }
    }
]);

var accountMap = {};
for (var index in agg.result) {
    var entry = agg.result[index];
    addToMap(accountMap, entry._id.user, entry.total);
}

// Free up memory by deleting agg?
delete agg;

var mapFunction = function() {
    var key = this.vendor;
    var value = 1 / accountMap[this.account];

    emit(key, value);
};

var reduceFunction = function(key, values) {
    return(Array.sum(values));
};

var res = this.db.mapReduce(mapFunction, reduceFunction, {
    out: {
        inline: 1
    },
    scope: {
        'accountMap': accountMap
    }
});

// Clearing up accountMap
delete accountMap;

var transformed = {};
for (var index in res.results) {
    transformed[entry._id] = entry.value;
}

Similar questions

If you have not found the answer to your question or you are interested in this topic, then look at other similar questions below or use the search

How to Transfer a Props Value to an External Function in Vue 3

I'm currently working on a page that displays user details and need to also show the invoices associated with that user. <template> <div> <div> // Child component one ..... </div> <div ...

Is there a method to stop react-select (Select.Async) from erasing the search input value when it loses focus?

Situation: In my setup, I have a standard select element (categories), which dictates the options displayed in a Select.Async component from react-select. Problem: Consider this scenario: a user is searching for an option within Select.Async whil ...

Tips for preventing the use of inline styling in React

I am facing an issue with some variables set at the top of my functional component that are not being used except in styles: const testWidth = 100; const testHeight = 100; Although I use some of these variables in my styles... I am considering moving my ...

javascript cannot utilize html reset functionality

My drop down menu includes an onChange event that triggers a JavaScript method. However, when I select a new value and then click the reset button, the dropdown reverts back to its original value but the onChange event does not fire. <select onChange= ...

Error: Unable to execute state.productDetails as a function

import React, { useEffect } from "react"; import Loader from "../layout/Loader"; import { useAlert } from "react-alert"; import { useDispatch, useSelector } from "react-redux"; import { getProductDetails, cle ...

An unexpected error occurred in the Angular unit and integration tests, throwing off the script

I seem to be facing a recurring issue while running unit/integration tests for my Angular project using Karma. The tests have a 50:50 success/failure rate, working fine on my machine but failing consistently on our build server, making the process quite un ...

Search as you type and populate multiple HTML form fields simultaneously

Up until now, I have been copy-pasting my way through this project! I possess a customer database and an HTML form for data entry. Upon clicking submit, it generates a fillable PDF on the server. While typing a name, results are displayed, and upon selec ...

Using JavaScript to dynamically populate form fields

I am currently working on developing a form that allows users to add additional fields themselves. They will have the ability to add up to 5 new "options", with each option containing up to 50 "variants". For example, an option could be "size" with varian ...

Choosing an element from a multiple group listbox with jQuery

Is there a way to select items in the listbox using jQuery when dealing with optgroup elements? $('#id option[value=<?php echo $row; ?>]').attr('selected','selected'); The above code works for regular options, but how ...

Is it possible to accurately measure the width of elements down to the partial pixel level?

This dilemma has been causing me frustration for quite some time. My goal is to achieve a simple task - float li elements in a ul container and give them a hover effect. Here's the code snippet: Everything seems to be in order, with all CSS reset, dis ...

The ng-model does not reflect changes when using the JQuery datepicker

Currently, I have set up two textboxes for users to choose a date. I am utilizing JqQuery's datepicker UI to showcase a small calendar pop-up whenever the user clicks on the textbox. However, an issue arises when I select a date in the calendar popup ...

I am struggling to get the pop-up to close using the current code. I suspect that the issue might be related to the variable I was previously using in WordPress. I have made changes but the pop-up

While delving deeper into the realm of Javascript, I encountered a stumbling block with a single popup intended for the main page of a WordPress website I am constructing. Despite my attempts to modify the code's variables, they seem unyielding. Surpr ...

What is the best way to have an image trail another in JavaScript coding for my game?

Hey, can someone help me create a zombie game where a zombie image moves towards a player image (zs.png) at a specific speed and decreases the player's health upon contact? This game is designed for the 3ds browser, so it must be coded in JavaScript ...

A guide on populating a dropdown menu with spring and hibernate in JSP

I'm a newcomer here and seeking ideas from you all. I have 4 dropdown lists and want to populate one select box based on the selection made in another select box using database values. I have already set up the database, but unsure how to proceed. Any ...

Using jQuery, learn how to successfully call a selector from dynamic content

I am currently facing a challenge with a table that is generated server-side and then appended to the view page (client-side). Since the table is not directly included in the DOM, I am using the StickyTableHeaders jQuery plugin to create a sticky header fo ...

Issue with radio button list not functioning properly in Mozilla and Chrome browsers

My web application has a radiobuttonlist with an onclick event. It functions properly in IE, but not in some other browsers. Here is a snippet of the code: <asp:RadioButtonList ID="rbgThreadStatus" runat="server" RepeatDirection=&quo ...

Incorporate checkboxes within a dropdown menu using JavaScript

Is there a way to achieve something similar in pure JavaScript, without relying on jQuery? ...

In AngularJS, create a fresh window

I am struggling with a seemingly simple issue that I just can't figure out... Here's the scenario: I have a link set up to open a non-angular popup using onclick: <a href="page.html" ng-click="popitup('page.html')">Page</a> ...

Generating a two-dimensional array and setting its values in JavaScript

I need assistance with creating and initializing a two-dimensional array in JavaScript within an AngularJS application. My current approach is as follows: $scope.invalidVote = []; for (var i = 0; i < $scope.arry1.length; i += 1) { $scope.answersCou ...

Bulma Steps failing to advance to the next step despite clicking submit

I am currently working on implementing Buefy Steps in a Vue project. The Buefy steps are functioning correctly, but I am facing an issue where the 'Submit' button does not progress to the next step (e.g., from "Account" to "Profile"). App.vue: & ...