I am currently working on generating a report of Unique Paying Users Per Vendor using either MapReduce or the Aggregation Framework in MongoDB. The challenge is normalizing the totals so that each user contributes a total of 1, split across all vendors they have purchased from. For instance, the documents
{ "account": "abc", "vendor": "amazon" },
{ "account": "abc", "vendor": "overstock" },
{ "account": "ccc", "vendor": "overstock" }
would result in
{ "vendor": "amazon", "total": 0.5 },
{ "vendor": "overstock", "total": 1.5 }
Here the user 'abc' made two purchases, so each of their vendors receives 0.5; user 'ccc' made one purchase, so 'overstock' receives a full 1.0. Note that the sum of the vendor totals (0.5 + 1.5 = 2) equals the number of unique paying users.
Initially, I approached this aggregation process in three steps:
1. Start by storing the number of purchases per vendor for each user.
2. Calculate the total purchases for each user and distribute these among the respective vendors.
3. Merge the normalized purchase data for each user into a final vendor map through addition.
While this works on smaller datasets, it proves slow and memory-intensive on larger ones.
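For concreteness, the three steps can be sketched in plain JavaScript over an in-memory array of purchase documents (the sample data and variable names below are just illustrations of the approach, not my production code):

```javascript
// Sample purchase documents, shaped like the ones above.
var purchases = [
    { account: 'abc', vendor: 'amazon' },
    { account: 'abc', vendor: 'overstock' },
    { account: 'ccc', vendor: 'overstock' }
];

// Step 1: purchases per vendor for each user.
var perUser = {};
purchases.forEach(function (p) {
    perUser[p.account] = perUser[p.account] || {};
    perUser[p.account][p.vendor] = (perUser[p.account][p.vendor] || 0) + 1;
});

// Steps 2 and 3: divide each user's counts by their total purchases,
// then merge the normalized shares into one vendor map by addition.
var totals = {};
Object.keys(perUser).forEach(function (account) {
    var vendors = perUser[account];
    var userTotal = Object.keys(vendors).reduce(function (sum, v) {
        return sum + vendors[v];
    }, 0);
    Object.keys(vendors).forEach(function (v) {
        totals[v] = (totals[v] || 0) + vendors[v] / userTotal;
    });
});

console.log(totals); // { amazon: 0.5, overstock: 1.5 }
```

This is exactly the computation I want, but holding all of the per-user maps in memory is what makes it impractical at scale.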
Using the Aggregation Framework, I managed to count unique users per vendor, but I couldn't work out how to normalize their contributions.
var agg = this.db.aggregate([
    {
        $group: {
            _id: {
                vendor: '$vendor',
                user: '$account'
            },
            total: { $sum: 1 }
        }
    }
]);
var transformed = {};
agg.result.forEach(function (entry) {
    var vendor = entry._id.vendor;
    if (!transformed[vendor]) {
        transformed[vendor] = 0;
    }
    // Every user adds a full 1 to each vendor here; this is the part
    // that needs to be normalized by the user's total purchases.
    transformed[vendor] += 1;
});
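To make the target concrete: starting from the per-(vendor, user) groups the query produces, the normalization could be done client-side in two passes (a sketch only; the `agg.result` shape below is assumed from the query above, with hardcoded sample data), but I would rather push this into the pipeline itself:

```javascript
// Assumed shape of agg.result from the $group stage above.
var grouped = {
    result: [
        { _id: { vendor: 'amazon',    user: 'abc' }, total: 1 },
        { _id: { vendor: 'overstock', user: 'abc' }, total: 1 },
        { _id: { vendor: 'overstock', user: 'ccc' }, total: 1 }
    ]
};

// Pass 1: total purchases per user across all vendors.
var perUserTotal = {};
grouped.result.forEach(function (entry) {
    var user = entry._id.user;
    perUserTotal[user] = (perUserTotal[user] || 0) + entry.total;
});

// Pass 2: each (vendor, user) group contributes its share of
// that user's total, so every user sums to exactly 1.
var normalized = {};
grouped.result.forEach(function (entry) {
    var share = entry.total / perUserTotal[entry._id.user];
    var vendor = entry._id.vendor;
    normalized[vendor] = (normalized[vendor] || 0) + share;
});

console.log(normalized); // { amazon: 0.5, overstock: 1.5 }
```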
I'm seeking guidance on restructuring this query to properly normalize the users' totals.