Is it possible to access and modify a control variable while performing the aggregation process?
Unfortunately, it is not feasible to track previous or next elements during the execution of the aggregation pipeline.
The concept involves transforming events into arrays based on their respective time values.
You have two choices.
Breakdown
Video Play : [1,5,7]
Video Pause : [3,6,10]
Features : [2,4,8,9]
Play-Features : 2 8,9
Video play-pause pair : [1,3],[5,6],[7,10]
Pause-Features : 4
Video pause-play pair : [3,5],[6,7],[10,-]
Desired Outcome
{count:3}
First Option: (Perform all actions within the aggregation pipeline )
Add additional stages to convert documents into an event-array structure.
Consider the following documents
db.collection.insertMany([
{eventName:"Video Play",creation:1},
{eventName:"Click Features 1",creation:2},
{eventName:"Video Pause",creation:3},
{eventName:"Click Features 1",creation:4},
{eventName:"Video Play",creation:5},
{eventName:"Video Pause",creation:6},
{eventName:"Video Play",creation:7},
{eventName:"Click Features 1",creation:8},
{eventName:"Click Features 1",creation:9},
{eventName:"Video Pause",creation:10}
]);
You can utilize the following aggregation
This aggregation involves two $group
stages to convert events into their respective time arrays, followed by a $project
stage to project ($let
) each event creations array as variables.
To understand the logic inside $let
, refer to option 2 below
db.collection.aggregate([
{
"$sort": {
"eventName": 1,
"creation": 1
}
},
{
"$group": {
"_id": "$eventName",
"creations": {
"$push": "$creation"
}
}
},
{
"$group": {
"_id": "null",
"events": {
"$push": {
"eventName": "$_id",
"creations": "$creations"
}
}
}
},
{
"$project": {
"count": {
"$let": {
"vars": {
"video_play_events": {
"$arrayElemAt": [
"$events.creations",
{
"$indexOfArray": [
"$events.eventName",
"Video Play"
]
}
]
},
"click_features_event": {
"$arrayElemAt": [
"$events.creations",
{
"$indexOfArray": [
"$events.eventName",
"Click Features 1"
]
}
]
},
"video_pause_events": {
"$arrayElemAt": [
"$events.creations",
{
"$indexOfArray": [
"$events.eventName",
"Video Pause"
]
}
]
}
},
"in": {*}
}
}
}
}
])
*You now have events creations array for each event. Insert the following aggregation code and replace $video_play_events
with $$video_play_events
and so forth to access variables from the $let
stage.
Second Option: (Store events in distinct arrays)
db.collection.insert([
{
"video_play_events": [
1,
5,
7
],
"click_features_event": [
2,
4,
8,
9
],
"video_pause_events": [
3,
6,
10
]
}
])
You can manage array growth by including a "count" field to limit the number of events stored in a single document.
Multiple documents can be created for a specified time interval.
This simplifies the aggregation process to the below query.
The following aggregation iterates through video_play_events
and filters all features between each play and pause pair (pl
and pu
).
$size
is used to count the number of feature elements between every play and pause pair, followed by $map
+ $sum
to determine the total number of feature events across all play-pause pairs.
db.collection.aggregate([
{
"$project": {
"count": {
"$sum": {
"$map": {
"input": {
"$range": [
0,
{
"$subtract": [
{
"$size": "$video_play_events"
},
1
]
}
]
},
"as": "z",
"in": {
"$let": {
"vars": {
"pl": {
"$arrayElemAt": [
"$video_pause_events",
"$$z"
]
},
"pu": {
"$arrayElemAt": [
"$video_play_events",
{
"$add": [
1,
"$$z"
]
}
]
}
},
"in": {
"$size": {
"$filter": {
"input": "$click_features_event",
"as": "fe",
"cond": {
"$and": [
{
"$gt": [
"$$fe",
"$$pl"
]
},
{
"$lt": [
"$$fe",
"$$pu"
]
}
]
}
}
}
}
}
}
}
}
}
}
}
])
Notes:
There is a risk of reaching the 16 MB document limit depending on the number of documents being aggregated in both scenarios.
Utilize the async module for running parallel queries with appropriate filters to contain the data being aggregated, followed by client-side processing to calculate all components.