Today I encountered a scenario where I needed to sync data from a mongoDB collection to Vertica (SQL Database) with the object keys serving as columns in the SQL table. To achieve this, I utilized the mongoDB aggregation framework to query, manipulate, and project the desired result document before syncing it to Vertica.
The schema that I aimed to aggregate is structured as follows:
{
userId: 123
firstProperty: {
firstArray: ['x','y','z'],
anotherAttr: 'abc'
},
secondProperty: {
secondArray: ['a','b','c'],
anotherAttr: 'def'
}
}
Given that the values within arrays are not directly related to one another, my objective was to have each value of a nested array within a separate result document. To achieve this, I implemented the following aggregation pipeline:
db.collection('myCollection').aggregate([
{
$match: {
$or: [
{'firstProperty.firstArray.1': {$exists: true}},
{'secondProperty.secondArray.1': {$exists: true}}
]
}
},
{
$project: {
userId: 1,
firstProperty: 1,
secondProperty: 1
}
}, {
$unwind: {path:'$firstProperty.firstAray'}
}, {
$unwind: {path:'$secondProperty.secondArray'},
}, {
$project: {
userId: 1,
firstProperty: '$firstProperty.firstArray',
firstPropertyAttr: '$firstProperty.anotherAttr',
secondProperty: '$secondProperty.secondArray',
seondPropertyAttr: '$secondProperty.anotherAttr'
}
}, {
$out: 'another_collection'
}
])
The expected result should look like:
{
userId: 'x1',
firstProperty: 'x',
firstPropertyAttr: 'a'
}
{
userId: 'x1',
firstProperty: 'y',
firstPropertyAttr: 'a'
}
{
userId: 'x1',
firstProperty: 'z',
firstPropertyAttr: 'a'
}
{
userId: 'x1',
secondProperty: 'a',
firstPropertyAttr: 'b'
}
{
userId: 'x1',
secondProperty: 'b',
firstPropertyAttr: 'b'
}
{
userId: 'x1',
secondProperty: 'c',
firstPropertyAttr: 'b'
}
However, the actual output is more like:
{
userId: 'x1',
firstProperty: 'x',
firstPropertyAttr: 'b'
secondProperty: 'a',
secondPropertyAttr: 'b'
}
{
userId: 'x1',
firstProperty: 'y',
firstPropertyAttr: 'b'
secondProperty: 'b',
secondPropertyAttr: 'b'
}
{
userId: 'x1',
firstProperty: 'z',
firstPropertyAttr: 'b'
secondProperty: 'c',
secondPropertyAttr: 'b'
}
I am trying to pinpoint what exactly I am missing and how I can rectify the issue. Any insights would be greatly appreciated.