The concept being discussed can be called "forward paging". One important caveat: unlike paging with the .skip() and .limit() modifiers, this method cannot go back to a previous page or jump directly to a specific page without a lot of extra work storing "seen" or "discovered" pages. If you need that kind of "links to page" paging, stick with the .skip() and .limit() approach despite its performance cost, since .skip() still has to walk past every skipped document.
If you only need to move forward, here is the basic idea:
db.junk.find().sort({ "_id": 1 }).limit(3)
{ "_id" : ObjectId("54c03f0c2f63310180151877"), "a" : 1, "b" : 1 }
{ "_id" : ObjectId("54c03f0c2f63310180151878"), "a" : 4, "b" : 4 }
{ "_id" : ObjectId("54c03f0c2f63310180151879"), "a" : 10, "b" : 10 }
Those are the first three documents, i.e. your first "page". To page forward, iterate the cursor like this:
var lastSeen = null;
var cursor = db.junk.find().sort({ "_id": 1 }).limit(3);
while (cursor.hasNext()) {
    var doc = cursor.next();
    printjson(doc);             // do something with each document
    if (!cursor.hasNext())
        lastSeen = doc._id;     // remember the _id of the last document
}
This iterates the cursor, does something with each document (here, printing it), and when the last document is reached stores its _id in lastSeen:
ObjectId("54c03f0c2f63310180151879")
On the next request, feed that stored _id back into the query and ask only for documents whose _id is greater than it:
var cursor = db.junk.find({ "_id": { "$gt": lastSeen } }).sort({ "_id": 1 }).limit(3);
while (cursor.hasNext()) {
    var doc = cursor.next();
    printjson(doc);
    if (!cursor.hasNext())
        lastSeen = doc._id;
}
{ "_id" : ObjectId("54c03f0c2f6331018015187a"), "a" : 1, "b" : 1 }
{ "_id" : ObjectId("54c03f0c2f6331018015187b"), "a" : 6, "b" : 6 }
{ "_id" : ObjectId("54c03f0c2f6331018015187c"), "a" : 7, "b" : 7 }
Repeat this until the query returns no more documents.
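As a rough sketch of the whole loop, here it is as plain JavaScript run against an in-memory array instead of a collection. This is a simulation, not the MongoDB driver: the hypothetical findPage function stands in for the find({ "_id": { "$gt": lastSeen } }).sort({ "_id": 1 }).limit(n) query, and plain numbers stand in for ObjectId values.

```javascript
// Hypothetical in-memory stand-in for
//   db.junk.find({ _id: { $gt: lastSeen } }).sort({ _id: 1 }).limit(pageSize)
// Numbers replace ObjectId values; this is not a MongoDB call.
function findPage(docs, lastSeen, pageSize) {
  return docs
    .filter((doc) => lastSeen === null || doc._id > lastSeen)
    .sort((a, b) => a._id - b._id)
    .slice(0, pageSize);
}

// Walk forward page by page until a query returns nothing.
function pageById(docs, pageSize) {
  const pages = [];
  let lastSeen = null;
  for (;;) {
    const page = findPage(docs, lastSeen, pageSize);
    if (page.length === 0) break;            // no further results
    pages.push(page.map((doc) => doc._id));
    lastSeen = page[page.length - 1]._id;    // _id of the last document on this page
  }
  return pages;
}

// Sample data (hypothetical): seven documents with ascending _id values.
const junk = [
  { _id: 1 }, { _id: 2 }, { _id: 3 }, { _id: 4 },
  { _id: 5 }, { _id: 6 }, { _id: 7 },
];
console.log(pageById(junk, 3)); // [ [ 1, 2, 3 ], [ 4, 5, 6 ], [ 7 ] ]
```

Only the last _id is carried between requests, which is why this style of paging can only move forward.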
That is the basic approach for paging on a unique, ordered key such as _id. For other sort orders it gets more involved. Consider these documents:
{ "_id": 4, "rank": 3 }
{ "_id": 8, "rank": 3 }
{ "_id": 1, "rank": 3 }
{ "_id": 3, "rank": 2 }
To split these into two pages sorted by rank, the last rank alone cannot mark where the first page ended, because rank is not unique. You also need to keep track of which documents have already been "seen" and exclude them. For the first page:
var lastSeen = null;
var seenIds = [];
var cursor = db.junk.find().sort({ "rank": -1 }).limit(2);
while (cursor.hasNext()) {
    var doc = cursor.next();
    printjson(doc);
    // only ids sharing the current rank need to be remembered
    if (lastSeen != null && doc.rank != lastSeen)
        seenIds = [];
    seenIds.push(doc._id);
    lastSeen = doc.rank;        // track the rank of every document read
}
{ "_id": 4, "rank": 3 }
{ "_id": 8, "rank": 3 }
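To see why the seen list is needed, the sketch below simulates two naive second-page queries in plain JavaScript over the four documents above. It is an in-memory illustration, not a MongoDB query. It relies on JavaScript's stable sort to order ties; MongoDB does not guarantee tie order without a tie-breaker. A strict rank < 3 bound never shows _id 1, and an inclusive rank <= 3 bound without any exclusion simply returns the first page again.

```javascript
// In-memory simulation only; not a MongoDB query.
const sampleDocs = [
  { _id: 4, rank: 3 }, { _id: 8, rank: 3 },
  { _id: 1, rank: 3 }, { _id: 3, rank: 2 },
];
const pageSize = 2;
const lastSeenRank = 3; // rank of the last document on the first page

// Rank descending; Array.prototype.sort is stable, so ties keep input order.
const byRankDesc = (a, b) => b.rank - a.rank;

// Strict bound only: any remaining rank-3 document is skipped.
const strictPage2 = sampleDocs
  .filter((doc) => doc.rank < lastSeenRank)
  .sort(byRankDesc)
  .slice(0, pageSize)
  .map((doc) => doc._id);

// Inclusive bound only: the first page comes back again.
const inclusivePage2 = sampleDocs
  .filter((doc) => doc.rank <= lastSeenRank)
  .sort(byRankDesc)
  .slice(0, pageSize)
  .map((doc) => doc._id);

console.log(strictPage2);    // [ 3 ]     (_id 1 is never shown)
console.log(inclusivePage2); // [ 4, 8 ]  (repeats the first page)
```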
For the next page, ask for ranks less than or equal to the last seen rank, and use the $nin operator to exclude the documents already seen:
var cursor = db.junk.find(
    { "_id": { "$nin": seenIds }, "rank": { "$lte": lastSeen } }
).sort({ "rank": -1 }).limit(2);
while (cursor.hasNext()) {
    var doc = cursor.next();
    printjson(doc);
    // reset the exclusion list whenever the rank changes
    if (lastSeen != null && doc.rank != lastSeen)
        seenIds = [];
    seenIds.push(doc._id);
    lastSeen = doc.rank;
}
{ "_id": 1, "rank": 3 }
{ "_id": 3, "rank": 2 }
How many _id values seenIds has to hold depends on how many documents share the same sort value. Since the $lte bound already rules out every higher rank, only documents whose rank equals lastSeen can come back. That means the list can be reset whenever the rank changes, which keeps it from growing unnecessarily.
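Putting the pieces together, here is a minimal in-memory sketch of the rank pager in plain JavaScript. It assumes a hypothetical findRankPage function standing in for the $nin/$lte query with .sort({ "rank": -1 }).limit(n), and a hypothetical state object holding lastSeen and seenIds. It is a simulation of the technique, not a MongoDB driver call.

```javascript
// Hypothetical in-memory stand-in for
//   db.junk.find({ _id: { $nin: seenIds }, rank: { $lte: lastSeen } })
//          .sort({ rank: -1 }).limit(pageSize)
// On the first page (lastSeen === null) there is no filter at all.
function findRankPage(docs, state, pageSize) {
  return docs
    .filter((doc) =>
      state.lastSeen === null ||
      (doc.rank <= state.lastSeen && !state.seenIds.includes(doc._id)))
    .sort((a, b) => b.rank - a.rank)
    .slice(0, pageSize);
}

// Fetch one page and update the paging state like the shell loop does.
function nextRankPage(docs, state, pageSize) {
  const page = findRankPage(docs, state, pageSize);
  for (const doc of page) {
    // Only ids sharing the current rank can reappear, so reset on a rank change.
    if (state.lastSeen !== null && doc.rank !== state.lastSeen) state.seenIds = [];
    state.seenIds.push(doc._id);
    state.lastSeen = doc.rank;
  }
  return page.map((doc) => doc._id);
}

const rankedDocs = [
  { _id: 4, rank: 3 }, { _id: 8, rank: 3 },
  { _id: 1, rank: 3 }, { _id: 3, rank: 2 },
];
const state = { lastSeen: null, seenIds: [] };
console.log(nextRankPage(rankedDocs, state, 2)); // [ 4, 8 ]
console.log(nextRankPage(rankedDocs, state, 2)); // [ 1, 3 ]
console.log(nextRankPage(rankedDocs, state, 2)); // []
```

Updating lastSeen on every document, rather than only on the first and last, keeps seenIds correct when the rank changes partway through a page.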
Those are the core working parts of "forward paging" in practice.