I store my data in MongoDB and handle modifications through Mongoose in an Express app, so I would expect my queries to be reasonably fast. Correct me if I'm wrong, but the dataset is large, though not overwhelmingly so, and the query does no filtering at all.
The model schema I use:
const DataSchema = mongoose.Schema({
  title: String,
  text: String,
  text_transform: Array,
  provider: String,
  url: {
    type: String,
    unique: true,
  },
  word_count: Array,
  HOW: Array,
  date: Date,
  click_count: Number,
});
A few key points to note:
The text field varies in size from 150 to over 7000 characters.
text_transform is an array containing the individual words of the text field.
provider is a short string, at most 12 characters long.
word_count is an array of [key, value] pairs, each pair itself an array:
[
  ["key", value], ["other_key", other_value], ...
]
The keys are strings and the values are numbers.
HOW is an array of at most 5 elements, each a string of 6-10 characters.
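To make the shapes above concrete, a single document under these constraints might look like this (all field values are invented for illustration):

```javascript
// A hypothetical example document matching the schema described above;
// every value here is made up for illustration.
const exampleDoc = {
  title: "Some article title",
  text: "A body of text between 150 and 7000+ characters...",
  text_transform: ["a", "body", "of", "text"],                 // words from `text`
  provider: "example.com",                                     // <= 12 characters
  url: "https://example.com/some-article",                     // unique per document
  word_count: [["a", 1], ["body", 1], ["of", 1], ["text", 2]], // [key, value] pairs
  HOW: ["string", "values"],                                   // <= 5 strings, 6-10 chars each
  date: new Date("2020-05-01"),
  click_count: 42,
};
```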
When I hit my own API endpoint and try to retrieve all the data, performance becomes a problem.
My request goes to this endpoint:
router.get("/", async (req, res) => {
  try {
    const data = await DataSchema.find({}).lean();
    res.json(data);
  } catch (err) {
    // Send a response body, otherwise the request hangs; a failed
    // query is a server error (500), not a 404.
    res.status(500).json({ error: err.message });
  }
});
As you can see, I use .lean() to skip Mongoose document hydration and return the result as JSON. Even so, querying from the client side is sluggish with only around 270 documents in the database.
I don't think it's a network issue: limiting the query to the first 10 documents is noticeably faster, but still slower than expected, taking roughly a second.
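The "first 10" test corresponds to a query like the one sketched below; the pageToSkip helper is my own naming for illustration and is not part of the original code.

```javascript
// Skip/limit pagination sketch; `page` is 1-based.
// pageToSkip is a hypothetical helper, named here for illustration only.
function pageToSkip(page, pageSize) {
  return (page - 1) * pageSize;
}

// Usage with the Mongoose model (requires an active connection):
// const firstTen = await DataSchema.find({})
//   .skip(pageToSkip(1, 10))
//   .limit(10)
//   .lean();
```

Because limit is applied server-side, only the requested documents cross the network.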
Are there optimization techniques to speed this up?
It's also worth noting that for development I'm on the free tier of MongoDB Atlas, so I have no access to the profiler or performance analytics. I don't have a production environment to test against, and the cluster is in the Belgium region (europe-west1), which is closest to me.
I also tested temporarily removing the text_transform field, which had minimal impact on query speed.
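Since text_transform duplicates every word of the text field, it accounts for a sizable share of the response payload. A quick way to estimate that share per document (my own measuring snippet, not part of the app) relies on JSON.stringify dropping keys whose value is undefined:

```javascript
// Rough payload-size comparison for one document, with and without text_transform.
// The document below is a tiny stand-in for illustration.
function payloadBytes(doc) {
  return Buffer.byteLength(JSON.stringify(doc), "utf8");
}

const doc = {
  text: "one two three",
  text_transform: ["one", "two", "three"],
};

const full = payloadBytes(doc);
// JSON.stringify omits undefined-valued keys, so this excludes the array.
const trimmed = payloadBytes({ ...doc, text_transform: undefined });
```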
EDIT:
I forgot to mention that I implemented an endpoint returning hardcoded JSON, which responds in under 1 ms, so the slowness appears only with database queries.
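The hardcoded baseline was along these lines (the route path and payload here are placeholders, not the actual ones):

```javascript
// Sketch of the hardcoded-JSON baseline endpoint: no database round trip,
// just a static response. Payload contents are placeholders.
function hardcodedHandler(req, res) {
  res.json({ status: "ok", items: [] });
}

// Wired up as, e.g.:
// router.get("/baseline", hardcodedHandler);
```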
The .explain() output:
[
  {
    queryPlanner: {
      plannerVersion: 1,
      namespace: 'test.articles',
      indexFilterSet: false,
      parsedQuery: {},
      winningPlan: [Object],
      rejectedPlans: []
    },
    executionStats: {
      executionSuccess: true,
      nReturned: 301,
      executionTimeMillis: 0,
      totalKeysExamined: 0,
      totalDocsExamined: 301,
      executionStages: [Object],
      allPlansExecution: []
    },
    serverInfo: {
      host: 'some.database-shard.mongodb.net',
      port: 27017,
      version: '4.2.6',
      gitVersion: '20364840b8f1af16917e4c23c1b5f5efd8b352f8'
    },
    ok: 1,
    '$clusterTime': { clusterTime: '6831457389108002820', signature: [Object] },
    operationTime: '6831457389108002820'
  }
]
Thank you for your time and assistance in advance.