Recently, I've been building a web scraper that reads data from MongoDB, turns it into an array of URLs, and scrapes those URLs periodically with Puppeteer. I want the scrape function to run on a fixed interval using setIntervalAsync.
The problem is that my code currently throws this error: "UnhandledPromiseRejectionWarning: TypeError: Cannot convert undefined or null to object at Function.values..."
puppeteer.js

```js
async function scrape(array) {
  // Check whether server.js has flagged the interval loop for a restart
  let port = '9052'
  if (localStorage.getItem('scrapeRunning') == 'restart') {
    clearIntervalAsync(scrape)
    localStorage.setItem('scrapeRunning', 'go')
  } else {
    /// Actual URL scraping using Puppeteer here ///
  }
}
```
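The restart check passes the `scrape` function itself to clearIntervalAsync. As far as I understand, clear functions of this kind expect the timer object that the matching "set" call returned, not the callback. To test that idea I wrote the hypothetical helpers below; `setIntervalHandle` and `clearIntervalHandle` are my own names, and I have not verified that set-interval-async works this way internally. Handing the callback to the clear function instead of the handle makes `Object.values` receive `undefined`, which produces the same "Cannot convert undefined or null to object" message.

```js
// Hypothetical helpers (my own names, NOT the set-interval-async source) that
// illustrate the handle pattern: the "set" call returns a timer object, and the
// "clear" call needs that object back, not the callback.
function setIntervalHandle(task, intervalMs, ...args) {
  const timer = { stopped: false, timeoutId: null, promises: {} };

  const tick = () => {
    // Run the task and remember its promise so the clear call can wait for it.
    const run = Promise.resolve().then(() => task(...args));
    timer.promises.current = run;
    run
      .catch((err) => console.error('Interval task failed:', err))
      .then(() => {
        // Schedule the next run only after this one has settled.
        if (!timer.stopped) timer.timeoutId = setTimeout(tick, intervalMs);
      });
  };

  timer.timeoutId = setTimeout(tick, intervalMs);
  return timer;
}

async function clearIntervalHandle(timer) {
  timer.stopped = true;
  clearTimeout(timer.timeoutId);
  // If `timer` is the callback function rather than the handle, then
  // `timer.promises` is undefined and Object.values throws
  // "TypeError: Cannot convert undefined or null to object".
  await Promise.allSettled(Object.values(timer.promises));
}
```

If the real library follows the same pattern, the fix on my side would be to keep the value returned by `setIntervalAsync(...)` in a variable that both server.js and `scrape` can reach, and pass that variable to clearIntervalAsync.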
server.js

```js
app.post('/submit-form', [
  // Form Validation Here //
], (req, res) => {
  async function submitForm(amazonUrl, desiredPrice, email) {
    // Connect to MongoDB and update the existing entry (or create a new one)
    // based on the POST request data
    createMongo.newConnectToMongo(amazonUrl, desiredPrice, email)
      .then(() => {
        // Set a localStorage flag telling the scraper to call clearIntervalAsync
        localStorage.setItem('scrapeRunning', 'restart');
        // Fetch the updated MongoDB data before continuing...
        return createMongo.pullMongoArray();
      })
      .then((result) => {
        // Restart scraping with the new data
        puppeteer.scrape(result)
      })
  }
  submitForm(req.body.amazonUrl, req.body.desiredPrice, req.body.email);
});

// On server start: load the URLs once and scrape them every 10 seconds
createMongo.pullMongoArray()
  .then((result) => {
    setIntervalAsync(puppeteer.scrape, 10000, result);
  })
```
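The "UnhandledPromiseRejectionWarning" part of the message means a promise rejected with no handler attached. The `.then` chain in the POST handler has no `.catch`, so a failure in any of its steps would surface that way. Here is a sketch of the same sequence written with async/await and try/catch. `connectAndUpsert`, `pullUrls` and `restartScraper` are hypothetical stand-ins for `newConnectToMongo`, `pullMongoArray` and the restart step; they are passed in as a `deps` object so the function runs on its own, and each is assumed to return a promise.

```js
// Sketch of the form-submission flow with explicit sequencing and error handling.
// `deps` holds hypothetical stand-ins for the real MongoDB and scraper functions.
async function submitForm(deps, amazonUrl, desiredPrice, email) {
  try {
    // 1. Write the form data and wait until the write has actually finished.
    await deps.connectAndUpsert(amazonUrl, desiredPrice, email);
    // 2. Only then read the updated collection back.
    const urls = await deps.pullUrls();
    // 3. Restart scraping with the fresh data and wait for that to complete.
    await deps.restartScraper(urls);
    return { ok: true };
  } catch (err) {
    // A rejection from any step above lands here instead of becoming an
    // unhandled promise rejection.
    console.error('submitForm failed:', err);
    return { ok: false, error: err };
  }
}
```

In the route, the handler could be declared `async` and call `const outcome = await submitForm(deps, req.body.amazonUrl, req.body.desiredPrice, req.body.email)`, then choose the response based on `outcome.ok`.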
Currently, the scraper starts as expected when the server launches and waits 10 seconds between successive scrapes. However, once I submit the form, which updates the MongoDB collection and sets the localStorage flag, the scraper throws the TypeError above. I've tried several fixes, including moving setIntervalAsync and clearIntervalAsync into the POST request handler, but none of them have worked. I'm relatively new to coding, and especially to asynchronous programming, so any insight into what's causing this would be greatly appreciated!
I suspect the asynchronous nature of the code is contributing to the problem: everything I've tried seems to indicate that pullMongoArray executes before newConnectToMongo has finished.
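On the ordering question: a `.then` chain (or `await`) only waits for a step if that step's promise settles when its work is done. If `newConnectToMongo` returns a promise that resolves before the database write completes, for example because the update call inside it isn't returned or awaited (its body isn't shown here, so this is an assumption), the next `.then` runs right away and `pullMongoArray` reads stale data. The sketch below contrasts the two cases using a hypothetical in-memory "collection" and a timer in place of MongoDB.

```js
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical in-memory stand-in for the MongoDB collection.
function makeFakeDb() {
  const db = { urls: ['https://example.com/old'] };

  // Starts the write but neither awaits nor returns it, so the returned
  // promise resolves immediately, before the URL is stored.
  db.upsertNotAwaited = async (url) => {
    delay(20).then(() => { db.urls.push(url); });
  };

  // Awaits the write, so the returned promise resolves only once it is stored.
  db.upsertAwaited = async (url) => {
    await delay(20);
    db.urls.push(url);
  };

  // Returns a copy of the current URL list.
  db.pull = async () => [...db.urls];
  return db;
}

// Same shape as the POST handler: write, then read back in the next .then.
function writeThenPull(upsertName) {
  const db = makeFakeDb();
  return db[upsertName]('https://example.com/new').then(() => db.pull());
}
```

With `upsertNotAwaited` the pulled list still contains only the old URL; with `upsertAwaited` it contains both.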