Using Puppeteer-cluster along with cheerio in an express router API results in an unexpectedly blank response

I am building an API with express, puppeteer-cluster, and cheerio that returns the anchor elements containing specific words, with the words supplied as query parameters. I want to use puppeteer so that dynamically generated elements are captured as well. However, the response shown in the browser is an empty array.

Despite dedicating the past 2 days to mastering this library, I have made no significant progress. Any assistance or guidance would be greatly appreciated.

Update: I have made all my functions async, and they now run successfully. However, the end result is still empty :(

Update 2: Upon thorough logging and debugging, I have identified that the data.name is being passed as a Promise to the cheerio function. This appears to be the root cause of the issue, although I am uncertain about how to resolve it at this point.

Update 3: Part of the problem was that I was mishandling the page content (the HTML) passed to the cheerio function. The browser response is still empty, but the console now shows an error:

Error handling response: TypeError: Cannot read properties of undefined (reading 'innerText').

This leads me to believe that the response is not formatted as JSON. Could it be that res.json() is not the correct method for formatting?

Here is a snippet of my code:

app.js

[code snippet for app.js]

cluster.js

[code snippet for cluster.js]

Despite not encountering any errors, I am presented with an empty output on the screen. Can anyone identify where I may be going wrong with this implementation? :(

Answer №1

It is not recommended to combine Puppeteer with another selection library like Cheerio as it can lead to redundancy and potential issues. Using a separate HTML parser with Puppeteer requires snapshotting the HTML and then parsing it with Cheerio, introducing unnecessary complexity.

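If you do combine the two, take the snapshot with Puppeteer's await page.content() rather than reading document.body.innerHTML. page.content() returns the full serialized document, including the doctype, not just the contents of the body.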

Additionally, there is no need for the functions that call Cheerio to be asynchronous: Cheerio's API is fully synchronous, so only the Puppeteer calls need await.
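The Promise described in Update 2 of the question is what a synchronous parser receives when an async call is passed along without await. The sketch below illustrates the difference with stand-ins only: fetchHtml is a hypothetical substitute for page.content(), and headingText is a hypothetical substitute for a synchronous Cheerio lookup. Neither is part of Puppeteer or Cheerio.

```javascript
// Hypothetical stand-in for page.content(): an async function resolving to HTML.
async function fetchHtml() {
  return "<h1>Example Domain</h1>";
}

// Hypothetical stand-in for a synchronous Cheerio lookup such as $("h1").text().
// It is a plain function: no async, no await.
function headingText(html) {
  if (typeof html !== "string") {
    throw new TypeError("expected an HTML string; was an await left out?");
  }
  const match = html.match(/<h1>(.*?)<\/h1>/);
  return match ? match[1] : "";
}

async function demo() {
  const pending = fetchHtml();    // no await: this is a Promise
  const html = await fetchHtml(); // awaited: this is a string
  return {
    pendingIsPromise: pending instanceof Promise,
    text: headingText(html),
  };
}
```

Passing the un-awaited value to headingText throws the TypeError, while the awaited string parses normally.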

If you have weighed that extra layer and still want Cheerio for your use case, a basic setup with Puppeteer looks like this:

const cheerio = require("cheerio"); // 1.0.0-rc.12
const puppeteer = require("puppeteer"); // ^19.0.0

let browser;
(async () => {
  browser = await puppeteer.launch();
  const [page] = await browser.pages();
  const url = "https://www.example.com";
  await page.goto(url, {waitUntil: "domcontentloaded"});
  const html = await page.content();
  const $ = cheerio.load(html);

  // perform Cheerio operations synchronously
  console.log($("h1").text()); // => Example Domain
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close());

The process is the same for puppeteer-cluster: put the lines starting at await page.content() inside the cluster.task callback, which receives page as an argument.
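As a rough sketch of what that could look like with puppeteer-cluster, assuming each queued job carries a url and a word to match: the names makeAnchorTask and main and the { url, word } shape of data are made up for this example and do not come from the question's code. Cheerio is passed into makeAnchorTask as an argument, and the packages are required inside main, only so that the callback can be tried without a browser.

```javascript
// Sketch only. `makeAnchorTask`, `main`, and the { url, word } shape of `data`
// are hypothetical names chosen for this example.

// Builds the callback for cluster.task(). Cheerio is passed in as an argument
// so the callback can be exercised without launching a browser.
function makeAnchorTask(cheerio) {
  return async ({ page, data: { url, word } }) => {
    await page.goto(url, { waitUntil: "domcontentloaded" });
    const html = await page.content(); // awaited: a string, not a Promise
    const $ = cheerio.load(html);

    // Everything from here on is synchronous Cheerio.
    const needle = word.toLowerCase();
    return $("a")
      .toArray()
      .map(el => ({ text: $(el).text().trim(), href: $(el).attr("href") }))
      .filter(link => link.text.toLowerCase().includes(needle));
  };
}

async function main(url, word) {
  // Required here rather than at the top only so the sketch loads
  // even where these packages are not installed.
  const cheerio = require("cheerio");
  const { Cluster } = require("puppeteer-cluster");

  const cluster = await Cluster.launch({
    concurrency: Cluster.CONCURRENCY_CONTEXT,
    maxConcurrency: 2,
  });
  try {
    await cluster.task(makeAnchorTask(cheerio));
    // execute() resolves with whatever the task callback returns.
    return await cluster.execute({ url, word });
  } finally {
    await cluster.idle();
    await cluster.close();
  }
}
```

In an Express route, the array returned by cluster.execute() is plain data, so it can be awaited and then passed to res.json().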
