Utilizing PhantomJS to scrape a web page that loads more data as you scroll

Question

My current challenge involves scraping the category page http://www.avrilgau.com/fr/5-chaussures with PhantomJS. The goal is to extract the links of all products listed on the page. However, more products are loaded dynamically as I scroll down, 12 new items at a time.

I found a hidden form in the page's HTML; submitting it returned 61 of the 110 products in total.
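If the hidden form is what the page submits to fetch more products, one option is to replay it from PhantomJS: read its fields inside `page.evaluate`, encode them, and post them with PhantomJS's `page.open(url, "post", data, callback)` form. The form's action URL and field names are not given here, so the sketch below only covers the encoding step, and `encodeFormFields` is a hypothetical helper rather than part of PhantomJS.

```javascript
// Hypothetical helper (ES5, so it also runs under PhantomJS): turn an array of
// { name, value } pairs into an application/x-www-form-urlencoded body.
// encodeURIComponent writes spaces as %20 instead of the "+" a browser uses;
// standard form decoders treat both as a space.
function encodeFormFields(fields) {
  var parts = [];
  for (var i = 0; i < fields.length; i++) {
    parts.push(
      encodeURIComponent(fields[i].name) + "=" + encodeURIComponent(fields[i].value)
    );
  }
  return parts.join("&");
}
```

Inside `page.evaluate`, the fields could be gathered as plain `{ name, value }` objects from the form's `input` elements and returned to PhantomJS; the encoded string is then passed as the `data` argument to `page.open` with method `"post"`. Whether changing a field (for example a page-size value) returns more products depends entirely on the site's server code.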

The question now is, how can I obtain the links of all products?

Below is the code snippet I have been working on:

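```javascript
var fs   = require("fs");
var page = require("webpage").create();

var path = "productLinks.txt";
var url  = "http://www.avrilgau.com/fr/5-chaussures";

// Relay console.log() calls made inside the page to the PhantomJS console
page.onConsoleMessage = function (msg) {
  console.log(msg);
};

page.open(url, function (status) {
  if (status !== "success") {
    console.log("Failed to load " + url);
    phantom.exit(1);
    return;
  }

  // Runs inside the page: collect the href of every product link that is
  // currently in the DOM (only the products loaded so far)
  var content = page.evaluate(function () {
    var allUrl = [];
    var tempNodeArray = document.querySelectorAll("#content > ul > li > div > div a.img");

    for (var i = 0; i < tempNodeArray.length; i++) {
      allUrl.push(tempNodeArray[i].href);
    }

    return allUrl.join("\n") + "\n";
  });

  console.log(content);
  fs.write(path, content, "a"); // mode "a" appends to productLinks.txt

  phantom.exit();
});
```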
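The snippet above reads the DOM only once, right after the page loads, so it can only see the products that are already rendered. A common workaround is to scroll, wait for the AJAX batch to arrive, and repeat until the product count stops growing. Below is a minimal sketch of that loop, assuming the site's infinite scroll reacts to `window` scroll events; the function name `scrollUntilStable` and its options are hypothetical. It is written in ES5 and takes callbacks instead of calling PhantomJS directly, so the control flow can also be exercised outside the browser.

```javascript
// Scroll repeatedly until the item count stops growing (ES5 for PhantomJS).
//   scrollOnce(): triggers one scroll, e.g. via page.evaluate
//   countItems(): returns how many product links are currently in the DOM
//   options.delayMs: how long to wait for each AJAX batch to arrive
//   options.maxRounds: hard cap on scroll attempts, to avoid an endless loop
//   options.maxIdleRounds: stop after this many consecutive rounds with no growth
//   done(total): called once with the final item count
function scrollUntilStable(scrollOnce, countItems, options, done) {
  var lastCount = countItems();
  var idle = 0;
  var rounds = 0;

  function step() {
    if (rounds >= options.maxRounds || idle >= options.maxIdleRounds) {
      done(lastCount);
      return;
    }
    rounds++;
    scrollOnce();
    // Give the page time to fetch and render the next batch, then re-count
    setTimeout(function () {
      var count = countItems();
      if (count > lastCount) {
        lastCount = count; // new products arrived: reset the idle counter
        idle = 0;
      } else {
        idle++;
      }
      step();
    }, options.delayMs);
  }

  step();
}
```

Under PhantomJS, `scrollOnce` could call `page.evaluate(function () { window.scrollTo(0, document.body.scrollHeight); })`, and `countItems` could return `page.evaluate(function () { return document.querySelectorAll("#content > ul > li > div > div a.img").length; })`. Once `done` fires, the existing link-extraction `page.evaluate` and `fs.write` can run, followed by `phantom.exit()`. Requiring the count to stay flat for more than one round gives slow responses a second chance.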

javascript ajax web-scraping phantomjs

Answer 1


I count exactly 61 products in that category. The infinite scroll stops loading products once it reaches that count, which appears to be by design on the site. Can you clarify where the figure of 110 came from?
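If the links are collected after each scroll round, overlapping batches can make a raw link count larger than the number of distinct products. A minimal sketch for checking the true total, assuming each round's links are kept as an array of URL strings (`mergeUniqueLinks` is a hypothetical helper, not part of PhantomJS):

```javascript
// Merge several batches of product URLs into one list of distinct URLs,
// keeping the order in which each URL was first seen (ES5 for PhantomJS).
function mergeUniqueLinks(batches) {
  var seen = {};
  var result = [];
  for (var i = 0; i < batches.length; i++) {
    for (var j = 0; j < batches[i].length; j++) {
      var link = batches[i][j];
      if (!Object.prototype.hasOwnProperty.call(seen, link)) {
        seen[link] = true;
        result.push(link);
      }
    }
  }
  return result;
}
```

The length of the returned array is the number of distinct product URLs; if it stays at 61 once scrolling no longer adds items, that supports the count given in this answer.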

Similar questions

Which JavaScript library or template engine would be most suitable for this scenario?

Update a JSON value using an MUI Switch element

Transferring Information between a Pair of Controllers

“The asynchronous wicket task”

Troubleshooting Issue with Mongoose Virtual Field Population

RN TypeScript is handling parameters with an implicit any type

What are the steps to reset a JavaScript game?

I have run into a roadblock when trying to generate dynamic pages with Gatsby

Creating directional light shadows in Three.JS: A Step-by-Step Guide

React element failing to appear on webpage

Tips for resolving the error message "Cannot assign type 'string' to type '...' in NextJS"

Utilizing nprogress.Js for a Dynamic Progress Bar

Detect if the user is using Internet Explorer and redirect them to a different

Angular: Monitoring changes in forms

Ways to store data in the localStorage directly from a server

A selection dropdown within a grid layout

Can someone assist me with navigating through my SQL database?

Guidance on creating cookies using Angular

Creating a rotating wheel using JavaScript that is activated by a key press event

Tips for deactivating a single edit button