PhantomJS 2.0.0 not waiting for pages to finish loading

In the script below, links is an array of starting URLs. The function gatherLinks() fetches the sitemap.xml of each URL in links and appends the <loc> entries it finds to the same array. Once links holds as many URLs as the variable limit allows (or every collected URL has had its sitemap fetched), request() is called. It opens each URL in links in turn with page.open() and saves a screenshot of the loaded page with page.render().

When the script runs under PhantomJS 2.0.0, many of the screenshots are missing content, which suggests that PhantomJS is not waiting for the page to finish loading before page.render() is called. With PhantomJS 1.9.8 the same script works fine. What could cause this difference?


var webpage = require('webpage');
var system = require('system');
var fs = require('fs');
var links = [
    "http://somesite.com",
    "http://someothersite.com",
    // ... (more URLs)
];

var index = 0, fail = 0, limit = 20;
var finalTime = Date.now();

var gatherLinks = function(link){
  var page = webpage.create();
  link = link + "/sitemap.xml";
  console.log("Fetching links from " + link);

  page.open(link, function(status){
    if(status != "success"){
      console.log("Sitemap Request FAILED, status: " + status);
      fail++;
      return;
    }

    var content = page.content;
    var parser = new DOMParser();
    var xmlDoc = parser.parseFromString(content, 'text/xml');
    var loc = xmlDoc.getElementsByTagName('loc');

    for(var i = 0; i < loc.length; i++){
      if(links.length < limit){
        links[links.length] = loc[i].textContent;
      } else{
        console.log(links.length + " Links prepared. Starting requests.\n");
        index = 0;
        page.close();
        request();
        return;
      }
    }

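    // index points at the link just processed; stop after the last one
    // instead of calling gatherLinks(undefined).
    if(index >= links.length - 1){
      index = 0;
      console.log(links.length + " Links prepared\n\n");
      page.close();
      request();
      return;
    }

    page.close();
    gatherLinks(links[++index]);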
  });
};

var request = function(){
  var t = Date.now();
  var page = webpage.create();
  page.open(links[index], function(status) {
    console.log('Loading link #' + (index + 1) + ': ' + links[index]);
    console.log("Time taken: " + (Date.now() - t) + " msecs");

    if(status != "success"){
      console.log("Request FAILED, status: " + status);
      fail++;
    }

    page.render("img_200_" + index + ".jpeg", {format: 'jpeg', quality: '100'});
    if(index >= links.length-1){
      console.log("\n\nAll links done, final time taken: " + (Date.now() - finalTime) + " msecs");
      console.log("Requests sent: " + links.length + ", Failures: " + fail);
      console.log("Success ratio: " + ((links.length - fail)/links.length)*100 + "%");
      page.close();
      phantom.exit();
      return; // without this, index++, page.close() and request() below would still run
    }

    index++;
    page.close();
    request();
  });
}

gatherLinks(links[0]);

Answer №1

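In PhantomJS, the page.open callback gives no firm guarantee about how much of the page has loaded when it fires. In particular, resources that scripts request after the initial load (AJAX calls, lazy-loaded images, web fonts) may still be pending, so calling page.render() immediately can capture an incomplete page, and the exact timing can differ between PhantomJS versions.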
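Given that, a minimal first change to request() is to defer the render by a fixed delay instead of calling page.render() right away. This is a sketch, assuming a fixed delay is long enough for the target sites (the value has to be tuned by hand); renderAfterDelay and delayMs are hypothetical names, not PhantomJS APIs.

```javascript
// Sketch (assumption): wait a fixed delayMs after page.open() reports
// completion, so content fetched later by scripts has a chance to arrive.
// renderAfterDelay and delayMs are hypothetical names, not part of PhantomJS.
function renderAfterDelay(page, fileName, delayMs, next) {
  setTimeout(function () {
    // Same render options as the original script.
    page.render(fileName, { format: 'jpeg', quality: '100' });
    page.close();
    next(); // continue with the next URL (or exit) only after rendering
  }, delayMs);
}
```

In request(), the page.render(...) call and the lines after it would become renderAfterDelay(page, "img_200_" + index + ".jpeg", 2000, function () { index++; request(); }), with the last-link check and phantom.exit() moved into that callback. A fixed delay is simple but wastes time on fast pages and can still be too short on slow ones.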

For dynamic sites, the simplest workaround is a static wait: call page.render() from inside a setTimeout() instead of immediately. A more robust approach is to track outstanding requests: count each request as it starts with page.onResourceRequested, count it as finished in page.onResourceReceived (once the response reaches its "end" stage), page.onResourceTimeout or page.onResourceError, and render only once nothing is pending.
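The request-counting approach can be sketched as follows. It uses only the four PhantomJS page callbacks named above; trackRequests, waitForIdle, quietMs and maxWaitMs are hypothetical names for illustration, and the 100 ms polling interval is an arbitrary choice.

```javascript
// Sketch (assumption): count outstanding resource requests through
// PhantomJS's onResourceRequested / onResourceReceived / onResourceTimeout /
// onResourceError callbacks, then wait until nothing has been pending for
// quietMs, or until maxWaitMs has passed, whichever comes first.
function trackRequests(page) {
  var pending = {};          // request id -> true while the request is outstanding
  var count = 0;
  var lastActivity = Date.now();

  function started(id) {
    if (!pending[id]) { pending[id] = true; count++; }
    lastActivity = Date.now();
  }
  function finished(id) {
    // Guard against double counting, e.g. a timeout followed by an error.
    if (pending[id]) { delete pending[id]; count--; }
    lastActivity = Date.now();
  }

  page.onResourceRequested = function (requestData) { started(requestData.id); };
  page.onResourceReceived = function (response) {
    // Invoked with stage "start" and again with stage "end"; only "end" means done.
    if (response.stage === 'end') { finished(response.id); }
  };
  page.onResourceTimeout = function (timedOutRequest) { finished(timedOutRequest.id); };
  page.onResourceError = function (resourceError) { finished(resourceError.id); };

  return {
    pendingCount: function () { return count; },
    waitForIdle: function (quietMs, maxWaitMs, done) {
      var start = Date.now();
      (function poll() {
        var now = Date.now();
        var idle = count === 0 && now - lastActivity >= quietMs;
        if (idle || now - start >= maxWaitMs) {
          done();
        } else {
          setTimeout(poll, 100); // check again shortly
        }
      })();
    }
  };
}
```

trackRequests(page) has to be called right after webpage.create() and before page.open(), otherwise requests made during the initial load are never counted. Inside the page.open callback, the immediate page.render(...) would then be wrapped in tracker.waitForIdle(1000, 10000, function () { ... }), where 1000 and 10000 are placeholder values. The maxWaitMs cap keeps a page that polls the server forever from stalling the whole run.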

If you suspect a bug in PhantomJS 2.0.0 itself, try running the script with different command-line switches (for example --ignore-ssl-errors=true or --ssl-protocol=any) and compare the results against 1.9.8.
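Before changing switches, it can help to see which resources actually fail, since missing content may come from failed requests (for example SSL errors) rather than from rendering too early. This is a sketch; logResourceFailures is a hypothetical helper name, and it only reads the url, errorCode and errorString fields that PhantomJS passes to onResourceError and onResourceTimeout.

```javascript
// Sketch (hypothetical helper): record every failed or timed-out resource
// so failures can be compared between PhantomJS versions or switch settings.
function logResourceFailures(page, log) {
  log = log || function (msg) { console.log(msg); };
  var failures = [];

  function record(kind, info) {
    var entry = 'Resource ' + kind + ' ' + info.errorCode + ' (' +
      info.errorString + '): ' + info.url;
    failures.push(entry);
    log(entry);
  }

  // Note: assigning these handlers replaces any handlers set earlier on the page.
  page.onResourceError = function (resourceError) { record('error', resourceError); };
  page.onResourceTimeout = function (timedOutRequest) { record('timeout', timedOutRequest); };

  return failures;
}
```

Running the script once with and once without a switch such as --ignore-ssl-errors=true, and comparing the logged failures, shows whether that switch changes which resources load. If this is combined with request counting, both pieces of logic have to live in the same onResourceError and onResourceTimeout handlers.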
