PhantomJS 2.0.0 - Error in Selection: Argument is Not Valid

In the script provided below, there are URLs stored in the "links" array. The function gatherLinks() is designed to collect additional URLs from sitemap.xml based on the URLs present in the "links" array. Once the "links" array reaches a certain number of URLs (determined by the variable "limit"), the function request() is executed for each URL within the "links" array to send a request to the server and retrieve the response time. The total time taken by the program is reported upon completion.

I developed a PhantomJS program (source included) to send requests and measure the time taken, aiming to compare the performance between versions 2.0.0 and 1.9.8. I extract links using the sitemap.xml file from sites manually added to the "links" array.

When running with PhantomJS 2.0.0, after around 65 requests the program begins printing the error named in the title each time it calls page.open(). The script is:

var system = require('system');
var fs = require('fs');
var links = [];

links = [
    "http://somesite.com",
    "http://someothersite.com",
       . 
       .
       .
 ];

var index = 0, fail = 0, limit = 300;
finalTime = Date.now();

var gatherLinks = function(link){
  var page = require('webpage').create();
  link = link + "/sitemap.xml";
  console.log("Fetching links from " + link);

  page.open(link, function(status){
    if(status != "success"){
      console.log("Sitemap Request FAILED, status: " + status);
      fail++;
      return;
    }

    var content = page.content;
    parser = new DOMParser();
    xmlDoc = parser.parseFromString(content, 'text/xml');
    var loc = xmlDoc.getElementsByTagName('loc');

    for(var i = 0; i < loc.length; i++){
      if(links.length < limit){
        links[links.length] = loc[i].textContent;
      } else{
        console.log(links.length + " Links prepared. Starting requests.\n");
        index = 0;
        request();
        return;
      }
    }

    if(index >= links.length){
      index = 0;
      console.log(links.length + " Links prepared\n\n");
      request();
    }

    gatherLinks(links[index++]);
  });
};

var request = function(){
  t = Date.now();
  var page = require('webpage').create();
  page.open(links[index], function(status) {
    console.log('Loading link #' + (index + 1) + ': ' + links[index]);
    console.log("Time taken: " + (Date.now() - t) + " msecs");

    if(status != "success"){
      console.log("Request FAILED, status: " + status);
      fail++;
    }
    if(index >= links.length-1){
      console.log("\n\nAll links done, final time taken: " + (Date.now() - finalTime) + " msecs");
      console.log("Requests sent: " + links.length + ", Failures: " + fail);
      console.log("Success ratio: " + ((links.length - fail)/links.length)*100 + "%");
      phantom.exit();
    }

    index++;
    request();
  });
}

gatherLinks(links[0]);

I have experimented extensively with the code, and no clear pattern emerges for the errors described above. With version 2.0.0, I managed to send 300 requests without errors only once. I tried various combinations of URLs, but failures typically started between the 50th and 80th request. I keep a record of the failed URLs, and all of them load normally when tested individually with another PhantomJS script. Version 1.9.8 is more stable, though it still crashes intermittently without any discernible pattern.

Answer №1

Your code has several issues that need addressing. One major problem is the continuous creation of new pages for each request without properly closing them, leading to potential memory depletion.

To mitigate this issue, consider reusing a single page for all requests instead of creating a new one each time. Move the line

var page = require('webpage').create();
to the global scope outside of gatherLinks() and request(). Alternatively, remember to call page.close() once you are done with it, keeping in mind PhantomJS's asynchronous nature.
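
The page.close() variant could look roughly like the sketch below. The helper name requestAll and its createPage parameter are assumptions made for illustration and are not part of the original script; under PhantomJS, createPage would be function () { return require('webpage').create(); }.

```javascript
// Sketch only: opens each link in turn and closes the page when its callback fires.
// requestAll and createPage are hypothetical names, not part of the original script.
function requestAll(createPage, links, done) {
  var index = 0;
  var fail = 0;

  function next() {
    if (index >= links.length) {
      done(fail);              // every link has been processed
      return;
    }
    var page = createPage();   // one page per request...
    var started = Date.now();
    page.open(links[index], function (status) {
      console.log('Loading link #' + (index + 1) + ': ' + links[index]);
      console.log('Time taken: ' + (Date.now() - started) + ' msecs');
      if (status !== 'success') {
        console.log('Request FAILED, status: ' + status);
        fail++;
      }
      page.close();            // ...released before the next one is created
      index++;
      next();
    });
  }

  next();
}
```

In the script from the question, the done callback would be the place to print the totals and call phantom.exit(), since only then have all callbacks finished.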

If using multiple page objects was intended to prevent cache re-use for subsequent requests, note that this approach does not address the problem. Pages within a single PhantomJS process share cookies and cache, similar to tabs or windows. To isolate each request, consider running them in separate processes, possibly utilizing the Child Process Module.
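
A rough sketch of the separate-process idea follows. It assumes the execFile function from require('child_process') is passed in, and that a worker script exists which loads the single URL given as its argument and prints its own status. The names runIsolated, phantomPath and workerScript are hypothetical.

```javascript
// Sketch only: runs one request per PhantomJS process so nothing is shared between them.
// runIsolated, phantomPath and workerScript are hypothetical names.
// execFile is expected to be require('child_process').execFile.
function runIsolated(execFile, phantomPath, workerScript, links, done) {
  var index = 0;
  var results = [];

  function next() {
    if (index >= links.length) {
      done(results);
      return;
    }
    var url = links[index];
    var started = Date.now();
    // Equivalent to the shell command: <phantomPath> <workerScript> <url>
    execFile(phantomPath, [workerScript, url], null, function (err, stdout, stderr) {
      // The worker script is assumed to print its own status to stdout.
      results.push({ url: url, output: stdout, msecs: Date.now() - started });
      index++;
      next();
    });
  }

  next();
}
```

Note that the time measured this way includes process start-up, so it is not directly comparable with timings taken inside a single process.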


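Additionally, gatherLinks() is missing a return after the call to request(). Without it, execution falls through to gatherLinks(links[index++]) and keeps fetching sitemaps while the requests are already running. The corrected snippet: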

if(index >= links.length){
  index = 0;
  console.log(links.length + " Links prepared\n\n");
  request();
  return; // ##### THIS #####
}

gatherLinks(links[index++]); 
