Using PhantomJS to extract data from an ASP.NET website that features a dynamic combo dropdown box

I am attempting to scrape a page that is built with ASP.NET and contains 7 dynamic combo dropdown boxes, using PhantomJS v1.9.8.

The JavaScript code I am using is as follows:

var page = require('webpage').create();
console.log('Current user agent: ' + page.settings.userAgent);
page.settings.userAgent = 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.2 (KHTML, like Gecko) Chrome/5.0.342.3 Safari/533.2';
page.open('http://www.etcfinance.com.hk/online_appraise.aspx', function(status) {
    page.injectJs("https://code.jquery.com/jquery-latest.js", function() {
        page.evaluate(function() {
          $("#ddlArea").val('香港');
          __doPostBack('ddlArea', '');
          setTimeout(function() {
            console.log('Zone: ' + $('#ddlZone').val());
          }, 1000);
        });
        phantom.exit();
    });
});

The output stalls at:

Current user agent: Mozilla/5.0 (Macintosh; PPC Mac OS X) AppleWebKit/534.34 (KHTML, like Gecko) PhantomJS/1.9.8 Safari/534.34

and does not proceed further. How can I effectively select desired values for all of those combo dropdown boxes?

The relevant part of the HTML structure looks like this:

<table width="460" bgcolor="#E0F3FF" border="0" cellpadding="3" cellspacing="0" class="content">
<tbody><tr height="20"><td></td></tr>
<tr class="insidecontent"> 
  <td style="Padding-Left:20px;Padding-Right:20px;"> 
    <div align="left"> Region : </div>
  </td>
  <td valign="top"> 
  <select name="ddlArea" onchange="javascript:setTimeout('__doPostBack(\'ddlArea\',\'\')', 0)" id="ddlArea" class="textbox" style="width:29em">
        <option selected="selected" value="">Select Region</option>
        <option value="Hong Kong">Hong Kong</option>
        <option value="Kowloon">Kowloon</option>
        <option value="New Territories/Outlying Islands">New Territories/Outlying Islands</option>

    </select>
  </td>
</tr>
... (remaining HTML structure) ...

Note: The above HTML code may contain errors.

Additionally, I chose page.injectJs over page.includeJs because the latter triggers the following error:

Unsafe JavaScript attempt to access frame with URL about:blank from frame with URL file://parse.js. Domains, protocols, and ports must match.

Answer №1

page.injectJs does not take a callback and can only inject local files. The function you pass as the second argument is therefore never invoked, so neither page.evaluate nor phantom.exit ever runs, which is why the script stalls. To include remote scripts, you should use page.includeJs.

You have two options: either download jQuery and place it in the local directory to use page.injectJs (the simpler solution), or try to make it work with remote scripts using page.includeJs. The latter may require running with command-line options such as --web-security=false and --local-to-remote-url-access=true.

Also note that jquery-latest.js is permanently fixed at version 1.11.1. If you need a newer version of jQuery, specify the actual version number.
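As a rough sketch of the local-file option: page.injectJs is synchronous and returns true when the file was injected, so its result can be checked directly instead of passing a callback. The helper name injectLocalJquery and the file name jquery-1.11.1.min.js are assumptions for illustration; use whatever copy of jQuery you saved next to the script.

```javascript
// Inject a local copy of jQuery and report whether it worked.
// `injectLocalJquery` and the default file name are hypothetical, not part of PhantomJS.
function injectLocalJquery(page, path) {
    path = path || 'jquery-1.11.1.min.js'; // assumed local file name
    // page.injectJs takes no callback; it returns true on success, false otherwise.
    var ok = page.injectJs(path);
    if (!ok) {
        console.log('Could not inject ' + path);
    }
    return ok;
}

// Intended use inside the page.open callback of the PhantomJS script:
//   if (injectLocalJquery(page)) { /* page.evaluate(...) can now use $ */ }
```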

Another issue in your script is that it exits too early. setTimeout is asynchronous: it only schedules the callback, so page.evaluate returns immediately and phantom.exit() is called before the setTimeout callback has a chance to run. Here's a workaround:

page.evaluate(function() {
    $("#ddlArea").val('香港');
    __doPostBack('ddlArea', '');
});
setTimeout(function() {
    page.evaluate(function() {
        console.log('Zone: ' + $('#ddlZone').val());
    });
    phantom.exit();
}, 1000);

This improves the script, but you still might not see output on the console. You also need to register for the page.onConsoleMessage event.
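A minimal sketch of such a handler is shown here. page.onConsoleMessage is the PhantomJS hook that receives messages logged inside the page context; the wrapper name attachConsoleRelay, its optional log parameter, and the 'PAGE: ' prefix are assumptions added for illustration.

```javascript
// Forward console.log output from the page context to the PhantomJS console.
// `attachConsoleRelay` and the 'PAGE: ' prefix are hypothetical names for this sketch.
function attachConsoleRelay(page, log) {
    log = log || function (line) { console.log(line); };
    // PhantomJS calls this handler for every console message emitted by the page.
    page.onConsoleMessage = function (msg) {
        log('PAGE: ' + msg);
    };
    return page;
}

// Intended use in the PhantomJS script, before page.open:
//   var page = require('webpage').create();
//   attachConsoleRelay(page);
```

The handler should be registered before page.open so that no early messages are missed.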

Chained version:

var selects = [
    ['ddlArea', '香港'], 
    ['ddlZone', '...'], 
    ...
];

selects.forEach(function(sel, i){
    setTimeout(function() {
        page.evaluate(function(sel) {
            $("#"+sel[0]).val(sel[1]);
            __doPostBack(sel[0], '');
        }, sel);
    }, i * 1000);
});
setTimeout(function() {
    phantom.exit();
}, 1000 * selects.length);

A better approach would be to simulate actual clicks and utilize waitFor paired with async.js to wait until the next select is populated.
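The polling part of that idea could look roughly like the following. It is modeled on the waitfor.js example that ships with PhantomJS, but the exact signature, the default timeout and interval, and the '#ddlZone option' check in the usage comment are assumptions, not something taken from the page or the original example.

```javascript
// Poll `testFx` until it returns true, then call `onReady`.
// If `timeoutMs` elapses first, call `onTimeout` instead.
// The parameter list and defaults are assumptions for this sketch.
function waitFor(testFx, onReady, onTimeout, timeoutMs, intervalMs) {
    var start = Date.now();
    var limit = timeoutMs || 5000;
    var timer = setInterval(function () {
        if (testFx()) {
            clearInterval(timer);
            onReady();
        } else if (Date.now() - start >= limit) {
            clearInterval(timer);
            onTimeout();
        }
    }, intervalMs || 100);
}

// Intended use in the PhantomJS script, after triggering the ddlArea postback
// (assumes a populated select has more than its placeholder option):
//   waitFor(function () {
//       return page.evaluate(function () {
//           return document.querySelectorAll('#ddlZone option').length > 1;
//       });
//   }, function () { /* set the next select here */ },
//      function () { phantom.exit(1); });
```

Compared with fixed one-second delays, this continues as soon as the next dropdown is filled and fails explicitly when the postback never completes.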
