I am trying to scrape a page built with ASP.NET that contains seven dynamic drop-down (combo) boxes, using PhantomJS v1.9.8.
The JavaScript code I am using is as follows:
var page = require('webpage').create();
console.log('Current user agent: ' + page.settings.userAgent);
page.settings.userAgent = 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.2 (KHTML, like Gecko) Chrome/5.0.342.3 Safari/533.2';
page.open('http://www.etcfinance.com.hk/online_appraise.aspx', function(status) {
    page.injectJs("https://code.jquery.com/jquery-latest.js", function() {
        page.evaluate(function() {
            $("#ddlArea").val('香港');
            __doPostBack('ddlArea', '');
            setTimeout(function() {
                console.log('Zone: ' + $('#ddlZone').val());
            }, 1000);
        });
        phantom.exit();
    });
});
The output stalls at:
Current user agent: Mozilla/5.0 (Macintosh; PPC Mac OS X) AppleWebKit/534.34 (KHTML, like Gecko) PhantomJS/1.9.8 Safari/534.34
and does not proceed further. How can I reliably select the desired values in all of these drop-down boxes?
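For context, one pattern that might apply here is to poll from the PhantomJS side until the dependent dropdown has been repopulated, instead of relying on a fixed one-second timeout inside the page. The following is only a minimal sketch under several assumptions: `waitFor` is a hypothetical helper (not part of the PhantomJS API), the option value 'Hong Kong' is assumed and may differ on the live page, and a populated #ddlZone is assumed to contain more than one option. The result is returned from page.evaluate rather than logged inside the page, because console.log calls made inside the page are not printed unless page.onConsoleMessage is set.

```javascript
// Hypothetical polling helper; not part of the PhantomJS API.
// Calls check() every intervalMs until it returns a truthy value, then
// calls onDone(true). Calls onDone(false) once timeoutMs has elapsed.
function waitFor(check, onDone, timeoutMs, intervalMs) {
    var start = Date.now();
    var timer = setInterval(function () {
        var ok = false;
        try {
            ok = !!check();
        } catch (e) {
            ok = false; // treat errors (e.g. page mid-reload) as "not ready yet"
        }
        if (ok || Date.now() - start >= timeoutMs) {
            clearInterval(timer);
            onDone(ok);
        }
    }, intervalMs);
}

// Everything below runs only under PhantomJS, where `phantom` is a global.
if (typeof phantom !== 'undefined') {
    var page = require('webpage').create();
    page.open('http://www.etcfinance.com.hk/online_appraise.aspx', function (status) {
        if (status !== 'success') {
            console.log('Failed to load the page');
            phantom.exit(1);
            return;
        }
        // Runs inside the page: pick a region and fire the ASP.NET postback.
        // 'Hong Kong' is an assumed option value; use the value the live page has.
        page.evaluate(function () {
            document.getElementById('ddlArea').value = 'Hong Kong';
            __doPostBack('ddlArea', '');
        });
        waitFor(function () {
            // Assumption: a populated #ddlZone has more than one <option>.
            return page.evaluate(function () {
                var zone = document.getElementById('ddlZone');
                return !!zone && zone.options.length > 1;
            });
        }, function (ok) {
            console.log(ok ? 'Zone dropdown populated' : 'Timed out waiting for #ddlZone');
            phantom.exit(ok ? 0 : 1);
        }, 10000, 250);
    });
}
```

If this approach fits the page, the same select-then-wait step could presumably be repeated for each of the remaining dropdowns in turn.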
The relevant part of the HTML structure looks like this:
<table width="460" bgcolor="#E0F3FF" border="0" cellpadding="3" cellspacing="0" class="content">
<tbody><tr height="20"><td></td></tr>
<tr class="insidecontent">
    <td style="Padding-Left:20px;Padding-Right:20px;">
        <div align="left"> Region : </div>
    </td>
    <td valign="top">
        <select name="ddlArea" onchange="javascript:setTimeout('__doPostBack(\'ddlArea\',\'\')', 0)" id="ddlArea" class="textbox" style="width:29em">
            <option selected="selected" value="">Select Region</option>
            <option value="Hong Kong">Hong Kong</option>
            <option value="Kowloon">Kowloon</option>
            <option value="New Territories/Outlying Islands">New Territories/Outlying Islands</option>
        </select>
    </td>
</tr>
... (remaining HTML structure) ...
Note: The above HTML code may contain errors.
Additionally, I chose page.injectJs over page.includeJs because the latter triggers the following error:
Unsafe JavaScript attempt to access frame with URL about:blank from frame with URL file://parse.js. Domains, protocols, and ports must match.
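For reference, as the PhantomJS documentation describes the two calls, page.injectJs(filename) loads a local file synchronously and returns a boolean without taking a callback, whereas page.includeJs(url, callback) fetches a script from a URL and then invokes the callback. The sketch below only illustrates that difference. `loadScript` is a hypothetical helper name, and 'jquery.min.js' stands for an assumed local copy of jQuery.

```javascript
// Hypothetical helper: load a script into `page`, then call onReady(ok).
// `page` is expected to be a PhantomJS webpage object.
function loadScript(page, source, onReady) {
    if (/^https?:\/\//.test(source)) {
        // Remote URL: includeJs is asynchronous and takes a callback.
        page.includeJs(source, function () {
            onReady(true);
        });
    } else {
        // Local file: injectJs is synchronous, takes no callback,
        // and returns true if the file was injected.
        onReady(page.injectJs(source));
    }
}
```

With a helper like this, loadScript(page, 'jquery.min.js', callback) would take the injectJs path, while passing the code.jquery.com URL would take the includeJs path.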