Downloading a file through a JavaScript link in HtmlUnit

I am trying to download a file with HtmlUnit from a JavaScript link, and it is proving difficult. The process starts at https://ppair.uspto.gov/TruePassWebStart/AuthenticationChooser.html. Clicking the "Authenticate with Java Web Start (new method)" link downloads a .jnlp file, which launches a Java program window for authentication. Once authentication succeeds, the original browser window loads the information I want to scrape.

The source code snippet for the link on the starting page looks like this:

<tr>
<!-- onClick="return launchWebStart('authenticate');" -->
    <td><a href="javascript:void(0)" id="webstart-authenticate" ><font size="5">Authenticate with Java Web Start (new method)</font></a>
</tr>

The JavaScript that handles this link lives in a separate file, WebStart.js. It encodes a cookie, appends it to a URL, and requests the .jnlp file. I could emulate those steps directly, but the HtmlUnit documentation advises against that and recommends interacting with page elements as a user would.

The problem is that clicking the anchor element in HtmlUnit does not return the expected .jnlp file. I have already tried the suggestions from these related questions:

HtmlUnit and JavaScript in links
HtmlUnit to invoke javascript from href to download a file

The first suggestion I tried looks like this:

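import java.io.IOException;
import java.io.InputStream;
import java.net.MalformedURLException;

import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
import com.gargoylesoftware.htmlunit.Page;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlAnchor;
import com.gargoylesoftware.htmlunit.html.HtmlPage;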

public class Test {

    public static void main(String[] args) throws FailingHttpStatusCodeException, MalformedURLException, IOException {
        WebClient webClient = new WebClient(BrowserVersion.FIREFOX_45);

        // Open the starting webpage
        HtmlPage page = webClient.getPage("https://ppair.uspto.gov/TruePassWebStart/AuthenticationChooser.html");

        String linkID = "webstart-authenticate";
        
        HtmlAnchor anchor = (HtmlAnchor) page.getElementById(linkID);
        
        Page p = anchor.click();
        
        InputStream is = p.getWebResponse().getContentAsStream();
        int b = 0;
        while ((b = is.read()) != -1) {
            System.out.print((char)b);
        }
        webClient.close();
    }
}

However, this prints the HTML of the initial page instead of the expected .jnlp file. It also prints status messages from the JavaScript WebConsole, which suggests that the functions in the separate WebStart.js file are at least being executed.

I also tried an alternative approach that uses a CollectingAttachmentHandler:

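import java.io.IOException;
import java.net.MalformedURLException;
import java.util.List;

import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
import com.gargoylesoftware.htmlunit.Page;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.WebResponse;
import com.gargoylesoftware.htmlunit.attachment.Attachment;
import com.gargoylesoftware.htmlunit.attachment.CollectingAttachmentHandler;
import com.gargoylesoftware.htmlunit.html.HtmlAnchor;
import com.gargoylesoftware.htmlunit.html.HtmlPage;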

public class Test2 {

    public static void main(String[] args) throws FailingHttpStatusCodeException, MalformedURLException, IOException {
        WebClient webClient = new WebClient(BrowserVersion.FIREFOX_45);

        // Open the starting webpage
        HtmlPage page = webClient.getPage("https://ppair.uspto.gov/TruePassWebStart/AuthenticationChooser.html");

        String linkID = "webstart-authenticate";
        
        HtmlAnchor anchor = (HtmlAnchor) page.getElementById(linkID);
        
        CollectingAttachmentHandler attachmentHandler = new CollectingAttachmentHandler();
        webClient.setAttachmentHandler(attachmentHandler);
        attachmentHandler.handleAttachment(anchor.click());
        List<Attachment> attachments = attachmentHandler.getCollectedAttachments();

        int i = 0;
        while (i < attachments.size()) {
            Attachment attachment = attachments.get(i);
            Page attachedPage = attachment.getPage();
            WebResponse attachmentResponse = attachedPage.getWebResponse();
            String content = attachmentResponse.getContentAsString();
            System.out.println(content);
            i++;
        }
        webClient.close();
    }
}

Like the first attempt, this prints the content of the initial page rather than the desired file. How can I get HtmlUnit to download the .jnlp file?

Answer №1

Here is a revised version of your Test2 code snippet:

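    // Imports this fragment needs:
    //   com.gargoylesoftware.htmlunit: BrowserVersion, WebClient, WebResponse, WebStartHandler
    //   com.gargoylesoftware.htmlunit.html: HtmlAnchor, HtmlPage
    //   org.apache.commons.io.IOUtils, java.io.FileOutputStream, java.io.IOException
    //   java.util.concurrent.CountDownLatch
    WebClient webClient = new WebClient(BrowserVersion.FIREFOX_45);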

    // Open the initial webpage
    HtmlPage page = webClient.getPage("https://ppair.uspto.gov/TruePassWebStart/AuthenticationChooser.html");

    // Identify the ID of the element where the link is located
    String linkID = "webstart-authenticate";

    // Find the appropriate anchor
    HtmlAnchor anchor = (HtmlAnchor) page.getElementById(linkID);

    CountDownLatch latch = new CountDownLatch(1);
    webClient.setWebStartHandler(new WebStartHandler(){

        @Override
        public void handleJnlpResponse(WebResponse webResponse)
        {
            System.out.println("Downloading...");
            try (FileOutputStream fos = new FileOutputStream("/Users/Franklyn/Downloads/uspto-auth.authenticate2.jnlp"))
            {
                IOUtils.copy(webResponse.getContentAsStream(),fos);
            } catch (IOException e)
            {
                throw new RuntimeException(e);
            }
            System.out.println("Downloaded");
            latch.countDown();
        }
    });

    anchor.click();
    latch.await(); // Wait for the download to finish (throws InterruptedException)

    webClient.close();
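
One caveat about latch.await() in the snippet above: called without a timeout, it blocks forever if the handler is never invoked. A minimal sketch of a bounded wait follows, using only java.util.concurrent. The class LatchTimeoutSketch, the method awaitCallback, and the fake click are hypothetical stand-ins for illustration and are not part of HtmlUnit.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class LatchTimeoutSketch {

    // Runs the trigger, then waits up to timeoutMillis for the latch to reach zero.
    // Returns true if the callback fired in time, false on timeout or interruption.
    static boolean awaitCallback(Runnable trigger, CountDownLatch latch, long timeoutMillis) {
        trigger.run();
        try {
            return latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        }
    }

    public static void main(String[] args) {
        CountDownLatch latch = new CountDownLatch(1);

        // Hypothetical stand-in for anchor.click(): the "handler" counts down
        // the latch on another thread, as a download callback might.
        Runnable fakeClick = () -> new Thread(latch::countDown).start();

        boolean finished = awaitCallback(fakeClick, latch, 5_000);
        System.out.println(finished ? "Downloaded" : "Timed out waiting for the JNLP response");
    }
}
```

In the real snippet, the equivalent change would be to replace latch.await() with latch.await(timeout, unit) and to check its boolean result before closing the WebClient.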

Why isn't your Test2 working? The response for the downloaded file has the Content-Type application/x-java-jnlp-file, and a response of that type has to be handled by a WebStartHandler rather than by an attachment handler. Your Test2 might work if the response instead included a 'Content-Disposition' header whose value starts with 'attachment'.
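
The rule described above can be sketched as a small decision function. This is a simplified model of the behaviour as stated in this answer, not HtmlUnit's actual source; the class ResponseRoutingSketch, the enum Route, and the method route are hypothetical names, and the sketch assumes both kinds of handler have been registered on the WebClient.

```java
final class ResponseRoutingSketch {

    enum Route { WEB_START_HANDLER, ATTACHMENT_HANDLER, NORMAL_PAGE }

    // Decides which handler would receive a response, given two of its headers.
    // contentDisposition may be null when the header is absent.
    static Route route(String contentType, String contentDisposition) {
        // JNLP responses go to the WebStartHandler.
        if ("application/x-java-jnlp-file".equals(contentType)) {
            return Route.WEB_START_HANDLER;
        }
        // Responses flagged as attachments go to the attachment handler.
        if (contentDisposition != null && contentDisposition.startsWith("attachment")) {
            return Route.ATTACHMENT_HANDLER;
        }
        // Everything else is loaded into the window as an ordinary page.
        return Route.NORMAL_PAGE;
    }

    public static void main(String[] args) {
        System.out.println(route("application/x-java-jnlp-file", null));
        System.out.println(route("application/pdf", "attachment; filename=\"report.pdf\""));
        System.out.println(route("text/html", null));
    }
}
```

Under this model the .jnlp response never reaches a CollectingAttachmentHandler, which is consistent with Test2 collecting nothing useful.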
