Downloading a file through a JavaScript link in HtmlUnit

I am trying to download a file with HtmlUnit from a JavaScript link, and it is proving quite challenging. The starting page is https://ppair.uspto.gov/TruePassWebStart/AuthenticationChooser.html. In a browser, clicking the "Authenticate with Java Web Start (new method)" link downloads a .jnlp file, which launches a Java program window for authentication. Once authentication succeeds, the original browser window loads the information I want to scrape.

The source code snippet for the link on the starting page looks like this:

<tr>
<!-- onClick="return launchWebStart('authenticate');" -->
    <td><a href="javascript:void(0)" id="webstart-authenticate" ><font size="5">Authenticate with Java Web Start (new method)</font></a>
</tr>

The JavaScript that drives this lives in a separate file, WebStart.js. It encodes a cookie, appends it to a URL, and requests the jnlp file. I could emulate that directly, but the HtmlUnit documentation advises against it and recommends interacting with page elements the way a user would.

The problem is that after clicking the anchor in HtmlUnit, the jnlp file never arrives. I have tried the approaches from the questions "HtmlUnit and JavaScript in links" and "HtmlUnit to invoke javascript from href to download a file".

This is the code I tried first, based on those suggestions:

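import java.io.IOException;
import java.io.InputStream;
import java.net.MalformedURLException;

import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
import com.gargoylesoftware.htmlunit.Page;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlAnchor;
import com.gargoylesoftware.htmlunit.html.HtmlPage;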

public class Test {

    public static void main(String[] args) throws FailingHttpStatusCodeException, MalformedURLException, IOException {
        WebClient webClient = new WebClient(BrowserVersion.FIREFOX_45);

        // Open the starting webpage
        HtmlPage page = webClient.getPage("https://ppair.uspto.gov/TruePassWebStart/AuthenticationChooser.html");

        String linkID = "webstart-authenticate";
        
        HtmlAnchor anchor = (HtmlAnchor) page.getElementById(linkID);
        
        Page p = anchor.click();
        
        InputStream is = p.getWebResponse().getContentAsStream();
        int b = 0;
        while ((b = is.read()) != -1) {
            System.out.print((char)b);
        }
        webClient.close();
    }
}

Running this code, however, prints the HTML of the initial page instead of the jnlp file. The output also includes status messages from the JavaScript WebConsole, which shows that the functions in the separate WebStart.js file are at least being executed.

I also tried collecting the download with a CollectingAttachmentHandler:

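import java.io.IOException;
import java.net.MalformedURLException;
import java.util.List;

import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
import com.gargoylesoftware.htmlunit.Page;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.WebResponse;
import com.gargoylesoftware.htmlunit.attachment.Attachment;
import com.gargoylesoftware.htmlunit.attachment.CollectingAttachmentHandler;
import com.gargoylesoftware.htmlunit.html.HtmlAnchor;
import com.gargoylesoftware.htmlunit.html.HtmlPage;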

public class Test2 {

    public static void main(String[] args) throws FailingHttpStatusCodeException, MalformedURLException, IOException {
        WebClient webClient = new WebClient(BrowserVersion.FIREFOX_45);

        // Open the starting webpage
        HtmlPage page = webClient.getPage("https://ppair.uspto.gov/TruePassWebStart/AuthenticationChooser.html");

        String linkID = "webstart-authenticate";
        
        HtmlAnchor anchor = (HtmlAnchor) page.getElementById(linkID);
        
        CollectingAttachmentHandler attachmentHandler = new CollectingAttachmentHandler();
        webClient.setAttachmentHandler(attachmentHandler);
        // Note: this manually adds the Page returned by click() to the collected list
        attachmentHandler.handleAttachment(anchor.click());
        List<Attachment> attachments = attachmentHandler.getCollectedAttachments();

        int i = 0;
        while (i < attachments.size()) {
            Attachment attachment = attachments.get(i);
            Page attachedPage = attachment.getPage();
            WebResponse attachmentResponse = attachedPage.getWebResponse();
            String content = attachmentResponse.getContentAsString();
            System.out.println(content);
            i++;
        }
        webClient.close();
    }
}

Like the first attempt, this prints the HTML of the initial page rather than the jnlp file. How can I get HtmlUnit to download the file?

Answer №1

Here is a revised version of your Test2 code snippet:

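    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.util.concurrent.CountDownLatch;

    import org.apache.commons.io.IOUtils;

    import com.gargoylesoftware.htmlunit.BrowserVersion;
    import com.gargoylesoftware.htmlunit.WebClient;
    import com.gargoylesoftware.htmlunit.WebResponse;
    import com.gargoylesoftware.htmlunit.WebStartHandler;
    import com.gargoylesoftware.htmlunit.html.HtmlAnchor;
    import com.gargoylesoftware.htmlunit.html.HtmlPage;

    public class Test2 {

        public static void main(String[] args) throws Exception {
            WebClient webClient = new WebClient(BrowserVersion.FIREFOX_45);

            // Open the initial webpage
            HtmlPage page = webClient.getPage("https://ppair.uspto.gov/TruePassWebStart/AuthenticationChooser.html");

            // ID of the anchor that triggers the Web Start download
            String linkID = "webstart-authenticate";

            // Find the appropriate anchor
            HtmlAnchor anchor = (HtmlAnchor) page.getElementById(linkID);

            // Released by the handler once the .jnlp file has been written
            CountDownLatch latch = new CountDownLatch(1);

            // Responses with Content-Type application/x-java-jnlp-file are passed to this handler
            webClient.setWebStartHandler(new WebStartHandler() {

                @Override
                public void handleJnlpResponse(WebResponse webResponse) {
                    System.out.println("Downloading...");
                    // Adjust the target path as needed
                    try (FileOutputStream fos = new FileOutputStream("uspto-auth.authenticate2.jnlp")) {
                        IOUtils.copy(webResponse.getContentAsStream(), fos);
                        System.out.println("Downloaded");
                    } catch (IOException e) {
                        throw new RuntimeException(e);
                    } finally {
                        latch.countDown(); // release main even if the copy fails
                    }
                }
            });

            anchor.click();
            latch.await(); // Wait for the download to finish

            webClient.close();
        }
    }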
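
The CountDownLatch keeps main from closing the WebClient before the handler has written the file, presumably because the handler can fire after click() has already returned (for example from HtmlUnit's JavaScript background processing). The stdlib-only sketch below isolates that pattern. It is an illustration, not HtmlUnit code: a plain Thread stands in for the HtmlUnit callback, and DownloadCallback is a hypothetical stand-in for WebStartHandler. It uses a timed await, with an arbitrary 5-second limit, so a download that never arrives does not block forever, and it counts down in a finally block so an exception in the handler cannot leave the waiter stuck.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class LatchDemo {

    // Hypothetical stand-in for a callback interface such as WebStartHandler
    interface DownloadCallback {
        void onDownload(String content);
    }

    // Simulates an asynchronous producer that invokes the callback from another thread
    static void startDownload(DownloadCallback callback) {
        Thread worker = new Thread(() -> callback.onDownload("<jnlp/>"));
        worker.start();
    }

    // Waits up to timeoutSeconds for the callback; returns the content, or null on timeout
    public static String downloadAndWait(long timeoutSeconds) throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(1);
        AtomicReference<String> result = new AtomicReference<>();
        startDownload(content -> {
            try {
                result.set(content);
            } finally {
                latch.countDown(); // always release the waiting thread
            }
        });
        boolean finished = latch.await(timeoutSeconds, TimeUnit.SECONDS);
        return finished ? result.get() : null;
    }

    public static void main(String[] args) throws InterruptedException {
        String content = downloadAndWait(5);
        System.out.println(content == null ? "Timed out" : "Downloaded: " + content);
    }
}
```

Applied to the HtmlUnit version, the same idea means calling latch.await(timeout, unit) instead of latch.await() and checking the boolean it returns.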

Why doesn't your Test2 work? The downloaded file is served with the Content-Type application/x-java-jnlp-file, and to receive that kind of response you need a WebStartHandler. HtmlUnit only treats a response as an attachment (and passes it to your CollectingAttachmentHandler) when its headers include Content-Disposition with a value starting with attachment; if the server sent such a header, your Test2 might have worked.
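To make that dispatch rule concrete, here is a simplified, stdlib-only model of it. It is not HtmlUnit's implementation: the Destination names are hypothetical, and checking the jnlp type before Content-Disposition is an assumption about precedence. It shows why Test2, which registers only an attachment handler and receives a jnlp response without a Content-Disposition header, never hands the file to the CollectingAttachmentHandler.

```java
public class ResponseRouting {

    // Hypothetical labels for where a response ends up
    public enum Destination { WEB_START_HANDLER, ATTACHMENT_HANDLER, NORMAL_LOAD }

    public static final String JNLP_TYPE = "application/x-java-jnlp-file";

    /**
     * Simplified model of the rule described above (assumed precedence: jnlp first).
     * contentDisposition is null when the header is absent.
     */
    public static Destination route(String contentType, String contentDisposition,
                                    boolean webStartHandlerSet, boolean attachmentHandlerSet) {
        if (webStartHandlerSet && JNLP_TYPE.equals(contentType)) {
            return Destination.WEB_START_HANDLER;
        }
        if (attachmentHandlerSet && contentDisposition != null
                && contentDisposition.startsWith("attachment")) {
            return Destination.ATTACHMENT_HANDLER;
        }
        // Neither handler is involved; the response is loaded like any other
        return Destination.NORMAL_LOAD;
    }

    public static void main(String[] args) {
        // Test2: attachment handler only, no Content-Disposition header
        System.out.println(route(JNLP_TYPE, null, false, true));
        // The revised code: WebStartHandler registered
        System.out.println(route(JNLP_TYPE, null, true, false));
        // A server that marks the file as an attachment
        System.out.println(route(JNLP_TYPE, "attachment; filename=auth.jnlp", false, true));
    }
}
```

Running main prints NORMAL_LOAD, WEB_START_HANDLER and ATTACHMENT_HANDLER, one per line.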