Downloading files using headless JavaScript with Selenium

Retrieving files from Oracle's download site on a headless machine has proven challenging. Although I have a free account, the website uses a series of JavaScript forms and redirects that complicate the process. In Firefox, I can extract the download URL with the element inspector and convert it to a cURL command that downloads the file on the headless machine. However, every attempt to download the file entirely from the headless machine has failed so far.

I have successfully logged in with the following script:

#!/usr/bin/env python3

username="<my username>"
password="<my password>"

import requests
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
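# Copy the defaults so the shared DesiredCapabilities dict is not mutated.
caps = DesiredCapabilities.PHANTOMJS.copy()
caps["phantomjs.page.settings.userAgent"] = "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:55.0) Gecko/20100101 Firefox/55.0"
# Pass the capabilities explicitly so the custom user agent is applied.
driver = webdriver.PhantomJS("/usr/local/bin/phantomjs", desired_capabilities=caps)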
driver.set_window_size(1120, 550)
driver.get("http://www.oracle.com/technetwork/server-storage/developerstudio/downloads/index.html")
print("loaded")
driver.find_element_by_name("agreement").click()
print("clicked agreement")
driver.find_element_by_partial_link_text("RPM installer").click()
print("clicked link")
driver.find_element_by_id("sso_username").send_keys(username)
driver.find_element_by_id("ssopassword").send_keys(password)
driver.find_element_by_xpath("//input[contains(@title,'Please click here to sign in')]").click()
print("submitted")

print(driver.get_cookies())

print(driver.current_url)
print(driver.page_source)
driver.quit()

The login appears to succeed: the cookies contain data related to my username. In Firefox, submitting the form triggers the download after several redirects. In this script, however, page_source and current_url still show the login page, and no download starts.

The website may be actively blocking this kind of access, or I may be missing something crucial. Any suggestions on how to proceed with downloading the file?

Answer №1

Thanks to TheChetan's input, I got everything working. Instead of taking the JavaScript-blob route suggested by Tarun Lalwani, I opted for the requests method. It took me some time to figure out that I also had to set the user agent on the request.

In short: log in with Selenium and PhantomJS, then reuse the session cookies in a regular request made with requests.
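The approach described above can be sketched roughly as follows. This is a minimal illustration, not the original answer's code: the helper names cookie_kwargs and download_with_cookies are hypothetical, the user agent is simply the one from the question's script, and the real download URL would have to be obtained separately (for example from the element inspector, as in the question).

```python
#!/usr/bin/env python3
# Sketch: reuse cookies from a logged-in Selenium session in a plain
# requests download. Helper names here are illustrative assumptions.

# Assumption: the same user agent the question's script gives PhantomJS.
USER_AGENT = ("Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:55.0) "
              "Gecko/20100101 Firefox/55.0")


def cookie_kwargs(selenium_cookie):
    """Map one dict from driver.get_cookies() to kwargs for session.cookies.set()."""
    kwargs = {"name": selenium_cookie["name"], "value": selenium_cookie["value"]}
    # Keep domain and path so each cookie is only sent to the matching host.
    for key in ("domain", "path"):
        if selenium_cookie.get(key):
            kwargs[key] = selenium_cookie[key]
    return kwargs


def download_with_cookies(selenium_cookies, url, dest_path, user_agent=USER_AGENT):
    """Download url to dest_path, sending the Selenium cookies and user agent."""
    # Imported here so cookie_kwargs() stays usable without requests installed.
    import requests

    session = requests.Session()
    session.headers["User-Agent"] = user_agent
    for cookie in selenium_cookies:
        session.cookies.set(**cookie_kwargs(cookie))

    # Stream the body to disk so a large installer is not held in memory.
    with session.get(url, stream=True) as response:
        response.raise_for_status()
        with open(dest_path, "wb") as fh:
            for chunk in response.iter_content(chunk_size=65536):
                fh.write(chunk)
    return dest_path
```

In the question's script, this would be called after the login step and before driver.quit(), for example download_with_cookies(driver.get_cookies(), download_url, "installer.rpm"), where download_url and the output filename are placeholders. Copying the cookies into the session's cookie jar, rather than sending one fixed Cookie header, lets requests send the right cookies at each hop if the download goes through redirects.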
