Retrieve an Excel file using Selenium through a URL, but only obtain JavaScript code instead

Question

I am attempting to download an Excel file using its URL, but all I receive is JavaScript code. I'm unsure of how to retrieve the actual file instead of just the JS code.

Here is my current code:

# -*- coding: utf-8 -*-

from selenium import webdriver
import io
import re

path = 'C:/Program Files (x86)/Google/Chrome/Application/chromedriver.exe'
download_url = "http://samr.cfda.gov.cn/directory/web/WS01/images/localgov/gov_1540501658076.xls"  #URL provided by me

chrome_options = webdriver.ChromeOptions()
#chrome_options.add_argument('--headless')  # headless mode to disable GUI for Chrome
chrome_options.add_argument('--disable-gpu')
chrome_options.add_argument('--no-sandbox')

prefs = {'profile.default_content_settings.popups': 0, 'download.default_directory': 'd:\\new'}
chrome_options.add_experimental_option('prefs', prefs)

client = webdriver.Chrome(path, chrome_options=chrome_options)

try:
    client.get(download_url)
except TimeoutError:
    print("Time took too long")

print(client.page_source)
client.quit()

Any assistance would be greatly appreciated.

javascript selenium url web-scraping download

Answer 1

The printed output will still be the same, but adding a short delay before quitting gives the browser time to finish downloading the file.

# -*- coding: utf-8 -*-

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
import time

path = 'C:/Program Files (x86)/Google/Chrome/Application/chromedriver.exe'
download_url = "http://samr.cfda.gov.cn/directory/web/WS01/images/localgov/gov_1540501658076.xls"  # URL of the file to download

chrome_options = webdriver.ChromeOptions()
# chrome_options.add_argument('--headless')  # headless mode
chrome_options.add_argument('--disable-gpu')
chrome_options.add_argument('--no-sandbox')

# Save downloads to d:\new without showing a popup
prefs = {'profile.default_content_settings.popups': 0, 'download.default_directory': 'd:\\new'}
chrome_options.add_experimental_option('prefs', prefs)

# Same constructor call as in the question
client = webdriver.Chrome(path, chrome_options=chrome_options)

try:
    client.get(download_url)
    time.sleep(5)  # give Chrome time to finish saving the file before quitting
except TimeoutException:  # Selenium raises this, not the built-in TimeoutError
    print("time too long")

print(client.page_source)
client.quit()
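
A fixed five-second delay may be too short for a large file and wastefully long for a small one. As a rough sketch (not part of the original answer), the download directory can be polled instead until the file is complete. The helper name `wait_for_download` and its parameters are hypothetical. It assumes the directory starts out empty and that Chrome marks unfinished downloads with a `.crdownload` suffix (and sometimes a short-lived `.tmp` file).

```python
import os
import time


def wait_for_download(directory, timeout=30.0, poll_interval=0.5):
    """Wait until `directory` holds at least one finished file.

    Hypothetical helper: assumes the directory was empty before the
    download started and that partial files end in .crdownload or .tmp.
    Returns the sorted list of finished file names.
    """
    partial_suffixes = ('.crdownload', '.tmp')
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        names = os.listdir(directory)
        finished = [n for n in names if not n.endswith(partial_suffixes)]
        still_downloading = len(finished) != len(names)
        if finished and not still_downloading:
            return sorted(finished)
        time.sleep(poll_interval)
    raise TimeoutError('no completed download in %r after %s seconds'
                       % (directory, timeout))
```

With this in place, `time.sleep(5)` in the code above could be swapped for `wait_for_download('d:\\new')`, called before `client.quit()`.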
