What is the best way to extract the single PDF link from a webpage?

I am using Selenium in Java to access DOM elements, but running my code throws the following exception:

Exception in thread "main" org.openqa.selenium.StaleElementReferenceException: stale element reference: element is not attached to the page document

I am new to this area. The code I use to retrieve the DOM element is as follows:

    driver.get("https://www.qp.alberta.ca/570.cfm?frm_isbn=9780779808571&search_by=link");
    String pagePdfUrl = driver.findElement(By.xpath("//img[@alt='View PDF']//..//parent::a")).getAttribute("href");

I suspect the XPath is failing to locate the element, even though the element does exist. Any help would be greatly appreciated.

Thank you.

Answer №1

  • The href attribute contains a URL for the PDF, but that URL opens the PDF inside a web page instead of pointing at the file itself.

  • To get a direct link, I extracted the PDF name from the href attribute and appended it, with a .pdf extension, to the base URL

    https://www.qp.alberta.ca/documents/Acts/
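
The extraction step described above can be sketched without Selenium or Commons Lang. This is only an illustration: the class name PdfUrlBuilder and its method names are hypothetical, and it assumes every viewer href follows the page=<name>.cfm& pattern seen in the output further down.

```java
import java.util.Optional;

// Hypothetical helper (not part of Selenium or Commons Lang): builds the direct
// PDF URL from the viewer href using only the standard library.
class PdfUrlBuilder {
    // Base URL quoted in the answer.
    static final String PDF_BASE = "https://www.qp.alberta.ca/documents/Acts/";

    // Returns the text between "page=" and ".cfm&", or empty if either marker is missing.
    static Optional<String> extractPdfName(String href) {
        if (href == null) {
            return Optional.empty();
        }
        String open = "page=";
        String close = ".cfm&";
        int start = href.indexOf(open);
        if (start < 0) {
            return Optional.empty();
        }
        start += open.length();
        int end = href.indexOf(close, start);
        if (end < 0) {
            return Optional.empty();
        }
        return Optional.of(href.substring(start, end));
    }

    // Appends the extracted name and a ".pdf" extension to the base URL.
    static Optional<String> buildPdfUrl(String href) {
        return extractPdfName(href).map(name -> PDF_BASE + name + ".pdf");
    }
}
```

Returning an Optional makes the no-match case explicit, so a caller cannot accidentally navigate to a URL ending in null.pdf when the href has an unexpected shape.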

Code to retrieve the PDF URL:

    driver = new ChromeDriver();
    // The page URL is hardcoded here; parameterize it as required.
    driver.get("https://www.qp.alberta.ca/570.cfm?frm_isbn=9780779808571&search_by=link");
    // The anchor around the "View PDF" image carries the viewer URL in its href.
    String pagePdfUrl = driver.findElement(By.xpath("//img[@alt='View PDF']//..//parent::a")).getAttribute("href");
    System.out.println("Page PDF URL: " + pagePdfUrl);
    // substringBetween returns null when either delimiter is missing from the href.
    String pdfName = StringUtils.substringBetween(pagePdfUrl, "page=", ".cfm&");
    driver.get("https://www.qp.alberta.ca/documents/Acts/" + pdfName + ".pdf");
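
The StaleElementReferenceException from the question usually means the page re-rendered between locating the element and reading it. One common workaround, sketched here as an assumption and not as a confirmed fix for this page, is to repeat the whole lookup a few times. StaleRetry and retry are hypothetical names; the helper is plain Java and relies only on StaleElementReferenceException being a RuntimeException.

```java
import java.util.function.Supplier;

// Hypothetical helper: re-runs an action when it throws a given runtime exception.
class StaleRetry {
    // Runs the action up to maxAttempts times, retrying only on exceptions of type retryOn.
    static <T> T retry(Supplier<T> action, Class<? extends RuntimeException> retryOn, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                if (!retryOn.isInstance(e)) {
                    throw e; // unrelated failure: do not hide it
                }
                last = e;
            }
        }
        throw last; // every attempt failed with the retryable exception
    }
}

// Possible use with Selenium (not compiled here, shown for orientation only):
// String href = StaleRetry.retry(
//         () -> driver.findElement(By.xpath("//img[@alt='View PDF']//..//parent::a")).getAttribute("href"),
//         StaleElementReferenceException.class, 3);
```

The lookup and the getAttribute call sit inside the same lambda so that each retry locates a fresh element instead of reusing the stale reference.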

Code to Download PDF:

Required ChromeOptions:

    ChromeOptions options = new ChromeOptions();
    HashMap<String, Object> chromeOptionsMap = new HashMap<>();
    chromeOptionsMap.put("plugins.plugins_disabled", new String[] { "Chrome PDF Viewer" });
    chromeOptionsMap.put("plugins.always_open_pdf_externally", true);
    chromeOptionsMap.put("download.default_directory", "C:\\Users\\Downloads\\test\\");
    options.setExperimentalOption("prefs", chromeOptionsMap);
    options.addArguments("--headless");

Accessing PDF:

    driver = new ChromeDriver(options);
    driver.get("https://www.qp.alberta.ca/570.cfm?frm_isbn=9780779808571&search_by=link");
    String pagePdfUrl = driver.findElement(By.xpath("//img[@alt='View PDF']//..//parent::a")).getAttribute("href");
    System.out.println("Page PDF URL: " + pagePdfUrl);
    String pdfName = StringUtils.substringBetween(pagePdfUrl, "page=", ".cfm&");
    System.out.println("Only PDF URL: "+"https://www.qp.alberta.ca/documents/Acts/" + pdfName + ".pdf");
    driver.get("https://www.qp.alberta.ca/documents/Acts/" + pdfName + ".pdf");

Output:

Page PDF URL: https://www.qp.alberta.ca/1266.cfm?page=2017ch18_unpr.cfm&leg_type=Acts&isbncln=9780779808571
Only PDF URL: https://www.qp.alberta.ca/documents/Acts/2017ch18_unpr.pdf

Import for StringUtils:

import org.apache.commons.lang3.StringUtils;
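
Because driver.get returns before the download has finished, it may help to wait for the file before quitting the browser. The sketch below polls the download directory using only the standard library. DownloadWaiter and its method names are hypothetical, and it assumes the directory starts out empty and that Chrome marks unfinished downloads with a .crdownload suffix.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: waits until a finished PDF shows up in a download directory.
class DownloadWaiter {
    // Polls every 200 ms until a finished PDF is found or the timeout elapses.
    static Optional<Path> waitForPdf(Path dir, Duration timeout) throws IOException, InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            Optional<Path> found = findFinishedPdf(dir);
            if (found.isPresent() || System.nanoTime() >= deadline) {
                return found;
            }
            Thread.sleep(200);
        }
    }

    // Returns a .pdf file from dir, or empty while any .crdownload file is still present.
    static Optional<Path> findFinishedPdf(Path dir) throws IOException {
        try (Stream<Path> files = Files.list(dir)) {
            List<Path> all = files.collect(Collectors.toList());
            boolean inProgress = all.stream()
                    .anyMatch(p -> p.getFileName().toString().endsWith(".crdownload"));
            if (inProgress) {
                return Optional.empty();
            }
            return all.stream()
                    .filter(p -> p.getFileName().toString().toLowerCase().endsWith(".pdf"))
                    .findFirst();
        }
    }
}
```

With the options above, a call such as DownloadWaiter.waitForPdf(Paths.get("C:\\Users\\Downloads\\test\\"), Duration.ofSeconds(30)) placed after the final driver.get would block until the file appears or the timeout passes.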
