I am using Selenium with the PhantomJS driver to load an HTML page and extract all of the href links from it. The problem is that PhantomJS returns absolute URLs, fully resolved against the page's base URL.
I need the relative links exactly as they are written in the page source, without any modification.
Even when I read the hrefs by traversing the DOM, I always get the resolved URLs instead of the original relative ones:
List<WebElement> list = driver.findElements(By.tagName("a"));
for (WebElement element : list) {
    String link = element.getAttribute("href");
}
For instance:
<a href="../index.html"></a> with base URL http://docs.oracle.com/en/test.htm
PhantomJS returns the resolved link http://docs.oracle.com/index.html, but what I need is the relative link as written: "../index.html"
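To make the behavior concrete, here is a minimal sketch of the resolution step using plain java.net.URI rather than Selenium. It assumes the base URL is http://docs.oracle.com/en/test.htm; the class name ResolveDemo and the helper resolve are hypothetical names used only for illustration. It only shows how the relative href becomes the absolute URL I am seeing, not how to get the original value back.

```java
import java.net.URI;

public class ResolveDemo {
    // Hypothetical helper: resolves a relative href against a base URL,
    // which mirrors the kind of resolution the browser applies to href.
    static String resolve(String base, String href) {
        return URI.create(base).resolve(href).toString();
    }

    public static void main(String[] args) {
        String base = "http://docs.oracle.com/en/test.htm"; // assumed base URL
        String href = "../index.html";                      // value as written in the page
        System.out.println("Relative link: " + href);
        System.out.println("Resolved link: " + resolve(base, href));
    }
}
```

The first printed value is what I want to extract from the page; the second is the form I currently get back.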
Is there a way to do this with Selenium and PhantomJS?
Thank you in advance. Neha