I need to extract data from a website once a week. The data only becomes visible after a click on the page, which triggers a JavaScript function; it is then loaded into a table that can be identified by its unique ID. The script has to run on a server without browser support. Here is my code so far, using Geb:
@Grab("org.gebish:geb-core:0.13.1")
@Grab("org.seleniumhq.selenium:selenium-firefox-driver:2.52.0")
@Grab("org.seleniumhq.selenium:selenium-support:2.52.0")
@GrabExclude('org.codehaus.groovy:groovy-all')
import geb.Browser
Browser.drive {
    // driver.webClient.javaScriptEnabled = true
    go "mysite"
    js.loadWeekData()
    println $("div.data-listing").text()
}
I have researched this extensively but could not find a solution for headless scraping with JavaScript support. This is the click as recorded by Selenium IDE:
driver.findElement(By.linkText("Next")).click();
I also tried PhantomJS, but could not get it to work with Geb.
Edit 1: This is the error I get with PhantomJS:
java.lang.NoClassDefFoundError: org/openqa/selenium/browserlaunchers/Proxies
I tried to sort out the version compatibility between the dependencies but could not resolve it. This is the code that produces the error:
@Grab("org.gebish:geb-core:0.13.1")
@Grab("org.seleniumhq.selenium:selenium-firefox-driver:2.52.0")
@Grab("org.seleniumhq.selenium:selenium-support:2.52.0")
@Grab("com.codeborne:phantomjsdriver:1.3.0")
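import org.openqa.selenium.By
import org.openqa.selenium.WebDriver
import org.openqa.selenium.WebElement
import org.openqa.selenium.phantomjs.PhantomJSDriver

WebDriver driver = new PhantomJSDriver();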
// Load Google.com
driver.get("http://www.google.com");
// Locate the Search field on the Google page
WebElement element = driver.findElement(By.name("q"));
In summary, I am looking for a way to run the first script headlessly, if possible without installing Xvfb. Ideally the solution would use Groovy or Java.
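To make the goal concrete, the kind of setup I have in mind might look like the configuration sketch below. It is only a sketch: it assumes Geb picks up a GebConfig.groovy from the classpath root, that a PhantomJS binary is installed on the server, and that a compatible set of dependency versions exists, which is exactly the part I have not been able to establish. The timeout values are arbitrary placeholders.

```groovy
// GebConfig.groovy (sketch): Geb reads this file from the classpath root.
import org.openqa.selenium.phantomjs.PhantomJSDriver

// Assumption: the PhantomJS executable is on the PATH, or its location is
// given through the phantomjs.binary.path system property.
driver = { new PhantomJSDriver() }

// Default timeout (seconds) and retry interval used by waitFor { ... },
// so the script can wait for the table to be filled after the click.
waiting {
    timeout = 10
    retryInterval = 0.5
}
```

With something like this in place, the first script would stay unchanged apart from waiting for the table, for example waitFor { $("div.data-listing").displayed }, before reading its text.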