I want to scrape information from a webpage (for example, this one) about the apps it lists, and store that data in a database.
I chose crawler4j to crawl every reachable page, but crawler4j only follows links that appear in the page's source code.
The problem is that this site's links are generated by JavaScript, so crawler4j never discovers any new links or pages to crawl.
To work around this, I'm considering Selenium, which lets me interact with page elements as if I were using a real browser such as Chrome or Firefox (I'm still learning how to use it).
However, I don't know how to retrieve the "generated" HTML (the DOM after JavaScript has run) instead of the raw page source.
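Is something like this the right approach? Below is my rough sketch, assuming Selenium 4's Java bindings, a chromedriver on the PATH, and a placeholder URL and selector (the real site's URL and link selector would go there):

```java
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

import java.time.Duration;

public class RenderedHtmlFetcher {
    public static void main(String[] args) {
        // Assumes chromedriver is on the PATH; use FirefoxDriver/geckodriver for Firefox.
        WebDriver driver = new ChromeDriver();
        try {
            driver.get("https://example.com/apps"); // placeholder URL

            // Wait until at least one JavaScript-generated link is in the DOM,
            // so the snapshot below includes the dynamic content.
            new WebDriverWait(driver, Duration.ofSeconds(10))
                .until(ExpectedConditions.presenceOfElementLocated(By.cssSelector("a")));

            // getPageSource() serializes the *current* DOM to HTML, i.e. the
            // "generated" HTML after scripts have run, not the original source.
            String renderedHtml = driver.getPageSource();
            System.out.println(renderedHtml);
        } finally {
            driver.quit();
        }
    }
}
```

My thinking is that I could then hand `renderedHtml` to crawler4j's parsing step (or extract the links myself) instead of the raw source, but I'm not sure whether `getPageSource()` actually reflects DOM changes made by JavaScript or whether I need to do something else.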
Any insights or suggestions on how to tackle this would be greatly appreciated.