I'm trying to crawl a website that appears to generate its content dynamically with DWR, and I've hit a challenge. The page source is only a minimal HTML 'shell' with no useful links to follow; the actual content is fetched through POST requests whose responses contain JavaScript:
throw 'allowScriptTagRemoting is false.';
//#DWR-INSERT
//#DWR-REPLY
var a1 = {}; var a2 = {}; var a3 = {}; // ... and so on.
a1.configs=a3;a1.defaultSite=true;a1.defaultValues=a4; // ... and more.
The full response runs to around 150 lines of this.
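For what it's worth, I can replay the POST outside the browser with a sketch like the one below (Node 18+ with the built-in fetch). The endpoint, service/method names, and session values are placeholders; the real ones would have to be copied from the browser's network tab.

// Replay the DWR POST directly; all names and ids below are placeholders.
(async () => {
  const body = [
    'callCount=1',
    'page=/index.html',
    'httpSessionId=',
    'scriptSessionId=',
    'c0-scriptName=SiteService',  // placeholder DWR service name
    'c0-methodName=getConfig',    // placeholder method name
    'c0-id=0',
    'batchId=0',
  ].join('\n');

  const res = await fetch('https://example.com/dwr/call/plaincall/SiteService.getConfig.dwr', {
    method: 'POST',
    headers: { 'Content-Type': 'text/plain' },
    body,
  });

  // The reply is JavaScript (as shown above), not JSON, so it has to be
  // parsed or evaluated rather than handed to res.json().
  console.log(await res.text());
})();

So getting the raw reply is easy; the problem is turning it into crawlable HTML.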
I am aware that this is how DWR normally operates, but I am curious what strategies web crawlers use for pages like this. Is there a way for them to execute the JavaScript in the AJAX response and then wait for the DOM to finish updating?
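For instance, I experimented with a headless browser along these lines (Puppeteer here, with a placeholder URL), though I don't know whether this is how crawlers typically handle it:

const puppeteer = require('puppeteer');

(async () => {
  // Launch a headless browser so the page's own JavaScript (including the
  // DWR calls) actually runs, then wait until the network goes quiet.
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com/page', { waitUntil: 'networkidle0' });

  // By now the DWR replies have been evaluated and the DOM updated, so the
  // serialized page includes the dynamically generated HTML.
  const html = await page.content();
  console.log(html);

  await browser.close();
})();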
This seems different from a standard Ajax request, where the response either contains HTML that is inserted into the DOM when the request completes, or contains data that the page's existing JavaScript then uses to update the DOM. In neither case does the response itself need to be executed as a script.
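The other approach I've considered is skipping the rendered DOM entirely and capturing the DWR replies as raw data, roughly like this (Puppeteer again; the '/dwr/' filter is a guess based on DWR's usual servlet mapping):

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Log the raw DWR replies as they come back, instead of waiting for
  // the DOM to be updated.
  page.on('response', async (response) => {
    if (response.url().includes('/dwr/') &&
        response.request().method() === 'POST') {
      const body = await response.text();
      console.log('DWR reply:', body.slice(0, 200));
    }
  });

  await page.goto('https://example.com/page', { waitUntil: 'networkidle0' });
  await browser.close();
})();

But that still leaves me parsing 150 lines of variable assignments rather than HTML.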
Any insights or advice on how to approach this would be greatly appreciated.