I am currently facing a challenge with my web crawler. It successfully crawls most common, simple sites, but I am now dealing with websites whose HTML documents are generated dynamically through forms or JavaScript. Even though viewing the source in browsers like IE or Firefox does not reveal the generated HTML, I believe these pages can still be crawled. They seem to use what are known as "Web Forms" with textboxes, checkboxes, and so on, an area of web development I am not very familiar with.
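To make the form part of my question concrete, here is roughly what I imagine a crawler would have to do: fetch the page, parse out the form's action URL and its input fields (including hidden ones), fill in the visible fields, and then submit the whole set as a POST request instead of just following links. This is only a sketch using Python's standard library, and the page snippet, field names, and `/search` action are made up for illustration:

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

class FormParser(HTMLParser):
    """Collects the action URL and input fields of the first <form> on a page."""
    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and self.action is None:
            self.action = attrs.get("action", "")
        elif tag == "input" and attrs.get("name"):
            # Hidden inputs often carry session tokens the server expects back.
            self.fields[attrs["name"]] = attrs.get("value", "")

# Hypothetical HTML a crawler might have fetched:
page = """
<form action="/search" method="post">
  <input type="hidden" name="session" value="abc123">
  <input type="text" name="query" value="">
  <input type="submit" name="go" value="Search">
</form>
"""

parser = FormParser()
parser.feed(page)
parser.fields["query"] = "test"      # fill the textbox like a user would
body = urlencode(parser.fields)      # POST body to send to parser.action
print(parser.action)  # -> /search
print(body)           # -> session=abc123&query=test&go=Search
```

The crawler would then POST `body` to `parser.action` (resolved against the page's URL) and crawl the HTML that comes back. Is this the right general idea, or do real sites make it harder than that?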
Has anyone else encountered this issue and successfully navigated it? Are there any recommended books or articles that specifically address crawling these more advanced types of websites?
Any advice would be greatly appreciated. Thank you.