I'm using Selenium WebDriver to scrape content from a website that, unfortunately, has no API. The site uses AJAX to load content dynamically as the user scrolls down the page. To get at that content, I scroll down with JavaScript and then try to fetch the elements with findElements().
As for the setup: the page consists of various nested elements, among them one div with the class "GridItems" (no name or id). Inside that div are many child elements with the class "Item" (again no name or id, just the class attribute). My goal is to grab every element with the "Item" class inside that container div. About 25 items are present when the page first loads, and more load as the user scrolls further.
I'm running into two main problems. The first is knowing when to stop scrolling, i.e. detecting that I've reached the bottom of the page; I haven't found a reliable stopping condition. I considered document.body.scrollHeight, but it only reports the height of the content loaded so far, not the total height once everything has loaded. Another suggestion was to test whether some element at the bottom of the page is visible/clickable, but if it isn't, that may simply mean it hasn't loaded yet rather than that it's unreachable. Even with a Wait, a timeout wouldn't tell me which of the two happened.
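For concreteness, the most promising stopping condition I've come across is to compare document.body.scrollHeight before and after each scroll: once a scroll (plus a pause for the AJAX requests) no longer increases it, nothing more is loading. A minimal sketch of that check, assuming the same JavascriptExecutor js as in my code below:

    // Scroll until the page height stops growing, i.e. the real bottom.
    static void scrollToBottom(JavascriptExecutor js) throws InterruptedException {
        long lastHeight = (Long) js.executeScript("return document.body.scrollHeight");
        while (true) {
            js.executeScript("window.scrollTo(0, document.body.scrollHeight)");
            Thread.sleep(2000); // crude pause to let the AJAX requests finish
            long newHeight = (Long) js.executeScript("return document.body.scrollHeight");
            if (newHeight == lastHeight) {
                break; // height unchanged after a scroll: no more content is coming
            }
            lastHeight = newHeight;
        }
    }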
The second problem is that as I scroll, newly loaded elements push older ones out of the DOM. So if I simply scroll to the bottom and then call findElements(), I miss the items that were displaced along the way. Currently I handle this as follows:
    import java.util.ArrayList;
    import java.util.List;
    import org.openqa.selenium.By;
    import org.openqa.selenium.WebElement;
    import org.openqa.selenium.support.ui.ExpectedConditions;
    import org.openqa.selenium.support.ui.WebDriverWait;

    // driver is the WebDriver instance; js is (JavascriptExecutor) driver
    int numitems = 135; // expected total number of items
    List<WebElement> newitems;
    List<WebElement> allitems = new ArrayList<WebElement>(50);
    do {
        // scroll to the bottom of the currently loaded content three times,
        // giving the AJAX calls a chance to append new items in between
        for (int i = 0; i < 3; i++) {
            js.executeScript("window.scrollTo(0, document.body.offsetHeight)");
        }
        // make sure the container div is present before proceeding
        WebElement cont = (new WebDriverWait(driver, 100))
                .until(ExpectedConditions.presenceOfElementLocated(By.className("GridItems")));
        // retrieve all Items currently inside the div
        newitems = cont.findElements(By.className("Item"));
        // append this round's items to the running list
        allitems.addAll(newitems);
        // continue until the list reaches the expected item count
    } while (numitems > allitems.size());
In other words: scroll a few times, grab whatever elements are currently loaded, append them to a list, and repeat until the list reaches the expected number of items.
The obvious flaw is that an unpredictable number of items loads with each scroll, so successive iterations re-collect elements that are already in allitems. Since the elements have no unique identifier short of extracting their content, deduplication is difficult. I can also lose items outright if the scrolling doesn't cover the whole content area in exact steps. And because I only process the list at the end, my references to the earlier elements may have gone stale by then.
I could instead process each item immediately after collecting it, which would sidestep the staleness problem at the cost of more complicated code. Doing so would also let me verify the content and spot duplicates, though it still wouldn't guarantee complete coverage.
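If I go that route, the idea would be something like the following, where extractContent() and process() are placeholders for my actual extraction and storage logic, and I'm assuming the extracted content is distinctive enough to act as a key in a HashSet:

    import java.util.HashSet;
    import java.util.Set;

    Set<String> seen = new HashSet<String>();
    for (WebElement item : newitems) {
        // extractContent() stands in for whatever getText()/getAttribute()
        // calls pull out the data I actually care about
        String content = extractContent(item);
        // Set.add() returns false for duplicates, so each unique item
        // is processed exactly once, no matter how often it is re-collected
        if (seen.add(content)) {
            process(content);
        }
    }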
If anyone can suggest a better strategy, or spot something crucial I've overlooked, I'd be grateful. The existing Stack Overflow questions about AJAX-loaded content deal with different problems; mine is specifically about how to extract all of the content efficiently. Intuitively it feels like there should be a cleaner way to do this - is there one?
Apologies for the long write-up; I wanted to be clear. Any input is much appreciated.
Many thanks, bsg
Edit:
Note that the accepted answer only partially covers my question. For the remaining parts: scrolling one screen at a time and collecting the newly loaded elements after each scroll solved the data-loss problem. After every scroll I processed all of the currently loaded elements and saved their content, and a HashSet kept the duplicates out. I stopped scrolling once I reached the bottom of the page, which I verified with the methods from the accepted answer. Hope this helps someone.
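For anyone who finds this later, here is a rough sketch of what that final approach looks like; extractContent() and saveContent() are placeholders for your own logic, and the enclosing method needs to handle InterruptedException because of Thread.sleep():

    Set<String> seenContent = new HashSet<String>();
    long lastHeight = (Long) js.executeScript("return document.body.scrollHeight");
    while (true) {
        // process everything currently in the DOM before it can be pushed out
        WebElement cont = driver.findElement(By.className("GridItems"));
        for (WebElement item : cont.findElements(By.className("Item"))) {
            String content = extractContent(item); // placeholder extraction
            if (seenContent.add(content)) {        // HashSet filters duplicates
                saveContent(content);              // placeholder persistence
            }
        }
        // scroll exactly one viewport so no items are skipped over
        js.executeScript("window.scrollBy(0, window.innerHeight)");
        Thread.sleep(1500); // let the AJAX requests finish loading
        long newHeight = (Long) js.executeScript("return document.body.scrollHeight");
        long position = (Long) js.executeScript(
                "return Math.ceil(window.pageYOffset + window.innerHeight)");
        // stop once the height has stopped growing and we are at the end of it
        if (newHeight == lastHeight && position >= newHeight) {
            break;
        }
        lastHeight = newHeight;
    }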