Loading AJAX content with Selenium WebDriver as you scroll

I'm using Selenium WebDriver to scrape content from a website that, unfortunately, has no API. The site uses AJAX to load more content as the user scrolls down the page. To get at this content, I use JavaScript to scroll down and then try to fetch the content with findElements().

The page has many nested elements. One div has the class "GridItems" (no name or id). Inside it are many child elements with the class "Item" (again, no name or id, only the class attribute). I want to get every element with the "Item" class inside that container div. About 25 items are available when the page first loads, and more are loaded as you scroll down.

I have two main problems. The first is knowing when to stop scrolling, i.e. detecting that I have reached the bottom of the page; I can't find a reliable stopping condition. I considered using the page's scrollHeight, but that only reflects the content loaded so far, not the total height once all content has loaded. Another idea is to check whether some element at the bottom of the page is visible or clickable. However, if it isn't visible, that could simply mean it hasn't loaded yet rather than that I haven't reached it. Even with a Wait, a timeout wouldn't tell me which of the two was the cause.
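One way around the "still loading, or really at the bottom?" ambiguity is to stop only after the page height has stopped growing for several consecutive rounds, each round being a scroll followed by a pause for AJAX to finish. Below is a minimal sketch of that bookkeeping, in plain Java so it runs without a browser. `BottomDetector` and `roundsUntilBottom` are hypothetical names, not Selenium APIs. With Selenium, the height supplier is assumed to scroll, wait, and then return something like `((Number) js.executeScript("return document.documentElement.scrollHeight;")).longValue()`; that wiring is an assumption, not something from the original post.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical helper: decides when infinite scrolling has stopped producing content.
 * It is fed the page height after each scroll-and-wait round and reports "done"
 * only after the height has not grown for maxStalledRounds consecutive rounds,
 * so a single slow AJAX response is not mistaken for the end of the page.
 */
final class BottomDetector {
    private final int maxStalledRounds;
    private long lastHeight = -1;   // largest height seen so far
    private int stalledRounds = 0;  // consecutive rounds without growth

    BottomDetector(int maxStalledRounds) {
        if (maxStalledRounds < 1) {
            throw new IllegalArgumentException("maxStalledRounds must be >= 1");
        }
        this.maxStalledRounds = maxStalledRounds;
    }

    /** Records the height measured after one round; returns true once scrolling should stop. */
    boolean recordHeight(long height) {
        if (height > lastHeight) {
            lastHeight = height;  // new content arrived, reset the stall counter
            stalledRounds = 0;
        } else {
            stalledRounds++;      // nothing new this round
        }
        return stalledRounds >= maxStalledRounds;
    }

    /**
     * Runs rounds until the detector says stop; returns the round number,
     * or -1 if the page was still growing after maxRounds rounds.
     */
    static int roundsUntilBottom(LongSupplier heightAfterScroll, int maxStalledRounds, int maxRounds) {
        BottomDetector detector = new BottomDetector(maxStalledRounds);
        for (int round = 1; round <= maxRounds; round++) {
            if (detector.recordHeight(heightAfterScroll.getAsLong())) {
                return round;
            }
        }
        return -1;
    }
}
```

The stall window is the design knob: with a window of 1, a height sequence like 1000, 1000, 2000 stops after the second round even though more content was on its way; a larger window tolerates slower responses at the cost of a few extra waits at the real bottom. It remains a heuristic, since a server slower than the whole window would still end the loop early.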

The second problem is that while I scroll, newer elements are loaded dynamically and older ones are removed from the DOM. So simply scrolling to the bottom and calling findElements() misses the items that were pushed out by newer ones. Currently, I handle this as follows:

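    import java.util.ArrayList;
    import java.util.List;
    import org.openqa.selenium.By;
    import org.openqa.selenium.WebElement;
    import org.openqa.selenium.support.ui.ExpectedConditions;
    import org.openqa.selenium.support.ui.WebDriverWait;

    // driver is the WebDriver; js is the same driver cast to JavascriptExecutor
    int numitems = 135; // expected total number of items
    List<WebElement> newitems;
    List<WebElement> allitems = new ArrayList<WebElement>(50);

    do {
        // scroll to the bottom of the currently loaded content three times
        for (int i = 0; i < 3; i++) {
            js.executeScript("window.scrollTo(0, document.body.offsetHeight)");
        }

        // wait until the container div is present before reading from it
        WebElement cont = (new WebDriverWait(driver, 100))
                .until(ExpectedConditions.presenceOfElementLocated(By.className("GridItems")));

        // retrieve all Items currently inside the div
        newitems = cont.findElements(By.className("Item"));

        // append the items found in this round of scrolling
        allitems.addAll(newitems);

    // continue until the list holds at least the expected number of items
    } while (numitems > allitems.size());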

This scrolls three times, grabs the elements that are now available, adds them to a list, and repeats until the list holds at least the expected number of items.

The flaw is that each scroll adds a different number of items, so consecutive rounds overlap and allitems ends up with duplicates. Since the elements have no unique identifiers and I haven't read their content at this point, I can't easily remove the duplicates. Items can also be missed if the scrolling doesn't cover the entire content area exactly. And by the time I process the list, the references to the earlier items have gone stale.

I could process each item as soon as I find it to avoid the stale references, but that makes the code considerably more complex. It would let me check the content and detect duplicates, but it still wouldn't guarantee that I capture every item.
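Processing items immediately does not have to be complicated. A minimal sketch, assuming each item has some value that is unique per item: right after each findElements() call, copy what you need out of every WebElement into a plain object, and key it by that value. Holding strings instead of WebElement references means nothing can go stale, and a LinkedHashMap keyed by the value drops the overlap between rounds while keeping first-seen order. `ItemSnapshot` and `ItemCollector` are hypothetical names. In Selenium, a batch might be built with `new ItemSnapshot(e.getAttribute("src"), e.getText())` for each element returned by `cont.findElements(By.className("Item"))`; whether `src` (or anything else) is actually unique on this site is an assumption you would have to check.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical immutable copy of one "Item", taken right after findElements()
 * so later DOM changes cannot make it stale.
 */
final class ItemSnapshot {
    final String key;   // assumed unique per item, e.g. an image src or a link href
    final String text;  // any other content worth keeping

    ItemSnapshot(String key, String text) {
        this.key = key;
        this.text = text;
    }
}

/** Accumulates snapshots across scroll rounds, dropping repeats and keeping first-seen order. */
final class ItemCollector {
    private final Map<String, ItemSnapshot> byKey = new LinkedHashMap<>();

    /** Adds one round's batch; returns how many of its items had not been seen before. */
    int addBatch(List<ItemSnapshot> batch) {
        int added = 0;
        for (ItemSnapshot item : batch) {
            // putIfAbsent returns null only when the key was not already present
            if (byKey.putIfAbsent(item.key, item) == null) {
                added++;
            }
        }
        return added;
    }

    /** All distinct items collected so far, in the order they were first seen. */
    List<ItemSnapshot> all() {
        return new ArrayList<>(byKey.values());
    }
}
```

The return value of `addBatch` is also a useful signal: a round that adds zero new items after a scroll-and-wait suggests either the bottom has been reached or the page has not finished loading yet.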

If you have recommendations on how best to overcome these problems, or can see an important point I've missed, I'd really appreciate it. The existing Stack Overflow questions about AJAX-loaded content deal with different problems; mine is mainly about extracting the content efficiently. Intuitively, there should be a better way to do this. Is there one?

Sorry for the long post; I wanted to be clear. Any input is appreciated.

Many thanks, bsg

Edit:

Note that the accepted answer only partly answers my question. For the remaining parts: scrolling one screen at a time and collecting the new elements after each scroll avoided losing data. After each scroll, I processed all loaded elements right away and saved their content. I removed duplicates with a HashSet, and I stopped scrolling once I reached the bottom, detected with the method from the accepted answer. I hope this helps.
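The approach in this edit (scroll one screen at a time, record every item in the DOM after each scroll, deduplicate with a set, stop at the bottom) can be sketched in plain Java. `ScrollingPage`, `ScrollScraper`, and `VirtualizedFeed` are hypothetical. The interface stands for whatever Selenium calls you wrap, for instance `js.executeScript("window.scrollBy(0, window.innerHeight);")` followed by a wait, then `findElements(By.className("Item"))` mapped to a key such as the `src` attribute. `VirtualizedFeed` is only a simulated page that keeps a fixed number of items in the DOM, so the loop can be exercised without a browser. `maxRounds` is a safety cap so the loop ends even if `atBottom()` never reports true.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** What the scraper needs from the page; a Selenium-backed implementation is assumed, not shown. */
interface ScrollingPage {
    void scrollOneScreen();          // e.g. window.scrollBy(0, window.innerHeight) plus a wait
    List<String> visibleItemKeys();  // keys of the Items currently in the DOM
    boolean atBottom();              // e.g. the height check from the answer
}

final class ScrollScraper {
    /** Scrolls one screen at a time, recording every item seen, until the bottom or maxRounds. */
    static Set<String> collectAll(ScrollingPage page, int maxRounds) {
        Set<String> seen = new LinkedHashSet<>(page.visibleItemKeys()); // items from the initial load
        for (int round = 0; round < maxRounds && !page.atBottom(); round++) {
            page.scrollOneScreen();
            seen.addAll(page.visibleItemKeys()); // the set ignores items already recorded
        }
        return seen;
    }
}

/** Simulated page: only `window` consecutive items exist in the DOM at any time. */
final class VirtualizedFeed implements ScrollingPage {
    private final int total, window, step;
    private int first = 0; // index of the first item currently in the DOM

    VirtualizedFeed(int total, int window, int step) {
        this.total = total;
        this.window = window;
        this.step = step;
    }

    @Override
    public void scrollOneScreen() {
        // move down by `step` items, but never past the last full window
        first = Math.min(first + step, Math.max(0, total - window));
    }

    @Override
    public List<String> visibleItemKeys() {
        List<String> keys = new ArrayList<>();
        for (int i = first; i < Math.min(first + window, total); i++) {
            keys.add("item-" + i);
        }
        return keys;
    }

    @Override
    public boolean atBottom() {
        return first + window >= total;
    }
}
```

With `new VirtualizedFeed(135, 25, 10)` every step is smaller than the 25 items kept in the DOM, so all 135 items are collected. With `new VirtualizedFeed(100, 25, 40)` each step jumps past items that were never in the DOM at the same time as a previous batch, and only 75 of 100 items are collected, which is the kind of data loss that moving one screen at a time avoids.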

Answer №1

How can I determine the exact moment when I reach the bottom of a webpage?

I found plain JavaScript awkward for this, so I used jQuery. When you reach the bottom of the page, this condition becomes true:

    $(document).height() == $(window).height() + $(window).scrollTop();
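If the target page does not load jQuery, a similar condition can be evaluated from Selenium with plain DOM properties. The sketch below is an adaptation under assumptions, not part of the original answer. It reads roughly equivalent values (`document.documentElement.scrollHeight`, `document.documentElement.clientHeight`, `window.pageYOffset`) in one `executeScript` call, which Selenium's Java binding returns as a `List` of numbers. It also compares with a small tolerance instead of `==`, since fractional scroll offsets (for example under browser zoom) can keep an exact equality from ever holding. `BottomCheck` and its members are hypothetical names.

```java
import java.util.List;

/** Hypothetical helper mirroring the jQuery bottom-of-page check, with a pixel tolerance. */
final class BottomCheck {
    /** Script whose result Selenium should hand back as a List of three numbers. */
    static final String MEASURE_SCRIPT =
        "return [document.documentElement.scrollHeight,"
        + " document.documentElement.clientHeight, window.pageYOffset];";

    /** True when the bottom of the viewport is within tolerancePx of the document's end. */
    static boolean isAtBottom(double documentHeight, double viewportHeight,
                              double scrollTop, double tolerancePx) {
        return viewportHeight + scrollTop >= documentHeight - tolerancePx;
    }

    /** Interprets the list produced by MEASURE_SCRIPT; elements may be Long or Double. */
    static boolean isAtBottom(List<?> measured, double tolerancePx) {
        double documentHeight = ((Number) measured.get(0)).doubleValue();
        double viewportHeight = ((Number) measured.get(1)).doubleValue();
        double scrollTop = ((Number) measured.get(2)).doubleValue();
        return isAtBottom(documentHeight, viewportHeight, scrollTop, tolerancePx);
    }
}
```

A possible call site, assuming `js` is the driver cast to `JavascriptExecutor`: `boolean done = BottomCheck.isAtBottom((List<?>) js.executeScript(BottomCheck.MEASURE_SCRIPT), 2.0);`. Being at the bottom right now does not rule out more content arriving a moment later, so this check works best combined with a wait before concluding the page is exhausted.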

Is there anything that distinguishes the items from one another? In your Flickr example the items are images, so the image URL, read with WebElement.getAttribute("src"), could serve as a unique identifier.
