Finding every element on a page that loads content while scrolling, using Selenium WebDriver and Python

I've been struggling to scrape all elements from a webpage using Selenium. Despite my efforts, I can't seem to retrieve more than 6 elements out of at least 30 on the URL provided. Can you help me identify what I might be overlooking in my code?

import requests
import webbrowser
import time
from bs4 import BeautifulSoup as bs
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'}
url = 'https://www.adidas.com/us/men-shoes-new_arrivals'

res = requests.get(url, headers=headers)
page_soup = bs(res.text, "html.parser")

containers = page_soup.findAll("div", {"class": "gl-product-card-container show-variation-carousel"})

print(len(containers))
#for each container find shoe model
shoe_colors = []

for container in containers:
    if container.find("div", {'class': 'gl-product-card__reviews-number'}) is not None:
        shoe_model = container.div.div.img["title"]
        review = container.find('div', {'class':'gl-product-card__reviews-number'})
        review = int(review.text)

driver = webdriver.Chrome()
driver.get(url)
time.sleep(5)
shoe_prices = driver.find_elements(By.CSS_SELECTOR, '.gl-price')

for price in shoe_prices:
    print(price.text)
print(len(shoe_prices))

Answer №1

You need to scroll down the page gradually, because the price data is only requested via ajax once a product comes into view.

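import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

settings = Options()
settings.add_argument('--start-maximized')
driver = webdriver.Chrome(options=settings)

url = 'https://www.adidas.com/us/men-shoes-new_arrivals'
driver.get(url)

# One scroll step per row of products (the grid shows 4 products per row).
scroll_times = len(driver.find_elements(By.CLASS_NAME, 'col-s-6')) / 4
scrolled = 0
scroll_size = 400

# Scroll down 400 px at a time, pausing so the ajax price requests can finish.
while scrolled < scroll_times:
    driver.execute_script('window.scrollTo(0, arguments[0]);', scroll_size)
    scrolled += 1
    scroll_size += 400
    time.sleep(1)

# find_elements_by_class_name was removed in Selenium 4; use By locators instead.
shoe_prices = driver.find_elements(By.CLASS_NAME, 'gl-price')

for price in shoe_prices:
    print(price.text)

print(len(shoe_prices))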
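
The number of scroll steps above is estimated from the product count. A variant that may be more robust is to re-read the page height on every step and stop once the scroll position passes it. This is only a sketch: the function name `scroll_page_in_steps` and the `max_steps` safety cap are made up for illustration, and it assumes `document.body.scrollHeight` reflects the full page height on the target site.

```python
import time


def scroll_page_in_steps(driver, step=400, pause=1.0, max_steps=200):
    """Scroll down `step` pixels at a time until the bottom of the page is passed.

    `driver` is any object with Selenium's execute_script(script, *args) method.
    Returns the last scroll offset that was requested.
    """
    position = 0
    for _ in range(max_steps):  # hypothetical safety cap against endless pages
        # Re-read the height each time, since lazy-loaded content can grow the page.
        height = driver.execute_script('return document.body.scrollHeight;')
        if position >= height:
            break
        position += step
        driver.execute_script('window.scrollTo(0, arguments[0]);', position)
        time.sleep(pause)  # give the ajax requests time to complete
    return position
```

With a live session this would be called as scroll_page_in_steps(driver) right after driver.get(url), followed by the same driver.find_elements(By.CLASS_NAME, 'gl-price') lookup as above.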

Answer №2

When I ran your code, the counts I got differed from yours:

  • You found 30 items with requests and 6 items with Selenium.
  • I found 40 items with requests and 4 items with Selenium.

The content on this website is generated dynamically through lazy loading. To access all the elements, you have to scroll down and let the new ones load into the HTML DOM.
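
The scroll-and-wait idea described in this answer can be sketched as follows. This is an illustration only, not the answerer's original script: the helper name `collect_lazy_elements`, the `find_items` callback and the `max_rounds` cap are all hypothetical. It scrolls the last matched element into view, waits, and looks again, stopping once the number of matches no longer grows.

```python
import time


def collect_lazy_elements(driver, find_items, pause=1.0, max_rounds=50):
    """Keep scrolling to the last matched element until no new ones appear.

    `driver` needs Selenium's execute_script(script, *args) method.
    `find_items` is a callable that takes the driver and returns a list of elements.
    """
    items = find_items(driver)
    for _ in range(max_rounds):  # hypothetical cap so the loop always terminates
        if not items:
            break
        # Bring the last known element into the viewport to trigger lazy loading.
        driver.execute_script('arguments[0].scrollIntoView(true);', items[-1])
        time.sleep(pause)  # wait for newly requested content to be rendered
        refreshed = find_items(driver)
        if len(refreshed) <= len(items):
            return refreshed  # nothing new was loaded, assume we are done
        items = refreshed
    return items
```

With a live session, find_items could be something like lambda d: d.find_elements(By.CSS_SELECTOR, '.gl-price'), assuming By is imported from selenium.webdriver.common.by. Stopping after a single round without growth is a simplification; on a slow connection a longer pause or a few retries may be needed.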
    
