Finding every element on a webpage while scrolling with Selenium WebDriver and Python

Question

I've been struggling to scrape all elements from a webpage using Selenium. Despite my efforts, I can't seem to retrieve more than 6 elements out of at least 30 on the URL provided. Can you help me identify what I might be overlooking in my code?

import requests
import webbrowser
import time
from bs4 import BeautifulSoup as bs
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'}
url = 'https://www.adidas.com/us/men-shoes-new_arrivals'

res = requests.get(url, headers=headers)
page_soup = bs(res.text, "html.parser")

containers = page_soup.findAll("div", {"class": "gl-product-card-container show-variation-carousel"})

print(len(containers))
#for each container find shoe model
shoe_colors = []

for container in containers:
    if container.find("div", {'class': 'gl-product-card__reviews-number'}) is not None:
        shoe_model = container.div.div.img["title"]
        review = container.find('div', {'class':'gl-product-card__reviews-number'})
        review = int(review.text)

driver = webdriver.Chrome()
driver.get(url)
time.sleep(5)
shoe_prices = driver.find_elements(By.CSS_SELECTOR, '.gl-price')

for price in shoe_prices:
    print(price.text)
print(len(shoe_prices))

javascript python-3.x selenium lazy-loading webdriverwait

Answer 1

You need to scroll down the page gradually, because the price data is only requested via AJAX once a product comes into view.

import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

settings = Options()
settings.add_argument('--start-maximized')
driver = webdriver.Chrome(options=settings)

url = 'https://www.adidas.com/us/men-shoes-new_arrivals'
driver.get(url)

# One scroll step per product row (the grid shows 4 products per row)
scroll_times = len(driver.find_elements(By.CLASS_NAME, 'col-s-6')) / 4
scrolled = 0
scroll_size = 400

# Scroll down 400px at a time, pausing so the AJAX price requests can finish
while scrolled < scroll_times:
    driver.execute_script('window.scrollTo(0, arguments[0]);', scroll_size)
    scrolled += 1
    scroll_size += 400
    time.sleep(1)

shoe_prices = driver.find_elements(By.CLASS_NAME, 'gl-price')

for price in shoe_prices:
    print(price.text)

print(len(shoe_prices))
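
The scrolling loop above could also be split into two small helpers, which makes the step calculation easier to check on its own. This is only a sketch: the names `scroll_offsets` and `scroll_in_steps` are hypothetical, the defaults of 4 products per row and 400px per step are taken from the code above, and `driver` is assumed to be any object with Selenium's `execute_script` method.

```python
import math
import time


def scroll_offsets(item_count, items_per_row=4, step_px=400):
    """Return one vertical scroll offset per product row (hypothetical helper)."""
    rows = math.ceil(item_count / items_per_row)
    return [step_px * (i + 1) for i in range(rows)]


def scroll_in_steps(driver, offsets, pause=1.0):
    """Scroll to each offset in turn, pausing so lazy-loaded requests can finish."""
    for offset in offsets:
        driver.execute_script('window.scrollTo(0, arguments[0]);', offset)
        time.sleep(pause)
```

With a real browser this would be called as `scroll_in_steps(driver, scroll_offsets(len(cards)))`, where `cards` is the list returned by the `col-s-6` lookup. Rounding up with `math.ceil` gives the same number of steps as the float comparison in the loop above.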

Answer 2

When I ran your code, my counts differed from yours:

You found 30 items with requests and 6 items with Selenium.
I found 40 items with requests and 4 items with Selenium.

This site renders its content dynamically through lazy loading. To reach all the elements, you must scroll down and give the new ones time to load into the HTML DOM.
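
One way to apply the lazy-loading advice is to keep scrolling the last matching element into view until the number of matches stops growing. The sketch below is an illustration under assumptions, not this answer's original code: the function name `scroll_until_stable` and its `pause` and `max_rounds` parameters are hypothetical, and whether the count keeps growing depends on how the page loads its content. The string `"css selector"` is the value of Selenium's `By.CSS_SELECTOR`.

```python
import time


def scroll_until_stable(driver, css_selector, pause=1.0, max_rounds=50):
    """Scroll the last match into view until no new matches appear."""
    elements = driver.find_elements("css selector", css_selector)
    for _ in range(max_rounds):
        if not elements:
            break
        # Bring the last known element into view to trigger lazy loading
        driver.execute_script("arguments[0].scrollIntoView(true);", elements[-1])
        time.sleep(pause)
        refreshed = driver.find_elements("css selector", css_selector)
        grew = len(refreshed) > len(elements)
        elements = refreshed
        if not grew:
            break
    return elements
```

With a real driver, `scroll_until_stable(driver, '.gl-price')` would return the final list of price elements. The fixed `pause` is the simplest choice here; an explicit wait on the element count would be more robust on slow connections.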
