The initial item in a pagination/list is double-parsed on an AJAX website using Selenium 3.0.2, Firefox webdriver, and BeautifulSoup 4.5.1

Question

For the past three days, I've been struggling with a frustrating issue involving both Selenium and BS4, and I suspect Selenium (or my code) is the culprit.

Like many others before me, I'm attempting to scrape data from this website:

I'm working backwards from the 2015-16 season to the 2007-08 season. For each season, I first navigate to the season's results page, then use Selenium to step through that season's pagination. Once a season is done, I move on to the next one.

To achieve this, I parse each season to extract its pagination links and consolidate them into a list. Currently, I have a list of approximately 72 links that I am iterating over.

Here is a snippet of the list:

tot_links[0:10]
['http://www.oddsportal.com//soccer/france/ligue-1-2015-2016/results/',
 u'http://www.oddsportal.com//soccer/france/ligue-1-2015-2016/results/#/page/2/',
 u'http://www.oddsportal.com//soccer/france/ligue-1-2015-2016/results/#/page/3/',
 u'http://www.oddsportal.com//soccer/france/ligue-1-2015-2016/results/#/page/4/',
 u'http://www.oddsportal.com//soccer/france/ligue-1-2015-2016/results/#/page/5/',
 u'http://www.oddsportal.com//soccer/france/ligue-1-2015-2016/results/#/page/6/',
 u'http://www.oddsportal.com//soccer/france/ligue-1-2015-2016/results/#/page/7/',
 u'http://www.oddsportal.com//soccer/france/ligue-1-2015-2016/results/#/page/8/',
 'http://www.oddsportal.com//soccer/france/ligue-1-2014-2015/results/',
 u'http://www.oddsportal.com//soccer/france/ligue-1-2014-2015/results/#/page/2/']
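
A list of this shape can be assembled from each season's starting year and its number of result pages. The sketch below is only an illustration: `season_links`, `all_links`, and the page counts are assumptions rather than the asker's code, and the base URL uses a single slash where the printed list shows a double one. It is written for Python 3.

```python
BASE = "http://www.oddsportal.com/soccer/france"


def season_links(first_year, pages):
    """Return the result-page URLs for one season, page 1 first."""
    root = "{}/ligue-1-{}-{}/results/".format(BASE, first_year, first_year + 1)
    # Page 1 is the bare results URL; later pages differ only in the fragment.
    return [root] + ["{}#/page/{}/".format(root, n) for n in range(2, pages + 1)]


def all_links(pages_per_season):
    """Flatten the per-season links, newest season first."""
    links = []
    for year in sorted(pages_per_season, reverse=True):
        links.extend(season_links(year, pages_per_season[year]))
    return links


# Hypothetical page counts; the real ones come from each season's pagination.
tot_links = all_links({2015: 8, 2014: 8})
print(tot_links[:10])
```

Note that pages 2 and up differ from page 1 only in the part after `#`, which is why the content has to be loaded by the site's JavaScript rather than by a fresh request.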

I use Selenium to handle the website's JavaScript and BS4 to scrape the cell data. Everything appears fine so far.

... Here are the functions `cells_data()` and the loop for going through all the links...

javascript python-2.7 selenium selenium-webdriver beautifulsoup

Answer 1

Okay,

Just thought I would share for anyone who may come across this in the future. Turns out, my problem stemmed from not handling waits properly. Somehow, it was causing the page to be parsed twice without me even realizing it.
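
A silent double parse like this can be made visible by fingerprinting the rows of each page and flagging any page whose rows were already seen. This is a rough sketch, not part of the original code: `page_fingerprint`, `find_repeated_pages`, and the sample rows are all made up for illustration, and it assumes each page has already been parsed into tuples of strings.

```python
import hashlib


def page_fingerprint(rows):
    """Hash the parsed rows of one page so pages can be compared cheaply."""
    digest = hashlib.sha1()
    for row in rows:
        digest.update("|".join(row).encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def find_repeated_pages(pages):
    """Return (url, earlier_url) pairs for pages whose rows were already seen."""
    seen = {}
    repeats = []
    for url, rows in pages:
        key = page_fingerprint(rows)
        if key in seen:
            repeats.append((url, seen[key]))
        else:
            seen[key] = url
    return repeats


# Made-up rows: the second URL returned the same rows as the first.
pages = [
    ("results/", [("Team A", "Team B", "1:0")]),
    ("results/#/page/2/", [("Team A", "Team B", "1:0")]),
    ("results/#/page/3/", [("Team C", "Team D", "2:2")]),
]
print(find_repeated_pages(pages))
```

Running a check like this inside the scraping loop would report the repeated page right away instead of leaving duplicates to be found in the final data.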

After some trial and error, I figured out that since the page loads dynamically, adding a specific wait resolved the issue:

# Block until the results table is visible (up to 100 s) before parsing.
wait = WebDriverWait(parser.browser, 100).until(
    EC.visibility_of_element_located((By.CLASS_NAME, "table-main")))

Now everything is functioning as expected. Lesson learned - always make sure to manage your waits correctly when dealing with these types of issues :)
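
For anyone unfamiliar with explicit waits: the call above keeps checking a condition until it returns something truthy or the timeout (100 seconds here) runs out. The plain-Python sketch below illustrates that polling idea only. It is not Selenium's implementation, and `wait_until`, `WaitTimeout`, and `table_is_ready` are hypothetical names. It is written for Python 3.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never became truthy (hypothetical name)."""


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly and return its first truthy result."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout("condition not met within {} s".format(timeout))
        time.sleep(poll)


# Stand-in for a table that only "appears" on the third check.
checks = {"count": 0}


def table_is_ready():
    checks["count"] += 1
    return "table-main" if checks["count"] >= 3 else None


found = wait_until(table_is_ready, timeout=2.0, poll=0.01)
print(found, checks["count"])
```

The point is that parsing only starts once the condition holds, so the page source handed to BS4 already contains the table being waited for.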
