The initial item in a pagination/list is double-parsed on an AJAX website using Selenium 3.0.2, Firefox webdriver, and BeautifulSoup 4.5.1

For the past three days, I've been struggling with a frustrating issue involving both Selenium and BS4, though I suspect Selenium (or my code) is the culprit.

Like many others before me, I'm attempting to scrape data from oddsportal.com (the Ligue 1 results pages).

I work backwards from the 2015-16 season to the 2007-08 season. For each season, I first navigate to the season's results page, then use Selenium to step through its pagination. Once a season is done, I move on to the next one.

To achieve this, I parse each season to extract its pagination links and consolidate them into a list. Currently, I have a list of approximately 72 links that I am iterating over.

Here is a snippet of the list:

tot_links[0:10]
['http://www.oddsportal.com//soccer/france/ligue-1-2015-2016/results/',
 u'http://www.oddsportal.com//soccer/france/ligue-1-2015-2016/results/#/page/2/',
 u'http://www.oddsportal.com//soccer/france/ligue-1-2015-2016/results/#/page/3/',
 u'http://www.oddsportal.com//soccer/france/ligue-1-2015-2016/results/#/page/4/',
 u'http://www.oddsportal.com//soccer/france/ligue-1-2015-2016/results/#/page/5/',
 u'http://www.oddsportal.com//soccer/france/ligue-1-2015-2016/results/#/page/6/',
 u'http://www.oddsportal.com//soccer/france/ligue-1-2015-2016/results/#/page/7/',
 u'http://www.oddsportal.com//soccer/france/ligue-1-2015-2016/results/#/page/8/',
 'http://www.oddsportal.com//soccer/france/ligue-1-2014-2015/results/',
 u'http://www.oddsportal.com//soccer/france/ligue-1-2014-2015/results/#/page/2/']
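
To make the structure of that list concrete, here is a minimal sketch of how such a list could be assembled. It is not my actual code: `build_page_links` and `pages_per_season` are hypothetical names, and the page counts are hard-coded assumptions for illustration, whereas the real counts come from parsing each season's pagination.

```python
def build_page_links(season_url, n_pages):
    """Return the URL of every results page for one season.

    Page 1 is the bare season URL; later pages use a '#/page/N/' fragment.
    """
    links = [season_url]
    for page in range(2, n_pages + 1):
        links.append("{}#/page/{}/".format(season_url, page))
    return links


# Assumption: page counts are hard-coded here purely for illustration.
pages_per_season = [("2015-2016", 8), ("2014-2015", 8)]
base = "http://www.oddsportal.com//soccer/france/ligue-1-{}/results/"

tot_links = []
for season, n_pages in pages_per_season:
    tot_links.extend(build_page_links(base.format(season), n_pages))
```

Note that every page after the first differs only in the URL fragment (the part after `#`), so the browser does not perform a full page load when moving between them; the table is refreshed by JavaScript instead.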
 

I use Selenium to handle the website's JavaScript and BS4 to scrape the cell data. Everything appears fine so far.

... Here are the functions `cells_data()` and the loop for going through all the links...

Answer №1

Okay,

Just thought I would share for anyone who may come across this in the future. It turns out my problem stemmed from not handling waits properly. Apparently the parser was reading the page before the new content had finished loading, so a page ended up being parsed twice without me even realizing it.

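After some trial and error, I figured out that because the page loads dynamically, I need an explicit wait until the results table is visible before parsing. Here 100 is the timeout in seconds, and `until()` returns the located element:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
table = WebDriverWait(parser.browser, 100).until(EC.visibility_of_element_located((By.CLASS_NAME, "table-main")))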

Now everything is functioning as expected. Lesson learned - always make sure to manage your waits correctly when dealing with these types of issues :)
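
For anyone wondering what an explicit wait does conceptually: it polls a condition until the condition holds or a timeout expires. The sketch below imitates that idea with the standard library only. It is not Selenium's implementation, and `wait_until`, `FakePage`, and `rows_changed` are hypothetical names with made-up row data. It also illustrates a stricter condition that might help in similar cases: waiting until the rows differ from the previous page's rows, so stale content is never parsed a second time.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Raises TimeoutError if no truthy value is seen within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within {} s".format(timeout))
        time.sleep(poll)


class FakePage:
    """Stand-in for a browser: the new rows only 'arrive' after a few polls."""

    def __init__(self, old_rows, new_rows, polls_until_loaded):
        self._old = old_rows
        self._new = new_rows
        self._remaining = polls_until_loaded

    def rows(self):
        # Keep serving the stale rows until the simulated AJAX load finishes.
        if self._remaining > 0:
            self._remaining -= 1
            return self._old
        return self._new


def rows_changed(page, previous_rows):
    """Return the current rows once they differ from the previous page's rows."""
    current = page.rows()
    return current if current != previous_rows else None


previous = ["page-1 row 1", "page-1 row 2"]
page = FakePage(previous, ["page-2 row 1", "page-2 row 2"], polls_until_loaded=3)

# Blocks until the stale rows have been replaced, then returns the fresh rows.
fresh = wait_until(lambda: rows_changed(page, previous), timeout=5.0, poll=0.01)
```

With Selenium itself, the same pattern is what `WebDriverWait(...).until(...)` provides; only the condition passed to it changes.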
