The initial item in a pagination/list is double-parsed on an AJAX website using Selenium 3.0.2, Firefox webdriver, and BeautifulSoup 4.5.1

For the past three days, I've been struggling with a frustrating issue involving Selenium and BS4. I suspect Selenium (or my code) is the culprit.

Like many others before me, I'm attempting to scrape data from this website:

I work backwards from the 2015-16 season to the 2007-08 season. For each season, I first navigate to its results page, then use Selenium to step through that season's pagination. Once that is done, I move on to the next season.

To achieve this, I parse each season to extract its pagination links and consolidate them into a list. Currently, I have a list of approximately 72 links that I am iterating over.

Here is a snippet of the list:

tot_links[0:10]
['http://www.oddsportal.com//soccer/france/ligue-1-2015-2016/results/',
 u'http://www.oddsportal.com//soccer/france/ligue-1-2015-2016/results/#/page/2/',
 u'http://www.oddsportal.com//soccer/france/ligue-1-2015-2016/results/#/page/3/',
 u'http://www.oddsportal.com//soccer/france/ligue-1-2015-2016/results/#/page/4/',
 u'http://www.oddsportal.com//soccer/france/ligue-1-2015-2016/results/#/page/5/',
 u'http://www.oddsportal.com//soccer/france/ligue-1-2015-2016/results/#/page/6/',
 u'http://www.oddsportal.com//soccer/france/ligue-1-2015-2016/results/#/page/7/',
 u'http://www.oddsportal.com//soccer/france/ligue-1-2015-2016/results/#/page/8/',
 'http://www.oddsportal.com//soccer/france/ligue-1-2014-2015/results/',
 u'http://www.oddsportal.com//soccer/france/ligue-1-2014-2015/results/#/page/2/']
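A list shaped like this could be assembled roughly as follows. This is only a sketch: `season_links`, `all_links`, and the page counts passed in are hypothetical, and it takes the number of pages per season as input instead of reading it from the site's pagination as described above.

```python
# URL pattern taken from the list above, written with a single slash after the host.
BASE = "http://www.oddsportal.com/soccer/france/ligue-1-{season}/results/"


def season_links(season, n_pages):
    """Return the results URL for one season followed by its '#/page/N/' URLs."""
    first = BASE.format(season=season)
    # Page 1 is the plain results URL; later pages only differ in the fragment.
    return [first] + ["{}#/page/{}/".format(first, page)
                      for page in range(2, n_pages + 1)]


def all_links(pages_per_season):
    """Flatten (season, n_pages) pairs into one list, keeping the season order."""
    links = []
    for season, n_pages in pages_per_season:
        links.extend(season_links(season, n_pages))
    return links


# Hypothetical page counts, for illustration only.
tot_links = all_links([("2015-2016", 8), ("2014-2015", 2)])
```

Note that every link after the first one in a season differs only in its `#/page/N/` fragment, so the browser stays on the same document and the new rows are loaded by the page's JavaScript.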
 

I use Selenium to handle the website's JavaScript and BS4 to scrape the cell data. Everything appears fine so far.

... Here are the functions `cells_data()` and the loop for going through all the links...

Answer №1

Okay,

Just thought I would share for anyone who may come across this in the future. Turns out, my problem stemmed from not handling waits properly. Somehow, it was causing the page to be parsed twice without me even realizing it.

After some trial and error, I figured out that since the page loads dynamically, adding a specific wait resolved the issue:

# Needs By, WebDriverWait and expected_conditions (imported as EC) from selenium.webdriver
wait = WebDriverWait(parser.browser, 100).until(
    EC.visibility_of_element_located((By.CLASS_NAME, "table-main")))

Now everything is functioning as expected. Lesson learned - always make sure to manage your waits correctly when dealing with these types of issues :)
