For the past three days, I've been stuck on a frustrating issue involving both Selenium and BS4, though I suspect Selenium (or my own code) is the culprit.
Like many others before me, I'm attempting to scrape data from this website:
I'm working backwards from the 2015-16 season to the 2007-08 season. For each season, I first navigate to its results page, then use Selenium to step through that season's pagination. Once a season is done, I move on to the next one.
To achieve this, I parse each season to extract its pagination links and consolidate them into a list. Currently, I have a list of approximately 72 links that I am iterating over.
Here is a snippet of the list:
tot_links[0:10]
['http://www.oddsportal.com//soccer/france/ligue-1-2015-2016/results/',
u'http://www.oddsportal.com//soccer/france/ligue-1-2015-2016/results/#/page/2/',
u'http://www.oddsportal.com//soccer/france/ligue-1-2015-2016/results/#/page/3/',
u'http://www.oddsportal.com//soccer/france/ligue-1-2015-2016/results/#/page/4/',
u'http://www.oddsportal.com//soccer/france/ligue-1-2015-2016/results/#/page/5/',
u'http://www.oddsportal.com//soccer/france/ligue-1-2015-2016/results/#/page/6/',
u'http://www.oddsportal.com//soccer/france/ligue-1-2015-2016/results/#/page/7/',
u'http://www.oddsportal.com//soccer/france/ligue-1-2015-2016/results/#/page/8/',
'http://www.oddsportal.com//soccer/france/ligue-1-2014-2015/results/',
u'http://www.oddsportal.com//soccer/france/ligue-1-2014-2015/results/#/page/2/']
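To make the shape of that list concrete, here is a rough sketch of how such a list could be assembled. It is not my actual code: `build_page_links` and the `seasons` pairs are hypothetical names for illustration, and it assumes the number of result pages per season is already known (8 for 2015-2016 matches the snippet above; the count for 2014-2015 is a placeholder).

```python
def build_page_links(season_url, n_pages):
    """Return a season's results URL followed by its '#/page/N/' pages."""
    # Page 1 is the bare season URL; later pages append a '#/page/N/' fragment.
    links = [season_url]
    for page in range(2, n_pages + 1):
        links.append("{0}#/page/{1}/".format(season_url, page))
    return links


# Hypothetical input: (season URL, number of result pages) pairs.
# The page count for 2014-2015 is an assumed placeholder.
seasons = [
    ("http://www.oddsportal.com//soccer/france/ligue-1-2015-2016/results/", 8),
    ("http://www.oddsportal.com//soccer/france/ligue-1-2014-2015/results/", 8),
]

# Consolidate every season's page links into one flat list.
tot_links = []
for season_url, n_pages in seasons:
    tot_links.extend(build_page_links(season_url, n_pages))

print(tot_links[0:10])
```

Note that every page after the first differs from the season URL only by its `#/page/N/` fragment.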
I use Selenium to handle the website's JavaScript and BS4 to scrape the cell data. Everything appears fine so far.
... Here are the function `cells_data()` and the loop that goes through all the links...