Uncovering data from a dynamic website through the combination of Selenium and PhantomJS

Question

I am trying to read the timer value shown in this screenshot, http://prntscr.com/kcbwd8, from the website loaded in the code below, and ideally save it in a variable.

import urllib
from bs4 import BeautifulSoup as bs
import time
import requests
from selenium import webdriver
from urllib.request import urlopen, Request

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.3"}

browser = webdriver.PhantomJS()
browser.get('https://www.whenisthenextsteamsale.com/')

soup = bs(browser.page_source, "html.parser")
result = soup.find_all("p",{"id":"subTimer"})

for item in result:
    print(item.text)

browser.quit()

When I run the code above, I get the following error:

C:\Users\rober\Anaconda3\lib\site-packages\selenium\webdriver\phantomjs\webdriver.py:49: UserWarning: Selenium support for PhantomJS has been deprecated, please use headless versions of Chrome or Firefox instead
  warnings.warn('Selenium support for PhantomJS has been deprecated, please use headless '
19:59:11

Is there a way to resolve this issue? If not, what other methods can be used to retrieve the dynamic values from a website and save them in a variable?

Thank you.

javascript selenium selenium-webdriver web-scraping phantomjs

Answer №1

PhantomJS is no longer maintained, and Selenium has deprecated its support for it, which is what the warning reports.

Switch to headless Chrome or headless Firefox instead.

To update your code, replace the PhantomJS driver with headless Firefox:

from selenium import webdriver
from selenium.webdriver.firefox.options import Options

options = Options()
options.add_argument("--headless")
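# Selenium 3 signature, as in the original answer; newer Selenium releases take options= instead of firefox_options=
browser = webdriver.Firefox(firefox_options=options, executable_path="Path to geckodriver.exe")
browser.get('https://www.whenisthenextsteamsale.com/')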

Download geckodriver and set executable_path to the location of geckodriver.exe.
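To actually end up with the timer in a variable, something along these lines may help. It is only a sketch under a few assumptions: Selenium 4 (which takes options= and can locate geckodriver on its own in recent releases), an installed Firefox, and the subTimer id from the question. make_headless_firefox and read_timer are hypothetical helper names, and the polling loop is a hand-rolled stand-in for Selenium's own WebDriverWait, used here so the helper works with any driver-like object.

```python
import time


def make_headless_firefox():
    """Start headless Firefox (assumes Selenium 4 and Firefox are installed)."""
    # Imported lazily so read_timer() below does not depend on Selenium itself.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    options.add_argument("--headless")
    return webdriver.Firefox(options=options)


def read_timer(driver, url, element_id="subTimer", timeout=10.0, poll=0.5):
    """Load url and return the non-empty text of the element with element_id."""
    driver.get(url)
    deadline = time.monotonic() + timeout
    while True:
        # "id" is the locator strategy string behind selenium's By.ID constant.
        elements = driver.find_elements("id", element_id)
        text = elements[0].text.strip() if elements else ""
        if text:
            return text
        if time.monotonic() >= deadline:
            raise TimeoutError(f"#{element_id} stayed empty for {timeout} s")
        # The timer is filled in by JavaScript, so give the page time to run.
        time.sleep(poll)


# Usage (needs Selenium 4, Firefox, and network access):
# browser = make_headless_firefox()
# try:
#     timer_value = read_timer(browser, "https://www.whenisthenextsteamsale.com/")
# finally:
#     browser.quit()
```

Returning the text instead of printing it inside the loop is what makes the value reusable later in the script; BeautifulSoup is not needed for a single element lookup like this.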

Answer №2

Your code works as written. However, the headers dictionary you defined is never used:

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.3"}

I ran your script as provided, adding only the path to the PhantomJS executable:

import urllib
from bs4 import BeautifulSoup as bs
import time
import requests
from selenium import webdriver
from urllib.request import urlopen, Request
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.3"}
browser = webdriver.PhantomJS(executable_path=r'C:\Utility\phantomjs-2.1.1-windows\bin\phantomjs.exe')
browser.get('https://www.whenisthenextsteamsale.com/')
soup = bs(browser.page_source, "html.parser")
result = soup.find_all("p",{"id":"subTimer"})
for item in result:
    print(item.text)
browser.quit()

The output I received matches what you shared:

C:\Python\lib\site-packages\selenium\webdriver\phantomjs\webdriver.py:49: UserWarning: Selenium support for PhantomJS has been deprecated, please use headless versions of Chrome or Firefox instead
  warnings.warn('Selenium support for PhantomJS has been deprecated, please use headless '
08:06:16
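If the goal is to keep the timer in a variable and work with it, the printed text can be turned into a duration. This is a minimal sketch that assumes the timer text always has the HH:MM:SS shape seen in the output above; parse_timer is a hypothetical helper, not part of Selenium or BeautifulSoup.

```python
from datetime import timedelta


def parse_timer(text):
    """Convert 'HH:MM:SS' text (assumed format) into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.strip().split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


remaining = parse_timer("08:06:16")
print(remaining.total_seconds())  # 29176.0
```

If the site ever shows a different layout (for example a day count), the unpacking raises ValueError, which is arguably preferable to silently storing a wrong value.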

Note that this is a warning, not an error: the script still runs and prints the timer value on the last line. The Selenium team has discontinued default support for PhantomJS in the Selenium Java client and will do the same for the Selenium Python client. The warning message you are seeing comes from the __init__() method of the PhantomJS driver class, shown below:

def __init__(self, executable_path="phantomjs",
             port=0, desired_capabilities=DesiredCapabilities.PHANTOMJS,
             service_args=None, service_log_path=None):
    """
    Creates a new instance of the PhantomJS / Ghostdriver.

    Starts the service and then creates new instance of the driver.

    :Args:
     - executable_path - path to the executable. If the default is used it assumes the executable is in the $PATH
     - port - port you would like the service to run, if left as 0, a free port will be found.
     - desired_capabilities: Dictionary object with non-browser specific
       capabilities only, such as "proxy" or "loggingPref".
     - service_args : A List of command line arguments to pass to PhantomJS
     - service_log_path: Path for phantomjs service to log to.
    """
    warnings.warn('Selenium support for PhantomJS has been deprecated, please use headless '
                  'versions of Chrome or Firefox instead')
    self.service = Service(
        executable_path,
        port=port,
        service_args=service_args,
        log_path=service_log_path)
    self.service.start()
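Because the message comes from warnings.warn(), it can be handled with the standard warnings module. The sketch below uses start_legacy_driver, a hypothetical stand-in that emits the same message as the __init__() above, so it runs without PhantomJS installed. It first shows that the call still returns normally, then how the one message could be silenced.

```python
import warnings


def start_legacy_driver():
    """Hypothetical stand-in that warns the way PhantomJS.__init__ does."""
    warnings.warn("Selenium support for PhantomJS has been deprecated, please use "
                  "headless versions of Chrome or Firefox instead")
    return "driver started"


# Record the warning instead of printing it, to show the call still returns.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = start_legacy_driver()

print(result)                       # driver started
print(caught[0].category.__name__)  # UserWarning

# Silence only this message while leaving other warnings visible.
with warnings.catch_warnings():
    warnings.filterwarnings("ignore", message="Selenium support for PhantomJS")
    start_legacy_driver()
```

Silencing the message only hides the symptom; moving to a headless Chrome or Firefox driver is still the longer-term fix the warning asks for.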
