Encountered issues loading JavaScript and received a pyppeteer error while trying to access a website through requests

Question

I am facing a challenge when trying to scrape a webpage post login using BeautifulSoup and requests.

Initially, I encountered a roadblock where the page requested JavaScript to be enabled to continue using the application.

To work around this issue, I decided to use requests_html with the code snippet below:

from requests_html import HTMLSession

session = HTMLSession()

session.get(url)
resp = session.post(loginUrl, data={"email": "email@gmail.com", "password": "Pass123"})

resp.html.render()

Despite this, I either got the same "enable JavaScript" page or hit this error:

pyppeteer.errors.PageError: net::ERR_SSL_VERSION_OR_CIPHER_MISMATCH

As a result, I switched to selenium, even though I would prefer requests because the script runs faster.

Although the selenium approach worked well, upon loading the selenium page source into BeautifulSoup, I encountered the

Please enable JavaScript to continue using this application.

error page once again.

This has left me puzzled as the driver loads successfully and I simply parse the HTML page from selenium.

Any suggestions on how to resolve both the requests_html and BeautifulSoup issues?

javascript selenium-webdriver python-requests-html pyppeteer

Answer 1

If you want to access data without the need for pyppeteer or selenium, you can simply log in using basic requests.

The crucial step is to retrieve the accessToken from the Login endpoint and then apply it to subsequent requests.

The API calls I'm utilizing here provide the essential information on the page post-login. The rest of the HTML serves mainly as visual decoration. The data obtained from the API mirrors what is visible on the website:

https://i.sstatic.net/pI054.png

As for the error

pyppeteer.errors.PageError: net::ERR_SSL_VERSION_OR_CIPHER_MISMATCH

it typically occurs when the SSL/TLS handshake fails. It may be caused by the server using an outdated or unsupported SSL/TLS version or cipher suite.
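If you want to see what a server actually negotiates, here is a minimal diagnostic sketch using only Python's standard library. The function name probe_tls is hypothetical, and the result is only a hint: Python's ssl module and the Chromium build that pyppeteer drives use different TLS stacks, so a handshake that succeeds here can still fail in the browser.

```python
import socket
import ssl


def probe_tls(host, port=443, timeout=5.0):
    """Open a TLS connection and return (protocol_version, cipher_name).

    Raises ssl.SSLError if the handshake fails, which is the Python-side
    analogue of the handshake failure described above.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            # cipher() returns (name, protocol, secret_bits) after the handshake.
            cipher_name, _protocol, _bits = tls.cipher()
            return tls.version(), cipher_name


# Example call (needs network access; the host is a placeholder):
# print(probe_tls("example.com"))
```

An old protocol version or an unusual cipher in the result would be consistent with the explanation above, but it does not prove that this is what the browser rejected.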

For more insights on this error, you can refer to this link.

TL;DR: Unfortunately, there isn't much you can do about it.

I suggest adopting my method (relying on API calls without the need for a browser).

The advantages of this approach include:

lightweight
relatively fast
no SSL errors
full data access

Here's the procedure to retrieve sales data:

import requests
from dateutil.parser import parse  # presumably used by the omitted formatting helpers

login_url = "https://api-it.saywow.me/it-it/api/Users/Login"
sales_url = "https://api-it.saywow.me/it-it/api/Booking/GetCanBookSaleEvents"
payload = {
    "email": "YOUR_EMAIL",
    "password": "YOUR_PASSWORD",
}

# The helper functions that format and display the sales data
# (including show_sales, called below) are not included in this post.

def main() -> None:
    with requests.Session() as session:
        # Log in and read the access token from the JSON response.
        response = session.post(login_url, json=payload)
        response.raise_for_status()
        token = response.json()["data"]["accessToken"]
        # Send the token as a Bearer header on the data request.
        sales = session.post(
            sales_url,
            headers={"Authorization": f"Bearer {token}"},
        )
        sales.raise_for_status()
        show_sales(sales.json()["data"])

if __name__ == "__main__":
    main()
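The token lookup in the script raises a bare KeyError or TypeError if the login fails. Below is a small sketch of a more defensive version. The {"data": {"accessToken": ...}} shape comes from the code above; the helper names extract_access_token and bearer_headers are hypothetical, and what a failed login actually returns is an assumption.

```python
def extract_access_token(body: dict) -> str:
    """Return body["data"]["accessToken"], or raise ValueError with context."""
    data = body.get("data")
    # Assumption: a failed login returns no usable "data" object.
    if not isinstance(data, dict):
        raise ValueError("login response has no 'data' object; check the credentials")
    token = data.get("accessToken")
    if not isinstance(token, str) or not token:
        raise ValueError("login response has no 'accessToken'")
    return token


def bearer_headers(token: str) -> dict:
    """Build the Authorization header used for the follow-up API calls."""
    return {"Authorization": f"Bearer {token}"}
```

In the script, this would be called as extract_access_token(response.json()), with the result passed to bearer_headers for the sales request.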

Upon entering your email and a valid password, the output should resemble this:

Event: HOUSE OF LUXURY
Address: Viale John Fitzgerald Kennedy 54, Napoli NA
Dates: 08 December - 17 December
Booked: You can book this event!


Event: Monot Archive Sale
Address: Via Orobia 11, Milano MI
Dates: 28 November - 06 December
Booked: You can book this event!

The sales data returned by the API contains additional details, such as location and phone numbers.

For example:

...

"addressName": "Via Orobia",
"addressNumber": "11",
"addressCity": "Milano",
"addressProvince": "MI",
"addressZip": "20139",
"addressCountry": "IT",
"addressLat": 45.4426322,
"addressLon": 9.2056631,

...
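As a rough illustration, the address fields listed above can be combined into the "Address:" line from the sample output. This is only a sketch: format_address is a hypothetical helper, not the formatting code from the original script, and it assumes each sale is a plain dict with these keys.

```python
def format_address(sale: dict) -> str:
    """Join the address fields into 'Street Number, City Province'."""
    street = f"{sale.get('addressName', '')} {sale.get('addressNumber', '')}".strip()
    city = f"{sale.get('addressCity', '')} {sale.get('addressProvince', '')}".strip()
    # Skip any part that ended up empty so no stray comma is printed.
    return ", ".join(part for part in (street, city) if part)


sale = {
    "addressName": "Via Orobia",
    "addressNumber": "11",
    "addressCity": "Milano",
    "addressProvince": "MI",
}
print(f"Address: {format_address(sale)}")  # Address: Via Orobia 11, Milano MI
```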
