Extracting web content with Selenium and JavaScript integration

Question

I am trying to extract JavaScript-rendered content from a website with Selenium and geckodriver, but I keep coming up empty-handed. Below is a snippet of the rendered HTML:

<div _ngcontent-c2="" class="header-wrapper">
    <div _ngcontent-c2="" class="title">Suda Office</div>
    <div _ngcontent-c2="" class="update">Jul 05 11:07 AM</div>
</div>

<div _ngcontent-c2="">
    <div _ngcontent-c2="" class="item-row title-headers">
        <div _ngcontent-c2="" class="item-col head1">Route</div>
        <div _ngcontent-c2="" class="item-col head2">Destination</div>
        <div _ngcontent-c2="" class="item-col">
            <div _ngcontent-c2="" class="head3 head3-height">ETA</div>
        </div>
    </div>

    <div _ngcontent-c2="">
        <div _ngcontent-c2="" class="alternet-color">
            <div _ngcontent-c2="" class="item-row item-eta-row">
                <div _ngcontent-c2="" class="item-col eta-route">15 T</div>
                <div _ngcontent-c2="" class="item-col eta-destination">
                    <marquee _ngcontent-c2=""> Charbagh</marquee>
                </div>
                <div _ngcontent-c2="" class="item-col eta-col">
                    <div _ngcontent-c2="" class="eta-display-wrapper">
                        <div _ngcontent-c2="" class="display">
                            <span _ngcontent-c2="" class="space"></span>
                            <span _ngcontent-c2="" class="currentTiming">10 min</span>
                        </div>

                    </div>
                </div>
            </div>
        </div>
    </div>
</div>

I need to retrieve the data from class="item-col eta-route", class="item-col eta-destination", and class="currentTiming" in the HTML above. I tried the following code without success:

from selenium import webdriver
driver = webdriver.Firefox()
driver.get(url)
a = driver.find_elements_by_class_name("item-col eta-route")

However, the output is a = []. Even trying

d = driver.find_elements_by_class_name("currentTiming")

results in:

[<selenium.webdriver.firefox.webelement.FirefoxWebElement (session="6b1f2344-8e8a-4f48-a29a-54610179d62f", element="38e7ce58-ea66-4461-bee7-f81ac414595b")>]

How can I properly extract this information from the page using Selenium?

javascript python-2.7 selenium

Answer 1

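The problem is the class name item-col eta-route. It is a compound class (two class names separated by a space), and find_elements_by_class_name accepts only a single class name, so the compound value does not match the element. Several elements on the page may also share these classes.

Instead, you can use this CSS selector:

div[_ngcontent-c2][class='item-col eta-route']

This retrieves the value 15 T.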
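
As an alternative to matching the whole class attribute, a compound class can be written as a chain of class selectors such as .item-col.eta-route, which also avoids depending on the Angular-generated _ngcontent-c2 attribute. The helper below is only a sketch of that conversion; class_attr_to_css is a hypothetical name and not part of Selenium.

```python
def class_attr_to_css(class_attr, tag=""):
    """Turn a space-separated class attribute into a CSS selector.

    Example: "item-col eta-route" becomes ".item-col.eta-route".
    `tag` is an optional element name to prefix, e.g. "div".
    """
    names = class_attr.split()
    if not names:
        raise ValueError("class_attr must contain at least one class name")
    # Each class name becomes its own ".name" part; chaining them means
    # "an element that carries all of these classes".
    return tag + "".join("." + name for name in names)


route_selector = class_attr_to_css("item-col eta-route", tag="div")
print(route_selector)
```

The resulting string can be passed to driver.find_elements(By.CSS_SELECTOR, route_selector) in place of the class-name lookup.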

Adding an explicit wait (WebDriverWait) makes the script more stable, because the content is rendered by JavaScript and may not be present yet when the page first loads.

wait = WebDriverWait(driver, 10)
element = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "div[_ngcontent-c2][class='item-col eta-route']")))
print(element.text)

To extract the destination value:

marquee_text = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "div[_ngcontent-c2][class='item-col eta-destination'] marquee")))
print(marquee_text.text)

Make sure to import these:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
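
Note that find_elements returns WebElement objects, which is why printing the list shows FirefoxWebElement entries; the visible text is in each element's .text attribute. To read every row in one pass, something along these lines might work. It is a rough sketch that assumes each bus entry sits in a div with the classes item-row and item-eta-row, as in the HTML from the question. read_eta_rows and the CSS constant are illustrative names, and the function accepts any object that offers find_elements and find_element (a driver or an element).

```python
# Same locator string that selenium's By.CSS_SELECTOR holds; defined here
# so the parsing logic does not need selenium to be imported.
CSS = "css selector"


def read_eta_rows(root):
    """Return one dict per bus row found under `root` (a driver or element)."""
    rows = []
    for row in root.find_elements(CSS, "div.item-row.item-eta-row"):
        rows.append({
            # .text gives the rendered text of the element, not the element itself
            "route": row.find_element(CSS, ".eta-route").text.strip(),
            "destination": row.find_element(CSS, ".eta-destination marquee").text.strip(),
            "eta": row.find_element(CSS, ".currentTiming").text.strip(),
        })
    return rows


# Usage with a real browser (not executed here; `url` is the page from the question):
#   from selenium import webdriver
#   from selenium.webdriver.support.ui import WebDriverWait
#   from selenium.webdriver.support import expected_conditions as EC
#   driver = webdriver.Firefox()
#   driver.get(url)
#   WebDriverWait(driver, 10).until(
#       EC.presence_of_element_located((CSS, ".currentTiming")))
#   print(read_eta_rows(driver))
```

Searching inside each row keeps the route, destination, and ETA of the same bus together instead of relying on three separate lists lining up.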
