Extract data from a website with Python and selenium

Question

I need to scrape the data from a table that seems to be generated by JavaScript. I'm using Selenium and Python 3 for this task. Others who have tackled similar problems use XPath to locate the table before scraping it, but I am struggling to work out the correct XPath.

How can I extract the content of the table? If XPath is the way to go, how can I identify the right XPath(s) by inspecting the source code of the webpage?

from io import StringIO

import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome('path/to/chromedriver.exe')
url = 'https://ultrasignup.com/results_event.aspx?did=6727'
driver.get(url)

# Now I need to get the table's contents. I might do something like this:
table = driver.find_element(By.XPATH, 'my_xpath')  # 'my_xpath' is the part I can't work out
table_html = table.get_attribute('outerHTML')  # outerHTML keeps the <table> tag, which read_html needs
df = pd.read_html(StringIO(table_html))[0]
print(df)
driver.quit()

javascript python-3.x selenium web-scraping selenium-chromedriver

Answer №1

In my opinion, you may not need to scrape the page at all, because the site exposes an API.

The endpoint used in the snippet below returns the data behind the table you linked in a well-structured form.

Here's a snippet of code to demonstrate how you can use the API:

import requests

url = 'https://ultrasignup.com/service/events.svc/results/6727/json'

response = requests.get(url, timeout=30)
response.raise_for_status()  # stop early on an HTTP error

# Collect all individuals from the decoded JSON
people = list(response.json())

# Display details of the first individual
print(people[0])
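
If the goal is still a table, the decoded JSON can be reshaped into rows. What follows is only a minimal sketch: it assumes the endpoint returns a list of flat objects, and the field names `place`, `name`, and `time`, the sample payload, and the helper `records_to_rows` are all hypothetical stand-ins rather than the service's actual schema.

```python
import json

# Hypothetical sample payload; the real field names may differ.
SAMPLE_JSON = """
[
  {"place": 1, "name": "Runner A", "time": "3:59:10"},
  {"place": 2, "name": "Runner B", "time": "4:05:42"}
]
"""


def records_to_rows(records, fields):
    """Return a header row followed by one row per record.

    Missing keys become None so ragged records do not raise KeyError.
    """
    rows = [list(fields)]
    for record in records:
        rows.append([record.get(field) for field in fields])
    return rows


# With requests, this would be: people = response.json()
people = json.loads(SAMPLE_JSON)
rows = records_to_rows(people, ["place", "name", "time"])
for row in rows:
    print(row)
```

With pandas installed, `pandas.DataFrame(people)` builds a comparable table from the same list of dictionaries in one call.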

I trust this information proves beneficial!

Answer №2

To find the right XPath, inspect the table element in the page and look at the surrounding source code. Once you know which tags the table is nested in, build the XPath one step at a time.

For instance:


<div class="example">
<p class="example2">
<table class="example3"> 
<!--Additional attributes may be present-->
contents...
</table>
</p>
</div>

Start your XPath with //div[@class="example"]. You are now within the div.

Next step: //div[@class="example"]//p[@class="example2"]. You are now inside the paragraph tag.

Final step:

from selenium.webdriver.common.by import By

xpath = "//div[@class='example']//p[@class='example2']//table[@class='example3']"

# Pass the variable, not the string 'xpath'; find_element returns a single element
table = driver.find_element(By.XPATH, xpath)

With the table element in hand, you can read any of its attributes or extract its contents.
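
The step-by-step construction can be tried without a browser. The sketch below is an illustration under assumptions, not the Selenium workflow itself: it parses a hypothetical copy of the example markup (with two invented data rows) as XML using the standard library's `xml.etree.ElementTree`, which supports only a limited subset of XPath and needs a leading `.` because its paths are relative.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup mirroring the example above, with two data rows added.
PAGE = """
<html><body>
<div class="example">
  <p class="example2">
    <table class="example3">
      <tr><th>Place</th><th>Name</th></tr>
      <tr><td>1</td><td>Runner A</td></tr>
      <tr><td>2</td><td>Runner B</td></tr>
    </table>
  </p>
</div>
</body></html>
"""

root = ET.fromstring(PAGE.strip())

# Each step narrows the search: div, then p inside it, then the table.
steps = [
    ".//div[@class='example']",
    ".//div[@class='example']//p[@class='example2']",
    ".//div[@class='example']//p[@class='example2']//table[@class='example3']",
]
for path in steps:
    print(path, "->", len(root.findall(path)), "match(es)")

table = root.find(steps[-1])

# Collect the text of every header or data cell, row by row.
rows = [[cell.text for cell in tr] for tr in table.iter("tr")]
print(rows)
```

With Selenium, the final expression without the leading `.` is what would be passed to `driver.find_element(By.XPATH, ...)`, and the rows could then be read from the returned element in a similar loop.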
