Extract data from a website with Python and Selenium

I need to scrape the data from a table that seems to be generated by JavaScript. I'm using Selenium and Python 3 for this. Looking at how others have approached similar problems, I noticed they use XPath to locate the table before scraping it, but I can't work out the correct XPath to use.

How can I extract the content of the table? If XPath is the way to go, how do I find the right XPath by inspecting the page's source code?

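from io import StringIO
import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome(service=Service('path/to/chromedriver.exe'))
url = 'https://ultrasignup.com/results_event.aspx?did=6727'
driver.get(url)

# Now I need the table's contents. 'my_xpath' is still a placeholder.
# The table is built by JavaScript, so wait (up to 10 s) for it to appear.
table = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.XPATH, 'my_xpath')))
table_html = table.get_attribute('outerHTML')  # outerHTML keeps the <table> tag; innerHTML drops it
df = pd.read_html(StringIO(table_html))[0]
print(df)
driver.quit()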

Answer №1

Scraping may not be necessary here, since the site exposes an API for this data.

If you open the endpoint used below in a browser, you can see the data from the table you linked as well-structured JSON.

Here's a snippet of code to demonstrate how you can use the API:

import requests

url = 'https://ultrasignup.com/service/events.svc/results/6727/json'

response = requests.get(url, timeout=30)
response.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page

# The response body is a JSON array with one entry per individual
people = response.json()

# Display details of the first individual
print(people[0])
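If you want the data in a spreadsheet-friendly form, the list returned by `response.json()` can be written out with the standard `csv` module. This is a sketch that assumes the endpoint returns a JSON array of flat objects, one per individual; `records_to_csv` is a hypothetical helper, and the field names in the sample are made up rather than the API's actual keys. Nested values would be written as their Python repr, so flatten them first if the real response contains any.

```python
import csv
import io


def records_to_csv(records):
    """Write a list of flat dicts as CSV text (hypothetical helper).

    Columns are the union of all keys, in first-seen order; missing
    values are written as empty cells.
    """
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical sample records; the real field names come from the API response.
sample = [
    {"place": 1, "name": "Runner A", "time": "10:01:02"},
    {"place": 2, "name": "Runner B"},
]
print(records_to_csv(sample))
```

In the answer's script you would call `records_to_csv(people)` and write the returned string to a `.csv` file.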

Hope this helps!

Answer №2

To pinpoint the correct XPath, inspect the elements around the table in the page source (for example with your browser's developer tools). Once you know which tags enclose the table content, build your XPath step by step.

For instance:

<div class="example">
<div class="example2">
<table class="example3">
<!--Additional attributes may be present-->
contents...
</table>
</div>
</div>

(The inner wrapper is a <div> rather than a <p>: a <table> is not valid inside a <p>, and browsers in standards mode close the paragraph before the table starts.)

Start your XPath with //div[@class="example"]. Now you are within the outer div.

Next step: //div[@class="example"]//div[@class="example2"]. You are now inside the inner div.

Final step:

xpath = "//div[@class='example']//div[@class='example2']//table[@class='example3']"

# requires: from selenium.webdriver.common.by import By
table = driver.find_element(By.XPATH, xpath)  # pass the variable, not the string 'xpath'
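You can also try an expression offline before pointing Selenium at the live page. The sketch below uses the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath and needs well-formed markup, so it is a test bench rather than a replacement for the browser. The markup is a well-formed variant of the example with hypothetical row data, plus an unrelated table that reuses the same class, to show why scoping the path through its ancestors matters.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: the target table, plus an unrelated table that reuses
# the same class name elsewhere on the page.
MARKUP = """
<body>
  <div class="example">
    <div class="example2">
      <table class="example3">
        <tr><td>A. Runner</td><td>10:01:02</td></tr>
        <tr><td>B. Runner</td><td>10:15:40</td></tr>
      </table>
    </div>
  </div>
  <div class="sidebar">
    <table class="example3"><tr><td>unrelated</td></tr></table>
  </div>
</body>
"""

# ElementTree paths are relative to an element, hence the leading ".";
# with Selenium you can pass the form starting with // instead.
XPATH = ".//div[@class='example']//div[@class='example2']//table[@class='example3']"

root = ET.fromstring(MARKUP.strip())

loose = root.findall(".//table[@class='example3']")  # class alone matches both tables
scoped = root.findall(XPATH)  # the full path matches only the target table
print(len(loose), len(scoped))  # 2 1

# Use .//tr rather than tr: browsers insert an implicit <tbody>, so a
# direct-child step from <table> would miss the rows on a live page.
table = scoped[0]
rows = [[td.text for td in tr.findall("td")] for tr in table.findall(".//tr")]
print(rows)
```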

You now have the table element, so you can read any of its attributes or extract its contents (for example with table.get_attribute('outerHTML')).
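The question's `pd.read_html` is the usual shortcut for turning a table's `outerHTML` into rows. If you would rather avoid the pandas dependency, a minimal sketch with the standard library's `html.parser` looks like this; `TableRows` and `table_rows` are hypothetical names, and the parser ignores `colspan`/`rowspan` and assumes explicit closing tags (which a browser's `outerHTML` serialization includes).

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text inside nested tags such as <a> is still collected for the cell.
        if self._cell is not None:
            self._cell.append(data)


def table_rows(table_html):
    parser = TableRows()
    parser.feed(table_html)
    parser.close()
    return parser.rows


sample_html = (
    "<table><thead><tr><th>Place</th><th>Name</th></tr></thead>"
    "<tbody><tr><td>1</td><td><a href='#'>A. Runner</a></td></tr></tbody></table>"
)
print(table_rows(sample_html))  # [['Place', 'Name'], ['1', 'A. Runner']]
```

With Selenium you would call `table_rows(table.get_attribute('outerHTML'))`; the first row holds the headers when the table has a `<thead>`.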

Similar questions


jQuery does not support the addition of new fields in HTML

Recently, I've been working on creating a lucky number generator. Initially, I developed it using C# and now I'm in the process of transitioning it to JavaScript and jQuery. You can view the latest version here. However, I've encountered an ...

What is the best way to extract individual objects from several arrays and consolidate them into a single array?

Currently, I have a collection of objects stored in a variable called listOfObjects. They are not separated by commas because I utilized the Object.entries method to extract these values from another array. console.log(listOfObjects) outputs { q: 'L ...

How to toggle two classes simultaneously using JQuery

Check out this code snippet: http://jsfiddle.net/celiostat/NCPv9/ Utilizing the 2 jQuery plugin allows for the following changes to be made: - The background color of the div can be set to gray. - The text color can be changed to red. The issue arises wh ...

Customize the focus function for an individual element

I am working on a custom component that needs to seamlessly integrate with the native blur and focus functions. My goal is to override these functions in order to achieve the specific functionality I need. Currently, I have managed to override the prototy ...

Improving Zen Coding to integrate with JavaScript files on Sublime Text2

Sublime Text2 is hands down my go-to editor, except for one minor hiccup - the Zen Coding plugin only works with CSS and HTML files. There are plenty of times where I'd love to use Zen Coding with JavaScript or other file types, like incorporating HTM ...

Adjust the color of the font within a div element when hovering over it

I've been attempting to modify the text color and add an underline when a user hovers over it. Despite trying various methods, I haven't been successful. I scoured the internet for a solution but couldn't find one that met my specific requi ...

Tips on verifying if a web element is positioned in the top left corner

I need to verify whether the logo appears in the top left corner. I retrieved the location of the element and I am unsure how to setup the assertions using JUnit. Point loc = driver.findElement(By.className("hdr_logo")).getLocation(); The coordinates are ...

Unexpected behavior with if statements in jQuery

Having recently started using jQuery, I am embarking on the creation of a survey. Each question within the survey resides in its own div and is unveiled upon clicking the "Next" button. Currently, all divs are hidden except for the current one. My approach ...

The Express server automatically shuts down following the completion of 5 GET requests

The functionality of this code is as expected, however, after the fifth GET request, it successfully executes the backend operation (storing data in the database) but does not log anything on the server and there are no frontend changes (ReactJS). const ex ...

Interactive pop-up windows in Bootstrap

I encountered an issue with bootstrap modal forms where the form displays correctly after clicking on the button, but the area in which the form is displayed gets blocked! It's difficult to explain, but you can see the problem by visiting this link. ...

Issue with ng-selected when used alongside ng-options or ng-repeat in Angular

My application features a form where users can fill out their starting point and choose from 350 possible destinations to get directions using Google Maps. Users can select their destination by either clicking on a pin on the map or choosing from a drop-do ...

The getBBox() method of SVG:g is returning an incorrect width value

Hey there, I've been attempting to determine the width of a <g> element and it consistently returns 509.5 pixels regardless of what I do. Initially, I assumed this was the actual size and not scaled. However, upon opening the SVG in Illustrato ...

The function `driver.getScreenshotAs(OutputType.FILE)` may encounter limitations when attempting to store the entire screenshot in the specified destination

In my project, I have implemented a method for capturing screenshots as shown below: public String captureScreen(String imageName) { String screenshot = null; try { if (imageName.equals("")) { imageName = "blank"; } Calendar cal = Cale ...

a gentle breeze gathers a multitude of entities rather than items

When utilizing a restful server with node.js and sending a collection of entities wrapped in one entity object (Lookups), everything seems to be functioning properly. However, the issue arises when breeze views the entities in the collection as plain objec ...

importing selenium webdriver certificate

Prior to initiating selenium, it is necessary to import a specific certificate. Once imported, the execution proceeds as intended. However, with each new test suite run, the certificate is no longer available in Firefox, causing the execution to fail due ...

Attempting to incorporate icons into a Material UI table design

Hello, I've incorporated a Material UI table into one of my projects with a design concept similar to this - https://i.stack.imgur.com/i6Fsj.png I'm aiming to include icons within the tables. Here's the code I've worked on so far - ...

Error encountered in selenium python: 'dict' object does not have 'click' attribute

if href.startswith("https://store.steampowered"): browser.get(href) if browser.current_url.startswith("https://store.steampowered.com/agecheck"): area = browser.find_element_by_id("agecheck_form") location_field = area.find_element ...

Shuffling Numbers in an Array After Removing an Element with AngularJS

I am working with a JSON array that contains tasks: tasks = [{taskcode:1, taskName:'abc'}, {taskcode:2, taskName:'abc1'}, {taskcode:3, taskName:'abc2'}, ..... ]; If I delete a task with the nam ...

How can I stretch a background image using jquery to cover the entire document instead of just the window

I have implemented a plugin called https://github.com/srobbin/jquery-backstretch to effectively stretch the background image. The problem arises when the content of the page is long enough to create a scrollbar. In this case, while scrolling, the backgrou ...

Error in Displaying Vuetify Child Router View

I am currently working on integrating a child router-view to be displayed alongside surrounding components. Here is an overview of my routing setup: { path: "/login", name: "TheLoginView", component: TheLoginView, }, { path: "/dashboa ...