Bizarre symbols observed while extracting data from HTML tables produced by JavaScript

I am in the process of extracting data from

Specifically, my focus is on the "tournament-page-data-results" div within the source code. Upon inspecting the HTML source code, the data does show up, but it appears with a mix of real information and random characters like this:

"SA÷2¬~ZA÷ATP - SINGLES: Australian Open (Australia), hard¬ZEE÷MP4jLdJh¬ZB÷3473162¬ZC÷n5bYULYo¬ZD÷p...

Even after attempting to convert from 'utf-8' to 'ascii', the issue persists with different random characters.

What is the appropriate encoding solution for this situation? Or perhaps there is an alternative method I should consider? Currently, I am using R (rvest package) for web scraping to avoid manually opening each page in a browser. If necessary, I could switch to Python for a different approach.

Answer №1

As the comments have pointed out, this is not an encoding issue. The div's text content is not plain text but a custom delimited markup that the page's JavaScript parses to build the table.

To decode this, first split the text into records on the tilde (~), then split each record into fields on the "¬" character. Each field is a key-value pair whose key and value are separated by "÷".
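As a minimal sketch of that splitting, using only base R and the excerpt quoted in the question (cut off before the truncated `ZD` field), each record can be turned into a named character vector. The helper name `parse_record` is hypothetical, and the sketch assumes values do not themselves contain "÷":

```r
# Excerpt from the question; "\u00f7" is the division sign and "\u00ac" the not sign
raw <- "SA\u00f72\u00ac~ZA\u00f7ATP - SINGLES: Australian Open (Australia), hard\u00acZEE\u00f7MP4jLdJh\u00acZB\u00f73473162\u00acZC\u00f7n5bYULYo\u00ac"

# Split into records on "~", then each record into fields on the not sign
records <- strsplit(raw, "~", fixed = TRUE)[[1]]
fields  <- strsplit(records, "\u00ac", fixed = TRUE)

# Hypothetical helper: turn "key<division sign>value" fields into a named vector
parse_record <- function(f) {
  kv <- strsplit(f, "\u00f7", fixed = TRUE)
  values <- vapply(kv, function(p) if (length(p) > 1) p[2] else NA_character_,
                   character(1))
  names(values) <- vapply(kv, `[`, character(1), 1)
  values
}

parsed <- lapply(fields, parse_record)
str(parsed)
```

Keeping the keys, rather than discarding them, makes it possible to pick fields by name later instead of by position.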

Transforming this into a data frame is challenging because the data is non-rectangular: the records do not all carry the same set of fields. Converting it to JSON would be simpler.
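One possible way to do the JSON conversion, assuming the `jsonlite` package is installed, is to keep one named list per record and serialise the lot. The `parsed` list here is hand-written for illustration; the keys `AA` and `AD` and their values are hypothetical stand-ins for fields of a match record:

```r
library(jsonlite)

# Illustrative records with different keys, i.e. non-rectangular data
parsed <- list(
  list(ZA = "ATP - SINGLES: Australian Open (Australia), hard", ZB = "3473162"),
  list(AA = "abc12345", AD = "1580629500")  # hypothetical keys and values
)

# auto_unbox = TRUE writes length-one vectors as scalars rather than arrays
json <- toJSON(parsed, auto_unbox = TRUE, pretty = TRUE)
cat(json)
```

Each record becomes its own JSON object, so differing field sets need no padding or reshaping.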

Below is an example showcasing how to extract specific fields of interest:

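library(magrittr)  # provides the %>% pipe

"https://www.flashscore.com/tennis/atp-singles/australian-open-2020/results/" %>%
  xml2::read_html() %>%
  rvest::html_node("#tournament-page-data-results") %>%
  # split into records on "~", then each record into fields on "¬"
  rvest::html_text() %>% strsplit("[~]") %>% unlist() %>% strsplit("\u00ac") %>%
  # strip the key and "÷" from each field, keeping only the value
  lapply(function(x) gsub("^.*\u00f7", "", x)) %>%
  # convert 10-digit Unix timestamps to date-time strings
  lapply(function(x){
    y <- as.numeric(grep("\\d{10}", x, value = TRUE))
    y <- as.difftime(y, units = "secs") + as.POSIXct("1970-01-01 00:00:00")
    x[grep("\\d{10}", x)] <- as.character(y)
    return(x)}) %>%
  # drop the first two fields of every record and any 8-character ID values
  lapply(`[`, -(1:2)) %>%
  lapply(function(x) x[!grepl("^[[:alnum:]]{8}$", x)]) %>%
  # keep text values, minus those at positions 2, 4, 6 and 8
  lapply(function(x) grep("[a-z ]", x, value = TRUE)[-c(2,4,6,8)]) %>%
  # drop the first two records, which are headers rather than matches
  `[`(-(1:2)) %>%
  {do.call(rbind, .)} %>%
  as.data.frame(stringsAsFactors = FALSE) %>%
  `names<-`(c("Date", "Stage", "Player1", "Player2")) %>%
  tibble::as_tibble()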
#> # A tibble: 127 x 4
#>    Date                Stage          Player1           Player2          
#>    <chr>               <chr>          <chr>             <chr>            
#>  1 2020-02-02 07:45:00 Final          Djokovic N. (Srb) Thiem D. (Aut)   
#>  2 2020-01-31 07:45:00 Semi-finals    Thiem D. (Aut)    Zverev A. (Ger)  
#>  3 2020-01-30 07:45:00 Semi-finals    Federer R. (Sui)  Djokovic N. (Srb)
#>  4 2020-01-29 07:45:00 Quarter-finals Thiem D. (Aut)    Nadal R. (Esp)   
#>  5 2020-01-29 02:45:00 Quarter-finals Wawrinka S. (Sui) Zverev A. (Ger)  
#>  6 2020-01-28 07:50:00 Quarter-finals Raonic M. (Can)   Djokovic N. (Srb)
#>  7 2020-01-28 03:15:00 Quarter-finals Sandgren T. (Usa) Federer R. (Sui) 
#>  8 2020-01-27 08:05:00 1/8-finals     Rublev A. (Rus)   Zverev A. (Ger)  
#>  9 2020-01-27 07:15:00 1/8-finals     Nadal R. (Esp)    Kyrgios N. (Aus) 
#> 10 2020-01-27 03:15:00 1/8-finals     Medvedev D. (Rus) Wawrinka S. (Sui)
#> # ... with 117 more rows
