What is the process for utilizing R to extract data from a pie chart on a website that uses JavaScript?

Question

Here's my code:

library(XML)

my_URL <- "http://www.velocitysharesetns.com/viix"

tables <- readHTMLTable(my_URL)

https://i.sstatic.net/Vadsw.png

The code above fetches only the table at the top of the page. The pie chart seems to be skipped because it is rendered with JavaScript. Is there an easy way to extract the two percentage values from the chart?

I've tried RSelenium, but I'm running into errors that I haven't been able to resolve:

> RSelenium::startServer()
Error in if (file.exists(file) == FALSE) if (!missing(asText) && asText ==  : 
  argument is of length zero
In addition: Warning messages:
1: startServer is deprecated.
Users in future can find the function in file.path(find.package("RSelenium"), "example/serverUtils").
The sourcing/starting of a Selenium Server is a users responsiblity. 
Options include manually starting a server see vignette("RSelenium-basics", package = "RSelenium")
and running a docker container see  vignette("RSelenium-docker", package = "RSelenium") 
2: running command '"java" -jar "\\med-fs01/Home/Alex.Badoi/R/win-library/3.3/RSelenium/bin/selenium-server-standalone.jar" -log "\\med-fs01/Home/Alex.Badoi/R/win-library/3.3/RSelenium/bin/sellog.txt"' had status 127 
3: running command '"wmic" path win32_process get Caption,Processid,Commandline /format:htable' had status 44210 
>

Following Phillip's advice, here is my solution:

library(XML)

# extract HTML
doc.html <- htmlTreeParse('http://www.velocitysharesetns.com/viix',
                          useInternal = TRUE)

# convert to text
htmltxt <- paste(capture.output(doc.html, file=NULL), collapse="\n")

# get location of the first data label
pos <- regexpr('CBOE SHORT-TERM VIX FUTURE', htmltxt)

# extract 99 characters starting at "pos", which covers both data rows
keep <- substr(htmltxt, pos, pos+98)

Output:

> keep
[1] "CBOE SHORT-TERM VIX FUTURE DEC 2016', 81.64],\n\n    ['CBOE SHORT-TERM VIX FUTURE JAN 2017', 18.36],\n"

javascript r web-scraping

Answer 1

RSelenium Method

After inspecting the page source, I got this approach working with RSelenium on Windows 7, using chromedriver.exe as the driver.

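library(RSelenium)

# Note: checkForServer() and startServer() are deprecated in newer RSelenium
# releases (see the warning in the question); there the Selenium server has
# to be started separately.
checkForServer(update = TRUE)

#### Chromedriver Implementation
startServer(args = c("-Dwebdriver.chrome.driver=C:/Stuff/Scripts/chromedriver.exe"))

remDr <- remoteDriver(remoteServerAddr = "localhost", browserName="chrome", port=4444)

### Launch Chrome
remDr$open()

remDr$navigate("http://www.velocitysharesetns.com/viix")

# collect the pie chart's label elements
b <- remDr$findElements(using="class name", value="jqplot-pie-series")

# return the rendered HTML of each label
sapply(b, function(x){x$getElementAttribute("outerHTML")})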

The last command returns:

[[1]]
[1] "<div class=\"jqplot-pie-series jqplot-data-label\" style=\"position: absolute; left: 100px; top: 106px;\"><div style=\"color:white;font-weight:bold;\">82%</div></div>"

[[2]]
[1] "<div class=\"jqplot-pie-series jqplot-data-label\" style=\"position: absolute; left: 159px; top: 67px;\"><div style=\"color:white;font-weight:bold;\">18%</div></div>"

The percentages (82% and 18%) sit in the inner div of each element and can be extracted from there.
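As a minimal sketch of that last extraction step, the snippet below pulls the numbers out of the two strings shown above with a regular expression. The strings are pasted in as sample data so the example runs without a browser session, and `extract_pct` is a hypothetical helper name, not part of RSelenium.

```r
# Sample strings copied from the output above; in a live session they would
# come from sapply(b, function(x) x$getElementAttribute("outerHTML")).
html_out <- list(
  "<div class=\"jqplot-pie-series jqplot-data-label\" style=\"position: absolute; left: 100px; top: 106px;\"><div style=\"color:white;font-weight:bold;\">82%</div></div>",
  "<div class=\"jqplot-pie-series jqplot-data-label\" style=\"position: absolute; left: 159px; top: 67px;\"><div style=\"color:white;font-weight:bold;\">18%</div></div>"
)

# Hypothetical helper: grab the number that sits directly before "%</div>".
# The lookahead keeps pixel values such as "100px" from matching.
extract_pct <- function(x) {
  m <- regmatches(x, regexpr("[0-9.]+(?=%</div>)", x, perl = TRUE))
  as.numeric(m)
}

pct <- extract_pct(unlist(html_out))
print(pct)
```

Note that these are the rounded values the chart displays, not the underlying figures.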

Direct HTML Retrieval

You can also skip the browser entirely, because the data is already present in the raw HTML source. Look for the following script:

<script type="text/javascript" language="javascript">
$(document).ready(function(){
var data = [


['CBOE SHORT-TERM VIX FUTURE DEC 2016', 81.64],

['CBOE SHORT-TERM VIX FUTURE JAN 2017', 18.36],

];

This script holds the unrounded values (81.64 and 18.36); the chart rounds them before displaying them.
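One possible way to turn that script text into usable values is sketched below. It assumes the page source has already been read into a single string (for example with `readLines()` and `paste()`); here the relevant fragment is pasted in as sample data, and the names `script_txt`, `pattern`, and `slices` are illustrative only.

```r
# Sample text copied from the script shown above; on the live page it would
# be part of the downloaded HTML source.
script_txt <- paste(
  "var data = [",
  "",
  "['CBOE SHORT-TERM VIX FUTURE DEC 2016', 81.64],",
  "",
  "['CBOE SHORT-TERM VIX FUTURE JAN 2017', 18.36],",
  "",
  "];",
  sep = "\n"
)

# Match every ['label', number] pair and capture both parts.
pattern <- "\\['([^']+)',\\s*([0-9.]+)\\]"
hits <- regmatches(script_txt, gregexpr(pattern, script_txt, perl = TRUE))[[1]]

# One row per pie slice: the label and its unrounded percentage.
slices <- data.frame(
  label = sub(pattern, "\\1", hits, perl = TRUE),
  value = as.numeric(sub(pattern, "\\2", hits, perl = TRUE)),
  stringsAsFactors = FALSE
)
print(slices)
```

Matching on the bracketed pair rather than on a fixed character offset means the extraction should keep working when the contract names or the number of slices change, as long as the page keeps the same `['label', value]` layout.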
