Here's my code:
library(XML)
my_URL <- "http://www.velocitysharesetns.com/viix"
tables <- readHTMLTable(my_URL)
https://i.sstatic.net/Vadsw.png
The code above retrieves only the table at the top of the page. The pie chart appears to be skipped because it is rendered with JavaScript. Is there an easy way to extract the two percentage values from the chart?
I've tried RSelenium, but I'm running into errors that I haven't been able to resolve:
> RSelenium::startServer()
Error in if (file.exists(file) == FALSE) if (!missing(asText) && asText == :
argument is of length zero
In addition: Warning messages:
1: startServer is deprecated.
Users in future can find the function in file.path(find.package("RSelenium"), "example/serverUtils").
The sourcing/starting of a Selenium Server is a users responsiblity.
Options include manually starting a server see vignette("RSelenium-basics", package = "RSelenium")
and running a docker container see vignette("RSelenium-docker", package = "RSelenium")
2: running command '"java" -jar "\\med-fs01/Home/Alex.Badoi/R/win-library/3.3/RSelenium/bin/selenium-server-standalone.jar" -log "\\med-fs01/Home/Alex.Badoi/R/win-library/3.3/RSelenium/bin/sellog.txt"' had status 127
3: running command '"wmic" path win32_process get Caption,Processid,Commandline /format:htable' had status 44210
>
Following Phillip's advice, here is my solution:
library(XML)
# extract HTML
doc.html <- htmlTreeParse('http://www.velocitysharesetns.com/viix',
                          useInternalNodes = TRUE)
# convert the parsed document to a single text string
htmltxt <- paste(capture.output(doc.html, file=NULL), collapse="\n")
# get the position of the first occurrence of the label
pos <- regexpr('CBOE SHORT-TERM VIX FUTURE', htmltxt)
# extract 99 characters starting at "pos", enough to cover both chart entries
keep <- substr(htmltxt, pos, pos+98)
Output:
> keep
[1] "CBOE SHORT-TERM VIX FUTURE DEC 2016', 81.64],\n\n ['CBOE SHORT-TERM VIX FUTURE JAN 2017', 18.36],\n"
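To get from that string to the two percentages as numbers, something like the following might work. It is only a sketch: it assumes the chart data keeps the `['LABEL', value],` layout shown in the output above, and the names `pattern`, `matches` and `weights` are my own. Here `keep` is hard-coded from the output so the snippet runs without fetching the page.

```r
# Sample text copied from the output above; in practice this would be the
# `keep` string produced by the scraping code.
keep <- "CBOE SHORT-TERM VIX FUTURE DEC 2016', 81.64],\n\n ['CBOE SHORT-TERM VIX FUTURE JAN 2017', 18.36],\n"

# Assumed layout: the contract label, a closing quote and comma, then the
# numeric value followed by a closing bracket.
pattern <- "(CBOE SHORT-TERM VIX FUTURE [A-Z]{3} [0-9]{4})',\\s*([0-9.]+)\\]"

# Pull out every "label', value]" occurrence in the string.
matches <- regmatches(keep, gregexpr(pattern, keep, perl = TRUE))[[1]]

# Split each match into its label and its numeric value.
weights <- data.frame(
  contract = sub(pattern, "\\1", matches, perl = TRUE),
  percent  = as.numeric(sub(pattern, "\\2", matches, perl = TRUE)),
  stringsAsFactors = FALSE
)
print(weights)
```

Matching on the pattern rather than on a fixed character offset should keep this working if the contract months or the length of the numbers change, although it will still break if the site changes how the chart data is written.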