Extract comments (generated by JavaScript) using RSelenium/XML

I am interested in extracting comments from online news sources.

For instance, take a look at this article: Story

I am running into an issue similar to the one discussed in this thread: Web data scraping (online news comments) with Scrapy (Python)

While I understand the challenges with R (thanks @yuvi for the detailed response), I have found it difficult to come up with a solution for my specific problem.

The code snippet below allows me to extract the first comment, but I am still struggling to retrieve the rest:

library(RSelenium)
checkForServer()
startServer()

remDr <- remoteDriver(remoteServerAddr = "localhost" 
                  , port = 4444
                  , browserName = "firefox"
)
remDr$open()

remDr$navigate("http://www.tagesanzeiger.ch/schweiz/standard/Auns-sagt-Ja-zur-EcopopInitiative/story/27047608")

comments <- remDr$findElement(using = 'xpath', "//*[(@class = 'message')]")
comments$getElementAttribute("outerHTML") #will return the first comment

Any help would be greatly appreciated.

Answer №1

The comments sit inside a div with id=allComments. You can select that element and retrieve its HTML content, then parse it to get every comment:

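library(XML)  # provides htmlParse() and xmlValue()

# Select the container that holds all comments and grab its HTML
comments <- remDr$findElement('css', "#allComments")
output <- comments$getElementAttribute("outerHTML")[[1]]

# Parse the HTML, then extract the text of every node with class 'message'
data <- htmlParse(output, encoding = "UTF-8")
> head(data["//*[(@class = 'message')]", fun = xmlValue])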
[[1]]
[1] "Das einzige Richtige!! Das Schweizer-Volk wurde schon zu oft angelogen. JA zu ECOPOP"

[[2]]
[1] "Der Wutbürger ist selten ein guter Ratgeber!!"
...
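The parsing step can be tried without a running Selenium session. The following is a minimal sketch, assuming the XML package is installed; the HTML fragment is a made-up stand-in for the real outerHTML of #allComments, and only the 'message' class name is taken from the page discussed above.

```r
library(XML)

# Hypothetical stand-in for the HTML returned by getElementAttribute("outerHTML")
html <- '<div id="allComments">
  <div class="comment"><p class="message">First comment</p></div>
  <div class="comment"><p class="message">Second comment</p></div>
</div>'

# asText = TRUE tells htmlParse() that the input is markup, not a file name
doc <- htmlParse(html, asText = TRUE, encoding = "UTF-8")

# One entry per matching node, simplified to a character vector
messages <- xpathSApply(doc, "//*[@class = 'message']", xmlValue)
print(messages)
```

xpathSApply() returns a plain character vector here, which is often easier to work with than the list produced by indexing the document with an XPath string.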

The comments are also available in JSON format:

library(RJSONIO)  # assumption: the answer does not name the package that supplies fromJSON()
myComments <- fromJSON("http://www.tagesanzeiger.ch/api/articles/27047608/comments")

> lapply(myComments$comments[1:6], '[[', 'message')
[[1]]
[1] "Warum ich gezwungen werde bei Ecopop JA zu stimmen? Weil der BR bis heute keine Anstalt macht, die angenommene Initiative zur Masseneinwanderung umzusetzen! Oder glaubt der BR in allem Ernst, dass er unserem Land eine jährliche Zuwanderung von 80'000 Menschen zumuten könne?"

[[2]]
[1] "Durch Ecopop kommen gewissenlose Anklagevertreter zur Macht, weil sie den Schweizern wohlklingende Versprechen machen. Doch sie lügen! Sie halten ihre Versprechungen nicht. Sie werden das nie tun! Scharfmacher befreien sich selbst, aber sie versklaven das Volk. Lasst uns nun dafür kämpfen, die Schweiz zu befreien – die nationalen Schranken niederzureißen – die Gier, den Hass und die Ausländerfeindlichkeit beiseite zu werfen. Lasst uns kämpfen für eine Schweiz der Vernunft – eine Schweiz, in der Einwanderung und Fortschritt zu unser aller Glück führen werden. Schweizer, im Namen der direkten Demokratie, lasst uns zusammen stehen! Ein Federzug von Schweizer Hand, und neu erschaffen wird die Erde. Blicke empor in das Licht der Hoffnung, liebe Schweiz! Blicke empor und stimme NEIN zu Ecopop!"

[[3]]
[1] "viele spekulieren hier, dass es ein ja oder ein nein am 30.11 gäbe, was momentan reines Kaffisatzlesen ist. Das hängt schlussendlich alles vom Mobilsierungspotential der Befürworter oder Gegner ab. Was sicher ist: wenn die Gegner wie bei der MEI den Einsatz verschlafen, in der Annahme, dass eh ein nein resultieren werde, könnte es ein böses Erwachen am 30.11 geben."
...
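The same extraction idiom can be checked offline on a small JSON string. This is only a sketch: the answer does not say which package supplies fromJSON(), so jsonlite is used here as an assumption, with simplifyVector = FALSE so that the result stays a nested list. The "id" field and the sample messages are invented for illustration; only the "comments" and "message" names come from the output shown above.

```r
library(jsonlite)

# Hypothetical payload shaped like the API response: a "comments" array of objects
json <- '{"comments": [
  {"id": 1, "message": "First comment"},
  {"id": 2, "message": "Second comment"}
]}'

# Keep nested lists instead of letting jsonlite build a data frame
parsed <- fromJSON(json, simplifyVector = FALSE)

# Pull the "message" field out of every comment
messages <- vapply(parsed$comments, function(x) x$message, character(1))
print(messages)
```

vapply() is used instead of lapply() so the result is a character vector and a comment without a message fails loudly rather than silently producing a NULL entry.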
