Transform the content of a textNode into a string

Question

I'm struggling with a text node that I cannot convert into a string. I'm trying to scrape specific information from a website, and when I use an XPath expression to locate the text, all I get back is a text node. Inspecting the node in Chrome DevTools shows that it does contain the text I'm after. How do I turn this text node into plain text?

Below is the code line being used:

abstracts = ZU.xpath(doc, '//*[@id="abstract"]/div/div/par/text()');

I have tried .innerHTML, .toString(), and .textContent, but none of them has worked so far.

javascript xpath textnode

Answer 1

When I need the string content of a text node, I typically use Text.wholeText. Calling toString() on a text node only yields a generic label such as "[object Text]", and innerHTML is defined on elements, not on text nodes.

The following is adapted from the MDN documentation: https://developer.mozilla.org/en-US/docs/Web/API/Text/wholeText

The read-only property Text.wholeText returns the full text of all Text nodes logically adjacent to the node, concatenated in document order. This allows you to specify any text node and get all nearby text as one string.

Syntax

str = textnode.wholeText;

Notes and example: Imagine you have a simple paragraph in your webpage stored in a variable called para:

<p>Thru-hiking is great! <strong>No boring election coverage!</strong>
However, <a href="http://en.wikipedia.org/wiki/Absentee_ballot">casting a
ballot</a> is tricky.</p>

If you decide to remove the middle sentence, you can do so like this:

para.removeChild(para.childNodes[1]);

Later, if you want to change the wording to "Thru-hiking is great, but casting a ballot is tricky.", while keeping the hyperlink, you could try:

para.firstChild.data = "Thru-hiking is great, but ";

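But be careful: removing the <strong> element left two adjacent text nodes, "Thru-hiking is great! " and "However, ". Assigning to para.firstChild.data changes only the first of them, so the paragraph would read "Thru-hiking is great, but However, casting a ballot is tricky." Using wholeText lets you treat the adjacent nodes as a single unit. For instance, before the assignment: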

assert(para.firstChild.wholeText == "Thru-hiking is great! However, ");
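To make the "adjacent text nodes" idea concrete, here is a minimal sketch that emulates what wholeText computes. It runs without a browser by using plain stand-in objects instead of real DOM nodes; the helper names wholeTextOf and linkSiblings are hypothetical and exist only for this illustration.

```javascript
// Numeric value of Node.TEXT_NODE in the DOM.
const TEXT_NODE = 3;

// Hypothetical helper: emulate wholeText by walking the sibling chain.
function wholeTextOf(node) {
  // Step left to the first text node of the contiguous run.
  let start = node;
  while (start.previousSibling && start.previousSibling.nodeType === TEXT_NODE) {
    start = start.previousSibling;
  }
  // Concatenate data left to right until a non-text sibling (or the end) is hit.
  let result = "";
  for (let cur = start; cur && cur.nodeType === TEXT_NODE; cur = cur.nextSibling) {
    result += cur.data;
  }
  return result;
}

// Hypothetical helper: wire up previousSibling/nextSibling on stand-in objects.
function linkSiblings(nodes) {
  nodes.forEach((n, i) => {
    n.previousSibling = nodes[i - 1] || null;
    n.nextSibling = nodes[i + 1] || null;
  });
  return nodes;
}

// Stand-ins for the paragraph's children after the <strong> element was removed:
// two adjacent text nodes, then the <a> element (nodeType 1), then a final text node.
const [first, second, link, tail] = linkSiblings([
  { nodeType: TEXT_NODE, data: "Thru-hiking is great! " },
  { nodeType: TEXT_NODE, data: "However, " },
  { nodeType: 1 },
  { nodeType: TEXT_NODE, data: " is tricky." },
]);

console.log(wholeTextOf(first));  // prints "Thru-hiking is great! However, "
console.log(wholeTextOf(second)); // same run, so the same string
console.log(wholeTextOf(tail));   // prints " is tricky."
```

In a real page you would simply read node.wholeText; the sketch only shows why the element in the middle stops the concatenation.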

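The property wholeText combines the data of adjacent text nodes that are not separated by elements. The original DOM Level 3 API also offered replaceWholeText(), which replaces the entire run of adjacent text with new text; it has since been dropped from the DOM standard, so check browser support before relying on it: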

para.firstChild.replaceWholeText("Thru-hiking is great, but ");

In some cases, Node.textContent or Element.innerHTML may be more appropriate than wholeText. However, when dealing with mixed content within an element, wholeText and replaceWholeText() can be useful tools.

For more information: https://developer.mozilla.org/en-US/docs/Web/API/Text/wholeText

Answer 2

In my case, reading the nodeValue property worked well. Note that nodeValue is a property, not a method, so it is read without parentheses. For example, if your text node is stored in a variable named abstracts, you can get its string value with the following line of code:

text = abstracts.nodeValue;
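One possible reason the properties tried in the question come back undefined is that the XPath helper returns a list of nodes rather than a single node. That is an assumption; the question does not confirm what ZU.xpath returns. Under that assumption, a small helper can accept either a single node or a list and return plain strings. The name nodesToStrings and the stub data below are hypothetical, and plain objects stand in for DOM nodes so the sketch runs outside a browser.

```javascript
// Numeric value of Node.TEXT_NODE in the DOM.
const TEXT_NODE = 3;

// Hypothetical helper: turn an XPath result (one node or a list of nodes)
// into an array of trimmed, non-empty strings.
function nodesToStrings(result) {
  if (result == null) {
    return [];
  }
  // A single node has a numeric nodeType; anything else is treated as a list.
  // (Checking .length would be misleading, because text nodes also have a length.)
  const nodes = typeof result.nodeType === "number" ? [result] : Array.from(result);
  return nodes
    // Text nodes expose their string via nodeValue; elements via textContent.
    .map((node) => (node.nodeType === TEXT_NODE ? node.nodeValue : node.textContent))
    .map((value) => (value || "").trim())
    // Drop whitespace-only nodes, which are common between elements.
    .filter((value) => value.length > 0);
}

// Stand-in for what the XPath query in the question might return (assumed shape).
const abstractsStub = [
  { nodeType: TEXT_NODE, nodeValue: "  First sentence of the abstract. " },
  { nodeType: TEXT_NODE, nodeValue: "\n   " },
  { nodeType: TEXT_NODE, nodeValue: "Second sentence." },
];

console.log(nodesToStrings(abstractsStub).join(" "));
// prints "First sentence of the abstract. Second sentence."
```

With the variable from the question, the call would look like nodesToStrings(abstracts).join(" "), again assuming abstracts holds a list of text nodes.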
