How can I generate a list of separate words from innerText using JavaScript?

How can I correctly parse strings like the following in JavaScript?

<strong>word</strong>: or <em>word</em> or <p><strong>word</strong>: this is a sentence</p> etc...

The goal is to turn each string into an array of words, with the HTML tags stripped out.
For instance:

<strong>word</strong>:

This specific string should be converted to an array like this:

['word', ':']

Similarly, for the string:

<p><strong>word</strong>: this is a sentence</p>

The expected array output would be:

['word', ':', 'this', 'is', 'a', 'sentence']      

My current attempt, below, returns individual characters instead of words:

// p = the text I want to parse; w is each space-separated piece of it
var p = document.querySelector("p").innerText;

var result = p.split(' ').map(function(w) {
  if (w === '')
    return w;
  else {
    var tempDivElement = document.createElement("div");
    tempDivElement.innerHTML = w;

    const wordArr = Array.from(tempDivElement.textContent);
    return wordArr;
  }
});
console.log(result)
<p><strong>word</strong>: this is a sentence</p>
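A minimal illustration, on a plain string rather than the DOM, of why the snippet above returns letters: Array.from() on a string produces one element per character, while splitting on whitespace produces one element per word.

```javascript
// Array.from() iterates a string character by character...
const chars = Array.from("word");
console.log(chars); // → ['w', 'o', 'r', 'd']

// ...whereas split() on a whitespace regex yields whole words.
// Punctuation stays attached here, so "word:" is still a single entry.
const pieces = "word: this is a sentence".split(/\s+/);
console.log(pieces); // → ['word:', 'this', 'is', 'a', 'sentence']
```

Getting the colon as its own entry, as in the expected output, needs a pattern that treats punctuation as a separate token.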

Answer №1

To begin, I would create a temporary div, set its innerHTML to the string, and read its inner text. Then, with match() and the regex /\w+|\S/g, \w+ matches runs of letters, digits and underscores, and the alternative \S matches any other single non-whitespace character. That way punctuation such as : becomes a separate entry, as in your desired output.

const p = '<strong>word</strong>: or <em>word</em> or <p><strong>word</strong>: this is a sentence</p>'

var tempDivElement = document.createElement("div");
tempDivElement.innerHTML = p;

let t = tempDivElement.innerText
let words = t.match(/\w+|\S/g)
console.log(words)
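The same regex can be wrapped in a small helper that takes any plain string, which also makes it easy to test outside the browser. One detail worth guarding: String.prototype.match() with the g flag returns null, not an empty array, when nothing matches (for example on an empty string). The name tokenize is a hypothetical choice for this sketch:

```javascript
// Hypothetical helper: split text into word tokens (\w+) and single
// non-space characters (\S), so punctuation such as ":" becomes its own entry.
function tokenize(text) {
  // match() returns null when there are no matches; fall back to an empty array
  return text.match(/\w+|\S/g) || [];
}

console.log(tokenize("word: this is a sentence"));
// → ['word', ':', 'this', 'is', 'a', 'sentence']
console.log(tokenize("")); // → []
```

In the browser you would call it as tokenize(tempDivElement.textContent). Because \S matches one character at a time, "etc..." comes out as 'etc', '.', '.', '.'.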

If you only want the words, without the punctuation, match \w+ alone:

const p = '<strong>word</strong>: or <em>word</em> or <p><strong>word</strong>: this is a sentence</p>'

var tempDivElement = document.createElement("div");
tempDivElement.innerHTML = p;

let t = tempDivElement.innerText
let words = t.match(/\w+/g)
console.log(words)
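One caveat with \w: it only matches the ASCII characters A-Z, a-z, 0-9 and _, so accented words are split apart ("café" becomes "caf"). If your text may contain non-English letters, one option (ES2018 and later) is a Unicode property escape with the u flag. This is a variation on the answer above, not part of it, and extractWords is a hypothetical helper name:

```javascript
// ASCII-only: \w does not match letters such as "é" or "ï".
const asciiWords = "café: naïve word".match(/\w+/g);
console.log(asciiWords); // → ['caf', 'na', 've', 'word']

// Hypothetical helper. \p{L} = any letter, \p{M} = combining marks
// (for decomposed accents), \p{N} = any number; requires the u flag.
function extractWords(text) {
  return text.match(/[\p{L}\p{M}\p{N}_]+/gu) || [];
}

console.log(extractWords("café: naïve word")); // → ['café', 'naïve', 'word']
```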

Answer №2

To tackle this, consider using the built-in DOMParser API:

let text = '<strong>word</strong>: or <em>word</em> or <p><strong>word</strong>: this is a sentence</p> etc...';
let documentObject = new DOMParser().parseFromString(text, 'text/html');

Next, recursively walk the resulting document (for example, starting from documentObject.body) and collect the text of its child nodes.
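A minimal sketch of that recursive walk. Assumptions: each text node is tokenized with the regex /\w+|\S/g (words plus single punctuation characters), the helper names collectTextNodes and wordsFromNode are hypothetical, and TEXT_NODE is the literal 3 (the value of Node.TEXT_NODE) so the functions rely only on nodeType, childNodes and textContent:

```javascript
// Mirrors Node.TEXT_NODE (3); a literal keeps the sketch runnable outside a browser.
const TEXT_NODE = 3;

// Hypothetical helper: depth-first walk collecting the trimmed text of
// every non-empty text node under `node`.
function collectTextNodes(node, out = []) {
  if (node.nodeType === TEXT_NODE) {
    const text = node.textContent.trim();
    if (text) out.push(text);
  } else {
    // childNodes is a NodeList in the browser; Array.from accepts it and plain arrays
    for (const child of Array.from(node.childNodes || [])) {
      collectTextNodes(child, out);
    }
  }
  return out;
}

// Hypothetical helper: tokenize each fragment into words and single punctuation characters.
function wordsFromNode(root) {
  return collectTextNodes(root).flatMap(text => text.match(/\w+|\S/g) || []);
}

// Browser usage, with documentObject created by DOMParser:
// console.log(wordsFromNode(documentObject.body));
```

Because each text node is tokenized on its own, a tag boundary always acts as a word boundary: <strong>word</strong>s yields 'word' and 's'. Whether that is what you want depends on your markup.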

Alternatively, you could use a client-side JavaScript web scraping library such as artoo.js to inspect the nodes.

If some of the text is not enclosed in an actual tag, such as ": or", it might be necessary to wrap the whole string in a <p> tag before proceeding.

Answer №3

Based on this helpful answer from , you can recursively iterate through every node and store its text pieces in an array. For example:

var items = [];
var elem = document.querySelector("div");
function getText(node) {
    // recurse into every child node
    if (node.hasChildNodes()) {
        node.childNodes.forEach(getText);
    } else if (node.nodeType === Node.TEXT_NODE) {
        const text = node.textContent.trim();
        if (text) {
            // split on runs of whitespace so newlines or repeated spaces
            // don't produce empty or multi-line entries
            items.push(...text.split(/\s+/));
        }
    }
}

getText(elem);
console.log(items);
<div><strong>word</strong>: or <em>word</em> or <p><strong>word</strong>: this is a sentence</p></div>

Answer №4

To achieve this, you can create a temporary HTML element and then extract its textContent.

Here is an example:

/* Extracting words separated by whitespace only */
function getWordsSeparatedBySpace(htmlString) {
  var div = document.createElement('div');
  div.innerHTML = htmlString;
  // trim first, then split on runs of whitespace, so no empty entries appear
  return div.textContent.trim().split(/\s+/);
}

/* Extracting words separated by whitespace or HTML tags */
function getWordsWithTagsAndSpaces(htmlString) {
  var div = document.createElement('div');
  div.innerHTML = htmlString;
  var children = div.querySelectorAll('*');
  for (var i = 0; i < children.length; i++) {
    // append a space as a new text node; assigning to textContent instead
    // would replace (and discard) the element's nested children
    children[i].appendChild(document.createTextNode(' '));
  }
  return div.textContent.trim().split(/\s+/);
}

console.log('Result of function 1:');
console.log(getWordsSeparatedBySpace("<strong>word</strong>: or <em>word</em> or <p><strong>word</strong>: this is a sentence</p>etc..."));
console.log('Result of function 2: ');
console.log(getWordsWithTagsAndSpaces("<strong>word</strong>: or <em>word</em> or <p><strong>word</strong>: this is a sentence</p>etc..."));

Answer №5

  1. For this snippet to work, the target HTML is wrapped in a <div>.
  2. Retrieve the text using .textContent.
  3. Normalize the text with .replace() and the regex /\s+/g, which replaces every run of whitespace (spaces, tabs or newlines, since \s already includes \n) with a single space. Then .trim() removes whitespace from both ends.
  4. Split the string at each space using .split(' ').

let text = document.querySelector('.content').textContent;
let clean = text.replace(/\s+/g, ' ').trim();
let array = clean.split(' ');
console.log(array);
<div class='content'>
  <strong>word</strong>: or <em>word</em> or
  <p><strong>word</strong>: this is a sentence</p> etc...
</div>
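Because \s already matches spaces, tabs and newlines, trimming and then splitting on /\s+/ is enough on its own. A DOM-free sketch, where splitOnWhitespace is a hypothetical helper name. Note that, as with the snippet above, punctuation stays attached, so the result contains 'word:' rather than 'word' and ':':

```javascript
// Hypothetical helper: collapse any run of whitespace and split on it.
function splitOnWhitespace(text) {
  const trimmed = text.trim();
  // "".split(/\s+/) returns [""], so handle blank input explicitly
  return trimmed === "" ? [] : trimmed.split(/\s+/);
}

// Mimics the indented, multi-line textContent of the .content div.
const sample = "\n  word: or word or\n  word: this is a sentence etc...\n";
console.log(splitOnWhitespace(sample));
// returns ['word:', 'or', 'word', 'or', 'word:', 'this', 'is', 'a', 'sentence', 'etc...']
```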

Answer №6

Handling the colon after "word" may look tricky, but with the textContent property and a little string manipulation you can build a string that splits cleanly into the array you want.

To begin, locate the element that needs to be parsed:

var p = document.querySelector("p");

Then read its text with the textContent property:

var pContent = p.textContent;

Then surround every run of "non-word" characters with spaces, so it is separated from the neighboring words but still kept in the result ($1 re-inserts the captured run):

var result = pContent.replace(/(\W+)/g, " $1 ");

Next, remove leading and trailing whitespace so the array does not start or end with an empty element:

result = result.trim();

Finally, split the modified string on runs of whitespace:

result = result.split(/\s+/);

You can also chain all of these steps into a single expression, as in the compact version below:

var element1 = document.querySelector("#element1");
var element2 = document.querySelector("#element2");
var element3 = document.querySelector("#element3");

function elementTextToArray(element) {
  return element.textContent.replace(/(\W+)/g, " $1 ").trim().split(/\s+/);
}

console.log(elementTextToArray(element1));
console.log(elementTextToArray(element2));
console.log(elementTextToArray(element3));
<p id="element1"><strong>word</strong></p>
<p id="element2"><strong>word</strong>: this is a sentence</p>
<p id="element3"><strong>word</strong>: this is a sentence <em>with multiple levels of <strong>depth</strong> in it!!!</em></p>


UPDATE #1: Broadened the "non-word" pattern to \W+, which matches every non-word character and keeps runs of them (such as "!!!") together as a single entry.
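A note on the replacement string in this approach: in String.prototype.replace(), $& inserts the whole match and $1 inserts the first capture group, but $0 is not a recognized pattern in JavaScript and is inserted literally. A quick check on a plain string:

```javascript
const sentence = "word: this is a sentence!!!";

// "$0" is not a replacement pattern in JavaScript, so it appears verbatim.
const literal = sentence.replace(/(\W+)/g, " $0 ");
console.log(literal); // → "word $0 this $0 is $0 a $0 sentence $0 "

// "$1" (or "$&") re-inserts each matched non-word run, padded with spaces.
const tokens = sentence.replace(/(\W+)/g, " $1 ").trim().split(/\s+/);
console.log(tokens); // → ['word', ':', 'this', 'is', 'a', 'sentence', '!!!']
```

Since \W+ is greedy, adjacent punctuation stays together as one entry, which is why "!!!" is not split into three.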
