Transform text with HTML tags into sentences while preserving the separators in JavaScript

Here is a single string with some embedded HTML:

This is the first sentence. In the second sentence, there is a <a href="http://google.com">Google</a> link! The third sentence may have an image like <img src="http://link.to.image.com/hello.png" /> and ends with a question mark. The last sentence is bolded <b>like this</b>??

I want to split this string into sentences while keeping the HTML intact. Here is the desired output:

[0] = This is the first sentence.
[1] = In the second sentence, there is a <a href="http://google.com">Google</a> link!
[2] = The third sentence may have an image like <img src="http://link.to.image.com/hello.png" /> and ends with a question mark.
[3] = The last sentence is bolded <b>like this</b>??

Any suggestions on how to achieve this? Maybe using Regex and match?

I found a similar solution that almost fits my needs, but it doesn't handle the HTML tags: JavaScript Split Regular Expression keep the delimiter

Answer №1

Parsing is the easy part: assign the string to the innerHTML of a wrapper element and let the browser build the nodes. Splitting the result into sentences is harder. Here is a first attempt:

var s = 'First sentence. Here is a <a href="http://google.com">Google.</a> link in the second sentence! The third sentence might contain an image like this <img src="http://link.to.image.com/hello.png" /> and ends with !? The last sentence looks like <b>this</b>??';

var wrapper = document.createElement('div');
wrapper.innerHTML = s;

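var sentences = [],
    buffer = [],
    re = /[^.!?]+[.!?]+/g,
    match; // declared here so it does not leak as an implicit global

[].forEach.call(wrapper.childNodes, function(node) {
  if (node.nodeType == 1) {
    buffer.push(node.outerHTML); // element node: keep its HTML for the current sentence
  } else if (node.nodeType == 3) {
    var str = node.textContent; // text node: may complete one or more sentences
    while ((match = re.exec(str)) !== null) {
      sentences.push(buffer.join('') + match[0]); // buffered fragments + matched text
      buffer = [];
      // drop the consumed text and any whitespace that follows it
      str = str.slice(re.lastIndex).replace(/^\s+/, '');
      re.lastIndex = 0; // reset regexp for the shortened string
    }
    buffer.push(str); // leftover fragment of an unfinished sentence
  }
});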

if (buffer.length) {
  sentences.push(buffer.join(''));
}

console.log(sentences);


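Element nodes and incomplete sentence fragments accumulate in a buffer until a text node supplies sentence-ending punctuation; the buffered content plus the matched text is then pushed to the result array. Only the wrapper's direct children are examined, so punctuation inside an element (such as the period in "Google.") does not end a sentence.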
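The same buffering idea can also be tried without a DOM, for example in Node.js. The following is only a sketch, not the answer's method: splitHtmlSentences is a hypothetical helper name, it tokenizes the string with a regular expression instead of parsing it, and it assumes well-formed markup with no comments, no ">" inside attribute values, and no stray "<" in the text.

```javascript
// Hypothetical helper (not part of the original answer).
// Assumes well-formed markup, no comments, no '>' inside attribute values.
function splitHtmlSentences(html) {
  var tokens = html.match(/<[^>]+>|[^<]+/g) || []; // alternating tags and text runs
  var voidTag = /^<(?:img|br|hr|input|wbr)\b/i;    // elements that never get a closing tag
  var sentences = [];
  var buffer = '';
  var depth = 0; // greater than 0 while inside an element such as <a>...</a>

  tokens.forEach(function (token) {
    if (token.charAt(0) === '<') {
      if (token.charAt(1) === '/') {
        depth = Math.max(0, depth - 1); // closing tag
      } else if (!voidTag.test(token) && !/\/\s*>$/.test(token)) {
        depth += 1; // opening tag that expects a closing tag
      }
      buffer += token;
      return;
    }
    if (depth > 0) {
      buffer += token; // text inside an element stays attached to it
      return;
    }
    var rest = token;
    var match;
    // Peel off everything up to and including each run of ., ! or ?
    while ((match = /^[^.!?]*[.!?]+/.exec(rest)) !== null) {
      sentences.push((buffer + match[0]).trim());
      buffer = '';
      rest = rest.slice(match[0].length);
    }
    buffer += rest; // unfinished fragment waits for later tokens
  });

  if (buffer.trim()) {
    sentences.push(buffer.trim()); // trailing text without end punctuation
  }
  return sentences;
}

var input = 'This is the first sentence. In the second sentence, there is a <a href="http://google.com">Google</a> link! The third sentence may have an image like <img src="http://link.to.image.com/hello.png" /> and ends with a question mark. The last sentence is bolded <b>like this</b>??';
console.log(splitHtmlSentences(input));
```

Because nothing is reserialized, tags come back exactly as written (the img keeps its " />"), whereas outerHTML in the DOM version may normalize them. As in the DOM version, punctuation inside an element does not end a sentence.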
