Tips on adding SSML breaks after periods that are outside quotes and not followed by a tag in HTML text, using regular expressions

I'm looking for a regex that matches every period that is not enclosed in double quotes and not followed by a '<'.

This is for converting text into SSML (Speech Synthesis Markup Language): the regex will be used to automatically insert <break time="200ms"/> after each such period.

I've managed to devise a pattern that detects periods outside of quotes:

/\.(?=(?:[^"]|"[^"]*")*$)/g

On the following input, the regex above produces these matches (^ marks a match):

This. is.a.<break time="0.5s"/> test sentence.
    ^   ^ ^                                  ^
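To reproduce these matches in JavaScript, the offsets can be collected with `String.prototype.matchAll`. The helper name `periodOffsets` is hypothetical and only used for this illustration.

```javascript
// The pattern from the question: a period followed by a balanced sequence of
// double-quoted segments up to the end of the input (i.e. the period is outside quotes).
const outsideQuotes = /\.(?=(?:[^"]|"[^"]*")*$)/g;

// Hypothetical helper: returns the character offset of every match.
function periodOffsets(regex, input) {
  // matchAll requires the 'g' flag and works on an internal copy of the regex
  return Array.from(input.matchAll(regex), (m) => m.index);
}

const sample = 'This. is.a.<break time="0.5s"/> test sentence.';
console.log(periodOffsets(outsideQuotes, sample)); // [ 4, 8, 10, 45 ]
```

Offset 10 is the third period, the one directly before `<break`.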

However, I need a regex that does not match the third period, the one directly followed by '<'. The expected matches are:

This. is.a.<break time="0.5s"/> test sentence.
    ^   ^                                    ^

If anyone can offer some guidance, it would be greatly appreciated!
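A smaller change, sketched here as one possible approach and not taken from the answer below, is to keep the quote-parity lookahead from the question and put a negative lookahead `(?!\s*<)` in front of it. Allowing whitespace before the `<` is an assumption; use `(?!<)` if only an immediately following `<` should be excluded.

```javascript
// A period that is (1) not followed by optional whitespace and '<', and
// (2) followed by a balanced number of double quotes up to the end of the input.
const breakablePeriod = /\.(?!\s*<)(?=(?:[^"]|"[^"]*")*$)/g;

const sample = 'This. is.a.<break time="0.5s"/> test sentence.';
const offsets = Array.from(sample.matchAll(breakablePeriod), (m) => m.index);
console.log(offsets); // [ 4, 8, 45 ]

// Insert the SSML pause after each remaining period.
const ssml = sample.replace(breakablePeriod, '.<break time="200ms"/>');
console.log(ssml);
```

Because the lookahead rescans to the end of the input for every period, the cost grows quadratically with input length in the worst case, which is usually acceptable for sentence-sized text.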

Answer №1

Capturing groups help here.

The idea is to consume double-quoted strings as whole units, so that dots inside them are never matched on their own, and to capture each qualifying dot in a separate group:

/((?:[^"\.]|(?:"(?:\\\\|\\"|[^"])*"))*)(\.(?!\s*<))((?:[^"\.]|(?:"(?:\\\\|\\"|[^"])*"))*)/g

The expression [^"\.] denotes any character that is not a dot or double quote.

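The part "(?:\\\\|\\"|[^"])*" matches a double-quoted string, which may contain escaped backslashes (\\), escaped double quotes (\") or dots.

Therefore, (?:[^"\.]|"(?:\\\\|\\"|[^"])*")* consumes everything up to the next dot that lies outside a string; a dot inside a string is consumed together with the rest of that string and is never matched on its own. The middle group (\.(?!\s*<)) then matches the dot itself, but only if it is not followed by optional whitespace and a '<'.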
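To see the string sub-pattern in isolation, the sketch below anchors it at the start of a made-up input that begins with a quoted value containing an escaped quote and a dot, and compares it with a naive `"[^"]*"` that ignores escapes.

```javascript
// The answer's string sub-pattern, anchored at the start of the input.
const escapedString = /^"(?:\\\\|\\"|[^"])*"/;
// A naive string pattern with no notion of backslash escapes.
const naiveString = /^"[^"]*"/;

// String.raw keeps the backslash, so the input really contains \"
const input = String.raw`"0\".5s"/> test`;

console.log(escapedString.exec(input)[0]); // "0\".5s"
console.log(naiveString.exec(input)[0]);   // "0\"
```

The naive version stops at the escaped quote, which is why the answer spells out the escape alternatives.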

Running the pattern on this test string, which contains escaped quotes and a dot inside a quoted value:

"Thi\\\"s." is..a.<break time="0\".5s"/> test sentence.

The following matches will be generated:

Match 1

  • Full match, from character 0 to 15: "Thi\\\"s." is.
  • Group 2, from character 14 to 15: .

Match 2

  • Full match, from character 15 to 17: .a
  • Group 2, from character 15 to 16: .

Match 3

  • Full match, from character 18 to 55:
    <break time="0\".5s"/> test sentence.
  • Group 2, from character 54 to 55: .

You can verify this with a tool such as Regex101.

Because of how the expression is structured, the dot is always captured in the second group, while the first group holds everything before it. The index of the dot is therefore match.index + match[1].length (in JavaScript, match[1] is always present, although it may be an empty string).
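A quick way to check the match ranges and the `match.index + match[1].length` calculation without Regex101 is to iterate with `String.prototype.matchAll`. This sketch reuses the answer's pattern and test string unchanged; `String.raw` is used only so the backslashes can be written once.

```javascript
// The answer's pattern: group 1 = text before the dot, group 2 = the dot,
// group 3 = text after it, up to (not including) the next dot outside a string.
const regexp = /((?:[^"\.]|(?:"(?:\\\\|\\"|[^"])*"))*)(\.(?!\s*<))((?:[^"\.]|(?:"(?:\\\\|\\"|[^"])*"))*)/g;

const testString = String.raw`"Thi\\\"s." is..a.<break time="0\".5s"/> test sentence.`;

for (const match of testString.matchAll(regexp)) {
  const start = match.index;
  const end = start + match[0].length;
  // Dot offset = start of the match + length of the text captured before it
  const dot = start + match[1].length;
  console.log(`full match ${start}-${end}, dot at ${dot}`);
}
// full match 0-15, dot at 14
// full match 15-17, dot at 15
// full match 18-55, dot at 54
```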

Note: the expression also handles backslash-escaped double quotes inside strings, so an escaped \" does not end a string early.

Here is a concise, working implementation:

// The 'g' flag is required so that exec() keeps advancing through the input
const regexp = /((?:[^"\.]|(?:"(?:\\\\|\\"|[^"])*"))*)(\.(?!\s*<))((?:[^"\.]|(?:"(?:\\\\|\\"|[^"])*"))*)/g;

function getMatchingPointsExcludingChevronAndStrings(input) {
  let match;
  const result = [];

  // Reset lastIndex because the same global regex object is reused across calls
  regexp.lastIndex = 0;

  while ((match = regexp.exec(input)) !== null) {
    // Index of the dot = start of the match + length of group 1 (the text before the dot)
    result.push(match.index + (match[1] ? match[1].length : 0));
  }

  // Indices of every '.' that is outside a string and not followed by '<'
  return result;
}

// Backslashes must be doubled inside the template literal; console.log shows the resulting string
const testString = `"Thi\\\\\\"s." is..a.<break time="0\\".5s"/> test sentence.`;
console.log(testString);

// Expected output: [ 14, 15, 54 ]
console.log(
  getMatchingPointsExcludingChevronAndStrings(testString)
);
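One caveat worth checking, offered as a hedged observation rather than part of the original answer: with only the `g` flag, after an attempt fails the engine simply retries from the next character, which can be a character inside a quoted value. For input such as `Hi.<a href="x.html">link</a>` (no qualifying period after the quoted one), the pattern above reports the dot inside `"x.html"`. One possible fix, sketched below, lets the prefix group also swallow periods that are followed by `<` and switches to the sticky `y` flag, so every match must start exactly where the previous one ended. The helper name `getBreakablePeriodOffsets` is hypothetical, and the string part uses `\\[\s\S]` (a backslash plus any character) as a slight variation on the answer's `\\\\|\\"`.

```javascript
// Sticky ('y') variant: every match must begin exactly at lastIndex, so the
// scan can never restart in the middle of a quoted string.
// Group 1 (prefix) consumes, in any order:
//   - any character other than '"' or '.'
//   - a '.' that IS followed by optional whitespace and '<' (skipped)
//   - a complete double-quoted string, allowing backslash escapes
// Group 2 is the first period that is neither quoted nor followed by '<'.
const stickyRegexp = /((?:[^".]|\.(?=\s*<)|"(?:\\[\s\S]|[^"\\])*")*)(\.(?!\s*<))/y;

// Hypothetical helper: returns the offsets of the qualifying periods.
function getBreakablePeriodOffsets(input) {
  const offsets = [];
  let match;
  stickyRegexp.lastIndex = 0;
  // exec() returns null (and resets lastIndex) as soon as no match starts
  // exactly at lastIndex, which ends the loop.
  while ((match = stickyRegexp.exec(input)) !== null) {
    offsets.push(match.index + match[1].length);
  }
  return offsets;
}

// The answer's global pattern, for comparison.
const original = /((?:[^"\.]|(?:"(?:\\\\|\\"|[^"])*"))*)(\.(?!\s*<))((?:[^"\.]|(?:"(?:\\\\|\\"|[^"])*"))*)/g;
const tricky = 'Hi.<a href="x.html">link</a>';

console.log(Array.from(tricky.matchAll(original), (m) => m.index + m[1].length)); // [ 13 ]
console.log(getBreakablePeriodOffsets(tricky)); // []
console.log(
  getBreakablePeriodOffsets(String.raw`"Thi\\\"s." is..a.<break time="0\".5s"/> test sentence.`)
); // [ 14, 15, 54 ]
```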

Edit:

The asker wants to insert pause markup after the periods in raw HTML content.

Here is a complete working solution:

// To collect all matches, include 'g' flag
const regexp = /((?:[^"\.]|(?:"(?:\\\\|\\"|[^"])*"))*)(\.(?!\s*<))((?:[^"\.]|(?:"(?:\\\\|\\"|[^"])*"))*)/g;

function addPausesAfterPeriods(input) {
    let match;
    const dotOffsets = [];

    // Reset lastIndex because the same global regex object is reused across calls
    regexp.lastIndex = 0;

    // First collect the offsets of all qualifying periods
    while ((match = regexp.exec(input)) !== null) {
        // Offset of the dot = match index + length of first group if applicable
        dotOffsets.push(match.index + (match[1] ? match[1].length : 0));
    }

    // If no periods found, return input untouched
    if (dotOffsets.length === 0) {
        return input;
    }

    // Reconstruct the string with added breaks following each period
    const restructuredContent = dotOffsets.reduce(
        (result, offset, index) => {
            // A segment runs from just after the previous period (or the start of the input) up to and including the current period
            const segment = input.substring(
              index === 0 ? 0 : dotOffsets[index - 1] + 1,
              offset + 1
            );
            return `${result}${segment}<break time="200ms"/>`;
        },
        ''
    );

    // Add remaining portion from last period till end of string
    const remainder = input.substring(dotOffsets[dotOffsets.length - 1] + 1);
    return `${restructuredContent}${remainder}`;
}

const testString = `
<p>
    This is a sample from Wikipedia.
    It is used as an example for this snippet.
</p>
<p>
    <b>Hypertext Markup Language</b> (<b>HTML</b>) is the standard
    <a href="/wiki/Markup_language.html" title="Markup language">
        markup language
    </a> for documents designed to be displayed in a
    <a href="/wiki/Web_browser.html" title="Web browser">
        web browser
    </a>.
    It can be assisted by technologies such as
    <a href="/wiki/Cascading_Style_Sheets" title="Cascading Style Sheets">
        Cascading Style Sheets
    </a>
    (CSS) and
    <a href="/wiki/Scripting_language.html" title="Scripting language">
        scripting languages
    </a>
    such as
    <a href="/wiki/JavaScript.html" title="JavaScript">JavaScript</a>.
</p>
`;


console.log(`Initial raw html:\n${testString}\n`);

console.log(`Result (added 2 pauses):\n${addPausesAfterPeriods(testString)}\n`);
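If only the rewritten text is needed, the offset bookkeeping can be skipped by passing a regex with both the `g` and `y` flags to `String.prototype.replace`: replacement then proceeds match by match from the start of the string, stops at the first position where no match begins, and leaves the remaining tail unchanged. This is a hedged sketch built on the same assumptions as the answer (double-quoted attribute values, backslash escapes); an unbalanced `"` simply ends the scan. `addPausesSticky` and its `pause` parameter are hypothetical names.

```javascript
// Prefix (non-dot characters, periods followed by '<', and whole
// double-quoted strings) followed by one breakable period. 'g' + 'y' makes
// replace() walk the input contiguously from index 0.
const breakablePeriodRun = /(?:[^".]|\.(?=\s*<)|"(?:\\[\s\S]|[^"\\])*")*\.(?!\s*<)/gy;

// Hypothetical helper: appends an SSML pause after each breakable period.
function addPausesSticky(input, pause = '<break time="200ms"/>') {
  // '$&' re-inserts the whole match (prefix + period) before the pause.
  return input.replace(breakablePeriodRun, `$&${pause}`);
}

const html = '<p>Hi.<a href="x.html">x</a>. End.</p>';
console.log(addPausesSticky(html));
// <p>Hi.<a href="x.html">x</a>.<break time="200ms"/> End.</p>
```

Single-quoted attribute values such as `href='a.html'` are treated as ordinary text by both this sketch and the answer's pattern, so a dot inside them can still receive a pause.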
