Using JavaScript's Regex to match sentences, while ensuring any full stops within quotes are ignored

Here is the regular expression:

(?:(?:Jr|Master|Mr|Ms|Mrs|Dr|Capt|Col|Sgt|Sr|Prof|Rep|Mt|Mount|St|Etc|Eg)\.\s+|["'“(\[]?)(?:\b(?:(?!(?:\S{1,})[.?!]+["']?\s+["']?[A-Z]).)*)(?:(?:(?:Jr|Master|Mr|Ms|Mrs|Dr|Capt|Col|Sgt|Sr|Prof|Rep|Mt|Mount|St|Etc|Eg)\.\s+(?:(?!\w{2,}[.?!]['"]?\s+["']?[A-Z]).)*)?)*(?:(?![.?!]["']?\s+["']?\w).)*(?:[.?!)\]]+["'”]?|[^\r\n]+$)

You can view this regex on regex101 here.

For a visual representation of the node graph, visit and enter the regex string.

This regex was originally discussed on Sitepoint, and you can find an explanation here.


Purpose: The aim of this regex is to accurately match sentences while considering factors like quotations and abbreviations without breaking sentence structures.


Main Issue:

The main problem lies in situations where sentences are incorrectly split due to full stops within quotes that should remain intact.

PROBLEM: "This is a problem. You hear me?"

Aside from this issue, do you believe this regex is mostly reliable and efficient?


Two Possible Problems or 'Exceptions' (refer to above regex101):

Possible issue with a sentence (Misplacement around "Mr."): On Feb. 20 Mr. X said "Beyond the fourth wall, there shall be 'light'"?!... Or something. Second sentence. Third.

and

Another possible issue ("Really?" should not split before capitalized names?): "Really?" Mr. baker asked, as he proceeded to ponder.

Some previous issues that have been resolved since the thread started include:

No splitting after a single letter followed by punctuation and then a full stop representing a new sentence. (e.g. A.S.A.P! New line.)

No splitting when a full stop occurs after a quotation.

Avoiding breakage with abbreviations at the start of a sentence. (e.g. Sgt. Timothy.)

Capturing new lines without ending punctuation.
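
The cases listed above could be kept as a small regression check, so that a fix for one case does not quietly break another. The following is only a sketch: checkSplits and the cases array are hypothetical names, the expected splits are one reading of the examples in this post, and naivePattern is a deliberately simple placeholder rather than the regex under discussion. Any pattern passed in is assumed to carry the g flag.

```javascript
// Expected splits are an interpretation of the examples in the post,
// not output produced by the regex being discussed.
const cases = [
  { text: "A.S.A.P! New line.", expected: ["A.S.A.P!", "New line."] },
  { text: "Sgt. Timothy.", expected: ["Sgt. Timothy."] },
  { text: "Second sentence. Third.", expected: ["Second sentence.", "Third."] },
];

// Hypothetical helper: runs a global regex over each case and reports
// whether the trimmed matches equal the expected sentences.
function checkSplits(pattern, testCases) {
  return testCases.map(({ text, expected }) => {
    const actual = (text.match(pattern) || []).map((s) => s.trim());
    const pass = JSON.stringify(actual) === JSON.stringify(expected);
    return { text, expected, actual, pass };
  });
}

// Placeholder pattern: splits at every run of ., ? or !
const naivePattern = /[^.?!]+[.?!]+/g;

for (const result of checkSplits(naivePattern, cases)) {
  console.log(result.pass ? "PASS" : "FAIL", JSON.stringify(result.text), result.actual);
}
```

Swapping naivePattern for the candidate regex (with the g flag added) would show which of the listed cases it handles.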

What are your thoughts on this implementation? Thank you!

Answer №1

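Try matching each sentence as a repeated sequence of tokens (numbers, abbreviations, or words, each optionally followed by a comma or semicolon), where every group of tokens is followed by whitespace, a full stop, or a question or exclamation mark: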

(([0-9]+(\.[0-9]+)?[,;]?|([A-Z][bcdfghjklmnpqrstvwxyz]*\.)+[,;]?|[A-Za-z][a-z']*[,;]?)+(\s+|\.|[!?]))+
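
As a rough illustration of how the pattern above might be applied in JavaScript, here is a minimal sketch. The g flag, the matchSentences helper, and the sample text are additions for this example and are not part of the pattern itself.

```javascript
// The pattern from this answer, with the global flag added so that
// String.prototype.match returns every match instead of only the first.
const sentencePattern =
  /(([0-9]+(\.[0-9]+)?[,;]?|([A-Z][bcdfghjklmnpqrstvwxyz]*\.)+[,;]?|[A-Za-z][a-z']*[,;]?)+(\s+|\.|[!?]))+/g;

// Hypothetical helper: returns each matched sentence with surrounding
// whitespace trimmed, or an empty array when nothing matches.
function matchSentences(text) {
  const matches = text.match(sentencePattern) || [];
  return matches.map((s) => s.trim());
}

const sample = "Mr. Smith went home. It cost 3.50 dollars!";
console.log(matchSentences(sample));
// → [ 'Mr. Smith went home.', 'It cost 3.50 dollars!' ]
```

Note that "Mr." is consumed by the abbreviation branch (a capital letter followed only by consonants and a full stop), which is why the first sentence is not cut short there.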

UPDATE: This is the best match I could find.
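
The pattern above has no quote characters in its token classes, so it does not address the question's main issue on its own. One possible direction, shown here as a different and much simpler technique rather than a fix to either regex in this thread, is to treat a quoted span as a single unit so that terminators inside it cannot end the sentence. This sketch assumes balanced, straight double quotes only; it does not handle abbreviations, curly or single quotes, or a quoted sentence that is followed directly by a new sentence. The names quoteAwarePattern and splitQuoteAware are hypothetical.

```javascript
// Sketch only: a sentence is a run of either whole "quoted spans" or
// characters that are not terminators or quotes, ended by one or more
// terminators or by the end of the input.
const quoteAwarePattern = /(?:"[^"]*"|[^.?!"])+(?:[.?!]+|$)/g;

// Hypothetical helper: returns trimmed, non-empty matches.
function splitQuoteAware(text) {
  const matches = text.match(quoteAwarePattern) || [];
  return matches.map((s) => s.trim()).filter((s) => s.length > 0);
}

const problem =
  'PROBLEM: "This is a problem. You hear me?" he said. Second sentence.';
console.log(splitQuoteAware(problem));
// → [ 'PROBLEM: "This is a problem. You hear me?" he said.', 'Second sentence.' ]
```

Because the two alternatives inside the group can never start on the same character, the pattern avoids the ambiguous nested repetition that tends to make long sentence regexes slow.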
