Here is the regular expression:
(?:(?:Jr|Master|Mr|Ms|Mrs|Dr|Capt|Col|Sgt|Sr|Prof|Rep|Mt|Mount|St|Etc|Eg)\.\s+|["'“(\[]?)(?:\b(?:(?!(?:\S{1,})[.?!]+["']?\s+["']?[A-Z]).)*)(?:(?:(?:Jr|Master|Mr|Ms|Mrs|Dr|Capt|Col|Sgt|Sr|Prof|Rep|Mt|Mount|St|Etc|Eg)\.\s+(?:(?!\w{2,}[.?!]['"]?\s+["']?[A-Z]).)*)?)*(?:(?![.?!]["']?\s+["']?\w).)*(?:[.?!)\]]+["'”]?|[^\r\n]+$)
You can view this regex on regex101 here.
For a visual representation of the node graph, visit and enter the regex string.
This regex was originally discussed on Sitepoint, and you can find an explanation here.
Purpose: The regex should match whole sentences, handling quotations and abbreviations without splitting a sentence in the middle.
Main Issue:
The main problem is that a full stop inside a quotation splits the quoted text into separate sentences, when the whole quotation should stay in one piece.
PROBLEM: "This is a problem. You hear me?"
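One possible way to keep a quotation intact is to let the pattern consume a whole double-quoted span as a single alternative before it looks for sentence punctuation. The sketch below is a deliberately simplified stand-in, not the regex from this post: the name `QUOTE_AWARE`, the helper `split_quote_aware`, and the sample text are all made up for illustration, and it ignores abbreviations, single quotes, and curly quotes.

```python
import re

# Hypothetical, simplified pattern (NOT the regex from this post).
# Alternative 1: a whole double-quoted span whose last character before
#                the closing quote is . ? or !
# Alternative 2: an unquoted run of text up to its closing punctuation
QUOTE_AWARE = re.compile(r'"[^"]*[.?!]"|[^.?!"\s][^.?!"]*[.?!]+')

def split_quote_aware(text):
    """Return the matched chunks, keeping each quoted span in one piece."""
    return [m.group(0) for m in QUOTE_AWARE.finditer(text)]

sample = 'First one. "This is a problem. You hear me?" Last one.'
for chunk in split_quote_aware(sample):
    print(chunk)
# First one.
# "This is a problem. You hear me?"
# Last one.
```

Because the quoted alternative is tried first, the full stop after "problem" never gets a chance to end a match. The trade-off is that a quotation embedded in the middle of a longer sentence would need extra handling.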
Aside from this issue, do you believe this regex is mostly reliable and efficient?
Two possible problems or 'exceptions' (see the regex101 link above):
Possible issue with a sentence (Misplacement around "Mr."): On Feb. 20 Mr. X said "Beyond the fourth wall, there shall be 'light'"?!... Or something. Second sentence. Third.
and
Another possible issue ("Really?" should not split before capitalized names?): "Really?" Mr. baker asked, as he proceeded to ponder.
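For comparison, here is a rough sketch of a two-step alternative that handles the abbreviations outside the regex: split at every candidate boundary first, then glue a piece back onto the next one when it ends in a known abbreviation. It reuses the abbreviation list from the regex above. The names `CANDIDATE_BREAK`, `ENDS_WITH_ABBREV`, and `split_with_abbreviations` are hypothetical, and this is an assumption about one workable approach, not a replacement for the pattern in this post.

```python
import re

# Abbreviation list copied from the regex in this post.
ABBREVIATIONS = ("Jr", "Master", "Mr", "Ms", "Mrs", "Dr", "Capt", "Col", "Sgt",
                 "Sr", "Prof", "Rep", "Mt", "Mount", "St", "Etc", "Eg")

# True when a chunk ends in one of the abbreviations plus a full stop.
ENDS_WITH_ABBREV = re.compile(r"(?:^|\W)(?:%s)\.$" % "|".join(ABBREVIATIONS))

# Candidate boundary: whitespace preceded by . ? or ! and followed by a
# capital letter or an opening quote.
CANDIDATE_BREAK = re.compile(r"""(?<=[.?!])\s+(?=[A-Z"'])""")

def split_with_abbreviations(text):
    """Split at candidate boundaries, then undo splits after abbreviations."""
    sentences = []
    carry = ""
    for piece in CANDIDATE_BREAK.split(text):
        current = carry + piece
        if ENDS_WITH_ABBREV.search(current):
            # Not a real sentence end; rejoin with a single space.
            carry = current + " "
        else:
            sentences.append(current)
            carry = ""
    if carry:
        sentences.append(carry.rstrip())
    return sentences

first = ('On Feb. 20 Mr. X said "Beyond the fourth wall, there shall be '
         "'light'\"?!... Or something. Second sentence. Third.")
second = '"Really?" Mr. baker asked, as he proceeded to ponder.'

for sentence in split_with_abbreviations(first) + split_with_abbreviations(second):
    print(sentence)
```

On the first sample this keeps "Mr. X" together and yields four sentences; "Feb. 20" survives only because a digit follows it, not because "Feb" is in the list. On the second sample it never breaks after the closing quote, which sidesteps the "Really?" case rather than solving it: a quoted sentence followed by a genuinely new sentence would not be split either. Rejoined pieces also lose their original whitespace.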
Some previous issues that have been resolved since the thread started include:
No splitting after a single letter followed by a full stop, while still splitting at the punctuation that starts a new sentence. (e.g. A.S.A.P! New line.)
No splitting when a full stop occurs after a quotation.
Avoiding breakage with abbreviations at the start of a sentence. (e.g. Sgt. Timothy.)
Capturing new lines without ending punctuation.
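To stop the resolved cases above from regressing as the pattern changes, they could be kept in a small table and checked automatically. The sketch below is only an illustration: `check_cases`, `CASES`, and the `naive` stand-in pattern are hypothetical names, the third sample line is made up, and the expected splits are my reading of the list above. The stand-in is intentionally too simple so that the failure report has something to show; the real regex would be passed in its place.

```python
import re

def check_cases(pattern, cases):
    """Return (text, expected, actual) for every case the pattern gets wrong."""
    failures = []
    for text, expected in cases:
        actual = [m.group(0).strip() for m in pattern.finditer(text)]
        if actual != expected:
            failures.append((text, expected, actual))
    return failures

# Expected splits, based on the resolved issues listed above.
CASES = [
    ("A.S.A.P! New line.", ["A.S.A.P!", "New line."]),
    ("Sgt. Timothy.", ["Sgt. Timothy."]),
    ("No ending punctuation", ["No ending punctuation"]),  # made-up sample line
]

# Stand-in pattern for demonstration only; swap in the regex under test.
naive = re.compile(r"[^.?!]+[.?!]+|[^.?!]+$")

for text, expected, actual in check_cases(naive, CASES):
    print(f"FAIL {text!r}: expected {expected}, got {actual}")
```

With the stand-in pattern the first two cases are reported as failures, because it breaks at every full stop, and the third passes.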
What are your thoughts on this implementation? Thank you!