Can a regular expression match a single grapheme cluster?

User-perceived characters, known as graphemes, can consist of multiple code points in Unicode.

According to Unicode® Standard Annex #29:

Users may perceive a character as a single unit of writing in a language, but it could actually be represented by several Unicode code points. This concept is called a user-perceived character to avoid confusion with the computer's use of the term character. For example, "G" + grave-accent forms a user-perceived character which consists of two Unicode code points. These characters are approximated by grapheme clusters that can be determined programmatically.
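A quick way to see this in JavaScript is to use the quote's own "G" + grave-accent example, written with an escape so the combining mark is visible in source. Spreading a string into an array iterates by code point, while .length counts UTF-16 code units. The emoji is assumed to be the usual four-code-point sequence for 💆‍♂️, and the variable names are illustrative only.

```javascript
// "G" followed by U+0300 COMBINING GRAVE ACCENT: one user-perceived character.
const gGrave = "G\u0300";

// 💆‍♂️ (assumed form): U+1F486, ZERO WIDTH JOINER (U+200D),
// MALE SIGN (U+2642), VARIATION SELECTOR-16 (U+FE0F).
const manMassage = "\u{1F486}\u200D\u2642\uFE0F";

// Spreading a string iterates by code point; .length counts UTF-16 code units.
console.log([...gGrave].length);     // 2 code points
console.log(gGrave.length);          // 2 UTF-16 code units
console.log([...manMassage].length); // 4 code points
console.log(manMassage.length);      // 5 UTF-16 code units (U+1F486 is a surrogate pair)
```

Neither count equals 1, which is why code-point-based tools (including the u flag on its own) are not enough to find grapheme boundaries.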

Is there a regular expression (in JavaScript) that matches a single grapheme cluster? For example:

"한bar".match(/*?*/)[0] === "한"
"நிbaz".match(/*?*/)[0] === "நி"
"aa".match(/*?*/)[0] === "a"
"\r\n".match(/*?*/)[0] === "\r\n"
"💆‍♂️foo".match(/*?*/)[0] === "💆‍♂️"

Answer №1

Full, easy-to-use built-in support: no. Approximations for various matching tasks: yes. From the regex tutorial:

To match a single grapheme, whether it consists of a single code point or of multiple code points with combining marks, Perl, PCRE, PHP, Boost, Ruby 2.0, Java 9, and the Just Great Software applications offer an easy solution: \X. You can think of \X as the Unicode version of the dot. One key difference is that \X matches line break characters, whereas the dot does not unless you enable dot-matches-newline mode.
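JavaScript is not on that list, and trying \X there does not fail loudly in every mode. This small sketch shows the two behaviors: without the u flag, the legacy (Annex B) grammar treats \X as an identity escape for a literal "X"; with the u flag, the pattern is rejected when it is compiled.

```javascript
// Without the u flag, "\X" is an identity escape: it matches a literal "X",
// not a grapheme cluster.
const legacy = /\X/;
console.log(legacy.test("X")); // true
console.log(legacy.test("a")); // false

// With the u flag, unknown escapes are a SyntaxError at compile time.
let error = null;
try {
  new RegExp("\\X", "u");
} catch (e) {
  error = e;
}
console.log(error instanceof SyntaxError); // true
```

The silent literal-X behavior is the more dangerous of the two, which is one reason to always use the u flag when porting Unicode-aware patterns.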

In .NET, Java 8 and earlier, and Ruby 1.9, you can use \P{M}\p{M}*+ or (?>\P{M}\p{M}*) as a reasonably close substitute. To match any number of graphemes, use (?>\P{M}\p{M}*)+ as a substitute for \X+.
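JavaScript has neither atomic groups nor possessive quantifiers, but with ES2018 Unicode property escapes and the u flag the same idea can be written with a plain greedy quantifier. Here is a minimal sketch that splits a string into approximate graphemes; splitApprox is a hypothetical helper name, not a built-in.

```javascript
// Approximate grapheme splitting: one non-mark code point followed by any
// number of combining marks (General_Category M covers Mn, Mc, and Me).
// Requires ES2018 Unicode property escapes and the u flag.
// splitApprox is a hypothetical helper, not part of any standard API.
function splitApprox(str) {
  // With the g flag, match() returns every match, or null if there are none.
  return str.match(/\P{M}\p{M}*/gu) ?? [];
}

// "G" + U+0300, "a" + U+0301, "b"
console.log(splitApprox("G\u0300a\u0301b").length); // 3
```

One design caveat: a string that begins with a combining mark loses that leading mark, because \P{M} cannot match it. Adding an alternative, as in /\P{M}\p{M}*|\p{M}+/gu, keeps orphan marks as their own pieces.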

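\X would be the closest fit, but JavaScript regular expressions do not support it: it is absent from every edition up to ES6, and later editions have not added it either. A workaround such as \P{M}\p{M}* resembles \X but does not behave identically. Where ES2018 Unicode property escapes are available, natively or through transpilation, the workaround can be written as /\P{Mark}\p{Mark}*/u. The variant /(\P{Mark})(\p{Mark}+)/gu captures a base character and its marks in separate groups, but because of the +, it only matches bases that are followed by at least one mark.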
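Applied to the question's first three examples, the single-grapheme approximation returns the expected result. This sketch assumes ES2018 support and writes the non-ASCII characters as escapes: 한 is taken to be the precomposed syllable U+D55C, and நி is U+0BA8 followed by the spacing mark U+0BBF. firstGraphemeApprox is a hypothetical helper name.

```javascript
// Approximate a single grapheme cluster: one non-mark code point plus any
// following combining marks. Needs ES2018 property escapes and the u flag.
const approxGrapheme = /\P{Mark}\p{Mark}*/u;

// firstGraphemeApprox is a hypothetical helper, not a built-in.
function firstGraphemeApprox(str) {
  const m = str.match(approxGrapheme);
  return m === null ? "" : m[0];
}

console.log(firstGraphemeApprox("\uD55Cbar") === "\uD55C");             // true (precomposed 한)
console.log(firstGraphemeApprox("\u0BA8\u0BBFbaz") === "\u0BA8\u0BBF"); // true (நி; U+0BBF is Mc)
console.log(firstGraphemeApprox("aa") === "a");                         // true
```

The Tamil case works because \p{Mark} includes spacing marks (Mc) as well as nonspacing ones (Mn).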

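Even so, these mark-based approximations are not always sufficient, because UAX #29 grapheme clusters also join sequences that contain no combining marks at all, such as CR LF, Hangul syllables written as conjoining jamo, and emoji ZWJ sequences. The regex tutorial covers the details.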
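To make the gaps concrete, this minimal sketch (assuming ES2018 property escapes) runs the mark-based pattern against the question's "\r\n" and 💆‍♂️ examples, plus 한 written as three conjoining jamo instead of the precomposed syllable. In each case the pattern stops after the first code point, because the following code points are not marks.

```javascript
// The mark-based approximation (ES2018, u flag).
const approxGrapheme = /\P{Mark}\p{Mark}*/u;

// CR LF is one grapheme cluster under UAX #29, but CR and LF are controls (Cc).
const crlfMatch = "\r\n".match(approxGrapheme)[0];
console.log(crlfMatch === "\r"); // true: only CR is matched

// 💆‍♂️ (assumed form): U+1F486, ZWJ (U+200D), U+2642, VS16 (U+FE0F).
// ZWJ has General_Category Cf, not M, so matching stops after U+1F486.
const emojiMatch = "\u{1F486}\u200D\u2642\uFE0Ffoo".match(approxGrapheme)[0];
console.log(emojiMatch === "\u{1F486}"); // true

// 한 as conjoining jamo: U+1112, U+1161, U+11AB (all Lo, not marks).
const jamoMatch = "\u1112\u1161\u11ABbar".match(approxGrapheme)[0];
console.log(jamoMatch === "\u1112"); // true
```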

A proposal to segment text has been introduced, and it has since been standardized in ECMA-402 as Intl.Segmenter, which current versions of the major browsers and Node.js support with grapheme granularity. Before it shipped, Chrome users could use the non-standard Intl.v8BreakIterator to segment text into clusters and match them manually.
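The standard Intl.Segmenter API (ECMA-402) is not a regular expression, but it implements the UAX #29 grapheme cluster rules, so it should handle every example in the question, including CR LF and the ZWJ emoji. This is a minimal sketch assuming a runtime that provides Intl.Segmenter; the helper names firstGrapheme and graphemes are hypothetical.

```javascript
// Grapheme segmentation with Intl.Segmenter (ECMA-402).
// undefined selects the runtime's default locale.
const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

// firstGrapheme is a hypothetical helper, not a built-in.
function firstGrapheme(str) {
  // containing(0) returns the segment covering index 0, or undefined for "".
  const seg = segmenter.segment(str).containing(0);
  return seg === undefined ? "" : seg.segment;
}

// graphemes is a hypothetical helper: split a whole string into clusters.
function graphemes(str) {
  return Array.from(segmenter.segment(str), (s) => s.segment);
}

console.log(firstGrapheme("\r\nfoo") === "\r\n"); // true
console.log(firstGrapheme("\u{1F486}\u200D\u2642\uFE0Ffoo") === "\u{1F486}\u200D\u2642\uFE0F"); // true
console.log(graphemes("G\u0300ab").length); // 3
```

If a regex is still needed afterwards, one option is to segment first and then test each cluster with an ordinary pattern.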
