An algorithm designed to identify matching song lyrics based on a keyword or its fragments

I am working with a large text file of over 852,000 lines containing song verses. Each verse is preceded by a number such as 1., 134-20. or 1231., and a verse may have four or more lines. The lines also contain some variations that I need to ignore for now.

This is the code I've been struggling with; so far it hasn't produced satisfactory results:

$.ajax({
  url: "LD.txt",
  dataType: "text",
  success: function (data) {
    // var lines = data.match(/(.*)\r\n(^[A-Z].*)+/mg);
    var lines = data.match(/(.*)(^[A-Z].*)+/mg);
    for (var i = 0; i < 50 /* lines.length */; i++) {
      var line = lines[i].replace("\r\n", "");
      console.log(i + " " + line);
    }
  }
});

Here is an excerpt from the UTF-8 text file:

/* 1970  #1.#  PAR DZIESMĀM UN DZIEDAŠANU
#1. Dziesmas un dziedašana vispāriga tautas manta un cilvēka mūža pavadoņi.
1.Dziesmas visai Latvijai kopeja manta. */

15.
Dziesmiņ' mana, kā dziedama,
Ne ta mana pamanita;
Vecā māte pamācija,
Aizkrāsnē tupedama.
#279a.

16.
Māci, māte, man' dziedāt,
Mā...
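A minimal sketch of splitting a file in this format into numbered verses, in plain JavaScript without jQuery. It assumes, based only on the excerpt above, that every verse starts with a line holding nothing but a number such as 15. or 134-20., that lines beginning with # (like #279a.) are references to skip, and that the /* ... */ header is a block comment. splitVerses is a hypothetical helper name.

```javascript
// Hypothetical helper: split the lyrics file into { number, lines } verses.
// Assumed format (inferred from the excerpt, not from a spec):
//   - a verse starts with a line holding only a number, e.g. "15." or "134-20."
//   - lines starting with "#" (e.g. "#279a.") are references and are skipped
//   - "/* ... */" blocks are headers and are skipped
function splitVerses(text) {
  const verses = [];
  let current = null;
  let inComment = false;
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (inComment) {
      // Stay inside the header until its closing "*/".
      if (line.includes("*/")) inComment = false;
      continue;
    }
    if (line.startsWith("/*")) {
      inComment = !line.includes("*/");
      continue;
    }
    if (/^\d+(?:-\d+)?\.$/.test(line)) {
      // A number line opens a new verse.
      current = { number: line, lines: [] };
      verses.push(current);
    } else if (current && line !== "" && !line.startsWith("#")) {
      current.lines.push(line);
    }
  }
  return verses;
}

const sample = [
  "/* 1970  #1.#  PAR DZIESMĀM UN DZIEDAŠANU",
  "1.Dziesmas visai Latvijai kopeja manta. */",
  "",
  "15.",
  "Dziesmiņ' mana, kā dziedama,",
  "Ne ta mana pamanita;",
  "Vecā māte pamācija,",
  "Aizkrāsnē tupedama.",
  "#279a.",
].join("\r\n");

for (const v of splitVerses(sample)) {
  console.log(v.number + " " + v.lines.join(" "));
}
// 15. Dziesmiņ' mana, kā dziedama, Ne ta mana pamanita; Vecā māte pamācija, Aizkrāsnē tupedama.
```

Joining a verse's lines with spaces already gives the one-line layout shown in the expected output below, minus the bold tags.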

The JavaScript solution I'm aiming for should let the user search the text for specific words. For example, searching for the exact word dziedama should display the verse's number (which may be several lines above the match) followed by the whole verse, with the searched word in bold:

15. Dziesmiņ' mana, kā <b>dziedama</b>, Ne ta mana pamanita; Vecā māte pamācija, Aizkrāsnē tupedama.

If the query contains an asterisk, like dzie*, every matching full word should be shown in bold:

15. <b>Dziesmiņ'</b> mana, kā <b>dziedama</b>, Ne ta mana pamanita; Vecā māte pamācija, Aizkrāsnē tupedama.
16. Māci, māte, man' <b>dziedāt</b>, Māc' ar vienu Dieva <b>dziesmu</b>, Ko <b>dziedās</b> dvēselite, Pie Dieviņa aizgājuse.
...

The search should also support a leading asterisk, like *esmu, which matches words such as dziesmu, iesmu or Dievadziesmu, where the asterisk stands for any run of characters.

If the query contains a question mark, like dzied?, the search should return verses containing words such as dziedu or dziedi, where the question mark stands for exactly one character.

If the query is enclosed in double quotes, like "vienu Dieva", it should match that exact sequence of words in the verses.

The search should work with diacritic-rich text and also offer an option to match while ignoring diacritics.

Thank you for your assistance!

Answer №1

Alright, let's look at the regex needed to match an entire verse that starts with a number on its own line and contains the word xxxxx:

^[0-9]+\.$(?:.(?!^[0-9]+\.$))+\b(xxxxx)\b.*?(?=^[0-9]+\.$)
with flags gmsu

Breaking it down:

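  • ^[0-9]+\.$ matches a line that holds only a number followed by a dot (the m flag makes ^ and $ match at line boundaries; to also accept numbers such as 134-20., use ^[0-9]+(?:-[0-9]+)?\.$ wherever this part appears)
  • (?:.(?!^[0-9]+\.$))+ matches one or more characters, newlines included thanks to the s flag, none of which is followed by another number line, so the match cannot run into the next verse
  • \b(xxxxx)\b ensures xxxxx is matched as a whole word
  • .*?(?=^[0-9]+\.$) lazily matches the rest of the verse, stopping just before the next number line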
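To see the pieces working together, here is a minimal sketch that splices a plain search word into the pattern and runs it with matchAll. findVersesAscii is a hypothetical name, the word is assumed to contain no regex-special characters, and the sample is the question's excerpt with verse 16 completed from the expected output. One addition that is not part of the answer: the last verse of a file has no number line after it, so the final lookahead could never succeed there; appending a sentinel line "0." is one simple workaround.

```javascript
// Hypothetical helper: run the whole-verse regex for one plain word.
function findVersesAscii(text, word) {
  const pattern =
    String.raw`^[0-9]+\.$` + // the verse's own number line
    String.raw`(?:.(?!^[0-9]+\.$))+` + // verse text that never runs into the next number line
    String.raw`\b(` + word + String.raw`)\b` + // the search word, bounded by ASCII \b
    String.raw`.*?(?=^[0-9]+\.$)`; // rest of the verse, up to the next number line
  const re = new RegExp(pattern, "gmsu");
  // Sentinel number line (an assumption, not in the answer) so that the
  // last verse also has a following number line to stop at.
  return Array.from((text + "\n0.").matchAll(re), (m) => m[0].trim());
}

const sample = [
  "15.", "Dziesmiņ' mana, kā dziedama,", "Ne ta mana pamanita;",
  "Vecā māte pamācija,", "Aizkrāsnē tupedama.", "#279a.", "",
  "16.", "Māci, māte, man' dziedāt,", "Māc' ar vienu Dieva dziesmu,",
  "Ko dziedās dvēselite,", "Pie Dieviņa aizgājuse.",
].join("\n");

console.log(findVersesAscii(sample, "dziedama")); // one match: verse 15
console.log(findVersesAscii(sample, "dzied")); // one match: verse 16
```

The second call returns verse 16 even though that verse only contains dziedāt and dziedās: \b treats ā as a non-word character, which is exactly the problem with \b described in the next paragraph.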

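However, \b is a problem here: in JavaScript, \b and \w are ASCII-based and treat only [A-Za-z0-9_] as word characters, even with the u flag. Letters such as ā, ņ or š therefore count as boundaries, so \bdzied\b also matches inside dziedās.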

For Unicode-aware matching, use [\p{L}\p{N}\p{M}\p{Pc}] (letters, numbers, combining marks and connector punctuation) in place of \w, and its negation [^\p{L}\p{N}\p{M}\p{Pc}] in place of \W. These property escapes require the u flag.

Replacing \b with look-arounds built from these classes gives the updated regex:

^[0-9]+\.$(?:.(?!^[0-9]+\.$))+(?<=^|[^\p{L}\p{N}\p{M}\p{Pc}])(xxxxx)(?=$|[^\p{L}\p{N}\p{M}\p{Pc}]).*?(?=^[0-9]+\.$)
with flags gmsu
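A quick way to see the difference, sketched under the assumption of an engine with lookbehind and \p{...} support (ES2018 or later): the first test shows \b accepting dzied inside dziedās, while the hypothetical wholeWord helper rebuilds the look-arounds from the updated regex and rejects it. The search word is assumed to contain no regex-special characters.

```javascript
// Body of the Unicode "word character" class from the answer.
const WORD = String.raw`\p{L}\p{N}\p{M}\p{Pc}`;

// Hypothetical helper: whole-word regex using the answer's look-arounds
// instead of \b. `word` must not contain regex-special characters.
function wholeWord(word) {
  return new RegExp(`(?<=^|[^${WORD}])${word}(?=$|[^${WORD}])`, "u");
}

// \b only knows [A-Za-z0-9_], so "ā" counts as a boundary here.
console.log(/\bdzied\b/u.test("dziedās")); // true (false positive)
console.log(wholeWord("dzied").test("dziedās")); // false
console.log(wholeWord("dziedās").test("Ko dziedās dvēselite,")); // true
```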

To support the * and ? wildcards, preprocess the user input as follows:

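  1. Take the user input.
  2. Escape all regex-special characters with a backslash (\).
  3. Replace \? with [\p{L}\p{N}\p{M}\p{Pc}] (exactly one word character).
  4. Replace \* with [\p{L}\p{N}\p{M}\p{Pc}]+ (one or more word characters; use * instead of + if the asterisk may also stand for no characters at all).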
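The four steps might look like this in JavaScript. This is a sketch: queryToPattern is a hypothetical name, and the escaping regex is the common [.*+?^${}()|[\]\\] idiom, which only escapes characters that are also valid escapes under the u flag.

```javascript
// One Unicode "word character", as defined in the answer.
const WORD_CHAR = String.raw`[\p{L}\p{N}\p{M}\p{Pc}]`;

// Hypothetical helper: turn user input with * and ? wildcards into a
// regex fragment to substitute for xxxxx.
function queryToPattern(input) {
  // Step 2: escape every regex-special character with a backslash.
  const escaped = input.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return escaped
    .replace(/\\\?/g, WORD_CHAR) // step 3: \? -> exactly one word character
    .replace(/\\\*/g, WORD_CHAR + "+"); // step 4: \* -> one or more word characters
}

console.log(queryToPattern("dzie*")); // dzie[\p{L}\p{N}\p{M}\p{Pc}]+
console.log(queryToPattern("dzied?")); // dzied[\p{L}\p{N}\p{M}\p{Pc}]
```

Escaping first and then un-escaping only \? and \* keeps every other character literal, so input such as a.b still matches only the text a.b.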

Substitute this adjusted input for xxxxx in the following modified regex:

^[0-9]+\.$(?:.(?!^[0-9]+\.$))+(?<=^|[^\p{L}\p{N}\p{M}\p{Pc}])xxxxx(?=$|[^\p{L}\p{N}\p{M}\p{Pc}]).*?(?=^[0-9]+\.$)
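with flags gmsu. Add the i flag as well if the search should be case-insensitive, as the expected dzie* output (which highlights Dziesmiņ') implies.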
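Putting the pieces together, here is a sketch of a complete search that returns each matching verse on one line with the hits in <b> tags, as in the question's expected output. Several details are assumptions rather than part of the answer: the i flag, the "0." sentinel for the last verse, dropping # reference lines and blank lines from the output, and turning a "quoted phrase" into words separated by \s+. search is a hypothetical name.

```javascript
const W = String.raw`\p{L}\p{N}\p{M}\p{Pc}`;
const NUM_LINE = String.raw`^[0-9]+\.$`;

// Wildcards as in the answer: ? = one word character, * = one or more.
function queryToPattern(input) {
  return input
    .replace(/[.*+?^${}()|[\]\\]/g, "\\$&")
    .replace(/\\\?/g, `[${W}]`)
    .replace(/\\\*/g, `[${W}]+`);
}

// Hypothetical helper: find verses containing `query`, highlight every hit.
function search(text, query) {
  const phrase = query.replace(/^"(.*)"$/, "$1"); // strip surrounding quotes
  // Spaces between words may be any whitespace, including a line break.
  const core = queryToPattern(phrase).replace(/ +/g, String.raw`\s+`);
  const word = `(?<=^|[^${W}])${core}(?=$|[^${W}])`;
  // The answer's final regex with the query substituted, plus the i flag.
  const verseRe = new RegExp(
    `${NUM_LINE}(?:.(?!${NUM_LINE}))+${word}.*?(?=${NUM_LINE})`,
    "gmsiu"
  );
  const highlight = new RegExp(word, "giu");
  const results = [];
  // "\n0." is a sentinel number line so the last verse can match too.
  for (const m of (text + "\n0.").matchAll(verseRe)) {
    const lines = m[0]
      .split(/\r?\n/)
      .map((l) => l.trim())
      .filter((l) => l !== "" && !l.startsWith("#")); // drop blanks and #refs
    results.push(lines.join(" ").replace(highlight, "<b>$&</b>"));
  }
  return results;
}

const sample = [
  "15.", "Dziesmiņ' mana, kā dziedama,", "Ne ta mana pamanita;",
  "Vecā māte pamācija,", "Aizkrāsnē tupedama.", "#279a.", "",
  "16.", "Māci, māte, man' dziedāt,", "Māc' ar vienu Dieva dziesmu,",
  "Ko dziedās dvēselite,", "Pie Dieviņa aizgājuse.",
].join("\r\n");

console.log(search(sample, "dzie*").join("\n"));
```

One visible difference from the question's expected output: the apostrophe in Dziesmiņ' is not a word character, so this sketch prints <b>Dziesmiņ</b>' rather than <b>Dziesmiņ'</b>. Adding ' to the character class would change that, at the cost of also treating quotes as part of words.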

To illustrate, try the final pattern with dziedās substituted for xxxxx.
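The question also asks for an option to ignore diacritics, which the answer above does not cover. One possible approach, sketched here as an assumption rather than a tested solution for the full file: decompose the text to NFD with String.prototype.normalize, let every base letter of the query be followed by any combining marks (\p{M}*), and recompose the highlighted result to NFC. stripMarks, looseWordRegex and highlightLoose are hypothetical names, and wildcards are not handled in this sketch.

```javascript
// Hypothetical helper: remove combining marks (ā -> a, ņ -> n, š -> s).
function stripMarks(s) {
  return s.normalize("NFD").replace(/\p{M}/gu, "");
}

// Hypothetical helper: whole-word regex in which every query letter may be
// followed by any combining marks. Meant to run on NFD-normalized text.
function looseWordRegex(query) {
  const W = String.raw`\p{L}\p{N}\p{M}\p{Pc}`;
  const body = Array.from(
    stripMarks(query),
    (ch) => ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&") + String.raw`\p{M}*`
  ).join("");
  return new RegExp(`(?<=^|[^${W}])${body}(?=$|[^${W}])`, "giu");
}

// Highlight diacritic-insensitive hits in one line of text.
function highlightLoose(line, query) {
  return line
    .normalize("NFD") // split ā into a + U+0304 so the marks can be skipped
    .replace(looseWordRegex(query), "<b>$&</b>")
    .normalize("NFC"); // recompose for display
}

console.log(highlightLoose("Ko dziedās dvēselite,", "dziedas"));
// Ko <b>dziedās</b> dvēselite,
```

Because the match runs on the decomposed text and keeps the original characters, the highlighted word is shown with its diacritics intact, and a query typed with diacritics (dziedās) also finds the plain spelling.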
