An algorithm designed to identify matching song lyrics based on a keyword or its fragments

I am currently dealing with a large text file of over 852000 lines containing song verses, each preceded by a number such as 1., 134-20., or 1231.. The verses may have four or more lines. There are also variations within the lines that I need to ignore for now.

This is the code I've been struggling with; so far it hasn't given satisfactory results:

$.ajax({
  url: "LD.txt",
  dataType: "text",
  success: function (data) {
    // var lines = data.match(/(.*)\r\n(^[A-Z].*)+/mg);
    var lines = data.match(/(.*)(^[A-Z].*)+/mg);
    for (var i = 0; i < 50 /* lines.length */; i++) {
      var line = lines[i].replace("\r\n", "");
      console.log(i + " " + line);
    }
  }
});

Here is an excerpt from the UTF-8 text file:

/* 1970  #1.#  PAR DZIESMĀM UN DZIEDAŠANU
#1. Dziesmas un dziedašana vispāriga tautas manta un cilvēka mūža pavadoņi.
1.Dziesmas visai Latvijai kopeja manta. */

15.
Dziesmiņ' mana, kā dziedama,
Ne ta mana pamanita;
Vecā māte pamācija,
Aizkrāsnē tupedama.
#279a.

16.
Māci, māte, man' dziedāt,
Mā...

The JavaScript solution I'm aiming for should allow searching for specific words entered in a text input. For example, a search for the exact word dziedama should output the preceding number (which could be several lines before) along with the verse containing the searched word highlighted in bold.

15. Dziesmiņ' mana, kā <b>dziedama</b>, Ne ta mana pamanita; Vecā māte pamācija, Aizkrāsnē tupedama.

If the search query contains an asterisk like dzie*, the full word should be shown in bold within the results.

15. <b>Dziesmiņ'</b> mana, kā <b>dziedama</b>, Ne ta mana pamanita; Vecā māte pamācija, Aizkrāsnē tupedama.
16. Māci, māte, man' <b>dziedāt</b>, Māc' ar vienu Dieva <b>dziesmu</b>, Ko <b>dziedās</b> dvēselite, Pie Dieviņa aizgājuse.
...

The search functionality should also cover words with an asterisk at the beginning like *esmu, which can match variations such as dziesmu, iesmu, Dievadziesmu, etc., with variable characters hidden behind the asterisk.

If the query includes letters followed by a question mark like dzied?, the search should return verses containing similar words like dziedu, dziedi, etc., with one character represented by the question mark.

If the search query is enclosed in double quotes like "vienu Dieva", it should match exactly that sequence of words in the verses.

The search should support diacritics-rich text and also provide options for normalization without diacritics.

Thank you for your assistance!

Answer №1

Alright, let's look at the regex needed to match an entire verse that starts with a number on its own line and contains the word xxxxx:

^[0-9]+\.$(?:.(?!^[0-9]+\.$))+\b(xxxxx)\b.*?(?=^[0-9]+\.$)
with flags gmsu

Breaking it down:

  • ^[0-9]+\.$ matches a line consisting only of a number followed by a period
  • (?:.(?!^[0-9]+\.$))+ matches any characters that are not followed by another such number line
  • \b(xxxxx)\b ensures xxxxx is matched as a whole word
  • .*?(?=^[0-9]+\.$) grabs the shortest string before the next line with a number

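However, there are issues with using the \b boundary: it doesn't fully support Unicode characters. In JavaScript, \b is defined in terms of \w, which does not include accented letters such as ā or ņ, so those letters are treated as non-word characters.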
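
A quick way to see the problem is the small sketch below. The test string is assembled from lines of the two verses in the question, and the variable names are only illustrative:

```javascript
// \b relies on \w, which in JavaScript is [A-Za-z0-9_] here, even with the u flag.
const text = "Vecā māte pamācija, Ko dziedās dvēselite";

// False negative: "Vecā" ends in ā, which \b does not see as a word character,
// so there is no boundary between ā and the following space.
const foundVeca = /\bVecā\b/u.test(text);

// False positive: "dzied" is only the beginning of "dziedās", but ā looks like
// a non-word character, so \b reports a boundary right after "dzied".
const foundDzied = /\bdzied\b/u.test(text);

console.log(foundVeca);  // false, although "Vecā" is a whole word in the text
console.log(foundDzied); // true, although "dzied" is not a whole word in the text
```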

According to this source, for Unicode equivalent matching, we should use [^\p{L}\p{N}\p{M}\p{Pc}] instead of \W and [\p{L}\p{N}\p{M}\p{Pc}] for \w.

Using these Unicode patterns in our look-arounds instead of \b, the updated regex would be:

^[0-9]+\.$(?:.(?!^[0-9]+\.$))+(?<=^|[^\p{L}\p{N}\p{M}\p{Pc}])(xxxxx)(?=$|[^\p{L}\p{N}\p{M}\p{Pc}]).*?(?=^[0-9]+\.$)
with flags gmsu
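
As a rough check, the pattern can be built and run against a few lines of the excerpt. This is only a sketch: `NOT_W`, `NUM` and `verseRegex` are names made up for the example, the line breaks of verse 16 are assumed from the expected output in the question, and the sample ends with a bare "17." line because the pattern as written only matches a verse that is followed by another number line. It needs a JavaScript engine with lookbehind and \p{...} support.

```javascript
const sample = [
  "15.",
  "Dziesmiņ' mana, kā dziedama,",
  "Ne ta mana pamanita;",
  "Vecā māte pamācija,",
  "Aizkrāsnē tupedama.",
  "#279a.",
  "",
  "16.",
  "Māci, māte, man' dziedāt,",   // line breaks of verse 16 are assumed
  "Māc' ar vienu Dieva dziesmu,",
  "Ko dziedās dvēselite,",
  "Pie Dieviņa aizgājuse.",
  "",
  "17.",
  ""
].join("\n");

const NOT_W = "[^\\p{L}\\p{N}\\p{M}\\p{Pc}]"; // Unicode stand-in for \W
const NUM = "^[0-9]+\\.$";                     // a number line such as "15."

// Builds the verse regex for an already-escaped word (hypothetical helper).
function verseRegex(word) {
  return new RegExp(
    NUM + "(?:.(?!" + NUM + "))+" +
    "(?<=^|" + NOT_W + ")(" + word + ")(?=$|" + NOT_W + ")" +
    ".*?(?=" + NUM + ")",
    "gmsu"
  );
}

const hitsVeca = [...sample.matchAll(verseRegex("Vecā"))];
const hitsDzied = [...sample.matchAll(verseRegex("dzied"))];

console.log(hitsVeca.length);                 // 1
console.log(hitsVeca[0][0].split("\n")[0]);   // "15."
console.log(hitsDzied.length);                // 0, "dzied" is never a whole word
```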

To support the wildcard characters * and ?, the user input has to be preprocessed:

  1. Take the user input
  2. Escape all regex-special characters with a backslash (\)
  3. Replace \? with [\p{L}\p{N}\p{M}\p{Pc}]
  4. Replace \* with [\p{L}\p{N}\p{M}\p{Pc}]+
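
The four steps above could look roughly like this. It is a minimal sketch; `wildcardToPattern` and `WORD_CHAR` are names invented for the example:

```javascript
const WORD_CHAR = "[\\p{L}\\p{N}\\p{M}\\p{Pc}]";

// Hypothetical helper: turns user input such as "dzie*" or "dzied?"
// into a regex fragment that can be substituted for xxxxx.
function wildcardToPattern(input) {
  return input
    .replace(/[.*+?^${}()|[\]\\]/g, "\\$&") // step 2: escape regex syntax characters
    .replace(/\\\?/g, WORD_CHAR)             // step 3: \? becomes one word character
    .replace(/\\\*/g, WORD_CHAR + "+");      // step 4: \* becomes one or more word characters
}

// Anchored here only to test single words in isolation.
const oneChar = new RegExp("^" + wildcardToPattern("dzied?") + "$", "u");
const leading = new RegExp("^" + wildcardToPattern("*esmu") + "$", "u");

console.log(oneChar.test("dziedu"));   // true
console.log(oneChar.test("dziedama")); // false, ? stands for exactly one character
console.log(leading.test("dziesmu"));  // true
console.log(leading.test("esmu"));     // false, + requires at least one character
```

Note that because step 4 uses +, a query like dzie* does not match the bare word dzie; replacing + with * in that step would allow it.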

Substitute this adjusted input for xxxxx in the following modified regex:

^[0-9]+\.$(?:.(?!^[0-9]+\.$))+(?<=^|[^\p{L}\p{N}\p{M}\p{Pc}])xxxxx(?=$|[^\p{L}\p{N}\p{M}\p{Pc}]).*?(?=^[0-9]+\.$)
with flags gmsu

To illustrate, consider the word dziedās in the pattern: Regex Example
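
Putting the pieces together, one possible way to produce the output asked for in the question is sketched below. Everything here beyond the regex itself is an assumption made for illustration: the function names, the `ignoreCase` parameter (it adds the i flag, which the pattern above does not use), the decision to drop lines starting with # from the output, and the sample text (verse 16 line breaks are assumed, and a bare "17." line closes the sample because the pattern needs a following number line).

```javascript
const WORD_CHAR = "[\\p{L}\\p{N}\\p{M}\\p{Pc}]";
const NOT_WORD = "[^\\p{L}\\p{N}\\p{M}\\p{Pc}]";
const NUM_LINE = "^[0-9]+\\.$";

// Escape the input, then translate ? and * as described in the steps above.
function wildcardToPattern(input) {
  return input
    .replace(/[.*+?^${}()|[\]\\]/g, "\\$&")
    .replace(/\\\?/g, WORD_CHAR)
    .replace(/\\\*/g, WORD_CHAR + "+");
}

// Returns one HTML string per matching verse, with every hit wrapped in <b>.
function searchVerses(text, query, ignoreCase = false) {
  const word =
    "(?<=^|" + NOT_WORD + ")(?:" + wildcardToPattern(query) + ")(?=$|" + NOT_WORD + ")";
  const flags = ignoreCase ? "gmsui" : "gmsu";
  const verseRe = new RegExp(
    NUM_LINE + "(?:.(?!" + NUM_LINE + "))+" + word + ".*?(?=" + NUM_LINE + ")",
    flags
  );
  const wordRe = new RegExp(word, flags);
  const results = [];
  for (const match of text.matchAll(verseRe)) {
    const oneLine = match[0]
      .split(/\r?\n/)
      // Assumption: blank lines and "#..." reference lines are not shown.
      .filter((line) => line.trim() !== "" && !line.startsWith("#"))
      .join(" ");
    results.push(oneLine.replace(wordRe, (hit) => "<b>" + hit + "</b>"));
  }
  return results;
}

const sample = [
  "15.", "Dziesmiņ' mana, kā dziedama,", "Ne ta mana pamanita;",
  "Vecā māte pamācija,", "Aizkrāsnē tupedama.", "#279a.", "",
  "16.", "Māci, māte, man' dziedāt,", "Māc' ar vienu Dieva dziesmu,",
  "Ko dziedās dvēselite,", "Pie Dieviņa aizgājuse.", "",
  "17.", ""
].join("\n");

const exact = searchVerses(sample, "dziedama");
const wild = searchVerses(sample, "dzie*", true);

console.log(exact[0]);
// 15. Dziesmiņ' mana, kā <b>dziedama</b>, Ne ta mana pamanita; Vecā māte pamācija, Aizkrāsnē tupedama.
console.log(wild.length); // 2
```

The apostrophe in Dziesmiņ' is not a word character under this definition, so it ends up outside the bold tags. If the text can contain characters such as < or &, it should be HTML-escaped before being inserted into the page.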
