I am working with a large text file of more than 852,000 lines. It contains song verses, each preceded by a number such as "1.", "134-20." or "1231.". A verse may have four or more lines. There are also variations within the lines, which I need to ignore for now.
This is the code I have been struggling with; so far it has not produced satisfactory results:
$.ajax({
  url: "LD.txt",
  dataType: 'text',
  success: function (data) {
    //var lines=data.match(/(.*)\r\n(^[A-Z].*)+/mg);
    var lines = data.match(/(.*)(^[A-Z].*)+/mg);
    for (var i = 0; i < 50 /*lines.length*/; i++) {
      var line = lines[i].replace("\r\n", "");
      console.log(i + " " + line);
    }
  }
});
Here is an excerpt from the UTF-8 text file:
/* 1970 #1.# PAR DZIESMĀM UN DZIEDAŠANU
#1. Dziesmas un dziedašana vispāriga tautas manta un cilvēka mūža pavadoņi.
1.Dziesmas visai Latvijai kopeja manta. */
15.
Dziesmiņ' mana, kā dziedama,
Ne ta mana pamanita;
Vecā māte pamācija,
Aizkrāsnē tupedama.
#279a.
16.
Māci, māte, man' dziedāt,
Mā...
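For reference, here is a rough sketch of how a file shaped like this excerpt might be grouped into verses before any searching happens. The function name parseVerses and the { number, lines } record shape are made up for illustration. It also assumes, based only on the excerpt, that a verse number sits alone on its line (digits, an optional "-digits" part, a final dot), and that lines starting with "#" and /* ... */ blocks are not verse text.

```javascript
// Sketch: group the lines of the file into { number, lines } records.
// Assumptions (taken from the excerpt only, not confirmed for the whole file):
//  - a verse header is a line holding digits, optional "-digits", and a dot
//  - lines starting with "#" and /* ... */ blocks are not verse text
function parseVerses(text) {
  var headerRe = /^\d+(?:-\d+)?\.$/;
  var verses = [];
  var current = null;
  var inComment = false;

  text.split(/\r?\n/).forEach(function (raw) {
    var line = raw.trim();
    if (inComment) {
      // Skip everything until the line that closes the comment block.
      if (line.indexOf("*/") !== -1) inComment = false;
      return;
    }
    if (line.indexOf("/*") === 0) {
      // A comment block opens here; it may close on the same line.
      if (line.indexOf("*/", 2) === -1) inComment = true;
      return;
    }
    if (line === "" || line.charAt(0) === "#") return;
    if (headerRe.test(line)) {
      current = { number: line, lines: [] };
      verses.push(current);
    } else if (current) {
      current.lines.push(line);
    }
  });
  return verses;
}
```

Parsing once and searching the resulting array avoids running one large multi-line regular expression over the whole file.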
The JavaScript solution I am aiming for should allow searching for specific words entered in a text input. For example, if one searches for the exact word dziedama, the output should show the preceding number (which may be several lines above the match) along with the verse containing the searched word, highlighted in bold:
15. Dziesmiņ' mana, kā <b>dziedama</b>, Ne ta mana pamanita; Vecā māte pamācija, Aizkrāsnē tupedama.
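A minimal sketch of the exact-word case follows. The helper names escapeRegExp and formatHit and the { number, lines } verse shape are hypothetical. JavaScript's \b only knows ASCII word characters, so the sketch uses lookarounds with \p{L} and the u flag instead. That needs an engine with lookbehind support, and counting the apostrophe as part of a word is my assumption.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Returns "number verse text" with every exact occurrence of `word` wrapped
// in <b>...</b>, or null when the verse does not contain the word.
// `verse` is assumed to look like { number: "15.", lines: ["...", "..."] }.
function formatHit(verse, word) {
  var text = verse.lines.join(" ");
  // Assumption: letters, combining marks and the apostrophe form a word.
  var wordChar = "[\\p{L}\\p{M}']";
  var re = new RegExp(
    "(?<!" + wordChar + ")" + escapeRegExp(word) + "(?!" + wordChar + ")",
    "giu"
  );
  var marked = text.replace(re, "<b>$&</b>");
  return marked === text ? null : verse.number + " " + marked;
}
```

With the verse 15 from the excerpt, formatHit(verse, "dziedama") gives the line shown above, while formatHit(verse, "dama") returns null because "dama" only occurs inside longer words.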
If the search query contains an asterisk, like dzie*, the full matching word should be shown in bold in the results:
15. <b>Dziesmiņ'</b> mana, kā <b>dziedama</b>, Ne ta mana pamanita; Vecā māte pamācija, Aizkrāsnē tupedama.
16. Māci, māte, man' <b>dziedāt</b>, Māc' ar vienu Dieva <b>dziesmu</b>, Ko <b>dziedās</b> dvēselite, Pie Dieviņa aizgājuse.
...
The search should also handle an asterisk at the beginning of a word, like *esmu, which can match variations such as dziesmu, iesmu, Dievadziesmu, etc., with the asterisk standing for any number of characters.
If the query contains a question mark, like dzied?, the search should return verses containing words such as dziedu, dziedi, etc., with the question mark standing for exactly one character.
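The two wildcards described above could be translated into a regular expression roughly like this. The function name wildcardToRegExp is hypothetical. As assumptions, "*" is read as "zero or more word characters", "?" as "exactly one word character", and the apostrophe counts as a word character, so that Dziesmiņ' is highlighted as a whole.

```javascript
// Sketch: turn a query such as "dzie*", "*esmu" or "dzied?" into a
// Unicode-aware, case-insensitive regular expression matching whole words.
function wildcardToRegExp(query) {
  // Assumption: letters, combining marks and the apostrophe form a word.
  var wordChar = "[\\p{L}\\p{M}']";
  var body = Array.from(query).map(function (ch) {
    if (ch === "*") return wordChar + "*"; // any number of word characters
    if (ch === "?") return wordChar;       // exactly one word character
    return ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"); // literal character
  }).join("");
  // Lookarounds keep the match from starting or ending inside a word.
  return new RegExp(
    "(?<!" + wordChar + ")" + body + "(?!" + wordChar + ")",
    "giu"
  );
}

// Usage: wrap every matching word in <b>...</b>.
var markedLine = "Dziesmiņ' mana, kā dziedama,"
  .replace(wildcardToRegExp("dzie*"), "<b>$&</b>");
// markedLine is "<b>Dziesmiņ'</b> mana, kā <b>dziedama</b>,"
```

Because the returned expression carries the g flag, it is safer to build a fresh one per search, or to use it only with replace and match, rather than calling test repeatedly on a shared instance.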
If the search query is enclosed in double quotes, like "vienu Dieva", it should match exactly that sequence of words in the verses.
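For the quoted form, one possible sketch is below. The function name phraseToRegExp is hypothetical. It assumes the words of the phrase are separated only by whitespace in the verse text, so a phrase that spans a comma or semicolon would not match.

```javascript
// Sketch: turn a double-quoted query into a regular expression that matches
// the quoted words in that order. Returns null for unquoted queries.
function phraseToRegExp(query) {
  var m = /^"(.+)"$/.exec(query.trim());
  if (!m || m[1].trim() === "") return null;
  // Assumption: letters, combining marks and the apostrophe form a word.
  var wordChar = "[\\p{L}\\p{M}']";
  var words = m[1].trim().split(/\s+/).map(function (w) {
    return w.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"); // escape regex syntax
  });
  // Words must follow each other, separated by whitespace only.
  return new RegExp(
    "(?<!" + wordChar + ")" + words.join("\\s+") + "(?!" + wordChar + ")",
    "giu"
  );
}

// Usage: highlight the whole phrase.
var markedPhrase = "Māc' ar vienu Dieva dziesmu,"
  .replace(phraseToRegExp('"vienu Dieva"'), "<b>$&</b>");
// markedPhrase is "Māc' ar <b>vienu Dieva</b> dziesmu,"
```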
The search should handle diacritics-rich text and also offer an option to match with diacritics ignored.
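To illustrate what I mean by ignoring diacritics, here is a small sketch built on String.prototype.normalize. The helper names stripDiacritics and foldedIncludes are made up. It assumes every accented letter in the text decomposes into a base letter plus combining marks, which holds for the Latvian letters in the excerpt.

```javascript
// Sketch: remove diacritics by decomposing to NFD and dropping the
// combining marks, e.g. "ā" becomes "a" and "ņ" becomes "n".
function stripDiacritics(s) {
  return s.normalize("NFD").replace(/\p{M}/gu, "");
}

// Case-insensitive, diacritics-insensitive substring check, usable as an
// optional filter on verse text.
function foldedIncludes(text, query) {
  var foldedText = stripDiacritics(text).toLowerCase();
  var foldedQuery = stripDiacritics(query).toLowerCase();
  return foldedText.indexOf(foldedQuery) !== -1;
}

var plain = stripDiacritics("Dziesmiņ' mana, kā dziedāt");
// plain is "Dziesmin' mana, ka dziedat"
```

The same folding would have to be applied to both the verse text and the query. Highlighting in the original, accented text then needs the match positions mapped back, which this sketch does not attempt.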
Thank you for your assistance!