I'm dealing with log files that are saved as raw text, and I have no control over how they are written. The data is streamed into the file, so entries run together without line breaks. That makes it hard to separate the individual entries, each of which begins with an index.
Looking at the log file and the expected output below, I noticed that each entry always starts with a 13-digit index (possibly padded), so I treated that index as the start of a line. My approach was to split the content on this index to extract the first lines. However, when I tried to repeat this in a loop, I realized I was using split incorrectly: it only tells me where entries end, not where the next one begins.
Is there a simple fix for this approach that gives the desired output? Any suggestions for improving this partial solution would be appreciated.
var reader = new FileReader();
var output = [];
// f is the File to read (e.g. one chosen through an <input type="file"> element)
reader.readAsText(f, "UTF-8");
// if the read succeeds, the text is stored in the FileReader's result property
reader.onload = function (evt) {
    var fileContents = evt.target.result;
    var index = fileContents.slice(0, 13);
    var lines = fileContents.split(index);
    // Continue splitting until we fail (nothing split = 1)
    //while (lines.length > 1) {
    for (var i = 0; i < lines.length; i++) {
        output.push(index + ' ' + lines[i] + '<br>');
    }
    // go to next lines
    index++;
    lines = fileContents.split(index);
    //}
    document.getElementById('content').innerHTML = '<ul>' + output.join('') + '</ul>';
};
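A short demonstration of why the code above never gets past the first index, using part of the sample log as input. `split()` with a string separator only breaks the text where that exact string occurs, and `index++` converts the 13-character string to a number and adds 1. The second `split()` therefore looks for `1564001512017`, which the sample does not contain, so nothing is split. The variable names mirror the question's code; this runs in Node.js or a browser console.

```javascript
// A shortened copy of the sample log (entries concatenated without line breaks)
const sample =
  "1564001512016 INFO: LOG MANAGER jdshfkjaafhdskfdsajfdsadsfj " +
  "1564001512016 INFO: some test stuff " +
  "1564001517 INFO: hjkdsahfjkfhdskjfdsahkfdskjfdsakjfds";

let index = sample.slice(0, 13);          // "1564001512016" (a string)
const firstSplit = sample.split(index);   // splits only where that exact text occurs
index++;                                  // coerced to the number 1564001512017
const secondSplit = sample.split(index);  // the number is converted back to a string

console.log(firstSplit.length);  // 3
console.log(typeof index);       // number
console.log(secondSplit.length); // 1 (no match, so nothing was split)
```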
Content of the provided log file:
1564001512016 INFO: LOG MANAGER jdshfkjaafhdskfdsajfdsadsfj 1564001512016 INFO: some test stuff 1564001512016 INFO: kjhdshfakhfdskjdshkjfdsh 1564001517 INFO: hjkdsahfjkfhdskjfdsahkfdskjfdsakjfds 1564001517 INFO: hdskjahfjfdshdfsahfdsajfdsa
Current Output:
1564001512016 INFO: LOG MANAGER jdshfkjaafhdskfdsajfdsadsfj
1564001516 INFO: some test stuff
1564001516 INFO: kjhdshfakhfdskjdshkjfdsh 1564001517 INFO: hjkdsahfjkfhdskjfdsahkfdskjfdsakjfds 1564001517 INFO: hdskjahfjfdshdfsahfdsajfdsa
Desired Output:
1564001512016 INFO: LOG MANAGER jdshfkjaafhdskfdsajfdsadsfj
1564001516 INFO: some test stuff
1564001516 INFO: kjhdshfakhfdskjdshkjfdsh
1564001517 INFO: hjkdsahfjkfhdskjfdsahkfdskjfdsakjfds
1564001517 INFO: hdskjahfjfdshdfsahfdsajfdsa
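One way to split at the beginning of each entry rather than at its end is a regular-expression lookahead: `(?=...)` matches the position just before an index without consuming it, so the index stays attached to its own entry. A minimal sketch, assuming every entry starts with a 10- to 13-digit index followed by ` INFO:` (the sample mixes 13-digit indices such as 1564001512016 with 10-digit ones such as 1564001517); `splitLogEntries` is a hypothetical helper name:

```javascript
// Split the raw log text into one string per entry.
// Assumption: each entry starts with a 10- to 13-digit index followed by " INFO:".
function splitLogEntries(text) {
  return text
    .trim()
    // \s+ consumes the space between entries; the lookahead (?=...) only checks
    // that an index follows, so the index itself is kept at the start of the entry.
    .split(/\s+(?=\d{10,13} INFO:)/);
}

// The sample log from the question, exactly as given (no line breaks)
const log =
  "1564001512016 INFO: LOG MANAGER jdshfkjaafhdskfdsajfdsadsfj 1564001512016 INFO: some test stuff " +
  "1564001512016 INFO: kjhdshfakhfdskjdshkjfdsh 1564001517 INFO: hjkdsahfjkfhdskjfdsahkfdskjfdsakjfds " +
  "1564001517 INFO: hdskjahfjfdshdfsahfdsajfdsa";

for (const entry of splitLogEntries(log)) {
  console.log(entry);
}
```

In the browser, the helper could be called as `splitLogEntries(evt.target.result)` inside `reader.onload`, with each entry added to the list as an `<li>`. Setting each item's `textContent` instead of concatenating into `innerHTML` keeps any `<` characters in the log text from being interpreted as markup.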
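Update: Based on the answer, I adapted the code as shown below. The main changes are re-adding the ' INFO:' text, which split removes together with the separator, and reading the result in pairs: because the regex contains a capturing group, split returns each captured index followed by the message after it, so the loop advances two elements at a time.
// inside reader.onload, with output declared as in the first snippet
var fileContents = evt.target.result;
// the capturing group (\d{13}) keeps each matched index in the split result
var regex = /(\d{13}) INFO:/;
var lines = fileContents.split(regex);
// lines is ["", index1, message1, index2, message2, ...]
// start at 1 because lines[0] is the text before the first index (empty when the file starts with one)
for (var i = 1; i < lines.length; i += 2) {
    var index = lines[i];
    var context = lines[i + 1];
    // \xa0 is a non-breaking space
    output.push('<li>' + index + '\xa0INFO:\xa0\xa0' + context + '</li>');
}
document.getElementById('content').innerHTML = '<ul>' + output.join('') + '</ul>';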
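For reference, the capture-group technique from the update can also be written as a small pure function that returns `{ index, message }` pairs instead of HTML, which makes it easy to test outside the browser. This is a sketch: `parseLogPairs` is a hypothetical name, and the regex is widened to `\d{10,13}` as an assumption, because `/(\d{13}) INFO:/` does not match the 10-digit indices (such as 1564001517) that appear in the sample.

```javascript
// Parse the raw log text into { index, message } pairs using split() with a capturing group.
// Assumption: indices are 10 to 13 digits long and are followed by " INFO:".
function parseLogPairs(text) {
  // The capturing group makes split() include each matched index in the result:
  // [textBeforeFirstIndex, index1, message1, index2, message2, ...]
  const parts = text.split(/(\d{10,13}) INFO:/);
  const entries = [];
  // parts[0] is whatever precedes the first index ("" when the log starts with one)
  for (let i = 1; i < parts.length; i += 2) {
    entries.push({ index: parts[i], message: parts[i + 1].trim() });
  }
  return entries;
}

// The sample log from the question, exactly as given (no line breaks)
const log =
  "1564001512016 INFO: LOG MANAGER jdshfkjaafhdskfdsajfdsadsfj 1564001512016 INFO: some test stuff " +
  "1564001512016 INFO: kjhdshfakhfdskjdshkjfdsh 1564001517 INFO: hjkdsahfjkfhdskjfdsahkfdskjfdsakjfds " +
  "1564001517 INFO: hdskjahfjfdshdfsahfdsajfdsa";

for (const { index, message } of parseLogPairs(log)) {
  console.log(`${index} INFO: ${message}`);
}
```

Keeping the parsing separate from the rendering means the same pairs can be turned into `<li>` elements, a table, or any other layout without touching the regex.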
Final Output:
1564001512016 INFO: LOG MANAGER jdshfkjaafhdskfdsajfdsadsfj
1564001516 INFO: some test stuff
1564001516 INFO: kjhdshfakhfdskjdshkjfdsh
1564001517 INFO: hjkdsahfjkfhdskjfdsahkfdskjfdsakjfds
1564001517 INFO: hdskjahfjfdshdfsahfdsajfdsa