Issue: I am having trouble handling large files (>10 GB). Sometimes I need to process the entire file; other times I only need to sample a few lines.
The processing is done with a pipeline:
pipeline(
  inStream,
  split2(),
  sc,
  err => {
    ...
  }
);
sc is a transform that primarily counts specific flags in the file.
Everything works smoothly when processing the complete file. However, no output is generated if I try to exit the transform before inStream has completed.
_transform(chunk, encoding, done) {
  let strChunk = decoder.write(chunk);
  // ETX marker received: forward it and exit
  if (strChunk === '\u0003') {
    this.push('\u0003');
    process.exit(0);
  }
  if (strChunk.startsWith("@")) {
    // lines starting with "@" are skipped
    done();
  } else {
    if (this.sampleMax === 0) {
      // sample exhausted: try to signal the end downstream
      this.push('\u0003');
    } else if (this.sampleMax > 0) {
      this.sampleMax--;
    }
    let dta = strChunk.split("\t");
    let flag = dta[1].trim();
    this.flagCount[flag]++;
    done();
  }
}
Using process.exit(0) prevents the code after sc in the pipeline from being reached. On the other hand, using only this.push('\u0003'); results in the full inStream being processed.
The main question is how to correctly end the transform and continue with the downstream pipeline without reading all of inStream.
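For reference, one possible direction is sketched below. It is not a confirmed fix for the pipeline above, and it swaps the technique: instead of a Transform inside pipeline with split2, it reads lines by async iteration over Node's built-in readline, breaks out of the loop once the sample is full, and then destroys the input stream so the rest of the file is not read. The function name countFlags, its parameters, and the convention that a negative sampleMax means "read everything" are assumptions made for illustration, as is initializing missing counters to zero.

```javascript
const { Readable } = require('node:stream');
const readline = require('node:readline');

// Hypothetical helper: counts the second tab-separated field ("flag") over
// at most `sampleMax` data lines. A negative sampleMax is assumed to mean
// "no limit". Lines starting with "@" are skipped, as in the question.
async function countFlags(inStream, sampleMax = -1) {
  const flagCount = {};
  let counted = 0;
  const rl = readline.createInterface({ input: inStream, crlfDelay: Infinity });
  try {
    for await (const line of rl) {
      // Leaving the loop closes the readline interface.
      if (sampleMax >= 0 && counted >= sampleMax) break;
      if (line.startsWith('@')) continue;
      const fields = line.split('\t');
      if (fields.length < 2) continue; // no flag column on this line
      const flag = fields[1].trim();
      flagCount[flag] = (flagCount[flag] || 0) + 1;
      counted++;
    }
  } finally {
    // Stop reading the source; the remainder of the input is never consumed.
    inStream.destroy();
  }
  return flagCount;
}

// Demo on an in-memory stream. For a real file, pass fs.createReadStream(path).
const demo = Readable.from(['@HD\tVN:1.0\nr1\t0\tchr1\nr2\t16\tchr1\nr3\t0\tchr1\n']);
countFlags(demo, 2).then((counts) => console.log(counts));
```

Because the function returns a promise, whatever was meant to run after sc can run after the await, whether the loop ended at the end of the file or at the sample limit.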