Bun: Read a file line by line

When working with Node.js, I often use the fs module and the readline interface to read large text files line by line. I'm looking for an equivalent approach in Bun.

I attempted to read the file with this code snippet:

const input = Bun.file("input.txt");

Unfortunately, this method behaves like fs.readFile, which does not meet my needs for handling extremely large files.


Here is an example using Node.js:

const readline = require("readline");
const fs = require("fs");


const input = fs.createReadStream("file.txt", {
  encoding: "utf16le",
});
const rl = readline.createInterface({ input });

rl.on("line", (line) => {
  // process each line here
})

Answer №1

As @GrafiCode mentioned in the comments:

const file = Bun.file("foo.txt"); 

This snippet does not read the file from disk. It creates a BunFile object, which can then be read in several ways, one of which is streaming:

await file.stream(); // contents as ReadableStream

For more information, you can visit this link

Here is a demonstration of how to read a file line by line:

async function readFileLineByLine(filePath, onLineRead, onFinish) {
  const file = Bun.file(filePath);

  const stream = await file.stream();
  const decoder = new TextDecoder();

  let remainingData = "";

  for await (const chunk of stream) {
    // stream: true keeps multi-byte characters intact across chunk boundaries
    const str = decoder.decode(chunk, { stream: true });

    remainingData += str; // Append the chunk to the remaining data

    // Split the remaining data by newline character
    let lines = remainingData.split(/\r?\n/);
    // Loop through each line, except the last one
    while (lines.length > 1) {
      // Remove the first line from the array and pass it to the callback
      onLineRead(lines.shift());
    }
    // Update the remaining data with the last incomplete line
    remainingData = lines[0];
  }

  // Flush the decoder and emit the last line if the file has no trailing newline
  remainingData += decoder.decode();
  if (remainingData) {
    onLineRead(remainingData);
  }

  return onFinish();
}

function onLineRead(line) {
  console.log("Line read: " + line);
}

function onFinish() {
  console.log("File read successfully");
}

readFileLineByLine("2.txt", onLineRead, onFinish);
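
One detail worth checking in any chunk-by-chunk decoder is that a multi-byte UTF-8 character can straddle two chunks. The sketch below is illustrative only: the byte arrays and the decodeAll helper are made up for this example and stand in for chunks coming off a real stream. It compares decoding each chunk independently with passing { stream: true } to TextDecoder.decode, which holds back incomplete byte sequences until the next call.

```javascript
// "é" is two bytes in UTF-8 (0xC3 0xA9). These hypothetical chunks split it in the middle.
const chunks = [
  new Uint8Array([0x63, 0x61, 0x66, 0xc3]), // "caf" + first byte of "é"
  new Uint8Array([0xa9, 0x0a]),             // second byte of "é" + "\n"
];

// Hypothetical helper: decodes all chunks, with or without streaming mode.
function decodeAll(byteChunks, streaming) {
  const decoder = new TextDecoder("utf-8");
  let text = "";
  for (const chunk of byteChunks) {
    text += decoder.decode(chunk, { stream: streaming });
  }
  // In streaming mode, a final call with no arguments flushes buffered bytes.
  if (streaming) text += decoder.decode();
  return text;
}

console.log(JSON.stringify(decodeAll(chunks, true)));  // the "é" survives
console.log(JSON.stringify(decodeAll(chunks, false))); // contains U+FFFD replacement characters
```

For pure ASCII input both modes give the same result, so the difference only shows up with non-ASCII text and unlucky chunk boundaries.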

Answer №2

Node.js versus Bun

In addition to Sarkar's answer, another option is to use the node:fs module, which Bun supports through its Node.js compatibility layer.

There is an existing discussion on this topic in a node environment: Dan Dascalescu responds to Alex C's inquiry (Read a file one line at a time in node.js?).
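
A minimal sketch of what the node:fs route can look like, assuming Bun's Node.js compatibility covers readline.createInterface and async iteration over it. The forEachLine helper name is hypothetical and this is not necessarily the exact code from the linked discussion.

```javascript
const fs = require("node:fs");
const readline = require("node:readline");

// Hypothetical helper: calls onLine for every line and resolves with the line count.
async function forEachLine(filePath, onLine) {
  const rl = readline.createInterface({
    input: fs.createReadStream(filePath, { encoding: "utf8" }),
    crlfDelay: Infinity, // treat "\r\n" as a single line break
  });

  let count = 0;
  // The readline interface is async-iterable and yields one line at a time.
  for await (const line of rl) {
    onLine(line);
    count++;
  }
  return count;
}
```

A call such as forEachLine("input.txt", (line) => console.log(line)) would print each line while keeping only a small part of the file in memory at any time.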

Performance Evaluation

There are noticeable differences in performance. I compared Sarkar's Bun solution (bun:stream) with Dan Dascalescu's Node.js solution (node:stream). I also tested two variants in which the file is read in bulk and then split into lines (node:bulk and bun:bulk). The results follow.
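
The bulk variants are not shown in this answer, so the following is only a guess at their general shape: read the whole file into memory, then split on line breaks. The readLinesBulk name is hypothetical.

```javascript
const fs = require("node:fs");

// Hypothetical helper for the "bulk" variant: the whole file must fit in memory.
function readLinesBulk(filePath) {
  const text = fs.readFileSync(filePath, "utf8");
  const lines = text.split(/\r?\n/);
  // A trailing newline leaves one empty string at the end; drop it.
  if (lines.length > 0 && lines[lines.length - 1] === "") {
    lines.pop();
  }
  return lines;
}
```

This trades memory for simplicity: there is no leftover buffer to manage, but peak memory use grows with the file size.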

Versions used: node v20.11.1, bun v1.1.6

📋 Results Run 1 (file: 5 MB)

  ⏱ node:stream = 24 ms
  ⏱ node:bulk = 4 ms
  ⏱ bun:stream = 8 ms
  ⏱ bun:bulk = 3 ms

📋 Results Run 2 (file: 368 MB)

  ⏱ node:stream = 470 ms
  ⏱ node:bulk = 204 ms
  ⏱ bun:stream = 716 ms
  ⏱ bun:bulk = 727 ms

📋 Results Run 3 (file: 5.5 GB)

  ⏱ node:stream = 18057 ms
  ⏱ node:bulk = 40004 ms
  ⏱ bun:stream = 25658 ms
  ⏱ bun:bulk = 35215 ms

📋 Results Run 4 (file: 12.8 GB)

  ⏱ node:stream = 46784 ms
  ⏱ bun:stream = 44367 ms

Conclusive Remarks

Unsurprisingly, stream-based reading is more efficient for large files (roughly above 1 GB): at 5.5 GB, node:stream took 18.1 s against 40.0 s for node:bulk, and bun:stream took 25.7 s against 35.2 s for bun:bulk. Bulk reading is faster for small files (5 MB in both runtimes, and 368 MB in Node), so the method could be chosen based on file size. Comparing the two streaming solutions, Bun was faster at 5 MB (8 ms vs 24 ms) but noticeably slower than node:fs at 368 MB (716 ms vs 470 ms) and at 5.5 GB. Bun only catches up again at around 10 GB (44.4 s vs 46.8 s at 12.8 GB).

This comparison could be further expanded upon (considering additional file sizes, impact of file structure - long lines vs short lines, etc.) in order to develop a smart switch mechanism for utilizing the most effective method based on specific file sizes.
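
A rough sketch of such a switch, assuming a single size cut-off is enough. The 1 GiB threshold is a placeholder taken from the approximate crossover mentioned in the remarks above, not a measured optimum, and both function names are hypothetical.

```javascript
const fs = require("node:fs");

// Hypothetical cut-off (1 GiB); it would need tuning per machine and runtime.
const STREAM_THRESHOLD_BYTES = 1024 ** 3;

// Returns "stream" for files at or above the threshold, "bulk" otherwise.
function pickReadStrategy(sizeInBytes, threshold = STREAM_THRESHOLD_BYTES) {
  return sizeInBytes >= threshold ? "stream" : "bulk";
}

// Convenience wrapper that looks up the file size on disk first.
function pickReadStrategyForFile(filePath) {
  return pickReadStrategy(fs.statSync(filePath).size);
}
```

The caller would then dispatch to a streaming reader or a bulk reader depending on the returned string. Line length and runtime would probably need to feed into the decision as well, as noted above.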

Answer №3

I conducted my own performance tests.

Technique 1: Utilizing {createReadStream} from 'node:fs'

#!esn -i
import {createReadStream} from 'node:fs'

async function* readLines(filePath: string) {
    const stream = createReadStream(filePath, {
        encoding: 'utf-8',
    })

    let leftover = ''
    for await (const piece of stream) {
        let lines = (leftover + piece).split(/\r?\n/)
        leftover = lines.pop()!

        for(const line of lines) {
            yield line
        }
    }

    if(leftover) {
        yield leftover
    }
}


let totalLines = 0;
for await(const line of readLines(`${__dirname}/data.txt`)) {
    ++totalLines;
}
console.log(totalLines)

Technique 2: Using Bun.file

#!esn -i

async function* readLines(filePath: string) {
    const reader = Bun.file(filePath).stream().pipeThrough(new TextDecoderStream('utf-8')).getReader()

    let leftover = ''
    while(true) {
        const {value, done} = await reader.read()
        if(done) break
        let lines = (leftover + value).split(/\r?\n/)
        leftover = lines.pop()!

        for(const line of lines) {
            yield line
        }
    }

    if(leftover) {
        yield leftover
    }
}


let totalLines = 0
for await(const line of readLines(`${__dirname}/data.txt`)) {
    ++totalLines
}
console.log(totalLines)

Performance Test Results

#!esn
import {$} from 'bun'

await $`hyperfine --warmup 3 ${[
    `bun run ${__dirname}/node-read-stream.ts`,
    `bun run ${__dirname}/bun-stream.ts`,
]}`

Outcomes:

Benchmark 1: bun run .../bench/node-read-stream.ts
  Time (mean ± σ):      89.4 ms ±   1.5 ms    [User: 90.0 ms, System: 55.1 ms]
  Range (min … max):    87.1 ms …  94.5 ms    32 runs

Benchmark 2: bun run .../bench/bun-stream.ts
  Time (mean ± σ):      83.8 ms ±   1.2 ms    [User: 93.9 ms, System: 48.5 ms]
  Range (min … max):    82.2 ms …  87.6 ms    34 runs

Summary
  'bun run .../bench/bun-stream.ts' ran
    1.07 ± 0.02 times faster than 'bun run .../bench/node-read-stream.ts'

The dataset comprises 99,991 lines of lorem ipsum.

Bun 1.1.25-canary.18+98a709fb1 on WSL environment.

TL;DR: Bun.file (Technique 2) was slightly faster, by a factor of about 1.07 on this dataset.
