Tips for storing a JSON file with GridFS

I have a large dataset. Using mongoose schemas, each data element has a structure like this:

    {
      field1: ">HWI-ST700660_96:2:1101:1455:2154#5@0/1",
      field2: "GAA…..GAATG"
    }

Reference: Reading a FASTA file
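Since the elements come from a FASTA file, here is a minimal sketch of turning FASTA text into records of that shape. It assumes that each header line (starting with ">") becomes field1, keeping the ">" as in the example above, and that the sequence lines under it, which may wrap, are concatenated into field2. The read names in the sample are made up.

```javascript
// Parse FASTA text into records shaped like the mongoose documents above:
// field1 = header line (including the leading ">"), field2 = the full sequence.
function parseFasta(text) {
  const records = [];
  let current = null;
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "") continue; // skip blank lines
    if (line.startsWith(">")) {
      // A header line starts a new record
      current = { field1: line, field2: "" };
      records.push(current);
    } else if (current !== null) {
      // Sequence lines may wrap; append them to the current record
      current.field2 += line;
    }
  }
  return records;
}

// Hypothetical sample with one wrapped sequence
const sample = [">read_1", "GAATTC", "GAATG", ">read_2", "ACGT"].join("\n");
const records = parseFasta(sample);
console.log(records.length, records[0].field2); // 2 GAATTCGAATG
```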

Each element is small and simple, but there are a great many of them: together they exceed 200 MB.

The problem is that I can't store the data in MongoDB as a single document, because a document is limited to 16 MB and the data is over 200 MB.
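A quick way to see whether an array will fit in one document is to compare a rough size estimate against MongoDB's 16 MiB BSON document limit. This is a minimal sketch, assuming the UTF-8 length of the JSON text is an acceptable proxy for BSON size; it is not exact, which is why the check keeps some headroom. The element and the counts are made up.

```javascript
// MongoDB's maximum BSON document size: 16 MiB.
const MAX_BSON_DOCUMENT_BYTES = 16 * 1024 * 1024;

// Rough estimate only: UTF-8 length of the JSON text. Real BSON size differs
// (type bytes, length prefixes, array index keys), so leave some headroom.
function approxJsonBytes(value) {
  return Buffer.byteLength(JSON.stringify(value), "utf8");
}

function fitsInOneDocument(value, headroom = 0.9) {
  return approxJsonBytes(value) < MAX_BSON_DOCUMENT_BYTES * headroom;
}

// Hypothetical element, repeated to simulate a small and a large dataset
const element = { field1: ">HWI-ST700660_96:2:1101:1455:2154#5@0/1", field2: "GAATG".repeat(20) };
const small = Array.from({ length: 10 }, () => element);
const large = Array.from({ length: 200000 }, () => element);

console.log(fitsInOneDocument(small)); // true
console.log(fitsInOneDocument(large)); // false (about 33 MB of JSON)
```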

I have come across GridFS as a potential solution, but:

  • the resources I found focus mainly on uploading images and videos;

  • none of them explain how to keep the benefits of a mongoose schema;

  • the examples do not let me define my own paths (fields) for the data, as is common with mongoose.

In short: how would I save a JSON file with GridFS, or a similar approach, in the same way I save small JSON files? What are the advantages and disadvantages of this method compared with the alternatives? Is my own approach valid? In particular, a hierarchy of JSON documents combined with populate has worked well for me!

For example, this is how I save a small JSON array with mongoose:

// Model is the mongoose model for these elements; create() accepts an array
Model.create([
  {
    field1: ">HWI-ST700660_96:2:1101:1455:2154#5@0/1",
    field2: "GAA…..GAATG"
  },
  {
    field1: ">HWI-ST700660_96:2:1101:1455:2154#5@0/1",
    field2: "GAA…..GAATG"
  }
]);

The example above saves only a two-element array. For larger files I have to split the data into smaller chunks (for example 1% each) and link them together as mentioned above; at least, that is my current solution.
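Splitting the data into roughly 1% chunks while keeping them correlated can be sketched in plain JavaScript. The idea mirrors GridFS itself, whose chunk documents store the id of their file (files_id) and a sequence number (n). This is a minimal sketch, not the author's code; the file id and the data are hypothetical, and nothing is written to MongoDB.

```javascript
// Split `elements` into about `parts` chunks (parts = 100 gives roughly 1% each).
// Like a GridFS chunk document, each chunk stores the id of the file it belongs
// to (files_id) and its position (n), so the pieces stay correlated.
function splitIntoChunks(elements, parts, filesId) {
  const size = Math.max(1, Math.ceil(elements.length / parts));
  const chunks = [];
  for (let start = 0, n = 0; start < elements.length; start += size, n += 1) {
    chunks.push({ files_id: filesId, n, elements: elements.slice(start, start + size) });
  }
  return chunks;
}

// Put the chunks back in order by n and flatten them into one array again.
function reassemble(chunks) {
  return [...chunks].sort((a, b) => a.n - b.n).flatMap((chunk) => chunk.elements);
}

// Hypothetical data: 1050 elements and a made-up file id
const elements = Array.from({ length: 1050 }, (_, i) => ({ field1: `>read_${i}`, field2: "GAATG" }));
const chunks = splitIntoChunks(elements, 100, "fasta-data-1");
console.log(chunks.length, reassemble(chunks).length); // 96 1050
```

Because the chunk size is rounded up, 1050 elements give 96 chunks of at most 11 elements rather than exactly 100 chunks.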

My concern is that I may be reinventing the wheel. I can save the chunks independently, but they need to stay linked because they belong to the same file, just as the pieces of an image belong together.

Below is my current solution, which I came up with myself. It does not use GridFS, although suggestions involving GridFS are still welcome. It relies only on JSON documents, breaking the data into smaller pieces arranged as a hierarchical tree.

https://i.stack.imgur.com/QYJXt.png

The design in this diagram solved the problem. Out of curiosity, though, I would like to know whether something similar is possible with GridFS.

Discussion

First I tried storing the elements as subdocuments, which failed. Then I tried storing only their ids, but the ids alone came to 35% of the whole chunk and exceeded 16 MB, so that failed too. Finally, I created placeholder documents that hold only the ids, and that worked!
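The final design (a root document pointing to placeholder documents that hold only element ids) can be sketched in plain JavaScript. This is a minimal sketch, assuming a fixed number of ids per placeholder chosen to stay well under the 16 MB limit; makeId is a hypothetical stand-in for mongoose.Types.ObjectId, and the field names placeholderIds and elementIds are made up for illustration.

```javascript
// Hypothetical id generator standing in for new mongoose.Types.ObjectId()
let counter = 0;
const makeId = () => `id_${counter++}`;

// Two-level tree: root -> placeholder documents -> element ids.
// Each placeholder stores only the ids of up to `idsPerPlaceholder` elements.
function buildTree(elements, idsPerPlaceholder) {
  const elementDocs = elements.map((e) => ({ _id: makeId(), ...e }));
  const placeholders = [];
  for (let i = 0; i < elementDocs.length; i += idsPerPlaceholder) {
    placeholders.push({
      _id: makeId(),
      elementIds: elementDocs.slice(i, i + idsPerPlaceholder).map((d) => d._id),
    });
  }
  const root = { _id: makeId(), placeholderIds: placeholders.map((p) => p._id) };
  return { root, placeholders, elementDocs };
}

// Hypothetical data: 25 elements, at most 10 ids per placeholder
const tree = buildTree(
  Array.from({ length: 25 }, (_, i) => ({ field1: `>read_${i}`, field2: "GAATG" })),
  10,
);
console.log(tree.placeholders.length, tree.root.placeholderIds.length); // 3 3
```

In MongoDB the element, placeholder, and root documents would each live in their own collection, and the id arrays would be resolved with populate.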

Answer №1

Storing this data in MongoDB through GridFS is unlikely to be beneficial.

Storing binary data in a database is generally not recommended. However, for small data, the advantages of being able to query it might outweigh the drawbacks such as server load and slow processing.

If you store JSON data in GridFS, treat it like any other binary data. Keep in mind that the stored data is opaque: you can query the file metadata, but not the JSON content itself.
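In practice, "opaque" means that any query on the content has to happen in application code after the whole file has been downloaded and parsed, unlike a collection where MongoDB can filter documents itself (for example with find({ field2: /^GAA/ })). A minimal sketch, in which a Buffer stands in for the downloaded file contents and no MongoDB is involved; the records are made up.

```javascript
// GridFS stores the JSON as opaque bytes, so "querying" it means parsing the
// entire file and filtering in application code.
function queryJsonBytes(bytes, predicate) {
  const records = JSON.parse(bytes.toString("utf8")); // the whole file is parsed
  return records.filter(predicate);
}

// A Buffer standing in for the bytes downloaded from GridFS
const fileBytes = Buffer.from(JSON.stringify([
  { field1: ">read_1", field2: "GAATG" },
  { field1: ">read_2", field2: "ACGTA" },
]));

const hits = queryJsonBytes(fileBytes, (r) => r.field2.startsWith("GAA"));
console.log(hits.length); // 1
```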

Handling Large Data Queries

If querying data is crucial for your needs, assess the data format first. If the data structure resembles the example provided where simple string matching suffices for queries, consider these options:

Scenario 1: Big Data with Minimal Points

If you have few sets of data but each set contains large amounts of information, consider storing the bulk data elsewhere and referencing it instead. For instance, save the actual data in an external file on Amazon S3 and store the link in your MongoDB entry.

{
  field1: ">HWI-ST700660_96:2:1101:1455:2154#5@0/1",
  field2link: "https://my-bucket.s3.us-west-2.amazonaws.com/puppy.png"
}
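Building such a record can be sketched as follows. This assumes the bulk data has already been uploaded to S3 by some other means (not shown) and uses S3's virtual-hosted-style URL format; the bucket, region, and object key are hypothetical.

```javascript
// Build the URL of an S3 object in virtual-hosted style.
// encodeURIComponent would also escape "/", so encode each path segment separately.
function s3ObjectUrl(bucket, region, key) {
  const path = key.split("/").map(encodeURIComponent).join("/");
  return `https://${bucket}.s3.${region}.amazonaws.com/${path}`;
}

// Keep the small, queryable field in MongoDB and only a link to the bulk data.
function makeRecord(field1, bucket, region, key) {
  return { field1, field2link: s3ObjectUrl(bucket, region, key) };
}

// Hypothetical bucket, region and key
const record = makeRecord(
  ">HWI-ST700660_96:2:1101:1455:2154#5@0/1",
  "my-bucket",
  "us-west-2",
  "fasta/run-96/read-1455.txt",
);
console.log(record.field2link);
// https://my-bucket.s3.us-west-2.amazonaws.com/fasta/run-96/read-1455.txt
```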

Scenario 2: Numerous Small Data Points

If individual datasets are relatively small (under 16 MB) but there are many of them, opt to save the data directly in MongoDB without utilizing GridFS.
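With many small documents, a common pattern is to insert them in fixed-size batches rather than all at once. A minimal sketch, assuming an injected insertMany function so the helper can be tried without a database; with mongoose you could pass (docs) => Model.insertMany(docs). The batch size of 1000 is an arbitrary assumption.

```javascript
// Insert documents in fixed-size batches, one batch at a time.
async function insertInBatches(docs, insertMany, batchSize = 1000) {
  let inserted = 0;
  for (let i = 0; i < docs.length; i += batchSize) {
    const batch = docs.slice(i, i + batchSize);
    await insertMany(batch); // wait for this batch before sending the next
    inserted += batch.length;
  }
  return inserted;
}

// Usage with a fake insertMany that only records the batch sizes
const batchSizes = [];
const fakeInsertMany = async (batch) => { batchSizes.push(batch.length); };
const docs = Array.from({ length: 2500 }, (_, i) => ({ field1: `>read_${i}`, field2: "GAATG" }));

insertInBatches(docs, fakeInsertMany).then((n) => console.log(n, batchSizes));
// 2500 [ 1000, 1000, 500 ]
```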

Data Storage Approaches

With data this large, GridFS is likely to be inefficient.

One benchmarking analysis suggests that retrieval time grows substantially with file size: in a comparable setup, fetching a document from the database could take up to 80 seconds.

Possible Enhancements

The default GridFS chunk size is 255 KiB. To speed up access to large files, consider raising it close to the 16 MB maximum (each chunk is stored as a document, so it must stay below the document size limit). The chunk size is set when you create the GridFS bucket:

new GridFSBucket(db, {chunkSizeBytes: 16000000})
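One way to see the effect of the chunk size is to count how many chunk documents GridFS would store for a file. This is simple arithmetic, not a benchmark: fewer, larger chunks mean fewer documents to fetch, but whether that is faster in practice depends on the setup. The 200 MiB file size is an assumption based on the question.

```javascript
// GridFS default chunk size: 255 KiB (261120 bytes)
const DEFAULT_CHUNK_SIZE = 255 * 1024;

// Number of chunk documents stored for a file of `fileBytes` bytes
function chunkCount(fileBytes, chunkSizeBytes = DEFAULT_CHUNK_SIZE) {
  return Math.ceil(fileBytes / chunkSizeBytes);
}

const fileBytes = 200 * 1024 * 1024; // assumed 200 MiB file
console.log(chunkCount(fileBytes));           // 804
console.log(chunkCount(fileBytes, 16000000)); // 14
```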

Alternatively, and more efficiently, store only the file names in MongoDB and read the corresponding files directly from the filesystem.
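A minimal sketch of the filename approach using Node's fs module. It assumes the record would normally be saved with mongoose; here it is just a local object, and the temporary directory and file name are hypothetical.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Write the bulk data to disk and return a small record pointing at it.
// In a real app this record, not the data, would be stored in MongoDB.
function saveToDisk(dir, name, data) {
  const filePath = path.join(dir, name);
  fs.writeFileSync(filePath, JSON.stringify(data));
  return { filename: name, filePath };
}

// Read the file referenced by a record back into memory.
function loadFromDisk(record) {
  return JSON.parse(fs.readFileSync(record.filePath, "utf8"));
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), "fasta-"));
const record = saveToDisk(dir, "fasta-data.json", [{ field1: ">read_1", field2: "GAATG" }]);
const restored = loadFromDisk(record);
console.log(record.filename, restored.length); // fasta-data.json 1
```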

Additional Considerations

Another source points out a further downside of storing binary data in MongoDB: "If the binary data is extensive, loading it into memory may impede access to frequently used text documents or exceed available RAM capacity, affecting overall database performance."

Illustrative Instance

The following example, adapted from the MongoDB GridFS tutorial, saves a file to GridFS:

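const fs = require('fs');
const { pipeline } = require('stream/promises');
const { MongoClient, GridFSBucket } = require('mongodb');

async function main() {
  // connect() resolves to a MongoClient; GridFSBucket expects a Db, not the client
  const client = await MongoClient.connect('mongodb://localhost:27017');
  const bucket = new GridFSBucket(client.db('test'));

  // Stream the JSON file into GridFS; pipeline rejects if either stream fails
  await pipeline(
    fs.createReadStream('./fasta-data.json'),
    bucket.openUploadStream('fasta-data.json')
  );

  console.log('done!');
  await client.close();
}

main().catch(console.error);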
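To read the file back, GridFSBucket's openDownloadStreamByName('fasta-data.json') returns a Node readable stream, and because the content is opaque it has to be read completely before it can be parsed. A minimal sketch of collecting such a stream into JSON; Readable.from stands in for a real GridFS download stream so the example runs without a database.

```javascript
const { Readable } = require("stream");

// Collect every chunk of a readable stream, then parse the result as JSON.
// Raw buffers are concatenated before decoding so that multi-byte UTF-8
// characters split across chunks stay intact.
async function readJsonStream(stream) {
  const parts = [];
  for await (const chunk of stream) {
    parts.push(Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk));
  }
  return JSON.parse(Buffer.concat(parts).toString("utf8"));
}

// Stand-in for bucket.openDownloadStreamByName('fasta-data.json')
const fakeDownload = Readable.from([
  Buffer.from('[{"field1":">read_1",'),
  Buffer.from('"field2":"GAATG"}]'),
]);

readJsonStream(fakeDownload).then((data) => console.log(data[0].field2)); // GAATG
```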

Answer №2

After exploring other options, I found a more efficient solution than the one described in my question: Mongoose virtuals work remarkably well here!

At first I worried that using forEach to add an extra field to every element of the Fasta file would be slow, but the worry was unfounded: it turned out to be quite fast!

My solution adds a reference to the parent Fasta document to the schema of each element:

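{
  // New field: id of the parent Fasta document. Its name must match the
  // foreignField used by the virtual ("parent").
  parent: { type: mongoose.Schema.Types.ObjectId, ref: "Fasta" },
  field1: String, // e.g. ">HWI-ST700660_96:2:1101:1455:2154#5@0/1"
  field2: String  // e.g. "GAA…..GAATG"
}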
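Adding the parent reference with forEach before inserting the elements can look like this. A minimal sketch, assuming the elements are plain objects that are not yet saved; a string stands in for the parent's ObjectId, and the field name parent matches the virtual's foreignField.

```javascript
// Give every element a reference to its parent Fasta document before insertion.
function attachParent(elements, parentId) {
  elements.forEach((el) => {
    el.parent = parentId; // same name as the virtual's foreignField
  });
  return elements;
}

// Hypothetical parent id and elements
const parentId = "5e93b9b504e75e5310a43f46";
const elements = attachParent(
  [{ field1: ">read_1", field2: "GAATG" }, { field1: ">read_2", field2: "ACGTA" }],
  parentId,
);
console.log(elements.every((el) => el.parent === parentId)); // true
```

The resulting array could then be saved with something like FastaElement.insertMany(elements).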

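Then I define a virtual on the parent Fasta schema. Note that virtuals, including populated ones, are left out of res.json() output unless the schema is created with the option toJSON: { virtuals: true }: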

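FastaSchema.virtual("healthy", {
  ref: "FastaElement",    // model whose documents are populated
  localField: "_id",      // Fasta._id ...
  foreignField: "parent", // ... is matched against FastaElement.parent
  justOne: false,         // return all matching elements as an array
});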

Finally, I use the populate function:

  // Inside an async route handler (Mongoose 7 removed callback support)
  const result = await Fasta.find({ _id: "5e93b9b504e75e5310a43f46" }).populate("healthy");
  res.json(result);
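Conceptually, populating the virtual attaches to each Fasta document every FastaElement whose parent equals that document's _id. The plain-JavaScript sketch below only illustrates that matching; it is not mongoose's implementation, which does the lookup with a database query. The ids and elements are made up.

```javascript
// Illustration of virtual populate: for each parent, collect the children whose
// foreignField equals the parent's localField, and attach them under `as`.
function populateVirtual(parents, children, { localField, foreignField, as }) {
  const byKey = new Map();
  for (const child of children) {
    const key = String(child[foreignField]);
    if (!byKey.has(key)) byKey.set(key, []);
    byKey.get(key).push(child);
  }
  // justOne: false behaviour: always an array, empty when nothing matches
  return parents.map((p) => ({ ...p, [as]: byKey.get(String(p[localField])) || [] }));
}

// Hypothetical parents and elements
const fastas = [{ _id: "f1" }, { _id: "f2" }];
const fastaElements = [
  { parent: "f1", field1: ">read_1" },
  { parent: "f1", field1: ">read_2" },
  { parent: "f2", field1: ">read_3" },
];
const result = populateVirtual(fastas, fastaElements, { localField: "_id", foreignField: "parent", as: "healthy" });
console.log(result.map((f) => f.healthy.length)); // [ 2, 1 ]
```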

This approach avoids the problem of overloading subdocuments. Populating the virtual is fast and does not run into the document size limit. I haven't benchmarked it formally yet, and I am curious how it compares with a conventional populate. One clear advantage is that the ids no longer need to be stored in placeholder documents.

I am amazed by how simple and elegant this solution is; I came across it while answering another question on this site!

Kudos to mongoose for enabling such seamless functionality!
