Timeout Error with LangChain UnstructuredDirectoryLoader

I am having trouble loading a complex PDF of roughly 600 pages that contains tables and figures. With the fast strategy in LangChain JS (in a Next.js project, using the Unstructured API), loading partially works but misses some crucial data. With the hi_res strategy, the request fails with a timeout error. I have tried several different timeout values, but the problem persists. I am willing to wait for the processing to complete, so any help would be greatly appreciated.

ERROR:

error TypeError: fetch failed
    at Object.fetch (node:internal/deps/undici/undici:11576:11)
    at UnstructuredLoader._partition (e:/Web-Development/Developing/Nextjs/projects/gpt4-pdf/node_modules/langchain/dist/document_loaders/fs/unstructured.js:139:26)
    at UnstructuredLoader.load (e:/Web-Development/Developing/Nextjs/projects/gpt4-pdf/node_modules/langchain/dist/document_loaders/fs/unstructured.js:154:26)
    at UnstructuredDirectoryLoader.load (e:/Web-Development/Developing/Nextjs/projects/gpt4-pdf/node_modules/langchain/dist/document_loaders/fs/directory.js:80:40)
    at run (e:\Web-Development\Developing\Nextjs\projects\gpt4-pdf\scripts\ingest.ts:48:21)
    at <anonymous> (e:\Web-Development\Developing\Nextjs\projects\gpt4-pdf\scripts\ingest.ts:78:3) {
cause: HeadersTimeoutError: Headers Timeout Error
    at Timeout.onParserTimeout [as callback] (node:internal/deps/undici/undici:9748:32)
    at Timeout.onTimeout [as _onTimeout] (node:internal/deps/undici/undici:8047:17)
    at listOnTimeout (node:internal/timers:573:17)
    at process.processTimers (node:internal/timers:514:7) {
code: 'UND_ERR_HEADERS_TIMEOUT'
 }
}

The code causing the error:

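import { UnstructuredDirectoryLoader } from "langchain/document_loaders/fs/unstructured";

const options = {
    apiKey: process.env.UNSTRUCTURED_API_KEY,
    strategy: "hi_res",
    timeout: 10000, // tried various values from 10000 to 10000000
};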

const unstructuredLoader = new UnstructuredDirectoryLoader(
  filePath,
  options
);

const rawDocs = await unstructuredLoader.load();

Answer №1

These are the options the loader accepts. Note that timeout is not among them:

export type UnstructuredLoaderOptions = {
    apiKey?: string;
    apiUrl?: string;
    strategy?: StringWithAutocomplete<UnstructuredLoaderStrategy>;
    encoding?: string;
    ocrLanguages?: Array<string>;
    coordinates?: boolean;
    pdfInferTableStructure?: boolean;
    xmlKeepTags?: boolean;
};
type UnstructuredDirectoryLoaderOptions = UnstructuredLoaderOptions & {
    recursive?: boolean;
    unknown?: UnknownHandling;
};
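
As a quick check against the types above, the following sketch lists which keys of an options object are not declared by them. The helper name findUnknownOptionKeys and the placeholder API key are hypothetical and not part of LangChain; the key list is copied from the two types shown above. Assuming the loader only reads the declared fields, a timeout key like the one in the question would have no effect.

```typescript
// Keys copied from the UnstructuredLoaderOptions and
// UnstructuredDirectoryLoaderOptions types shown above.
const KNOWN_OPTION_KEYS = [
  "apiKey",
  "apiUrl",
  "strategy",
  "encoding",
  "ocrLanguages",
  "coordinates",
  "pdfInferTableStructure",
  "xmlKeepTags",
  "recursive",
  "unknown",
] as const;

// Hypothetical helper (not a LangChain API): returns the keys of `options`
// that the types above do not declare.
function findUnknownOptionKeys(options: Record<string, unknown>): string[] {
  const known = new Set<string>(KNOWN_OPTION_KEYS);
  return Object.keys(options).filter((key) => !known.has(key));
}

// Same shape as the options object in the question, with a placeholder key.
const unknownKeys = findUnknownOptionKeys({
  apiKey: "placeholder-key",
  strategy: "hi_res",
  timeout: 10000,
});

console.log(unknownKeys); // prints [ 'timeout' ]
```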

The strategy is selected through this field:

strategy?: StringWithAutocomplete<UnstructuredLoaderStrategy>;

Its allowed values are defined by this type:

type UnstructuredLoaderStrategy = "hi_res" | "fast" | "ocr_only" | "auto"

If 600 pages is too much for hi_res with UnstructuredDirectoryLoader, the recommended choice is the fast strategy. The documentation describes the trade-off:

The Unstructured document loader allows users to specify a strategy parameter that controls how the document is partitioned. The currently supported strategies are "hi_res" (the default) and "fast". The hi_res strategy is more accurate but takes longer to process. The fast strategy partitions the document more quickly, at some cost in accuracy. Not all document types support both strategies; for those that do not, the specified strategy is ignored. In some cases, hi_res may fall back to fast if a required dependency is missing (e.g., a model for document partitioning).
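
Because hi_res takes longer, one option is to recognize the specific failure from the question and retry with fast. The UND_ERR_HEADERS_TIMEOUT code in the trace comes from Node's built-in fetch (undici), wrapped as the cause of the "fetch failed" error. The sketch below is a minimal illustration under those assumptions: isHeadersTimeout and loadWithFallback are hypothetical helper names, and the load callback stands in for whatever code builds a loader with the given strategy and calls load() on it.

```typescript
// Strategy names as listed in the UnstructuredLoaderStrategy type above.
type Strategy = "hi_res" | "fast" | "ocr_only" | "auto";

// Walks the `cause` chain of an error looking for the code seen in the trace.
function isHeadersTimeout(err: unknown): boolean {
  let current: unknown = err;
  // Bound the walk so a cyclic cause chain cannot loop forever.
  for (let depth = 0; depth < 10 && current != null; depth++) {
    if (typeof current !== "object") return false;
    const node = current as { code?: unknown; cause?: unknown };
    if (node.code === "UND_ERR_HEADERS_TIMEOUT") return true;
    current = node.cause;
  }
  return false;
}

// Hypothetical helper: tries each strategy in order and moves on to the next
// one only when the failure is a headers timeout.
async function loadWithFallback<T>(
  load: (strategy: Strategy) => Promise<T>,
  strategies: Strategy[] = ["hi_res", "fast"],
): Promise<T> {
  let lastError: unknown = new Error("no strategies given");
  for (const strategy of strategies) {
    try {
      return await load(strategy);
    } catch (err) {
      if (!isHeadersTimeout(err)) throw err; // unrelated failure: surface it
      lastError = err;
    }
  }
  throw lastError;
}
```

In the question's script, the callback would construct UnstructuredDirectoryLoader with the given strategy and return the result of its load() call. Since fast was reported to miss some data, the fallback trades completeness for a result that arrives.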
