Timeout error with LangChain UnstructuredDirectoryLoader

I'm trying to load a complex PDF of roughly 600 pages, with tables and figures, using the Unstructured API through LangChain.js in a Next.js project. With the "fast" strategy it partly works, but it misses some crucial data. With the "hi_res" strategy the request fails with a timeout error. I have tried several different timeout values, but the problem persists. I don't mind waiting for the process to finish, so any help would be greatly appreciated.

ERROR:

error TypeError: fetch failed
    at Object.fetch (node:internal/deps/undici/undici:11576:11)
    at UnstructuredLoader._partition (e:/Web-Development/Developing/Nextjs/projects/gpt4-pdf/node_modules/langchain/dist/document_loaders/fs/unstructured.js:139:26)
    at UnstructuredLoader.load (e:/Web-Development/Developing/Nextjs/projects/gpt4-pdf/node_modules/langchain/dist/document_loaders/fs/unstructured.js:154:26)
    at UnstructuredDirectoryLoader.load (e:/Web-Development/Developing/Nextjs/projects/gpt4-pdf/node_modules/langchain/dist/document_loaders/fs/directory.js:80:40)
    at run (e:\Web-Development\Developing\Nextjs\projects\gpt4-pdf\scripts\ingest.ts:48:21)
    at <anonymous> (e:\Web-Development\Developing\Nextjs\projects\gpt4-pdf\scripts\ingest.ts:78:3) {
  cause: HeadersTimeoutError: Headers Timeout Error
      at Timeout.onParserTimeout [as callback] (node:internal/deps/undici/undici:9748:32)
      at Timeout.onTimeout [as _onTimeout] (node:internal/deps/undici/undici:8047:17)
      at listOnTimeout (node:internal/timers:573:17)
      at process.processTimers (node:internal/timers:514:7) {
    code: 'UND_ERR_HEADERS_TIMEOUT'
  }
}
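Reading the log: "fetch failed" is a TypeError thrown by Node's built-in fetch (which is implemented by undici), and the actual reason sits in its cause, code UND_ERR_HEADERS_TIMEOUT. That means the server had not yet sent response headers when undici's headers timeout expired (300 seconds by default), so the hi_res job was most likely still running when the client gave up. A minimal sketch for recognizing this specific failure in ingest.ts, for example to log it clearly or fall back to another strategy; the helper name isHeadersTimeout is hypothetical:

```typescript
// Hypothetical helper: detects the undici headers timeout that Node's fetch wraps.
// fetch() rejects with TypeError("fetch failed") and stores the real error in `cause`.
function isHeadersTimeout(err: unknown): boolean {
  if (!(err instanceof Error)) return false;
  const cause = (err as Error & { cause?: unknown }).cause;
  return (
    typeof cause === "object" &&
    cause !== null &&
    (cause as { code?: unknown }).code === "UND_ERR_HEADERS_TIMEOUT"
  );
}

// Possible usage inside an async function (assumes `unstructuredLoader` from the question):
// try {
//   const rawDocs = await unstructuredLoader.load();
// } catch (err) {
//   if (isHeadersTimeout(err)) console.error("Unstructured API did not respond in time");
//   throw err;
// }
```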

The code causing the error:

import { UnstructuredDirectoryLoader } from "langchain/document_loaders/fs/unstructured";

const options = {
  apiKey: process.env.UNSTRUCTURED_API_KEY,
  strategy: "hi_res",
  timeout: 10000, // tried values from 10000 up to 10000000
};

const unstructuredLoader = new UnstructuredDirectoryLoader(filePath, options);

const rawDocs = await unstructuredLoader.load();
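About the timeout field above: judging from the UnstructuredLoaderOptions type in the answer, it is not an option this loader version reads, and the limit that actually fires is the headers timeout of Node's fetch. A minimal sketch of one possible workaround, assuming the undici npm package is installed and that your Node version's fetch uses the global dispatcher set by undici's setGlobalDispatcher() (verify this for your Node and undici versions). The one-hour value is arbitrary, and the hosted Unstructured API may still enforce limits of its own:

```typescript
// Assumption: `npm install undici`. Node's built-in fetch reads the same global
// dispatcher that undici's setGlobalDispatcher() installs, so the new limits apply
// to every fetch call in the process, including the one inside the loader.
import { Agent, getGlobalDispatcher, setGlobalDispatcher } from "undici";

// Arbitrary example value: one hour (undici's defaults are 300 s for both timeouts).
const LONG_TIMEOUT_MS = 60 * 60 * 1000;

const longTimeoutAgent = new Agent({
  headersTimeout: LONG_TIMEOUT_MS, // how long to wait for the response headers
  bodyTimeout: LONG_TIMEOUT_MS, // how long the body may stay idle between chunks
});

// Run once near the top of ingest.ts, before unstructuredLoader.load().
setGlobalDispatcher(longTimeoutAgent);
```

Since this changes a process-wide setting, it is best kept in the ingestion script rather than in code shared with the Next.js server.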

Answer №1

These are the available loader options:

export type UnstructuredLoaderOptions = {
    apiKey?: string;
    apiUrl?: string;
    strategy?: StringWithAutocomplete<UnstructuredLoaderStrategy>;
    encoding?: string;
    ocrLanguages?: Array<string>;
    coordinates?: boolean;
    pdfInferTableStructure?: boolean;
    xmlKeepTags?: boolean;
};
type UnstructuredDirectoryLoaderOptions = UnstructuredLoaderOptions & {
    recursive?: boolean;
    unknown?: UnknownHandling;
};
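In the question, options is an untyped object literal, so TypeScript never checks it against this type and the extra timeout key is passed along without any warning. A small sketch, assuming TypeScript 4.9 or later for the satisfies operator, that copies the loader options type above into the file and lets the compiler flag keys the loader does not declare:

```typescript
// Local copies of the types listed above (assumption: they match your installed langchain version).
type StringWithAutocomplete<T> = T | (string & Record<never, never>);
type UnstructuredLoaderStrategy = "hi_res" | "fast" | "ocr_only" | "auto";

type UnstructuredLoaderOptions = {
  apiKey?: string;
  apiUrl?: string;
  strategy?: StringWithAutocomplete<UnstructuredLoaderStrategy>;
  encoding?: string;
  ocrLanguages?: Array<string>;
  coordinates?: boolean;
  pdfInferTableStructure?: boolean;
  xmlKeepTags?: boolean;
};

// `satisfies` checks the literal against the options type without widening it,
// so unsupported keys are reported at compile time.
const options = {
  apiKey: process.env.UNSTRUCTURED_API_KEY,
  strategy: "hi_res",
} satisfies UnstructuredLoaderOptions;

// The object from the question is rejected; the directive below documents the expected error.
const withTimeout = {
  strategy: "hi_res",
  // @ts-expect-error 'timeout' does not exist in type 'UnstructuredLoaderOptions'
  timeout: 10000,
} satisfies UnstructuredLoaderOptions;
```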

The strategy is selected with this field:

strategy?: StringWithAutocomplete<UnstructuredLoaderStrategy>;

Its allowed values are:

type UnstructuredLoaderStrategy = "hi_res" | "fast" | "ocr_only" | "auto"

If hi_res processing of a 600-page document through UnstructuredDirectoryLoader takes too long, consider using the fast strategy instead. The documentation explains the trade-off:

The Unstructured document loader lets you pass a strategy parameter that tells Unstructured how to partition the document. The currently supported strategies are "hi_res" (the default) and "fast". Hi-res partitioning strategies are more accurate but take longer to process. Fast strategies partition the document more quickly but sacrifice some accuracy. Not all document types have separate hi-res and fast partitioning strategies; for those, the strategy parameter is ignored. In some cases, the hi-res strategy falls back to fast if a dependency is missing (e.g., a model for document partitioning).
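If you need hi_res accuracy for the tables, another option (not part of the original answer) is to split the 600-page PDF into smaller files before loading. UnstructuredDirectoryLoader sends each file in the directory as its own request, so smaller files should each return sooner. A sketch assuming the pdf-lib package; splitPdf, pageChunks, the 50-page chunk size, and the paths are all hypothetical:

```typescript
// Assumption: `npm install pdf-lib`; helper names, paths, and chunk size are hypothetical.
import { mkdir, readFile, writeFile } from "node:fs/promises";
import * as path from "node:path";
import { PDFDocument } from "pdf-lib";

// Groups 0-based page indices into chunks, e.g. (5, 2) -> [[0, 1], [2, 3], [4]].
function pageChunks(pageCount: number, chunkSize: number): number[][] {
  if (!Number.isInteger(chunkSize) || chunkSize < 1) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  const chunks: number[][] = [];
  for (let start = 0; start < pageCount; start += chunkSize) {
    const end = Math.min(start + chunkSize, pageCount);
    chunks.push(Array.from({ length: end - start }, (_, i) => start + i));
  }
  return chunks;
}

// Writes outDir/part-001.pdf, part-002.pdf, ... (at most chunkSize pages each)
// and returns the paths that were written.
async function splitPdf(inputPath: string, outDir: string, chunkSize = 50): Promise<string[]> {
  const source = await PDFDocument.load(await readFile(inputPath));
  await mkdir(outDir, { recursive: true });

  const written: string[] = [];
  const chunks = pageChunks(source.getPageCount(), chunkSize);
  for (let n = 0; n < chunks.length; n++) {
    const part = await PDFDocument.create();
    // copyPages only copies the pages; each one must still be added to the new document.
    const pages = await part.copyPages(source, chunks[n]);
    for (const page of pages) part.addPage(page);

    const outPath = path.join(outDir, `part-${String(n + 1).padStart(3, "0")}.pdf`);
    await writeFile(outPath, await part.save());
    written.push(outPath);
  }
  return written;
}
```

For example, `await splitPdf("docs/big.pdf", "docs/parts")` followed by pointing the loader at docs/parts. Keep the original file out of that directory so it is not sent as well, and note that a table spanning a chunk boundary will be split across two parts.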
