Having difficulty retrieving text data from a web URL using JavaScript

I am trying to extract text data from a PDF file hosted at a web URL.

I have tried two Node modules:

1) Using crawler-request

it('Read Pdf Data using crawler', function () {
    const crawler = require('crawler-request');
    function response_text_size(response) {
        response["size"] = response.text.length;
        return response;
    }
    crawler("http://www.africau.edu/images/default/sample.pdf", response_text_size).then(function (response) {
        // handle response
        console.log("Response =" + response.size);
    });
});

The problem is that nothing is printed to the console, although I expected the response size to appear.
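A likely reason nothing appears (an educated guess, not a confirmed diagnosis): the it callback starts the request but never returns the promise, so the test runner considers the spec finished immediately and may end the run before the .then callback fires. Mocha and recent Jasmine versions wait for a promise that is returned from it. The sketch below keeps the same size-recording callback but returns the promise chain. readPdfSize is a hypothetical helper, and crawl is assumed to behave like the crawler(url, callback) call above, returning a promise that resolves to the response.

```javascript
// Same post-processing idea as in the question: record the extracted text length.
function responseTextSize(response) {
  response.size = response.text.length;
  return response;
}

// Hypothetical helper. `crawl` is assumed to behave like crawler-request:
// crawl(url, callback) returns a Promise that resolves to the response.
function readPdfSize(crawl, url) {
  // Returning the chain (instead of discarding it) lets the caller wait for it.
  return crawl(url, responseTextSize).then((response) => {
    console.log('Response = ' + response.size);
    return response.size;
  });
}

// Usage in a spec (assumes crawler-request is installed and the URL is reachable):
// it('Read Pdf Data using crawler', function () {
//   const crawler = require('crawler-request');
//   return readPdfSize(crawler, 'http://www.africau.edu/images/default/sample.pdf');
// });
```

Returning the promise is the whole fix in this sketch; an async function with await would work the same way.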

2) Using pfd2json/pdfparser

it('Read Data from url', function () {
    var request = require('request');
    var pdf = require('pfd2json/pdfparser');
    var fs = require('fs');
    var pdfUrl = "http://www.africau.edu/images/default/sample.pdf";
    let databuffer = fs.readFileSync(pdfUrl);
    pdf(databuffer).then(function (data) {
        var arr:Array<String> = data.text;
        var n = arr.includes('Thursday 02 May');
        console.log("Print Array " + n);
    });
});
The test fails with: Failed: ENOENT: no such file or directory, open ''

Reading the PDF from a local path works, but reading it from a URL fails.

Answer №1

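The problem is that you are using the fs (File System) module to read a file from a remote server. fs.readFileSync only reads from the local file system: it treats the URL string as a file path, cannot find such a file, and throws ENOENT.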
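To make the difference concrete: the file has to be downloaded into memory first and only then handed to a parser as a Buffer. Here is a minimal sketch using only Node's built-in http/https modules, as an alternative to the request package (which has since been deprecated). downloadBuffer is a hypothetical name, and the sketch assumes the server answers with a plain 200 response; it does not follow redirects.

```javascript
const http = require('http');
const https = require('https');

// Hypothetical helper: download a URL fully into memory and resolve with a Buffer.
// Assumes a direct 200 response; redirects (3xx) are treated as errors, not followed.
function downloadBuffer(url) {
  const client = url.startsWith('https:') ? https : http;
  return new Promise((resolve, reject) => {
    const req = client.get(url, (res) => {
      if (res.statusCode !== 200) {
        res.resume(); // drain the body so the socket is released
        reject(new Error('Request failed with status ' + res.statusCode));
        return;
      }
      const chunks = [];
      res.on('data', (chunk) => chunks.push(chunk)); // chunks are Buffers (no encoding set)
      res.on('end', () => resolve(Buffer.concat(chunks)));
      res.on('error', reject);
    });
    req.on('error', reject); // DNS failures, refused connections, etc.
  });
}

// Usage sketch in place of fs.readFileSync(pdfUrl):
// return downloadBuffer(pdfUrl).then((databuffer) => { /* hand databuffer to a parser */ });
```

The resulting Buffer plays the same role as the one fs.readFileSync returns for a local file, which is why parsing from a local path already works.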

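You also used the pdf2json module incorrectly: the module name is misspelled (pfd2json), and as far as I know pdf2json exposes an event-based parser class rather than a function that returns a promise, so pdf(databuffer).then would likely fail as well.

To fetch the file from the remote location you need an HTTP client such as the request module, so make sure it is installed and imported. Once the file has been downloaded as a Buffer, it can be passed to the parser. Here's one approach: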

it('Read Data from url', function () {
    var request = require('request');
    var PDFParser = require('pdf2json');

    var pdfUrl = 'http://unec.edu.az/application/uploads/2014/12/pdf-sample.pdf';

    // The second argument (1) tells pdf2json to collect the raw text content
    var pdfParser = new PDFParser(this, 1);

    // Executed if there's an error during parsing
    pdfParser.on("pdfParser_dataError", errData => console.error(errData.parserError));
    // Executed when parsing is complete
    pdfParser.on("pdfParser_dataReady", pdfData => console.log(pdfParser.getRawTextContent()));

    // Download the pdf as a Buffer (encoding: null) and pass it to the parser
    request({ url: pdfUrl, encoding: null }, (error, response, body) => {
        if (error) {
            console.error(error); // network failure: there is nothing to parse
            return;
        }
        pdfParser.parseBuffer(body);
    });
});
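As written, the it block returns before the download and parsing finish, so the test runner does not wait for the pdfParser_dataReady event. One way to let a test wait, sketched here as an assumption rather than as part of pdf2json's own API, is to wrap the two events in a Promise that the it block can return. parsePdfText is a hypothetical helper; it relies only on the on, parseBuffer and getRawTextContent calls used in the answer.

```javascript
// Hypothetical helper: turn pdf2json's event-based parsing into a Promise.
// `parser` is assumed to be a pdf2json PDFParser created with needRawText = 1,
// i.e. new PDFParser(this, 1).
function parsePdfText(parser, buffer) {
  return new Promise((resolve, reject) => {
    // Reject on parse failure so the test fails instead of silently passing.
    parser.on('pdfParser_dataError', (errData) => reject(errData.parserError));
    // Resolve with the raw text once parsing has finished.
    parser.on('pdfParser_dataReady', () => resolve(parser.getRawTextContent()));
    parser.parseBuffer(buffer);
  });
}

// Usage sketch (assumes `pdfBuffer` already holds the downloaded PDF bytes):
// return parsePdfText(new PDFParser(this, 1), pdfBuffer)
//   .then((text) => console.log(text.includes('Thursday 02 May')));
```

For the it block to return a single chain, the download step also has to be promise-based; the request callback on its own cannot be returned to the runner.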

With this approach, the remote .pdf file is downloaded and parsed directly in your test, without going through the file system.

To go further, see the pdf2json documentation, which describes how to work with the extracted text content once the parsing process has completed.

Similar questions


Error: Express JS custom module cannot be located in the root directory's modules folder

The file structure of my express js app resembles this: https://i.sstatic.net/57NIQ.png I'm attempting to load a modules folder from the root directory. routes/users.js var express = require('express'); var router = express.Router(); var md ...

Do you have any recommendations for a jQuery plugin that can create a sleek horizontal scrolling image gallery?

Recently, I came across the Smooth div scroll plugin developed by Thomas Kahn, and it fits my requirements perfectly. However, I have encountered a bug that seems to be persisting. The issue arises when both mousewheel scroll and touch scroll are enabled s ...

Leveraging selenium for automating client interactions by enabling camera functionality

As I develop a WebRTC application, the permission to use the camera prompt appears. While I understand that it is not possible to remove this prompt, I am wondering if there is a way to automate the clicking of the allow button on the client's side us ...

What is preventing these AngularJS applications from functioning simultaneously?

I have a fully functioning AngularJS app that I developed as a standalone "CreateUser" widget. Now, I am working on creating a second widget called "ViewUsers," which will display a table of current users (with the intention of connecting them or keeping t ...

Troubleshooting a Peculiar Problem with Form Submission in IE10

Please take a look at the code snippet provided below: <!DOCTYPE html> <html> <body> <form name="editProfileForm" method="post" action="profileDataCollection.do"> <input type="text" id="credit-card" name="credit-card" onfocu ...

What significance does the slash hold in a package name when using require for an npm package?

When we "require" non-local NodeJS modules, what does the slash in the module name signify? For instance: from the GitHub page of the ShellJS npm module (link: https://github.com/shelljs/shelljs#javascript) require('shelljs/global'); requir ...

A guide on extracting/filtering information from JSON files using JavaScript

Is it possible to extract specific data from a JSON file using JavaScript? I have imported a JSON file into my local JavaScript and I am having difficulty in retrieving and displaying certain information. Could someone offer assistance with this? The JS ...

Python is unable to locate the geckodriver within my system's directory

Currently, I am diving into the world of Python and using Al Sweigart's Automate the Boring Stuff with Python as my guide. My current aim is to launch Firefox using a webdriver. Here is the code I executed: from selenium import webdriver No errors ...

Error accessing element - The RemoteWebDriver encountered a 'System.InvalidOperationException' exception

Having trouble with the code below for moving to an element, actionExecutor is throwing an exception. IWebElement elem = driver.FindElement(By.LinkText(link_text)); Actions act = new Actions(driver); act.MoveToElement(elem); act.Build().Perform(); Receive ...

Press the button to reveal the hidden Side Menu as it gracefully slides out

I'm interested in creating a navigation menu similar to the one on m.facebook.com, but with a unique animated slide-out effect from the left side of the website. Here's the flow I have in mind: Click a button > (Menu is hidden by default) Men ...

Conceal the div with ID "en" if the value matches $en

Looking for assistance with language settings on my page. I have a menu where I can select English, Croatian, or German. Below is the code to manage language changes: <?php class home_header_language { protected $_DBconn; ...

Stop users from saving the page in Next.js

I'm currently working on a NextJs project that involves building an editor application. I want to ensure that the editor functionality does not work when users attempt to save the page in a different format, similar to how applications like Youtube an ...

Save room for text that shows up on its own

I am currently dealing with a situation where text appears conditionally and when it does, it causes the rest of the page to be pushed down. Does anyone know the best way to reserve the space for this text even when it's not visible so that I can pre ...

Tips on entering a text field that automatically fills in using Python Selenium

One of the challenges I am facing on my website is an address input text field that gets automatically populated using javascript. Unlike a drop-down field where you can select values or a standard text field where you can manually type in information, thi ...

Having trouble retrieving the keyword property within a Vue.js promise

Struggling with an async validation process in Vue.js where I need to globally access the $axios instance, but encountering failures Validator.extend('async_validate_job_type', { getMessage: field => `The Name already exists`, val ...

React Material UI DataGrid: Error encountered - Unable to access property 'useRef' due to being undefined

The challenge at hand Currently, I am faced with a dilemma while attempting to utilize the React DataGrid. An error in the form of a TypeError: Cannot read property 'useRef' of undefined is appearing in my browser's stack trace. https://i.s ...

Using Nestjs to inject providers into new instances of objects created using the "new" keyword

Is it possible to inject a provider into objects created by using the new keyword? For instance: @Injectable() export class SomeService { } export class SomeObject { @Inject() service: SomeService; } let obj = new SomeObject(); When I try this in my t ...

Learn the process of transferring information through ajax while managing dependent drop-down menus

I have successfully set the initial value from the first combo-box and now I am looking to send the second variable from the second combo-box and receive it in the same PHP file. Below is the Ajax code snippet: $(document).ready(function(){ $(".rutas") ...

Display and conceal elements according to the slider's current value

Currently, I am working on creating a slider that can show and hide elements as the slider bar moves (ui.value). Firstly, I used jQuery to create 30 checkboxes dynamically: var start = 1; $(new Array(30)).each(function () { $('#showChck') ...

Selenium Edge and the macOS Sierra operating system

I am currently using a MacBook running Sierra 10.12.1 along with Selenium 3.0.1 and Nightwatchjs for testing. Chrome and Firefox are working fine, Safari is partially working (having trouble finding CSS Elements), but Internet Explorer is not responding a ...