Is there a minimal PDF.js example that supports text selection?

I'm currently experimenting with PDF.js.

The problem is that the Hello World demo does not support text selection. It draws everything onto a canvas and never creates a text layer. The official PDF.js demo does support text selection, but its code is too complicated for my purposes. Is there a simpler demo that includes the text layer?

Answer №1

I contributed this example to Mozilla's pdf.js repository, where it is available in the examples directory.

The original example I contributed to pdf.js no longer exists, but I believe that this one demonstrates text selection. pdf.js has since been reorganized, and the text-selection logic now lives inside the text layer, which is created by a factory.

Specifically, PDFJS.DefaultTextLayerFactory takes care of setting up the basics of text selection.


Note: The following example is outdated and retained here for historical reference.

I struggled with this issue for 2-3 days before finally resolving it. A demonstration that loads a PDF with text selection enabled is available here.

The difficulty lay in disentangling the text-selection logic from the viewer code (viewer.js, viewer.html, viewer.css). To get it working, I had to extract the relevant code and CSS (the JavaScript file referenced there can also be accessed here). The result is a minimal demo that should prove helpful. For text selection to work properly, the CSS in viewer.css is extremely important, because it styles the divs that are later created to make the text selectable.

The heavy lifting is done by the TextLayerBuilder object, which creates the selection divs. You can see references to this object in viewer.js.
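
To make the role of those selection divs more concrete, here is a rough sketch of the underlying idea: each run of text gets a transparent, absolutely positioned div whose position and font size come from combining the viewport transform with the text item's own transform. This is a simplified illustration, not the actual TextLayerBuilder code. The helpers `multiplyTransforms` and `textItemToStyle` are hypothetical, the item shape (`str`, `transform`) is assumed to match what newer pdf.js builds return from getTextContent(), and the font ascent is approximated by the full font height.

```javascript
// Simplified sketch of text-layer positioning; NOT the real TextLayerBuilder.

// Multiply two affine transforms given as [a, b, c, d, e, f] arrays.
function multiplyTransforms(m1, m2) {
    return [
        m1[0] * m2[0] + m1[2] * m2[1],
        m1[1] * m2[0] + m1[3] * m2[1],
        m1[0] * m2[2] + m1[2] * m2[3],
        m1[1] * m2[2] + m1[3] * m2[3],
        m1[0] * m2[4] + m1[2] * m2[5] + m1[4],
        m1[1] * m2[4] + m1[3] * m2[5] + m1[5]
    ];
}

// Hypothetical helper: compute the CSS for one text div.
function textItemToStyle(item, viewportTransform) {
    var tx = multiplyTransforms(viewportTransform, item.transform);
    // The length of the transformed vertical axis gives the rendered font height.
    var fontHeight = Math.sqrt(tx[2] * tx[2] + tx[3] * tx[3]);
    return {
        left: tx[4] + "px",
        // Assumption: treat the ascent as the full font height.
        top: (tx[5] - fontHeight) + "px",
        fontSize: fontHeight + "px"
    };
}

// Example: a 792pt-tall page at scale 1. PDF space has its origin at the
// bottom-left, so the viewport transform flips the y axis.
var viewportTransform = [1, 0, 0, -1, 0, 792];
var item = { str: "Hello", transform: [12, 0, 0, 12, 72, 720] };
var style = textItemToStyle(item, viewportTransform);
console.log(style);
```

The CSS shown further down (`color: transparent`, `position: absolute`) is what turns divs positioned this way into an invisible but selectable overlay on top of the canvas.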

The code and CSS are below. Keep in mind that you still need the pdf.js file. My fiddle links to a version that I built from Mozilla's GitHub repo for pdf.js. I didn't want to link to the repo's version directly, because it is under constant development and may break.

HTML:

<html>
    <head>
        <title>Basic pdf.js text-selection showcase</title>
    </head>

    <body>
        <div id="pdfContainer" class = "pdf-content">
        </div>
    </body>
</html>

CSS:

.pdf-content {
    border: 1px solid #000000;
}

/* CSS classes utilized by TextLayerBuilder to stylize the text layer divs */

/* Important: without these rules, the text inside the divs becomes visible when it is selected */
::selection { background:rgba(0,0,255,0.3); }
::-moz-selection { background:rgba(0,0,255,0.3); }

.textLayer {
    position: absolute;
    left: 0;
    top: 0;
    right: 0;
    bottom: 0;
    color: #000;
    font-family: sans-serif;
    overflow: hidden;
}

.textLayer > div {
    color: transparent;
    position: absolute;
    line-height: 1;
    white-space: pre;
    cursor: text;
}

.textLayer .highlight {
    margin: -1px;
    padding: 1px;

    background-color: rgba(180, 0, 170, 0.2);
    border-radius: 4px;
}

.textLayer .highlight.begin {
    border-radius: 4px 0px 0px 4px;
}

.textLayer .highlight.end {
    border-radius: 0px 4px 4px 0px;
}

.textLayer .highlight.middle {
    border-radius: 0px;
}

.textLayer .highlight.selected {
    background-color: rgba(0, 100, 0, 0.2);
}

JavaScript:

//Minimal PDF rendering and text-selection demo using pdf.js, by Vivin Suresh Paliath (http://vivin.net)
//This fiddle uses a built version of pdf.js that contains all the modules it requires.
//
//For simplicity, the PDF is not fetched from an external source; its data is embedded in this script instead.
//
//Getting text selection to work was tricky because the selection logic is heavily intertwined with viewer.html and viewer.js.
//I extracted the relevant parts into a separate file that implements only text selection. The key component is
//TextLayerBuilder, which creates the text-selection divs; it is added to this fiddle as an external resource.
//
//The demo renders a single-page PDF. It can be extended to handle more pages; the focus here is on text selection.
//The included CSS also matters, because it styles the overlay divs used for selecting text.
//
//The PDF document rendered here, for reference:
//http://vivin.net/pub/pdfjs/TestDocument.pdf

var pdfBase64 = "..."; //contains base64 representing the PDF

var scale = 1; //Set zoom factor as required.

/**
 * Converts a base64 string into a Uint8Array
 */
function base64ToUint8Array(base64) {
    var raw = atob(base64);
    var uint8Array = new Uint8Array(raw.length);
    for (var i = 0; i < raw.length; i++) {
        uint8Array[i] = raw.charCodeAt(i);
    }

    return uint8Array;
}

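function loadPdf(pdfData) {
    //Run pdf.js on the main thread instead of in a web worker
    PDFJS.disableWorker = true;

    //getDocument returns a promise that resolves with the loaded document
    var pdf = PDFJS.getDocument(pdfData);
    pdf.then(renderPdf);
}

function renderPdf(pdf) {
    //This demo renders only the first page
    pdf.getPage(1).then(renderPage);
}

function renderPage(page) {
    var viewport = page.getViewport(scale);
    var $canvas = jQuery("<canvas></canvas>");

    //Get the underlying DOM canvas and its 2D drawing context
    var canvas = $canvas.get(0);
    var context = canvas.getContext("2d");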
    canvas.height = viewport.heigh...
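
The base64 helper above can be sanity-checked on its own before any PDF data is handed to pdf.js. The sketch below repeats the helper so that it is self-contained and adds a hypothetical `looksLikePdf` check (not part of the original answer) that looks for the `%PDF-` signature at the start of the decoded bytes. It assumes a global `atob`, which browsers and recent Node.js versions provide.

```javascript
// Same conversion as in the answer: base64 string -> Uint8Array.
function base64ToUint8Array(base64) {
    var raw = atob(base64);
    var bytes = new Uint8Array(raw.length);
    for (var i = 0; i < raw.length; i++) {
        bytes[i] = raw.charCodeAt(i);
    }
    return bytes;
}

// Hypothetical helper: true when the bytes begin with the "%PDF-" signature.
function looksLikePdf(bytes) {
    var signature = "%PDF-";
    if (bytes.length < signature.length) {
        return false;
    }
    for (var i = 0; i < signature.length; i++) {
        if (bytes[i] !== signature.charCodeAt(i)) {
            return false;
        }
    }
    return true;
}

// "JVBERi0xLjQ=" is the base64 encoding of the 8-character header "%PDF-1.4".
var headerBytes = base64ToUint8Array("JVBERi0xLjQ=");
console.log(headerBytes.length, looksLikePdf(headerBytes));
```

A check like this catches a truncated or mis-pasted base64 string early, which is easier to debug than a failure inside the PDF loader.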

Answer №2

This question and its answer are fairly old, so you'll need to make some adjustments to get the code working with more recent versions of PDF.js. Here's a helpful solution:

http://www.example.com/convert-pdf-to-html-canvas-using-pdf-js

Below is the code they used. First, include the following CSS and scripts from the PDF.js library:

<link rel="stylesheet" href="pdf.js/web/text_layer_builder.css" />
<script src="pdf.js/web/ui_utils.js"></script>
<script src="pdf.js/web/text_layer_builder.js"></script>

Then use this code to load the PDF:

PDFJS.getDocument("example.pdf").then(function(pdf){
    var page_num = 1;
    pdf.getPage(page_num).then(function(page){
        var scale = 1.5;
        var viewport = page.getViewport(scale);

        // Size the canvas to match the page viewport
        var canvas = $('#the-canvas')[0];
        var context = canvas.getContext('2d');
        canvas.height = viewport.height;
        canvas.width = viewport.width;

        // Place the text layer div exactly over the canvas
        var canvasOffset = $(canvas).offset();
        var $textLayerDiv = $('#text-layer').css({
            height : viewport.height+'px',
            width : viewport.width+'px',
            top : canvasOffset.top,
            left : canvasOffset.left
        });

        // Draw the page onto the canvas
        page.render({
            canvasContext : context,
            viewport : viewport
        });

        // Build the selectable text layer from the page's text content
        page.getTextContent().then(function(textContent){
            console.log( textContent );
            var textLayer = new TextLayerBuilder({
                textLayerDiv : $textLayerDiv.get(0),
                pageIndex : page_num - 1,
                viewport : viewport
            });

            textLayer.setTextContent(textContent);
            textLayer.render();
        });
    });
});
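
In more recent builds the call shapes differ slightly from the snippet above: getDocument() returns a loading task whose `.promise` resolves to the document, getViewport() takes an options object such as `{ scale: 1.5 }`, and render() returns a task that exposes a `.promise`. The following is a hedged sketch of the same flow under those assumptions. The library object is passed in as `pdfjsLib`, and `buildTextLayer` is a hypothetical callback standing in for whichever text-layer builder your version ships, so nothing here depends on a particular builder API.

```javascript
// Sketch only: `renderPageWithTextLayer` and `buildTextLayer` are illustrative
// names, not part of pdf.js. `pdfjsLib` is whatever object your build exposes.
function renderPageWithTextLayer(pdfjsLib, source, pageNumber, scale, canvas, buildTextLayer) {
    return pdfjsLib.getDocument(source).promise
        .then(function (pdf) {
            return pdf.getPage(pageNumber);
        })
        .then(function (page) {
            // Newer builds expect an options object here.
            var viewport = page.getViewport({ scale: scale });
            canvas.width = viewport.width;
            canvas.height = viewport.height;

            // render() returns a task; its promise settles when drawing is done.
            var renderDone = page.render({
                canvasContext: canvas.getContext("2d"),
                viewport: viewport
            }).promise;

            // Wait for both the canvas drawing and the text content.
            return Promise.all([renderDone, page.getTextContent()])
                .then(function (results) {
                    // Hand the text content to the caller's text-layer builder.
                    buildTextLayer(results[1], viewport, pageNumber - 1);
                    return viewport;
                });
        });
}
```

Waiting for both promises before building the text layer keeps the overlay from appearing before the page itself has been drawn.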

Answer №3

If you want to display every page of a PDF document, each on its own page, while still allowing text selection, there are two options:

  1. Use a PDF viewer.
  2. Use a canvas together with a renderer that parses the text and places it over the canvas so that it behaves like selectable text.

In practice, however, adding features such as zooming in or out on the canvas can significantly hurt browser performance.

The complete code is available at: https://github.com/explorethis/simpleChatApp/tree/master/pdfViewer
