What is the best method for executing an HTTP request to retrieve the complete source page if certain aspects are loaded through JavaScript?

Question

What is the best method for executing an HTTP request to retrieve the complete source page if certain aspects are loaded through JavaScript?

I am trying to retrieve the HTML webpage from . However, a portion of the HTML file is loaded through JavaScript. When using HTTP.jl to fetch the webpage with HTTP.request(), I only receive the part of the HTML file that was loaded before the execution of JavaScript. As a result, the webpage obtained differs from what Chrome displays. How can I obtain the webpage identical to what Chrome renders? Is it necessary to utilize WebDriver.jl, which serves as a wrapper for Selenium WebDriver's Python bindings?

The snippet of my source code:

function get_page(w::word)::Bool
    response = nothing
    try
        response = HTTP.request("GET", "https://www.collinsdictionary.com/dictionary/$(dictionary)/$(w.org_word)",
                                                 connect_timeout=connect_timeout, readtimeout=readtimeout, retries=retries, redirect=true,proxy=proxy)
    catch e
        push!(w.err_log, [get_page_http_err, string(e)])
        return falses
    end
    open("./assets/org_page.html", "w") do f 
        write(f, String(response.body))
    end
    return true
end

dictionary and w.org_word are both of type String, and the function resides within a module.

javascript selenium http web-scraping julia

Answer 1

Answer №1

To achieve your desired outcome, using only HTTP.jl is not sufficient. Running the JavaScript portion of the page requires a JavaScript engine, which is a more complex task.

It's important to note that this challenge is not unique to Julia's HTTP capabilities. Python requests.get(url) returning javascript code instead of the page html

(Interestingly, the standard library in Python has recently added JavaScript rendering capability to its request module)

Answer 2

To achieve your desired outcome, using only HTTP.jl is not sufficient. Running the JavaScript portion of the page requires a JavaScript engine, which is a more complex task.

It's important to note that this challenge is not unique to Julia's HTTP capabilities. Python requests.get(url) returning javascript code instead of the page html

(Interestingly, the standard library in Python has recently added JavaScript rendering capability to its request module)

What is the best method for executing an HTTP request to retrieve the complete source page if certain aspects are loaded through JavaScript?

Answer №1

Similar questions

Stopping the parent onclick event from propagating to a child element within a bootstrap modal

How to activate a function or event upon closing a browser tab with JavaScript

The authorization header for jwt is absent

Does expressJS use the express() function as a global function?

What is the most efficient way to cycle through HTML image elements and display a unique random image for each one?

Steps for automatically toggling a button within a button-group when the page is loaded using jQuery in Bootstrap 3

Code has been loaded successfully for react-loadable Chunks, however it has not been

Scale the cylinder in Three.js from a specific point

Transferring Text from PDF to Clipboard using Selenium

Issues arising from TypeScript error regarding the absence of a property on an object

What is a more efficient method for verifying the value of an object within an array that is nested within another object in JavaScript?

Just a quick question about using AJAX

Mongoose: efficiently fetching the query response

Common issues encountered when using the app.get() function in Node.js

Disabling Navigation with Arrow Keys in Prettyphoto

Is there a way to personalize the appearance of a specific page title in my navigation menu?

JavaScript's functionality akin to PHP's exec() function

Using jQuery to find duplicated records within two JSON object arrays

How can I retrieve information on a logged in Auth0 user from an API?

Discovering the most efficient route on a looped set of items