Recursively mirroring a website after its JavaScript has executed

My goal is to recursively mirror a website, that is, to retrieve every page on it. Since all the pages are located in subfolders of one main folder, I thought I could easily accomplish this using wget:

wget --mirror --recursive --page-requisites --adjust-extension --no-parent --convert-links https://www.example.com/

The issue I encountered is that each page gets mirrored before certain JavaScript scripts have run, so what those scripts produce is missing from the mirror. The scripts are important because they modify the page's Document Object Model (DOM), so I need a way to capture their effect in the mirror. Alternatively, I could wait for each page to finish loading and then mirror the fully loaded page (the timing isn't critical).

I have tried mirroring the site with PhantomJS, but PhantomJS does not seem to support recursion, or at least I haven't been able to figure out how to do it. I also consulted the wget manual, but couldn't find any options that fit my requirements.

Is there a way to achieve what I'm looking for? Any suggestions would be greatly appreciated. Thank you.

Answer №1

wget does not execute JavaScript. To work with dynamic content, you may want to consider a rendering tool such as Splash. I have used Splash alongside Scrapy spiders, but not in conjunction with wget. It could be worth experimenting with.
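As a rough illustration of the idea, here is a minimal Python sketch of a recursive crawl that asks Splash for the rendered HTML instead of fetching pages directly. It assumes a Splash instance is running locally on port 8050 and uses its `render.html` endpoint with the `url` and `wait` parameters. The helper names (`splash_url`, `fetch_rendered`, `links_under`, `crawl`), the 2-second wait and the page limit are all made up for this example. It only follows `<a href>` links that stay under the start URL (roughly what `--no-parent` does) and keeps the pages in memory; saving files, downloading page requisites and converting links are left out.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin, urldefrag
from urllib.request import urlopen

# Assumption: a Splash instance is listening locally on its default port.
SPLASH = "http://localhost:8050/render.html"


def splash_url(page_url, wait=2.0):
    """Build a render.html request that waits `wait` seconds for scripts to run."""
    return SPLASH + "?" + urlencode({"url": page_url, "wait": wait})


def fetch_rendered(page_url):
    """Return the HTML of page_url after Splash has executed its JavaScript."""
    with urlopen(splash_url(page_url)) as resp:
        return resp.read().decode("utf-8", errors="replace")


class LinkParser(HTMLParser):
    """Collect the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(v for k, v in attrs if k == "href" and v)


def links_under(html, page_url, root):
    """Absolute, fragment-free links in html that stay below root."""
    parser = LinkParser()
    parser.feed(html)
    absolute = (urldefrag(urljoin(page_url, h)).url for h in parser.hrefs)
    return sorted({u for u in absolute if u.startswith(root)})


def crawl(root, fetch=fetch_rendered, limit=100):
    """Breadth-first crawl from root; returns {url: rendered_html}."""
    pages, queue = {}, deque([root])
    while queue and len(pages) < limit:
        url = queue.popleft()
        if url in pages:
            continue
        pages[url] = fetch(url)
        queue.extend(u for u in links_under(pages[url], url, root) if u not in pages)
    return pages
```

Calling `crawl("https://www.example.com/")` would then return a dictionary mapping each visited URL to its rendered HTML, which you could write to disk yourself. The `fetch` argument is injectable so the crawl logic can be tried without a running Splash instance.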
