Extract images from dynamically loaded JavaScript website

I'm attempting to extract images from a webpage that is rendered using JS, but the picture links in the source code are incomplete. Here is where the images are located:

<script language="javascript" type="text/javascript">
</script>
<div id="ImagesSection" class="ImagesSection">
<div id='HybridImageViewPrimaryImageDiv'>
<a href='/ItemImages/000450/18190933_1_lg.jpeg'  class="MagicZoom" data-options="  zoomMode:off; cssClass: dark-bg; zoomOn: click"  title='Multi-Faced Doll By Cark Bergner.' id="xxxyyyzzz"     ><img id='fullimage' src='/ItemImages/000450/18190933_1_med.jpeg'  alt='Multi-Faced Doll By Cark Bergner.' /></a>
</div>
<div style="margin-top:15px;width:300px;"> <button class="cfg-btn" onclick="MagicZoom.prev('xxxyyyzzz');return false;">Prev</button> <button class="cfg-btn" onclick="MagicZoom.next('xxxyyyzzz') ;return false;">Next</button>
</div><div style="margin-top:15px;" width="350px" >
 <a data-zoom-id="xxxyyyzzz" href="/ItemImages/000450/18190933_1_lg.jpeg"    data-image="/ItemImages/000450/18190933_1_med.jpeg"       >  <img    src="/ItemImages/000450/18190933_1_sm.jpeg"  height="60px"   />  </a>   
 <a data-zoom-id="xxxyyyzzz" href="/ItemImages/000450/18190933_2_lg.jpeg"    data-image="/ItemImages/000450/18190933_2_med.jpeg"       >  <img    src="/ItemImages/000450/18190933_2_sm.jpeg"  height="60px"   />  </a>   
 <a data-zoom-id="xxxyyyzzz" href="/ItemImages/000450/18190933_3_lg.jpeg"    data-image="/ItemImages/000450/18190933_3_med.jpeg"       >  <img    src="/ItemImages/000450/18190933_3_sm.jpeg"  height="60px"   />  </a>   
 <a data-zoom-id="xxxyyyzzz" href="/ItemImages/000450/18190933_4_lg.jpeg"    data-image="/ItemImages/000450/18190933_4_med.jpeg"       >  <img    src="/ItemImages/000450/18190933_4_sm.jpeg"  height="60px"   />  </a>   
 <a data-zoom-id="xxxyyyzzz" href="/ItemImages/000450/18190933_5_lg.jpeg"    data-image="/ItemImages/000450/18190933_5_med.jpeg"       >  <img    src="/ItemImages/000450/18190933_5_sm.jpeg"  height="60px"   />  </a>   
</div>
</div>

I am only interested in extracting the following images:

/ItemImages/000450/18190933_1_sm.jpeg
/ItemImages/000450/18190933_2_sm.jpeg
/ItemImages/000450/18190933_3_sm.jpeg
/ItemImages/000450/18190933_4_sm.jpeg
/ItemImages/000450/18190933_5_sm.jpeg

Here is the code I am using:

import os
import shutil
import time
import requests
from bs4 import BeautifulSoup as bSoup
from selenium import webdriver

url = "https://auctions.morphyauctions.com/French_Fashion_Doll_with_Unusual_Body_-LOT450029.aspx"

driver = webdriver.Chrome(executable_path="/mypath/")

driver.get(url)

iterations = 0
while iterations <10:
    html = driver.execute_script("return document.documentElement.outerHTML")
    sel_soup = bSoup(html, 'html.parser')
    print (sel_soup.findAll('img'))
    images = []
    for i in sel_soup.findAll('img'):
        src = i['src']
        images.append(src)
    print(images)
    current_path = os.getcwd()
    for img in images:
        try:
            file_name = os.path.basename(img)
            img_r = requests.get(img, stream=True)
            new_path = os.path.join(current_path, 'images', file_name)
            with open(new_path, 'wb') as output_file:
                shutil.copyfileobj(img_r.raw, output_file)
            del img_r
        except:
            pass
    iterations +=1
    time.sleep(5)

When running this code, no images are saved. Any assistance would be greatly appreciated.

Answer №1

Selenium is unnecessary here: the image markup is already present in the HTML the server returns and is not generated by JavaScript. You can fetch the page with requests and use BeautifulSoup together with re.compile to find every a element whose href begins with /ItemImages/.

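Note that these URLs are relative, so you need to prepend the domain to each one before downloading it. requests.get raises an error for a URL without a scheme, and the bare except in the question's code silently swallows that error.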
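As a small illustration of resolving a relative path, here is a minimal sketch that uses the standard library's urllib.parse.urljoin instead of string concatenation. This is a swapped-in alternative, not what the answer's code does; the variable names are illustrative, and the path is taken from the HTML in the question.

```python
from urllib.parse import urljoin

# Page URL from the question; relative image path copied from its HTML.
page_url = "https://auctions.morphyauctions.com/French_Fashion_Doll_with_Unusual_Body_-LOT450029.aspx"
relative_src = "/ItemImages/000450/18190933_1_sm.jpeg"

# urljoin resolves the path against the page URL the way a browser would.
absolute_src = urljoin(page_url, relative_src)
print(absolute_src)
```

The answer's own code, which follows, simply concatenates the domain and the href; that works here because every href starts with a slash.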

import os
import re

import requests
from bs4 import BeautifulSoup

base_url = 'https://auctions.morphyauctions.com'
url = base_url + "/French_Fashion_Doll_with_Unusual_Body_-LOT450029.aspx"

html = requests.get(url).text
sel_soup = BeautifulSoup(html, 'html.parser')

# Collect every <a> whose href starts with /ItemImages/
images = []
for a in sel_soup.find_all('a', href=re.compile(r'^/ItemImages/')):
    ahref = base_url + a['href']  # the href is relative, so prepend the domain
    images.append(ahref)
print(images)

# Make sure the output folder exists before writing into it
image_dir = os.path.join(os.getcwd(), 'images')
os.makedirs(image_dir, exist_ok=True)
for img in images:
    try:
        file_name = os.path.basename(img)
        img_r = requests.get(img)
        new_path = os.path.join(image_dir, file_name)
        with open(new_path, 'wb') as output_file:
            output_file.write(img_r.content)
    except Exception as ex:
        print(ex)
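
The question asks specifically for the _sm thumbnails, which sit in the src attributes of the img elements rather than in the href values of the a elements. A minimal sketch of selecting only those, assuming the markup matches the excerpt in the question: SAMPLE_HTML is a trimmed copy of that excerpt, and thumbnail_urls is a hypothetical helper name, not part of any library.

```python
import re
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://auctions.morphyauctions.com"

# Trimmed copy of the markup from the question (two of the five thumbnails).
SAMPLE_HTML = """
<div id="ImagesSection" class="ImagesSection">
<a href='/ItemImages/000450/18190933_1_lg.jpeg' class="MagicZoom"><img id='fullimage' src='/ItemImages/000450/18190933_1_med.jpeg' /></a>
<a data-zoom-id="xxxyyyzzz" href="/ItemImages/000450/18190933_1_lg.jpeg"><img src="/ItemImages/000450/18190933_1_sm.jpeg" height="60px" /></a>
<a data-zoom-id="xxxyyyzzz" href="/ItemImages/000450/18190933_2_lg.jpeg"><img src="/ItemImages/000450/18190933_2_sm.jpeg" height="60px" /></a>
</div>
"""


def thumbnail_urls(html, base_url=BASE_URL):
    """Return absolute URLs of every <img> whose src ends in _sm.jpeg."""
    soup = BeautifulSoup(html, "html.parser")
    # Match only the small variants; the _med and _lg images are skipped.
    pattern = re.compile(r"^/ItemImages/.*_sm\.jpeg$")
    return [urljoin(base_url, img["src"]) for img in soup.find_all("img", src=pattern)]


print(thumbnail_urls(SAMPLE_HTML))
```

On the live page, the html string would come from requests.get(url).text as in the code above, and the resulting list can be fed to the same download loop.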
