What is the best way to extract information from a button that does not provide a response?

Trying to extract data from the site . There's a "next" button that needs to be clicked in order to scrape the contents. However, I'm facing difficulty in identifying the correct xpath or css selector for this button which is preventing me from progressing with the scraping process. Any assistance would be greatly appreciated as I'm currently stuck at this point. Below is the code snippet I have been working with, but it's not yielding the desired outcomes.

# -*- coding: utf-8 -*-

import scrapy import scrapy_selenium from scrapy_selenium import SeleniumRequest

class VisionSpider(scrapy.Spider): name = 'vision'

def start_requests(self):
    yield SeleniumRequest(
        url= 'https://tonaton.com',
        wait_time=3,
        screenshot=True,
        callback=self.parse
    )


def parse(self, response): 
    businesses = response.xpath("//a[@class='link--1t8hM gtm-home-category-link-click']")
    for business in businesses:
        link = business.xpath(".//@href").get()
        category = business.xpath(".//div[2]/p/text()").get()

        yield response.follow(url=link, callback=self.parse_business, meta={'business_category': category})


def parse_business(self, response):
    
    category = response.request.meta['business_category']
    rows = response.xpath("//a[@class='card-link--3ssYv gtm-ad-item']")
    for row in rows:
        new_link = row.xpath(".//@href").get()

        yield response.follow(url=new_link, callback=self.next_parse, meta={'business_category': category})

    next_page = response.xpath("//div[@class = 'action-button--1O8tU']")
    if next_page:
        button = next_page.click()
        yield SeleniumRequest(
            url=button,
            wait_time=3,
            callback=self.parse
        )



def next_parse(self, response):
    category = response.request.meta['business_category']
    lines = response.xpath("//a[@class='member-link--IzDly gtm-visit-shop']")
    for line in lines:
        next_link = line.xpath(".//@href").get()

        yield response.follow(url=next_link, callback=self.another_parse, meta={'business_category': category})

def another_parse(self, response):
    category = response.request.meta['business_category']
    button = response.xpath("//button[@class = 'contact-section--1qlvP gtm-show-number']").click()
    
    yield response.follow(url=button, callback=self.new_parse, meta={'business_category': category})


def new_parse(self, response):
    category = response.request.meta['business_category']
    times = response.xpath("//div[@class='info-container--3pMhK']")
    for time in times:
        name = time.xpath(".//div/span/text()").get()
        location = time.xpath(".//div/div/div/span/text()").get()
        phone = time.xpath(".//div[3]/div/button/div[2]/div/text()").get()

        yield {
            'business_category': category,
            'business_name': name,
            'phone': phone,
            'location': location
        }

Answer №1

Despite attempting to implement changes, I am still unable to get the pagination functioning properly. Additionally, the process of clicking the call button for scraping purposes is taking longer than expected. Is there a method to enhance the speed of this operation?

class VisionSpider(scrapy.Spider):
    name = 'vision'
    main_domains = ['tonaton.com']
    start_urls =['https://tonaton.com']

def parse(self, response):   
    businesses = response.xpath("//a[@class='link--1t8hM gtm-home-category-link-click'][1]")
    for business in businesses:
        link = business.xpath(".//@href").get()
        category = business.xpath(".//div[2]/p/text()").get()

        yield response.follow(url=link, callback=self.parse_business, meta={'business_category': category})


def parse_business(self, response):
    category = response.request.meta['business_category']
    rows = response.xpath("//a[@class='card-link--3ssYv gtm-ad-item']")
    for row in rows:
        new_link = row.xpath(".//@href").get()
        if new_link:

            yield response.follow(url=new_link, callback=self.new_parse, meta={'business_category': category, 'newlink':new_link})

    chrome_options = Options()
    chrome_options.add_argument("--headless")

    chrome_path = which("chromedriver")
    driver = webdriver.Chrome(options=chrome_options, executable_path=chrome_path)
    driver.get(response.url)
    driver.maximize_window

    next_page = wait(driver, 300).until(EC.frame_to_be_available_and_switch_to_it((By.XPATH, "//div[@class='icon--3D09z extra-small--_AIuZ arrow-right--17oRn']"))) 
    if  next_page:
        next_page.click()

        yield SeleniumRequest(callback=self.parse_business)
    
    driver.close()



def new_parse(self, response):
    category = response.request.meta['business_category']
    chrome_options = Options()
    chrome_options.add_argument("--headless")
# options=chrome_options
    chrome_path = which("chromedriver")  
    driver = webdriver.Chrome(options=chrome_options, executable_path=chrome_path)
    driver.get(response.url)
    driver.maximize_window
    category = response.request.meta['business_category']

    call_button = wait(driver, 500).until(EC.element_to_be_clickable((By.XPATH, "//div[@class='call-button--3uvWj']")))
    call_button.click()
    
    html = driver.page_source
    resp = Selector(text=html)

    driver.close()

    contacts = resp.xpath("//div[@class='call-button--3uvWj']/div[1]")
    for contact in contacts:
        phone = contact.xpath(".//text()").get()
    times = resp.xpath("//div[@class='details-section--2ggRy']")
    for time in times:
        name = time.xpath(".//div[2]/div/div[2]/div/div/div/div/div/div/div/div/div/text()").get()
        if name is None:
            name =time.xpath(".//div[2]/div/div[2]/div/div/div/div/div/div/div/div/div/text()").get()

        location = time.xpath(".//div/div/div/span/a/span/text()[1]").get()
        region = time.xpath(".//div/div/div/span/a[2]/span/text()").get()

        yield {
            'business_category': category,
            'business_name': name,
            'phone': phone,
            'region':region,
            'location': location
        }

Similar questions

If you have not found the answer to your question or you are interested in this topic, then look at other similar questions below or use the search

Sending a `refresh` to a Context

I'm struggling to pass a refetch function from a useQuery() hook into a context in order to call it within the context. I've been facing issues with type mismatches, and sometimes the app crashes with an error saying that refetch() is not a funct ...

Issue an alert and refresh the webpage when a file extension upload script is detected

Here is the JavaScript code I'm using to restrict the file type extension during uploads: function TestFileType( fileName, fileTypes ) { if (!fileName) return; dots = fileName.split(".") //get the part AFTER the LAST period. fileType = "." + dots[do ...

Tips for transferring directive parameters to another directive

I could really use your assistance. I am trying to incorporate my custom directive with parameters in the following way: <p customImage URL_PARAM="some_picture.jpg" FIG_PARAM="Figure 1."></p> The goal is to utilize these directive parameters ...

Incorporate and interpret a custom JSON object within my Shopify Liquid theme

In an attempt to integrate custom data into my Shopify liquid theme, I stored a JSON file in the assets folder. My goal is to retrieve and parse this JSON object within a jQuery method from a JavaScript file also located in the assets folder. Despite try ...

Issue NG0203 encountered during material import attempt

I've been encountering an issue with importing material. Despite using code similar to the examples on material.angular.io, I keep running into the ""inject() must be called from an injection context..." error. My goal is to create a simple table ...

React State not refreshing

Currently tackling a challenging e-commerce project and facing an obstacle with the following component: import React, { useEffect, useState } from 'react'; const Cart = () => { let [carts, setCarts] = useState([]); let [price, se ...

Managing arrayBuffer in hapi.js: A Comprehensive Guide

I am struggling to upload an arrayBuffer to my server and save it to a file. On the client side, I am using axios, and on the server side, I have implemented Hapi js. However, I am facing difficulties in extracting data from the request in the Hapi handler ...

Cloning a repository using the GitHub API via client-side Javascript and OAuth.io

I am currently developing client-side Javascript code that will fork a specified GitHub repository. To achieve this, I am utilizing the OAuth.io service to obtain an OAuth token with the necessary API scopes of "public_repo" and "repo". For accessing the ...

Leverage i18next in an offline setting without the need for a web

I have developed a javascript application that works offline. Now, my clients are requesting me to translate it into another language. My idea is to utilize i18next for this task, but the hurdle lies in the fact that it requires a JSON file. Firebug indic ...

"Troubleshooting: jQuery Find function not functioning correctly with HTML template

I am having trouble with a Shopify liquid code that I am trying to load into an HTML template <script type="text/template" id="description"> <div class="product-ddd"> {{ product.description }} </div> ...

Load charts.js synchronously into a div using XMLHttpRequest

At the moment, there is a menu displayed on the left side of the page. When you click on the navigation links, the page content loads using the code snippet below: if (this.id == "view-charts") { $("#rightContainer").load("view-charts.php"); $(thi ...

Saving videos for offline viewing in a Vue Progressive Web App with Vue.js 3

I'm having a bit of an issue with my Vue 3 and Typescript setup. I've created a PWA where I display videos, which works perfectly online. However, when I try to access them offline, the videos don't load properly. I have stored the videos in ...

Is it possible to integrate iOS automation and web automation into a unified script utilizing selenium?

Currently, I am working on automating tasks on an iOS device using Selenium WebDriver and Appium. A specific scenario that I am tackling involves creating an account where I need to switch between the web browser to verify the account and then return to th ...

Can Webdrivermanager be used with RemoteWebDriver in Selenium Grid?

Upon updating the dependency for the WebDrivermanager in my pom.xml file, I encountered an issue. When running the code locally with the ChromeDriver, everything seems to work fine. driver = new ChromeDriver(capability); However, when attempting to run t ...

JavaScript Astro file not triggering window.onload event

I need assistance with my Astro components in my App: apps/myProject libs/components/header Within the header.astro component, I have a script that should execute once the entire page is rendered: <script is:inline> console.log('hello!' ...

Python Selenium is having trouble finding the elements

I've been struggling for hours to extract the specific text from a variety of elements on a website. I attached 2 images in hopes that they will help in identifying these elements by their similarities, such as having the same class name. The black un ...

Adjust the background color of the unordered list when it corresponds to the current

Looking to customize a course calendar so that the day column stands out on its designated day of the week. For example, on Wednesday, the Wednesday column should have a distinct appearance. I've been experimenting with different methods and searchi ...

Transform an array into a hierarchical JSON structure within an Angular Material tree

I am in a desperate quest to extract the selected nodes from an angular tree and convert them into a JSON nested format. So far, I have been able to retrieve the selected array of flat nodes using this.checklistSelection.selected. However, what I really ne ...

Send information from the textbox

I am trying to extract data from an input field and use it to create a QR code. While passing the value as text works, I am facing issues with passing the actual input value. Does anyone have a straightforward example of this process? import React, {Comp ...

Extend the center of the image horizontally

Is there a way to horizontally stretch only the middle section of an image? Let's say I have this specific image: https://i.sstatic.net/pty5A.jpg (source: lawrenceinspections.com) I need the rounded corners to remain unchanged, so simply stretchi ...