Tips for extracting information from a website that uses JavaScript with Python?

I am currently working on a web scraping project to extract data from the DoorDash website specifically for restaurants located in Chicago. The goal is to gather information about all the restaurant listings in the city, such as reviews, ratings, cuisine, address, and state.

Although the site is organized by different cities, I am solely focused on extracting data for Chicago. There are approximately 4,326 listings for restaurants in this city that I aim to capture in an Excel sheet.

My attempt to extract details like restaurant name, cuisine, ratings, and reviews using the class "StoreCard_root___1p3uN" has been unsuccessful so far. The output shows up as blank without any data being displayed.


from selenium import webdriver

chrome_path = r"D:\python project\chromedriver.exe"

driver = webdriver.Chrome(chrome_path)

driver.get("https://www.doordash.com/food-delivery/chicago-il-restaurants/")

driver.find_element_by_xpath("""//*[@id="SeoApp"]/div/div[1]/div/div[2]/div/div[2]/div/div[2]/div[1]/div[3]""").click()

posts = driver.find_elements_by_class_name("StoreCard_root___1p3uN")

for post in posts:
    print(post.text) 

Answer №1

Retrieve the data from the API link through an XHR request. The API URL is in the code below.

The link takes an offset parameter. Each page returns 50 items, so start at offset=0 and increase it by 50 each time until you reach the end at 4300. Simply use range(0, 4350, 50).

import requests
import pandas as pd

data = []
# Each page returns 50 stores, so step the offset by 50 up to 4300.
for offset in range(0, 4350, 50):
    print(f"Extracting offset {offset}")
    r = requests.get(
        f"https://api.doordash.com/v2/seo_city_stores/?delivery_city_slug=chicago-il-restaurants&store_only=true&limit=50&offset={offset}").json()
    # Use a separate name for each store so the outer loop variable is not shadowed.
    for store in r['store_data']:
        row = (store['name'], store['city'], store['category'],
               store['num_ratings'], store['average_rating'], store['average_cost'])
        data.append(row)

# Save the collected rows as a CSV file, which Excel can open.
df = pd.DataFrame(
    data, columns=['Name', 'City', 'Category', 'Num Ratings', 'Average Ratings', 'Average Cost'])
df.to_csv('output.csv', index=False)
print("done")
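
The fixed range(0, 4350, 50) relies on the listing count staying near 4,326. As a minimal sketch of an alternative, assuming the endpoint returns an empty or short store_data list once the offset passes the last listing (this thread does not confirm that), the loop can keep requesting pages until that happens. The names paginate and fake_fetch are hypothetical; fake_fetch stands in for the real requests.get call so the example runs offline.

```python
from typing import Callable, Iterator, List


def paginate(fetch_page: Callable[[int, int], List[dict]],
             limit: int = 50,
             max_pages: int = 1000) -> Iterator[dict]:
    """Yield records page by page until an empty or short page is returned."""
    for page in range(max_pages):
        offset = page * limit
        records = fetch_page(offset, limit)
        if not records:
            break  # nothing left at this offset
        yield from records
        if len(records) < limit:
            break  # a short page is assumed to be the last one


def fake_fetch(offset: int, limit: int) -> List[dict]:
    """Hypothetical stand-in for the HTTP call: pretends there are 120 stores."""
    total = 120
    return [{"id": i} for i in range(offset, min(offset + limit, total))]


rows = list(paginate(fake_fetch, limit=50))
print(len(rows))
```

In real use, fetch_page would wrap the requests.get call from the answer and return r['store_data']; max_pages is only a safety cap against an endless loop.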

Sample of Output:

https://i.sstatic.net/Cn1ER.png

Answer №2

I encountered the same issue and solved it with a combination of Selenium and BeautifulSoup. Here's how I resolved it:

  1. Make sure the script clicks the button that reveals the menu and prices, if needed.
  2. The extracted menu and prices may come back as nested lists, so they need additional processing before get_text() can be called on the elements.

For detailed code and explanation, see the Medium article "Addressing empty list challenges in web scraping using selenium".
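
As a rough illustration of the second point, here is a minimal sketch of flattening nested lists before calling get_text() on each element. It is not the code from the article. flatten, texts, and FakeTag are hypothetical names, and FakeTag only imitates a parsed element so that the example runs without a browser or BeautifulSoup installed.

```python
from typing import Any, Iterator, List


def flatten(items: Any) -> Iterator[Any]:
    """Recursively yield the non-list elements of an arbitrarily nested list."""
    for item in items:
        if isinstance(item, (list, tuple)):
            yield from flatten(item)
        else:
            yield item


def texts(nested: List[Any]) -> List[str]:
    """Call get_text() on every element found inside the nested lists."""
    return [element.get_text(strip=True) for element in flatten(nested)]


class FakeTag:
    """Hypothetical stand-in for a parsed HTML element, used only for this demo."""

    def __init__(self, text: str) -> None:
        self._text = text

    def get_text(self, strip: bool = False) -> str:
        return self._text.strip() if strip else self._text


nested = [[FakeTag(" Burger "), FakeTag("$9.50")], [FakeTag("Fries")], []]
print(texts(nested))
```

Calling get_text() on the outer list itself would raise an AttributeError, which is why the elements are unpacked first.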

Answer №3

After looking into the API mentioned by αԋɱҽԃ αмєяιcαη, I discovered they also provide a restaurant information endpoint.

If you're interested, here is the URL: https://api.doordash.com/v2/restaurant/[restaurantId]/

However, it has recently started returning {"detail":"Request was throttled."}

Is anyone else facing the same problem or has managed to find a workaround for this issue?
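
One generic way to cope with throttled responses is to retry with an increasing delay. The sketch below assumes the throttled reply is a JSON body whose detail field matches the message quoted above; whether backing off is enough for this endpoint is not confirmed in this thread. fetch_with_backoff is a hypothetical helper, and the canned responses replace the real HTTP call so the example runs offline.

```python
import time
from typing import Callable, Optional

THROTTLED = "Request was throttled."


def fetch_with_backoff(fetch: Callable[[], dict],
                       max_attempts: int = 5,
                       base_delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> Optional[dict]:
    """Call fetch() until it stops reporting throttling, doubling the wait each time."""
    for attempt in range(max_attempts):
        payload = fetch()
        if payload.get("detail") != THROTTLED:
            return payload
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ...
    return None  # still throttled after max_attempts tries


# Canned responses standing in for the real API: throttled twice, then data.
responses = iter([
    {"detail": THROTTLED},
    {"detail": THROTTLED},
    {"name": "Example Store"},
])
delays = []
result = fetch_with_backoff(lambda: next(responses), sleep=delays.append)
print(result, delays)
```

In real use, fetch would wrap a requests.get(...).json() call for a single restaurant URL, and the default time.sleep would perform the actual waiting.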
