Having trouble extracting artist information from Beatport using Cheerio within Next.js 14 Server Actions

I'm currently working on scraping artist data from Beatport by using Cheerio within a Next.js 14 Server Action. The main objective is to look for an artist, select the first artist card from the results, and extract the URL of that artist. However, my current implementation isn't able to locate the artist card even though I can visibly see it in the HTML when I inspect the page.

Below is the code I am utilizing:

"use server";

import fetch from "node-fetch";
import { load } from "cheerio";

interface BeatportArtist {
  name: string;
  beatportUrl: string;
  imageUrl: string;
}

const BASE_URL = "https://www.beatport.com";

export async function scrapeBeatportArtist(
  name: string
): Promise<BeatportArtist | null> {
  try {
    const searchUrl = `${BASE_URL}/search?q=${encodeURIComponent(name)}`;
    console.log(`Searching Beatport for artist: ${name}`);
    console.log(`Search URL: ${searchUrl}`);

    const searchResponse = await fetch(searchUrl);
    const searchHtml = await searchResponse.text();
    const $search = load(searchHtml);

    console.log('Search HTML loaded.');

    // Find the first div with the artist-card classes
    const artistCard = $search("div.ArtistCard-style__Wrapper-sc-7ba2494f-10.gdlIrO.show-artist").first();
    console.log('Artist card:', artistCard.html());  // Log the artist card's HTML

    if (!artistCard.length) {
      console.log(`No artist card found for artist: ${name}`);
      return null;
    }

    // Find the <a> inside artistCard whose title matches the artist's name
    const artistLink = artistCard.find(`a.artwork[title="${name}"]`).attr("href");

    console.log('Artist link:', artistLink);  // Log the extracted href

    if (!artistLink) {
      console.log(`No Beatport profile found for artist: ${name}`);
      return null;
    }

    const artistUrl = `${BASE_URL}${artistLink}`;
    console.log(`Found Beatport profile for artist ${name}: ${artistUrl}`);
    const artistResponse = await fetch(artistUrl);
    const artistHtml = await artistResponse.text();
    const $artist = load(artistHtml);

    const imageUrl = $artist(".artist-hero__image img").attr("src") || "";

    return {
      name,
      beatportUrl: artistUrl,
      imageUrl,
    };
  } catch (error) {
    console.error(`Error scraping Beatport for artist ${name}:`, error);
    return null;
  }
}

Challenges Faced:

The script logs "No artist card found for artist: [artist name]" even though the artist card is visible in the HTML when I inspect the page. I select the card with the class ArtistCard-style__Wrapper-sc-7ba2494f-10.gdlIrO.show-artist and then look inside it for the <a> tag with the artwork class and a title attribute matching the artist's name. This is the markup I see in the browser:

<div class="ArtistCard-style__Wrapper-sc-7ba2494f-10 gdlIrO show-artist" data-testid="artist-card">
  <div class="ArtistCard-style__Meta-sc-7ba2494f-9 bcxGRv">
    <a title="Artist Name" class="artwork" href="/artist/artist-name/123456">
      <div class="ArtistCard-style__Overlay-sc-7ba2494f-7 kSaKRF"></div>
      <span class="ArtistCard-style__Name-sc-7ba2494f-5 derVIL">Artist Name</span>
      <div class="ArtistCard-style__ImageWrapper-sc-7ba2494f-8 hmTKKR">
        <img alt="Artist Name" src="artist-image-url.jpg" />
      </div>
    </a>
  </div>
</div>

Troubleshooting Efforts:

I have verified that the artistCard.html() log outputs the expected HTML structure. I also tried different selectors and compared the loaded HTML against the structure I am targeting. What could be causing this? Any advice on how to reliably find the artist card and extract the artist's URL from the search results would be greatly appreciated.

Answer №1

The information is dynamically injected into the DOM tree by JavaScript after the page loads, so what you see in your developer tools may not match the original HTML content delivered from the server. However, the desired data can be found within a JSON payload embedded in the base HTML:

<script id="__NEXT_DATA__" type="application/json">{"props":{"pageProps":...
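One way to confirm this is to inspect the raw response text rather than the browser's DOM. The sketch below is only an illustration: the helper name `inspectServerHtml` and its return fields are made up for this example, and a regular expression stands in for a Cheerio lookup so the snippet has no dependencies.

```javascript
// Hypothetical helper: reports what the raw, server-delivered HTML contains.
// The regex is a dependency-free stand-in; with Cheerio the equivalent
// lookup would be $("#__NEXT_DATA__").text().
function inspectServerHtml(html) {
  // Is the rendered artist card present in the server response at all?
  const hasArtistCard = html.includes('data-testid="artist-card"');

  // Pull out the contents of the embedded JSON script tag, if any.
  const match = html.match(
    /<script[^>]*\bid="__NEXT_DATA__"[^>]*>([\s\S]*?)<\/script>/
  );

  let nextData = null;
  if (match) {
    try {
      nextData = JSON.parse(match[1]);
    } catch {
      nextData = null; // tag present, but the payload is not valid JSON
    }
  }

  return { hasArtistCard, hasNextData: nextData !== null, nextData };
}
```

If `hasArtistCard` is false while `hasNextData` is true for the string returned by `searchResponse.text()`, the card is being rendered client-side and the class-based selector cannot match.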

You can extract this JSON data and navigate through it to locate the specific details you need. Once found, you can construct the corresponding URL using the extracted information:

const data = JSON.parse($("#__NEXT_DATA__").text());
const {artist_id, artist_name} =
  data.props.pageProps.dehydratedState.queries[0].state.data.artists.data[0];
const url = `https://www.beatport.com/artist/${artist_name}/${artist_id}`;
console.log(url);
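The snippet above throws if any level of the payload is missing and uses the artist name verbatim in the URL. A more defensive variant might look like the following sketch. It assumes the payload shape shown above; `slugify` and `buildArtistUrl` are hypothetical helper names, and the slug rules (lowercase, hyphen-separated) are a guess based on the `/artist/artist-name/123456` link in the question, not documented Beatport behavior.

```javascript
const BASE_URL = "https://www.beatport.com";

// Turn "Artist Name" into "artist-name". The slug rules are an assumption;
// anything outside a-z and 0-9 collapses into a single hyphen.
function slugify(name) {
  return name
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Walk the parsed __NEXT_DATA__ object and build a URL for the first artist
// found in any dehydrated query. Returns null when nothing matches.
function buildArtistUrl(nextData) {
  const queries = nextData?.props?.pageProps?.dehydratedState?.queries ?? [];
  for (const query of queries) {
    const first = query?.state?.data?.artists?.data?.[0];
    if (first?.artist_id != null && first?.artist_name) {
      return `${BASE_URL}/artist/${slugify(first.artist_name)}/${first.artist_id}`;
    }
  }
  return null;
}
```

Scanning every query instead of hard-coding `queries[0]` keeps the lookup working if the order of the dehydrated queries changes, and the `null` return lets the caller fall back to its existing "no profile found" branch.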

Here's a complete example that demonstrates how to achieve this functionality:

const cheerio = require("cheerio"); // ^1.0.0-rc.12

// Additional code implementation goes here

If multi-word artist names cause trouble, try replacing spaces with hyphens when building the URL. If the URL can only be obtained after JavaScript has run in the page, you may need Puppeteer instead. Here's a basic setup using Puppeteer:

const puppeteer = require("puppeteer"); // ^22.10.0

// Same searchUrl variables as above

// Puppeteer implementation details go here

Additionally, it's advisable to block non-essential requests such as fonts, images, and scripts to streamline the data retrieval process. Alternatively, utilizing the Beatport API could also simplify the information extraction task.
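As a rough sketch of the request blocking mentioned above, Puppeteer's request interception can abort requests by resource type. The helper names `shouldBlock` and `blockNonEssentialRequests` and the default list of blocked types are assumptions for illustration; `page` is expected to be a Puppeteer `Page` created elsewhere.

```javascript
// Resource types to skip by default. This list is an assumption; blocking
// "script" only makes sense when the data is already in the initial HTML.
const DEFAULT_BLOCKED = ["image", "font", "script"];

// Pure predicate, kept separate so it can be checked without a browser.
function shouldBlock(resourceType, blocked = DEFAULT_BLOCKED) {
  return blocked.includes(resourceType);
}

// Enable interception on a Puppeteer page and abort the blocked types.
async function blockNonEssentialRequests(page, blocked = DEFAULT_BLOCKED) {
  await page.setRequestInterception(true);
  page.on("request", (request) => {
    if (shouldBlock(request.resourceType(), blocked)) {
      request.abort();
    } else {
      request.continue();
    }
  });
}
```

Call `blockNonEssentialRequests(page)` before `page.goto(...)` so that interception is active for the first navigation.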
