I'm scraping artist data from Beatport using Cheerio inside a Next.js 14 Server Action. The goal is to search for an artist, select the first artist card in the results, and extract that artist's URL. However, my current implementation can't find the artist card, even though I can see it in the HTML when I inspect the page.
Here is the code I'm using:

"use server";

import fetch from "node-fetch";
import { load } from "cheerio";

interface BeatportArtist {
  name: string;
  beatportUrl: string;
  imageUrl: string;
}

const BASE_URL = "https://www.beatport.com";

export async function scrapeBeatportArtist(
  name: string
): Promise<BeatportArtist | null> {
  try {
    const searchUrl = `${BASE_URL}/search?q=${encodeURIComponent(name)}`;
    console.log(`Searching Beatport for artist: ${name}`);
    console.log(`Search URL: ${searchUrl}`);

    const searchResponse = await fetch(searchUrl);
    const searchHtml = await searchResponse.text();
    const $search = load(searchHtml);
    console.log('Search HTML loaded.');

    // Find the first div with the artist card classes
    const artistCard = $search("div.ArtistCard-style__Wrapper-sc-7ba2494f-10.gdlIrO.show-artist").first();
    console.log('Artist card:', artistCard.html()); // Log the HTML of artistCard

    if (!artistCard.length) {
      console.log(`No artist card found for artist: ${name}`);
      return null;
    }

    // Find the <a> inside artistCard whose title matches the artist's name
    const artistLink = artistCard.find(`a.artwork[title="${name}"]`).attr("href");
    console.log('Artist link:', artistLink); // Log the artistLink

    if (!artistLink) {
      console.log(`No Beatport profile found for artist: ${name}`);
      return null;
    }

    const artistUrl = `${BASE_URL}${artistLink}`;
    console.log(`Found Beatport profile for artist ${name}: ${artistUrl}`);

    const artistResponse = await fetch(artistUrl);
    const artistHtml = await artistResponse.text();
    const $artist = load(artistHtml);
    const imageUrl = $artist(".artist-hero__image img").attr("src") || "";

    return {
      name,
      beatportUrl: artistUrl,
      imageUrl,
    };
  } catch (error) {
    console.error(`Error scraping Beatport for artist ${name}:`, error);
    return null;
  }
}

Challenges Faced:
The script logs "No artist card found for artist: [artist name]" even though the artist card is visible in the HTML when I inspect the page. I select the card with the class selector ArtistCard-style__Wrapper-sc-7ba2494f-10.gdlIrO.show-artist, then look inside it for the <a> tag with the artwork class and a title attribute matching the artist's name. The relevant markup looks like this:

<div class="ArtistCard-style__Wrapper-sc-7ba2494f-10 gdlIrO show-artist" data-testid="artist-card">
  <div class="ArtistCard-style__Meta-sc-7ba2494f-9 bcxGRv">
    <a title="Artist Name" class="artwork" href="/artist/artist-name/123456">
      <div class="ArtistCard-style__Overlay-sc-7ba2494f-7 kSaKRF"></div>
      <span class="ArtistCard-style__Name-sc-7ba2494f-5 derVIL">Artist Name</span>
      <div class="ArtistCard-style__ImageWrapper-sc-7ba2494f-8 hmTKKR">
        <img alt="Artist Name" src="artist-image-url.jpg" />
      </div>
    </a>
  </div>
</div>

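
For comparison, here is a minimal sketch of a selector that does not depend on the generated class names (such as gdlIrO), which may change between builds. It targets the data-testid attribute from the markup in this post and resolves the href with the standard URL constructor. The helper name findFirstArtistUrl and the trimmed sampleHtml string are hypothetical and used only for illustration. It assumes the card is actually present in the HTML string passed in, which is not confirmed for the live search page.

```typescript
import { load } from "cheerio";

const BASE_URL = "https://www.beatport.com";

// Hypothetical helper: returns the absolute URL of the first artist card's
// link, or null when no card or link is present in the given HTML.
function findFirstArtistUrl(html: string): string | null {
  const $ = load(html);

  // data-testid comes from the sample markup; hashed class names are avoided.
  const href = $('[data-testid="artist-card"]')
    .first()
    .find('a[href^="/artist/"]')
    .attr("href");

  if (!href) return null;

  // new URL() resolves root-relative and absolute hrefs against BASE_URL.
  return new URL(href, BASE_URL).toString();
}

// Trimmed copy of the sample markup, used as a stand-in for a fetched page.
const sampleHtml = `
<div class="ArtistCard-style__Wrapper-sc-7ba2494f-10 gdlIrO show-artist" data-testid="artist-card">
  <div class="ArtistCard-style__Meta-sc-7ba2494f-9 bcxGRv">
    <a title="Artist Name" class="artwork" href="/artist/artist-name/123456">
      <span>Artist Name</span>
    </a>
  </div>
</div>`;

console.log(findFirstArtistUrl(sampleHtml));
// prints "https://www.beatport.com/artist/artist-name/123456"
```

Matching on the href prefix rather than on title="${name}" avoids a miss when the search term differs from the displayed artist name in case or spacing.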
Troubleshooting Efforts:
I have verified that the logged artistCard.html() output shows the expected HTML structure. I also tried different selectors and compared the loaded HTML against the structure I'm targeting. What could be the issue here? Any advice on how to reliably locate and extract the artist's URL from the search results would be greatly appreciated.
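
One way to make the comparison between the fetched HTML and the inspected page concrete is to test the raw response text for a marker string before parsing it. The sketch below uses a hypothetical helper, hasMarker, and two made-up HTML strings in place of a real response. The marker is the data-testid attribute from the markup in this post. If the marker is missing from searchHtml, the card is not in the server response at all (Cheerio parses static HTML and does not run page scripts), so no selector would match. Whether that is what happens on Beatport is not confirmed here.

```typescript
// Hypothetical helper: reports whether raw HTML text contains a marker string.
function hasMarker(html: string, marker: string): boolean {
  return html.includes(marker);
}

// Assumed marker, taken from the sample markup in this post.
const ARTIST_CARD_MARKER = 'data-testid="artist-card"';

// Made-up stand-ins for a response body: one containing a card,
// one containing only an empty application shell.
const withCard = '<div data-testid="artist-card"><a href="/artist/x/1"></a></div>';
const shellOnly = '<div id="__next"></div><script src="/app.js"></script>';

console.log(hasMarker(withCard, ARTIST_CARD_MARKER)); // prints true
console.log(hasMarker(shellOnly, ARTIST_CARD_MARKER)); // prints false
```

In the scraper, the same check could be run on searchHtml right after searchResponse.text() to separate "the selector is wrong" from "the card is not in the fetched HTML".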