A guide on extracting JSON data from a script tag using Cheerio

I'm extracting metadata from various websites using Cheerio. Retrieving a property like

$('meta[property="article:published_time"]').attr('content')

works for most sites. On some sites, however, this meta property isn't defined, even though the information can still be found in the HTML.

For example, this page has no article:published_time meta property, yet the publication date is present in the HTML inside a <script> tag:

{"@context":"http://schema.org","@type":"NewsArticle","mainEntityOfPage":"https://news.yahoo.com/venezuela-deploys-soldiers-face-guyana-175722970.html","headline":"Venezuela Deploys Troops to East Caribbean Coast, Citing Guyana Threat","datePublished":"2023-12-28T19:53:10.000Z","dateModified":"2023-12-28T19:53:10.000Z","keywords":["Nicolas Maduro","Venezuela","Bloomberg","Guyana","Essequibo","Exxon Mobil Corp"],"description":"(Bloomberg) -- Venezuela has decided to deploy more than 5,000 soldiers on its eastern Caribbean coast after neighboring Guyana received a warship from the...","publisher":{"@type":"Organization","name":"Yahoo News","logo":{"@type":"ImageObject","url":"https://s.yimg.com/rz/p/yahoo_news_en-US_h_p_news_2.png","width":310,"height":50},"url":"https://news.yahoo.com/"},"author":{"@type":"Person","name":"Andreina Itriago Acosta","url":"","jobTitle":""},"creator":{"@type":"Person","name":"Andreina Itriago Acosta","url":"","jobTitle":""},"provider":{"@type":"Organization","name":"Bloomberg","url":"https://www.bloomberg.com/","logo":{"@type":"ImageObject","width":339,"height":100,"url":"https://s.yimg.com/cv/apiv2/hlogos/bloomberg_Light.png"}},"image":{"@type":"ImageObject","url":"https://s.yimg.com/ny/api/res/1.2/hs3Vjof2BqloeagLdsvfDw--/YXBwaWQ9aGlnaGxhbmRlcjt3PTEyMDA7aD0xMjAy/https://media.zenfs.com/en/bloomberg_politics_602/2db14d66c52bec70cb0ec6d0553968c6","width":1200,"height":1202},"thumbnailUrl":"https://s.yimg.com/ny/api/res/1.2/hs3Vjof2BqloeagLdsvfDw--/YXBwaWQ9aGlnaGxhbmRlcjt3PTEyMDA7aD0xMjAy/https://media.zenfs.com/en/bloomberg_politics_602/2db14d66c52bec70cb0ec6d0553968c6"}

Within this object, the "datePublished" field is available. How can I access this property using Cheerio?

Answer №1

If the data you need is JSON embedded in a <script> tag, you can retrieve it like this: select all <script> tags, iterate through them to find the one whose text contains the snippet '"datePublished":', then extract that text, pass it to JSON.parse(), and read the .datePublished property:

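const cheerio = require("cheerio"); // ^1.0.0-rc.12

const url = "<Your URL>";

// The global fetch() requires Node 18 or newer
fetch(url)
  .then(res => {
    if (!res.ok) {
      throw new Error(res.statusText);
    }

    return res.text();
  })
  .then(html => {
    const $ = cheerio.load(html);

    // Find the first <script> whose text contains the key we want
    const el = [...$("script")].find(e =>
      $(e).text().includes('"datePublished":')
    );

    if (!el) {
      throw new Error('No <script> containing "datePublished" was found');
    }

    // Parse the script body as JSON and read the property
    const meta = JSON.parse($(el).text());
    console.log(meta.datePublished); // => 2023-12-28T19:53:10.000Z
  })
  .catch(err => console.error(err));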
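Matching on '"datePublished":' works for this page, but it can also match a <script> that contains ordinary JavaScript (which JSON.parse() rejects), and some sites nest the value, for instance inside an "@graph" array or a top-level array of objects. Below is a slightly more defensive sketch, assuming the data is JSON-LD; it tries to parse each candidate script and searches the result recursively. The helper names findDatePublished and searchKey are hypothetical, and they take the raw script text so the logic can be tried without any network access.

```javascript
// Hypothetical helper: given the raw text of candidate <script> tags,
// return the first "datePublished" value found, or undefined.
function findDatePublished(scriptTexts) {
  for (const text of scriptTexts) {
    let data;
    try {
      data = JSON.parse(text);
    } catch {
      continue; // not valid JSON (e.g. ordinary JavaScript), so skip it
    }
    const found = searchKey(data, "datePublished");
    if (found !== undefined) {
      return found;
    }
  }
  return undefined;
}

// Depth-first search through objects and arrays for the first value stored under `key`.
function searchKey(node, key) {
  if (Array.isArray(node)) {
    for (const item of node) {
      const found = searchKey(item, key);
      if (found !== undefined) return found;
    }
  } else if (node !== null && typeof node === "object") {
    if (Object.prototype.hasOwnProperty.call(node, key)) {
      return node[key];
    }
    for (const value of Object.values(node)) {
      const found = searchKey(value, key);
      if (found !== undefined) return found;
    }
  }
  return undefined;
}

// Example input: one plain-JS script (ignored) and one JSON-LD block using "@graph".
const samples = [
  "window.foo = 1;",
  '{"@context":"http://schema.org","@graph":[{"@type":"WebPage"},{"@type":"NewsArticle","datePublished":"2023-12-28T19:53:10.000Z"}]}',
];
console.log(findDatePublished(samples)); // 2023-12-28T19:53:10.000Z
```

With Cheerio, the input array could be collected with $('script[type="application/ld+json"]').map((i, el) => $(el).text()).get() and passed to findDatePublished(). Restricting the selector to that type attribute assumes the site marks its JSON-LD that way, which is common but worth checking per site.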

This post covers the technique in more depth; it uses Python, but the same principles apply in Node.js. Sometimes the JSON inside a <script> is wrapped in other code (for example, assigned to a JavaScript variable), so you may need extra steps such as a regex or JSON5 to parse it. This answer shows a more involved example of extracting data from a <script> tag with Cheerio.
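As a sketch of the regex route, suppose (hypothetically) a page assigns its data to a global variable such as window.__INITIAL_STATE__. The function below captures the object literal after the assignment and hands it to JSON.parse(). It assumes the assignment ends with "};" at the end of a line and that the captured text is strict JSON (double-quoted keys, no trailing commas); relaxed JavaScript object syntax would need a parser such as JSON5 instead. The function and variable names are illustrative only.

```javascript
// Hypothetical helper: extract a JSON object assigned to `varName` in a script's text.
// Assumes the assignment ends with "};" at the end of a line and the object is strict JSON.
function extractAssignedJson(scriptText, varName) {
  // Escape regex metacharacters in the variable name (e.g. "." and "$")
  const escaped = varName.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Lazily capture from the first "{" to the first "}" that is followed by ";" and end of line
  const pattern = new RegExp(escaped + "\\s*=\\s*(\\{[\\s\\S]*?\\})\\s*;\\s*$", "m");
  const match = scriptText.match(pattern);
  // JSON.parse throws if the captured literal is not strict JSON
  return match ? JSON.parse(match[1]) : null;
}

// Example script text, as it might be returned by $(el).text()
const script = [
  "var unrelated = 1;",
  'window.__INITIAL_STATE__ = {"article":{"datePublished":"2023-12-28T19:53:10.000Z"}};',
  "console.log('ready');",
].join("\n");

const state = extractAssignedJson(script, "window.__INITIAL_STATE__");
console.log(state.article.datePublished); // 2023-12-28T19:53:10.000Z
```

A regex like this is brittle: a string value containing "};" at a line end, or another statement on the same line after the assignment, will break it. For anything less predictable, a real parser is the safer choice.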

Similar questions

If you haven't found the answer to your question, or you're interested in this topic, look at the similar questions below.

How can one "organize" a dictionary from JSON in Swift to visualize it in a time-series chart?

While it's true that a dictionary is unsorted by definition, I need the date key to be sorted in order to plot it on a graph, similar to a stock price plot over the last 30 days. I've tried two methods from this source, but they didn't work ...

Running system commands using javascript/jquery

I have been running NodeJS files in the terminal using node filename.js, but now I am wondering if it is possible to execute this command directly from a JavaScript/jQuery script within an HTML page. If so, how can I achieve this? ...

Including onMouseUp and onMouseDown events within a JavaScript function

I am experiencing an issue with a div that contains an input image with the ID of "Area-Light". I am attempting to pass the ID of the input image to a function. Although event handlers can be directly added inside the input tag, I prefer to do it within ...

Eliminating carriage returns from JSON data

To get the results of a REST API call using the requests module, save the output to a file named 1.json. From this JSON file, extract the description and JIRA key. import csv import urllib2 import argparse import json from bson import json_util #password ...

Troubleshooting: Why are errors not appearing in ts-node?

Whenever I encounter an error in my code while compiling with ts-node, the error does not seem to appear in the console. For instance:let data = await fs.readFileSync(path); In the following code snippet, I am using "fs" to read a file by passing a path ...

Managing multiple Angular calls when additional submit buttons are present

I am working on a form that includes a drop-down menu, check box, and four buttons. Whenever any action is taken (such as checking/unchecking the box, selecting an option from the drop-down, or clicking a button), it should trigger a service call to update ...

What is the best way to automatically show scrollbars when a page loads using JavaScript?

How can I make the vertical scrollbar appear as soon as the page loads using javascript? I am currently using a jquery slide toggle animation that causes the vertical scrollbar to show up because it makes the page longer. However, when the scrollbar appear ...

Evaluation is not being executed in JavaScript

I am struggling to assign a value of 1 to the educationflag variable. I am trying to avoid calling the enableEdit.php file when the flag is set to 1. The issue arises when control reaches the if condition but fails to set the variable to 1. Here is my cod ...

PHP and AJAX: Combining Powers to Fetch Data

Greetings. I am currently in the process of creating a WordPress plugin that will manually send an email containing WooCommerce Order details to a specified supplier's email address. I am facing a challenge in understanding how to load data when a use ...

Tips for preventing page breaks (when printing or saving as a PDF) in lengthy HTML tables

Here is the link to a single HTML file (including style and scripts): FQ.html The problem I'm facing can be seen in this image: https://i.sstatic.net/Nr4BZ.png I've tried several solutions, the latest of which involves the following CSS... @me ...

Accordion menu causing Javascript linking problem

After following a tutorial, I managed to create an accordion menu in JavaScript. In the current setup, each main li with the class "ToggleSubmenu" acts as a category that can hide/show the sub-lis. Now, my query is this: How can I retain the link functio ...

Combining multiple template filters in ng-table with the power of CoffeeScript

Combining AngularJS, ng-table, and coffeescript has been quite a task for me. I've been trying to create a multiple template filter within coffeescript and pass it into my angularjs template. One of the challenges I'm facing is with a combined & ...

What is preventing the direct passing of the dispatch function from a useState hook into an onClick handler function?

const Counter = () => { const [count, setCount] = useState(0); // Uncommenting the line below would result in an infinite loop because it directly invokes setCount(count + 1) on render. // This causes the component to continuously re-render and up ...

Attempting to remove certain selected elements by using jQuery

Struggling to grasp how to delete an element using jQuery. Currently working on a prototype shopping list application where adding items is smooth sailing, but removing checked items has become quite the challenge. Any insights or guidance? jQuery(docume ...

Encountered an error message stating 'Unexpected Token <' while attempting to launch the node server

After adapting react-server-example (https://github.com/mhart/react-server-example), I encountered an issue with using JSX in my project. Despite making various changes like switching from Browserify to Webpack and installing babel-preset-react, I am still ...

UV mapping with Plane BufferGeometry in Three.js

I'm facing some challenges creating a buffergeometry plane, specifically with the uv coordinates. Despite following advice from Correct UV mapping Three.js, I can't seem to achieve the desired result. Below is the snippet of code for the uv coor ...

Identify when 2 sets of radio buttons are chosen using jQuery

I need assistance with a webpage that presents the user with two simple yes-no inquiries. Below each question, there are two radio buttons for selecting either yes or no. <p>Question 1: Yes or No?</p> <input type="radio" name="q ...

Using jQuery to retrieve the content of a textarea and display it

I need help finding the right way to read and write to a Linux text file using JavaScript, jQuery, and PHP. Specifically, I want to retrieve the value from a textarea (#taFile) with jQuery ($("#taFile").val();) and send it via $.post to a PHP script that w ...

Sending parameters to a service's factory

Here is the HTML code I am working with: <div class='container-fluid' ng-controller="TypeaheadCtrl"> <p></p> <b>Selected User</b> Enter a name: <input type="text" ng-model="selected" typeahead="user ...

Swap out the <a> tag for an <input type="button"> element that includes a "download" property

I have been working on a simple canvas-to-image exporter. You can find it here. Currently, it only works with the following code: <a id="download" download="CanvasDemo.png">Download as image</a> However, I would like to use something like th ...