Tips for retrieving specific information from Wikipedia using AJAX

Is there a way to use AJAX to retrieve the information that is consistently displayed in the box on the right during searches? I've tried the Wikipedia API, but I haven't been able to find the specific information I need.

Answer №1

Oh my goodness, I have never spent this much time answering a question on Stack Overflow.

Below is a working snippet. It's dirty, but it works:

// wikipedia article (in url)
const wiki_article_title = 'Grand_Theft_Auto_V';

// please check https://www.mediawiki.org/wiki/API:Get_the_contents_of_a_page
const url_api = `https://en.wikipedia.org/w/api.php?action=parse&page=${wiki_article_title}&prop=text&formatversion=2&origin=*`;

function extractInfoboxFromWiki(doc) {
  // here we extract the json provided by api
  const json = doc.querySelector('pre');
  const obj = JSON.parse(json.innerText);
  let html = obj.parse.text;

  // strip any literal '\n' sequences left in the HTML string
  // before it is parsed into DOM nodes
  html = html.replace(/\\n/gm, '');

  // get the interesting part of the API response
  const node = document.createElement('div');
  node.innerHTML = html;
  const infobox = node.querySelector('.infobox');
  let infos = [...infobox.firstChild.children];

  let output = {};

  // parse title
  output['title'] = infos[0].querySelector('th').innerText;
  infos.shift();

  // parse image url
  output['image_url'] = infos[0].querySelector('a').getAttribute("href");
  infos.shift();

  // traverse the nodes to map captions with values
  infos.forEach( tr => {
    const key = tr.querySelector('th').innerText;

    if(tr.querySelector('ul')) {
      const lis = tr.querySelectorAll('li');
      const values = [...lis].map( li => li.innerText);
      output[key] = values;
    } else {
      const value = tr.querySelector('td').innerText;
      output[key] = value;
    }

  });

  // return beautified json
  return JSON.stringify(output, null, 4);
}

fetch(url_api)
  .then(response => response.text())
  .then(text => {
    const parser = new DOMParser();
    const doc = parser.parseFromString(text, 'text/html');

    const DESIRED_RESULT = extractInfoboxFromWiki(doc);
    const formattedOutput = `<pre>${DESIRED_RESULT}</pre>`;

    document.write(formattedOutput);
  });

If you test it with the Grand Theft Auto V article, you will observe:

{
    "title": "Grand Theft Auto V",
    "image_url": "/wiki/File:Grand_Theft_Auto_V.png",
    "Developer(s)": "Rockstar North[a]",
    "Publisher(s)": "Rockstar Games",
    "Producer(s)": [
        "Leslie Benzies",
        "Imran Sarwar"
    ],
    "Designer(s)": [
        "Leslie Benzies",
        "Imran Sarwar"
    ],
    "Programmer(s)": "Adam Fowler",
    "Artist(s)": "Aaron Garbut",
    "Writer(s)": [
        "Dan Houser",
        "Rupert Humphries",
        "Michael Unsworth"
    ],
    "Composer(s)": [
        "Tangerine Dream",
        "Woody Jackson",
        "The Alchemist",
        "Oh No"
    ],
    "Series": "Grand Theft Auto",
    "Engine": "RAGE",
    "Platform(s)": [
        "PlayStation 3",
        "Xbox 360",
        "PlayStation 4" ...
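
The snippet above reads the JSON out of the HTML page that the API returns by default, and it assumes every infobox row has both a `th` and a `td`. As a hedged alternative, the sketch below asks for `format=json` so the response can be read with `response.json()`, and it skips rows that lack a label or a value. The names `buildParseUrl`, `rowsToObject`, and `fetchInfobox` are hypothetical and not part of the original answer. The sketch also uses `textContent` instead of `innerText`, so footnote markers such as `[a]` will still appear in the values.

```javascript
// Hypothetical helpers, not taken from the original answer.

// Build the request URL. format=json returns plain JSON instead of the
// HTML-wrapped default, and origin=* allows anonymous cross-origin requests.
function buildParseUrl(title) {
  const params = new URLSearchParams({
    action: 'parse',
    page: title,
    prop: 'text',
    format: 'json',
    formatversion: '2',
    origin: '*',
  });
  return `https://en.wikipedia.org/w/api.php?${params}`;
}

// Map infobox rows (<tr> elements) to { label: value }.
// Rows without both a <th> and a <td> (title, image, spacers) are skipped.
function rowsToObject(rows) {
  const output = {};
  for (const tr of rows) {
    const th = tr.querySelector('th');
    const td = tr.querySelector('td');
    if (!th || !td) continue;
    const key = th.textContent.trim();
    const items = [...td.querySelectorAll('li')].map(li => li.textContent.trim());
    // Keep list values as arrays, everything else as a single string.
    output[key] = items.length > 0 ? items : td.textContent.trim();
  }
  return output;
}

// Browser-only (needs fetch and DOMParser): fetch the article and parse its infobox.
async function fetchInfobox(title) {
  const response = await fetch(buildParseUrl(title));
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  const data = await response.json();
  if (!data.parse) {
    throw new Error('The API response did not contain a parsed page');
  }
  // With formatversion=2, parse.text is the article HTML as a string.
  const doc = new DOMParser().parseFromString(data.parse.text, 'text/html');
  const infobox = doc.querySelector('.infobox');
  return infobox ? rowsToObject(infobox.querySelectorAll('tr')) : null;
}
```

In a browser it could be called as `fetchInfobox('Grand_Theft_Auto_V').then(info => console.log(JSON.stringify(info, null, 4)));`. The title and image rows are skipped here, so they would need separate handling if you want them in the result.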