Extracting data from a website's table using JavaScript and opening the link in href

My goal is to extract the details page for each link found on this particular page.

The link provides access to all the information required: PAGE

However, I'm interested in extracting details from pages that have links like this:

href="javascript:subOpen('9ca8ed0fae15d43dc1257e7300345b99')"

I've shared a sample spreadsheet using the ImportHTML feature to get an overview.

Google Spreadsheet

Any ideas on how to proceed with retrieving details from these individual pages?

UPDATE

I tried implementing the following method:

function doGet(e){
  var base = 'http://www.ediktsdatei.justiz.gv.at/edikte/ex/exedi3.nsf/'
  var feed =  UrlFetchApp.fetch(base + 'suche?OpenForm&subf=e&query=%28%5BVKat%5D%3DEH%20%7C%20%5BVKat%5D%3DZH%20%7C%20%5BVKat%5D%3DMH%20%7C%20%5BVKat%5D%3DMW%20%7C%20%5BVKat%5D%3DMSH%20%7C%20%5BVKat%5D%3DGGH%20%7C%20%5BVKat%5D%3DRH%20%7C%20%5BVKat%5D%3DHAN%20%7C%20%5BVKat%5D%3DWE%20%7C%20%5BVKat%5D%3DEW%20%7C%20%5BVKat%5D%3DMAI%20%7C%20%5BVKat%5D%3DDTW%20%7C%20%5BVKat%5D%3DDGW%20%7C%20%5BVKat%5D%3DGA%20%7C%20%5BVKat%5D%3DGW%20%7C%20%5BVKat%5D%3DUL%20%7C%20%5BVKat%5D%3DBBL%20%7C%20%5BVKat%5D%3DLF%20%7C%20%5BVKat%5D%3DGL%20%7C%20%5BVKat%5D%3DSE%20%7C%20%5BVKat%5D%3DSO%29%20AND%20%5BBL%5D%3D0').getContentText();

       var d = document.createElement('div'); //assuming you can do this
       d.innerHTML = feed;//make the text a dom structure
       var arr = d.getElementsByTagName('a') //iterate over the page links
       var response = "";
       for(var i = 0;i<arr.length;i++){
         var atr = arr[i].getAttribute('onclick');
         if(atr) atr = atr.match(/subOpen\((.*?)\)/) //if onclick calls subOpen
         if(atr && atr.length > 1){ //get the id
            var detail = UrlFetchApp.fetch(base + '0/'+atr[1]).getContentText();
            response += detail; // process the relevant part of the content and append it to the response text
         }
        }      
       return ContentService.createTextOutput(response);
}

Unfortunately, I encountered an error when running this method:

ReferenceError: "document" is not defined. (line 6, file "")

What exactly does the object document refer to?

I have updated the Google Spreadsheet with a webapp integration.

Answer №1

You can inspect a page's contents and JavaScript with Firebug. Doing so shows that subOpen is an alias for subOpenXML, which is declared in xmlhttp01.js:

function subOpenXML(unid) {/*open found doc from search view*/
 if (waiting) return alert(bittewar);
 var wState = dynDoc.getElementById('windowState');
 wState.value = 'H';/*httpreq pending*/
 var last = '';
 if (unid==docLinks[0]) {last += '&f=1'; thisdocnum = 1;}
 if (unid==docLinks[docLinks.length-1]) {
  last += '&l=1';
  thisdocnum = docLinks.length;
 } else {
  for (var i=1;i<docLinks.length-1;i++)
   if (unid==docLinks[i]) {thisdocnum = i+1; break;}
 }
 var url = unid + html_delim + 'OpenDocument'+last + '&bm=2';
 httpreq.open('GET',    // &rand=' + Math.random();
  /*'/edikte/test/ex/exedi31.nsf/0/'+*/ '0/'+url, true);
 httpreq.onreadystatechange=onreadystatechange;
// httpreq.setRequestHeader('Accept','text/xml');
 httpreq.send(null);
 waiting = true;
 title2src = firstTextChild(dynDoc.getElementById('title2')).nodeValue;
}

To see which URL the function requests, you can edit its source in Firebug's Console tab and insert a console.log(url) before the HTTP call, like so:

 var url = unid + html_delim + 'OpenDocument'+last + '&bm=2';
 console.log(url)
 httpreq.open('GET',    // &rand=' + Math.random();
  /*'/edikte/test/ex/exedi31.nsf/0/'+*/ '0/'+url, true);

Running the modified function declaration in the Console tab replaces subOpen with the new source. Clicking a link then shows that the requested URL is the passed ID prefixed by '0/' and followed by the OpenDocument parameters. For example, it results in a GET request such as:

http://www.ediktsdatei.justiz.gv.at/edikte/ex/exedi3.nsf/0/1fd2313c2e0095bfc1257e49004170ca?OpenDocument&f=1&bm=2

You can confirm this by examining the Network tab within Firebug and following the link.
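
As a rough sketch, the URL construction in subOpenXML can be reproduced as a standalone function. buildDetailUrl is a hypothetical helper, and the '?' delimiter is an assumption inferred from the request above (html_delim itself is defined elsewhere in the site's scripts).

```javascript
// ASSUMPTION: the site's html_delim is '?', inferred from the example request.
var HTML_DELIM = '?';

// Hypothetical helper mirroring how subOpenXML builds the detail URL.
// base:     URL of the .nsf database, ending in '/'
// unid:     the ID passed to subOpen
// docLinks: all IDs on the result page, in page order
function buildDetailUrl(base, unid, docLinks) {
  var last = '';
  if (unid === docLinks[0]) last += '&f=1';                    // first document
  if (unid === docLinks[docLinks.length - 1]) last += '&l=1';  // last document
  return base + '0/' + unid + HTML_DELIM + 'OpenDocument' + last + '&bm=2';
}
```

The f and l flags only mark the first and last document of the result list, so they may not matter when you just want the content.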

To scrape details from the page, the following steps are required:

  1. Parse the ID passed to subOpen
  2. Issue a GET request to '0/' followed by that ID
  3. Parse the response of the request

The response shown in the Network tab suggests that it will need similar parsing to retrieve the displayed content, although this needs further investigation.
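
Steps 1 and 2 can be sketched as a small helper that works on the page source as plain text. This is only a sketch: extractSubOpenIds is a hypothetical name, and it assumes every link has the form subOpen('<id>') with single quotes, as in the href quoted in the question.

```javascript
// Hypothetical helper: collect the unique IDs passed to subOpen('...')
// anywhere in the HTML source, in order of first appearance.
function extractSubOpenIds(html) {
  var re = /subOpen\('([^']+)'\)/g;
  var ids = [];
  var m;
  while ((m = re.exec(html)) !== null) {
    if (ids.indexOf(m[1]) === -1) {
      ids.push(m[1]); // skip IDs that were already seen
    }
  }
  return ids;
}

// Usage: turn each ID into the '0/' + ID path requested by the page.
var sample = '<a href="javascript:subOpen(\'9ca8ed0fae15d43dc1257e7300345b99\')">Details</a>';
var paths = extractSubOpenIds(sample).map(function (id) { return '0/' + id; });
```

Because it only uses string matching, the same function works in an environment without a DOM.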

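UPDATE For the scraping task at hand, the importHTML function may not be the best fit. Google Apps Script's HTML Service or Content Service is probably more appropriate. Apps Script runs on Google's servers, where there is no browser DOM (hence the "document" is not defined error), so the IDs have to be extracted with string matching instead. I recommend building a web app and implementing the doGet function: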

function doGet(e){
  var base = 'http://www.ediktsdatei.justiz.gv.at/edikte/ex/exedi3.nsf/'
  var feed =  UrlFetchApp.fetch(base + 'suche?OpenForm&subf=e&query=%28%5BVKat%5D%3DEH%20%7C%20%5BVKat%5D%3DZH%20%7C%20%5BVKat%5D%3DMH%20%7C%20%5BVKat%5D%3DMW%20%7C%20%5BVKat%5D%3DMSH%20%7C%20%5BVKat%5D%3DGGH%20%7C%20%5BVKat%5D%3DRH%20%7C%20%5BVKat%5D%3DHAN%20%7C%20%5BVKat%5D%3DWE%20%7C%20%5BVKat%5D%3DEW%20%7C%20%5BVKat%5D%3DMAI%20%7C%20%5BVKat%5D%3DDTW%20%7C%20%5BVKat%5D%3DDGW%20%7C%20%5BVKat%5D%3DGA%20%7C%20%5BVKat%5D%3DGW%20%7C%20%5BVKat%5D%3DUL%20%7C%20%5BVKat%5D%3DBBL%20%7C%20%5BVKat%5D%3DLF%20%7C%20%5BVKat%5D%3DGL%20%7C%20%5BVKat%5D%3DSE%20%7C%20%5BVKat%5D%3DSO%29%20AND%20%5BBL%5D%3D0').getContentText();
       var response = "";
       var match = feed.match(/subOpen\('.*?'\)/g)
       if(match){
         for(var i = 0; i < match.length;i++){
              var m = match[i].match(/\('(.*)'\)/);
              if(m && m.length > 1){
                // fetch() returns an HTTPResponse; read its body as text
                var detailText = UrlFetchApp.fetch(base + '0/'+m[1]).getContentText();
                // process the relevant part of detailText here, then append it
                response += detailText;
              }
         }
       }
       return ContentService.createTextOutput(response);
}
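
How the detail text should be processed depends on the markup of the detail documents, which has not been examined here. As one possible starting point, assuming the response is HTML and only the visible text is wanted, a crude tag stripper (htmlToText, a hypothetical helper) could look like this. Regular expressions are not a robust HTML parser, so treat it as a sketch.

```javascript
// Hypothetical helper: reduce an HTML string to its visible text.
// ASSUMPTION: the detail response is HTML; entities such as &amp; are
// left as-is and would need separate decoding.
function htmlToText(html) {
  return html
    .replace(/<script[\s\S]*?<\/script>/gi, ' ') // drop inline scripts
    .replace(/<style[\s\S]*?<\/style>/gi, ' ')   // drop inline styles
    .replace(/<[^>]+>/g, ' ')                    // drop remaining tags
    .replace(/\s+/g, ' ')                        // collapse whitespace
    .trim();
}
```

Inside the loop, the result of htmlToText(detailText) followed by a newline could then be appended to the response instead of the raw markup.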
