What is the best method for extracting JSON objects from an HTML page after executing a JS script with PhantomJS, and then transferring them to Java code?

I have been using the PhantomJS script below, adapted from another answer. I do not want to save the resulting HTML page to a file. Instead, I want to extract the JSON object from each <div class="rg_meta"> and pass it to my Java code.

While searching, I found examples that use "document", but when I tried it I got an "undefined" error. I am new to both PhantomJS and JSON handling in Java.

var page = require('webpage').create();
var fs = require('fs');
var system = require('system');

var url = "";
var searchParameter = "";
var count=0;

if (system.args.length === 4) {
    url=system.args[1];
    searchParameter=system.args[2];
    count=system.args[3];
}

if(url==="" || searchParameter===""){
    phantom.exit();
}

page.settings.userAgent = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.120 Safari/537.36';

page.zoomFactor = 0.1;

page.viewportSize = {
  width: 1920,
  height: 1080
};

var divCount="-1";
var topPosition=0;
var unchangedCounter=0;


page.open(url, function(status) {
console.log("Status: " + status);
if(status === "success") {

    window.setInterval(function() {

        var newDivCount = page.evaluate(function() { 
            var divs = document.querySelectorAll(".rg_di.rg_bx.rg_el.ivg-i");
            // Guard against an empty result so divs[-1] is never dereferenced
            return divs.length > 0 ? divs[divs.length-1].getAttribute("data-ri") : null;
        });

        topPosition = topPosition + 1080;

        page.scrollPosition = {
            top: topPosition,
            left: 0
        };

        if(newDivCount===divCount){
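            page.evaluate(function() {
                // getElementsByClassName (plural) is the correct DOM method name
                var elems = document.getElementsByClassName("rg_meta");
                // Output from console.log inside page.evaluate is only shown
                // if a page.onConsoleMessage handler is registered
                console.log(elems.length);
                var button = document.querySelector("#smb");
                // querySelector returns null, not undefined, when nothing matches
                if (button !== null) {
                    button.click();
                    console.log('Clicked');
                    return true;
                } else {
                    return false;
                }
            });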

            if(parseInt(unchangedCounter,10) === parseInt(count,10)){
               /* var path = searchParameter+'.html';
                fs.write('seedHtml/'+path, page.content, 'w');
                console.log('printing html');*/
                phantom.exit();
            }else{
                unchangedCounter=unchangedCounter+1;
            }
        }else{
            unchangedCounter=0;
        }
        divCount = newDivCount;

    }, 500);
}else{
    phantom.exit();
}
});

Answer №1

Exploring HTML5 Data Attributes

Thankfully, HTML5 has introduced a feature called custom data attributes.

<div id="msglist" data-user="bob" data-list-size="5" data-maxage="180"></div>

Custom data attributes can store any string-encoded data, including JSON, but JavaScript must handle any necessary type conversions. They should be used only when no appropriate HTML5 element or attribute exists.
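As a hedged illustration of the type-conversion point: a data attribute always comes back as a string, so JSON stored in one has to be parsed explicitly. In the sketch below, the attribute name data-settings and the helper readJsonAttribute are hypothetical, and fakeElement is a stand-in object so the snippet runs outside a browser. With a real page you would pass an element obtained from document.getElementById instead.

```javascript
// Hypothetical helper: read a data-* attribute and parse it as JSON.
// Returns null when the attribute is missing or does not hold valid JSON.
function readJsonAttribute(element, name) {
    var raw = element.getAttribute(name); // a string, or null if absent
    if (raw === null) {
        return null;
    }
    try {
        return JSON.parse(raw);
    } catch (e) {
        return null;
    }
}

// Stand-in for a DOM element such as
// <div id="msglist" data-settings='{"user":"bob","maxage":180}'></div>
var fakeElement = {
    attributes: { "data-settings": '{"user":"bob","maxage":180}' },
    getAttribute: function (name) {
        var has = Object.prototype.hasOwnProperty.call(this.attributes, name);
        return has ? this.attributes[name] : null;
    }
};

var settings = readJsonAttribute(fakeElement, "data-settings");
console.log(settings.user);          // "bob"
console.log(typeof settings.maxage); // "number"
```

After JSON.parse, numeric fields such as maxage are real numbers, so no further conversion is needed before doing arithmetic on them.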

Using JavaScript to Access Data #1:

All browsers allow you to fetch and modify data- attributes using the getAttribute and setAttribute methods, like so:

var msglist = document.getElementById("msglist");

// getAttribute returns a string, so convert before doing arithmetic
var show = parseInt(msglist.getAttribute("data-list-size"), 10);
msglist.setAttribute("data-list-size", show + 3);

While functional, this approach is best used as a fallback for older browsers.

Using JavaScript to Access Data #2:

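Since version 1.4.3, jQuery’s data() method has been able to read HTML5 data attributes. The data- prefix is dropped from the key, making the code cleaner. data() also converts values that look like numbers, booleans, or JSON into the corresponding JavaScript type, and values written with data() go to jQuery's internal data store rather than back to the DOM attribute: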

var msglist = $("#msglist");

var show = msglist.data("list-size");
msglist.data("list-size", show+3);
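Applied to the question, one possible approach (a sketch, not a tested PhantomJS script) is to collect the text of every rg_meta element inside page.evaluate, parse each entry as JSON, and print the result to standard output with console.log, where the Java process that launched PhantomJS can read it. This assumes each rg_meta div holds a JSON string as its text content. The helper parseMetaTexts is hypothetical, and the sample strings are made up so that the parsing step can run on its own.

```javascript
// Hypothetical helper: turn raw text from .rg_meta elements into objects,
// skipping entries that are not valid JSON.
function parseMetaTexts(texts) {
    var results = [];
    for (var i = 0; i < texts.length; i++) {
        try {
            results.push(JSON.parse(texts[i]));
        } catch (e) {
            // Ignore malformed entries rather than aborting the whole run
        }
    }
    return results;
}

// In the PhantomJS script, the texts would come from the page context.
// page.evaluate can only return simple, JSON-serializable values, so
// plain strings are returned instead of DOM elements:
//
//   var texts = page.evaluate(function () {
//       var elems = document.getElementsByClassName("rg_meta");
//       var out = [];
//       for (var i = 0; i < elems.length; i++) {
//           out.push(elems[i].textContent);
//       }
//       return out;
//   });
//
// Made-up sample data standing in for that result:
var sampleTexts = ['{"id":"a1","w":640}', 'not json', '{"id":"b2","w":800}'];

var metaObjects = parseMetaTexts(sampleTexts);

// Print one JSON array on a single line of standard output
console.log(JSON.stringify(metaObjects));
```

On the Java side, the script can be started with ProcessBuilder and its output read from the process's input stream, then passed to whichever JSON library the project already uses. Because the original script also prints a "Status: ..." line, the Java code would need to pick out the JSON line, or the script could print only the JSON.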

Hope this information proves useful!
