Scraping JSON output from a table using a CasperJS loop

I am currently working on extracting data from a website using CasperJS. The information I need is stored in a table, and my goal is to generate a valid JSON file after scraping the site. The JSON file should include the company name, email address, website URL, and a brief description of the company's activities.

So far, I have managed to navigate to the webpage and extract some data, but I'm facing an issue where the email and website information are combined in one field. After doing some research, I learned how to select specific elements for extraction. However, I'm only able to retrieve the details from the first row of the table.

If anyone could provide guidance on how to iterate through all the rows or help me create a loop in this scenario, it would be greatly appreciated. Please keep in mind that I am not a professional developer; I am learning as I go.

Below is a snippet of my code:

insert code here...

Currently, the JSON output repeats the information from the first row because there is no loop involved. To capture data from every row, I can replace:

old code here...

with

new code here...

However, using this new code will result in capturing all the information from each row without targeting specific elements like H3 tags or links.

  • I can loop through the rows to extract information, but the results are messy

  • I can only retrieve details from the first row, but the presentation is clean

Thank you in advance for any assistance provided!

Answer №1

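To extract data from each row, call tr.querySelector on the row element instead of document.querySelector. document.querySelector searches the whole document and always returns the first match, which would explain why every entry repeats the first row.

The following loop runs in the page context (for example, inside the function passed to casper.evaluate) and builds one object per row: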

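// Collect every table row; the bare `return` below assumes this code sits inside a function
var table_rows = document.querySelectorAll("tbody tr"); // or a more targeted selector
return Array.prototype.map.call(table_rows, function(tr) {
    // Query relative to the current row (tr), not the whole document
    return {
        name: tr.querySelector(".td-width h3").textContent,
        description: tr.querySelector(".td-width p").textContent,
        // The email link is the one whose href starts with "mailto"; the website link is the other one
        email: tr.querySelector('td span a[href^="mailto"]').textContent,
        website: tr.querySelector('td span a:not([href^="mailto"])').textContent
    };
});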
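
One caveat with the loop above: if a row lacks one of the elements (say, a company with no email link), tr.querySelector returns null and reading .textContent throws. The sketch below is one possible way to guard against that. The helper names textOf and rowsToCompanies are hypothetical, not part of CasperJS, and the selectors are simply the ones from the answer, so they may need adjusting to the real markup.

```javascript
// Hypothetical helper: trimmed text of the first match inside `root`,
// or null when the row has no such element.
function textOf(root, selector) {
    var el = root.querySelector(selector);
    return el ? el.textContent.trim() : null;
}

// Hypothetical helper: maps a list of <tr> nodes to plain objects.
// Selectors are assumed from the answer above.
function rowsToCompanies(rows) {
    return Array.prototype.map.call(rows, function (tr) {
        return {
            name: textOf(tr, ".td-width h3"),
            description: textOf(tr, ".td-width p"),
            email: textOf(tr, 'td span a[href^="mailto"]'),
            website: textOf(tr, 'td span a:not([href^="mailto"])')
        };
    });
}
```

In CasperJS, both helpers would have to be defined inside the function passed to casper.evaluate, because that function runs in the page sandbox and cannot see variables from the outer script. There, rowsToCompanies(document.querySelectorAll("tbody tr")) should return an array that can be turned into JSON text with JSON.stringify(result, null, 2).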
