Saving table data scraped from a webpage with CasperJS

What is the most efficient way to store table data scraped with CasperJS?

  1. Serializing it to JSON and saving it to a file.

  2. Sending it in an AJAX request to a PHP script that stores it in a MySQL database.

Answer №1

I use the second option:

Step 1: Scrape the information into a global variable, `globalInfo`

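var casper = require('casper').create();
var globalInfo;
casper.start();
casper.thenOpen("http://www.targetpage.cl/valuableInfo", function() {
    // evaluate() runs inside the page; only serializable data is returned
    globalInfo = this.evaluate(function() {
        var domInfo = {};
        domInfo.title = "this is the info";
        domInfo.body  = "scrape the DOM for info";
        return domInfo;
    });
});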
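The example above fills `globalInfo` with placeholder fields. For an actual HTML table, one possible approach (a sketch, not part of the original answer) is to turn the rows into an array of objects keyed by the header row. The helper name `tableToObjects` and the selector `table#data` are hypothetical. The code is ES5 because it has to run in PhantomJS's page context, and it uses only the standard table DOM properties (`rows`, `cells`, `textContent`).

```javascript
// Hypothetical helper: convert a <table> into an array of objects whose keys
// come from the first row's cell text. ES5 only, so it can run inside PhantomJS.
function tableToObjects(table) {
    if (!table || table.rows.length === 0) {
        return [];
    }
    var headers = Array.prototype.map.call(table.rows[0].cells, function (cell) {
        return cell.textContent.trim();
    });
    var records = [];
    for (var r = 1; r < table.rows.length; r++) {
        var record = {};
        Array.prototype.forEach.call(table.rows[r].cells, function (cell, i) {
            // Fall back to a positional key when the header cell is missing or empty
            record[headers[i] || 'col' + i] = cell.textContent.trim();
        });
        records.push(record);
    }
    return records;
}

// Usage in the first step: evaluate() serializes its callback and runs it in
// the page, so the helper has to be defined inside that callback, e.g.
// globalInfo = this.evaluate(function (selector) {
//     function tableToObjects(table) { /* body as above */ }
//     return tableToObjects(document.querySelector(selector));
// }, 'table#data');
```

Keying each row by header text keeps the result self-describing, which makes it straightforward to serialize with `JSON.stringify` for either storage option.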

Step 2: POST the extracted data to a page that saves it

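casper.then(function() {
    // Wrapping the request in then() delays reading globalInfo until the
    // previous step has filled it in
    this.thenOpen("http://www.mipage.com/saveIntheDBonPost.php", {
        method: 'post',
        data: {
            'title': '' + globalInfo.title,
            'body':  '' + globalInfo.body
        }
    });
});
casper.run();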

The page www.mipage.com/saveIntheDBonPost.php reads the submitted fields from PHP's `$_POST` superglobal and stores them in the database.
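When `globalInfo` holds a whole table (an array of row objects) rather than two fields, one option, offered here as an assumption rather than the answerer's method, is to send the entire table as a single JSON string field. That avoids depending on how nested objects get form-encoded, and the PHP script would decode it with `json_decode($_POST['rows'], true)`. The helper `toPostData` and the field names `rows` and `rowCount` are hypothetical.

```javascript
// Hypothetical helper: pack scraped table rows into a flat object for the
// `data` option of casper.thenOpen(url, { method: 'post', data: ... }).
// The whole table travels as one JSON string field, so the receiving PHP
// script only has to decode a single value.
function toPostData(rows) {
    if (!Array.isArray(rows)) {
        throw new TypeError('expected an array of row objects');
    }
    return {
        rows: JSON.stringify(rows),
        rowCount: String(rows.length) // lets the server sanity-check what arrived
    };
}

// Usage, replacing the `data` object in the second step:
// casper.then(function () {
//     this.thenOpen("http://www.mipage.com/saveIntheDBonPost.php", {
//         method: 'post',
//         data: toPostData(globalInfo)
//     });
// });
```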

Answer №2

Think of CasperJS as a tool for gathering data, and do the processing in another programming language. I recommend the first option: extract the data as JSON and store it in a file for later processing.

You can do this with the File System API that PhantomJS provides (its `fs` module), combined with CasperJS's command-line interface for passing arguments to your script (for example, the path of a temporary output file).
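A minimal sketch of the CasperJS side, assuming the script is started as `casperjs scrape.js --output=/tmp/table.json`. Inside CasperJS, `require('fs')` is PhantomJS's module, whose `write(path, content, mode)` writes a file, and `casper.cli.get('output')` reads the `--output` option. The function name `saveRowsAsJson` is hypothetical; the `fs` module is passed in as a parameter only so the function can also be exercised outside PhantomJS.

```javascript
// Runs inside a CasperJS script (PhantomJS runtime, ES5 only).
// Writes the scraped rows as JSON to the file named on the command line.
function saveRowsAsJson(fsModule, outFile, rows) {
    if (!outFile) {
        throw new Error('missing --output=<file> argument');
    }
    // 'w' truncates/creates the file before writing
    fsModule.write(outFile, JSON.stringify(rows, null, 2), 'w');
}

// Usage in the CasperJS script:
// var fs = require('fs');
// casper.then(function () {
//     saveRowsAsJson(fs, casper.cli.get('output'), globalInfo);
// });
// casper.run(function () { this.exit(); });
```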

The script to manage this process would involve:

  1. Generating a temporary file path (e.g., using `mktemp` on Linux).
  2. Executing your CasperJS script while passing the temporary file path as an argument.
  3. In the CasperJS script, scraping the data, writing it to that file with the File System API, and exiting.
  4. Reading the file, processing the data (for example, storing it in a database), and then deleting the temporary file.
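The answer leaves the driver's language open. As an assumption, the steps above are sketched here as a Node.js driver (Node 14.14 or later for `fs.rmSync`), where `fs.mkdtempSync` stands in for `mktemp`. It assumes `casperjs` is on the `PATH` and that the scraper accepts `--output=<file>`; the function names are hypothetical.

```javascript
// Hypothetical Node.js driver for steps 1, 2 and 4 (step 3 happens in CasperJS).
const fs = require('fs');
const os = require('os');
const path = require('path');
const { execFileSync } = require('child_process');

// Step 1: create a unique temporary file path inside a fresh temp directory.
function makeTempPath() {
    const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'casper-'));
    return path.join(dir, 'table.json');
}

// Remove the temporary directory (and the file in it), ignoring missing paths.
function removeTemp(outFile) {
    fs.rmSync(path.dirname(outFile), { recursive: true, force: true });
}

// Step 2: run the CasperJS script, handing it the temp path; throws on failure.
function runCasper(scriptPath, outFile) {
    execFileSync('casperjs', [scriptPath, '--output=' + outFile], { stdio: 'inherit' });
}

// Step 4: parse the JSON the script wrote, then always clean up.
function readAndCleanUp(outFile) {
    try {
        return JSON.parse(fs.readFileSync(outFile, 'utf8'));
    } finally {
        removeTemp(outFile);
    }
}

function scrape(scriptPath) {
    const outFile = makeTempPath();
    try {
        runCasper(scriptPath, outFile);
    } catch (err) {
        removeTemp(outFile); // don't leave temp files behind on failure
        throw err;
    }
    return readAndCleanUp(outFile); // pass the rows on to your database code
}
```

Keeping the database work in the driver means the CasperJS script stays a pure scraper that can be rerun or tested on its own.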

Similar questions


Ways to insert a line break using ajax

document.getElementById("msg").innerHTML += "<strike>b:</strike> "+ msgs[i].childNodes[1].firstChild.nodeValue; After retrieving the messages, I noticed that they are all displayed close to each other. Is there a way to display each message on ...

The issue of onClick failing to function when paired with the addEventListener for the

Looking into a react component for a profile button that opens a menu with three options: My Profile, Settings, and Logout. The issue seems to be with the onClick event on the a tags not working as expected (the console.log is not being printed). Interes ...

What is the process of transforming an xhr request into an angular $http request?

I've been successfully loading images using ajax with the following code. However, when trying to convert it into Angular and use $http, it's not working as expected. Original ajax code var xhr = new XMLHttpRequest(); xhr.open('GET', ...

Arranging images next to each other using CSS and HTML

I've been trying to arrange four images side by side, with two on the top row and two on the bottom. I want to ensure they stay consistent across all browser sizes except for mobile. Here is what I have attempted so far: #imageone{ position: absol ...

Tips for toggling the appearance of like and add to cart icons

I am attempting to create a simple functionality for liking and adding items to a cart by clicking on the icons, which should immediately change the icon's color when clicked. However, I am facing an issue where the parent div's link is also bein ...

Ways to access properties beyond the top-level object

I'm encountering issues with JSON parsing using the Jackson library { "userName": ...

Why is it not possible to substitute table names in PDO prepared statements?

Here is an example that works correctly: $statement = $database->prepare('SELECT COUNT(1) WHERE EXISTS (SELECT * FROM ' . $table_name . ')'); $statement->execute(array()); However, the following code throws a syntax error: $state ...

Is there a way to extract the unicode/hex representation of a symbol from HTML using JavaScript or jQuery?

Imagine you have an element like this... <math xmlns="http://www.w3.org/1998/Math/MathML"> <mo class="symbol">α</mo> </math> Is there a method to retrieve the Unicode/hex value of alpha α, which is &#x03B1, using JavaScrip ...

Implement the Pydantic V2 JSON serialization logic within any class of your choosing

Currently, I am in the process of upgrading a Pydantic v1 codebase to Pydantic V2. The new version of Pydantic offers various methods for creating custom serialization logic for arbitrary Python objects (instances of classes that do not inherit from base P ...

Removing duplicate values in Vue after sorting

Explore <div v-for="todo in sortedArray"> <b-button block pill variant="outline-info" id="fetchButtonGap" v-model:value="todo.items[0].arrivalTime"> {{fromMilTime(todo.items[0].arrivalTime)}} < ...

Facing a Type Error while trying to utilize the Ember Embedded Records Mixin in conjunction with keyForAttribute

I've been struggling to deserialize data in Ember for a while now. Despite setting everything up correctly, I keep encountering the same error. I attempted to implement the EmbeddedRecords Mixin, but unfortunately, it hasn't been successful. Belo ...

Managing Data Types in AJAX Requests

Can you help me figure out why my AJAX call is not reaching success after hours of troubleshooting? It seems like the issue lies in the dataType that the AJAX call is expecting or receiving (JavaScript vs JSON). Unfortunately, I'm not sure how to addr ...

Angular implementation of a dynamic vertical full page slider similar to the one seen on www.tumblr

I'm determined to add a full-page slider to the homepage of my Angular 1.x app After testing multiple libraries, I haven't had much luck. The instructions seem incomplete and there are some bugs present. These are the libraries I've experi ...

React is failing to display identical values for each item being mapped in the same sequence

I have implemented some standard mapping logic. {MEMBERSHIPS.map((mItem, index) => ( <TableCell className="text-uppercase text-center" colSpan={2} padding="dense" ...

Setting up a recurring task with a while loop in a cron job

Discover numerous libraries dedicated to implementing cron jobs in NodeJS and Javascript, allowing for hosting on a server. Ultimately, cron jobs are simply repetitive tasks set to run at specific times/dates. This led me to ponder the distinction betwee ...

The element "Footer" cannot be found in the file path "./components/footer/Footer"

After attempting to run the npm start command, I encountered the following error. In my code, I am trying to import the file /components/footer/Footer.js into the file /src/index.js //ERROR: Failed to compile. In Register.js located in ./src/components/r ...

A guide to submitting forms within Stepper components in Angular 4 Material

Struggling to figure out how to submit form data within the Angular Material stepper? I've been referencing the example on the angular material website here, but haven't found a solution through my own research. <mat-horizontal-stepper [linea ...

Transform yaml strings into JSON entities

We are facing a challenge with two separate codebases that have different localization styles. One codebase uses yaml, while the other uses JSON. Currently, we are working on transitioning to the JSON-based codebase. However, with 20k yaml strings and sup ...

Is there a way to retrieve the props that have been passed down to a

I am looking to have custom props created in the root layer of my React app: import React from 'react' import App, { Container } from 'next/app' export default class MyApp extends App { static async getInitialProps({ Component, rout ...

The $OnChange function fails to activate when passing an object by reference

Hi there, I've encountered a problem in my code that I'd like some help with. In my example, I have two components: Parent Component and Child Component. Both components share a field called rules. The Parent Component passes the rules field to ...