JavaScript for Web Scraping

I am trying to use JavaScript to scrape the contents of a multi-page website and export them to an Excel or CSV file.

The issue I am facing is that I can only scrape the first page, and I have not managed to export the data to Excel or CSV format.

Below is the code I have written thus far:

const PORT = 8000
const axios = require('axios')
const cheerio = require('cheerio')
const express = require('express')

const app = express()
const url = 'https://www.taneps.go.tz/epps/viewAllAwardedContracts.do?d-3998960-p=1&selectedItem=viewAllAwardedContracts.do&T01_ps=100'
axios(url)
 .then(response => {
    const html = response.data
    const $ = cheerio.load(html)
    const articles = []
    $('#T01',html).each(function(){
        const contract = $(this).text()
        articles.push({
            contract
        })
        
    })
    console.log(articles)
   
 }).catch(err => console.log(err))



app.listen(PORT,() => console.log(`Server listening on port ${PORT}`))

I am seeking a way to scrape all pages and save the data into a CSV or Excel file.

Answer №1

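Here is one possible solution. It requests each results page by number, reads the rows of the #T01 table with cheerio, and appends them to a CSV file: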

import axios from "axios"
import {load} from "cheerio"
import fs from "fs"


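// Quote a field if it contains a comma, double quote, or line break.
const escapeCsvField = (value) => {
    const text = String(value)
    return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text
}

// Convert an array of row objects into CSV text, one line per object.
const convertDataToCsv = (arr) => {
    const array = [].concat(arr)
    return array.map(el => Object.values(el).map(escapeCsvField).join(',') + '\n').join('')
}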

const fetchData = async (page) => {
    try {
        const response = await axios.get(`https://www.taneps.go.tz/epps/viewAllAwardedContracts.do?d-3998960-p=${page}&selectedItem=viewAllAwardedContracts.do&T01_ps=100`)
        const html = response.data
        const $ = load(html)
        const data = []
        $('#T01>tbody>tr').each((_idx, el) => {
            const tender_no = $(el).find('td:nth-child(1)').text()
                .replace(/(\s+)/g, '')
                .replace(/,/g, '.')
            const procuring_entity = $(el).find('td:nth-child(2)').text()
                .replace(/(\s\s+)/g, '')
            const supplier_name = $(el).find('td:nth-child(3)').text()
                .replace(/(\s\s+)/g, '')
            const award_date = $(el).find('td:nth-child(4)').text()
                .replace(/(\s\s+)/g, '')
            const award_amount = $(el).find('td:nth-child(5)').text()
                .replace(/(\s\s+)/g, '')
            data.push({
                "Tender No": tender_no, 
                "Procuring Entity": procuring_entity, 
                "Supplier Name": supplier_name, 
                "Award Date": award_date, 
                "Award Amount": award_amount
            })
        });
        return data
    } catch (error) {
        throw error;
    }
};

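// Awaiting each page keeps the rows in page order and avoids
// sending 100 requests to the server at the same time.
for (let i = 1; i <= 100; i++) {
    try {
        const data = await fetchData(i)
        console.log(`Page number: ${i}`)
        fs.appendFileSync("taneps.csv", convertDataToCsv(data))
    } catch (error) {
        console.error(`Page ${i} failed: ${error.message}`)
    }
}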
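
The loop above stops at a hard-coded page 100. If the real number of pages is unknown, one option is to keep requesting pages until one comes back without rows. The sketch below is only an illustration: scrapeAllPages and its maxPages safety cap are hypothetical names, and it assumes that a page past the end of the listing yields an empty table, which has not been verified against this site.

```javascript
// Hypothetical helper: fetchPage is any async function that takes a page
// number and resolves to an array of row objects (for example, fetchData).
const scrapeAllPages = async (fetchPage, maxPages = 1000) => {
    const rows = []
    for (let page = 1; page <= maxPages; page++) {
        const pageRows = await fetchPage(page)
        // Assumption: a page past the end of the listing yields no table rows.
        if (pageRows.length === 0) break
        rows.push(...pageRows)
    }
    return rows
}
```

With the code above it could be called as scrapeAllPages(fetchData), and the returned rows written to the file in a single step. If the site repeats its last page instead of returning an empty one, a different stop condition would be needed.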

Resulting CSV file, taneps.csv:

PA/009/2021-22/HQ/G/19-,Muhimbili National Hospital,BADRA ATHUMAN NDULLAH,11/10/2021 11:29:36,451350.00(TZS)
PA/058/2021-2022/G/22,Mkwawa University College of Education,IANMAC TECHNOLOGIES,12/11/2021 17:03:52,1413000.00(TZS)
PA/055/2021-2022/HQ/NC/08,Institute of Social Work,LUMINA INVESTMENTS LIMITED,11/11/2021 08:02:03,2343480.00(TZS)
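
The file shown here has no header row, so a spreadsheet would label the columns only by letter. Since the question also mentions Excel, here is a minimal sketch of building the CSV text with a header taken from the object keys. The names quoteField and toCsvWithHeader are hypothetical, and the leading byte order mark is an assumption about the consumer: it is there to help Excel detect UTF-8 when the file is opened directly.

```javascript
// Quote a field if it contains a comma, double quote, or line break.
const quoteField = (value) => {
    const text = String(value)
    return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text
}

// Build CSV text with a header row taken from the keys of the first object.
// The leading byte order mark (\uFEFF) is an assumption about the consumer:
// it helps Excel detect UTF-8 when the file is opened directly.
const toCsvWithHeader = (rows) => {
    if (rows.length === 0) return ''
    const header = Object.keys(rows[0])
    const lines = [header, ...rows.map(row => header.map(key => row[key] ?? ''))]
    return '\uFEFF' + lines.map(line => line.map(quoteField).join(',')).join('\r\n') + '\r\n'
}
```

Once all rows are collected into one array, the result could be written in one call, for example fs.writeFileSync("taneps.csv", toCsvWithHeader(rows)), instead of appending page by page.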

Tested on Node v16.15.0 with axios v1.1.3 and cheerio v1.0.0-rc.12.
