Using RegEx in Google Apps Script to extract HTML content

Currently, I am working with Google Apps Script and facing a challenge. My goal is to extract the content from an HTML page saved as a string using RegEx. Specifically, I need to retrieve data in the following format:

<font color="#FF0101">
        Data that needs to be extracted
</font>

I am seeking guidance on which RegEx pattern to use for extracting data enclosed within <font> tags (both opening and closing). It is important to note that I only want to extract data from tags that include the specified color attribute and value as indicated in the code snippet above.

Answer №1

There is no need to struggle with RegEx to parse HTML: Google Apps Script's XmlService can parse the text for you, provided the HTML is well-formed.

function myFunction() {
  var xml = '<font color="#FF0101">Data which is want to fetch</font>';
  var doc = XmlService.parse(xml);
  var content = doc.getContent(0).getValue();
  Logger.log( content );  // "Data which is want to fetch"
  var color = doc.getContent(0).asElement().getAttribute('color').getValue();
  Logger.log( color );    // "#FF0101"
}

Answer №2

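JavaScript can parse HTML on its own, so there is no need to resort to regex. Note that this approach relies on the browser DOM (the document object), so it only works where a DOM is available.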

var container = document.createElement('div');
container.innerHTML = "Insert your HTML content here";

var results = container.querySelectorAll("font[color='#FF0101']");
// Iterate through the `results` and extract desired information
// For example: results[0].textContent.replace(/^\s+|\s+$/g,'')

Answer №3

If Apps Script offered full browser JavaScript support, a DOM-based solution would work:

var html = "<font color=\"#FF0202\">NOT THIS ONE</font><font color=\"#FF0101\">\n        Data which is want to fetch\n</font>";
var faketag = document.createElement('faketag');
faketag.innerHTML = html;
var arr = [];
[].forEach.call(faketag.getElementsByTagName("font"), function(v,i,a) {
  if (v.hasAttributes() == true) {
    for (var o = 0; o < v.attributes.length; o++) {
      var attrib = v.attributes[o];
      if (attrib.name === "color" && attrib.value === "#FF0101") {
        arr.push(v.innerText.replace(/^\s+|\s+$/g, ""));
      }
    }
  }
});
document.body.innerHTML = JSON.stringify(arr);

However, as per the GAS reference:

Apps Script code runs on Google's servers and does not support browser-based features like DOM manipulation or the Window API.

To extract the inner text of <font color="#FF0101"> tags, a regex can be used instead:

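function myFunction() {
  var doc = DocumentApp.getActiveDocument();
  var paras = doc.getParagraphs();
  // Match <font> tags whose color attribute is "#FF0101" and capture their inner text
  var MyRegex = new RegExp('<font\\b[^<]*\\s+color="#FF0101"[^<]*>([\\s\\S]*?)</font>','ig');
  var match;  // declared locally to avoid leaking a global
  for (var i = 0; i < paras.length; ++i) {
    var text = paras[i].getText();
    while ((match = MyRegex.exec(text)) !== null) {
      Logger.log(match[1]);
    }
  }
}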

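The regex matches any font tag whose color attribute is set to #FF0101 and captures the text inside it. Regex is not a reliable way to parse HTML in general, so consider a proper parser where one is available.

The same match can also be written without the lazy quantifier, as an unrolled loop (shown with the same string-literal escaping as above):

<font\\b[^<]*\\s+color="#FF0101"[^<]*>([^<]*(?:<(?!/font>)[^<]*)*)</font>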
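
As a rough check that the lazy pattern from the function above and the pattern on the preceding line capture the same text, the sketch below runs both against the sample input from this thread. It is plain JavaScript with no Apps Script services, and the helper name collectMatches is made up for illustration.

```javascript
// Hypothetical helper: collect capture group 1 for every match of a pattern.
function collectMatches(pattern, text) {
  var re = new RegExp(pattern, 'ig'); // fresh object, so lastIndex starts at 0
  var out = [];
  var match;
  while ((match = re.exec(text)) !== null) {
    out.push(match[1]);
  }
  return out;
}

// Both patterns are copied from this answer, written as string literals.
var lazyPattern = '<font\\b[^<]*\\s+color="#FF0101"[^<]*>([\\s\\S]*?)</font>';
var unrolledPattern = '<font\\b[^<]*\\s+color="#FF0101"[^<]*>([^<]*(?:<(?!/font>)[^<]*)*)</font>';

// Sample input from this thread: only the second tag has the wanted color.
var sample = '<font color="#FF0202">NOT THIS ONE</font>\n' +
  '<font color="#FF0101">\n         Data which is want to fetch\n</font>';

var lazyResult = collectMatches(lazyPattern, sample);
var unrolledResult = collectMatches(unrolledPattern, sample);
```

On this input both arrays should hold a single entry, the inner text of the second tag with its surrounding whitespace intact. This only shows agreement on one sample, not equivalence in general.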

If the HTML is spread across multiple paragraphs, run the regex over the whole body text instead:

function myFunction() {
  var doc = DocumentApp.getActiveDocument();
  var text = doc.getBody().getText();
  var MyRegex = new RegExp('<font\\b[^<]*\\s+color="#FF0101"[^<]*>([\\s\\S]*?)</font>','ig');
  var match;  // declared locally to avoid leaking a global
  while ((match = MyRegex.exec(text)) !== null) {
    Logger.log(match[1]);
  }
}

Given this input:

<font color="#FF0202">NOT THIS ONE</font>
<font color="#FF0101">
         Data which is want to fetch
</font>

The result would be:

https://i.sstatic.net/ebDcZ.png
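
Since the question says the HTML is already held in a string, the same idea can be wrapped in a small helper that takes the string and the color as arguments. This is only a sketch under a few assumptions: attribute values are double-quoted, font tags are not nested, and the names escapeForRegex and extractFontText are invented here. It also uses [^>]* inside the opening tag rather than the [^<]* from the patterns above, so the attribute search cannot run past the end of the tag; that is a deliberate change, not the original pattern.

```javascript
// Hypothetical helper: escape regex metacharacters so the value is matched literally.
function escapeForRegex(value) {
  return value.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: return the trimmed inner text of every <font> tag
// whose color attribute equals `color` (compared case-insensitively).
// Assumes double-quoted attribute values and no nested <font> tags.
function extractFontText(html, color) {
  var pattern = '<font\\b[^>]*\\scolor="' + escapeForRegex(color) +
    '"[^>]*>([\\s\\S]*?)</font>';
  var re = new RegExp(pattern, 'ig');
  var results = [];
  var match;
  while ((match = re.exec(html)) !== null) {
    // Strip leading and trailing whitespace, as the DOM-based answers do.
    results.push(match[1].replace(/^\s+|\s+$/g, ''));
  }
  return results;
}

// Sample input from this thread.
var sample = '<font color="#FF0202">NOT THIS ONE</font>\n' +
  '<font color="#FF0101">\n         Data which is want to fetch\n</font>';

var found = extractFontText(sample, '#FF0101');
```

In Apps Script the returned array could then be passed to Logger.log. Because the helper touches no Apps Script service, it can be tried outside the script editor as well.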
