Struggling with capturing the start and end HTML tags in a JavaScript array

I am presented with this JavaScript array:

let a = [
    [0, "<p><strong>Lorem Ipsum</strong> is simply dummy text of "],
    [1, "<strong>"],
    [0, "the"],
    [1, "</strong>"],
    [0, " printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type "],
    [-1,"and"],
    [1, "test"],
    [0, " scrambled it to make a type"],
    [1, "  added"],
    [0, "</p>"],
    [1, "<ul><li>test</li></ul>"]
];

The task at hand involves extracting specific groups from the array based on certain conditions:

Consider a subarray from the array provided as an example:

[1, "<strong>"],
[0, "the"],
[1, "</strong>"]

This subarray qualifies as a group because its first element has a[i][0] == 1 and a[i][1] is the start of an HTML tag. Since a[1] is <strong>, a valid opening tag, I want to extract every element from that opening tag up to and including the matching closing tag [1, "</strong>"].
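To decide whether a[1] is the start of a tag, a regex check on the whole string is enough for this data. The sketch below assumes each item of interest holds exactly one tag; tagInfo is a hypothetical helper name, not something from the question:

```javascript
// Hypothetical helper: classify a string that consists of exactly one HTML tag.
// Returns { type, name } for an opening or closing tag, otherwise null.
function tagInfo(str) {
  // Opening tag: "<name ...>", where the name starts with a letter.
  const open = /^<([a-z][\w-]*)[^>]*>$/i.exec(str);
  if (open) {
    return { type: "open", name: open[1].toLowerCase() };
  }
  // Closing tag: "</name>", with optional whitespace before ">".
  const close = /^<\/([a-z][\w-]*)\s*>$/i.exec(str);
  if (close) {
    return { type: "close", name: close[1].toLowerCase() };
  }
  // Anything else: plain text, several tags, or a tag mixed with text.
  return null;
}

console.log(tagInfo("<strong>"));               // { type: 'open', name: 'strong' }
console.log(tagInfo("</strong>"));              // { type: 'close', name: 'strong' }
console.log(tagInfo("the"));                    // null
console.log(tagInfo("<ul><li>test</li></ul>")); // null (several tags)
```

A single item such as "<ul><li>test</li></ul>" deliberately returns null here, because it is more than one tag; that case needs a different check.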

For the example array, the expected result contains two groups:

let group = [
  [
    [1, "<strong>"],
    [0, "the"],
    [1, "</strong>"]
  ],
  [
    [1, "<ul><li>test</li></ul>"]
  ]
];

The goal is to identify groups based on the following criteria:

  1. The element's first entry is 1 (i.e., a[i][0] == 1) and a[i][1] is an opening HTML tag.
  2. The element's first entry is 0 (i.e., a[i][0] == 0) and it lies between an element matching rule 1 and an element matching rule 3.
  3. The element's first entry is 1 (i.e., a[i][0] == 1) and a[i][1] is the matching closing HTML tag.

An element matching rule 1, the elements matching rule 2, and the element matching rule 3 together form one group.
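Rules 1 to 3 can be applied in a single pass: find a [1, "<tag>"] item, skip over the [0, ...] items that follow, and accept the group only if the next item is [1, "</tag>"] with the same tag name. This is a minimal sketch, assuming rule 2 allows any number of [0, ...] items (including none) and that only [0, ...] items appear between the opening and closing tag. findTagGroups is a hypothetical name, and the single-item case described next is not handled here:

```javascript
// Hypothetical sketch of rules 1-3: [1, "<tag>"], then any [0, ...] items,
// then [1, "</tag>"] with the same tag name.
function findTagGroups(arr) {
  const openRe = /^<([a-z][\w-]*)[^>]*>$/i; // the whole string is one opening tag
  const groups = [];

  for (let i = 0; i < arr.length; i++) {
    // Rule 1: first entry is 1 and the string is an opening tag.
    const m = arr[i][0] === 1 ? openRe.exec(arr[i][1]) : null;
    if (!m) continue;

    // Rule 2: skip the [0, ...] items that follow.
    let j = i + 1;
    while (j < arr.length && arr[j][0] === 0) j++;

    // Rule 3: the next item must be [1, "</tag>"] for the same tag name.
    if (j < arr.length && arr[j][0] === 1 && arr[j][1] === "</" + m[1] + ">") {
      groups.push(arr.slice(i, j + 1));
      i = j; // resume scanning after the closing item
    }
  }
  return groups;
}

const a = [
  [0, "<p><strong>Lorem Ipsum</strong> is simply dummy text of "],
  [1, "<strong>"],
  [0, "the"],
  [1, "</strong>"],
  [0, " printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type "],
  [-1, "and"],
  [1, "test"],
  [0, " scrambled it to make a type"],
  [1, "  added"],
  [0, "</p>"],
  [1, "<ul><li>test</li></ul>"]
];

console.log(JSON.stringify(findTagGroups(a)));
// [[[1,"<strong>"],[0,"the"],[1,"</strong>"]]]
```

If rule 1 matches but rule 3 does not, the scan simply continues with the next item, so a mismatched pair such as [1, "<b>"], [1, "</i>"] produces no group.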

Another scenario to consider:

[1,"<ul><li>test</li></ul>"]

In this case, the single array item already contains a complete group, <ul><li>test</li></ul>, which should also be included in the resulting array.
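A self-contained item like this can be recognized by walking its tags with a stack: push each opening tag, pop on each closing tag, and require the stack to be empty at the end. A minimal sketch, assuming the fragment contains no void elements such as <br> or <img> (they have no closing tag); isCompleteFragment is a hypothetical name:

```javascript
// Hypothetical helper: true when the string contains at least one tag and every
// opened tag is closed in the correct order, e.g. "<ul><li>test</li></ul>".
// Assumes no void elements (<br>, <img>, ...), which never get a closing tag.
function isCompleteFragment(str) {
  const tagRe = /<(\/?)([a-z][\w-]*)[^>]*>/gi; // captures "/" (if closing) and the name
  const stack = [];
  let sawTag = false;
  let m;

  while ((m = tagRe.exec(str)) !== null) {
    sawTag = true;
    const name = m[2].toLowerCase();
    if (m[1] === "/") {
      // A closing tag must match the most recently opened tag.
      if (stack.pop() !== name) return false;
    } else {
      stack.push(name);
    }
  }
  return sawTag && stack.length === 0;
}

console.log(isCompleteFragment("<ul><li>test</li></ul>")); // true
console.log(isCompleteFragment("<strong>"));               // false (never closed)
console.log(isCompleteFragment("</strong>"));              // false (nothing to close)
console.log(isCompleteFragment("test"));                   // false (no tags)
```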

Edit


I have refined my approach:

let a = [
  [0, "<p><strong>Lorem Ipsum</strong> is simply dummy text of "],
  [1, "<strong>"],
  [0, "the"],
  [1, "</strong>"],
  [0, " printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type "],
  [-1, "and"],
  [1, "test"],
  [0, " scrambled it to make a type"],
  [1, "  added"],
  [0, "</p>"],
  [1, "<ul><li>test</li></ul>"]
];


checkAndRemoveGroups(a, 1);


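function checkAndRemoveGroups(arr, group) {

  // Matches a string that starts with an opening tag such as "<strong>" or "<ul>".
  // Closing tags like "</strong>" do not match, because "/" is not a letter.
  // No "g" flag, so test() keeps no lastIndex state between calls.
  const htmlOpenRegex = /^<([a-z][\w-]*)[^>]*>/i;
  const groupArray = [];
  let depth = 0;

  // Iterate over the array to find where each group starts and record it

  for (let i = 0; i < arr.length; i++) {
    if (arr[i][0] === group && htmlOpenRegex.test(arr[i][1])) {
      depth += 1;
      groupArray.push({
        Index: i,
        Value: arr[i],
        TagType: "Open"
      });
    }
  }

  console.log(groupArray);

}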

Answer №1

One approach is to keep the open tags in an array used as a stack: push each opening tag, remove it when its closing tag appears, and check the array's length to see whether the group's first tag still needs to be closed.

function extractTags(inputString) {
    var regex = /<(\/?[^>]+)>/g,
        match,
        result = [];

    while ((match = regex.exec(inputString)) !== null) {
        // To prevent infinite loops with zero-width matches
        if (match.index === regex.lastIndex) {
            regex.lastIndex++;
        }
        result.push(match[1]);
    }
    return result;
}

var tagsArray = [[0, "<p><strong>Lorem Ipsum</strong> is simply dummy text of "], [1, "<strong>"], [0, "the"], [1, "</strong>"], [0, " printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type "], [-1, "and"], [1, "test"], [0, " scrambled it to make a type"], [1, "  added"], [0, "</p>"], [1, "<ul><li>test</li></ul>"]],
    resultArray = [],
    nestedTags = [],
    tempTags,
    index = 0;

while (index < tagsArray.length) {
    if (tagsArray[index][0] === 1) {
        tempTags = extractTags(tagsArray[index][1]);
        if (!tempTags.length) {
            index++;
            continue;
        }
        resultArray.push([]); // Indicates a new group begins
        while (index < tagsArray.length) {
            tempTags.forEach(function (tagName) {
                if (tagName.startsWith('/')) {
                    if (nestedTags[nestedTags.length - 1] === tagName.slice(1)) {
                        nestedTags.length--;
                    }
                    return;
                }
                nestedTags.push(tagName);
            });
            resultArray[resultArray.length - 1].push(tagsArray[index]);
            if (!nestedTags.length) {
                break;
            }
            index++;
            if (index >= tagsArray.length) {
                break; // ran out of items before every tag was closed
            }
            tempTags = extractTags(tagsArray[index][1]);
        }
    }
    index++;
}

console.log(resultArray);
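One caveat with this approach: extractTags returns everything inside the angle brackets, so an opening tag with attributes (e.g. <a href="x">) is pushed as a href="x" and never matches its /a. Void elements such as <br> or <img> also never get a closing tag, so they would stay on nestedTags. A minimal sketch of a replacement, assuming the HTML void-element list below and that bare tag names are acceptable; extractPairedTags is a hypothetical name that could be used in place of extractTags in the loop above:

```javascript
// Hypothetical replacement for extractTags: returns bare tag names ("strong",
// "/strong") and skips tags that never get a closing tag.
// HTML void elements: they have no end tag.
const VOID_ELEMENTS = new Set([
  "area", "base", "br", "col", "embed", "hr", "img", "input",
  "link", "meta", "source", "track", "wbr"
]);

function extractPairedTags(inputString) {
  // Groups: 1 = "/" for a closing tag, 2 = tag name, 3 = "/" for a self-closing tag.
  const regex = /<(\/?)([a-z][\w-]*)[^>]*?(\/?)>/gi;
  const result = [];
  let match;

  while ((match = regex.exec(inputString)) !== null) {
    const name = match[2].toLowerCase();
    if (VOID_ELEMENTS.has(name) || match[3] === "/") {
      continue; // nothing will ever close this tag, so keep it off the stack
    }
    result.push(match[1] + name);
  }
  return result;
}

console.log(extractPairedTags("<p>line<br>break<img src='x.png'/></p>")); // [ 'p', '/p' ]
console.log(extractPairedTags('<a href="x">link</a>'));                 // [ 'a', '/a' ]
```

Because the regex cannot match an empty string, the zero-width guard from the original extractTags is not needed here.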

Answer №2

I agree with Scott's point of view: there may be a simpler way to get the result you want. I understand that you are trying to extract data from this array, but a solution that avoids keeping HTML fragments inside sub-arrays might be easier to work with.

-- Apologies for my previous response: I misunderstood your requirements, so I deleted the original answer. Let me look into this more closely.

Are you sure this is the exact output you're looking for? [0,"the"] contains no HTML, so testing each element against an HTML regex on its own will never pick it up; you need to know that it sits between an opening and a closing tag. Furthermore, each group below is wrapped in curly braces as if it were an object, but its entries have no keys, so this is not valid JavaScript.

let group = [
  {
    [1,"<strong>"],
    [0,"the"],
    [1,"</strong>"]
  },
  {
    [1,"<ul><li>test</li></ul>"]
  }
];  
