I am developing a Chrome extension that needs to identify specific pages within a website: the Log In / Sign In page, the Sign Up / Register page, the About page, and the Contact Us page.
My approach is to get a list of all elements on the page, which I have already done. Next, I need to check the innerHTML of each element to confirm that the element is a leaf node in the DOM and that it contains part of the keyword. I am trying to do this with a regex. I have a regex that extracts content between the start or end tags of an element, but it does not capture the innerHTML. Here is my progress so far, focusing on the About page:
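// Collect every element under <body>.
var list = document.body.getElementsByTagName("*");
// Intended to match text containing "About" that is not wrapped in tags.
var aboutElement = /^[^<.+>].*About.*[^(<.+>]$/i;
// Test each element's markup and alt text against the pattern.
for (var i = 0; i < list.length; i++) {
    if (aboutElement.test(list[i].innerHTML) || aboutElement.test(list[i].alt)) {
        list[i].click();
    }
}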
How can I modify the regex so that it only matches leaf nodes (nodes without child nodes) and does not capture content within start or end tags? I suspect the current pattern matches everything in the innerHTML because of the .* sections, so it probably needs adjusting. Any help or suggestions would be appreciated!
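For comparison, here is a minimal sketch of the same check with the leaf test moved out of the regex and into DOM properties. It assumes that "leaf" means an element with no child elements (children.length === 0) and that matching against textContent or alt is acceptable in place of innerHTML. The names isLeafElement and findLeafMatches are hypothetical helpers, not part of any API.

```javascript
// Hypothetical helper: treats an element as a leaf when it has no child elements.
function isLeafElement(el) {
    return el.children.length === 0;
}

// Hypothetical helper: returns the leaf elements whose visible text or alt
// attribute matches the pattern. Assumes the pattern has no "g" flag, since
// test() on a global regex keeps state between calls.
function findLeafMatches(elements, pattern) {
    return Array.from(elements).filter(function (el) {
        // Images have empty textContent, so fall back to alt.
        var text = el.textContent || el.alt || "";
        return isLeafElement(el) && pattern.test(text);
    });
}

// In the extension's content script this might be called as:
//   var matches = findLeafMatches(document.body.getElementsByTagName("*"), /about/i);
//   if (matches.length > 0) { matches[0].click(); }
```

With this split the regex only has to express the keyword (for example /about/i), because a leaf element's text cannot contain nested tags.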