First, it's important to distinguish a hostname from a URL: www.domain.com is a hostname, not a URL.
<a href="www.domain.com">
This link won't work: the browser treats www.domain.com as a relative URL and looks for a file named www.domain with a .com extension, relative to the current page.
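To see how such an href is resolved, here is a small sketch using the standard URL constructor. The page address and the resolveHref helper are made up for illustration.

```javascript
// Hypothetical address of the page that contains the link.
var pageUrl = 'http://example.com/articles/page.html';

// Resolve an href against a base URL, as the browser does for <a href>.
function resolveHref(href, base) {
    return new URL(href, base).href;
}

// Without a scheme, the hostname is treated as a relative path.
console.log(resolveHref('www.domain.com', pageUrl));
// http://example.com/articles/www.domain.com

// With a scheme, it resolves to the intended site.
console.log(resolveHref('http://www.domain.com/', pageUrl));
// http://www.domain.com/
```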
Highlighting bare hostnames is hard because they can take almost any form. You could match strings of the form 'www.something.dot.separated.words', but that isn't reliable, since many sites don't use the www. prefix in their hostnames. It may be best not to try to highlight bare hostnames at all.
/\bhttps?:\/\/[^\s<>"`{}|\^\[\]\\]+/;
The pattern above is a liberal way to detect HTTP URLs. Depending on your requirements, you may want to tighten it to exclude more characters, or strip trailing characters such as periods and exclamation marks, which are technically valid in a URL but are rarely part of one in practice.
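One possible refinement, sketched here, is to strip trailing punctuation from each match after the fact. The findUrls helper and the exact set of trimmed characters are assumptions for illustration; adjust them to your content.

```javascript
var urlpattern = /\bhttps?:\/\/[^\s<>"`{}|\^\[\]\\]+/g;

// Hypothetical helper: return the URLs found in a string, with
// sentence punctuation removed from the end of each match.
function findUrls(text) {
    var urls = [];
    var match;
    urlpattern.lastIndex = 0;  // restart the global pattern from the beginning
    while ((match = urlpattern.exec(text)) !== null) {
        // Assumed set of characters that usually belong to the sentence, not the URL.
        urls.push(match[0].replace(/[.,!?;:]+$/, ''));
    }
    return urls;
}

console.log(findUrls('See http://example.com/a.html. Also https://example.org/x!'));
// [ 'http://example.com/a.html', 'https://example.org/x' ]
```

If you apply this inside the link-building callback, remember that the trimmed URL is shorter than match[0], so the split positions need to be computed from the trimmed length.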
If you want, you could use alternation so that the pattern accepts either the URL syntax or the www.hostname form.
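A minimal sketch of that alternation follows. The pattern name and the hrefFor helper are hypothetical. Since a bare www. match has no scheme, the sketch assumes http:// should be prepended before the text is used as an href.

```javascript
// Accept either an explicit http(s) URL or a bare www. hostname.
var urlOrWwwPattern = /\b(?:https?:\/\/|www\.)[^\s<>"`{}|\^\[\]\\]+/g;

// Hypothetical helper: turn matched text into something usable as an href.
function hrefFor(matchedText) {
    // Assumption: bare www. hostnames are served over plain HTTP.
    return /^https?:\/\//.test(matchedText) ? matchedText : 'http://' + matchedText;
}

var sample = 'Visit www.example.com or https://example.org/docs today';
console.log(sample.match(urlOrWwwPattern).map(hrefFor));
// [ 'http://www.example.com', 'https://example.org/docs' ]
```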
Whichever pattern you choose, apply it only to the text nodes in the page, not to the HTML markup. Running it over innerHTML and writing the result back makes the browser rebuild that part of the page, which destroys existing JavaScript references, event handlers, and form field values.
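Markup also contains URLs that are not visible text. The sketch below runs the pattern over a made-up markup string to show what goes wrong; the image tag is purely illustrative.

```javascript
var urlpattern = /\bhttps?:\/\/[^\s<>"`{}|\^\[\]\\]+/g;

// Hypothetical fragment of page markup.
var markup = '<img src="http://example.com/logo.png" alt="logo">';

// Naive approach: wrap every match found in the raw markup.
var broken = markup.replace(urlpattern, function (url) {
    return '<a href="' + url + '">' + url + '</a>';
});

console.log(broken);
// <img src="<a href="http://example.com/logo.png">http://example.com/logo.png</a>" alt="logo">
```

The URL inside the src attribute gets wrapped in a link, which leaves the img tag malformed.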
In general, regular expressions cannot reliably process HTML. Let the browser do the parsing, and work with the elements and text nodes it has already built. Also leave text inside <a> elements alone: wrapping it would nest one link inside another, which is invalid HTML and breaks the existing link.
// Convert plain-text URLs inside `element` into clickable links.
function addLinks(element) {
    var urlpattern = /\bhttps?:\/\/[^\s<>"`{}|\^\[\]\\]+/g;
    findTextExceptInLinks(element, urlpattern, function(node, match) {
        // Split off the text after the match, then move the match itself into a new <a>.
        node.splitText(match.index + match[0].length);
        var a = document.createElement('a');
        a.href = match[0];
        a.appendChild(node.splitText(match.index));
        node.parentNode.insertBefore(a, node.nextSibling);
    });
}
// Recursively find text nodes under `element` that match `pattern`,
// skipping anything inside an existing link. `pattern` must have the g flag.
// Children and matches are processed last-to-first so that splitting a text
// node doesn't invalidate the indexes still to be handled.
function findTextExceptInLinks(element, pattern, callback) {
    for (var childi = element.childNodes.length; childi-- > 0;) {
        var child = element.childNodes[childi];
        if (child.nodeType === Node.ELEMENT_NODE) {
            if (child.tagName.toLowerCase() !== 'a')
                findTextExceptInLinks(child, pattern, callback);
        } else if (child.nodeType === Node.TEXT_NODE) {
            var matches = [];
            var match;
            while ((match = pattern.exec(child.data)) !== null)
                matches.push(match);
            for (var i = matches.length; i-- > 0;)
                callback.call(window, child, matches[i]);
        }
    }
}
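findTextExceptInLinks hands matches to the callback from last to first. The sketch below shows the same idea on a plain string, with no DOM involved. Cutting pieces off the end means the indexes of earlier matches never shift. The splitAroundMatches helper and its segment objects are hypothetical, for illustration only.

```javascript
// Hypothetical helper: split `text` into link and non-link segments.
// `pattern` must have the g flag, otherwise the exec loop never ends.
function splitAroundMatches(text, pattern) {
    var matches = [];
    var match;
    pattern.lastIndex = 0;
    while ((match = pattern.exec(text)) !== null)
        matches.push(match);

    var segments = [];
    var rest = text;  // the part of the string not yet split
    for (var i = matches.length; i-- > 0;) {
        var start = matches[i].index;
        var end = start + matches[i][0].length;
        // Only the tail is cut off, so earlier match indexes stay valid.
        if (end < rest.length)
            segments.unshift({ text: rest.slice(end), link: false });
        segments.unshift({ text: rest.slice(start, end), link: true });
        rest = rest.slice(0, start);
    }
    if (rest.length > 0)
        segments.unshift({ text: rest, link: false });
    return segments;
}

var parts = splitAroundMatches('a http://x.test b http://y.test', /\bhttps?:\/\/[^\s]+/g);
console.log(parts.map(function (p) { return p.text; }));
// [ 'a ', 'http://x.test', ' b ', 'http://y.test' ]
```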