After retrieving the HTML content of a website with my function, I use `String.prototype.match` with a regex to extract the email addresses on that page. The problem is that the match I get back does not contain the actual email address; instead, it contains everything from `mailto:` onward, including the DOM elements that follow it.
My current problems are:

1. The regex does not capture only the email part of `mailto:xxxx`.
2. Unexpectedly, the entire DOM is printed to the console when I run `console.log(matches[0]);`.
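Both symptoms can be reproduced with the same pattern on a small, made-up HTML string (the `html` value below is hypothetical, not the real page). `matches[0]` is always the whole match, including the `mailto:` prefix, while the capture group is in `matches[1]`. More importantly, `[^?]*` accepts any character except `?`, including quotes, tags and newlines, so the match only ends at the next `?` somewhere later in the document.

```javascript
// Hypothetical input: a mailto link, some footer markup and a script URL containing "?".
const html =
  '<p><a href="mailto:info@example.com">Mail</a></p>\n' +
  "<footer>Footer markup</footer>\n" +
  '<script src="app.js?ver=1"></script>';

// Same pattern as in the question. match() converts the string into a RegExp;
// inside a string literal "\?" is just "?", so the class is effectively [^?].
const websiteEmailRegex = "mailto:([^\?]*)";
const matches = html.match(websiteEmailRegex);

// [^?]* also accepts quotes, tags and newlines, so the match stops only at the
// first "?", which here is inside the script URL.
console.log(matches[0]); // whole match, still starting with "mailto:"
console.log(matches[1]); // capture group only, but it over-reaches the same way
```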
My code:

```javascript
const websiteEmailRegex = 'mailto:([^\?]*)';
let HTML = document.documentElement.outerHTML;
let matches = HTML.match(websiteEmailRegex);
if (matches) {
  console.log('email', matches[0]);
}
```
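A minimal sketch of a narrower pattern, assuming the addresses appear in plain text right after `mailto:`. Instead of `[^?]*`, the capture group only accepts characters that commonly occur in addresses, so the match stops at the closing quote, a `?subject=` query or a tag. It reads the address from the capture group (`match[1]`) and uses `matchAll` to collect every occurrence rather than only the first. `extractMailtoEmails` and `mailtoRegex` are hypothetical names, and the character class is a simplification, not a full email grammar.

```javascript
// Assumed, simplified address shape: local part, "@", domain, dot, TLD of 2+ letters.
// Anything else (", ?, <, whitespace) ends the match.
const mailtoRegex = /mailto:([A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,})/gi;

// Hypothetical helper: returns each distinct address found after "mailto:" in an HTML string.
function extractMailtoEmails(html) {
  const emails = new Set();
  // matchAll requires the "g" flag and yields one match object per occurrence.
  for (const match of html.matchAll(mailtoRegex)) {
    // match[0] is the whole "mailto:..." text; match[1] is only the captured address.
    emails.add(match[1]);
  }
  return [...emails];
}

// Usage on a made-up snippet: the "?subject=Hi" part is not included in the result.
console.log(
  extractMailtoEmails('<a href="mailto:info@example.com?subject=Hi">Mail</a>')
);
```

In the browser you could call `extractMailtoEmails(document.documentElement.outerHTML)`. When a live DOM is available, `document.querySelectorAll('a[href^="mailto:"]')` is another option that avoids running a regex over raw HTML. If a page hides its addresses behind an obfuscation layer, the function can correctly return an empty array, because no plain address follows `mailto:`.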
Here is a snippet of what appears in the console:
```html
mailto:<a href="/cdn-cgi/l/email-protection" class="__cf_email__" data-cfemail="10797e767f5064716264716964717e6471633e7563">[email protected]</a>"> <a href="/cdn-cgi/l/email-protection" class="__cf_email__" data-cfemail="7c15121a133c081d0e081d05081d12081d0f52190f">[email protected]</a></a></p>
<p>No cogemos pedidos por mail.</p>
<p></p>
</div><!-- .entry-content -->
<footer class="entry-footer">
</footer><!-- .entry-footer -->
</div>
</article><!-- #post-## -->
</main><!-- #main -->
</div><!-- #primary -->
</div></div>
<footer id="colophon" class="site-footer" role="contentinfo" itemscope="" itemtype="http://schema.org/WPFooter">
<div class="container">
<div class="footer-t">
<div class="row">
<div class="three-cols">
<div class="col">
</div>
<div class="col center">
<section class="widget widget_contact_form">
</section>
</div>
<div class="col">
</div>
</div>
</div>
</div>
<div class="site-info">
<span>
©2020 <a href="http://tartaytantas.es/">Tartaytantas - Tartas y bizccochos a domicilio en Aravaca</a>. </span>
Bakes and Cakes | Desarrollado por <a href="https://rarathemes.com/" rel="nofollow" target="_blank">
Rara Theme </a>
Funciona gracias a <a href="https://wordpress.org/">WordPress.</a>
</div><!-- .site-info -->
</div>
</footer><!-- #colophon -->
<div class="overlay"></div>
<a href="javascript:void(0);" class="btn-top"><span>Arriba</span></a>
</div><!-- #acc-content -->
</div><!-- #page -->
<script type="text/javascript" src="https://secureservercdn.net/160.153.137.170/zm5.b57.myftpupload.com/wp-content/themes/bakes-and-cakes/js/owl.carousel.min.js
```
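The logged markup contains `class="__cf_email__"`, `data-cfemail` attributes and links to `/cdn-cgi/l/email-protection`, which looks like Cloudflare's email address obfuscation: the fetched HTML shows `[email protected]` instead of the address, so there appears to be no plain address after `mailto:` for any regex to capture. Below is a minimal sketch of decoding `data-cfemail`, assuming the commonly described scheme (community-documented, not an official Cloudflare API) in which the first hex byte is a key that is XORed with every following byte. `decodeCfEmail` and `extractCfEmails` are hypothetical helper names.

```javascript
// Hypothetical helper: decodes a data-cfemail hex string.
// Assumption: the first byte (two hex digits) is an XOR key for every later byte.
function decodeCfEmail(hex) {
  const key = parseInt(hex.slice(0, 2), 16);
  let email = "";
  for (let i = 2; i < hex.length; i += 2) {
    const code = parseInt(hex.slice(i, i + 2), 16) ^ key;
    email += String.fromCharCode(code);
  }
  return email;
}

// Hypothetical helper: finds every data-cfemail attribute in raw HTML and decodes it,
// returning distinct addresses in the order they first appear.
function extractCfEmails(html) {
  const found = new Set();
  for (const match of html.matchAll(/data-cfemail="([0-9a-fA-F]+)"/g)) {
    found.add(decodeCfEmail(match[1]));
  }
  return [...found];
}

// Usage with the first data-cfemail value from the logged snippet.
console.log(decodeCfEmail("10797e767f5064716264716964717e6471633e7563")); // "info@tartaytantas.es"
```

This only applies when the HTML you fetched is the version served before Cloudflare's decoding script runs; for pages without this obfuscation, a plain `mailto:` search is still needed.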