I'm struggling to apply a regex (built from one variable) to an HTML page stored as text.
The HTML is split into an array, with each element holding a snippet like the one below. Each element lists the details of a fictional house (name, square footage, etc.). My objective is to match just one of these houses by its name, which is the text inside the first TD tag. Specifically, I am interested in the VALUE (digits) of the houseid INPUT tag, the last hidden INPUT of the form.
<TR BGCOLOR=#D4C0A1>
<TD WIDTH=40%><NOBR>Luminous Arc 2</NOBR></TD>
<TD WIDTH=10%><NOBR>154 sqm</NOBR></TD>
<TD WIDTH=10%><NOBR>6460 gold</NOBR></TD>
<TD WIDTH=40%><NOBR>rented</NOBR></TD>
<TD><TABLE BORDER=0 CELLSPACING=0 CELLPADDING=0>
<FORM ACTION= METHOD=post><TR><TD>
<INPUT TYPE=hidden NAME=world VALUE=Olympa>
<INPUT TYPE=hidden NAME=town VALUE="Yalahar">
<INPUT TYPE=hidden NAME=state VALUE=>
<INPUT TYPE=hidden NAME=type VALUE=houses>
<INPUT TYPE=hidden NAME=order VALUE=>
<INPUT TYPE=hidden NAME=houseid VALUE=37010>
<INPUT TYPE=image NAME="View" ALT="View" SRC="" BORDER=0 WIDTH=120 HEIGHT=18>
</TD></TR></FORM></TABLE></TD></TR>
I formulated the following regular expression:
var regex = new RegExp(house + "[\\s\\S]+name=houseid value=([0-9]+)>", "i");
where house refers to the house name (e.g., Luminous Arc 2) and the essential piece I seek is the houseid 37010.
I assumed this regex would give me the needed result, yet houses[i].match(regex) consistently returns null. No matches are found in the string.
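For comparison, here is a minimal sketch of a more tolerant pattern. It assumes the stored page text might differ from the snippet above in ways that break a literal match: the spaces in the house name could be non-breaking spaces or entities such as &#160;, the attribute values could be quoted, or the name could contain regex metacharacters. None of that is confirmed by the snippet. The helper names escapeRegExp, buildHouseRegex, and findHouseId are hypothetical.

```javascript
// Hypothetical helper: escape regex metacharacters so the house name
// is always matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build a tolerant pattern for one house name.
function buildHouseRegex(house) {
  // Assumption: a space in the name may appear in the page as ordinary
  // whitespace, a non-breaking space, or an &nbsp; / &#160; entity.
  var name = escapeRegExp(house).replace(/ /g, "(?:\\s|&nbsp;|&#160;)+");
  // Lazy [\s\S]+? stops at the first houseid after the name;
  // optional quotes allow NAME="houseid" VALUE="37010" as well.
  return new RegExp(
    name + "[\\s\\S]+?name=[\"']?houseid[\"']?\\s+value=[\"']?([0-9]+)",
    "i"
  );
}

// Returns the captured digits as a string, or null when nothing matches.
function findHouseId(snippet, house) {
  var match = snippet.match(buildHouseRegex(house));
  return match ? match[1] : null;
}
```

With this sketch, findHouseId(houses[i], house) returns the captured digits instead of the whole match array. If it finds a match where the original regex does not, the difference is likely in how the name or the attributes are encoded in the stored text.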
I've experimented with various methods, even trying to convert the HTML string into a DOM Object for parsing TR tags (unsuccessful). Progress seems within reach, but I'm currently at a standstill.
If anyone can discern why my regex isn't functioning correctly, it would be greatly appreciated.
Kenneth