Hey everyone, I'm currently using BeautifulSoup 4 (BS4) to parse a webpage. However, the part I need sits inside a script tag, so BS4 returns it as a string of JavaScript, and the URLs I'm trying to extract are not plain links in the HTML.
I have identified the part I need to extract in BS4:
var vd1="\x3c\x73\x6f\x75\x72\x63\x65\x20\x73\x72\x63\x3d\x27";
var vd2="\x27\x20\x74\x79\x70\x65\x3d\x27\x76\x69\x64\x65\x6f\x2f\x6d\x70\x34\x27\x3e";
var luu=pkl("uggc://navzrurnira.rh/tvs.cuc?vcqrgrpgrq");
// Other encrypted variables...
document.write("<video "+" class='vid' id='videodiv' width='100%' autoplay='autoplay' preload='none'>"+ vd1 +soienfu+ vd2 + vd1+iusfdb+ vd2 + vd1+ufbjhse+ vd2 +"Your browser does not support the video tag.</video> ");
But when I look at the page's HTML, all I see is:
Your browser does not support the video tag.
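A note on the script shown above: the `\x..` sequences in vd1 and vd2 are ordinary JavaScript hex escapes, and the sample argument passed to pkl looks like ROT13. The ROT13 part is only an inference from that one string, not something confirmed from the site's own code. A minimal Python sketch that decodes both follows; the helper names `decode_hex_escapes` and `rot13` are made up for illustration.

```python
import codecs
import re


def decode_hex_escapes(s):
    """Replace literal \\xNN sequences with the characters they stand for."""
    return re.sub(r"\\x([0-9a-fA-F]{2})", lambda m: chr(int(m.group(1), 16)), s)


def rot13(s):
    """Assumption: pkl() is plain ROT13, judging from the one sample string."""
    return codecs.decode(s, "rot_13")


# Raw strings keep the backslashes literal, as they appear in the page source.
vd1 = decode_hex_escapes(r"\x3c\x73\x6f\x75\x72\x63\x65\x20\x73\x72\x63\x3d\x27")
vd2 = decode_hex_escapes(
    r"\x27\x20\x74\x79\x70\x65\x3d\x27\x76\x69\x64\x65\x6f\x2f\x6d\x70\x34\x27\x3e"
)
luu = rot13("uggc://navzrurnira.rh/tvs.cuc?vcqrgrpgrq")

print(vd1)  # <source src='
print(vd2)  # ' type='video/mp4'>
print(luu)  # http://animeheaven.eu/gif.php?ipdetected
```

If that reading is right, the document.write call is assembling `<source src='...' type='video/mp4'>` tags, and the variables between vd1 and vd2 hold the video URLs.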
My goal is to retrieve the video URL from this HTML block, which looks like this:
This is the code snippet I am using to achieve this:
import os
import re
import sys

import bs4
import requests

url = "http://animeheaven.eu/watch.php?a=Fairy%20Tail&e=55"
mainsite = "http://animeheaven.eu/"
r2 = requests.get(url)
r2.raise_for_status()  # stop early on an HTTP error status
soup2 = bs4.BeautifulSoup(r2.text, "html.parser")
dlink = soup2.select("script")  # list of every script tag on the page
However, I am facing issues parsing 'dlink' for the URL due to the JavaScript content. As I'm new to web scraping and not very familiar with JS, I'm struggling to figure out a solution.
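# This pattern would extract a standard href='...' URL
mylink = re.compile(r"href='(.*?)'")
match = mylink.search(str(dlink[3]))
# search() returns None when nothing matches, so guard before indexing
downlink = match[1] if match else None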
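Since the URLs are assembled in JavaScript rather than written as href attributes, one possible approach is to search the script text for the `var name=pkl("...")` assignments and decode each argument. The sketch below assumes pkl is ROT13 (inferred from the luu example only) and runs on a made-up stand-in script: the example.com URLs are placeholders, not the site's real values, and `extract_pkl_urls` is a hypothetical helper name.

```python
import codecs
import re

# Hypothetical stand-in for one script body. The variable names come from the
# post, but the encoded values are placeholders built from example.com URLs.
sample_script = (
    'var soienfu=pkl("' + codecs.encode("http://example.com/a.mp4", "rot_13") + '");\n'
    'var iusfdb=pkl("' + codecs.encode("http://example.com/b.mp4", "rot_13") + '");\n'
)

# Matches: var <name>=pkl("<encoded string>")
PKL_ASSIGNMENT = re.compile(r'var\s+(\w+)\s*=\s*pkl\("([^"]*)"\)')


def extract_pkl_urls(script_text):
    """Map each variable assigned via pkl("...") to its ROT13-decoded value."""
    return {
        name: codecs.decode(encoded, "rot_13")
        for name, encoded in PKL_ASSIGNMENT.findall(script_text)
    }


urls = extract_pkl_urls(sample_script)
print(urls)
```

On the real page, the same function could be called on `str(tag)` for each tag in `soup2.select("script")`, keeping only the variables that appear between vd1 and vd2 in the document.write call.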