Hi, thanks for taking a look at this.
I am trying to scrape the comments on a news article. Of the roughly 11,000 comments on the piece, I managed to collect about 6,000. You can see the full comment thread through this link: (don't worry that it's in Korean; the page is easy to navigate).
Note that the link leads to the mobile version of the page, and the following code needs to be used to reveal the entire comment thread:
driver.find_element_by_xpath("//span[@class='u_cbox_page_more']").click()
The problem is that my extraction code is very slow. It had been running for over an hour before I finally had to stop it. Here is the code I used:
from selenium.common.exceptions import NoSuchElementException

content = []
name = []
r_time = []
comment_list = driver.find_elements_by_xpath("//ul[@class='u_cbox_list']/li")
for comment in comment_list:
    try:
        con = comment.find_element_by_xpath(".//span[@class='u_cbox_contents']").text
        content.append(con)
    except NoSuchElementException:
        continue
    name.append(comment.find_element_by_xpath(".//span[@class='u_cbox_nick']").text)
    r_time.append(comment.find_element_by_xpath(".//span[@class='u_cbox_date']").text)
I have many more news articles lined up to crawl, and waiting this long for each one is not feasible, so there must be a more efficient way to get this information. I looked into JavaScript but couldn't find a Selenium-compatible solution written in Python, and my JavaScript knowledge is limited.
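One direction I considered, as a rough sketch only: instead of calling `.text` on every element through Selenium (one browser round trip per call), grab `driver.page_source` once after all the "more" clicks, and parse the HTML offline with Python's standard-library `html.parser`. The markup below is a stand-in I made up based on the class names in my XPaths, so it is an assumption about the real page structure:

```python
from html.parser import HTMLParser

class CommentParser(HTMLParser):
    """Collect the text of every <span class="u_cbox_contents">."""
    def __init__(self):
        super().__init__()
        self.comments = []
        self._capturing = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        # Start capturing when we enter a comment-text span.
        if tag == "span" and dict(attrs).get("class") == "u_cbox_contents":
            self._capturing = True
            self._buf = []

    def handle_data(self, data):
        if self._capturing:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._capturing:
            self.comments.append("".join(self._buf).strip())
            self._capturing = False

# In the real script this string would be driver.page_source, fetched once
# after the full thread is revealed; sample markup stands in here.
html = (
    '<ul class="u_cbox_list">'
    '<li><span class="u_cbox_contents">first comment</span></li>'
    '<li><span class="u_cbox_contents">second comment</span></li>'
    '</ul>'
)
parser = CommentParser()
parser.feed(html)
print(parser.comments)  # → ['first comment', 'second comment']
```

The same pattern would extend to the `u_cbox_nick` and `u_cbox_date` spans, but I don't know whether parsing the page source this way is the idiomatic fix, which is why I'm asking.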
If there is a better approach and someone could share a working example, I'd be glad to learn from it and adapt it. Any guidance would be much appreciated.
Thanks in advance for your time.