What is the fastest way to retrieve li data using Selenium?

Greetings! Your attention to this post is greatly appreciated.

I recently set out to analyze the comments on a particular news article. Of the roughly 11,000 comments attached to the article, I was able to collect about 6,000. The full list of comments is available through this link: (Don't worry that it's in Korean; the page is easy to navigate).

Please note that this link leads to the mobile version of the page, and the following code has to be used to reveal the entire comment thread:

driver.find_element_by_xpath("//span[@class='u_cbox_page_more']").click()

The challenge I encountered was the sluggish approach I took to extract the data. The process extended beyond an hour before I ultimately had to intervene. Here is the snippet of the code I employed:

content = []
name = []
r_time = []

comment_list = driver.find_elements_by_xpath("//ul[@class='u_cbox_list']/li")
              
for comment in comment_list:
    try:
        con = comment.find_element_by_xpath(".//span[@class='u_cbox_contents']").text
        content.append(con)
    except NoSuchElementException:
        continue

    name.append(comment.find_element_by_xpath(".//span[@class='u_cbox_nick']").text)        
    r_time.append(comment.find_element_by_xpath(".//span[@class='u_cbox_date']").text)

I have many more news articles lined up for extraction, so waiting this long for each crawl is not feasible. There must be a more efficient way to obtain the data. I experimented with JavaScript but couldn't find a Selenium-compatible solution written in Python, and my knowledge of JavaScript is limited.

If there exists an alternative approach and someone could furnish me with a working example, I am eager to learn and adapt swiftly. Any guidance or assistance provided would be immensely appreciated.

Thank you for dedicating your time and expertise to aid in this endeavor. Your invaluable support is anticipated and warmly welcomed.

Answer №1

I managed to reduce the time it takes to retrieve the comments from this page to around 17 minutes: about 11 minutes clicking the "show more" link and about 6 minutes fetching the data.

Code:

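from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()
driver.get('https://n.news.naver.com/mnews/article/comment/023/0003390153?sid=102')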

content = []
name = []
r_time = []

# Wait until the "show more" button is present, so the JS click below has a target.
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CLASS_NAME, "u_cbox_page_more")))

# Keep clicking "show more" until the button is no longer found. At that point
# querySelector returns null, .click() throws, and the resulting Selenium
# exception ends the loop.
while True:
    try:
        driver.execute_script("document.querySelector(\".u_cbox_paginate[style=''] .u_cbox_page_more\").click(); window.scrollTo(0,document.body.scrollHeight);")
        # Alternative click strategies (timed in the comparison below):
        # WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CLASS_NAME, "u_cbox_page_more"))).click()
        # WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, ".u_cbox_page_more"))).click()
        # WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//span[@class='u_cbox_page_more']"))).click()
    except Exception:
        break

# find_elements_by_xpath was removed in Selenium 4.3; find_elements(By.XPATH, ...) is the equivalent call.
comment_list = driver.find_elements(By.XPATH, "//ul[@class='u_cbox_list']/li")

for comment in comment_list:
    try:
        # Read the text through JS instead of WebElement.text.
        con = driver.execute_script("return arguments[0].querySelector('.u_cbox_contents').innerText;", comment)
        content.append(con)
    except Exception:
        # No comment text in this li (querySelector returned null): skip it.
        continue

    name.append(driver.execute_script("return arguments[0].querySelector('.u_cbox_nick').innerText;", comment))
    r_time.append(driver.execute_script("return arguments[0].querySelector('.u_cbox_date').innerText;", comment))
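
The loop above still makes up to three execute_script round trips per li. One possible further reduction, sketched here and not timed against this page, is to collect every comment in a single JavaScript call and split the result in Python. The names EXTRACT_JS and extract_comments are illustrative and not part of the original answer. The sketch assumes the same class names as above, and it uses the CSS selector 'ul.u_cbox_list > li', which matches by class token rather than by the exact class attribute the XPath checks.

```python
# Hypothetical helper: gather all comments in ONE execute_script round trip.
EXTRACT_JS = """
return Array.from(document.querySelectorAll('ul.u_cbox_list > li')).map(function (li) {
    var text = function (sel) {
        var el = li.querySelector(sel);
        return el ? el.innerText : null;
    };
    return {
        content: text('.u_cbox_contents'),
        name: text('.u_cbox_nick'),
        time: text('.u_cbox_date')
    };
});
"""


def extract_comments(driver):
    """Return (content, name, r_time) lists, skipping items without comment text."""
    # Selenium converts the JS array of objects into a Python list of dicts.
    rows = driver.execute_script(EXTRACT_JS)
    # Items with no '.u_cbox_contents' node come back with content None.
    rows = [row for row in rows if row["content"] is not None]
    content = [row["content"] for row in rows]
    name = [row["name"] for row in rows]
    r_time = [row["time"] for row in rows]
    return content, name, r_time
```

Once the "show more" loop has finished, this would be called as content, name, r_time = extract_comments(driver). Unlike the loop above, a missing nickname or date becomes None instead of raising an exception.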

Bonus: the code above contains four different ways to click the "show more" link (one active, three commented out). I compared how long each one takes to display all comments:

|--------------|---------|
| locator type | time, s |
|--------------|---------|
| JS           |   656.9 |
| class name   |   728.1 |
| css          |   736.5 |
| xpath        |   774.3 |
|--------------|---------|
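
To put these numbers in proportion, the short sketch below computes how much slower each strategy was than the JS click. The timings are copied from the table; the function name relative_slowdown is only illustrative.

```python
# Timings in seconds, copied from the comparison table.
timings = {"JS": 656.9, "class name": 728.1, "css": 736.5, "xpath": 774.3}


def relative_slowdown(timings, baseline="JS"):
    """Return each strategy's extra time over the baseline, in percent."""
    base = timings[baseline]
    return {key: round((value / base - 1) * 100, 1) for key, value in timings.items()}


slowdown = relative_slowdown(timings)
for key, percent in slowdown.items():
    print(f"{key}: +{percent}%")
```

On these measurements the class name, css, and xpath variants come out roughly 11%, 12%, and 18% slower than the JS click, so the locator strategy matters less than the number of round trips.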
