I'm using Jason Davies's well-known word cloud library: https://github.com/jasondavies/d3-cloud
My code is based on a copy of this block: http://bl.ocks.org/blockspring/847a40e23f68d6d7e8b5
I want to limit the maximum number of words displayed in the word cloud. The library exposes settings for rotation, font size, the spiral method, and more, but it doesn't seem to have a direct option for capping the word count.
To save computation, I believe feeding cloud.js a subset of the original word_count would be more efficient. However, I'm not sure whether the word_count object is sorted by frequency before cloud.js processes it, since there are no apparent .sort calls.
If cloud.js does sort the word_count object by frequency (or tf-idf), I would have to wait until after the full list is generated to return the top k words, which means the entire text file has already been iterated over anyway.
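For what it's worth, here is a minimal sketch of the kind of pre-filtering I have in mind, done in plain JavaScript before anything is handed to cloud.js. The `word_count` object, the stop-word list, and the `topKWords` helper are all my own illustrative assumptions, not part of the library; the `{text, size}` output shape mirrors what the blockspring block feeds into the layout.

```javascript
// Hypothetical word_count object (word -> frequency), mirroring the
// shape used in the blockspring example. Values here are made up.
var word_count = { "data": 42, "cloud": 37, "word": 29, "the": 120, "d3": 18 };

// A small illustrative stop-word list so common words don't crowd out the top k.
var stopWords = { "the": true, "a": true, "an": true, "and": true, "of": true };

// Return the k most frequent non-stop-words as {text, size} objects,
// sorted in descending order of frequency.
function topKWords(counts, k) {
  return Object.keys(counts)
    .filter(function (w) { return !stopWords[w]; })
    .sort(function (a, b) { return counts[b] - counts[a]; }) // descending by count
    .slice(0, k)
    .map(function (w) { return { text: w, size: counts[w] }; });
}

var words = topKWords(word_count, 3);
// "the" is filtered out, so the top 3 are data (42), cloud (37), word (29).
```

The resulting `words` array could then be passed to the layout's `.words(...)` call in place of the full list, so cloud.js only ever places k words.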
Limiting the display to the top k most frequent words (say 20, excluding common words) might speed up the visualization, but it probably won't significantly change the underlying algorithm's performance.
Visually, larger font sizes correspond to higher frequencies, so choosing the top k words is the same as choosing the k largest font-size words.
If anyone experienced with this type of visualization can point me toward how to adjust the code to return only the top k words, I would greatly appreciate it.
Note: I initially posted this question on the GitHub repo but was redirected here as the more appropriate venue. I've tried to clarify and provide sufficient context, though I fear it may still be considered too vague for Stack Overflow. Thank you for understanding.
Appreciatively,