Contextualizing the Word Cloud

Word clouds can provide a powerful visual representation of text data. For customer review data, they quickly showcase the most prominent keywords people are using and the sentiment associated with them. The basic word cloud selects the most frequent keywords in a data set (in our case, customer reviews or surveys) for a given client or industry. The font size of each keyword represents how frequently it has been mentioned in these reviews. The color of the text (ranging from dark green for positive to dark red for negative) represents the average sentiment of those mentions. This way, users gain immediate insight into what customers talk about most, and how they feel about those topics, without having to read through each review one by one. Our goal in the data science labs was to experiment with different ways of depicting a word cloud to maximize the insight this tool could provide.
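The two encodings described above can be sketched in a few lines of JavaScript. This is only an illustration, not the code behind our clouds: the helper names, the pixel range, the HSL color ramp, the assumption that sentiment is scored from -1 to 1, and the sample numbers are all made up for the example.

```javascript
// Hypothetical helpers (not from any library): map a keyword's mention count
// to a font size, and its average sentiment to a red-to-green color.

function fontSizeFor(count, minCount, maxCount, minPx = 12, maxPx = 60) {
  // If every keyword has the same count, use the middle of the size range.
  if (maxCount === minCount) return (minPx + maxPx) / 2;
  const t = (count - minCount) / (maxCount - minCount); // 0..1
  return Math.round(minPx + t * (maxPx - minPx));
}

function colorFor(sentiment) {
  // Assumption: sentiment is an average score in [-1, 1]; clamp anything else.
  const s = Math.max(-1, Math.min(1, sentiment));
  // Hue 0 is red, 60 is yellow, 120 is green.
  const hue = Math.round((s + 1) * 60);
  return `hsl(${hue}, 70%, 35%)`;
}

function styleKeywords(keywords) {
  const counts = keywords.map((k) => k.count);
  const minCount = Math.min(...counts);
  const maxCount = Math.max(...counts);
  return keywords.map((k) => ({
    text: k.text,
    size: fontSizeFor(k.count, minCount, maxCount),
    color: colorFor(k.sentiment),
  }));
}

// Made-up sample data, for illustration only.
const styled = styleKeywords([
  { text: "staff", count: 120, sentiment: 0.8 },
  { text: "nurse", count: 90, sentiment: 0.4 },
  { text: "bills", count: 30, sentiment: -0.7 },
]);
console.log(styled);
```

A linear size scale is the simplest choice; with very skewed counts, a square-root or log scale may keep the rarer words readable.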

ZingChart

I began this exploration by selecting ZingChart to build initial word clouds on top of our data. It is a fairly easy-to-use JavaScript library with dozens of built-in responsive chart types. The library integrates easily with my work stack, Vue.js, and its syntax is straightforward. I was able to build the cloud using just its predefined keyword attributes, including “words”, “rotate” and “color”. The default styling of the cloud is quite nice and did not require any additional CSS.
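As a rough sketch of what such a configuration can look like, the snippet below builds a word cloud config from keyword counts. The helper name, the element id and the sample counts are hypothetical, and the option names reflect my understanding of ZingChart's word cloud options, so check them against the ZingChart documentation before relying on them. The “color” attribute is left out because its exact shape is not shown in this post.

```javascript
// Hypothetical helper: turn { keyword: count } into a ZingChart-style config.
// Assumption: the word cloud chart type reads an array of { text, count }
// objects from options.words and a boolean from options.rotate.
function buildWordCloudConfig(keywordCounts) {
  return {
    type: "wordcloud",
    options: {
      words: Object.entries(keywordCounts).map(([text, count]) => ({ text, count })),
      rotate: true,
    },
  };
}

// Made-up counts, for illustration only.
const chartConfig = buildWordCloudConfig({ staff: 120, nurse: 90, bills: 30 });
console.log(JSON.stringify(chartConfig, null, 2));

// In a browser with ZingChart loaded and a <div id="reviewCloud"> on the page,
// the config would then be rendered along these lines:
// zingchart.render({ id: "reviewCloud", data: chartConfig, height: 400, width: "100%" });
```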

The cloud below showcases customer review data relating to staff professionalism in a hospital. The word “nurse” indicates that a lot of customers evaluated their nurses in reviews and they had somewhat positive experiences. People were even more positive about the “staff” in general. On the other hand, there were various complaints around “bills”, “ultrasounds”, and “rude” and “unprofessional” staff.

But what are the main insights with respect to these reviews? Are we to conclude that experiences with the nurses were not as positive as with the rest of the staff? How positive were these experiences compared to nurse services in other hospitals? Do people mention the nurses and overall staff this frequently in other hospitals as well? Unfortunately, this single cloud cannot answer those questions.

To dig into those questions, we wanted to bring more context into the word cloud. However, when it comes to specific customization, the free version of ZingChart will not do the trick. For example, the only data I could pass to the tooltip shown when hovering over a word is the word itself and its count. I would have to pay technical support fees to extend this feature. Because of this, I decided to explore other libraries, in particular the popular data visualization tool D3.

D3

D3.js is a JavaScript library for producing dynamic, interactive, data-driven visualizations. The D3 word cloud library I used was d3-cloud, created by Jason Davies. It uses HTML5 canvas and sprite masks to achieve near-interactive speeds. The layout algorithm runs asynchronously, which makes it possible to animate words as they are placed without stuttering. The syntax of the library is fairly straightforward. The library uses d3.layout.cloud() to construct a cloud, start() and stop() to run and halt the layout algorithm, and functions such as font() and size() to specify attributes. The cloud is highly customizable. One function I particularly enjoy is random(): it sets the random number generator used to choose each word's initial position and the clockwise/counterclockwise direction of its spiral, which means that with a fixed generator it can print out two clouds with the words placed in the same order.
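To make the role of random() concrete, here is a minimal sketch of a seeded generator that could be handed to two layouts. The generator is a small mulberry32-style function written for this example, not part of d3-cloud, and the seed value is arbitrary. The d3-cloud calls appear only in a comment, since they need the library and a canvas to run.

```javascript
// A small seeded pseudo-random generator (mulberry32-style) that returns
// numbers in [0, 1), the range d3-cloud expects from a random() function.
function seededRandom(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

const SEED = 42; // arbitrary value chosen for this example

// Two independent generators built from the same seed produce the same draws.
const randomLeft = seededRandom(SEED);
const randomRight = seededRandom(SEED);
const leftDraws = Array.from({ length: 5 }, () => randomLeft());
const rightDraws = Array.from({ length: 5 }, () => randomRight());
console.log(leftDraws.every((value, i) => value === rightDraws[i]));

// With d3-cloud loaded, each layout would get its own generator from the same
// seed (words and draw are placeholders for your data and render callback):
// d3.layout.cloud().size([500, 300]).words(words)
//   .random(seededRandom(SEED)).on("end", draw).start();
```

Each cloud needs its own generator instance. Sharing one instance between two layouts would interleave the draws, and the two clouds would no longer receive the same sequence. Identical draws only lead to identical placement when the words and their sizes are also the same in both layouts.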

This possibility was particularly intriguing to us, as it provided an option to lay out a benchmark word cloud side by side with the initial word cloud. If both clouds had exactly the same text in exactly the same positions, the two could be aligned side by side to provide a quick visual benchmark for each word. Thus, users could not only gain general insight into their own customer reviews, but also compare them easily against various benchmarks, such as industry averages or performance over a longer time period.

In the cloud below, I implemented this for one hospital’s reviews over two time periods, the last 3 months and the last 24 months. Immediately you can see the value of the benchmark. With the two clouds side by side (more recent reviews on the left), you can see that feedback about “doctors” and care related to “children” is getting worse. You can also quickly see that sentiment around waiting (“wait”, “hour”, “minute”) is becoming more negative compared to some of the other words showing up in yellow here.
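The same comparison can also be computed directly, which is handy for checking what the eye picks up from the two clouds. The sketch below is an illustration rather than the code behind the clouds: the function name, the input shape (a map from word to average sentiment, assumed to lie between -1 and 1) and all of the numbers are made up.

```javascript
// Hypothetical helper: for every word present in both periods, report how its
// average sentiment changed, with the biggest declines first.
function sentimentShift(recent, baseline) {
  const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);
  return Object.keys(recent)
    .filter((word) => has(baseline, word))
    .map((word) => ({
      word,
      recent: recent[word],
      baseline: baseline[word],
      change: recent[word] - baseline[word], // negative means getting worse
    }))
    .sort((a, b) => a.change - b.change);
}

// Made-up sentiment scores, for illustration only.
const last3Months = { doctors: -0.2, wait: -0.6, nurse: 0.5 };
const last24Months = { doctors: 0.25, wait: -0.35, nurse: 0.5, parking: -0.1 };

const shifts = sentimentShift(last3Months, last24Months);
for (const s of shifts) {
  console.log(s.word, s.change.toFixed(2));
}
```

Words that appear in only one period are skipped here. Whether to show them, and where, is a design decision the sketch does not try to settle.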

The side-by-side benchmark is a big improvement, and I feel it is a good way to add context to a word cloud. However, we do run into some difficulties with this approach. In the case above, all of the words are in pretty much the same place, but this is primarily because the relative frequencies of the words are very similar in the two data sets. In some cases, however, this is not true. In a follow-up post next month, I will go through one of these cases in more depth and discuss the custom cloud we had to build to overcome these issues.