Steam Reviews Tool

Context

Before explaining the tool itself, let me introduce the context in which I created it.
As the second-year project of my master's degree started, I was asking myself: "How can I bring our users into the production loop when we don't have a game yet?" After some reflection, I remembered a talk by Emmanuelle Marévéry that I had heard at Game UX Summit 2019. It was about how text mining can actually tell a lot about what users say about a game. I decided to dig further into this topic and started working on it.
I had no affinity at all with Python, so I decided to use R to create the tool. I started by looking for a way to get all the reviews available on Steam. I found the Steam API, which is actually quite bad when it comes to requesting reviews: I did get some, but only about 1% of them. Apparently the API doesn't keep all the reviews. Stuck there, I had to find another way, so I started digging deeper into web scraping in R. I was using rvest but, unfortunately for me, the way Steam displays the reviews got me stuck again: the endless loading, which refreshes each time you reach the bottom of the page, once again prevented me from getting them. But I'm not a quitter. I asked a colleague for help, who explained to me how to keep retrieving the URLs the page requests as it loads more reviews. A good hundred hours later, I had them all, ready to text mine everything.

Histogram

First of all, I wanted my team to have access to the twenty most-used words across all the reviews. That way they could see our competitors' hot spots, even though they couldn't tell why those words were used. The following analysis is about Hyper Light Drifter, as it has been tagged as one of the games that inspire ours.
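To make the idea concrete, here is a minimal sketch of a top-twenty word count. The tool itself is written in R and its code is not shown here; this Python version is only an illustration. The function names, the tiny stop-word list and the sample reviews are all assumptions made for the example.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption); a real run needs a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "is", "it", "of", "to", "this", "i", "in"}


def tokenize(text):
    """Lowercase a review and split it into words made of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def top_words(reviews, n=20, stop_words=STOP_WORDS):
    """Return the n most frequent non-stop-words across all reviews."""
    counts = Counter()
    for review in reviews:
        counts.update(word for word in tokenize(review) if word not in stop_words)
    return counts.most_common(n)


# Made-up reviews, only there to exercise the function.
sample_reviews = [
    "The combat is hard and the art is gorgeous",
    "Gorgeous art, great music",
    "Combat feels great",
]
print(top_words(sample_reviews, n=3))
```

The resulting list of (word, count) pairs is what a bar chart of the most-used words would be drawn from.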

Global Reviews

Next, I applied the same process, but first I separated the positive and negative reviews, to give a valence to the twenty most-used words. As you can see, the tool shows the number of reviews used to build each top twenty, which puts the valence into perspective.
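The split can be sketched as follows. This is an illustration in Python, not the R code of the tool, and the `text` and `recommended` field names are hypothetical: they stand for whatever the scraped data actually provides. Each group reports its number of reviews next to its most frequent words, so the counts can be read in proportion.

```python
from collections import Counter


def split_by_recommendation(reviews):
    """Group review texts by their recommendation flag."""
    groups = {"recommended": [], "not_recommended": []}
    for review in reviews:
        key = "recommended" if review["recommended"] else "not_recommended"
        groups[key].append(review["text"])
    return groups


def summarize(texts, n=20):
    """Return the group size and its n most frequent words."""
    counts = Counter(word for text in texts for word in text.lower().split())
    return {"n_reviews": len(texts), "top_words": counts.most_common(n)}


# Made-up reviews with hypothetical field names.
sample = [
    {"text": "beautiful art", "recommended": True},
    {"text": "beautiful music", "recommended": True},
    {"text": "too hard", "recommended": False},
]
for name, texts in split_by_recommendation(sample).items():
    print(name, summarize(texts, n=3))
```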

Recommended Reviews

Not Recommended Reviews

Wordcloud

To be useful and understandable to my team, I used visuals such as word clouds to represent how often words occur. As before, I split the word clouds into three parts (global, recommended and not recommended reviews), and here they are:

Global Reviews

Recommended Reviews

Not Recommended Reviews

As you may have guessed, the colour and size of each word reflect how often it occurs.
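Word cloud packages handle this scaling themselves, so the tool does not need to compute it. Still, the principle fits in a few lines. The Python sketch below maps counts linearly onto a range of font sizes; the function name, the size range and the counts are assumptions for the example, and real packages may use a different scaling.

```python
def font_sizes(counts, min_size=10, max_size=60):
    """Map each word's count linearly onto a font-size range."""
    if not counts:
        return {}
    low, high = min(counts.values()), max(counts.values())
    span = high - low
    sizes = {}
    for word, count in counts.items():
        # When every word has the same count, give them all the largest size.
        ratio = (count - low) / span if span else 1.0
        sizes[word] = round(min_size + ratio * (max_size - min_size))
    return sizes


# Made-up counts, only there to exercise the function.
print(font_sizes({"combat": 50, "art": 30, "music": 10}))
```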

Bi-grams Graphic

Now that we had a good view of the major trends concerning Hyper Light Drifter, I wanted to go further and give meaning to these trends. I managed to build a graph of bi-grams (it basically scans all the reviews and returns the most frequent pairs of words). For the example, I picked the global graph, but the tool produces three of them: global, recommended and not recommended reviews. For each of them, the .pdf is split into 3 layers to make it easier to read (from the highest occurrence to the lowest).
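For readers unfamiliar with bi-grams, here is a minimal counting sketch. It is written in Python for illustration and is not the R code of the tool; the function names and sample reviews are assumptions. Pairs are built inside each review, so the last word of one review is never paired with the first word of the next.

```python
import re
from collections import Counter


def bigrams(text):
    """Return the consecutive word pairs of a single review."""
    words = re.findall(r"[a-z']+", text.lower())
    return list(zip(words, words[1:]))


def top_bigrams(reviews, n=10):
    """Count word pairs across all reviews and return the n most frequent."""
    counts = Counter()
    for review in reviews:
        counts.update(bigrams(review))
    return counts.most_common(n)


# Made-up reviews, only there to exercise the functions.
sample_reviews = [
    "Pixel art is stunning",
    "Love the pixel art",
    "The boss fights are brutal",
]
print(top_bigrams(sample_reviews, n=1))
```

In practice, pairs that contain only stop words would usually be filtered out before drawing the graph; that step is left out here to keep the sketch short.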

The tool creates a directory and puts the files inside it; all I have to do is change the Steam link in the code.
The way the Steam website is built makes the tool break sometimes. It can't gather all the reviews, but it usually gathers more than 5k, which is already a good number.
The goal of this tool is to give an effortless overview of what has been said about a game and to help find potential hot spots to focus on in further qualitative research.

To conclude, you can download and have a look at the Hyper Light Drifter files here: