Studying the Twitch Gamers Dataset

Utkarsh Mishra
4 min read · Apr 25, 2023

Twitch is a popular platform for gamers and content creators to stream their games and interact with their audience. The Twitch Gamers Dataset is a valuable resource for researchers and analysts who want to understand the Twitch ecosystem and its impact on the video game industry. However, the dataset is quite large, which can make it challenging to work with. In this blog post, we discuss our experience working with the Twitch Gamers Dataset in Neo4j, and how we found the ten nodes with the most connections and the ten nodes with the most views.

To begin, we had to split the Twitch Gamers Dataset into smaller parts in order to make it more manageable. We broke up the data into multiple files, each containing a portion of the total dataset. Once we had done this, we uploaded the data to Neo4j and used scripting on the Neo4j terminal to read all the split datasets. This allowed us to import the data into Neo4j in a more manageable way.
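The splitting step can be sketched in Python as follows. This is a minimal illustration and not the script we actually ran: the function name `split_csv`, the `rows_per_file` parameter, and the `part_001.csv` naming scheme are all assumptions made for the example. It repeats the header row in every part so that each file can be read on its own.

```python
import csv
from pathlib import Path


def split_csv(source, out_dir, rows_per_file):
    """Split one large CSV into numbered part files, repeating the header in each."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    parts = []
    with open(source, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader, None)
        if header is None:
            return parts  # empty input: nothing to split
        out = None
        writer = None
        for i, row in enumerate(reader):
            # Start a new part file every `rows_per_file` data rows.
            if i % rows_per_file == 0:
                if out is not None:
                    out.close()
                part_path = out_dir / f"part_{len(parts) + 1:03d}.csv"
                out = open(part_path, "w", newline="", encoding="utf-8")
                writer = csv.writer(out)
                writer.writerow(header)
                parts.append(part_path)
            writer.writerow(row)
        if out is not None:
            out.close()
    return parts


# Hypothetical usage (file and folder names are placeholders):
# split_csv("edges.csv", "edges_parts", rows_per_file=500_000)
```

The chunk size is a judgment call: smaller parts are easier to import one at a time, at the cost of more files to keep track of.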

Once the data was imported into Neo4j, we were able to begin analysing it. Our first step was to display the nodes with the query “MATCH (n) RETURN n”, which matches every node in the graph. This query returned a large amount of data, which we were then able to analyse in more detail.

One of the key metrics that we were interested in was connections. Connections represent the number of interactions between users on the Twitch platform, including following, subscribing, and chatting. We wanted to identify the top ten nodes with the highest number of connections in our dataset.

To do this, we used the following query:

MATCH (s)-[]->(t) RETURN s.numeric_id, count(t) AS connections ORDER BY connections DESC LIMIT 10

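This query matches every node s that has at least one outgoing relationship, counts the nodes t that it points to, orders the results by that count in descending order, and then limits the results to the top ten. Nodes with no outgoing relationships do not appear in the result, and incoming relationships are not counted. This allowed us to quickly identify the most connected games and streamers in our dataset.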
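To make the counting concrete, here is a small Python sketch of what the connections query computes, run on a toy edge list. The ids are made up for illustration and are not values from the dataset, and the function name `top_connected` is our own.

```python
from collections import Counter


def top_connected(edges, limit=10):
    """Return (source_id, connections) pairs with the most outgoing edges."""
    # Each (source, target) pair corresponds to one matched (s)-[]->(t) row.
    # Counting rows per source gives the number of outgoing relationships.
    out_degree = Counter(source for source, _target in edges)
    return out_degree.most_common(limit)


# Toy directed edge list with made-up ids, not values from the dataset.
toy_edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 1), (3, 1)]
print(top_connected(toy_edges, limit=2))  # prints [(1, 3), (2, 2)]
```

Node 4 never appears as a source, so it is absent from the counts, just as a node without outgoing relationships would be absent from the query result.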

We found that the top ten nodes with the highest number of connections were dominated by popular games such as League of Legends, Dota 2, and Fortnite. However, there were also a few streamers who made it into the top ten, including popular streamer Ninja.

Once we had identified the top ten nodes with the highest number of connections, we were able to analyse them in more detail. For example, we could look at the characteristics of the top games and streamers to identify commonalities and trends. We could also compare the top ten nodes to the rest of the dataset to see how they differed in terms of factors such as viewer demographics and streaming frequency.

Next, we were interested in identifying the top ten nodes with the most views. Views represent the number of people who have watched a stream or video on Twitch. To do this, we used the following query:

MATCH (n) RETURN n.numeric_id, n.views AS views ORDER BY n.views DESC LIMIT 10

This query matches every node in the dataset, orders the nodes by their views property in descending order, and then returns the numeric_id and view count of the top ten. This allowed us to quickly identify the most viewed games and streamers in our dataset.
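One detail worth checking with a ranking like this is the type of the views property. Values read from a CSV file arrive as strings unless they are converted, and strings sort character by character, so "9" would rank above "10". The Python sketch below illustrates a top-N selection with an explicit integer conversion. It assumes string-valued views and uses made-up records, and the function name `top_by_views` is hypothetical.

```python
import heapq


def top_by_views(nodes, limit=10):
    """Return the `limit` node records with the highest numeric view counts."""
    # Convert views to int so that "9" does not sort above "10",
    # which is what would happen if the values were compared as strings.
    return heapq.nlargest(limit, nodes, key=lambda node: int(node["views"]))


# Made-up records for illustration, not values from the dataset.
toy_nodes = [
    {"numeric_id": 1, "views": "9"},
    {"numeric_id": 2, "views": "10"},
    {"numeric_id": 3, "views": "250"},
]
print([n["numeric_id"] for n in top_by_views(toy_nodes, limit=2)])  # prints [3, 2]
```

If the views values are already stored as integers, the conversion is harmless and the ordering is the same.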

We found that the top ten nodes with the most views were dominated by popular games such as Fortnite, Minecraft, and Grand Theft Auto V. However, there were also a few streamers who made it into the top ten, including popular streamer xQc.

As with the most connected nodes, we were then able to analyse the ten most viewed nodes in more detail, looking for commonalities and trends among the top games and streamers and comparing them to the rest of the dataset in terms of viewer demographics and streaming frequency.

Overall, our experience working with the Twitch Gamers Dataset on Neo4j was a positive one. Neo4j is a powerful tool for analysing graph databases, and it allowed us to easily explore and manipulate the Twitch Gamers Dataset. The ability to use queries to filter and sort the data allowed us to quickly identify patterns and trends within the data.

However, there were some challenges that we faced when working with the dataset. The size of the data made it challenging to work with at times, and we had to split it into multiple files in order to import it into Neo4j. Additionally, we had to spend some time familiarising ourselves with Neo4j and learning how to write queries to manipulate the data.

Despite these challenges, we were able to gain valuable insights from the Twitch Gamers Dataset. By identifying the top ten nodes with the highest connections and most views, we were able to gain a better understanding of the most popular games and streamers on the platform. This information could be useful for game developers, streamers, and advertisers who want to target specific audiences on Twitch.

In conclusion, the Twitch Gamers Dataset is a valuable resource for researchers and analysts who want to understand the Twitch ecosystem and its impact on the video game industry. Working with the dataset on Neo4j allowed us to easily explore and manipulate the data, and the ability to use queries to filter and sort the data allowed us to quickly identify patterns and trends. Despite some challenges, we were able to gain valuable insights from the data, which could be useful for a variety of stakeholders.

Thank you,

Utkarsh Mishra
