‘Sonic games’ Web Scraping Project

Project Overview: Web Scraping Sonic Game Data

In today's digital age, data is of growing importance: it drives decisions, generates insights, and inspires innovation.

In this project, I explored web scraping and data collection. Recognizing the popularity and history of the Sonic the Hedgehog franchise, I set out to extract and organize a structured dataset of Sonic games.

With Python, BeautifulSoup, and other web scraping tools, this project aims to pull game titles, release dates, developers, platforms, and other relevant details from various online sources.

Eventually, what started out as simple curiosity transformed into a rich set of data that offered a fascinating look at the evolution of the Sonic gaming franchise.

Below, I detail the steps I took to complete my project, and do my best to delve into the technical aspects.

Step 1: Create our PostgreSQL tables

Before we begin scraping data or populating our database, we need to set up our database schema (in other words, define the way we will store our data).

For this project, I designed two tables using SQL:

  • The consoles table will hold information about various gaming consoles, identified by a unique c_id.

  • The videogames table will store details of individual video games, each identified by a unique videogame_id, and will include attributes such as title, developer, publisher, and release date.

By defining these schemas upfront, we ensure that our data has a structured place to reside once we start the data collection process.
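The schema setup described above can be sketched in Python. The exact column names beyond c_id and videogame_id are my illustrative assumptions (the developer column is modeled as the 1/0 flag described in the results section), and the statements would be executed through a PostgreSQL driver such as psycopg2 in the real project:

```python
# Sketch of the two-table schema. Column names other than c_id and
# videogame_id are illustrative assumptions, not the project's exact DDL.

CONSOLES_DDL = """
CREATE TABLE IF NOT EXISTS consoles (
    c_id         SERIAL PRIMARY KEY,
    console_name TEXT UNIQUE NOT NULL
);
"""

VIDEOGAMES_DDL = """
CREATE TABLE IF NOT EXISTS videogames (
    videogame_id SERIAL PRIMARY KEY,
    title        TEXT NOT NULL,
    developer    INT,   -- flag: 1 = has a developer, 0 = does not
    publisher    TEXT,
    release_date TEXT   -- stored as scraped text, e.g. 'June 23, 1991'
);
"""

def create_tables(cur):
    """Execute both CREATE TABLE statements on an open database cursor
    (e.g. one obtained from psycopg2.connect(...).cursor())."""
    cur.execute(CONSOLES_DDL)
    cur.execute(VIDEOGAMES_DDL)
```

In the real project, these statements would be run once against the database before any scraping begins, so every scraped record has a structured place to land.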

Step 2: Gather the video game lists

Here is the main homepage of the Sonic games history website. This is our starting point. Most of the orange links on this page contain a list of video games. For example:

  • List of arcade games

  • List of LCD games

  • List of games on miscellaneous platforms

  • List of 1990s games

  • List of 2000s games

  • etc.

To reach the lists inside these links, we need to filter out the HTML we don’t need so that we can collect the orange link URLs we do need.

We will use the Beautiful Soup 4 module in Python to do this. After we collect these URLs, we are ready to move on to the next step.

(source: https://sonic.fandom.com/wiki/Lists_of_games)
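This link-collection step can be sketched with Beautiful Soup 4. The HTML snippet below is a simplified stand-in for the real "Lists of games" page (in the actual project the markup would be fetched over HTTP first), and filtering on link text that starts with "List of" is one plausible way to isolate the orange list links:

```python
# Sketch of collecting the "List of ..." link URLs with Beautiful Soup 4.
# SAMPLE_HTML is a hand-written stand-in for the real homepage markup.
from bs4 import BeautifulSoup

SAMPLE_HTML = """
<div class="mw-parser-output">
  <ul>
    <li><a href="/wiki/List_of_1990s_games">List of 1990s games</a></li>
    <li><a href="/wiki/List_of_2000s_games">List of 2000s games</a></li>
    <li><a href="/wiki/List_of_LCD_games">List of LCD games</a></li>
  </ul>
  <a href="/wiki/Special:Random">Random page</a>  <!-- noise to filter out -->
</div>
"""

BASE = "https://sonic.fandom.com"

def collect_list_urls(html):
    """Keep only anchors whose text starts with 'List of' (the orange links),
    returning absolute URLs."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for a in soup.find_all("a", href=True):
        if a.get_text(strip=True).startswith("List of"):
            urls.append(BASE + a["href"])
    return urls

print(collect_list_urls(SAMPLE_HTML))
```

Against real wiki markup, the filter condition would likely need adjusting (for example, restricting to a specific content container), but the collect-then-filter structure stays the same.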

Step 3: Gather the video game URLs

If we click on any of the orange links from the main homepage, we’ll get a list of video games, like the one we see above (this image shows the list of 1990s Sonic games). These lists contain richer data, but we can still get more information.

Under the ‘Video game’ column, we see a list of video game title links, highlighted in orange. For example:

  • Sonic the Hedgehog

  • Sonic the Hedgehog (8-bit)

  • Sonic Eraser

  • Waku Waku Sonic Patrol Car

  • Sonic the Hedgehog 2 (8-bit)

  • etc.

These links contain more information about each individual video game, which is exactly what we need.

We will use Python to gather these particular links, so we can move to the next step, and finally start scraping data.

(source: https://sonic.fandom.com/wiki/List_of_1990s_games)
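Gathering the per-game links from the ‘Video game’ column can be sketched the same way. The table markup below is a simplified stand-in for a list page such as List_of_1990s_games; the real pages would be fetched and parsed with the same logic:

```python
# Sketch of pulling the game-title links out of the first table column.
# SAMPLE_TABLE is a simplified stand-in for a real list page's wikitable.
from bs4 import BeautifulSoup

SAMPLE_TABLE = """
<table class="wikitable">
  <tr><th>Video game</th><th>Release</th></tr>
  <tr><td><a href="/wiki/Sonic_the_Hedgehog_(1991)">Sonic the Hedgehog</a></td>
      <td>1991</td></tr>
  <tr><td><a href="/wiki/Sonic_Eraser">Sonic Eraser</a></td><td>1991</td></tr>
</table>
"""

def collect_game_urls(html, base="https://sonic.fandom.com"):
    """Return absolute URLs for the link in the first cell of each data row."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for row in soup.find_all("tr"):
        cell = row.find("td")      # header rows have no <td>, so skip them
        if cell is None:
            continue
        link = cell.find("a", href=True)
        if link:
            urls.append(base + link["href"])
    return urls

print(collect_game_urls(SAMPLE_TABLE))
```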

Step 4: Scrape the video game data

This is one of the many webpages we will scrape valuable video game data from. Specifically, we will collect the:

  • Video Game Title

  • Developer(s) of the game

  • Publisher(s) of the game

  • Release date of the game

  • Platform(s) that the game exists on

After we collect all this data and strip the leading whitespace, we will upload it to the neat, organized tables we set up in PostgreSQL.

(source: https://sonic.fandom.com/wiki/Sonic_the_Hedgehog_(1991))
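The per-game scrape and cleanup can be sketched as below. The markup is a hand-written stand-in for an infobox on a page like Sonic_the_Hedgehog_(1991); the data-source attributes and class names are my assumptions modeled on Fandom-style infoboxes, and the real selectors may differ:

```python
# Sketch of scraping the five fields from a game page's infobox and
# stripping stray whitespace. SAMPLE_PAGE and its attribute names are
# illustrative stand-ins, not the wiki's exact markup.
from bs4 import BeautifulSoup

SAMPLE_PAGE = """
<aside class="infobox">
  <h2 class="title">  Sonic the Hedgehog </h2>
  <div data-source="developer">   Sonic Team </div>
  <div data-source="publisher">   Sega </div>
  <div data-source="released">    June 23, 1991 </div>
  <div data-source="platforms">   Sega Genesis, Arcade </div>
</aside>
"""

def scrape_game(html):
    """Return a dict of cleaned game fields from one game page."""
    soup = BeautifulSoup(html, "html.parser")

    def field(name):
        node = soup.find(attrs={"data-source": name})
        return node.get_text(strip=True) if node else None

    return {
        "title": soup.find(class_="title").get_text(strip=True),
        "developer": field("developer"),
        "publisher": field("publisher"),
        "release_date": field("released"),
        "platforms": [p.strip() for p in field("platforms").split(",")],
    }

print(scrape_game(SAMPLE_PAGE))
```

Once cleaned, each record would be inserted into the PostgreSQL tables with a parameterized query (e.g. a psycopg2 `cur.execute("INSERT ... VALUES (%s, %s, ...)", values)`), so the scrape, clean, and upload steps run page by page.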

Running the Web Scraping Program

Below, I present a clip of my web scraping program successfully running in Python. As the program scrapes the data from each Sonic game, cleans it, and uploads it to the database, it also prints out the game data it has “on hand” before each upload.

The Results

After we run the Python program and scrape the Sonic games websites, here is the data we collected in our database.

This is our videogames table in PostgreSQL, filled with the game title data we scraped.

Here is where we store the game title names, a developer flag (1 = has a developer, 0 = does not), the publisher, and the release date for each game.

This is our consoles table in PostgreSQL, filled with all the consoles we viewed while our web scraper program was running.

Here we see a variety of consoles that Sonic games were produced for, ranging from the Nintendo GameCube to the Sega CD and the PlayStation 2, among others.

To Conclude…

This project allowed us to effectively utilize web scraping techniques to gather valuable data on Sonic games. By cleaning and preprocessing our data, we ensured that our findings were not only accurate but also well-organized within a neat and structured database.

With this work done, our ‘hypothetical data analysts’ now have a well-made dataset at their fingertips. With it, they can draw informed insights, build compelling visuals, and gain a deeper understanding of the trends and patterns within the Sonic games realm.

This journey—from data extraction to its storage—has been both challenging and rewarding. It reaffirms the importance of data in driving decisions and insights in our modern world.
