COVID-19 data visualization with Ruby, Python, Rails, Chartkick, and PostgreSQL
As a developer, you will see numerous articles on Big Data, containers, complex algorithms, caching, etc. But the reality is that many of us still have to solve simple problems, especially freelance programmers or those working with small companies. A typical use case is data that arrives in spreadsheets or CSV files and has to be visualized in a simple dashboard. You and the customer agree to build it as a web application. There are plenty of ways to implement the solution, from PHP to the Java-based Metabase. Since I have experience with Ruby on Rails (RoR or just Rails) and it has extensive, easy-to-use libraries, it is my first choice for building a web application quickly.

The application I describe in this article uses COVID-19 data. In addition to Rails for the server-side runtime, I used Python and Ruby to extract, transform, and load the data into a PostgreSQL database. For the data visualization, I used the JavaScript library Chartkick. Thus, it is a polyglot solution.

I used two data sources. The country-level data are from the COVID-19 Data Repository on GitHub by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University. For my country, India, the state-level data are from covid19india.org.

The two inputs differ in two main ways. The global CSSE data come as one CSV file per day, holding cumulative counts up to that date. The covid19india data come as daily incremental counts in JSON. The output format I wanted was: place, confirmed, deaths, and recovered. For the global data, place is the country name; for India, it is the state name.

To process the input data, I wrote two scripts in Python and three in Ruby; their functionality is described below.

gdc.py: This Python program takes a CSSE daily CSV file as input. The files are named by date, for example, 11-10-2020.csv.
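The report date is encoded in the file name, so a script like gdc.py has to recover it before it can decide how to read the file. The following is a minimal sketch, assuming the MM-DD-YYYY.csv naming used by the CSSE daily reports; the function names report_date and sorted_reports are hypothetical and not taken from the author's script.

```python
from datetime import date, datetime
from pathlib import Path


def report_date(filename: str) -> date:
    """Parse the report date from a CSSE daily file name such as '11-10-2020.csv'.

    Assumes the MM-DD-YYYY naming of the CSSE daily reports; a directory
    prefix is allowed because only the file stem is parsed.
    """
    stem = Path(filename).stem  # e.g. '11-10-2020'
    return datetime.strptime(stem, "%m-%d-%Y").date()


def sorted_reports(filenames):
    """Return the file names in chronological order.

    A plain string sort would put 01-01-2021.csv before 11-10-2020.csv,
    because the year comes last in the name.
    """
    return sorted(filenames, key=report_date)
```

For example, report_date("11-10-2020.csv") gives the date 10 November 2020. Sorting by the parsed date matters when loading a range of files, since the month-first names do not sort chronologically across years.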
Up until 21 March 2020, the input files held the country name and the confirmed, deaths, and recovered counts in the second, fourth, fifth, and sixth fields. Files from 22 March onward hold them in the fourth, eighth, ninth, and tenth fields. Therefore, depending on the file date, the program selects the correct indices to extract the values.

Some country names contain a comma, for example, “Gambia, The”. Because the program splits each line on commas, such names are mapped to a single comma-free name: “Gambia, The” in the input becomes Gambia in the output.

The US data are not aggregated at the country level: each daily file has many rows for the US, one per sub-national region. The same is true of some other countries. So, as each line is processed, the first time a country is encountered, the program creates a key for it in a dictionary called country_data and stores the values as a list. Each subsequent line for the same country adds its values to the list stored under that country's key.
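The per-file steps of gdc.py can be sketched in Python as follows. This is not the author's script: the function names, skipping the header row, and treating empty cells as zero are assumptions. Also, rather than rewriting comma-containing names before splitting each line on commas, this sketch swaps in Python's standard csv module, which splits quoted fields correctly; the NAME_FIXES mapping is kept only so output names match the article's example.

```python
import csv
import io
from datetime import date

# Files dated before this day use the original (narrower) column layout.
CUTOVER = date(2020, 3, 22)
# Zero-based positions of (country, confirmed, deaths, recovered).
OLD_INDICES = (1, 3, 4, 5)  # second, fourth, fifth, sixth fields
NEW_INDICES = (3, 7, 8, 9)  # fourth, eighth, ninth, tenth fields

# Hypothetical name mapping; only "Gambia, The" comes from the article.
NAME_FIXES = {"Gambia, The": "Gambia"}


def column_indices(report_day: date) -> tuple:
    """Pick the column positions for the layout in use on report_day."""
    return OLD_INDICES if report_day < CUTOVER else NEW_INDICES


def to_int(cell: str) -> int:
    """Convert a count cell to int; empty cells count as 0 (an assumption)."""
    cell = cell.strip()
    return int(float(cell)) if cell else 0


def aggregate(report_day: date, csv_text: str) -> dict:
    """Sum confirmed, deaths, and recovered per country for one daily file."""
    idx = column_indices(report_day)
    country_data = {}
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row
    for row in reader:
        if len(row) <= max(idx):
            continue  # skip blank or truncated rows
        name = row[idx[0]].strip()
        country = NAME_FIXES.get(name, name)
        values = [to_int(row[i]) for i in idx[1:]]
        if country not in country_data:
            # First row for this country: store its values as a new list.
            country_data[country] = values
        else:
            # Later rows (e.g. one per US region): add to the running totals.
            totals = country_data[country]
            for k, v in enumerate(values):
                totals[k] += v
    return country_data
```

Calling aggregate(date(2020, 11, 10), text) with the contents of 11-10-2020.csv returns a dict mapping each country to a [confirmed, deaths, recovered] list, the same shape as the country_data dictionary described above.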