COVID-19: Track, Map, and Animate the Coronavirus with Python & Basemap

Introduction

If you want to stay on top of the coronavirus, one great way to do so is to plot its progression over time and around the world. We're going to do just that using Python, and we'll even animate the progress by saving one map per day. This is a great way to get involved and keep an eye on this developing story.




Code

ViralML-Track-Map-Animate-Coronavirus-Progress
In [3]:
import matplotlib
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import datetime, time, requests
from time import sleep

# you will need to pip install Basemap - https://matplotlib.org/basemap/users/installing.html
from mpl_toolkits.basemap import Basemap
In [4]:
import matplotlib.pyplot as plt
from IPython.display import Image
Image(filename='Covid-19.png', width='80%')
Out[4]:

Get Data from Kaggle

Novel Corona Virus 2019 Dataset:

https://www.kaggle.com/sudalairajkumar/novel-corona-virus-2019-dataset

In [57]:
covid_19_data = pd.read_csv('novel-corona-virus-2019-dataset/covid_19_data.csv')
covid_19_data['ObservationDate'] = pd.to_datetime(covid_19_data['ObservationDate'])
covid_19_data = covid_19_data.sort_values('ObservationDate', ascending=True)
print('Shape:', covid_19_data.shape)
print('Date min:', np.min(covid_19_data['ObservationDate']), 'Date max:', np.max(covid_19_data['ObservationDate']))
# replace NaN Provinces with string
covid_19_data['Province/State'] = covid_19_data['Province/State'].fillna('No_Province')
covid_19_data.tail()
Shape: (3395, 8)
Date min: 2020-01-22 00:00:00 Date max: 2020-03-04 00:00:00
Out[57]:
SNo ObservationDate Province/State Country/Region Last Update Confirmed Deaths Recovered
3289 3290 2020-03-04 No_Province United Arab Emirates 2020-03-03T23:43:02 27.0 0.0 5.0
3290 3291 2020-03-04 No_Province Iceland 2020-03-04T19:33:03 26.0 0.0 0.0
3291 3292 2020-03-04 No_Province Belgium 2020-03-04T12:33:03 23.0 0.0 1.0
3313 3314 2020-03-04 Snohomish County, WA US 2020-03-04T19:53:02 8.0 1.0 0.0
3394 3395 2020-03-04 Travis, CA (From Diamond Princess) US 2020-02-24T23:33:02 0.0 0.0 0.0

Brief Data Prep and Exploration

In [58]:
# how many NaNs?
count_nan = len(covid_19_data) - covid_19_data.count()
count_nan
Out[58]:
SNo                0
ObservationDate    0
Province/State     0
Country/Region     0
Last Update        0
Confirmed          0
Deaths             0
Recovered          0
dtype: int64
In [59]:
# how many countries do we have?
countries = list(set(covid_19_data['Country/Region']))
print('Unique Country/Region found:', str(len(countries)))
countries
Unique Country/Region found: 90
Out[59]:
['Saudi Arabia',
 'Algeria',
 'Portugal',
 'Austria',
 'Kuwait',
 'North Ireland',
 'United Arab Emirates',
 'Tunisia',
 'Azerbaijan',
 'Norway',
 'Hungary',
 'Vietnam',
 'Romania',
 'San Marino',
 'Afghanistan',
 'Bahrain',
 'Chile',
 'Argentina',
 'Singapore',
 'France',
 'Nigeria',
 'Philippines',
 'Belgium',
 'Nepal',
 'Iraq',
 'Ukraine',
 'Japan',
 'Senegal',
 'Germany',
 'Croatia',
 'Saint Barthelemy',
 'Iran',
 'Pakistan',
 'US',
 'Latvia',
 'Thailand',
 'Egypt',
 'Morocco',
 'Georgia',
 'Gibraltar',
 'Lithuania',
 'Qatar',
 'Faroe Islands',
 'Liechtenstein',
 'Indonesia',
 'Greece',
 'Dominican Republic',
 'Brazil',
 'Ireland',
 'Ecuador',
 'Finland',
 'Canada',
 'UK',
 'Denmark',
 'Others',
 'Czech Republic',
 'Italy',
 'Switzerland',
 'Armenia',
 'Luxembourg',
 'North Macedonia',
 'Netherlands',
 'Lebanon',
 'Poland',
 'Cambodia',
 'Estonia',
 'Jordan',
 'Sweden',
 'Belarus',
 'Hong Kong',
 'Spain',
 'South Korea',
 'Monaco',
 'Iceland',
 'Ivory Coast',
 'Russia',
 'Mexico',
 'Taiwan',
 ' Azerbaijan',
 'Malaysia',
 'Colombia',
 'New Zealand',
 'Australia',
 'Oman',
 'Israel',
 'India',
 'Andorra',
 'Sri Lanka',
 'Mainland China',
 'Macau']
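
The list above contains both 'Azerbaijan' and ' Azerbaijan' (with a leading space), so the same country is counted twice and would be geocoded twice. A minimal sketch of one way to clean this up, assuming stray whitespace is the only problem; the `sample` frame is made-up data standing in for `covid_19_data`:

```python
import pandas as pd

# made-up rows that mimic the duplicate seen in the country list
sample = pd.DataFrame({'Country/Region': ['Azerbaijan', ' Azerbaijan', 'Italy'],
                       'Confirmed': [1.0, 2.0, 3.0]})

before = sample['Country/Region'].nunique()
# strip leading/trailing whitespace so both spellings collapse into one
sample['Country/Region'] = sample['Country/Region'].str.strip()
after = sample['Country/Region'].nunique()

print('unique before:', before, 'unique after:', after)
```

Running the same `.str.strip()` on `covid_19_data['Country/Region']` before building `countries` would apply the cleanup to the real data.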
In [60]:
# how many province/states do we have?
zones = list(set(covid_19_data['Province/State']))
print('Unique Province/State found:', str(len(zones)))
Unique Province/State found: 99

Use the OpenStreetMap Nominatim REST API to get lat/lon for each country

In [61]:
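def get_lat_lon(zone, 
                output_as = 'center'):
    # thanks openstreetmap! 
    # let requests URL-encode the query (names like 'Saudi Arabia' contain spaces)
    url = 'https://nominatim.openstreetmap.org/search'
    params = {'q': zone, 'format': 'json', 'polygon': 0}
    # Nominatim's usage policy asks for an identifying User-Agent; this value is a placeholder
    headers = {'User-Agent': 'covid-map-tutorial'}
    # send out request and keep the first match (raises IndexError when nothing is found)
    response = requests.get(url, params=params, headers=headers, timeout=10).json()[0]

    # parse response to list
    if output_as == 'boundingbox':
        # [min_lat, max_lat, min_lon, max_lon]
        output = [float(i) for i in response['boundingbox']]
    elif output_as == 'center':
        # [lon, lat]
        output = [float(response[key]) for key in ['lon', 'lat']]
    else:
        raise ValueError('output_as must be "center" or "boundingbox"')

    return output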
In [11]:
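geo_centers_lon = []
geo_centers_lat = []
total_ctry = len(countries)
counter_ = 0
for ctry in countries:
    # print a countdown every 10 countries
    if counter_ % 10 == 0: print(total_ctry - counter_)
    # Nominatim's usage policy allows at most one request per second
    time.sleep(1)
    centroid = [None, None]
    try:
        centroid = get_lat_lon(ctry, output_as='center')
    except Exception:
        # no match or a network error: keep [None, None] so the lists stay aligned with countries
        print('Could not find:', ctry)

    geo_centers_lon.append(centroid[0])
    geo_centers_lat.append(centroid[1])

    counter_ += 1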
90
80
70
60
50
40
30
20
10
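
Geocoding 90 names takes a while and hits a shared public service, so it can help to store the results and only look up names that are new. The sketch below is one possible approach, not part of the original notebook: `geocode_with_cache`, `fake_lookup`, and the cache file name are hypothetical, and the stand-in lookup returns made-up coordinates so the example runs without network access.

```python
import json
import os
import tempfile

def geocode_with_cache(names, lookup, cache_path):
    """Return {name: [lon, lat] or None}, calling lookup only for names missing from the cache file."""
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            cache = json.load(f)
    for name in names:
        if name in cache:
            continue  # already looked up in an earlier run
        try:
            cache[name] = lookup(name)
        except Exception:
            cache[name] = None  # remember failures too, so they are not retried
    with open(cache_path, 'w') as f:
        json.dump(cache, f, indent=2)
    return cache

# stand-in for a real geocoder; records which names were actually requested
calls = []
def fake_lookup(name):
    calls.append(name)
    if name == 'Others':
        raise IndexError('no match')
    return [0.0, 0.0]  # made-up coordinates

demo_path = os.path.join(tempfile.mkdtemp(), 'geo_cache.json')
first = geocode_with_cache(['Italy', 'Others'], fake_lookup, demo_path)
second = geocode_with_cache(['Italy', 'Others', 'Spain'], fake_lookup, demo_path)
print(calls)
```

In the notebook, `lookup` could be a small wrapper that sleeps and then calls `get_lat_lon`. Caching failures as `None` is a design choice: it avoids repeated misses, but a cached entry has to be deleted by hand to retry a name.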
In [62]:
# Add geos back to data frame
# look up each row's country in the geocoding results (the lists are aligned with countries)
lon_by_country = dict(zip(countries, geo_centers_lon))
lat_by_country = dict(zip(countries, geo_centers_lat))

# add to data frame
covid_19_data['Longitude'] = covid_19_data['Country/Region'].map(lon_by_country)
covid_19_data['Latitude'] = covid_19_data['Country/Region'].map(lat_by_country)
covid_19_data.head(10)
Out[62]:
SNo ObservationDate Province/State Country/Region Last Update Confirmed Deaths Recovered Longitude Latitude
0 1 2020-01-22 Anhui Mainland China 1/22/2020 17:00 1.0 0.0 0.0 72.833570 19.140625
21 22 2020-01-22 Ningxia Mainland China 1/22/2020 17:00 1.0 0.0 0.0 72.833570 19.140625
22 23 2020-01-22 Qinghai Mainland China 1/22/2020 17:00 0.0 0.0 0.0 72.833570 19.140625
23 24 2020-01-22 Shaanxi Mainland China 1/22/2020 17:00 0.0 0.0 0.0 72.833570 19.140625
24 25 2020-01-22 Shandong Mainland China 1/22/2020 17:00 2.0 0.0 0.0 72.833570 19.140625
25 26 2020-01-22 Shanghai Mainland China 1/22/2020 17:00 9.0 0.0 0.0 72.833570 19.140625
26 27 2020-01-22 Shanxi Mainland China 1/22/2020 17:00 1.0 0.0 0.0 72.833570 19.140625
27 28 2020-01-22 Sichuan Mainland China 1/22/2020 17:00 5.0 0.0 0.0 72.833570 19.140625
20 21 2020-01-22 Macau Macau 1/22/2020 17:00 1.0 0.0 0.0 113.551414 22.175760
28 29 2020-01-22 Taiwan Taiwan 1/22/2020 17:00 1.0 0.0 0.0 120.835363 23.598298
In [63]:
covid_19_data[covid_19_data['Province/State'] == 'Shanghai']
Out[63]:
SNo ObservationDate Province/State Country/Region Last Update Confirmed Deaths Recovered Longitude Latitude
25 26 2020-01-22 Shanghai Mainland China 1/22/2020 17:00 9.0 0.0 0.0 72.83357 19.140625
63 64 2020-01-23 Shanghai Mainland China 1/23/20 17:00 16.0 0.0 0.0 72.83357 19.140625
91 92 2020-01-24 Shanghai Mainland China 1/24/20 17:00 20.0 0.0 1.0 72.83357 19.140625
132 133 2020-01-25 Shanghai Mainland China 1/25/20 17:00 33.0 0.0 1.0 72.83357 19.140625
179 180 2020-01-26 Shanghai Mainland China 1/26/20 16:00 40.0 1.0 1.0 72.83357 19.140625
228 229 2020-01-27 Shanghai Mainland China 1/27/20 23:59 53.0 1.0 3.0 72.83357 19.140625
280 281 2020-01-28 Shanghai Mainland China 1/28/20 23:00 66.0 1.0 4.0 72.83357 19.140625
331 332 2020-01-29 Shanghai Mainland China 1/29/20 19:30 96.0 1.0 5.0 72.83357 19.140625
385 386 2020-01-30 Shanghai Mainland China 1/30/20 16:00 112.0 1.0 5.0 72.83357 19.140625
443 444 2020-01-31 Shanghai Mainland China 1/31/2020 23:59 135.0 1.0 9.0 72.83357 19.140625
504 505 2020-02-01 Shanghai Mainland China 2/1/2020 6:05 169.0 1.0 10.0 72.83357 19.140625
572 573 2020-02-02 Shanghai Mainland China 2020-02-02T05:53:02 182.0 1.0 10.0 72.83357 19.140625
639 640 2020-02-03 Shanghai Mainland China 2020-02-03T07:03:12 203.0 1.0 10.0 72.83357 19.140625
707 708 2020-02-04 Shanghai Mainland China 2020-02-04T06:33:02 219.0 1.0 12.0 72.83357 19.140625
777 778 2020-02-05 Shanghai Mainland China 2020-02-05T06:23:04 243.0 1.0 15.0 72.83357 19.140625
848 849 2020-02-06 Shanghai Mainland China 2020-02-06T06:53:07 257.0 1.0 25.0 72.83357 19.140625
920 921 2020-02-07 Shanghai Mainland China 2020-02-07T06:14:15 277.0 1.0 30.0 72.83357 19.140625
992 993 2020-02-08 Shanghai Mainland China 2020-02-08T06:33:02 286.0 1.0 41.0 72.83357 19.140625
1064 1065 2020-02-09 Shanghai Mainland China 2020-02-09T06:33:01 293.0 1.0 44.0 72.83357 19.140625
1136 1137 2020-02-10 Shanghai Mainland China 2020-02-10T06:03:13 299.0 1.0 48.0 72.83357 19.140625
1208 1209 2020-02-11 Shanghai Mainland China 2020-02-11T06:23:02 303.0 1.0 52.0 72.83357 19.140625
1281 1282 2020-02-12 Shanghai Mainland China 2020-02-12T06:23:08 311.0 1.0 57.0 72.83357 19.140625
1354 1355 2020-02-13 Shanghai Mainland China 2020-02-13T06:13:15 315.0 1.0 62.0 72.83357 19.140625
1428 1429 2020-02-14 Shanghai Mainland China 2020-02-14T04:13:11 318.0 1.0 90.0 72.83357 19.140625
1503 1504 2020-02-15 Shanghai Mainland China 2020-02-15T03:13:06 326.0 1.0 124.0 72.83357 19.140625
1579 1580 2020-02-16 Shanghai Mainland China 2020-02-16T02:53:02 328.0 1.0 140.0 72.83357 19.140625
1654 1655 2020-02-17 Shanghai Mainland China 2020-02-17T23:53:02 333.0 1.0 161.0 72.83357 19.140625
1729 1730 2020-02-18 Shanghai Mainland China 2020-02-18T05:03:08 333.0 1.0 177.0 72.83357 19.140625
1804 1805 2020-02-19 Shanghai Mainland China 2020-02-19T04:43:02 333.0 2.0 186.0 72.83357 19.140625
1880 1881 2020-02-20 Shanghai Mainland China 2020-02-20T06:03:03 334.0 2.0 199.0 72.83357 19.140625
1956 1957 2020-02-21 Shanghai Mainland China 2020-02-21T05:53:01 334.0 2.0 211.0 72.83357 19.140625
2041 2042 2020-02-22 Shanghai Mainland China 2020-02-22T06:33:17 335.0 3.0 227.0 72.83357 19.140625
2125 2126 2020-02-23 Shanghai Mainland China 2020-02-23T03:13:07 335.0 3.0 249.0 72.83357 19.140625
2210 2211 2020-02-24 Shanghai Mainland China 2020-02-24T07:03:06 335.0 3.0 261.0 72.83357 19.140625
2300 2301 2020-02-25 Shanghai Mainland China 2020-02-25T06:33:02 336.0 3.0 268.0 72.83357 19.140625
2395 2396 2020-02-26 Shanghai Mainland China 2020-02-26T23:53:01 337.0 3.0 272.0 72.83357 19.140625
2496 2497 2020-02-27 Shanghai Mainland China 2020-02-27T04:03:12 337.0 3.0 276.0 72.83357 19.140625
2602 2603 2020-02-28 Shanghai Mainland China 2020-02-28T04:53:03 337.0 3.0 279.0 72.83357 19.140625
2716 2717 2020-02-29 Shanghai Mainland China 2020-02-29T06:23:03 337.0 3.0 287.0 72.83357 19.140625
2835 2836 2020-03-01 Shanghai Mainland China 2020-03-01T07:13:07 337.0 3.0 290.0 72.83357 19.140625
2960 2961 2020-03-02 Shanghai Mainland China 2020-03-02T04:03:13 337.0 3.0 292.0 72.83357 19.140625
3101 3102 2020-03-03 Shanghai Mainland China 2020-03-03T03:43:02 338.0 3.0 294.0 72.83357 19.140625
3252 3253 2020-03-04 Shanghai Mainland China 2020-03-04T04:13:08 338.0 3.0 298.0 72.83357 19.140625
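
Note that the rows above place 'Mainland China' at longitude 72.83, latitude 19.14, which is in Mumbai, India: the free-text search evidently matched something other than the country. One way to handle such misses is a small table of manual overrides applied after geocoding. This is a sketch under assumptions: `manual_centers` and `apply_overrides` are hypothetical names, and the China coordinates are a rough central point chosen for illustration, not a value from the article.

```python
import pandas as pd

# Hypothetical manual overrides as [lon, lat]; the values are rough approximations
manual_centers = {'Mainland China': [104.0, 35.0]}

def apply_overrides(df, overrides):
    """Return a copy of df with Longitude/Latitude replaced for the listed countries."""
    df = df.copy()
    for country, (lon, lat) in overrides.items():
        mask = df['Country/Region'] == country
        df.loc[mask, 'Longitude'] = lon
        df.loc[mask, 'Latitude'] = lat
    return df

# two rows with the coordinates shown in the output above
sample = pd.DataFrame({
    'Country/Region': ['Mainland China', 'Macau'],
    'Longitude': [72.833570, 113.551414],
    'Latitude': [19.140625, 22.175760],
})
fixed = apply_overrides(sample, manual_centers)
print(fixed)
```

Applying the same function to `covid_19_data` before plotting would move the China circle off India; any other country that geocodes badly can be added to the dictionary.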

Plot Infection Counts by Country using Basemap

You may need to install Basemap on your machine:

https://matplotlib.org/basemap/users/installing.html

In [64]:
def plot_world_map(virus_data, date, save_to_file_name = ''):
    # Set the dimension of the figure (2600 x 1800 pixels at 96 dpi)
    my_dpi=96
    plt.figure(figsize=(2600/my_dpi, 1800/my_dpi), dpi=my_dpi)

    # Make the background map
    m=Basemap(llcrnrlon=-180, llcrnrlat=-65,urcrnrlon=180,urcrnrlat=80)
    m.drawmapboundary(fill_color='#A6CAE0', linewidth=0)
    m.fillcontinents(color='grey', alpha=0.3)
    m.drawcoastlines(linewidth=0.1, color="white")
    
    total_cases = np.sum(virus_data['Confirmed'])

    # Add one circle per country, sized by confirmed cases and colored by country code
    m.scatter(virus_data['Longitude'], 
              virus_data['Latitude'], 
              s = virus_data['Confirmed'] * 8, # play around with the size or use np.log if you dont like the big circles
              alpha=0.4, 
              c=virus_data['labels_enc'], 
              cmap="Set1")

    plt.title(str(date) + ' Confirmed Covid-19 Cases: ' + str(int(total_cases)) + '\n(circles not to scale)', fontsize=50)
    
    # save the frame to disk when a file name is given
    if save_to_file_name != '':
        plt.savefig(save_to_file_name)
        
    plt.show()
    # release the figure so repeated calls do not accumulate open figures
    plt.close()
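
The comment in `plot_world_map` suggests trying `np.log` when the circles get too large. Matplotlib's `s` argument is the marker area in points squared, so multiplying the count by 8 makes the area grow linearly with cases, and one very large country can swamp the map. The helper below is a hypothetical sketch (the name `marker_sizes` and the sample counts are not from the notebook) that compares linear, square-root, and log scaling:

```python
import numpy as np

def marker_sizes(confirmed, method='linear', scale=8.0):
    """Turn case counts into scatter marker areas using the chosen scaling."""
    counts = np.asarray(confirmed, dtype=float)
    if method == 'linear':
        return counts * scale
    if method == 'sqrt':
        return np.sqrt(counts) * scale
    if method == 'log':
        # log1p(x) = log(1 + x), so a count of zero maps to a size of zero
        return np.log1p(counts) * scale
    raise ValueError('method must be "linear", "sqrt" or "log"')

counts = [0, 27, 80000]  # made-up counts spanning several orders of magnitude
for method in ('linear', 'sqrt', 'log'):
    print(method, marker_sizes(counts, method).round(1))
```

To try it in the map, one could pass `s = marker_sizes(virus_data['Confirmed'], 'sqrt')` to `m.scatter`; the `scale` value likely needs tuning for the figure size.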
    
    
In [65]:
# Create color map
# give each country an integer code so every country gets its own color
covid_19_data['labels_enc'] = pd.factorize(covid_19_data['Country/Region'])[0]
covid_19_data['labels_enc']
Out[65]:
0        0
21       0
22       0
23       0
24       0
        ..
3289    23
3290    62
3291    31
3313     3
3394     3
Name: labels_enc, Length: 3395, dtype: int64
In [67]:
date = '2020-03-04' 

virus_up_to_today = covid_19_data[covid_19_data['ObservationDate'] <= date]

# simplify data set
virus_up_to_today = virus_up_to_today[['Country/Region','Province/State', 'labels_enc', 'Confirmed',
                     'Deaths', 'Recovered',
                     'Longitude', 'Latitude']]


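# Confirmed/Deaths/Recovered are cumulative per province, so take the latest value per province first,
# then sum the provinces per country (countries without provinces use the 'No_Province' placeholder)

# group by province and keep the latest values (rows are already sorted by ObservationDate)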
virus_up_to_today=virus_up_to_today.groupby(['Country/Region', 'Province/State', 'labels_enc']).agg({'Confirmed':'last', 
                           'Deaths':'last',
                           'Recovered':'last',
                           'Longitude':'mean',
                          'Latitude':'mean'}).reset_index()



# group by country, sum the province totals and average the coordinates
virus_up_to_today=virus_up_to_today.groupby(['Country/Region', 'labels_enc']).agg({'Confirmed':'sum', 
                           'Deaths':'sum',
                           'Recovered':'sum',
                           'Longitude':'mean',
                          'Latitude':'mean'}).reset_index()

# map out confirmed cases
plot_world_map(virus_up_to_today, str(date)[0:10])
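
The two-step aggregation above matters because the counts are cumulative: summing every row for a country would count each day's running total again. Here is a small worked example on made-up data (the country and province names are placeholders) showing the difference between "latest per province, then sum per country" and a naive sum:

```python
import pandas as pd

# made-up cumulative counts: country A has two provinces, country B has none
rows = pd.DataFrame({
    'ObservationDate': pd.to_datetime(['2020-01-01', '2020-01-02',
                                       '2020-01-01', '2020-01-02',
                                       '2020-01-02']),
    'Country/Region': ['A', 'A', 'A', 'A', 'B'],
    'Province/State': ['P1', 'P1', 'P2', 'P2', 'No_Province'],
    'Confirmed': [1.0, 3.0, 2.0, 5.0, 4.0],
}).sort_values('ObservationDate')

# step 1: latest cumulative value for each province
latest = (rows.groupby(['Country/Region', 'Province/State'])['Confirmed']
              .last()
              .reset_index())

# step 2: add the provinces up within each country
by_country = latest.groupby('Country/Region')['Confirmed'].sum()

# for comparison: summing all rows double-counts earlier days
naive = rows.groupby('Country/Region')['Confirmed'].sum()

print(by_country.to_dict())
print(naive.to_dict())
```

Country A ends up with 3 + 5 = 8 confirmed cases under the two-step approach, while the naive sum reports 11 because it also adds the earlier running totals.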
 
In [69]:
# build time lapse with accumulator count by country
dates = sorted(list(set(covid_19_data['ObservationDate'])))
dates
Out[69]:
[Timestamp('2020-01-22 00:00:00'),
 Timestamp('2020-01-23 00:00:00'),
 Timestamp('2020-01-24 00:00:00'),
 Timestamp('2020-01-25 00:00:00'),
 Timestamp('2020-01-26 00:00:00'),
 Timestamp('2020-01-27 00:00:00'),
 Timestamp('2020-01-28 00:00:00'),
 Timestamp('2020-01-29 00:00:00'),
 Timestamp('2020-01-30 00:00:00'),
 Timestamp('2020-01-31 00:00:00'),
 Timestamp('2020-02-01 00:00:00'),
 Timestamp('2020-02-02 00:00:00'),
 Timestamp('2020-02-03 00:00:00'),
 Timestamp('2020-02-04 00:00:00'),
 Timestamp('2020-02-05 00:00:00'),
 Timestamp('2020-02-06 00:00:00'),
 Timestamp('2020-02-07 00:00:00'),
 Timestamp('2020-02-08 00:00:00'),
 Timestamp('2020-02-09 00:00:00'),
 Timestamp('2020-02-10 00:00:00'),
 Timestamp('2020-02-11 00:00:00'),
 Timestamp('2020-02-12 00:00:00'),
 Timestamp('2020-02-13 00:00:00'),
 Timestamp('2020-02-14 00:00:00'),
 Timestamp('2020-02-15 00:00:00'),
 Timestamp('2020-02-16 00:00:00'),
 Timestamp('2020-02-17 00:00:00'),
 Timestamp('2020-02-18 00:00:00'),
 Timestamp('2020-02-19 00:00:00'),
 Timestamp('2020-02-20 00:00:00'),
 Timestamp('2020-02-21 00:00:00'),
 Timestamp('2020-02-22 00:00:00'),
 Timestamp('2020-02-23 00:00:00'),
 Timestamp('2020-02-24 00:00:00'),
 Timestamp('2020-02-25 00:00:00'),
 Timestamp('2020-02-26 00:00:00'),
 Timestamp('2020-02-27 00:00:00'),
 Timestamp('2020-02-28 00:00:00'),
 Timestamp('2020-02-29 00:00:00'),
 Timestamp('2020-03-01 00:00:00'),
 Timestamp('2020-03-02 00:00:00'),
 Timestamp('2020-03-03 00:00:00'),
 Timestamp('2020-03-04 00:00:00')]
In [70]:
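import os
# the frames are written to the movie/ folder, so make sure it exists
os.makedirs('movie', exist_ok=True)
image_file_name_counter = 0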
for date in dates:
    virus_up_to_today = covid_19_data[covid_19_data['ObservationDate'] <= date]
    
    # simplify data set
    virus_up_to_today = virus_up_to_today[['Country/Region','Province/State', 'labels_enc', 'Confirmed',
                         'Deaths', 'Recovered',
                         'Longitude', 'Latitude']]


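    # Confirmed/Deaths/Recovered are cumulative per province, so take the latest value per province first,
    # then sum the provinces per country (countries without provinces use the 'No_Province' placeholder)

    # group by province and keep the latest values (rows are already sorted by ObservationDate)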
    virus_up_to_today=virus_up_to_today.groupby(['Country/Region', 'Province/State', 'labels_enc']).agg({'Confirmed':'last', 
                               'Deaths':'last',
                               'Recovered':'last',
                               'Longitude':'mean',
                              'Latitude':'mean'}).reset_index()



    # group by country, sum the province totals and average the coordinates
    virus_up_to_today=virus_up_to_today.groupby(['Country/Region', 'labels_enc']).agg({'Confirmed':'sum', 
                               'Deaths':'sum',
                               'Recovered':'sum',
                               'Longitude':'mean',
                              'Latitude':'mean'}).reset_index()
     
    # map out confirmed cases
    file_to_save_name = 'movie/anim_' + str(image_file_name_counter) + '.png'
    plot_world_map(virus_up_to_today, str(date)[0:10], file_to_save_name)
    
  
    image_file_name_counter += 1
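
The loop above leaves one PNG per day in the `movie` folder; turning them into an animation is a separate step. The sketch below is one possible way to do it, assuming the third-party Pillow package is installed: `frame_number` and `build_gif` are hypothetical helpers, not part of the original notebook. The frames have to be ordered by their numeric counter, because a plain alphabetical sort puts `anim_10.png` before `anim_2.png`.

```python
import re

def frame_number(path):
    """Extract the integer counter from a file name like movie/anim_12.png."""
    match = re.search(r'anim_(\d+)\.png$', path)
    if match is None:
        raise ValueError('unexpected frame name: ' + path)
    return int(match.group(1))

def build_gif(frame_paths, out_path, ms_per_frame=300):
    """Write the frames to an animated GIF in counter order and return that order."""
    from PIL import Image  # third-party: pip install Pillow
    ordered = sorted(frame_paths, key=frame_number)
    frames = [Image.open(p).convert('RGB') for p in ordered]
    frames[0].save(out_path, save_all=True, append_images=frames[1:],
                   duration=ms_per_frame, loop=0)
    return ordered

# the ordering problem, shown on file names only (no images needed)
names = ['movie/anim_10.png', 'movie/anim_2.png', 'movie/anim_0.png']
print(sorted(names))                    # alphabetical order
print(sorted(names, key=frame_number))  # numeric order
```

A call such as `build_gif(glob.glob('movie/anim_*.png'), 'covid19.gif')` (after `import glob`) would then assemble the animation. At 2600 x 1800 pixels per frame the GIF will be large, so shrinking the figure or resizing the frames first may be worthwhile.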