Free Udemy Class - Exploring Population Pyramids and Building Data Science Web Apps
Introduction
Join me for this free Udemy class where we will explore population pyramids using Python. We'll even build a web application where you can choose a country, a continent, or even the whole world, then pick a year (the current one, a past one, or one in the future), and it will build the pyramid automatically.
Code
import matplotlib.pyplot as plt
from IPython.display import Image
Image(filename='population pyramid thumb.png', width='80%')
Let's talk about population pyramids! Almost as mysterious as the pyramids of Giza, where you have to dig deep to find the hidden treasures... well, not really, but you definitely should dig. I'll show you how to dig using Python and where to get curated, accurate data from the United Nations that covers most countries around the world! You'll even be able to dial back in time or dial forward into the future! Exciting!
A population pyramid breaks down a population, for whatever zone (a town, country, continent, or the world), into even age buckets and charts them as a bar plot. Traditionally males go on the left and females on the right. I even stereotyped the coloring; I don't like stereotypes, but as I have less than an hour to prep these videos, there is only so much imagination to spend.
As you can see, each side forms a sideways distribution for one gender, and the two together form a sort of pyramid. A healthy society should form a pyramid, a bit like a pyramid scheme, with more younger people joining the workforce and taking care of a smaller number of older people, but that isn't always the case.
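The bucketing described above can be sketched in a few lines. This is only an illustration, assuming a plain list of individual ages (made-up values), 5-year buckets, and an open-ended top bucket; the helper name `age_bucket_label` is hypothetical, and the U.N. file used later already ships with the buckets precomputed.

```python
from collections import Counter

def age_bucket_label(age, width=5, top=100):
    """Return a label such as '0-4' or '100+' for a single age."""
    if age >= top:
        return f"{top}+"
    start = (age // width) * width
    return f"{start}-{start + width - 1}"

# Made-up ages, purely for illustration
ages = [0, 3, 4, 5, 17, 18, 42, 44, 99, 101]

# Count how many people fall into each bucket
bucket_counts = Counter(age_bucket_label(a) for a in ages)
print(bucket_counts)
```

Each bucket count then becomes the length of one horizontal bar in the pyramid.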
There are 3 overarching groups:
Expansive
A healthy zone where more people are joining the workforce and are able to take care of their elders. The challenges are affordable education and enough jobs.
Constrictive
There are too many old people and not enough young people to take care of them. This becomes a burden on the working-age folks, and you'll need a lot of healthcare, including geriatric and palliative care, which is very time consuming and expensive.
Stationary
A sort of equilibrium.
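To make the three groups a bit more concrete, here is a rough heuristic sketch. It assumes 5-year bucket totals ordered youngest first and simply compares the base (ages 0-14) with the middle (ages 25-44). The function name `classify_pyramid`, the choice of buckets, and the 10% tolerance are my own illustrative assumptions, not a demographic standard.

```python
def classify_pyramid(bucket_totals, tolerance=0.10):
    """Rough label for a pyramid given 5-year bucket totals, youngest first.

    Compares the average size of the 0-14 buckets (the base) with the
    average size of the 25-44 buckets (the middle). The tolerance is an
    arbitrary illustrative value.
    """
    base = sum(bucket_totals[0:3]) / 3      # buckets 0-4, 5-9, 10-14
    middle = sum(bucket_totals[5:9]) / 4    # buckets 25-29 ... 40-44
    ratio = base / middle
    if ratio > 1 + tolerance:
        return "expansive"
    if ratio < 1 - tolerance:
        return "constrictive"
    return "stationary"

# Made-up bucket totals (thousands), youngest bucket first
growing = [900, 850, 800, 750, 700, 650, 600, 550, 500, 450]
aging = [400, 420, 450, 500, 550, 600, 650, 700, 700, 680]
flat = [500, 500, 500, 500, 500, 500, 500, 500, 500, 480]

for name, totals in [("growing", growing), ("aging", aging), ("flat", flat)]:
    print(name, "->", classify_pyramid(totals))
```

Real populations are messier than this, so treat the label as a starting point for looking at the chart, not as a verdict.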
Let's dig into the code. We won't cover too much of the analysis, but if you take this on, you can take it very far and better understand birth rates, fertility, aging populations, the types of pressures a society will face, human migrations, etc.
# Source: http://darkroom.baltimoresun.com/2016/10/building-human-towers-in-catalonia-spain/#1
Image(filename='Spain-Catalonia-Human-Towers-Castellers-10032016-18.jpg', width='80%')
Image(filename='us_pop_pyr_1970.png', width='80%')
Population Pyramid
https://en.wikipedia.org/wiki/Population_pyramid
"A population pyramid, also called an 'age-sex pyramid', is a graphical illustration that shows the distribution of various age groups in a population (typically that of a country or region of the world), which forms the shape of a pyramid when the population is growing. Males are conventionally shown on the left and females on the right, and they may be measured by the raw numbers or as a percentage of the total population. This tool can be used to visualize the age of a particular population. It is also used in ecology to determine the overall age distribution of a population; an indication of the reproductive capabilities and likelihood of the continuation of a species."
Get the data - U.N. Department of Economic and Social Affairs - Population Dynamics
https://population.un.org/wpp/Download/Standard/CSV/
Medium variant, annual, from 1950 to 2100 (CSV, 113.05 MB)
# Current top bottom up, top up down
import numpy as np
import pandas as pd
pop_df = pd.read_csv('WPP2019_PopulationByAgeSex_Medium.csv')
print(pop_df.shape)
pop_df.head()
Plotting Population Pyramids
sorted(list(set(pop_df['Location'] ) ) )
len(list(set(pop_df['Location'])))
print(np.min(pop_df['Time']), np.max(pop_df['Time']))
pop_df_tmp = pop_df[(pop_df['Location']=='Japan') & (pop_df['Time']==2020)]
pop_df_tmp = pop_df_tmp.sort_values('AgeGrpStart',ascending=True)
pop_df_tmp.head()
country = 'United States of America'
year = 1970
pop_df_tmp = pop_df[(pop_df['Location']==country) & (pop_df['Time']==year)]
pop_df_tmp = pop_df_tmp.sort_values('AgeGrpStart',ascending=True)
y = range(0, len(pop_df_tmp))
x_male = pop_df_tmp['PopMale']
x_female = pop_df_tmp['PopFemale']
# max xlim
max_x_scale = max(max(x_female), max(x_male))
fig, axes = plt.subplots(ncols=2, sharey=True, figsize=(10, 8))
fig.patch.set_facecolor('xkcd:Beige')
plt.figtext(.5,.9,country + ": " + str(year), fontsize=15, ha='center')
axes[0].barh(y, x_male, align='center', color='lightblue')
axes[0].set(title='Males')
axes[0].set(xlim=[0,max_x_scale])
axes[1].barh(y, x_female, align='center', color='pink')
axes[1].set(title='Females')
axes[1].set(xlim=[0,max_x_scale])
axes[1].grid()
axes[0].set(yticks=y, yticklabels=pop_df_tmp['AgeGrp'])
axes[0].invert_xaxis()
axes[0].grid()
plt.show()
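The plot above uses raw counts. As the Wikipedia definition quoted earlier notes, a pyramid may also be measured as a percentage of the total population, which makes places of very different sizes easier to compare. A minimal sketch, assuming a tiny made-up frame with the same `PopMale` and `PopFemale` column names as the U.N. file; the `PctMale` and `PctFemale` columns are hypothetical names introduced here.

```python
import pandas as pd

# Made-up counts (thousands) for three age buckets
df = pd.DataFrame({
    "AgeGrp": ["0-4", "5-9", "10-14"],
    "PopMale": [50.0, 30.0, 20.0],
    "PopFemale": [40.0, 35.0, 25.0],
})

# Total population across both sexes and all buckets
total = df["PopMale"].sum() + df["PopFemale"].sum()

# Each bar becomes its share of the whole population
df["PctMale"] = 100 * df["PopMale"] / total
df["PctFemale"] = 100 * df["PopFemale"] / total
print(df)
```

Passing the percentage columns to `barh` instead of the raw columns should give the same shape with a percentage x-axis.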
def plot_poulation_pyramid(country, year):
    pop_df_tmp = pop_df[(pop_df['Location']==country) & (pop_df['Time']==year)]
    pop_df_tmp = pop_df_tmp.sort_values('AgeGrpStart',ascending=True)
    y = range(0, len(pop_df_tmp))
    x_male = pop_df_tmp['PopMale']
    x_female = pop_df_tmp['PopFemale']
    # max xlim
    max_x_scale = max(max(x_female), max(x_male))
    fig, axes = plt.subplots(ncols=2, sharey=True, figsize=(10, 8))
    fig.patch.set_facecolor('xkcd:Beige')
    plt.figtext(.5,.9,country + ": " + str(year), fontsize=15, ha='center')
    axes[0].barh(y, x_male, align='center', color='lightblue')
    axes[0].set(title='Males')
    axes[0].set(xlim=[0,max_x_scale])
    axes[1].barh(y, x_female, align='center', color='pink')
    axes[1].set(title='Females')
    axes[1].set(xlim=[0,max_x_scale])
    axes[1].grid()
    axes[0].set(yticks=y, yticklabels=pop_df_tmp['AgeGrp'])
    axes[0].invert_xaxis()
    axes[0].grid()
    plt.show()
Qatar - Highest Male Ratio
country = 'Qatar'
year = 2020
plot_poulation_pyramid(country, year)
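If you want to check a male-ratio claim yourself, one option is to compute males per 100 females for every location in a given year. This is a sketch under assumptions: the helper name `male_ratio_by_location` is hypothetical, and the tiny frame is made up but uses the same column names as the U.N. file.

```python
import pandas as pd

def male_ratio_by_location(df, year):
    """Males per 100 females for each location in the given year, highest first."""
    one_year = df[df["Time"] == year]
    # Sum every age bucket so each location has one male and one female total
    totals = one_year.groupby("Location")[["PopMale", "PopFemale"]].sum()
    ratio = 100 * totals["PopMale"] / totals["PopFemale"]
    return ratio.sort_values(ascending=False)

# Tiny made-up frame with the same column names as the U.N. file
sample = pd.DataFrame({
    "Location": ["A", "A", "B", "B"],
    "Time": [2020, 2020, 2020, 2020],
    "PopMale": [30.0, 30.0, 10.0, 10.0],
    "PopFemale": [10.0, 10.0, 10.0, 10.0],
})
ratios = male_ratio_by_location(sample, 2020)
print(ratios)
```

On the real data you would call `male_ratio_by_location(pop_df, 2020).head()`. Keep in mind that the `Location` column also holds continents and other aggregates, not only countries.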
Estonia
country = 'Estonia'
year = 2020
plot_poulation_pyramid(country, year)
Libya
country = 'Libya'
year = 2020
plot_poulation_pyramid(country, year)
Europe
country = 'Europe (48)'
year = 2020
plot_poulation_pyramid(country, year)
Japan
country = 'Japan'
year = 2095
plot_poulation_pyramid(country, year)
Spain
country = 'Spain'
year = 2020
plot_poulation_pyramid(country, year)
Make a Matplotlib Animation
import os
os.makedirs('movie', exist_ok=True)  # the frames are saved into this folder
country='World'
counter = 0
for yr in list(range(1950,2100,1)):
    year = yr
    pop_df_tmp = pop_df[(pop_df['Location']==country) & (pop_df['Time']==year)]
    pop_df_tmp = pop_df_tmp.sort_values('AgeGrpStart',ascending=True)
    y = range(0, len(pop_df_tmp))
    x_male = pop_df_tmp['PopMale']
    x_female = pop_df_tmp['PopFemale']
    # max xlim
    max_x_scale = max(max(x_female), max(x_male))
    fig, axes = plt.subplots(ncols=2, sharey=True, figsize=(10, 8))
    fig.patch.set_facecolor('xkcd:Beige')
    plt.figtext(.5,.9,country + ": " + str(year), fontsize=30, ha='center')
    axes[0].barh(y, x_male, align='center', color='lightblue')
    axes[0].set(title='Males')
    axes[0].set(xlim=[0,max_x_scale])
    axes[1].barh(y, x_female, align='center', color='pink')
    axes[1].set(title='Females')
    axes[1].set(xlim=[0,max_x_scale])
    axes[1].grid()
    axes[0].set(yticks=y, yticklabels=pop_df_tmp['AgeGrp'])
    axes[0].invert_xaxis()
    axes[0].grid()
    plt.savefig('movie/anim_' + str(counter) + '.png')
    plt.close(fig)  # release the figure so 150 of them don't pile up in memory
    counter += 1
# make a video out of it (run the command below from inside the 'movie' folder)...
# if you need help installing FFMPEG:
# https://github.com/adaptlearning/adapt_authoring/wiki/Installing-FFmpeg
ffmpeg -framerate 10 -i "anim_%d.png" -pix_fmt yuv420p out.mp4
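In the loop above, `max_x_scale` is recomputed for every year, so the x-axis rescales from frame to frame. If you would rather see the bars grow against a steady axis, one option is to compute a single limit across all years first. A minimal sketch on made-up data; the helper name `fixed_x_limit` is hypothetical.

```python
import pandas as pd

def fixed_x_limit(df, location):
    """Largest single-bucket male or female count for a location across all years."""
    rows = df[df["Location"] == location]
    return max(rows["PopMale"].max(), rows["PopFemale"].max())

# Tiny made-up frame with the same column names as the U.N. file
sample = pd.DataFrame({
    "Location": ["World", "World", "World", "Japan"],
    "Time": [1950, 1951, 1951, 1950],
    "PopMale": [10.0, 12.0, 9.0, 99.0],
    "PopFemale": [11.0, 11.5, 14.0, 98.0],
})
x_limit = fixed_x_limit(sample, "World")
print(x_limit)
```

With the real data you would compute `fixed_x_limit(pop_df, country)` once before the loop and pass that value to both `set(xlim=...)` calls.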
Building our Population Pyramid Building Web Application
We are going to base our web portal on a great and fully responsive HTML template from W3Schools:
https://www.w3schools.com/w3css/w3css_templates.asp
Make sure you go through the Flask tutorial if you haven't already done so and get yourself a PythonAnywhere account as shown in the previous class.
Create a new folder called 'udemy-population-pyramids' and upload the following files in their appropriate folders:
udemy-population-pyramids
├── flask_app.py
├── WPP2019_PopulationByAgeSex.csv
└── templates
    └── build-a-pyramid.html
# Create a smaller version to fit through the 100 MB limit on PythonAnywhere
pop_df = pd.read_csv('WPP2019_PopulationByAgeSex_Medium.csv')
pop_df = pop_df[['Location','Time', 'AgeGrp', 'AgeGrpStart', 'PopMale', 'PopFemale']]
pop_df.to_csv('WPP2019_PopulationByAgeSex.csv')
pop_df.head()
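Since the point of this step is to get under the 100 MB limit, it can help to check the size of what you wrote before uploading. A small sketch, assuming a throwaway frame and a temporary folder; the helper name `save_and_report_size` is hypothetical. It also passes `index=False`, which keeps pandas from writing an extra row-number column.

```python
import os
import tempfile

import pandas as pd

def save_and_report_size(df, path):
    """Write df to CSV without the index column and return the file size in MB."""
    df.to_csv(path, index=False)
    return os.path.getsize(path) / (1024 * 1024)

# Made-up single-row frame, purely for illustration
sample = pd.DataFrame({
    "Location": ["Japan"],
    "Time": [2020],
    "PopMale": [1.0],
    "PopFemale": [2.0],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "small.csv")
    size_mb = save_and_report_size(sample, csv_path)
    reloaded = pd.read_csv(csv_path)

print(round(size_mb, 6), list(reloaded.columns))
```

If the reported size on the real file is still too large, dropping years or locations you don't need is another way to shrink it.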
flask_app.py
#!/usr/bin/env python
# Note: written for the Flask release current at the time. Flask 2.3+ removed
# before_first_request and Flask 3.0 removed flask.Markup (markupsafe.Markup replaces it).
from flask import Flask, render_template, flash, request, jsonify, Markup
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, there is no display on the server
import matplotlib.pyplot as plt
import io, os, base64
import numpy as np
import pandas as pd

# global variables
app = Flask(__name__)
pop_df = None
location_list = None
BASE_DIR = os.path.dirname(os.path.abspath(__file__))

@app.before_first_request
def startup():
    global pop_df, location_list
    # load and prepare the data
    pop_df = pd.read_csv(BASE_DIR + '/WPP2019_PopulationByAgeSex.csv')
    location_list = sorted(list(set(pop_df['Location'])))

def get_poulation_pyramid(country, year):
    pop_df_tmp = pop_df[(pop_df['Location']==country) & (pop_df['Time']==year)].copy()
    pop_df_tmp = pop_df_tmp.sort_values('AgeGrpStart',ascending=True)
    return(pop_df_tmp)

@app.route("/", methods=['POST', 'GET'])
def build_pyramid():
    plot_to_show = ''
    selected_country = ''
    country_list = ''
    selected_year = ''
    if request.method == 'POST':
        selected_country = request.form['selected_country']
        selected_year = int(request.form['selected_year'])
        pop_df_tmp = get_poulation_pyramid(selected_country, selected_year)
        y = range(0, len(pop_df_tmp))
        x_male = pop_df_tmp['PopMale']
        x_female = pop_df_tmp['PopFemale']
        # max xlim
        max_x_scale = max(max(x_female), max(x_male))
        fig, axes = plt.subplots(ncols=2, sharey=True, figsize=(12, 10))
        fig.patch.set_facecolor('xkcd:Beige')
        plt.figtext(.5,.9,selected_country + ": " + str(selected_year), fontsize=15, ha='center')
        axes[0].barh(y, x_male, align='center', color='lightblue')
        axes[0].set(title='Males')
        axes[0].set(xlim=[0,max_x_scale])
        axes[1].barh(y, x_female, align='center', color='pink')
        axes[1].set(title='Females')
        axes[1].set(xlim=[0,max_x_scale])
        axes[1].grid()
        axes[0].set(yticks=y, yticklabels=pop_df_tmp['AgeGrp'])
        axes[0].invert_xaxis()
        axes[0].grid()
        # render the figure into memory and embed it as a base64 image
        img = io.BytesIO()
        plt.savefig(img, format='png')
        plt.close(fig)  # release the figure after each request
        img.seek(0)
        plot_url = base64.b64encode(img.getvalue()).decode()
        plot_to_show = Markup('<img src="data:image/png;base64,{}" style="width:100%;vertical-align:top">'.format(plot_url))
    return render_template('build-a-pyramid.html',
        plot_to_show = plot_to_show,
        selected_country = selected_country,
        location_list = location_list,
        selected_year = selected_year)

if __name__=='__main__':
    app.run(debug=True)
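Both dropdowns in the template start on an empty option, so a POST can arrive with empty strings, and `int('')` raises a `ValueError`. One way to guard against that is to validate the form values before plotting. This is a sketch, not part of the class code: the helper name `parse_selection` is hypothetical, and a plain dict stands in for `request.form`, which offers the same `.get` method.

```python
def parse_selection(form, valid_locations):
    """Return (country, year) from submitted form values, or None if unusable."""
    country = form.get("selected_country", "")
    year_text = form.get("selected_year", "")
    if country not in valid_locations:
        return None
    try:
        year = int(year_text)
    except ValueError:
        # empty or non-numeric year
        return None
    return country, year

# Plain dicts standing in for request.form
locations = ["Japan", "Spain"]
good = parse_selection({"selected_country": "Japan", "selected_year": "2020"}, locations)
empty = parse_selection({"selected_country": "", "selected_year": ""}, locations)
print(good, empty)
```

Inside `build_pyramid` you could call this with `request.form` and `location_list`, and skip the plotting when it returns `None`.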
build-a-pyramid.html
<!DOCTYPE html>
<html lang="en" >
<head>
<meta charset="UTF-8">
<title>Population Pyramids</title>
<meta name="viewport" content="width=device-width, initial-scale=1">
<script src="//ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.min.js"></script>
<link rel="stylesheet" href="https://www.w3schools.com/w3css/4/w3.css">
</head>
<body bgcolor="black">
<div class="w3-center w3-padding">
<H1><font color='lightblue'>ViralML</font> <font color='white'>Population</font> <font color='pink'>Pyramids</font></H1>
<FORM id='submit_content' method="POST" action="{{ url_for('build_pyramid') }}">
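<SELECT class="selectpicker" name="selected_country">
<option value="" selected></option>
{# location_list is passed in by flask_app.py; this loop is an assumed way to fill the dropdown #}
{% for location in location_list %}
<option value="{{ location }}">{{ location }}</option>
{% endfor %}
</SELECT>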
<SELECT class="selectpicker" name="selected_year" >
<option value="" selected></option>
<option value="1950">1950</option>
<option value="1960">1960</option>
<option value="1970">1970</option>
<option value="1980">1980</option>
<option value="1990">1990</option>
<option value="2000">2000</option>
<option value="2010">2010</option>
<option value="2020">2020</option>
<option value="2030">2030</option>
<option value="2040">2040</option>
<option value="2050">2050</option>
<option value="2060">2060</option>
<option value="2070">2070</option>
<option value="2080">2080</option>
<option value="2090">2090</option>
</SELECT>
<button type="submit" form="submit_content" value="Submit">Build</button>
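</FORM>
{# plot_to_show is the image markup built in flask_app.py; placing it here is an assumption #}
{{ plot_to_show }}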
</div>
</body>
</html>
Show Notes
(pardon typos and formatting - these are the notes I use to make the videos)