How to Map the World's Organic Farms & Farmer's Markets with Python and OpenStreetMap

Introduction

Let's talk about food production and food distribution by exploring 2 great and socially impactful data sets. We'll start by looking at the percentage of farming area dedicated to organic agriculture around the world and will follow with the location of farmer's markets around the United States.




Code

Not Food Deserts, But Farmer's Market Desert
In [5]:
from IPython.display import Image
Image(filename='who-produces-good-food.png', width='80%')
Out[5]:

Healthy Food - World Agriculture - Organic VS Non & Locations of Farmers Markets in the US and

Food is a big deal and can tell us a lot about how we think about health, the amount of money we're willing to spend on diets and how we support our local economies.

While we explore those data sets, I'll also show you a cool function that uses OpenStreetMap to easily get geo-coordinates from a country's name, and how to get some quick stats by leveraging the ubiquitous pandas "groupby" function. Are you ready?

Welcome to the ViralML Show. My name is Manuel Amunategui, your host and promoter of porting ML to the web so everybody gets to enjoy what we do, not just those who can navigate a Jupyter notebook. Sign up for my newsletter and give this video some love.

Have you heard the term "Food Desert"? It's not a misspelling.

The United States Government defines a food desert as “a low income census tract where a substantial number or share of residents has low access to a supermarket or large grocery store.”

This is a big deal: rural areas and poor urban neighborhoods often lack access to quality foods, especially fruits and vegetables, and are instead stuck with high-calorie food sources such as junk food and fast food. That can lead to health issues such as diabetes, and for those who already have a disease, poor food offerings make it very difficult to manage.

Global Organic Farming

"There is no universally accepted definition of organic farming , but most consider it to be a specific production system that aims to avoid the use of synthetic and harmful pesticides, fertilizers, growth regulators, and livestock feed additives." https://www.thebalancesmb.com/the-definition-of-organic-farming-2538081

Organic standards were published in 2002. They require producers to register and be certified on a regular basis to confirm that they are following the standards...

Whether or not you like pesticides or genetically modified seeds in your food or your environment is beside the point. Organic farming is over 2x more profitable for farmers and it has a much lower impact on the environment. It also requires more labour and thus can have a huge impact on the farming community.

"Producers, distributors and marketers of organic products must register with their local control body before they are allowed to market their food as organic after they have been inspected and checked, they will be awarded a certificate to confirm that their products meet organic standards all operators are checked at least once a year to make sure that they are continuing to follow the rules" https://ec.europa.eu/info/food-farming-fisheries/farming/organic-farming/organics-glance_en

In [6]:
Image(filename='Organic_farming_area_EU_and_EFTA.rev.jpg', width='60%')
Out[6]:

Global Farming Land - Percentage Allocated to Organic Farming

Research Institute of Organic Agriculture

https://www.fibl.org/en/themes/statistics-info.html

https://statistics.fibl.org/world/area-world.html?tx_statisticdata_pi1%5Bcontroller%5D=Element2Item&cHash=f367262839ab9ca2e7ac1f333fbb1ca2

Organic agriculture world-wide and in Europe

(source: https://www.fibl.org/en/themes/statistics-info.html)

"The main results of the latest survey on certified organic agriculture world-wide show (data end of 2017) that nearly 70 million hectares of agricultural land are managed organically. Growth was noted for all important indicators: Area, producers and retail sales.

On a global level, the organic agricultural land area increased by 11.7 million hectares or 20 percent compared with 2016.

The highest shares of organic agricultural land are in Liechtenstein (37.9 percent) and Samoa (37.6 percent).

There were over 2.8 million producers, and the countries with the highest numbers of producers are India, Uganda and Mexico.

The market research company Ecovia Intelligence estimates the global market for organic food to have reached 97 billion US dollars in 2017 (approximately 90 billion euros). The United States is the leading market with 40 billion euros, followed by Germany (10 billion euros), France (7.9 billion euros), and China (7.6 billion euros)."
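
The area figures in the quote can be cross-checked with a little arithmetic. This is a minimal sketch, assuming the 20 percent increase is measured relative to the 2016 area; the variable names are mine, not FiBL's.

```python
# Figures quoted from the FiBL summary above
increase_mha = 11.7   # growth in organic area, million hectares
increase_pct = 20.0   # growth in percent (assumption: relative to the 2016 area)

# If 11.7 Mha is 20% of the 2016 area, the 2016 area is 11.7 / 0.20
area_2016_mha = increase_mha / (increase_pct / 100)
area_2017_mha = area_2016_mha + increase_mha

print(f"2016: {area_2016_mha:.1f} Mha, 2017: {area_2017_mha:.1f} Mha")
```

The implied 2017 total lands close to the "nearly 70 million hectares" in the quote; a small gap is expected because the quoted figures are rounded.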
In [11]:
# FIBL Research Institute of Organic Agriculture
import pandas as pd
glb_organic_df = pd.read_csv('global-organic-area-2017.csv', sep='\t')
glb_organic_df.head()
Out[11]:
   Country      Year  Organic area share of total farmland [%]
0  Afghanistan  2017  0
1  Albania      2017  0.05
2  Algeria      2017  0
3  Andorra      2017  0.01
4  Argentina    2017  2.28
In [12]:
glb_organic_df.shape
Out[12]:
(165, 3)
In [10]:
# https://gis.stackexchange.com/questions/212796/get-lat-lon-extent-of-country-from-name-using-python
def get_boundingbox_country(country, output_as='boundingbox'):
    """
    get the bounding box of a country in EPSG4326 given a country name

    Parameters
    ----------
    country : str
        name of the country in english and lowercase
    output_as : str
        choose from 'boundingbox' or 'center'.
         - 'boundingbox' for [latmin, latmax, lonmin, lonmax]
         - 'center' for [loncenter, latcenter]

    Returns
    -------
    output : list
        list with coordinates as float
    """
    import requests
    # let requests build and URL-encode the query string
    url = 'https://nominatim.openstreetmap.org/search'
    params = {'country': country, 'format': 'json', 'polygon': 0}
    # Nominatim's usage policy asks clients to identify themselves;
    # this User-Agent value is a placeholder, replace it with your own
    headers = {'User-Agent': 'organic-farming-notebook'}
    # [0] takes the best match and raises IndexError when nothing is found
    response = requests.get(url, params=params, headers=headers, timeout=10).json()[0]

    # parse response to list
    if output_as == 'boundingbox':
        output = [float(i) for i in response['boundingbox']]
    elif output_as == 'center':
        output = [float(response[key]) for key in ['lon', 'lat']]
    else:
        raise ValueError("output_as must be 'boundingbox' or 'center'")
    return output


import time
ctry_geo_centers_lon = []
ctry_geo_centers_lat = []
total_ctry = len(glb_organic_df['Country'])
for counter_, ctry in enumerate(glb_organic_df['Country']):
    # progress: print the number of countries left every 10 requests
    if counter_ % 10 == 0:
        print(total_ctry - counter_)
    # Nominatim's usage policy allows at most one request per second
    time.sleep(1)
    centroid = [None, None]
    try:
        centroid = get_boundingbox_country(ctry, output_as='center')
    except Exception:
        # no match or failed request: keep [None, None] for this country
        print('Could not find:', ctry)

    ctry_geo_centers_lon.append(centroid[0])
    ctry_geo_centers_lat.append(centroid[1])

165
155
Could not find: Bolivia (Plurinational State of)
145
Could not find: Channel Islands
135
125
115
Could not find: French Guiana (France)
Could not find: French Polynesia
Could not find: Guadeloupe (France)
105
Could not find: Iran (Islamic Republic of)
95
85
75
Could not find: Martinique (France)
Could not find: Mayotte
65
55
Could not find: Panama
45
Could not find: Philippines
Could not find: Puerto Rico
Could not find: Réunion (France)
35
25
15
Could not find: United States Virgin Islands
5
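
Several of the failed lookups carry a parenthetical qualifier, such as "Bolivia (Plurinational State of)", that the geocoder may not recognize as a country name. The sketch below uses a hypothetical helper, `simplify_country_name` (not part of the original notebook), to strip that qualifier so the lookup can be retried. Whether the shorter name resolves is an assumption to verify, and territories such as Guadeloupe may still not match a `country=` query.

```python
import re

def simplify_country_name(name):
    """Drop a trailing parenthetical qualifier, e.g. 'Guadeloupe (France)' -> 'Guadeloupe'."""
    return re.sub(r'\s*\([^)]*\)\s*$', '', name).strip()

# A few of the names reported as 'Could not find' above
failed = [
    'Bolivia (Plurinational State of)',
    'Iran (Islamic Republic of)',
    'Réunion (France)',
    'Panama',
]
for name in failed:
    print(name, '->', simplify_country_name(name))
```

Plain names such as Panama or Philippines pass through unchanged, so they failed for some other reason, possibly a throttled or dropped request, and may simply need a second attempt.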
In [13]:
glb_organic_df['longitude'] = ctry_geo_centers_lon
glb_organic_df['latitude'] = ctry_geo_centers_lat
# drop countries that could not be geocoded
glb_organic_df = glb_organic_df.dropna().copy()

# clean up 'Organic area share of total farmland [%]':
# coerce non-numeric entries to NaN, cast the column to numeric and drop the bad rows
share_col = 'Organic area share of total farmland [%]'
glb_organic_df[share_col] = pd.to_numeric(glb_organic_df[share_col], errors='coerce')
glb_organic_df = glb_organic_df.dropna(subset=[share_col])
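
To see what the cleanup step does, here is a toy example with made-up values (the `toy` frame and its `share` column are illustrative only). `pd.to_numeric` with `errors='coerce'` turns anything it cannot parse into NaN, which can then be dropped.

```python
import pandas as pd

# Toy column mixing numeric strings with a non-numeric placeholder (made-up values)
toy = pd.DataFrame({'Country': ['A', 'B', 'C'],
                    'share': ['2.28', 'n/a', '0']})

# errors='coerce' turns anything unparseable into NaN instead of raising
toy['share'] = pd.to_numeric(toy['share'], errors='coerce')

# keep only the rows whose share could be parsed
clean = toy.dropna(subset=['share'])
print(clean)
```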
In [14]:
glb_organic_df.head()
Out[14]:
   Country      Year  Organic area share of total farmland [%]  longitude   latitude
0  Afghanistan  2017  0.00                                      66.238514   33.768006
1  Albania      2017  0.05                                      19.999962   41.000028
2  Algeria      2017  0.00                                      2.999983    28.000027
3  Andorra      2017  0.01                                      1.573203    42.540717
4  Argentina    2017  2.28                                      -64.967282  -34.996496
In [15]:
glb_organic_df = glb_organic_df.sort_values('Organic area share of total farmland [%]', ascending=False)
glb_organic_df
Out[15]:
Country Year Organic area share of total farmland [%] longitude latitude
86 Liechtenstein 2017 37.86 9.553153 47.141631
7 Austria 2017 24.00 13.199959 47.200034
43 Estonia 2017 20.52 25.331908 58.752378
143 Sweden 2017 18.81 14.520858 59.674971
130 Sao Tome and Principe 2017 18.03 6.964872 0.887550
73 Italy 2017 15.39 12.674297 42.638426
84 Latvia 2017 14.81 24.753764 56.840649
144 Switzerland 2017 14.48 8.231974 46.798562
160 Uruguay 2017 13.03 -56.020153 -32.875555
35 Czech Republic 2017 12.24 15.474954 49.816700
49 Finland 2017 11.37 25.920916 63.246778
135 Slovakia 2017 9.98 19.452865 48.741152
136 Slovenia 2017 9.52 14.480837 45.813311
139 Spain 2017 8.94 -4.838065 39.326234
6 Australia 2017 8.77 134.755000 -24.776109
39 Dominican Republic 2017 8.73 -70.302803 19.097403
38 Denmark 2017 8.64 10.333328 55.670249
47 Faroe Islands 2017 8.44 -7.032297 62.044872
148 Timor-Leste 2017 8.23 125.837576 -8.515198
55 Germany 2017 8.21 10.423447 51.083420
87 Lithuania 2017 8.08 23.750000 55.350000
161 Vanuatu 2017 7.96 168.106915 -16.525507
122 Portugal 2017 6.97 -7.889626 40.033265
11 Belgium 2017 6.39 4.666715 50.640281
50 France 2017 6.29 1.888334 46.603354
32 Croatia 2017 6.15 17.011895 45.564344
140 Sri Lanka 2017 6.04 80.713785 7.555494
57 Greece 2017 5.02 21.987713 38.995368
34 Cyprus 2017 5.00 33.145128 34.982302
150 Tonga 2017 4.81 -175.202642 -19.916082
... ... ... ... ... ...
132 Senegal 2017 0.08 -14.452961 14.475061
102 Myanmar 2017 0.08 95.999965 17.175050
103 Namibia 2017 0.08 17.323111 -23.233550
127 Rwanda 2017 0.07 30.064436 -1.964663
28 Colombia 2017 0.07 -73.783892 2.889443
54 Georgia 2017 0.06 44.028738 41.680971
16 Bosnia and Herzegovina 2017 0.06 17.596147 44.305348
142 Suriname 2017 0.06 -56.077119 4.141303
1 Albania 2017 0.05 19.999962 41.000028
62 Guinea-Bissau 2017 0.05 -14.900021 12.100035
80 Kosovo 2017 0.04 20.902123 42.586958
92 Mali 2017 0.03 -2.290024 16.370036
100 Morocco 2017 0.03 -7.336248 31.172821
101 Mozambique 2017 0.03 34.914498 -19.302233
163 Zambia 2017 0.03 27.559916 -14.518624
164 Zimbabwe 2017 0.02 29.746841 -18.455496
95 Mauritius 2017 0.02 57.570357 -20.275945
44 Eswatini 2017 0.02 31.399132 -26.562481
131 Saudi Arabia 2017 0.01 42.352833 25.624262
91 Malaysia 2017 0.01 102.265682 4.569375
81 Kuwait 2017 0.01 47.497948 29.273396
22 Cameroon 2017 0.01 13.153581 4.612552
3 Andorra 2017 0.01 1.573203 42.540717
108 Niger 2017 0.00 9.323843 17.735621
70 Iraq 2017 0.00 44.174977 33.095579
61 Guinea 2017 0.00 -10.708359 10.722623
53 Gambia 2017 0.00 -15.490046 13.470062
20 Burundi 2017 0.00 29.887058 -3.363436
2 Algeria 2017 0.00 2.999983 28.000027
0 Afghanistan 2017 0.00 66.238514 33.768006

149 rows × 5 columns

In [18]:
# you will need to pip install Basemap - https://matplotlib.org/basemap/users/installing.html
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt

plt.figure(figsize=(16, 8))

# Make the background map
m=Basemap(llcrnrlon=-180, llcrnrlat=-65,urcrnrlon=180,urcrnrlat=80)
m.drawmapboundary(fill_color='#A6CAE0', linewidth=0)
m.fillcontinents(color='grey', alpha=0.3)
m.drawcoastlines(linewidth=0.1, color="white")
 
# prepare a color code for each point: one integer label per country
glb_organic_df['labels_enc'] = pd.factorize(glb_organic_df['Country'])[0]
 
# Add a point per position
m.scatter(glb_organic_df['longitude'], glb_organic_df['latitude'], 
          s=glb_organic_df['Organic area share of total farmland [%]']*100, alpha=0.4, 
          c=glb_organic_df['labels_enc'], cmap="Set1")
 

plt.title('Organic area share of total farmland [%]' , fontsize=50) 
Out[18]:
Text(0.5, 1.0, 'Organic area share of total farmland [%]')
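
The `labels_enc` column in the map cell comes from `pd.factorize`, which replaces each distinct label with an integer code. A small illustration on a made-up list:

```python
import pandas as pd

codes, uniques = pd.factorize(['Austria', 'Estonia', 'Austria', 'Sweden'])
print(codes)    # one integer code per row, numbered in order of first appearance
print(uniques)  # the distinct labels the codes refer to
```

Because each country appears only once in the organic data set, every row gets its own code, so the colors mainly help tell overlapping bubbles apart rather than encode a grouping.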

Farmer's Market

Know your food, know your farmer, buy local, buy from small farmers: it all supports local agriculture. This doesn't mean the food is certified organic (certification applies to those producing more than $5,000 worth of products annually) or that it's pesticide-free, but it is supporting your local economy and those who choose to work the land.

Farmers Markets Directory and Geographic Data

https://catalog.data.gov/dataset/farmers-markets-geographic-data

Longitude and latitude, state, address, name, and zip code of Farmers Markets in the United States

In [19]:
import pandas as pd
farmers_market = pd.read_csv('Farmers Markets Directory and Geographic Data.csv')
farmers_market.head()
Out[19]:
FMID MarketName Website Facebook Twitter Youtube OtherMedia street city County ... Coffee Beans Fruits Grains Juices Mushrooms PetFood Tofu WildHarvested updateTime
0 1018261 Caledonia Farmers Market Association - Danville https://sites.google.com/site/caledoniafarmers... https://www.facebook.com/Danville.VT.Farmers.M... NaN NaN NaN NaN Danville Caledonia ... Y Y Y N N Y Y N N 6/20/2017 10:43:57 PM
1 1018318 Stearns Homestead Farmers' Market http://www.StearnsHomestead.com StearnsHomesteadFarmersMarket NaN NaN NaN 6975 Ridge Road Parma Cuyahoga ... N N Y N N N N N N 6/21/2017 5:15:01 PM
2 1009364 106 S. Main Street Farmers Market http://thetownofsixmile.wordpress.com/ NaN NaN NaN NaN 106 S. Main Street Six Mile NaN ... N N N N N N N N N 2013
3 1010691 10th Steet Community Farmers Market NaN NaN NaN NaN http://agrimissouri.com/mo-grown/grodetail.php... 10th Street and Poplar Lamar Barton ... N N Y N N N N N N 10/28/2014 9:49:46 AM
4 1002454 112st Madison Avenue NaN NaN NaN NaN NaN 112th Madison Avenue New York New York ... N N N N N N N N N 3/1/2012 10:38:22 AM

5 rows × 59 columns

In [92]:
list(farmers_market)
Out[92]:
['FMID',
 'MarketName',
 'Website',
 'Facebook',
 'Twitter',
 'Youtube',
 'OtherMedia',
 'street',
 'city',
 'County',
 'State',
 'zip',
 'Season1Date',
 'Season1Time',
 'Season2Date',
 'Season2Time',
 'Season3Date',
 'Season3Time',
 'Season4Date',
 'Season4Time',
 'x',
 'y',
 'Location',
 'Credit',
 'WIC',
 'WICcash',
 'SFMNP',
 'SNAP',
 'Organic',
 'Bakedgoods',
 'Cheese',
 'Crafts',
 'Flowers',
 'Eggs',
 'Seafood',
 'Herbs',
 'Vegetables',
 'Honey',
 'Jams',
 'Maple',
 'Meat',
 'Nursery',
 'Nuts',
 'Plants',
 'Poultry',
 'Prepared',
 'Soap',
 'Trees',
 'Wine',
 'Coffee',
 'Beans',
 'Fruits',
 'Grains',
 'Juices',
 'Mushrooms',
 'PetFood',
 'Tofu',
 'WildHarvested',
 'updateTime']
In [22]:
farmers_market.shape
Out[22]:
(8790, 59)
In [20]:
# raw plot of geo-coordinates (x = longitude, y = latitude)
import matplotlib
import numpy as np
import matplotlib.pyplot as plt


fig, ax = plt.subplots(figsize=(16, 8))

ax.set_facecolor('xkcd:black')

plt.plot(farmers_market['x'], 
         farmers_market['y'],
         linestyle='none', marker='.', color='white')

plt.title('US Farmers Market', fontsize=30)
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.grid()
plt.show()
In [21]:
# keep only the markets inside a rough bounding box around the contiguous US
farmers_market_tmp = farmers_market[(farmers_market['x'] > -130) 
                                    & (farmers_market['x'] < -50)
                                    & (farmers_market['y'] > 25)]

fig, ax = plt.subplots(figsize=(16, 8))

ax.set_facecolor('xkcd:black')

plt.plot(farmers_market_tmp['x'], 
         farmers_market_tmp['y'],
         linestyle='none', marker='.', color='white')

plt.title('US Farmers Market', fontsize=30)
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.grid()
plt.show()
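
The filter in this cell is a boolean mask built from three comparisons. The toy frame below (place names with rough, illustrative coordinates that are not taken from the data set) shows which kinds of points fall outside the box:

```python
import pandas as pd

# Rough illustrative coordinates: x = longitude, y = latitude
pts = pd.DataFrame({
    'name': ['Vermont', 'Hawaii', 'Alaska', 'Puerto Rico'],
    'x': [-72.1, -157.8, -149.9, -66.1],
    'y': [44.4, 21.3, 61.2, 18.4],
})

# Same three conditions as the notebook cell; each comparison needs its own
# parentheses because & binds tighter than > and <
in_box = (pts['x'] > -130) & (pts['x'] < -50) & (pts['y'] > 25)
print(pts[in_box]['name'].tolist())
```

Hawaii and Alaska fail the longitude test and Puerto Rico fails the latitude test, so only the Vermont point survives.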
In [23]:
# group by longitude, rounded to the nearest degree
farmers_market_tmp = farmers_market_tmp[['FMID','x']].copy()
farmers_market_tmp['x'] = farmers_market_tmp['x'].round(0)
farmers_market_tmp = farmers_market_tmp.sort_values('x')
farmers_market_rez = farmers_market_tmp.groupby('x').count().rename(columns={'FMID':'FMID_counts'}).reset_index()
farmers_market_rez.shape
Out[23]:
(58, 2)
In [24]:
farmers_market_rez.head()
Out[24]:
   x       FMID_counts
0  -124.0  45
1  -123.0  193
2  -122.0  321
3  -121.0  140
4  -120.0  77
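
The round-then-count pattern is easier to follow on a handful of rows. A minimal sketch with made-up market IDs and longitudes:

```python
import pandas as pd

# Made-up market IDs and longitudes
toy = pd.DataFrame({'FMID': [1, 2, 3, 4, 5],
                    'x': [-122.4, -121.6, -122.1, -73.9, -74.2]})

# Round to whole degrees, then count the markets that share each rounded value
toy['x'] = toy['x'].round(0)
counts = (toy.groupby('x')['FMID']
             .count()
             .rename('FMID_counts')
             .reset_index())
print(counts)
```

Each rounded value acts as a one-degree-wide bin, so the resulting counts can be plotted directly as a bar chart by longitude.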
In [25]:
# plot histogram (bar chart of market counts per degree of longitude)
fig, ax = plt.subplots(figsize=(16, 8))
# plt.plot(farmers_market_rez['x'], farmers_market_rez['FMID_counts'])
plt.bar(farmers_market_rez['x'], farmers_market_rez['FMID_counts'], align='center', alpha=0.5)
plt.grid()

plt.xlabel('Longitude')
plt.ylabel('Count')
plt.title('Farmers Market Count by Longitude', fontsize=30)
Out[25]:
Text(0.5, 1.0, 'Farmers Market Count by Longitude')
In [155]:
Image(filename='1200px-Map_of_USA_with_state_names.png', width='90%')
# source: Wikimedia Commons
# https://commons.wikimedia.org/wiki/File:Map_of_USA_with_state_names.svg
Out[155]:

Show Notes

(pardon typos and formatting -
these are the notes I use to make the videos)

Let's talk about food production and food distribution by exploring 2 great and socially impactful data sets. We'll start by looking at the percentage of farming area dedicated to organic agriculture around the world and will follow with the location of farmer's markets around the United States.