GDELT - World Events at Your Finger Tips and for Free with Google's BigQuery!

Introduction

The Global Database of Events, Language and Tone (GDELT) contains some 40 years of news from sources worldwide, begging for your questions, analysis, and discoveries. Learn how to maximize your querying potential.

MORE:
  • Blog or code: http://www.viralml.com/video-content.html?fm=yt&v=yL4JZogjf8U
  • Sign up for my newsletter and more: http://www.viralml.com
  • Connect on Twitter: https://twitter.com/amunategui

My books on Amazon:
  • The Little Book of Fundamental Indicators: Hands-On Market Analysis with Python: Find Your Market Bearings with Python, Jupyter Notebooks, and Freely Available Data: https://amzn.to/2DERG3d
  • Monetizing Machine Learning: Quickly Turn Python ML Ideas into Web Applications on the Serverless Cloud: https://amzn.to/2PV3GCV
  • Grow Your Web Brand, Visibility & Traffic Organically: 5 Years of amunategui.github.io and the Lessons I Learned from Growing My Online Community from the Ground Up: https://amzn.to/2JDEU91
  • Fringe Tactics - Finding Motivation in Unusual Places: Alternative Ways of Coaxing Motivation Using Raw Inspiration, Fear, and In-Your-Face Logic: https://amzn.to/2DYWQas
  • Create Income Streams with Online Classes: Design Classes That Generate Long-Term Revenue: https://amzn.to/2VToEHK
  • Defense Against The Dark Digital Attacks: How to Protect Your Identity and Workflow in 2019: https://amzn.to/2Jw1AYS




Code

In [3]:
from IPython.display import Image
Image(filename='viralml-book.png')
Out[3]:

GDELT - World Events at Your Finger Tips and for Free!

The Global Database of Events, Language and Tone (GDELT) Contains Some 40 Years of News from Sources Worldwide, Begging for Your Questions, Analysis, and Discoveries - Learn How to Maximize Your Querying Potential

This is going to be a quick flyover of GDELT. It’s a phenomenal resource that, I think, not enough data scientists and analysts know about, and those who do often don’t realize how easy it is to work with.

GDELT and Google's BigQuery (BQ) are phenomenal tools that, in my opinion, aren’t used enough; they’re up-there cool, like Google Trends and Trending Searches (https://trends.google.com/trends/trendingsearches/daily?geo=US).

Google's BigQuery provides free access to the GDELT database, along with 1TB of free query processing every month. That is very generous, but keep in mind that 1TB can go fast with GDELT on BQ: queries are billed by the amount of data they scan, and the GDELT tables are large. So, to make it last, I’ll show you how to query only small subsets of the data using _PARTITIONTIME, and a cool Chrome plugin a colleague turned me onto that estimates query costs.

What is GDELT?

"The Global Database of Events, Language and Tone is one of the largest datasets on the planet. It is the quantitative database of human society, relying on thousands of news sources from every corner of the globe dating back to 1979." (See https://www.gdeltproject.org/)

"The GDELT 2.0 Event Database is a global catalog of worldwide activities (“events”) in over 300 categories from protests and military attacks to peace appeals and diplomatic exchanges. Each event record details 58 fields capturing many different attributes of the event. The GDELT 2.0 Event Database currently runs from February 2015 to present, updated every 15 minutes and is comprised of 326 million mentions of 103 million distinct events as of February 19, 2016. This dataset uses machine translation coverage of all monitored content in 65 core languages, with a sample of an additional 35 languages hand translated. It also expands upon GDELT 1.0 by providing a separate MENTIONS table that records every mention of each event, along with the offset, context and confidence of each of those mentions." (See: https://console.cloud.google.com/marketplace/details/the-gdelt-project/gdelt-2-events)

Let's Get Querying!

Here is a very simple query and a simple visualization. Let's pull the latitude and longitude of the 1,000 most recently added news events located in the US, limited to the partitions from two days ago through yesterday:

In [ ]:
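%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
import pandas as pd

# Run this SQL in the BigQuery web console and download the results as CSV
# (the next cell loads that file). It is stored as a string so the cell runs.
query = """
    SELECT
        ActionGeo_Lat,
        ActionGeo_Long
    FROM
        `gdelt-bq.gdeltv2.events_partitioned`
    WHERE
        -- only scan the partitions from two days ago through yesterday
        _PARTITIONTIME >= TIMESTAMP(DATE_SUB(CURRENT_DATE(), INTERVAL 2 DAY))
    AND _PARTITIONTIME <= TIMESTAMP(DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
    AND ActionGeo_CountryCode='US'
    ORDER BY DATEADDED DESC
    LIMIT 1000
"""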
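If you would rather not export a CSV from the console every time, the same query can be built and run from Python. This is a minimal sketch, assuming the google-cloud-bigquery client library (with its pandas support) is installed and a Google Cloud project with credentials is configured. `build_geo_query` and `run_query` are hypothetical helper names; the day window is exposed as parameters for the `_PARTITIONTIME` filter in the query above, and the country code is passed as a query parameter instead of being pasted into the SQL.

```python
def build_geo_query(start_days_ago=2, end_days_ago=1, limit=1000):
    """Return the lat/long query with a narrow _PARTITIONTIME window.

    Hypothetical helper: a small partition window should keep the bytes
    BigQuery scans (and bills against the free tier) small.
    """
    if not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    if start_days_ago < end_days_ago:
        raise ValueError("start_days_ago must be >= end_days_ago")
    return f"""
    SELECT ActionGeo_Lat, ActionGeo_Long
    FROM `gdelt-bq.gdeltv2.events_partitioned`
    WHERE _PARTITIONTIME >= TIMESTAMP(DATE_SUB(CURRENT_DATE(), INTERVAL {int(start_days_ago)} DAY))
      AND _PARTITIONTIME <= TIMESTAMP(DATE_SUB(CURRENT_DATE(), INTERVAL {int(end_days_ago)} DAY))
      AND ActionGeo_CountryCode = @country_code
    ORDER BY DATEADDED DESC
    LIMIT {limit}
    """


def run_query(sql, country_code="US"):
    """Run the query on BigQuery and return a pandas DataFrame (needs credentials)."""
    # Imported here so build_geo_query works even without the client installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("country_code", "STRING", country_code)
        ]
    )
    return client.query(sql, job_config=job_config).to_dataframe()
```

Usage: `news_geo_df = run_query(build_geo_query())` should return the same two columns as the CSV export, ready for the scatter plot.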
In [ ]:
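# load the exported query results and plot them (longitude on x, latitude on y)
# point this path at your own CSV download from the BigQuery console
news_geo_df = pd.read_csv('/Users/manuel/Downloads/results-20181108-172523.csv')
plt.scatter(news_geo_df['ActionGeo_Long'], news_geo_df['ActionGeo_Lat'], s=1)
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.grid()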
In [4]:
Image(filename='chart.png')
Out[4]:
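The query above sorts on DATEADDED but only exports coordinates. If you add DATEADDED to the SELECT, it helps to turn it into real timestamps. As far as I know from the GDELT 2.0 event codebook, it is an integer in YYYYMMDDHHMMSS format in UTC, which lines up with the 15-minute update cycle quoted earlier. A minimal sketch under that assumption; `parse_dateadded` and `update_slot` are hypothetical helper names, and the sample values are made up for illustration.

```python
import pandas as pd


def parse_dateadded(values):
    """Turn GDELT 2.0 DATEADDED integers into UTC timestamps.

    Hypothetical helper; assumes the YYYYMMDDHHMMSS (UTC) format and no missing values.
    """
    as_text = pd.Series(values).astype("int64").astype(str)
    return pd.to_datetime(as_text, format="%Y%m%d%H%M%S", utc=True)


def update_slot(timestamps):
    """Floor each timestamp to the start of its 15-minute update window."""
    return timestamps.dt.floor("15min")


# Example values in the assumed format (not real query output)
stamps = parse_dateadded([20181108171500, 20181108172523])
slots = update_slot(stamps)
```

With DATEADDED in your results, `parse_dateadded(df['DATEADDED'])` should give you timestamps you can resample or plot over time.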

_PARTITIONTIME & BigQuery Mate

Here are two tips to keep querying BQ without depleting your free 1TB too quickly:

  • Always filter the partitioned GDELT tables on _PARTITIONTIME, and limit the time window to only what you need to see
  • Use a Google Chrome plugin like BigQuery Mate to translate your query estimates into dollars - there's no mistaking those!
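BigQuery Mate does the dollar conversion for you, but the arithmetic is simple enough to do yourself. BigQuery can report how many bytes a query would scan without running it (a dry run), and that number, set against the remaining free tier and an on-demand price, gives a rough estimate. A minimal sketch: the price per TiB is an assumption you should check against current BigQuery pricing (it has changed over the years), the free tier is treated as 1 TiB, billing details such as rounding and per-query minimums are ignored, and `query_cost` and `estimate_bytes` are hypothetical helper names.

```python
TIB = 2 ** 40  # bytes in one tebibyte


def query_cost(bytes_processed, price_per_tib=6.25, free_bytes_left=TIB):
    """Return (billable_dollars, share_of_remaining_free_tier_used).

    price_per_tib is an assumed on-demand rate; check current pricing.
    free_bytes_left is how much of the monthly free tier remains (assumed 1 TiB).
    """
    if bytes_processed < 0:
        raise ValueError("bytes_processed must be non-negative")
    billable = max(0, bytes_processed - free_bytes_left)
    dollars = billable / TIB * price_per_tib
    share = bytes_processed / free_bytes_left if free_bytes_left else float("inf")
    return dollars, share


def estimate_bytes(sql):
    """Dry-run a query to get the bytes it would scan, without running it (needs credentials)."""
    from google.cloud import bigquery  # imported lazily; only needed for the dry run

    client = bigquery.Client()
    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    return client.query(sql, job_config=config).total_bytes_processed
```

For example, at an assumed $5 per TiB, a 2 TiB scan with a full 1 TiB of free tier left bills the extra 1 TiB: `query_cost(2 * TIB, price_per_tib=5.0)` returns `(5.0, 2.0)`. Running `estimate_bytes` on a query before and after adding the `_PARTITIONTIME` filter is a quick way to see how much the filter saves.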
