5 Global Data Sources That Every Data Scientist Should Know About!
Here are 5 great global data sources, real-time or regularly curated, that every analyst, data scientist, or programmer (whatever you call yourself in this field) should know about!
I’ve covered each in walkthroughs or classes, so I’ll point them out if you want to get started really quickly.
Number 1 — OpenWeather
Get weather forecasts from around the world and much, much more. It’s free to sign up and use as long as you don’t exceed the threshold of 60 calls per minute, which makes it perfect for an MVP/POC project!
There should be hundreds of potential applications popping up in your head about what you could do with this highly customizable worldwide weather forecasting API, right? Help your community, your neighboring communities, anybody needing weather data — here’s your chance.
I covered this in one of my vlogs. I even show how to pull the weather icons for each forecast.
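To make that concrete, here is a minimal sketch of building a forecast request and the matching icon links. It assumes the 5-day forecast endpoint and the icon URL pattern shown in the constants, and `api_key` is a placeholder for your own key. The helper names are mine, not part of any OpenWeather library.

```python
from urllib.parse import urlencode

# Assumed endpoint and icon pattern; check the OpenWeather docs for your plan.
FORECAST_URL = "https://api.openweathermap.org/data/2.5/forecast"
ICON_URL = "https://openweathermap.org/img/wn/{icon}@2x.png"


def forecast_url(city, api_key, units="metric"):
    """Build the request URL for a city's forecast."""
    return FORECAST_URL + "?" + urlencode({"q": city, "appid": api_key, "units": units})


def icon_url(icon_code):
    """Turn an icon code from the response (e.g. '10d') into an image URL."""
    return ICON_URL.format(icon=icon_code)


def summarize(forecast):
    """Flatten a decoded forecast response into (time, temp, description, icon URL) rows."""
    rows = []
    for entry in forecast.get("list", []):
        weather = entry["weather"][0]
        rows.append(
            (entry["dt_txt"], entry["main"]["temp"], weather["description"], icon_url(weather["icon"]))
        )
    return rows
```

Fetch `forecast_url("London", api_key)` with your HTTP client of choice, decode the JSON, and pass the result to `summarize`.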
Number 2 — GDELT
GDELT, the Global Database of Events, Language, and Tone, is a true mammoth of a data source. It contains over 40 years of news and each year is about 2.5 TB of data. Crazy big!
The entire GDELT database is 100% free and open. You can download raw files or use their GDELT Analysis Service. Another option is Google BigQuery, but be careful and keep an eye on the meter if you go that route.
I did a brief vlog on GDELT and pulled a few events using Google BigQuery. I also show how to run queries using time partitions, which can drastically reduce the amount of data scanned and therefore the cost.
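As a rough sketch of the time-partition idea, the helper below builds a query that only touches the partitions in a date range. I'm assuming the public partitioned events table `gdelt-bq.gdeltv2.events_partitioned` and BigQuery's `_PARTITIONTIME` pseudo-column. The function name and the selected columns are just for illustration, so adjust them to what you actually need.

```python
from datetime import date


def gdelt_events_query(start, end, limit=10):
    """Build a BigQuery SQL string restricted to partitions in [start, end)."""
    if not (isinstance(start, date) and isinstance(end, date)):
        raise TypeError("start and end must be datetime.date objects")
    if end <= start:
        raise ValueError("end must be after start")
    limit = int(limit)
    return (
        "SELECT SQLDATE, Actor1Name, Actor2Name, EventCode, AvgTone, SOURCEURL\n"
        # Assumed table name; the partition filter is what keeps the scan small.
        "FROM `gdelt-bq.gdeltv2.events_partitioned`\n"
        f"WHERE _PARTITIONTIME >= TIMESTAMP('{start.isoformat()}')\n"
        f"  AND _PARTITIONTIME < TIMESTAMP('{end.isoformat()}')\n"
        f"LIMIT {limit}"
    )
```

Paste the result of `gdelt_events_query(date(2020, 1, 1), date(2020, 1, 2))` into the BigQuery console and check the estimated bytes processed before you run it.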
If you are looking to correlate the stock market or other time-series data to the news, this is it.
Number 3 — Global Health Observatory
Global Health Observatory (GHO) data from the World Health Organization (WHO) is a phenomenal global health statistics resource. It contains over 1,000 indicators for 194 countries, covering many years!
C’mon people, this is the perfect data set to build that health-awareness application you always wanted to build — the world needs you!
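If you want a starting point, here is a small sketch for pulling one indicator for one country. It assumes the GHO OData endpoint at `ghoapi.azureedge.net`, the indicator code `WHOSIS_000001` for life expectancy at birth, and response records carrying `TimeDim`, `Dim1`, and `NumericValue` fields. Treat all of those as assumptions to verify against the current WHO documentation, and the helper names as my own.

```python
from urllib.parse import quote

# Assumed base URL for the GHO OData API.
GHO_API = "https://ghoapi.azureedge.net/api"


def gho_indicator_url(indicator, country_iso3=None):
    """Build the URL for an indicator, optionally filtered to one ISO3 country code."""
    url = f"{GHO_API}/{indicator}"
    if country_iso3:
        url += "?$filter=" + quote(f"SpatialDim eq '{country_iso3}'")
    return url


def latest_value(records, sex_suffix="BTSX"):
    """Return (year, value) for the most recent record matching the sex code, or None."""
    matching = [
        r for r in records
        if str(r.get("Dim1", "")).endswith(sex_suffix) and r.get("NumericValue") is not None
    ]
    if not matching:
        return None
    newest = max(matching, key=lambda r: r["TimeDim"])
    return newest["TimeDim"], newest["NumericValue"]
```

Fetch `gho_indicator_url("WHOSIS_000001", "USA")`, decode the JSON, and pass its `value` list to `latest_value`.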
I created a free course using GHO’s worldwide life expectancy data. Simply enter your country, gender, and age, and it will tell you how many more years of life you can expect (on average).
Take the class here; it's free and uses GHO data:
Number 4 — Realtime USGS Earthquake Hazards Data
The United States Geological Survey (USGS) Earthquake Hazards Program offers an incredible real-time data source reporting earthquakes from around the world along with their magnitudes.
The data is free to use and very intuitive. Check out their earthquakes map:
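Here is a short sketch of what working with that data can look like. It assumes the GeoJSON summary feed at the URL in `FEED_URL`, where each feature has `mag`, `place`, and `time` (milliseconds since the epoch) under `properties`, and `[longitude, latitude, depth]` under `geometry.coordinates`. The function name and the 4.5 cutoff are arbitrary choices of mine.

```python
from datetime import datetime, timezone

# Assumed feed: all earthquakes from the past day, in GeoJSON.
FEED_URL = "https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.geojson"


def significant_quakes(feed, min_magnitude=4.5):
    """Return quakes at or above min_magnitude, strongest first."""
    quakes = []
    for feature in feed.get("features", []):
        props = feature["properties"]
        mag = props.get("mag")
        if mag is None or mag < min_magnitude:
            continue  # skip events with no magnitude or below the cutoff
        lon, lat, depth_km = feature["geometry"]["coordinates"]
        quakes.append({
            "magnitude": mag,
            "place": props.get("place"),
            # The feed reports time in milliseconds since the Unix epoch.
            "time": datetime.fromtimestamp(props["time"] / 1000, tz=timezone.utc),
            "latitude": lat,
            "longitude": lon,
            "depth_km": depth_km,
        })
    return sorted(quakes, key=lambda q: q["magnitude"], reverse=True)
```

Download `FEED_URL`, decode the JSON, and hand the result to `significant_quakes`. The latitude and longitude fields are ready to drop onto a map.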
If you’re interested in applying it to a web-based data science project, check out the Applied Machine Learning Track at the ViralML School where we use the USGS data to forecast earthquakes and plot the results on a Google Map — really cool!
You can find it at the ViralML School:
Number 5 — IEX Cloud Market Data
And last but certainly not least: the IEX Cloud service and its free stock market data.
It is free to use as long as you keep it under their messaging threshold of 50k messages per month.
I did a brief introduction to this service in one of my live webinars:
And I also use it in the free class “Learn How to Create and Sell Your Machine Learning Products Online and For Free”.
Conclusion — No More Excuses
If you’re passionate about data science and want to make a difference and get noticed, these are for you. Let me know what you come up with!