What They Didn’t Teach You at Data Science School and How to 10x Your Career
Originally published at ViralML.com

Art: Lucas Amunategui

Even though I had a stellar CV, I was turned down for the position. I had what I thought at the time was plenty of machine learning and modeling experience. I had applied for a position in Holland: I was ready for something new and wanted to show my family the world.

Become a Full-Pipeline Data Scientist by Being Both a Data Scientist and a Full Stack Developer Squeezed into One

Unfortunately, they told me they wanted a data scientist with more commercial and “full pipeline” delivery experience: “please, let the next candidate in on your way out”. This was a few years back, when things were less competitive. That kind of experience matters that much more today.

Full Pipeline Data Scientist

And they were partly right. I had built internal models for a hospital that may not have qualified as “commercial”, but I had certainly built my share of pipelines. The problem is that most data scientists today have fewer applied skills than I had back then. Our educational system cranks them out that way. Some don’t even know what pipeline experience means, and those who do may not know how to implement one.

Full-pipeline experience is synonymous with being a data scientist and full-stack developer squeezed into one. Some will argue that these positions are very different and would be better accomplished by different team members. But in most cases, on smaller teams, in fast startups, and more importantly for intuitive data science solutions, a data scientist should do it all, or at the very least, design it all and have others implement it. And, in the era of “A.I., the human job pillager”, the more useful you are, the longer you’ll survive.

It Ain’t Real Until it Reaches your Customer’s Plate

A data scientist has to understand not only the data, the model, and how to explain the output, but also how that output is going to be consumed by end-users. Whether it is medical staff needing life-saving prognostics or a customer asking for clothing recommendations, today’s data scientists need to understand how their output will be digested. This is critical: some users will be lost and won’t have a clue what to do with a percentage like “60% positive loan recipient”. You will need to work with business experts and translate any tunable parameter into a language that makes sense to those using it. And no, an end-user will never need an AUC score…
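To make the “60% positive loan recipient” point concrete, here is a minimal sketch of turning a raw model probability into wording an end-user can act on. Everything in it is an assumption for illustration: the `Band` class, the `LOAN_BANDS` cut-offs, and the labels are hypothetical, and in a real project the thresholds and the wording would come out of conversations with the business experts, not from the modeler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Band:
    """One plain-language band agreed on with business experts (hypothetical)."""
    lower: float  # inclusive lower bound on the model's probability
    label: str    # wording the end-user actually sees


# Hypothetical cut-offs and labels, ordered from highest to lowest bound.
LOAN_BANDS = (
    Band(0.80, "Approve"),
    Band(0.50, "Refer to a loan officer for manual review"),
    Band(0.00, "Decline"),
)


def explain_score(probability: float, bands=LOAN_BANDS) -> str:
    """Translate a raw model probability into the agreed end-user wording."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    # Return the label of the first band whose lower bound is met.
    for band in bands:
        if probability >= band.lower:
            return band.label
    raise ValueError("bands must include a lower bound of 0.0")


# The "60%" from the text becomes an instruction instead of a number.
print(explain_score(0.60))
```

With these made-up bands, the end-user never sees “60%”; they see what to do next. The cut-offs are the tunable parameters, and they stay in one place where a business expert can review them.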

Being a data scientist is a wonderful profession, but there is a troubling gap in the teaching material for those trying to become one. Data science isn’t about statistics and modeling; it is about fulfilling human needs and solving real problems. Not enough material tackles the big picture. That’s what is missing from this profession’s educational syllabus. If you build first and only then talk to your customer, your pipeline will be flawed, and your solution will miss its target.

Some Ideas to Get You Started

Here are two applied data science pipelines, starting from an idea and leading to actionable insights:

Today’s Models are Complex

Gone are the days of the single model and a prediction in a spreadsheet. Today, a solution can be a choreography of multiple models working asynchronously or synchronously, with one model’s prediction feeding into another. A customer dashboard may contain one or many outputs, may have complex tunable parameters, and may have visuals reaching into the internals of your models. You need to be involved until the end. It is your responsibility, and you need to understand the “full pipeline”.
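As a rough illustration of that choreography, here is a small sketch using only Python’s standard `asyncio` module. The three “models” are hypothetical stand-ins with made-up rules and field names, not real trained models: two independent upstream models run concurrently, and a downstream step consumes both predictions once they are ready.

```python
import asyncio


async def repayment_model(applicant: dict) -> float:
    """Stand-in for a real model: made-up probability that the loan is repaid."""
    await asyncio.sleep(0)  # placeholder for a real model call or web request
    return 0.9 if applicant["years_employed"] >= 2 else 0.6


async def fraud_model(applicant: dict) -> float:
    """Stand-in for a second, independent model: made-up fraud probability."""
    await asyncio.sleep(0)
    return 0.8 if applicant["address_changes"] > 3 else 0.1


def decision_model(repayment: float, fraud: float) -> str:
    """Downstream step that takes the two upstream predictions as its inputs."""
    if fraud >= 0.5:
        return "hold for fraud review"
    return "approve" if repayment >= 0.8 else "manual review"


async def run_pipeline(applicant: dict) -> dict:
    # The upstream models do not depend on each other, so run them concurrently.
    repayment, fraud = await asyncio.gather(
        repayment_model(applicant), fraud_model(applicant)
    )
    # The downstream step runs synchronously once both predictions are in.
    return {
        "repayment": repayment,
        "fraud": fraud,
        "decision": decision_model(repayment, fraud),
    }


result = asyncio.run(run_pipeline({"years_employed": 1, "address_changes": 0}))
print(result)
```

Even in a toy like this, the dashboard question shows up right away: the end-user probably wants the `decision` field, while the two probabilities are internals you may or may not choose to surface.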

Follow my latest projects at ViralML.com and amunategui.github.io, and be sure to sign up for my newsletter!
