Originally published at ViralML.com
Even though I had a stellar CV, I was turned down for the position. I had what I thought at the time was plenty of machine learning and modeling experience. I had applied for a position in Holland because I was ready for something new and wanted to show my family the world.
Become a Full-Pipeline Data Scientist by Being Both a Data Scientist and a Full Stack Developer Squeezed into One
Unfortunately, they told me they wanted a data scientist with more commercial and “full pipeline” delivery experience: “please, let the next candidate in on your way out”. This was a few years back, when things were less competitive; that kind of experience is that much more important today.
Full Pipeline Data Scientist
And they were partly right. I built internal models for a hospital that may not have qualified as “commercial”, but I certainly built my share of pipelines. The problem is that most data scientists today have fewer applied skills than I had back then. Our educational system cranks them out that way. Some don’t even know what pipeline experience means, and those who do may not know how to implement a pipeline.
Full-pipeline experience is synonymous with being a data scientist and full-stack developer squeezed into one. Some will argue that these positions are very different and would be better accomplished by different team members. But in most cases, on smaller teams, in fast startups, and more importantly for intuitive data science solutions, a data scientist should do it all, or at the very least, design it all and have others implement it. And, in the era of “A.I., the human job pillager”, the more useful you are, the longer you’ll survive.
It Ain’t Real Until it Reaches your Customer’s Plate
A data scientist must understand not only the data, the model, and how to explain the output, but also how that output is going to be consumed by end-users. Whether it is medical staff needing life-saving prognostics or a customer asking for clothing recommendations, today’s data scientists need to understand how their output will be digested. This is critical: some users will be lost and have no clue what to do with a percentage. What does “60% positive loan recipient” mean to them? You will need to work with business experts and translate any tunable parameter into a language that makes sense to those using it. And no, an end-user will never need an AUC score…
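To make the idea of translating a raw percentage concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the names `DecisionBands` and `describe_score`, the cutoff values, and the wording of the labels are hypothetical. In a real project the business experts would set the cutoffs and the language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecisionBands:
    """Hypothetical tunable cutoffs, expressed as probabilities between 0 and 1."""
    approve_at: float = 0.75  # assumed value, to be agreed with business experts
    review_at: float = 0.50   # assumed value, to be agreed with business experts

    def __post_init__(self) -> None:
        # The bands only make sense if they are ordered and inside [0, 1].
        if not 0.0 <= self.review_at <= self.approve_at <= 1.0:
            raise ValueError("expected 0 <= review_at <= approve_at <= 1")


def describe_score(probability: float, bands: DecisionBands = DecisionBands()) -> str:
    """Turn a model probability into a sentence an end-user can act on."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability >= bands.approve_at:
        return "Likely to repay: approve"
    if probability >= bands.review_at:
        return "Uncertain: send to manual review"
    return "Unlikely to repay: decline"


if __name__ == "__main__":
    # The "60% positive loan recipient" case from the text.
    print(describe_score(0.60))
```

The end-user never sees the 0.60. They see an instruction, and the two cutoffs are the tunable parameters a business owner can move without touching the model.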
Being a data scientist is a wonderful profession, but there is a troubling gap in the teaching material for those trying to become one. Data science isn’t about statistics and modeling; it is about fulfilling human needs and solving real problems. Not enough material tackles the big picture. That’s what is missing in this profession’s educational syllabus. If you build first and only then talk to your customer, your pipeline will be flawed, and your solution will miss its target.
Some Ideas to Get You Started
Here are two applied data science pipelines, starting from an idea and leading to actionable insights:
- Let’s Talk Applied Data Science — Student Retention Modeling — Time to Step Up Your Predictive Game!
- Modeling for Actionable Insights with XGBoost - What Can You Do about Your Predictions?
Today’s Models are Complex
Gone are the days of the single-model, prediction-in-a-spreadsheet solution. Today, a solution can be a choreography of multiple models working asynchronously or synchronously, with one model’s prediction feeding into another. A customer dashboard may contain one or many outputs, may have complex tunable parameters, and may have visuals reaching into the internals of your models. You need to be involved until the end; it is your responsibility. You need to understand the “full pipeline”.
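As a rough illustration of that choreography, here is a small Python sketch in which two first-stage models run concurrently and a second-stage step consumes both predictions. The “models” are hypothetical stand-in functions with made-up formulas, not trained models, and all names (`churn_risk`, `lifetime_value`, `retention_priority`, `score_customer`) are assumptions for this example.

```python
import asyncio


async def churn_risk(customer: dict) -> float:
    """Stand-in for a first-stage model: more support tickets, more risk."""
    return min(1.0, customer["support_tickets"] / 10)


async def lifetime_value(customer: dict) -> float:
    """Stand-in for another first-stage model: one year of spend."""
    return customer["monthly_spend"] * 12


def retention_priority(risk: float, value: float) -> float:
    """Second-stage step that consumes both first-stage predictions."""
    return risk * value


async def score_customer(customer: dict) -> dict:
    # The two first-stage models do not depend on each other,
    # so they can be awaited concurrently.
    risk, value = await asyncio.gather(
        churn_risk(customer), lifetime_value(customer)
    )
    # The second stage runs only once both predictions are available.
    return {
        "churn_risk": risk,
        "lifetime_value": value,
        "retention_priority": retention_priority(risk, value),
    }


if __name__ == "__main__":
    example = {"support_tickets": 5, "monthly_spend": 50.0}
    print(asyncio.run(score_customer(example)))
```

A dashboard could show all three numbers, which is exactly why the person who designed the models should stay involved in how they are wired together and presented.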