Five reasons why you can’t just “get another data scientist onto it”

Have you ever been in meetings or discussions where there’s an analytical problem that’s proving to be challenging? Like, you can’t find a model that fits, or you’re having problems with the data formats, or the leaders are not happy with the timelines?

I’ve been in those situations myself. It’s normal. Analytics is complex, and things don’t always go as you had hoped and planned.

But it drives me mad when people think the solution to these problems is “let’s just get a(nother) data scientist onto it”. There is so much wrong with that way of thinking. It reveals a fundamental misunderstanding of what data science is, and betrays a lack of regard for the skills needed to practice data science well. Here are a few reasons why you can’t just “get another data scientist onto it”.

1. Data scientists are not an easy resource to find

Despite the explosion in the field, and the various education programs now being offered, good, competent data scientists are not easy to come by. The market is highly competitive, and those who are most fluent in their craft are in high demand. If you are lucky enough to have strong data scientists in your organization, chances are that their plates are already very full.

“Getting another data scientist onto it” can often mean pulling in people who are not really qualified to call themselves data scientists, just for the sake of a quick fix: someone who did an online Python course a few months back and can pronounce the words Machine Learning, for example. There are lots of those folk out there, but involving them can end up delaying the effort rather than helping it. And if the intention was to add capacity for the existing data scientists, it could be counterproductive, because they will have to spend a lot of time hand-holding someone with very little knowledge.

2. Data scientists are not jacks-of-all-trades

Data science is an exceptionally broad church. Sometimes you say you want a data scientist when it’s actually a data engineer that you need. Sometimes you need an NLP expert, or someone who can do GLMs. Sometimes what you need is more akin to a front end developer to build apps. There is no data scientist that can do everything well.

“Getting another data scientist onto it” is not prescriptive enough, and can result in HR bringing on board temporary or permanent employees whose backgrounds and skill sets are not a match for the problems being addressed.

3. Data scientists often need domain expertise

Similar to point 2, a data scientist is rarely effective without domain knowledge to support their technical skills. Data scientists who analyze financial risk use an entirely different set of tools and approaches from those who analyze talent and people, or clinical trial results.

Strong data scientists are usually backed by experience and expertise in their specific domain. They know the typical data issues and the statistical phenomena that frequently occur there, so they can deal with them more fluently and quickly.

“Getting another data scientist onto it” risks putting people under pressure to assign someone without domain expertise for the sake of filling the needed capacity. This can be devastating to the quality of the work and the potential solution.

4. Data scientists need to integrate into the toolchain

So much good, efficient analytics is enabled by a strong toolchain. The languages, version control, agile tracking, and publishing and sharing norms are all part of a smooth data science operation. Many data scientists arrive on the scene unfamiliar with the specific toolchain in use, or inexperienced with any toolchain at all.

“Getting another data scientist onto it” can put the existing workflow at risk or cause delays, because a new person with no experience of the tools being used is thrown into the effort. Combined with some of the issues above, this can be exceptionally risky.

5. The data scientist is not the only one who needs to engage with the problem

Solving organizational or business problems using analytics requires more than just data scientists. It requires engagement from experienced individuals on the business side who can help make judgments about what to model and what not to model, and about what kinds of results would be most useful. It needs people who can help the data scientist by prioritizing which parts of the work are critical and which can be dropped if the data is poor or difficult to obtain.

“Getting another data scientist onto it” can often disguise a deeper problem: that there is not enough engagement from the business side, and that the existing analytics professionals are not receiving sufficient guidance on priorities and outputs. Another data scientist won’t help with this.

Data scientists are not a universally available magic cure for all problems. Throwing more of them at a problem can often make the situation worse, not better. So next time you hear someone say “let’s just get a(nother) data scientist onto it”, don’t be afraid to pull them up on it!

Image from Despicable Me, Illumination Entertainment/Universal Pictures
