Audio by Gail La Grouw.
I had a call from a business colleague the other day who reminded me of a conversation we had some years back, when analytics first burst onto the scene and the big dread was: where are we going to get all the data scientists from? At the time I casually suggested that I didn’t think it would be anywhere near the problem people were predicting. I felt it was more likely that smart analytics vendors would engage top data scientists to build complete models into their programs, which users would then select via a guided set of questions.
I wasn’t convinced that the broad set of skills being bundled up as a ‘data scientist’ would ever be found in a single person anyway.
In the recent call he asked if I still felt the same. On the whole, my answer was yes; however, now that we have the benefit of hindsight, we can see how much was hype and how much was real. So where are we today? The real question now comes from a different perspective: ‘exactly what analytics resources does a business need?’ That starts with working out what will be done by machine, and what is needed from man.
Getting Past the Hype
Looking back from 2018, I feel comfortable saying that I was pretty well on the mark: analytics software vendors have built guided analytics into their programs, though not to the extent I had hoped, and with very good reason. Every business is unique in the exact questions it wants answered by analytics, so it naturally makes sense for vendors to focus on the most common analytics queries, largely those related to customer analytics, scoring (such as lead scoring or credit scoring) and fraud detection. These are all fairly standard models used by the vast majority of analytics users. So I was only half right in that prediction; however, I am hopeful that the options will continue to improve.
So, getting back to the core question: just what resources are really needed?
It’s easy to overcomplicate things when you don’t fully understand the work and effort involved, and that holds true for data science. Most of us do not possess the real knowledge needed to judge what makes a good data scientist.
The good news is that for most businesses you don’t need a data scientist. You do, however, need a data science team. But more on that in a moment. I guess a lot depends on exactly how we define a data scientist.
Calling Superman Statisticians!
Recruitment advertisements for data scientists are, for the most part, rather amusing. The set of skills, experience and personal attributes they require conjures up a super-intellect who has fantastic people skills and loves presenting… Hmm. Doesn’t sound like anyone I could name. Generally speaking (and this also applies to many technology-related roles), the very talents that make a data scientist or IT specialist so good at their jobs do not correlate with those of a super salesperson, and vice versa. You are looking at two diametrically opposed personality sets. Yet that seems to be the mix most potential employers of data scientists seek. It’s just not realistic.
However, what is realistic is building a small team that collectively possesses this set of attributes. All managers know that they get more value from people working together as a team than from the same people working as individuals.
What is a Data Scientist Anyway?
Ask anyone with the title ‘data scientist’ and they will tell you that the role is a crossover of many different disciplines, and that not everyone dubbed a data scientist possesses the same ‘prescribed’ set of skills. Capability varies widely, from understanding data through to using programming to apply statistics and maths to analytical and AI tasks such as machine learning.
It Takes a Team
An ideal data science team consists of:
- Research lead – to define the business problem, and ask the probing questions needed to ensure the model develops within the right context
- One to three data analysts – who can convert business speak into mathematical functions, obtain and scrub data, then present results in an actionable business context
- Project manager – to manage the demands on the analysts and clear the operational and political hurdles to gaining access to data
And that’s it!
A cross-functional data science team like the one suggested above doesn’t require every individual to be an expert: each person contributes their strengths to cover the others’ weaknesses. Data scientists typically come from an engineering, maths or statistics background, so they are likely to share a similar way of thinking and approach to their work. A team made up purely of data scientists would struggle to deliver the same value, and would be prone to groupthink, leading to blind spots.
To get the most from a data science team, it’s crucial to understand how to build and manage one. Keeping the team small is the key: large enough to gain value from differing perspectives, but not so large that it succumbs to paralysis by analysis.
Where To for Hard Core Data Scientists?
The heavy prescriptive analytical models that demand deeper data science knowledge are typically only needed by very large organisations, because they are the only ones with sufficient volumes of data. The exception, of course, is a smaller organisation with an atypically high volume of transactional data. And why does volume matter? Because it takes large volumes of data to train a model to the point where it is statistically valid enough to rely on when making critical business decisions.
The Gold Is in the Questions
A good data scientist relies on good questions. They need the business perspective of the research lead to provide the initial question, and then to probe further. It often takes five levels of questioning to reach the real value. It is the research lead’s role to take the analytical outcome of each question and ask ‘why?’ Once all the ‘whys’ are exhausted, you get to the real gold: the ‘what ifs’.
Fall in Love with Conflict
Throughout this process you can expect a fair level of conflict, and that’s exactly what you want and need. An absence of conflict only suggests there is too much groupthink going on. Members of the team need to respect the contributions of others and see conflict not as a personal attack, but as a critical part of the analytical process.
The Sense Comes from Insight
Analysis is not just about numbers and algorithms; it also requires a lot of sensemaking. It needs a perspective broader than any single analyst’s to generate the diversity needed to catalyse conflict and get to the gold. Only by accepting that true analysis is the sum of analytics and sensemaking can decision makers gain the benefit of both data and insight. A data analyst may have the skill to extract information from data, but insights are much tougher to get. And insight only comes from asking more, and tougher, questions.
Conclusion
Data science is still an emerging field, and as more capability and more data are made accessible, we can expect the use of analytics to continue to evolve. Whilst I am still in awe of the statistical genius that I don’t possess, as insight seekers we need to maintain a balance between the mechanics of analytics and the value of human insight.
Whilst many aspects of analytics will eventually be fully automated by machine, the critical insight needed for sensemaking will continue to be a human task.