What skills and tools do you need to be a data scientist?
Article published: February 4, 2021
Depending on where you look in the media, data science promises to be either the answer to all our problems or the cause of our downfall. Companies have taken notice, responding to declarations like “Data is the New Oil” or that data scientist is “The Sexiest Job of the 21st Century”. We’re also witnessing a proliferation of data science programs from academic institutions, as well as a host of alternative education avenues like data science MOOCs and bootcamps, to meet the demand for talent.
Despite all the media attention and educational programs, there is still a lot of confusion around what data science actually is. What are the skills and technologies needed for a data scientist? What can data scientists accomplish at varying levels of seniority?
We break through the noise and tackle these questions in this article.
Entry-level data scientist
While many managers believe they need that new cutting-edge ML technique, oftentimes the basics are enough to provide real value. Companies need people who can extract data from disparate systems, manipulate that data, and then present it in a way that is easy for others to understand. These are the core skills of entry-level data scientists.
Specifically, entry-level data scientists should be experts in:
- Data management: Standard data management tools like Excel and SQL are a must. Entry-level data scientists should be able to join disparate pieces of information together, summarize them, and utilize the results to uncover novel insights. The majority of a data scientist’s time will be spent in data management so this is by far the most critical skill.
- Communication: BI tools like Power BI and Tableau are useful for sharing and communicating the results of an analysis, but Excel and PowerPoint alone can also do the trick. Either way, the entry-level data scientist must have the soft communication skills to translate results into actionable recommendations and insights.
- Analysis: Working knowledge of statistical concepts like hypothesis testing, statistical significance, and linear/logistic regression is key for success at the entry level. In addition, skills in an analytics language like Python or R are important for quickly deploying and iterating on these analysis techniques.
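To make these core skills concrete, here is a minimal sketch in Python using only the standard library. It joins two tables with SQL, summarizes the result, and then runs a simple hypothesis test on the difference between two groups. The tables, column names, campaign labels, and order amounts are all made up for illustration, and the exact permutation test is just one reasonable choice of test, not a recommendation for every analysis.

```python
import itertools
import sqlite3
import statistics

# Hypothetical data: customers assigned to campaign A or B, plus their orders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, campaign TEXT);
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1,'A'),(2,'A'),(3,'A'),(4,'B'),(5,'B'),(6,'B');
    INSERT INTO orders VALUES (1,20.0),(2,25.0),(3,30.0),(4,40.0),(5,45.0),(6,50.0);
""")

# Data management: join the two tables and summarize orders per campaign.
rows = conn.execute("""
    SELECT c.campaign, COUNT(*) AS n_orders, AVG(o.amount) AS avg_amount
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.campaign
    ORDER BY c.campaign
""").fetchall()
summary = {campaign: (n_orders, avg_amount) for campaign, n_orders, avg_amount in rows}

# Collect the raw order amounts per campaign for the statistical test.
amounts = {}
for campaign, amount in conn.execute("""
    SELECT c.campaign, o.amount
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
"""):
    amounts.setdefault(campaign, []).append(amount)


def permutation_p_value(a, b):
    """Two-sided exact permutation test on the difference in means.

    Enumerates every way of splitting the pooled values into groups of the
    original sizes and returns the share of splits whose absolute mean
    difference is at least as large as the observed one.
    """
    pooled = a + b
    observed = abs(statistics.mean(a) - statistics.mean(b))
    indices = range(len(pooled))
    extreme = 0
    total = 0
    for chosen in itertools.combinations(indices, len(a)):
        chosen_set = set(chosen)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in indices if i not in chosen_set]
        diff = abs(statistics.mean(group_a) - statistics.mean(group_b))
        if diff >= observed - 1e-9:  # small tolerance for float rounding
            extreme += 1
        total += 1
    return extreme / total


p_value = permutation_p_value(amounts["A"], amounts["B"])

# Analysis and communication: report the summary and the test result.
for campaign, (n_orders, avg_amount) in summary.items():
    print(f"Campaign {campaign}: {n_orders} orders, average {avg_amount:.2f}")
print(f"Permutation test p-value: {p_value:.2f}")
```

With only three orders per campaign, even a large gap in averages does not reach the usual 0.05 threshold here, which is the kind of caveat an entry-level data scientist should be able to explain to a non-technical audience.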
Mid-level data scientist
Once you move past the basics of data management and statistical modeling, the opportunities to leverage more advanced data science techniques start to surface. Mid-level data scientists can provide a huge benefit for companies that are interested in unlocking the hidden relationships within their data to create additional value for their teams or their customers. The algorithms these data scientists use are more than sufficient to build reliable, high-quality data science products.
A mid-level data scientist should be well versed in:
- Data management: More complex models often require more data, so mid-level data scientists need programmatic techniques for handling big data. Skills in areas like distributed computing, Hadoop, Spark, and Scala become important at this stage. Also, because mid-level data scientists will often be extracting data from external systems, they should be knowledgeable in working with APIs.
- Machine Learning: Mid-level data scientists should be capable of independently creating a robust AI/ML solution. They should have knowledge of several unsupervised and non-linear supervised machine learning techniques. Companies will want to look for individuals who have experience with some of the popular machine learning and modeling tools like scikit-learn, statsmodels, and TensorFlow (Python), or lme4 and its lmer function (R).
- Teamwork in Development/Operations: Mid-level data scientists will help teams actually implement and deploy the models that they create. Therefore, they should have experience deploying code to production systems. They will also need a working knowledge of some of the tools used for software development, such as GitHub.
Senior data scientist
At the senior data scientist level, the fun and, unfortunately in some instances, the headaches begin. Senior-level data scientists will implement cutting-edge machine learning models and practices into your company’s day-to-day operations. As at the more junior levels, the work is not limited to model building. The senior-level data scientist is a highly independent contributor who is capable in all aspects of creating and deploying advanced AI products to end customers.
Senior-level data scientists are expected to deeply understand the following:
- Data management: Senior data scientists will need to know how to work with both structured and unstructured data. They also need to have experience working with tools for large data sets, such as Hadoop and Spark. Experience working with other kinds of data stores, such as NoSQL databases, data lakes, and cloud data platforms like Snowflake, will also be crucial.
- Advanced Machine Learning: Knowledge of more advanced topics like deep learning or reinforcement learning separates senior data scientists from their more junior counterparts. Senior-level data scientists should be well versed in tools like TensorFlow and PyTorch, among others. They should understand how to use these tools and know the limitations of each.
- Systems architecture: Identifying data sources, extracting data, creating data pipelines, building predictive models, and deploying those models to production are all tasks that the senior-level data scientist may need to do. Therefore, knowledge of systems architecture, especially cloud-based architecture like Amazon Web Services or Google Cloud Platform, will be important at this level.
- Leadership: At the senior level, data scientists should be capable managers and leaders in your organization. The senior-level data scientist will be managing and developing junior talent and may need to manage other stakeholders, whether sales or IT teams, to ensure projects have the support they need to be successful. Leadership experience and capability will be a critical skill to look out for in these candidates.
At all levels, though, probably the most important quality is a hunger to learn. Data science is a field that continues to evolve, and as new methods and tools are created, it’s important to stay on top of these trends and adapt techniques as needed.
Check back soon for our follow-up article discussing which level of data scientist is right for your business’s needs.
Thanks for reading!