What is a data scientist? The question seems simple, yet it is deceptively complex. A data scientist obviously works with data, but in what capacity? Is a data scientist more concerned with the collection of data, or data analysis? And what is the end goal for a data scientist? With data available on everything from digital advertisements to online purchases to personal health records, the breadth of fields requiring data scientists makes the definition even more elusive.
Last Tuesday, an organization called the New York Data Science Study group coordinated a data science career panel talk titled “What Does a Data Scientist Do?” The panel discussion was hosted in the offices of General Assembly, a company that offers a range of computer programming and coding classes. The talk featured five panelists from the data science community: Brad Willard, a Machine Learning Engineer at Squarespace; Steven Wood, a Technical Recruiter at Squarespace; Evan Estola, Meetup’s Lead Machine Learning Engineer; Kristian Kaufman, the Data Lead at Vimeo; and Samer Zaben, a Senior Technical Recruiter at Vimeo. Eric Xu, a Data Scientist at Outbrain, moderated the discussion. The premise of the talk was to bring together a group of people working in data science and ask them about their work, as a way of trying to answer the question: what is a data scientist?
The panel talk started off with each panel member discussing the methods and tools they use on a daily basis. There were variations among the panelists, but most of them talked about using the standard data science tools: neural networks, machine learning, and natural language processing.
But the real variations started to become apparent when the panelists began discussing what they use data science for. Everything from business intelligence, to product development, to customer insight, to marketing placement was cited as fertile ground for tomorrow’s data scientists.
The conversation then shifted to what each of the panel members does on a daily basis, which was possibly the most surprising topic of the evening. Most of the panelists talked about working in management positions, and said that because of where they are in their careers, they actually do not spend a significant portion of their days coding. While coding skills are certainly important, the time spent on them seemed to be dwarfed by strategy meetings and discussions about what to actually do with the data at hand.
At this point in the panel discussion, the question of “what does a data scientist do?” seemed more elusive than ever. But as the talk progressed, a clearer theme emerged: being a data scientist is defined not by the specific tools you use, but by a specific worldview. The moderator, Eric Xu, asked the panelists what they have done in the past when encountering managers who are reluctant to look at what a data set might be telling them. Evan Estola, from Meetup, responded: “A manager who absolutely refuses to consider the available data should not be employed in the modern business world.” He qualified the statement by saying that not every single decision should be made on the basis of data alone, but that data is an indispensable tool for making decisions.
By the end of the discussion, it became clear that being a data scientist is not about using a specific method, or knowing a certain programming language, or using data science for a specific end goal. While those skills are certainly important, what differentiates data scientists is the belief that data can, and should, be used to gain insight into ourselves, and into our world, and that human decisions are usually most effective when informed by, and supported by, data.