There are plenty of definitions of Data Science today. Some would say that Data Science is about extracting information from data; others, that it is about building tools with data to solve problems, uncover patterns and form predictions. Another common view is that Data Science is the intersection of three major fields: computer science, statistics and domain expertise. All of the above are correct, but for me Data Science is simply about giving people opportunities to learn better through data!

But let’s dive a bit deeper to understand what data science is and what a typical day may look like. It is certainly an extension of data analysis, a more scientific treatment of data. Data analysts and data scientists are further differentiated by the roles they perform. Data analysts typically handle data migration and visualization, with a focus on describing the past, whereas data scientists typically manipulate data and build models to predict, and so improve, the future.

What does a typical day of a Data Scientist look like? What kind of activities can a Data Scientist do?

A typical day may include activities such as statistical modeling and mathematics in R: finding correlations between variables, or simply computing basic statistics about the data. It could also be data mining and data cleansing, which is really important. Storytelling, that is, explaining the findings from the data analysis using visuals, is another interesting task and just as important as the others. Last but not least is the domain research! A data scientist can spend a lot of time finding the right people and asking them the right questions in order to understand the data at hand.
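To make the statistics part a little more concrete, here is a minimal sketch in Python (the post names R, SPSS and Python as options; Python is used here purely for illustration). The `ad_spend` and `sales` figures are made up, and `pearson` and `describe` are hypothetical helper names, not functions from any library.

```python
from math import sqrt
from statistics import mean, median, stdev


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    mx, my = mean(xs), mean(ys)
    # Sum of cross-deviations and sums of squared deviations from the means.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


def describe(xs):
    """Basic descriptive statistics for one numeric variable."""
    return {"n": len(xs), "mean": mean(xs), "median": median(xs), "stdev": stdev(xs)}


# Made-up illustration data (assumption): weekly ad spend and weekly sales.
ad_spend = [10, 20, 30, 40, 50]
sales = [12, 25, 31, 48, 55]

print(describe(sales))
print(round(pearson(ad_spend, sales), 3))
```

In day-to-day work a library such as pandas or R's built-in functions would normally do this; spelling it out by hand only shows what "finding correlations" boils down to.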

To recap it basically comes down to this:

- Stat modeling & math (SPSS, R, Python,…)
- Data mining (SQL, NoSQL, Hadoop/Hive,…)
- Domain research (Connect dots, ask useful questions)
- Story telling (Interpreting, explaining, visualizing)

Every Data scientist should know how to take data, clean it, filter it, mine it, visualize it and then validate it. It’s a long process.

From SQL to feature engineering

I couldn’t say with certainty that there is a typical day. Every day is different. It depends on the project and the team you are working on. One day, you may be writing custom queries to answer complex business questions, another day addressing data quality issues and building statistical models that make decisions based on data.
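As an illustration of the "custom query" kind of day, here is a small sketch using Python's built-in `sqlite3` module with an in-memory database. The `orders` table, its columns, the sample rows and the helper names `demo_connection` and `top_customers` are all assumptions made up for this example; a real project would query its own warehouse.

```python
import sqlite3


def demo_connection():
    """Build an in-memory database with a hypothetical orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("alice", 120.5), ("bob", 40.0), ("alice", 80.0), ("carol", 150.0)],
    )
    return conn


def top_customers(conn, limit=3):
    """Business question: which customers bring in the most revenue?"""
    query = """
        SELECT customer, COUNT(*) AS n_orders, SUM(amount) AS revenue
        FROM orders
        GROUP BY customer
        ORDER BY revenue DESC
        LIMIT ?
    """
    return conn.execute(query, (limit,)).fetchall()


conn = demo_connection()
for customer, n_orders, revenue in top_customers(conn, limit=2):
    print(customer, n_orders, revenue)
```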

But in general, a great deal of the time is spent cleaning the data and getting it ready for analysis. So it’s all about dealing with data. This is part of what we call feature engineering: we transform raw data so that it can provide value and is good enough to be trusted. The most important thing with data is to make sure that you can trust anything that comes out of it. That’s the biggest challenge!
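A rough sketch of what this preparation step can look like, in plain Python: messy rows are cleaned first, and new features are derived afterwards. The records in `RAW_ORDERS`, the field names and the threshold of 100 for a "large" order are invented for illustration, and `clean`, `add_features` and `prepare` are hypothetical helpers.

```python
from datetime import date

# Made-up raw records (assumption): inconsistent names, missing or invalid amounts.
RAW_ORDERS = [
    {"customer": " Alice ", "amount": "120.50", "ordered": "2023-01-05", "shipped": "2023-01-08"},
    {"customer": "BOB", "amount": "", "ordered": "2023-01-06", "shipped": "2023-01-07"},
    {"customer": "alice", "amount": "80", "ordered": "2023-02-01", "shipped": "2023-02-03"},
    {"customer": "carol", "amount": "n/a", "ordered": "2023-02-10", "shipped": "2023-02-15"},
]


def clean(row):
    """Return a cleaned copy of the row, or None if the amount is unusable."""
    try:
        amount = float(row["amount"])
    except ValueError:
        return None  # drop rows we cannot trust rather than guess a value
    return {
        "customer": row["customer"].strip().lower(),
        "amount": amount,
        "ordered": date.fromisoformat(row["ordered"]),
        "shipped": date.fromisoformat(row["shipped"]),
    }


def add_features(row):
    """Derive new columns from the cleaned ones."""
    out = dict(row)
    out["days_to_ship"] = (row["shipped"] - row["ordered"]).days
    out["is_large_order"] = row["amount"] >= 100  # threshold is an assumption
    return out


def prepare(rows):
    """Clean every row, skip the unusable ones, then add features."""
    cleaned = (clean(r) for r in rows)
    return [add_features(r) for r in cleaned if r is not None]


for row in prepare(RAW_ORDERS):
    print(row)
```

Dropping unusable rows is only one possible choice; depending on the project, imputing a value or going back to the data owner may be the better way to keep the result trustworthy.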

As I mentioned before, it is really important to understand the data. Sometimes you have to gather data from external sources. This makes it even more difficult to understand the data.

I ♥ Data

To conclude, in my opinion there is no typical day. No day is the same as the previous one, and because every day is different, you don’t know what you will face in the future.
You might think that, as a data scientist, you could predict how the next day will go... but that’s not true! The only sure thing is that you have to love data in order to be a data scientist!

 

This blogpost has been translated to Dutch. You can find the Dutch version here.

Written by Agis Christopoulos, Data Scientist at Valid.
