How to carry out a data science project? (Part 1)






To be completed successfully, a data science project must follow a methodology composed of six distinct steps. 

Step 1: Project understanding  

In this step we’re looking to fully grasp the scope of the project and typically determine the following:  

      • The problem  
      • The potential solution(s)  
      • The necessary tools & techniques 

For this purpose, several questions could be asked: 

      • What is the objective of the project? 

      • How will this project add value? 

      • What data do we possess? What is the format of these data? 

      • Is it a regression or a classification problem? 

Step 2: Data mining and processing 

This step is itself composed of three levels:  

Data Mining: 

The data mining process identifies and extracts the useful information defined in Step 1. First, identify the data sources; second, access the storage space; and third, retrieve the relevant data.  
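These three sub-steps can be sketched in a few lines of pandas. This is a minimal illustration, not a real pipeline: the CSV content and column names are hypothetical, and an in-memory string stands in for the actual storage space you would access.

```python
import io
import pandas as pd

# Hypothetical source: a CSV export standing in for the real storage
# space identified in Step 1 (a file share, a database, an API, ...).
raw_csv = io.StringIO(
    "customer_id,age,total_spent,internal_notes\n"
    "1,34,120.5,ok\n"
    "2,51,89.0,vip\n"
)

# Access the source and load it into a DataFrame.
customers = pd.read_csv(raw_csv)

# Retrieve only the columns judged relevant for the project objective.
relevant = customers[["customer_id", "age", "total_spent"]]
print(relevant.shape)  # → (2, 3)
```

In a real project, `pd.read_csv` would point at the actual file path, or `pd.read_sql` at a database connection.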

Quality assessment:  

Having the data is not enough: you must check it and judge its reliability. To this end, determine which data are usable and whether there are any missing or corrupt values. You should also check the consistency of the data. In other words, this step verifies the veracity of the data you have been given and uncovers any errors. Statistical tools, such as a Q-Q plot, can help here.  
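A minimal sketch of such checks, on made-up data (the dataset is invented; the checks are the point): quantify missing values, then flag values that fail a simple consistency rule.

```python
import pandas as pd
import numpy as np

# Made-up data with two deliberate flaws: one missing age, one implausible age.
df = pd.DataFrame({
    "age": [34, 51, np.nan, 29, 420],
    "country": ["BE", "BE", "FR", "FR", "BE"],
})

# Usability: what fraction of each column is missing?
missing_ratio = df.isna().mean()

# Consistency: flag values outside a plausible range.
implausible_age = df[(df["age"] < 0) | (df["age"] > 120)]

print(missing_ratio["age"])   # → 0.2 (20% of ages are missing)
print(len(implausible_age))   # → 1 (one corrupt record)
```

For distributional checks, `scipy.stats.probplot` can produce the Q-Q plot mentioned above.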

Data cleaning:  

Real-world data is often noisy and presents quality issues. The quality assessment step provides a clear overview of the discrepancies in the data, and the data cleaning process deals with them. The aim of this step is to correct quality flaws, transform the data and remove faulty records. 
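Continuing the made-up example from the quality assessment step, a minimal cleaning sketch might remove the clearly faulty record and impute the missing value:

```python
import pandas as pd
import numpy as np

# Same invented flaws as before: one missing age, one impossible one.
df = pd.DataFrame({
    "age": [34, 51, np.nan, 29, 420],
    "total_spent": [120.5, 89.0, 45.0, 60.0, 30.0],
})

# Remove records whose values are clearly faulty (age outside 0-120)...
df = df[df["age"].isna() | df["age"].between(0, 120)].copy()

# ...and impute the remaining gaps with the median of the valid ages.
df["age"] = df["age"].fillna(df["age"].median())

print(len(df))  # → 4 (the faulty record is gone, the missing one is filled)
```

Whether to drop, impute or transform is a project-specific decision; this only shows the mechanics.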

Step 3: Data exploration  

Data exploration is the first step of data analysis. The goal is to synthesize the main characteristics of the data. The purpose of this step isn’t to draw important conclusions, but to become familiar with the data and see general trends. It is also important for detecting errors in the data. Data exploration covers several areas: correlation analysis, descriptive statistics, data visualisation and dimensionality reduction. In each area you can use different statistical tools, as you can see in the diagram below.  
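Two of these areas, descriptive statistics and correlation analysis, take one line each in pandas. A minimal sketch on synthetic data (in practice you would run this on the cleaned dataset from Step 2; the columns and the relationship between them are invented):

```python
import pandas as pd
import numpy as np

# Synthetic dataset: spending loosely driven by age, plus random noise.
rng = np.random.default_rng(0)
age = rng.integers(20, 70, size=100)
df = pd.DataFrame({
    "age": age,
    "total_spent": age * 2.5 + rng.normal(0, 10, size=100),
})

# Descriptive statistics: count, mean, std, quartiles for every column.
summary = df.describe()

# Correlation analysis: pairwise correlations between numeric columns.
corr = df.corr()

print(corr.loc["age", "total_spent"])  # close to 1: a strong general trend
```

Visualisation (e.g. matplotlib) and dimensionality reduction (e.g. PCA in scikit-learn) would follow the same pattern of quick, disposable looks at the data.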

Data exploration can be done with manual or automatic methods. Manual methods give analysts the opportunity to take a first look and become familiar with the dataset. Automatic methods, on the other hand, allow them to reorganise and delete unusable data.  

Data visualization tools are widely used to get a more global view of the dataset, both for better understanding and to spot errors more easily. The main programmatic tools used for this are the R and Python languages; their flexibility is highly appreciated by data analysts.  


Catch up on the last 3 steps in our next article.

© Valkuren

Unlock The Power – Sponsorship

At VALKUREN, we believe nothing is impossible. With a little support, everyone can unlock their power and succeed in their challenges. 


We are proud to support Nigel Bailly, a Belgian racing driver with reduced mobility, in his challenge to take part not only in the 24h of Le Mans but also in the 24h Series with a 911 GT3 Cup MR.


A full, challenging year!


Because we think it’s important to promote the inclusion of disability in life & sport. 


Follow his incredible adventure on social media!  Facebook Page


Link to his video presentation 

Inspiring Fifty – Belgium 2020

We are proud that our CEO & Founder has been named one of the Inspiring Fifty Women of the Year 2020, thanks to her career in the tech domain with Business Intelligence, Data Analytics & Artificial Intelligence.


At the end of 2014, she decided to create Valkuren, a new data-tech company specialized in leveraging the business of all companies (from start-ups to corporates) through the new technologies available to unlock the power of data (Cloud, Analytics & AI), with a particular focus on keeping the human link & gender equality.


Valerie also gives conference talks about bias and ethics in AI, as well as webinars on different data domains. As a role model for Women in Tech Brussels, she is always keen to share her knowledge & coach motivated newcomers in her domain.


Inspiring Fifty Belgium 2020

Analytics in Fashion

👓👗New York Fashion Week👗👓 has just ended and it should be noted that, even in fashion, data analytics is used for design, but also for sustainability and marketing.


🔸For example, Alibaba Group helps new Chinese designers create collections that attract a very large group of people by analyzing data on consumer choices.


🔸Google, for its part, is trying to build a new data tool to track & measure the environmental impact of collections, starting from the raw materials.


🔸At Valkuren, we help a Belgian fashion company maximize their marketing efforts to convert more people on their e-commerce platform using social network data.


👉 Want to understand how to put these analyses in place in your company? Contact us! 

DataAnalytics Fashion Marketing Sustainability Valkuren BigData