The first step of working with data is data acquisition. At an early stage we realised that extracting data from the whole range of sources our clients use would be a key component of our everyday work. As companies grow, their data grows as well: in volume, in density (volume per unit of time) and in complexity. So, what might at first seem like an easy, manual operation soon turns into a major, hard-to-handle big data process.
That is why Valkuren came up with its own solution: the data is extracted and tabulated automatically by a workflow orchestrator, Apache Airflow, running on Amazon Web Services. The workflow is a composition of DAGs (Directed Acyclic Graphs) that we can switch on and off as needed. A DAG is a collection of all the tasks you want to run, organized in a way that reflects their relationships and dependencies. Each DAG represents one source of data. For example, if a company extends its marketing campaign to social media such as Facebook and Instagram, there is one DAG for Facebook and a separate one for Instagram; if the company sells through an online platform such as WooCommerce, a graph representing it is introduced. Each DAG is made up of the various steps of the data workflow. In the Facebook example, the graph starts with data extraction (for posts, page insights, etc.) from the platform's API; the data is then transformed to fit our visualization and analysis needs, and finally saved in tabulated form.
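To make the idea of a DAG concrete, here is a minimal sketch in plain Python (using the standard library, not Airflow itself) of how such a graph orders its tasks by their dependencies. The task names are hypothetical stand-ins for the Facebook example above; in a real Airflow DAG these would be operators, with dependencies declared between them.

```python
# A toy Facebook-style DAG: each task maps to the set of tasks it
# depends on. Extraction tasks have no dependencies, the transform
# waits for both extractions, and saving waits for the transform.
from graphlib import TopologicalSorter

facebook_dag = {
    "extract_posts": set(),
    "extract_page_insights": set(),
    "transform": {"extract_posts", "extract_page_insights"},
    "save_tables": {"transform"},
}

# A topological sort yields an execution order that respects every
# dependency: no task runs before the tasks it depends on.
order = list(TopologicalSorter(facebook_dag).static_order())
print(order)
```

The acyclicity requirement is what makes this ordering possible: if `transform` also depended on `save_tables`, the graph would contain a cycle and no valid execution order would exist.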
Each graph can run in one of two modes: a one-time run or an incremental run. The one-time-run DAGs were used only at the start of our automated work, whereas the incremental DAGs run once a week to extract, transform and save that week's observations, incrementally growing our data volume.
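The key to an incremental run is knowing which slice of data each run is responsible for. A minimal sketch, assuming a weekly cadence: given the date a run fires, compute the one-week window it should extract. The helper name and the half-open window convention are our illustrative assumptions, not part of any Airflow API.

```python
from datetime import date, timedelta

def weekly_window(run_date: date) -> tuple[date, date]:
    """Hypothetical helper: return the [start, end) date window of the
    week preceding run_date. An incremental run would extract only the
    observations in this window and append them to the existing tables,
    so each week's run grows the data volume without re-reading history."""
    end = run_date
    start = end - timedelta(days=7)
    return start, end

start, end = weekly_window(date(2021, 6, 14))
print(start, end)  # 2021-06-07 2021-06-14
```

Using a half-open window means consecutive weekly runs tile the timeline without gaps or overlaps, so no observation is extracted twice.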
However, automating the workflow is not, and will not be, our only challenge in this evolving field. That is why we are always changing, growing and improving, with a single purpose: unlocking the power of data.
Written by Uendi Kodheli, data scientist @Valkuren