05 Jan
Data Science
Data Engineering
Data engineering consists of many smaller steps that refine extracted data, apply calculations to it, and format it into the required shape.
Because data can be extracted from different data sources, integrating different data entities into a single data entity is a crucial and complex step.
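As a minimal sketch of what such an integration might look like, the example below merges records from two sources into one entity per customer. The source names (`crm_rows`, `billing_rows`), the field names, and the choice of a normalized email address as the shared key are all assumptions made for illustration; real sources will need their own matching rules.

```python
# Hypothetical records from two sources that describe the same customers.
crm_rows = [
    {"email": "ann@example.com", "full_name": "Ann Lee"},
    {"email": "bob@example.com", "full_name": "Bob Roy"},
]
billing_rows = [
    {"customer_email": "ANN@example.com", "total_spent": 120.0},
    {"customer_email": "cid@example.com", "total_spent": 35.5},
]


def integrate(crm, billing):
    """Merge both sources into one entity per customer, keyed by email."""
    customers = {}
    for row in crm:
        # Normalize the key so "ANN@..." and "ann@..." match (assumed rule).
        key = row["email"].strip().lower()
        customers.setdefault(key, {"email": key, "name": None, "total_spent": None})
        customers[key]["name"] = row["full_name"]
    for row in billing:
        key = row["customer_email"].strip().lower()
        customers.setdefault(key, {"email": key, "name": None, "total_spent": None})
        customers[key]["total_spent"] = row["total_spent"]
    return list(customers.values())


customers = integrate(crm_rows, billing_rows)
```

Customers that appear in only one source keep `None` for the fields the other source would have supplied, so nothing is silently dropped during integration.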
This includes the following steps:
Data can be unstructured or in a raw format, so we also have to define data verification steps that determine the accuracy of the data.
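One way to picture verification is as a set of per-field rules applied to every row. The sketch below is only an illustration: the `RULES` table, the field names, the simple email pattern, and the idea of reporting accuracy as the share of rows that pass all rules are assumptions, not a prescribed method.

```python
import re

# Deliberately simple email pattern, used only for this illustration.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

# Hypothetical verification rules: field name -> check function.
RULES = {
    "email": lambda v: isinstance(v, str) and EMAIL_RE.match(v) is not None,
    "age": lambda v: isinstance(v, int) and 0 <= v <= 120,
}


def verify(rows, rules):
    """Return (valid_rows, errors); errors holds (row_index, field) pairs."""
    valid, errors = [], []
    for i, row in enumerate(rows):
        failed = [field for field, check in rules.items() if not check(row.get(field))]
        if failed:
            errors.extend((i, field) for field in failed)
        else:
            valid.append(row)
    return valid, errors


raw_rows = [
    {"email": "ann@example.com", "age": 31},
    {"email": "not-an-email", "age": 27},
    {"email": "bob@example.com", "age": -4},
]
valid_rows, errors = verify(raw_rows, RULES)

# Share of rows that passed every rule, as a rough accuracy measure.
accuracy = len(valid_rows) / len(raw_rows)
```

Keeping the failed `(row_index, field)` pairs makes it possible to see which rule rejected which row instead of only counting failures.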
This step includes the following steps:
Data cleaning is the most important part of this step. It includes smaller steps to exclude fake data and to remove unwanted and duplicated data.
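The three cleaning actions above can be sketched in a single pass over the rows. In this example, "fake" data is detected by a caller-supplied predicate, "unwanted" data is read as fields we do not need, and duplicates are rows whose kept fields are identical; all three interpretations, and the sample rows, are assumptions for illustration.

```python
def clean(rows, is_fake, wanted_fields):
    """Drop fake rows, keep only wanted fields, and remove duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        if is_fake(row):
            continue  # exclude fake data
        # Remove unwanted data by keeping only the fields we care about.
        trimmed = {field: row.get(field) for field in wanted_fields}
        key = tuple(trimmed[field] for field in wanted_fields)
        if key in seen:
            continue  # remove duplicated data
        seen.add(key)
        cleaned.append(trimmed)
    return cleaned


rows = [
    {"name": "Ann", "email": "ann@example.com", "debug": "x1"},
    {"name": "Test User", "email": "test@test.test", "debug": "x2"},
    {"name": "Ann", "email": "ann@example.com", "debug": "x3"},
    {"name": "Bob", "email": "bob@example.com", "debug": "x4"},
]
cleaned_rows = clean(
    rows,
    is_fake=lambda r: r["email"].endswith("@test.test"),  # hypothetical rule
    wanted_fields=("name", "email"),
)
```

Duplicates are checked after the unwanted fields are removed, so two rows that differ only in a discarded field are treated as the same record.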
In this step, we maintain value-mapping metadata in which we identify invalid or abbreviated values and replace them with their corresponding values.
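A value mapping of this kind can be sketched as a lookup table. The `COUNTRY_MAP` table, the `country` field, and the convention that invalid values map to `None` are hypothetical choices for this example.

```python
# Hypothetical mapping metadata: short or invalid value -> corresponding value.
COUNTRY_MAP = {
    "US": "United States",
    "USA": "United States",
    "UK": "United Kingdom",
    "N/A": None,  # invalid value, replaced with a missing marker
    "": None,
}


def map_values(rows, field, mapping):
    """Return copies of rows with `field` replaced according to `mapping`."""
    mapped = []
    for row in rows:
        value = row.get(field)
        # Normalize text before the lookup so "usa " matches "USA".
        lookup = value.strip().upper() if isinstance(value, str) else value
        new_row = dict(row)
        # Values without a mapping entry are kept unchanged.
        new_row[field] = mapping.get(lookup, value)
        mapped.append(new_row)
    return mapped


rows = [
    {"country": "usa "},
    {"country": "UK"},
    {"country": "N/A"},
    {"country": "France"},
]
mapped_rows = map_values(rows, "country", COUNTRY_MAP)
```

Keeping the mapping as data rather than as code means new short or invalid values can be handled by adding an entry to the table.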
This step includes the following steps:
Data generation is required in some cases. For example, if we have a date field and want to summarize data by month, we have to generate additional values (day, month, and year) from that date so that the data can be summarized easily.
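The date example can be sketched as follows. The `order_date` and `amount` field names and the sample orders are assumptions for illustration; the generated `day`, `month`, and `year` values come directly from the date field.

```python
from collections import defaultdict
from datetime import date


def add_date_parts(row, field="order_date"):
    """Return a copy of row with day, month, and year generated from a date."""
    d = row[field]
    return {**row, "day": d.day, "month": d.month, "year": d.year}


# Hypothetical orders with a single date field.
orders = [
    {"order_date": date(2015, 1, 5), "amount": 10.0},
    {"order_date": date(2015, 1, 20), "amount": 15.0},
    {"order_date": date(2015, 2, 3), "amount": 7.5},
]
enriched = [add_date_parts(order) for order in orders]

# With the generated values in place, summarizing by month is a simple grouping.
monthly_totals = defaultdict(float)
for row in enriched:
    monthly_totals[(row["year"], row["month"])] += row["amount"]
```

Grouping on `(year, month)` rather than on the month alone keeps January of one year separate from January of another.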
This kind of split can be as follows:
Joins can be divided into three steps that are applied repeatedly.
This step covers summarizing data and generating new data. We can also assign new identities to the summarized entities.
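As a rough sketch of summarizing and assigning new identities, the example below groups rows, computes a total and a count per group, and gives each summarized entity its own `summary_id`. The field names, the sample data, and the sequential numbering scheme are assumptions for illustration.

```python
from collections import defaultdict


def summarize(rows, group_field, value_field):
    """Group rows, total a value per group, and assign each summary a new id."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row[group_field]] += row[value_field]
        counts[row[group_field]] += 1

    summaries = []
    # Sort the groups so the generated ids are stable between runs.
    for new_id, group in enumerate(sorted(totals), start=1):
        summaries.append({
            "summary_id": new_id,  # new identity for the summarized entity
            group_field: group,
            "total": totals[group],
            "count": counts[group],
        })
    return summaries


# Hypothetical sales rows.
sales = [
    {"region": "north", "amount": 10.0},
    {"region": "south", "amount": 4.0},
    {"region": "north", "amount": 6.0},
]
region_summaries = summarize(sales, "region", "amount")
```

The summarized entities no longer correspond to any single source row, which is why they get identifiers of their own.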