SQL Data Cleaning in Real-World Applications
Data cleaning is the process of preparing raw data so that it is usable and ready for analysis: fixing or removing incorrect, corrupted, incorrectly formatted, duplicate, or incomplete data within a dataset. When multiple data sources are combined, there are many opportunities for data to be duplicated or mislabeled.

Examples:
- Data may be in non-standardized formats, such as amounts in different currencies, and must be normalized so that records can be compared on an equal basis.
- Data may be stored as strings, so each column must be cast to the appropriate type before computations can be run.
- Dates may appear in different formats and must be standardized, taking country-specific conventions into account.

Typical cleaning tasks include:
- Removing irrelevant data
- Deduplicating the data (removing duplicates)
- Dealing with missing data
- Filtering outliers
- Validating the data

Data scientists and analysts typically spend most of their time preparing data.

Effective Data Cleaning Strategies
Review t