Data Cleaning and Verification

Data Cleaning

Whether data is collected by a single person or combined from different sources by multiple employees, consistency is often sacrificed for speed in meeting a specific, immediate goal. Inconsistent or incomplete data can then cause reliability issues when the data is reused for a different project.

T3Data has researched and developed a number of quick and efficient methods for cleaning up older or cluttered data sources, allowing businesses to focus on getting insights from clean and ordered datasets. Simple changes such as removing redundant columns, resolving duplicate records, or handling missing values can make a significant difference to an overall dataset.
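
The three simple changes mentioned above can be sketched in a few lines of Python. This is a minimal illustration only, not T3Data's actual method: the function name clean_records, its parameters, and the sample records are all hypothetical, and it assumes the data is held as a list of dictionaries with hashable values.

```python
def clean_records(records, redundant_columns=(), key_columns=None, defaults=None):
    """Drop redundant columns, fill missing values, and remove duplicate rows.

    All names here are illustrative assumptions, not part of any real library.
    """
    defaults = defaults or {}
    seen = set()
    cleaned = []
    for record in records:
        # Remove columns flagged as redundant.
        row = {k: v for k, v in record.items() if k not in redundant_columns}
        # Replace missing values (None or empty string) where a default is given.
        for column, default in defaults.items():
            if row.get(column) in (None, ""):
                row[column] = default
        # Rows sharing the same key columns count as duplicates; keep the first.
        keys = key_columns or sorted(row)
        identity = tuple(row.get(k) for k in keys)
        if identity in seen:
            continue
        seen.add(identity)
        cleaned.append(row)
    return cleaned


raw = [
    {"id": 1, "name": "Ann", "region": "", "notes": "x"},
    {"id": 1, "name": "Ann", "region": None, "notes": "y"},
    {"id": 2, "name": "Bob", "region": "North", "notes": ""},
]
cleaned = clean_records(
    raw,
    redundant_columns=("notes",),
    key_columns=("id",),
    defaults={"region": "Unknown"},
)
```

Keeping the first record of each duplicate group is only one possible policy. In practice, which record to keep and which default to use for a missing value depend on the dataset and should be agreed before cleaning.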

Any changes to a dataset are checked as part of the testing that T3Data completes as standard, and any differences in results are noted for discussion.

Data Verification

Data can be stored in many different formats: currency, time, dates, decimal values, integers, etc. Each format can present values in different ways, and errors or inconsistencies in these formats can cause complications when the data is used. For example, dates stored in different configurations (e.g. yyyy-mm-dd, mm-dd-yyyy, or dd-mm-yyyy) could result in a time series chart of collected data giving incorrect results.
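
To make the date example above concrete, the short Python sketch below shows how one stored string can stand for two different dates depending on the assumed format. The function name possible_dates and the list of formats are assumptions chosen for illustration. A real dataset may use other formats.

```python
from datetime import datetime

# The three layouts named in the text, written as strptime patterns.
KNOWN_FORMATS = ("%Y-%m-%d", "%m-%d-%Y", "%d-%m-%Y")


def possible_dates(text):
    """Return every distinct date the string could mean under KNOWN_FORMATS."""
    results = set()
    for fmt in KNOWN_FORMATS:
        try:
            results.add(datetime.strptime(text, fmt).date())
        except ValueError:
            # The string does not fit this layout; try the next one.
            pass
    return sorted(results)


ambiguous = possible_dates("03-04-2021")    # 4 March or 3 April
unambiguous = possible_dates("25-12-2021")  # only dd-mm-yyyy fits
```

A value that returns more than one candidate cannot be fixed by parsing alone. The original format has to be confirmed from the source of the data before the values are placed on a time axis.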

As a separate service or in conjunction with the data cleaning methods used, T3Data offers data verification services to ensure that a dataset is as concise and clear as possible. Some methods used might include:

  • Standardising values: Correcting spellings and formatting errors, or removing alternate representations of the same value (e.g. date formats)
  • Normalising numeric data: Scaling numerical data using techniques such as Z-score normalisation or min-max normalisation so that values can be compared more easily across datasets (e.g. sales amounts)
  • Validation: Ensuring that each value entered is valid and meets specific criteria (e.g. postcodes)
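
The methods listed above can be sketched with the Python standard library. This is a hedged illustration, not a description of T3Data's tooling. The function names, the alias table, and the sales figures are hypothetical. The postcode pattern is a simplified assumption based on the general shape of UK postcodes: it checks plausibility only and does not confirm that a postcode exists.

```python
import re
from statistics import mean, pstdev

# Hypothetical alias table mapping alternate spellings to one canonical value.
ALIASES = {"uk": "United Kingdom", "u.k.": "United Kingdom", "gb": "United Kingdom"}

# Simplified shape check for UK-style postcodes (an assumption, not a full rule set).
POSTCODE_PATTERN = re.compile(r"^[A-Z]{1,2}[0-9][A-Z0-9]? ?[0-9][A-Z]{2}$")


def standardise(value):
    """Trim whitespace and replace known alternate spellings."""
    trimmed = value.strip()
    return ALIASES.get(trimmed.lower(), trimmed)


def min_max(values):
    """Scale values linearly into the range 0 to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # no spread to scale
    return [(v - lo) / (hi - lo) for v in values]


def z_scores(values):
    """Express each value as a number of standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def is_plausible_postcode(text):
    """Check that the text has the general shape of a UK postcode."""
    return bool(POSTCODE_PATTERN.match(text.strip().upper()))


sales = [10, 20, 30]
scaled = min_max(sales)  # [0.0, 0.5, 1.0]
standard = z_scores(sales)
```

Min-max scaling is sensitive to outliers because a single extreme value sets the range, whereas Z-scores describe each value relative to the spread of the whole column. Which one suits a dataset depends on how the results will be compared.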
