Skip to main content

Command Palette

Search for a command to run...

How to do manual data reconciliation using python and SQL

Updated
2 min read
How to do manual data reconciliation using python and SQL

Manual data reconciliation using python and SQL

Data reconciliation is an important part of the data migration. After the completion of data migration we have to compare data from the source data to destination data and here we are going to compare the record count of data which is one of the part of data reconciliation

Whenever we are doing data migration we have to building manifesto file which will have records of when, where and how much data got transferred. Here In this blog we are going to do manual data reconciliation using this manifesto file with these steps before starting reconciliation you have to migrate the data to BigQuery

  • We will create manifesto table in BigQuery using terraform,

  • Load manifesto CSV data to BigQuery using python script

  • Compare Manifesto record count to BigQuery record count data

1. Create a manifesto table using terraform

Here we are going to use the same dataset which we have in created previous blog Create a BigQuery dataset and table using terraform we will be deploying our manifesto table in the same dataset by using dataset id

main.tf

main.tf

variable.tf

variable.tf

terraform.tfvars

terraform.tfvars

2. Load manifesto data from CSV file to BigQuery using python

Here python script we can use to load data from a CSV file in the GCP bucket

load data python script

In this python script need to add your project id and your bucket name

3. Compare BigQuery table record count to manifesto record count

compare record count using SQL

We need to run this script on BigQuery

Output

After running the SQL query

Record count output

Conclusion

In this blog, we have created a BigQuery manifesto table, load CSV data to a table and then SQL query to compare the manifesto record to the BigQuery table. Here you can get all the scripts from the GitHub account and an Excel sheet from google Sheets.

Reference

Google Docs

Loading CSV data from Cloud Storage

tbl_manifesto Sheet1

GitHub - sudovazid/manifesto: This is terraform for manifesto file

More from this blog

V

Vazid | Blog

9 posts

Multi-Cloud DevOps Engineer | Skilled in AWS, GCP, Azure cloud environments.