Wednesday, December 18, 2019

Compare two SQL queries with pandas dataframe comparison

Compare the the two queries return the same data:
  1. query 1 from QA
  2. query 2 from PROD
Both queries use the same query statement:
SELECT *
FROM MY_HANA_VIEW
WHERE MY_CONDITION

1. Get data from DB
df_qa = pd.read_sql_query(QUERY_DIM_LEI_RATING, engine_qa)
df_prod = pd.read_sql_query(QUERY_DIM_LEI_RATING, engine_prod)

2. Sort by columns in place
columns_list = df_qa.columns.values.tolist()
df_qa.sort_values(by=columns,inplace=True)
df_prod.sort_values(by=columns, inplace = True)

3. Reset index in place
Drop the existing index and replace with the reset one
df_qa.reset_index(drop=True, inplace=True)
df_prod.reset_index(drop=True, inplace=True)

4. Assert frame equal
assert_frame_equal(df_qa, df_prod)

Note that you do not need step 2&3 if the columns have been ordered in SQL query statement, e.g.:
SELECT c1, c2, c3
FROM MY_HANA_VIEW
WHERE MY_CONDITION
order by c1, c2, c3

No comments: