
Showing posts with label Data Analytics. Show all posts
Showing posts with label Data Analytics. Show all posts
Wednesday, December 18, 2019
Compare two SQL queries with pandas dataframe comparison
Compare the the two queries return the same data:
Note that you do not need step 2&3 if the columns have been ordered in SQL query statement, e.g.:
- query 1 from QA
- query 2 from PROD
Both queries use the same query statement:
SELECT * FROM MY_HANA_VIEW WHERE MY_CONDITION
1. Get data from DB
df_qa = pd.read_sql_query(QUERY_DIM_LEI_RATING, engine_qa)
df_prod = pd.read_sql_query(QUERY_DIM_LEI_RATING, engine_prod)
2. Sort by columns in place
columns_list = df_qa.columns.values.tolist()
df_qa.sort_values(by=columns,inplace=True)
df_prod.sort_values(by=columns, inplace = True)
3. Reset index in place
Drop the existing index and replace with the reset one
df_qa.reset_index(drop=True, inplace=True)
df_prod.reset_index(drop=True, inplace=True)
4. Assert frame equal
assert_frame_equal(df_qa, df_prod)
Note that you do not need step 2&3 if the columns have been ordered in SQL query statement, e.g.:
SELECT c1, c2, c3 FROM MY_HANA_VIEW WHERE MY_CONDITION order by c1, c2, c3
Subscribe to:
Posts (Atom)