Wednesday, December 18, 2019

Pandas dataframe to get column names as a list

Compare two SQL queries with pandas dataframe comparison

Check that the two queries return the same data:
  1. query 1 from QA
  2. query 2 from PROD
Both queries use the same query statement.

1. Get data from DB
import pandas as pd
df_qa = pd.read_sql_query(QUERY_DIM_LEI_RATING, engine_qa)
df_prod = pd.read_sql_query(QUERY_DIM_LEI_RATING, engine_prod)

2. Sort both dataframes by all columns in place
columns_list = df_qa.columns.values.tolist()
df_qa.sort_values(by=columns_list, inplace=True)
df_prod.sort_values(by=columns_list, inplace=True)

3. Reset index in place
Drop the existing index (it still reflects the pre-sort row order) and replace it with a fresh one starting at 0
df_qa.reset_index(drop=True, inplace=True)
df_prod.reset_index(drop=True, inplace=True)

4. Assert frame equal
from pandas.testing import assert_frame_equal
assert_frame_equal(df_qa, df_prod)

Note that you do not need steps 2 and 3 if the rows are already ordered by the same columns in the SQL query statement, e.g.:
SELECT c1, c2, c3
ORDER BY c1, c2, c3
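
The four steps above can be wrapped into a small helper. This is only a sketch: the dataframes are built by hand as stand-ins for the two query results, and the names normalize and frames_match are made up for this example.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def normalize(df):
    """Sort rows by every column and rebuild a clean index starting at 0."""
    columns_list = df.columns.tolist()
    return df.sort_values(by=columns_list).reset_index(drop=True)


def frames_match(df_a, df_b):
    """Return True when both frames hold the same rows, ignoring row order."""
    try:
        assert_frame_equal(normalize(df_a), normalize(df_b))
        return True
    except AssertionError:
        return False


# Hand-made stand-ins for the QA and PROD query results (assumption):
# df_prod has the same rows as df_qa in a different order, df_other differs.
df_qa = pd.DataFrame({"c1": [2, 1, 3], "c2": ["b", "a", "c"]})
df_prod = pd.DataFrame({"c1": [3, 2, 1], "c2": ["c", "b", "a"]})
df_other = pd.DataFrame({"c1": [3, 2, 1], "c2": ["c", "b", "x"]})

print(frames_match(df_qa, df_prod))   # True
print(frames_match(df_qa, df_other))  # False
```

Unlike the in-place version, normalize returns a sorted copy, so the original dataframes stay untouched for any later checks.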

Wednesday, December 4, 2019

Load Thomson Reuters LEI to HANA with SAP DataService + Python

This post shows how to consume the TR REST API with SAP Data Services (DS) and Python in order to load TR LEI information into a HANA database:

1. Create a DataFlow with 3 objects
    SQL: queries a HANA view to get the LEI identifiers, which are put into the payload of the REST API request
    User Defined Base Transform: this is where the Python code calls the REST API and processes the incoming response
    Table: the database table where the data is saved

2. Query the HANA view to get the LEI identifiers

3. Set the input for Python processing

4. Bring up the "User Defined Editor"; here the language is Python

5. Set up the output of Python processing

We'll use 'Per Collection' mode

Save the final data to Collection (the collection of data records, which is the output data)

Using Python inside DS makes data loading and processing flexible and powerful.
The only thing I dislike is the integrated Python editor in SAP DS.

Note that SAP DS 4.2 supports Python 2.7 only. Also, the default library for accessing a REST API is urllib/urllib2. I'd like to install 'pip' and then install 'requests' for REST API consumption.
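
Until 'requests' is installed, the call can be made with the default library. Below is a rough sketch, not the actual code from the transform: the endpoint URL, the "identifiers" payload key and the function names are placeholders I made up, and the real TR API will expect its own URL, headers and payload shape. The import fallback lets the same code run on Python 2.7 inside DS and on Python 3 outside of it.

```python
import json

try:
    import urllib2 as urlrequest          # Python 2.7, as in SAP DS 4.2
except ImportError:
    import urllib.request as urlrequest   # Python 3, for trying it outside DS

# Placeholder endpoint (assumption), not the real Thomson Reuters URL.
API_URL = "https://api.example.com/lei/lookup"


def build_lei_request(lei_list, url=API_URL):
    """Build a JSON POST request carrying the LEI identifiers."""
    # The "identifiers" key is a made-up payload shape for illustration.
    payload = json.dumps({"identifiers": list(lei_list)}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    # Passing data makes urllib issue a POST instead of a GET.
    return urlrequest.Request(url, data=payload, headers=headers)


def fetch_lei_records(lei_list, url=API_URL):
    """Send the request and decode the JSON response body."""
    response = urlrequest.urlopen(build_lei_request(lei_list, url), timeout=30)
    try:
        return json.loads(response.read().decode("utf-8"))
    finally:
        response.close()


# Building the request does not touch the network; only fetch_lei_records does.
request = build_lei_request(["LEI_PLACEHOLDER_1", "LEI_PLACEHOLDER_2"])
print(request.get_method())  # POST
```

With 'requests' installed, the two functions would collapse into a single requests.post(url, json=...) call, which is the main reason I prefer it.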