
Wednesday, December 18, 2019
Compare two SQL queries with pandas dataframe comparison
Compare the the two queries return the same data:
Note that you do not need step 2&3 if the columns have been ordered in SQL query statement, e.g.:
- query 1 from QA
- query 2 from PROD
Both queries use the same query statement:
SELECT * FROM MY_HANA_VIEW WHERE MY_CONDITION
1. Get data from DB
df_qa = pd.read_sql_query(QUERY_DIM_LEI_RATING, engine_qa)
df_prod = pd.read_sql_query(QUERY_DIM_LEI_RATING, engine_prod)
2. Sort by columns in place
columns_list = df_qa.columns.values.tolist()
df_qa.sort_values(by=columns,inplace=True)
df_prod.sort_values(by=columns, inplace = True)
3. Reset index in place
Drop the existing index and replace with the reset one
df_qa.reset_index(drop=True, inplace=True)
df_prod.reset_index(drop=True, inplace=True)
4. Assert frame equal
assert_frame_equal(df_qa, df_prod)
Note that you do not need step 2&3 if the columns have been ordered in SQL query statement, e.g.:
SELECT c1, c2, c3 FROM MY_HANA_VIEW WHERE MY_CONDITION order by c1, c2, c3
Wednesday, December 4, 2019
Load Thomson Reuters LEI to HANA with SAP DataService + Python
This blog is to consume TR REST API using SAP DS and Python to load TR LEI information to HANA database:
1. Create a DataFlow with 3 objects
SQL: query HANA view to get the LEI identifiers which will be put into the payload of REST API
User Defined Base Transform: this is where the Python code accessing REST API and processing coming response
Table: the database table to save the data

2. Qquery HANA view to get the LEI identifiers

3. Set the input for Python processing

4. Bring up the "User Defined Editor", here it's Python

5. Set up the output of Python processing

We'll use 'Per Collection' mode

Save the final data to Collection(the data records collection, this is the output data)

With a solution using Python in DS, it is flexible and powerful for data loading and processing.
The only thing I dislike the is the integrated Python editor in SAP DS.
Note that SAP DS 4.2 support Python 2.7 only. Also the default library accessing REST API is urllib/urllib2. I'd like to install 'pip' and then install 'requests' for REST API consumption.
1. Create a DataFlow with 3 objects
SQL: query HANA view to get the LEI identifiers which will be put into the payload of REST API
User Defined Base Transform: this is where the Python code accessing REST API and processing coming response
Table: the database table to save the data

2. Qquery HANA view to get the LEI identifiers

3. Set the input for Python processing

4. Bring up the "User Defined Editor", here it's Python

5. Set up the output of Python processing

We'll use 'Per Collection' mode

Save the final data to Collection(the data records collection, this is the output data)

With a solution using Python in DS, it is flexible and powerful for data loading and processing.
The only thing I dislike the is the integrated Python editor in SAP DS.
Note that SAP DS 4.2 support Python 2.7 only. Also the default library accessing REST API is urllib/urllib2. I'd like to install 'pip' and then install 'requests' for REST API consumption.
Subscribe to:
Posts (Atom)