Wednesday, December 18, 2019

Pandas dataframe to get column names as a list
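A minimal sketch (the column names here are made up):

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b"], "qty": [1, 2]})

# Either form returns the column names as a plain Python list
cols = df.columns.tolist()
cols_via_values = df.columns.values.tolist()
```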


Compare two SQL queries with pandas dataframe comparison

Verify that the two queries return the same data:
  1. query 1 from QA
  2. query 2 from PROD
Both queries use the same query statement:
SELECT *
FROM MY_HANA_VIEW
WHERE MY_CONDITION

1. Get data from DB
df_qa = pd.read_sql_query(QUERY_DIM_LEI_RATING, engine_qa)
df_prod = pd.read_sql_query(QUERY_DIM_LEI_RATING, engine_prod)

2. Sort by columns in place
columns_list = df_qa.columns.values.tolist()
df_qa.sort_values(by=columns_list, inplace=True)
df_prod.sort_values(by=columns_list, inplace=True)

3. Reset index in place
Drop the existing index and replace with the reset one
df_qa.reset_index(drop=True, inplace=True)
df_prod.reset_index(drop=True, inplace=True)

4. Assert frame equal
from pandas.testing import assert_frame_equal
assert_frame_equal(df_qa, df_prod)

Note that you do not need steps 2 & 3 if the columns are already ordered in the SQL query statement, e.g.:
SELECT c1, c2, c3
FROM MY_HANA_VIEW
WHERE MY_CONDITION
order by c1, c2, c3
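The steps above can be combined into one self-contained sketch; two shuffled in-memory frames stand in for the QA/PROD query results (the engines and the actual query are omitted):

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Stand-ins for the two query results: same rows, different order
df_qa = pd.DataFrame({"c1": [2, 1, 3], "c2": ["b", "a", "c"]})
df_prod = pd.DataFrame({"c1": [3, 2, 1], "c2": ["c", "b", "a"]})

# Sort by every column, then reset the index so row labels line up
columns_list = df_qa.columns.values.tolist()
df_qa.sort_values(by=columns_list, inplace=True)
df_prod.sort_values(by=columns_list, inplace=True)
df_qa.reset_index(drop=True, inplace=True)
df_prod.reset_index(drop=True, inplace=True)

assert_frame_equal(df_qa, df_prod)  # raises AssertionError on any difference
```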

Wednesday, December 4, 2019

Load Thomson Reuters LEI to HANA with SAP DataService + Python

This post shows how to consume the TR REST API using SAP DS and Python to load TR LEI information into a HANA database:

1. Create a DataFlow with 3 objects
    SQL: query a HANA view to get the LEI identifiers, which will be put into the payload of the REST API request
    User Defined Base Transform: this is where the Python code calls the REST API and processes the incoming response
    Table: the database table where the data is saved


2. Query the HANA view to get the LEI identifiers


3. Set the input for Python processing


4. Bring up the "User Defined Editor", here it's Python


5. Set up the output of Python processing


We'll use 'Per Collection' mode


Save the final data to the Collection (the data records collection; this is the output data)


Using Python inside DS makes data loading and processing flexible and powerful.
The only thing I dislike is the integrated Python editor in SAP DS.

Note that SAP DS 4.2 supports Python 2.7 only. Also, the default libraries for accessing REST APIs are urllib/urllib2. I'd rather install 'pip' and then install 'requests' for REST API consumption.

Tuesday, July 23, 2019

Can't open lib libdemoabc.so

Sometimes you run into this error with Linux applications:
    Can't open lib /usr/abc/libdemoabc.so
The error message is quite misleading: the lib itself does exist, but some of its dependent libs are missing. Setting up LD_LIBRARY_PATH solves the problem.
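One way to find the real culprit is to run `ldd` against the lib and look for "not found" lines. A small helper, sketched in Python (Linux only; the path you pass in would be your own lib):

```python
import subprocess

def missing_dependencies(lib_path):
    """Return the 'not found' dependency lines reported by ldd."""
    out = subprocess.run(["ldd", lib_path], capture_output=True, text=True)
    return [line.strip() for line in out.stdout.splitlines()
            if "not found" in line]

# A healthy binary reports no missing dependencies
print(missing_dependencies("/bin/ls"))
```

Once you know which libs are missing, add their directory to LD_LIBRARY_PATH before starting the application.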

Watch files in a directory

Linux: inotifywatch in inotify-tools
Windows: FileSystemWatcher

Reference:
https://www.howtogeek.com/405468/how-to-perform-a-task-when-a-new-file-is-added-to-a-directory-in-linux/
https://gallery.technet.microsoft.com/scriptcenter/Powershell-FileSystemWatche-dfd7084b

Friday, May 10, 2019

k8s Liveness vs Readiness

Liveness: The kubelet uses liveness probes to know when to restart a Container. 
Readiness: The kubelet uses readiness probes to know when a Container is ready to start accepting traffic.
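A typical pod spec wires both probes to HTTP endpoints; a hedged example (paths, ports, and timings below are assumptions, not from a real deployment):

```yaml
livenessProbe:
  httpGet:
    path: /healthz      # failing this probe restarts the container
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 15
readinessProbe:
  httpGet:
    path: /ready        # failing this removes the pod from Service endpoints
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
```

Note the asymmetry: a failed liveness probe triggers a restart, while a failed readiness probe only stops traffic, with no restart.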

https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/

Wednesday, May 8, 2019

Share directory over HTTP with Python

cd to the directory you want to share, and run:
# For Python >=2.4
python -m SimpleHTTPServer 8888
# For Python 3.x
python3 -m http.server 8888
Then you can access this http server with: http://your_host_ip:8888

Friday, May 3, 2019

PlantUML example



@startuml
hide footbox
title HANA CDN Service
database HANA
== Initialization ==
CDN -> Kafka: get Last PFCDIK of today
CDN <- Kafka: Last PFCDIK of today
note over CDN: set Last PFCDIK as 0 if no PFCDIK was found for today

|||

== Poll/Publish delta changes ==
loop every 1 minute
    CDN -> HANA: changes since Last PFCDIK of today
    CDN <- HANA: delta changes
    CDN -> Kafka: publish delta changes to topic
    CDN -> CDN: update local Last PFCDIK of today
end





Thursday, April 25, 2019

Base64 encoded string to Decimal

We are using kafka-connect-jdbc to stream data out of HANA views, and the decimal columns are saved as base64 encoded strings in Kafka.

To decode it with Python:
    import base64
    import decimal

    def b64_string_to_decimal(b64str):
        """Convert a base64 encoded string to decimal
           b64str: the base64 encoded string
           Example: 'ATFvqA==' -> 20017064,  'JA==' -> 36
        """
        decoded_bytes = base64.b64decode(b64str)
        decimal_value = decimal.Decimal(int.from_bytes(decoded_bytes, byteorder='big'))
        return decimal_value

To decode it with Java:
    /*
     * Convert a base64 encoded string to decimal
     * b64str: the base64 encoded string
     * Example: 'ATFvqA==' -> 20017064,  'JA==' -> 36
     */
    public BigDecimal base64StringToDecimal(String b64String) {
        BigDecimal bigDecimal = new BigDecimal(new BigInteger(Base64.getDecoder().decode(b64String)));
        return bigDecimal;
    }