Monday, August 31, 2020

How to create target .csv file out of a SQL query in SAP Data Services

 1. After preparing the structure of your source to query transform. Right click on the out schema in query transform and select 'Create File Format':


2. Provide the details and use it as Flag File template:


3. Make it as target in the dataflow mapping:


4. Double click the target file and set up the file location and name:


There you go!

Thursday, April 16, 2020

Python to retrieve csv attachment from saved outlook msg file

1. import win32com.client
if you got this error:   ModuleNotFoundError: No module named 'win32com'
then install the module first:
    $ python3 -m pip install pywin32
  $ python3 -m pip install pypiwin32
2. Use absolute path
3. Outlook installed
4. Close(or Open and then Close) Outlook if you see error like:




for idx, file in enumerate(msg_files):
    # get file date 02-Apr-2020
    print(f'{idx+1}/{number_of_files}: {file}')
    date_str = re.split(' |\.', file)[-2]
    print(date_str)

    # Read attachments
    outlook = win32com.client.Dispatch('Outlook.Application').GetNamespace('MAPI')
    msg = outlook.OpenSharedItem(file)
    att = msg.Attachments

    for i in att:

        csv_file_name = os.path.join(csv_file_dir, f'{csv_file_prefix}-{date_str}.csv')
        i.SaveAsFile(csv_file_name)

To get error message from an error code:
import win32api
s = win32api.FormatMessage(-2147352565)
print(s)

s = win32api.FormatMessage(-2147287008)
print(s)

s = win32api.FormatMessage(-2147352567)
print(s)

Wednesday, December 18, 2019

Pandas dataframe to get column names as a list


Compare two SQL queries with pandas dataframe comparison

Compare the the two queries return the same data:
  1. query 1 from QA
  2. query 2 from PROD
Both queries use the same query statement:
SELECT *
FROM MY_HANA_VIEW
WHERE MY_CONDITION

1. Get data from DB
df_qa = pd.read_sql_query(QUERY_DIM_LEI_RATING, engine_qa)
df_prod = pd.read_sql_query(QUERY_DIM_LEI_RATING, engine_prod)

2. Sort by columns in place
columns_list = df_qa.columns.values.tolist()
df_qa.sort_values(by=columns,inplace=True)
df_prod.sort_values(by=columns, inplace = True)

3. Reset index in place
Drop the existing index and replace with the reset one
df_qa.reset_index(drop=True, inplace=True)
df_prod.reset_index(drop=True, inplace=True)

4. Assert frame equal
assert_frame_equal(df_qa, df_prod)

Note that you do not need step 2&3 if the columns have been ordered in SQL query statement, e.g.:
SELECT c1, c2, c3
FROM MY_HANA_VIEW
WHERE MY_CONDITION
order by c1, c2, c3

Wednesday, December 4, 2019

Load Thomson Reuters LEI to HANA with SAP DataService + Python

This blog is to consume TR REST API using SAP DS and Python to load TR LEI information to HANA database:

1. Create a DataFlow with 3 objects
    SQL: query HANA view to get the LEI identifiers which will be put into the payload of REST API
    User Defined Base Transform: this is where the Python code accessing REST API and processing coming response
    Table: the database table to save the data


2. Qquery HANA view to get the LEI identifiers


3. Set the input for Python processing


4. Bring up the "User Defined Editor", here it's Python


5. Set up the output of Python processing


We'll use 'Per Collection' mode


Save the final data to Collection(the data records collection, this is the output data)


With a solution using Python in DS, it is flexible and powerful for data loading and processing.
The only thing I dislike the is the integrated Python editor in SAP DS.

Note that SAP DS 4.2 support Python 2.7 only. Also the default library accessing REST API is urllib/urllib2. I'd like to install 'pip' and then install 'requests' for REST API consumption.

Tuesday, July 23, 2019

Cant' open lib libdemoabc.so

Sometimes your run into this error for Linux applications:
    Can't open lib /usr/abc/libdemoabc.so
The above error message is quite misleading. Actually the lib does exist. but some of its dependent libs are missing. Setting up the LD_LIBRARY_PATH solves the problem.

Watch files in a directory

Linux: inotifywatch in inotify-tools
Windows: FileSystemWatcher

Reference:
https://www.howtogeek.com/405468/how-to-perform-a-task-when-a-new-file-is-added-to-a-directory-in-linux/
https://gallery.technet.microsoft.com/scriptcenter/Powershell-FileSystemWatche-dfd7084b