Keep It Simple Stupid: 2018

Tuesday, October 30, 2018

parameter in Kotlin Primary constructor

Writing var or val in front of a primary constructor parameter declares a property of the class. Without it, the name is only a parameter passed to the primary constructor: you can read it inside the **init** block or use it to initialize other properties, but it is never a property itself.

SAP HANA: get max record for a group

1. With Rank node

2. With Aggregation/Join node

Performance:

Rank node wins.

Tuesday, May 29, 2018

2d array in python3

m = 5
n = 3

a = [[0 for x in range(n)] for y in range(m)]

Or a shorter version:
a = [[0]*n for y in range(m)]

Note: shortening this further to something like the following does not work. You end up with 5 references to the same list, so when you modify one row, they all change.
a = [[0]*n]*m
print(a)
a[1][2] = 3
print(a)

[[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]
[[0, 0, 3], [0, 0, 3], [0, 0, 3], [0, 0, 3], [0, 0, 3]]

[0] * n is safe because the integer 0 is immutable: every slot refers to the same 0 object, but assigning to a slot rebinds only that slot. With n = 3 this produces [0, 0, 0]. Now suppose you had a variable x = [0, 0, 0]; then

c1 = x * 5
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

c2 = [x] * 5
[[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]

c2[0][1] = 22
[[0, 22, 0], [0, 22, 0], [0, 22, 0], [0, 22, 0], [0, 22, 0]]
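
One way to check whether the rows are shared is to compare them with `is`. A minimal sketch; the names `shared` and `independent` are made up for illustration:

```python
m, n = 5, 3

# Multiplying the outer list repeats a reference to ONE inner list.
shared = [[0] * n] * m
# The comprehension evaluates [0] * n once per row, giving m distinct lists.
independent = [[0] * n for _ in range(m)]

print(shared[0] is shared[1])            # True: same list object
print(independent[0] is independent[1])  # False: separate list objects

independent[1][2] = 3
print(independent)
# [[0, 0, 0], [0, 0, 3], [0, 0, 0], [0, 0, 0], [0, 0, 0]]
```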

Thursday, April 19, 2018

download notebooks/training set/test set from Coursera

Go to the home of the coursera-notebook hub
Create a new python notebook
Execute !tar cvfz allfiles.tar.gz * in a cell
Download the archive !

Enjoy!

If the resulting archive is too big and you can't download it

Open the Python notebook where you executed the last command and execute the following in a cell:

!split -b 200m allfiles.tar.gz allfiles.tar.gz.part.

This splits the archive into 200 MB pieces that you can download without a problem (if there is still a problem, reduce the size by changing 200m to a lower value).

Then, when you have downloaded all the pieces, rejoin them on your system with the following command (in a Linux environment, or with cmder if you are on Windows):

cat allfiles.tar.gz.part.* > allfiles.tar.gz
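
If no cat is at hand, the pieces can probably be rejoined from Python as well. A minimal sketch using only the standard library; the helper name `join_parts` is hypothetical, and it assumes the pieces keep the default split suffixes (aa, ab, ...) so that sorting by name restores the order:

```python
import glob
import shutil


def join_parts(prefix, output_path):
    """Concatenate the files named prefix* (sorted by name) into output_path."""
    # glob.escape protects any special characters in the prefix itself.
    parts = sorted(glob.glob(glob.escape(prefix) + "*"))
    with open(output_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)  # stream, do not load into memory
    return parts
```

For the archive above the call would be join_parts("allfiles.tar.gz.part.", "allfiles.tar.gz").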

PS: This is in fact valid in any Jupyter-notebook hub

There is a simpler way. Go to the notebook's file manager, click "New", then "Terminal", and you have a full terminal where you can run any command you want (like tar).

https://github.com/coursera-dl/coursera-dl

Saturday, March 31, 2018

Break training data into slices for stochastic gradient descent:

import numpy as np
n = 100
training_data = list(range(n))
mini_batch_size = 10
np.random.shuffle(training_data)
mini_batches = [training_data[k:k+mini_batch_size]
    for k in range(0, n, mini_batch_size)]
mini_batches

[[90, 5, 70, 82, 58, 2, 16, 85, 12, 35],
[14, 54, 62, 39, 96, 73, 60, 80, 33, 89],
[20, 38, 76, 47, 65, 42, 71, 46, 93, 34],
[52, 64, 13, 92, 17, 49, 88, 63, 74, 23],
[43, 25, 10, 97, 48, 68, 95, 81, 24, 31],
[9, 32, 84, 83, 22, 87, 61, 26, 28, 99],
[0, 67, 30, 69, 72, 45, 79, 51, 40, 55],
[6, 15, 75, 66, 29, 3, 18, 77, 98, 21],
[53, 44, 50, 19, 91, 8, 11, 59, 27, 56],
[36, 94, 7, 57, 1, 37, 86, 78, 41, 4]]
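
The same slicing also works when n is not a multiple of the batch size; the last batch is simply shorter. A minimal sketch using only the standard library instead of numpy; the helper name `make_mini_batches` and its `seed` parameter are made up for illustration:

```python
import random


def make_mini_batches(data, batch_size, seed=None):
    """Shuffle a copy of data and cut it into consecutive slices."""
    shuffled = list(data)                  # copy, so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seed only makes the run repeatable
    return [shuffled[k:k + batch_size]
            for k in range(0, len(shuffled), batch_size)]


batches = make_mini_batches(range(25), 10, seed=0)
print([len(b) for b in batches])  # [10, 10, 5]
```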

Thursday, March 29, 2018

Indices in Python list

Python indices may feel uncomfortable at first, but they are really convenient once you understand them. You may actually come to love their simplicity:

>>> a = list(range(10))
>>> a
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> a[::2]
[0, 2, 4, 6, 8]
>>> a[1::2]
[1, 3, 5, 7, 9]
>>> a[::-2]
[9, 7, 5, 3, 1]
>>> a[1::-2]
[1]
>>> a[1:8]
[1, 2, 3, 4, 5, 6, 7]
>>> a[1:-2]
[1, 2, 3, 4, 5, 6, 7]
>>> a[::-1]
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
>>> a[100]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range
>>> a[:100]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> a[7:100]
[7, 8, 9]
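
Why a[7:100] works while a[100] fails: slice bounds are clamped to the length of the list, single indices are not. A short sketch using the built-in slice type, whose indices() method reports the bounds that are actually applied:

```python
a = list(range(10))

# slice.indices(length) returns the (start, stop, step) really used for a
# sequence of that length, after clamping and resolving negative values.
print(slice(7, 100).indices(len(a)))           # (7, 10, 1)
print(slice(1, -2).indices(len(a)))            # (1, 8, 1)
print(slice(None, None, -2).indices(len(a)))   # (9, -1, -2)

# A slice object can be stored and reused like any other value.
evens = slice(None, None, 2)
print(a[evens])                                # [0, 2, 4, 6, 8]
```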

Reference:

https://docs.python.org/3/tutorial/introduction.html#strings

Monday, March 26, 2018

Linux library naming conventions

[root@localhost lib]# ls -lrt |grep libodbc.so
-rwxr-xr-x 1 root root 1804447 May 29 2013 libodbc.so.2.0.0
lrwxrwxrwx 1 root root 16 Nov 6 2014 libodbc.so.2 -> libodbc.so.2.0.0
lrwxrwxrwx 1 root root 16 Nov 6 2014 libodbc.so -> libodbc.so.2.0.0

Real name: libodbc.so.2.0.0

SONAME: libodbc.so.2

Linker name: libodbc.so

gcc -lodbc looks for libodbc.so (a symlink or a regular file).

The dependency recorded in the resulting binary is the SONAME: libodbc.so.2
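
The relation between the three names can be sketched in a few lines. This follows the naming convention only; the authoritative SONAME is whatever is stored in the library file itself. The function name `library_names` is made up for illustration:

```python
def library_names(real_name):
    """Split a real name such as 'libodbc.so.2.0.0' into its three names."""
    base, sep, version = real_name.partition(".so.")
    if not sep:
        raise ValueError("expected a name of the form lib<name>.so.<version>")
    linker_name = base + ".so"
    major = version.split(".")[0]        # the soname keeps only the major number
    soname = linker_name + "." + major
    return {"real": real_name, "soname": soname, "linker": linker_name}


print(library_names("libodbc.so.2.0.0"))
# {'real': 'libodbc.so.2.0.0', 'soname': 'libodbc.so.2', 'linker': 'libodbc.so'}
```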

Print SONAME of a shared library:

[root@localhost lib]# objdump -p libodbc.so |grep 'SONAME' |awk -F ' ' '{print $2}'

libodbc.so.2

[root@localhost lib]# readelf -d libodbc.so |grep soname

0x000000000000000e (SONAME) Library soname: [libodbc.so.2]

Reference:

Every shared library has a special name called the ``soname''. The soname has the prefix ``lib'', the name of the library, the phrase ``.so'', followed by a period and a version number that is incremented whenever the interface changes (as a special exception, the lowest-level C libraries don't start with ``lib''). A fully-qualified soname includes as a prefix the directory it's in; on a working system a fully-qualified soname is simply a symbolic link to the shared library's ``real name''.

Every shared library also has a ``real name'', which is the filename containing the actual library code. The real name adds to the soname a period, a minor number, another period, and the release number. The last period and release number are optional. The minor number and release number support configuration control by letting you know exactly what version(s) of the library are installed. Note that these numbers might not be the same as the numbers used to describe the library in documentation, although that does make things easier.

In addition, there's the name that the compiler uses when requesting a library, (I'll call it the ``linker name''), which is simply the soname without any version number.

http://tldp.org/HOWTO/Program-Library-HOWTO/shared-libraries.html

makefile: execute a command and grep then awk

CP=cp
LIB_UNIXODBC=/usr/src/tpkgs/unixodbc/2.3.1/linux86w/lib/libodbc.so
RELEASE_DESTDIR=/bld/release/nsr/fb_mssql_linux/linux86w/source

$(CP) $(LIB_UNIXODBC) $(RELEASE_DESTDIR)/ddbda/odbc/$(shell objdump -p $(LIB_UNIXODBC) |grep 'SONAME' |awk -F ' ' '{print $$2}')

which is equivalent to the command line:

cp /usr/src/tpkgs/unixodbc/2.3.1/linux86w/lib/libodbc.so /bld/release/nsr/fb_mssql_linux/linux86w/source/ddbda/odbc/libodbc.so.2

Notes:

1. $(shell ...) executes a command from within a makefile.

2. The grep pattern is wrapped in single quotes.

3. The '$' in the awk statement is doubled ('$$') so that make passes a literal '$2' on to awk.
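
For comparison, the grep/awk part could be done in Python. A minimal sketch; the function names are hypothetical, read_soname needs objdump (binutils) on the PATH, and the sample text is hand-written to imitate objdump -p output:

```python
import subprocess


def parse_soname(objdump_output):
    """Return the SONAME field from `objdump -p` output, or None if absent."""
    for line in objdump_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "SONAME":
            return fields[1]
    return None


def read_soname(library_path):
    """Run objdump -p on library_path and extract the SONAME."""
    result = subprocess.run(["objdump", "-p", library_path],
                            capture_output=True, text=True, check=True)
    return parse_soname(result.stdout)


# Hand-written sample, not real tool output.
sample = ("Dynamic Section:\n"
          "  NEEDED               libc.so.6\n"
          "  SONAME               libodbc.so.2\n")
print(parse_soname(sample))  # libodbc.so.2
```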

Monday, March 5, 2018

pandas read_csv from https with Python 3.6.4

On Mac OSX, if you are using Python 3.6 and pandas to try to read a csv file via https:

import pandas as pd

california_housing_dataframe = pd.read_csv("https://storage.googleapis.com/mledu-datasets/california_housing_train.csv", sep=",")
california_housing_dataframe.describe()

you may get an error like:
urllib.error.URLError:

To fix this issue:

Open a terminal and take a look at:

/Applications/Python 3.6/Install Certificates.command

Python 3.6 on macOS ships with its own copy of OpenSSL, which does not use the system certificate store.

(To be explicit: MacOS users can probably resolve by opening Finder and double clicking Install Certificates.command)

Or read the CSV over HTTPS with a workaround:

from io import StringIO

import pandas as pd
import requests

url = "https://storage.googleapis.com/mledu-datasets/california_housing_train.csv"
s = requests.get(url).text
c = pd.read_csv(StringIO(s))
print(c.head())
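
To see where a given Python build looks for CA certificates, the standard ssl module exposes its default search paths. A minimal sketch; the values differ between platforms and installations, so no output is shown:

```python
import ssl

# Default CA locations of the OpenSSL this interpreter was built against.
paths = ssl.get_default_verify_paths()

print("cafile:", paths.cafile)                  # None when that file does not exist
print("capath:", paths.capath)                  # None when that directory does not exist
print("openssl_cafile:", paths.openssl_cafile)  # compiled-in default file
print("openssl_capath:", paths.openssl_capath)  # compiled-in default directory
```

If cafile and capath both print None, the interpreter may have no certificates to verify against, which would be consistent with the error above.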

Sunday, March 4, 2018

numpy.array vs numpy.asarray

Looking at the definition, you'll see the difference between them:

def asarray(a, dtype=None, order=None):
    return array(a, dtype, copy=False, order=order)

The main difference is that array (by default) will make a copy of the object, while asarray will not unless necessary.

The difference can be demonstrated by this example:

Generate a matrix:

>>> import numpy
>>> A = numpy.matrix(numpy.ones((3,3)))
>>> A
matrix([[ 1.,  1.,  1.],
        [ 1.,  1.,  1.],
        [ 1.,  1.,  1.]])

Use numpy.array to modify A. This doesn't work, because you are modifying a copy:

>>> numpy.array(A)[2]=2
>>> A
matrix([[ 1.,  1.,  1.],
        [ 1.,  1.,  1.],
        [ 1.,  1.,  1.]])

Use numpy.asarray to modify A. This works, because you are modifying A itself:

>>> numpy.asarray(A)[2]=2
>>> A
matrix([[ 1.,  1.,  1.],
        [ 1.,  1.,  1.],
        [ 2.,  2.,  2.]])
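
The same copy-versus-no-copy behaviour can be checked on a plain ndarray with np.shares_memory. A minimal sketch, assuming NumPy is installed; the variable names are made up for illustration:

```python
import numpy as np

a = np.ones((3, 3))

copied = np.array(a)    # copies by default
same = np.asarray(a)    # a is already an ndarray, so it is returned as-is

print(same is a)                       # True
print(np.shares_memory(a, copied))     # False
print(np.shares_memory(a, same))       # True

# asarray still has to copy when a conversion is needed, e.g. a new dtype.
converted = np.asarray(a, dtype=np.int64)
print(np.shares_memory(a, converted))  # False
```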

Saturday, March 3, 2018

Install Emacs on Mac OSX with brew

$ brew install --cask emacs
Reference: https://www.gnu.org/software/emacs/download.html#macos

If you run into this error while accessing the Desktop/Documents/Downloads directories:

Here is the fix:

Tuesday, February 27, 2018

Python None vs Empty list

>>> a=[]
>>> b=None
>>> type(a),type(b)
(<class 'list'>, <class 'NoneType'>)
>>> not a, not b
(True, True)
>>> a is None, b is None
(False, True)
>>> a is not None, b is not None
(True, False)
>>> a,b
([], None)
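
Where the distinction matters in practice: an optional list argument. A minimal sketch; the function name `add_tag` is made up for illustration:

```python
def add_tag(tag, tags=None):
    """Append tag to tags, creating a new list only when none was passed."""
    # `if not tags` would also replace an empty list supplied by the caller,
    # so test for None explicitly.
    if tags is None:
        tags = []
    tags.append(tag)
    return tags


mine = []
add_tag("x", mine)
print(mine)          # ['x']  (the caller's empty list was used)
print(add_tag("y"))  # ['y']  (a fresh list was created)
```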

Tuesday, February 20, 2018

best way to check if a list is empty in Python3

Do it with:

if not a:
    print("a is an empty list.")

instead of:

if len(a) == 0:
    print("a is an empty list.")

Reference: Official Python programming recommendations
See the discussions on Stack Overflow
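
The `not a` test relies on empty containers being falsy, so it works for other built-in containers too. A small sketch; the helper name `is_empty` is made up for illustration:

```python
def is_empty(container):
    """True for an empty list, tuple, dict, set or string."""
    return not container


# Emptiness depends on the container, not on what it holds:
# [0] has one element, so it is not empty even though 0 is falsy.
for value in ([], (), {}, set(), "", [0], "x"):
    print(repr(value), is_empty(value))
```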

/ vs // in python3

Code speaks:

>>> a=b=5
>>> a,b
(5, 5)
>>> type(a), type(b)
(<class 'int'>, <class 'int'>)
>>> a/=2
>>> b//=2
>>> a,b
(2.5, 2)
>>> type(a), type(b)
(<class 'float'>, <class 'int'>)
>>> alist = list(range(10))
>>> alist
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> alist[:a]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: slice indices must be integers or None or have an __index__ method
>>> alist[:b]
[0, 1]

Note: if you calculate indices with '/', you will run into trouble, as shown above.
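
A few more cases worth knowing, as a short sketch:

```python
items = list(range(10))

mid = len(items) // 2   # an int, safe to use as an index
print(items[:mid])      # [0, 1, 2, 3, 4]

# // floors toward negative infinity, while int() truncates toward zero.
print(-7 // 2)          # -4
print(int(-7 / 2))      # -3

# divmod returns the floor quotient and the remainder together.
print(divmod(7, 2))     # (3, 1)
```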