Python

Python is an interpreted language that looks a lot like pseudocode.

Syntax

The formal definition via BNF grammar is the following (from the official documentation):

compound_stmt ::=  if_stmt
                   | while_stmt
                   | for_stmt
                   | try_stmt
                   | with_stmt
                   | match_stmt
                   | funcdef
                   | classdef
                   | async_with_stmt
                   | async_for_stmt
                   | async_funcdef
suite         ::=  stmt_list NEWLINE | NEWLINE INDENT statement+ DEDENT
statement     ::=  stmt_list NEWLINE | compound_stmt
stmt_list     ::=  simple_stmt (";" simple_stmt)* [";"]

In practice this means that a compound statement is composed of a header, starting with a specific keyword and ending with a colon, followed either by a line of statements separated by semicolons or by a list of statements indented with respect to the corresponding header. These last two cases form a suite.
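
A minimal sketch of the two forms of suite described above; the variable names are purely illustrative.

```python
x = 3

# Suite on the same line as the header: a stmt_list separated by semicolons.
if x > 0: y = 1; z = 2

# Suite as an indented block: NEWLINE INDENT statement+ DEDENT.
while x > 0:
    x -= 1
    y += 1
```

The one-line form is legal but rarely used outside of quick experiments; the indented form is what PEP 8 recommends.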

Data Types

The standard type hierarchy

Name Description
None type with a single value, None, used to signify the absence of a value
numbers.Number representation of numerical entities

https://docs.python.org/3/library/stdtypes.html

Container here means that the type is not limited to a single data type but can hold items of mixed types. The opposite of container is flat.

hashable: object that has a hash value which never changes during its lifetime
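
A small sketch of what hashable means in practice, using only built-in types (the variable names are illustrative):

```python
# Immutable built-ins are hashable and can be used as dict keys or set members.
point = (1, 2)
lookup = {point: "tuple key"}

# Mutable built-ins such as list are not hashable: hash() raises TypeError.
try:
    hash([1, 2])
    list_is_hashable = True
except TypeError:
    list_is_hashable = False
```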

Name Description Mutable Container
int, float and complex numerical types (only int has unlimited precision)
range sequence of numbers
tuple built-in sequence
list built-in sequence
str text sequence
bytes bytes sequence
bytearray bytes sequence
set unordered collection of hashable objects
frozenset unordered collection of hashable objects that is hashable
dict mapping from hashable objects to an arbitrary object
memoryview view to internal data of objects

From a practical point of view, see https://wiki.python.org/moin/TimeComplexity for the time complexity associated with the data types.

Sequences/containers can support iteration: in practice you tell the external world that your object is iterable via the __iter__() method, which returns the actual iterator.

The iterator must implement the __next__() method, which returns the next element in the sequence. When the sequence has no more elements, this method must raise StopIteration.
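
The protocol can be sketched with two small classes. Countdown and CountdownIterator are hypothetical names chosen for this example; they are not part of the standard library.

```python
class Countdown:
    """Iterable that counts from `start` down to 1 (illustrative name)."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # Return a fresh iterator each time, so the object can be iterated twice.
        return CountdownIterator(self.start)


class CountdownIterator:
    def __init__(self, current):
        self.current = current

    def __iter__(self):
        # Iterators are themselves iterable and return self.
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


result = list(Countdown(3))
```

Keeping the iterable and the iterator separate is a design choice: it lets the same Countdown object be looped over more than once.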

Related to this is the generator type: roughly speaking, a function that uses the yield keyword to build an iterator. Keep in mind that a generator has methods beyond the ones of the iterator protocol, like send(), throw() and close().

For example, directly from the documentation

>>> def echo(value=None):
...     print("Execution starts when 'next()' is called for the first time.")
...     try:
...         while True:
...             try:
...                 value = (yield value)
...             except Exception as e:
...                 value = e
...     finally:
...         print("Don't forget to clean up when 'close()' is called.")
...
>>> generator = echo(1)
>>> print(next(generator))
Execution starts when 'next()' is called for the first time.
1
>>> print(next(generator))
None
>>> print(generator.send(2))
2
>>> generator.throw(TypeError("spam"))
TypeError('spam')
>>> generator.close()
Don't forget to clean up when 'close()' is called.

There is also the generator expression:

>>> sum(i*i for i in range(10))         # sum of squares 0, 1, 4, ... 81
285

Dictionary

One thing to keep in mind when interacting with dictionaries is that the objects returned by dict.keys(), dict.values() and dict.items() are view objects. They provide a dynamic view on the dictionary’s entries, which means that when the dictionary changes, the view reflects these changes.
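
A short sketch of this behaviour; the dictionary content is made up for the example.

```python
d = {"a": 1, "b": 2}
keys = d.keys()      # view object, not a copy
items = d.items()

d["c"] = 3           # mutate the dictionary after taking the views

keys_after = list(keys)
items_after = list(items)
snapshot = list(d)   # list() makes an independent copy of the keys
del d["a"]
```

After the deletion the view no longer contains "a", while the snapshot still does; take a copy with list() when you need to modify the dictionary while looping over its keys.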

Objects

Classes are not "subclasses" of type but instances of it
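
This can be checked directly in the interpreter; Foo is just a placeholder class for the example.

```python
class Foo:
    pass

is_instance = isinstance(Foo, type)    # True: the class object is an instance of type
is_subclass = issubclass(Foo, type)    # False: Foo inherits from object, not from type
metaclass = type(Foo)                  # type itself
```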

Name Description
__new__()
__init__()
__del__()
__hash__()
__str__()
__repr__()
__bytes__() Called by bytes to compute a byte-string representation of an object. This should return a bytes object.
__format__() Called by the format() built-in function, and by extension, evaluation of formatted string literals and the str.format() method, to produce a “formatted” string representation of an object.
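__weakref__ Attribute (not a method) needed to support weak references to instances
__slots__ Class variable (not a method) declaring the allowed instance attributes, so that no __dict__ is created per instance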
__copy__() Used to define the implementation of a copy used by the copy module

Keep in mind that there are two conventions for internal attributes on an object:

  • if the name starts with _ it is considered "internal"
  • if the name starts with __ (and does not end with __) it is considered "private", and the interpreter also mangles the name so that __<name> becomes _<class name>__<name>
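
A minimal sketch of the mangling rule; Account and its attributes are hypothetical names used only for illustration.

```python
class Account:
    def __init__(self):
        self._internal = "by convention only"   # single underscore: no mangling
        self.__private = "mangled"              # stored as _Account__private


acct = Account()
names = vars(acct)   # the instance dictionary shows the names actually stored
```

The mangled attribute is still reachable as acct._Account__private, so this is a way to avoid accidental name clashes in subclasses rather than real access control.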

Structural pattern matching

Introduced in Python 3.10 via PEP 634 (specification), PEP 635 (motivation and rationale) and PEP 636 (tutorial)

match <expression>:
    case <pattern> [guard]:
        <block>

where <expression> is any Python expression whose value is tested against the <pattern>; if a guard is present, the guard expression must also be true for the case to be selected.

The simplest match is "literal" matching, where you try to match a constant value; a more complex kind of pattern is one that causes name bindings. When you use an identifier as a pattern, on a successful match the value is bound to that name, which can then be used in the subsequent block.

Note: if you want to compare against a value coming from an attribute, use a qualified (dotted) name to avoid the name binding; an unqualified name, i.e. a name without dots, is always treated as a capture.

Note: there is a difference between (<pattern>) and [<pattern>] or (<pattern>,). The first is a group pattern, the other two are sequence patterns.

Here some practical examples

    match op.opcode, *args:
        case Operatore.Store, Constant() as offset, Click(offset=(x, y)):
            ...

Operatore.Store is a value pattern: being a qualified name, it is compared with the subject instead of being bound. Constant() as offset matches an instance of the type Constant and binds it to the name offset. The last one looks for an element of type Click that has a tuple of two elements associated with the attribute offset and binds these two elements to the names x and y.

Coroutines

Introduced with PEP 492; the syntax, as given in the official documentation, is

async_funcdef ::=  [decorators] "async" "def" funcname "(" [parameter_list] ")"
                   ["->" expression] ":" suite
async_for_stmt ::=  "async" for_stmt
async_with_stmt ::=  "async" with_stmt

In the following code

async def read_data(db):
    data = await db.fetch('SELECT ...')

await suspends the execution of the coroutine, like yield from does in a generator; it accepts only an "awaitable" and raises a TypeError otherwise.
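
A runnable sketch of both behaviours using only asyncio from the standard library; fetch and main are illustrative names and stand in for a real database call.

```python
import asyncio


async def fetch(value):
    # asyncio.sleep() returns an awaitable; awaiting it suspends this coroutine.
    await asyncio.sleep(0)
    return value * 2


async def main():
    result = await fetch(21)
    try:
        await 42            # an int is not awaitable
    except TypeError:
        rejected = True
    else:
        rejected = False
    return result, rejected


outcome = asyncio.run(main())
```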

Packaging

  • python-packaging.readthedocs.io
  • http://www.scotttorborg.com/python-packaging/index.html
  • http://nvie.com/posts/pin-your-packages/
  • http://tech.marksblogg.com/better-python-package-management.html
  • Value error Attempted relative import in non-package

Typing

For Python 3.7+, you can indicate that a method returns an instance of the enclosing class

from __future__ import annotations

class Position:
    def __add__(self, other: Position) -> Position:
        ...

Internals

Metaclasses and introspection

TESTS

pytest

import sys

import pytest


# run the same test once per value of count
# (XBinarySearchTree is the class under test, defined elsewhere)
@pytest.mark.parametrize('count', [
    0, 1, 6, 17,
])
def test_tree42(count):
    values = list(range(count))

    bt = XBinarySearchTree.from_array(values)

    assert list(bt.inorder_traversal()) == values


# capsys captures what the test writes to stdout and stderr
def test_myoutput(capsys):  # or use "capfd" for fd-level
    print("hello")
    sys.stderr.write("world\n")
    captured = capsys.readouterr()
    assert captured.out == "hello\n"
    assert captured.err == "world\n"
    print("next")
    captured = capsys.readouterr()
    assert captured.out == "next\n"


# skipped tests are reported but not executed
@pytest.mark.skip(reason="no way of currently testing this")
def test_the_unknown():
    ...

BEST PRACTICES

  • PEP8: Style Guide for Python Code
  • Design pattern in python
  • dict() vs {} (hint: {} is better)
  • http://excess.org/article/2011/12/unfortunate-python/
  • http://www.canonical.org/~kragen/isinstance/
  • http://www.artima.com/weblogs/viewpost.jsp?thread=236278
  • http://satyajit.ranjeev.in/2012/05/17/python-a-few-things-to-remember.html
  • http://net.tutsplus.com/tutorials/python-tutorials/behavior-driven-development-in-python/
  • Things you didn't know about Python: interesting presentation about Python internal and stuff.
  • Copying list, the right way
  • Make one archive python executable
  • HOWTO Create Python GUIs using HTML
  • Slides about functional versus imperative programming
  • MRO: from official documentation and a post about multiple inheritance (look at also the comments)
  • http://ozkatz.github.com/improving-your-python-productivity.html
  • http://ozkatz.github.com/better-python-apis.html
  • Lazy evaluation
  • https://speakerdeck.com/rwarren/a-brief-intro-to-profiling-in-python
  • http://pyvideo.org/video/1674/getting-started-with-automated-testing
  • http://hynek.me/talks/python-deployments/
  • http://pyrandom.blogspot.nl/2013/04/super-wrong.html
  • Python’s super() considered super!
  • http://www.huyng.com/posts/python-performance-analysis/
  • https://tommikaikkonen.github.io/timezones/
  • format()
  • pyformat.info/

Multithreading&Multiprocessing

import subprocess

with subprocess.Popen(['echo', 'Hello from the child!'], stdout=subprocess.PIPE) as proc:
    # communicate() reads the output until EOF and waits for the child to exit
    out, err = proc.communicate()
    print(out.decode('utf-8'))

import threading

def job():
    print("whatever")

thread = threading.Thread(target=job)
thread.start()
thread.join()  # wait for the thread to finish

Exceptions

From the official documentation

try_stmt  ::=  try1_stmt | try2_stmt | try3_stmt
try1_stmt ::=  "try" ":" suite
               ("except" [expression ["as" identifier]] ":" suite)+
               ["else" ":" suite]
               ["finally" ":" suite]
try2_stmt ::=  "try" ":" suite
               ("except" "*" expression ["as" identifier] ":" suite)+
               ["else" ":" suite]
               ["finally" ":" suite]
try3_stmt ::=  "try" ":" suite
               "finally" ":" suite

The optional else clause is executed if the control flow leaves the try suite, no exception was raised, and no return, continue, or break statement was executed. Exceptions in the else clause are not handled by the preceding except clauses.

The finally clause is always executed, even when the try suite executes a return, break or continue; since the last return executed is the one that counts in a function, a return in the finally clause supersedes any previously encountered one.
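
Both rules can be observed with a small sketch; the function names are made up for the example.

```python
def finally_wins():
    try:
        return "from try"
    finally:
        # This return replaces the value returned by the try suite.
        return "from finally"


def with_else(fail):
    steps = []
    try:
        if fail:
            raise ValueError("boom")
        steps.append("try")
    except ValueError:
        steps.append("except")
    else:
        # Runs only when the try suite finished without raising.
        steps.append("else")
    finally:
        steps.append("finally")
    return steps
```

Returning from a finally clause also silently discards any exception in flight, which is why it is generally discouraged.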

LIBRARIES

  • https://github.com/kennethreitz/envoy
  • https://github.com/kennethreitz/requests
  • http://www.nicosphere.net/clint-command-line-library-for-python/
  • Docopts command line arguments parser for Human Beings.
  • Get started with the Natural Language Toolkit
  • pdb++ a drop-in replacement for pdb (the Python debugger)
  • napari/napari a fast, interactive, multi-dimensional image viewer for python
  • pydantic Data validation and settings management using python type annotations.
  • pySDR

Scientific

Numpy

Matplotlib

Scipy

Pandas

Interesting Stuff

  • https://jordan-wright.github.io/blog/2014/10/06/creating-tor-hidden-services-with-python/

SANDBOX

  • http://wiki.python.org/moin/Asking%20for%20Help/How%20can%20I%20run%20an%20untrusted%20Python%20script%20safely%20%28i.e.%20Sandbox%29
  • Example of pypy-c-sandbox for launching random scripts
  • http://stackoverflow.com/questions/6655258/using-the-socket-module-in-sandboxed-pypy
  • http://pypy.readthedocs.org/en/latest/sandbox.html
  • http://blog.delroth.net/2013/03/escaping-a-python-sandbox-ndh-2013-quals-writeup/
  • Python "sandbox" escape

Instructions for pypy-2.1

$ cd pypy/goal
$ python ../../rpython/bin/rpython  -O2 --sandbox targetpypystandalone.py
$ PYTHONPATH=$PYTHONPATH:$PWD/../../ ../../pypy/sandbox/pypy_interact.py pypy-c

DEBUG&Profiling

  • Performance analysis
  • CProfile
  • https://stripe.com/blog/exploring-python-using-gdb
  • scalene is a high-performance CPU and memory profiler for Python that does a number of things that other Python profilers do not and cannot do

IDE

  • http://blog.dispatched.ch/2009/05/24/vim-as-python-ide/

$ python -m shlex
kdkd
Token: 'kdkd'
34 5455
Token: '34'
Token: '5455'
$edx=34
Token: '$'
Token: 'edx'
Token: '='
Token: '34'

Time

  • http://www.saltycrane.com/blog/2008/11/python-datetime-time-conversions/
  • http://stackoverflow.com/questions/2775864/python-datetime-to-unix-timestamp

COOKBOOK

>>> a = [1,4,-1,0,13]
>>> a.sort()
>>> a
[-1, 0, 1, 4, 13]

>>> import operator
>>> x = {1: 2, 3: 4, 4:3, 2:1, 0:0}
>>> sorted_x = sorted(x.items(), key=operator.itemgetter(1))  # sort the items by value
>>> sorted_x
[(0, 0), (2, 1), (1, 2), (4, 3), (3, 4)]

Two's complement

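>>> import ctypes
>>> value = 0xb59395a9
>>> f"{ctypes.c_uint32(value).value:032b}"
'10110101100100111001010110101001'
>>> f"{ctypes.c_uint32(~value).value:032b}"  # bitwise NOT (one's complement)
'01001010011011000110101001010110'
>>> f"{ctypes.c_uint32(-value).value:032b}"  # two's complement, i.e. ~value + 1
'01001010011011000110101001010111'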

Getopt

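import getopt, sys

def usage():
    # placeholder, not in the original snippet: print your help text here
    print("usage: ...")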

def main():
    try:
        opts, args = getopt.getopt(sys.argv[1:], "ho:v", ["help", "output="])
    except getopt.GetoptError as err:
        # print help information and exit:
        print(err) # will print something like "option -a not recognized"
        usage()
        sys.exit(2)
    output = None
    verbose = False
    for o, a in opts:
        if o == "-v":
            verbose = True
        elif o in ("-h", "--help"):
            usage()
            sys.exit()
        elif o in ("-o", "--output"):
            output = a
        else:
            assert False, "unhandled option"
    # ...

if __name__ == "__main__":
    main()

argparse

import argparse
import functools


def argparse_vendor_product(value):
    # parse "vendor:product" given as two hexadecimal numbers
    vendor, product = value.split(":")

    return int(vendor, 16), int(product, 16)


def parse_args():
    args = argparse.ArgumentParser(description='upload and run some code')

    args.add_argument(
        '--device',
        type=argparse_vendor_product,
        required=True,
        help="vendor:product of the device you want to interact with")
    args.add_argument('--binary', required=True)
    args.add_argument('--address', type=functools.partial(int, base=0))

    return args.parse_args()

PySerial

import serial
ser = serial.Serial('/dev/ttyUSB0')  # open serial port
print(ser.name)         # check which port was really used
ser.write(b'hello')     # write a string
ser.close()

Decorator

def trace(f):
    def _inner(*args, **kwargs):
        print(' # ', f.__name__)
        return f(*args, **kwargs)
    return _inner

def challenge(count):
    def _challenge(x):
        def _inner(*args, **kwargs):
            print('[+] challenge %d' % count)
            return x(*args, **kwargs)
        return _inner
    return _challenge
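
As a usage sketch, a parametrized decorator like challenge is applied by calling it first. The version below repeats the definition so that it runs on its own, and adds functools.wraps, which is an addition of this sketch and not part of the snippet above; solve is a hypothetical function.

```python
import functools


def challenge(count):
    """Decorator factory: `count` is captured by the closure."""
    def _challenge(func):
        @functools.wraps(func)   # keep the wrapped function's __name__ and docstring
        def _inner(*args, **kwargs):
            print('[+] challenge %d' % count)
            return func(*args, **kwargs)
        return _inner
    return _challenge


@challenge(3)
def solve(a, b):
    """Add two numbers (illustrative payload)."""
    return a + b


result = solve(1, 2)
```

Without functools.wraps, solve.__name__ would be reported as _inner, which makes tracebacks and introspection harder to read.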

DOCTESTS

def XOR(a, b):
    # assumed helper, not shown in the original notes: element-wise XOR of two bit lists
    return [x ^ y for x, y in zip(a, b)]

def decript(cipher, key):
    """
    >>> a = [0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1]
    >>> b = [1, 1, 1, 1, 1, 1, 1]
    >>> decript(a, b) #doctest: +NORMALIZE_WHITESPACE
    [[1, 0, 1, 0, 1, 0, 1],
    [0, 0, 0, 0, 0, 0, 0]]
    """
    r = []
    for i in range(0, len(cipher) - len(key) + 1, 7):
        r.append(XOR(cipher[i:i + len(key)], key))

    return r

$ python -m doctest c1.py

SPHINX

It's possible to write the documentation along with the code.

http://sphinx.pocoo.org/markup/toctree.html#toctree-directive

  • https://wiki.python.org/moin/TimeComplexity

Maximum float

Source:

>>> infinity = float("inf")
>>> infinity
inf
>>> infinity / 10000
inf

Print out some docstring for documentation purpose

python -c 'from macro import matrixify;print(matrixify.__doc__.replace("\n    ", "\n"))' | rst2html

Logging

  • Documentation
  • http://victorlin.me/2012/08/good-logging-practice-in-python/
  • http://hynek.me/articles/taking-some-pain-out-of-python-logging/
  • Multi line formatting
  • http://victorlin.me/posts/2012/08/26/good-logging-practice-in-python

Remember that logging.basicConfig() attaches a stream handler to the root logger by default; if you want to fine-tune the logging you have to configure the handlers yourself.

import logging
import os

logging.basicConfig()
logger = logging.getLogger(__name__)
logger.setLevel(os.getenv("LOG") or "INFO")

It's possible to define a custom level like SUBDEBUG (http://stackoverflow.com/a/16955098/1935366)

import logging

SUBDEBUG = 5
logging.addLevelName(SUBDEBUG, 'SUBDEBUG')

def subdebug(self, message, *args, **kws):
    self.log(SUBDEBUG, message, *args, **kws) 
logging.Logger.subdebug = subdebug

logging.basicConfig()
l = logging.getLogger()
l.setLevel(SUBDEBUG)
l.subdebug('test')
l.setLevel(logging.DEBUG)
l.subdebug('test')

stream = logging.StreamHandler()
formatter = logging.Formatter('%(levelname)s - %(filename)s:%(lineno)d - %(message)s')

logger = logging.getLogger(__file__)
logger.setLevel(logging.DEBUG)
logger.addHandler(stream)
stream.setFormatter(formatter)

If you don't want the formatting of your log strings to impact performance when the level is disabled, you should let the logger itself do the formatting: the various logging functions accept a format string in the % style followed by positional arguments, like

logger.debug("this is a string: '%s'", string_to_log)
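
The difference can be made visible with a sketch like the following, where Expensive is a hypothetical class that records whether it was ever converted to a string, and "lazy_demo" is an arbitrary logger name.

```python
import logging


class Expensive:
    """Illustrative object that records whether it was ever formatted."""
    formatted = False

    def __str__(self):
        Expensive.formatted = True
        return "expensive"


logger = logging.getLogger("lazy_demo")
logger.addHandler(logging.NullHandler())
logger.setLevel(logging.INFO)

# DEBUG is disabled, so the logger never calls str() on the argument.
logger.debug("value: '%s'", Expensive())
skipped = not Expensive.formatted

# With an f-string the formatting happens before the logger is even called.
logger.debug(f"value: '{Expensive()}'")
eager = Expensive.formatted
```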

Flatten list

>>> import itertools
>>> chain = itertools.chain.from_iterable([[1,2],[3],[5,89],[],[6]])
>>> print(list(chain))
[1, 2, 3, 5, 89, 6]

Instead of an explicit loop like

for x in s:
    if x:
        return True
return False

use the built-in any()

return any(s)

Traceback

    import sys
    import traceback

    try:
        _manage_object(pk, *args, **kwargs)
    except Exception:
        obj = Object.objects.get(pk=pk)
        # get the exception context to reuse later
        exc_info = sys.exc_info()
        # print_tb() writes to stderr by itself and returns None
        traceback.print_tb(exc_info[2])

Read/write UTF8 files

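In Python 2 the builtin open() had no encoding parameter, hence the use of codecs.open(); in Python 3 the builtin open() accepts an encoding argument directly, so codecs is no longer needed.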

import codecs

def create_post(filepath, content):
    with codecs.open(filepath, 'w+', encoding='utf-8') as f:
        f.write(content)
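
On Python 3 the same function can be sketched with the builtin open() and its encoding argument. read_post and the temporary path are additions made for this example only.

```python
import os
import tempfile


def create_post(filepath, content):
    # The built-in open() takes the encoding directly in Python 3.
    with open(filepath, 'w', encoding='utf-8') as f:
        f.write(content)


def read_post(filepath):
    with open(filepath, encoding='utf-8') as f:
        return f.read()


path = os.path.join(tempfile.mkdtemp(), 'post.txt')
create_post(path, 'caffè ☕')
round_trip = read_post(path)
```

Passing the encoding explicitly avoids depending on the platform default, which is not UTF-8 everywhere.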

Get first item of a nested list

>>> from operator import itemgetter
>>> rows = [(1, 2), (3, 4), (5, 6)]
>>> list(map(itemgetter(0), rows))  # first item of each row
[1, 3, 5]
>>> list(map(itemgetter(1), rows))  # second item of each row
[2, 4, 6]

Extract URL from string

import re

myString = "This is my tweet check it out http://tinyurl.com/blah"

print(re.search(r"(?P<url>https?://[^\s]+)", myString).group("url"))

Routing from REGEXs

In [1]: import re

In [2]: c = re.compile(r'^w::(?P<type>\w+)::(?P<id>\d*)::')

In [3]: s = 'w::w::1::'

In [5]: m = c.match(s)

In [6]: m.groupdict()
Out[6]: {'id': '1', 'type': 'w'}

Add file into a tarfile from a string

import io
import tarfile

def elaborate_archive(filepath, **kwargs):
    tar_src = tarfile.open(filepath, mode='a')

    # tarfile needs a binary file object, so encode the string first
    data = kwargs['version'].encode('utf-8')
    version_file = io.BytesIO(data)

    version_tarinfo = tarfile.TarInfo('VERSION')
    version_tarinfo.size = len(data)
    tar_src.addfile(version_tarinfo, version_file)
    tar_src.close()

pandas

$ pip install pandas

import pandas as pd

You can read data from a CSV

df = pd.read_csv("/path/to/data")

or create one manually

df = pd.DataFrame({
    "column 1": [data1, data2, ..., dataN],
    "column 2": [...],
    ...
})

To have general information about the DataFrame

df.info()

A nice feature is the filtering

df[(df.duration >= 200) & (df.genre == "Drama")]

It's possible to plot directly

df.plot(x='GE', y=['TOTALE_19', 'TOTALE_20'], figsize=(20, 10))