Cython and PBS integration

Overview

A hook in PBS is a block of Python code that PBS executes at certain events (and sometimes periodically). To support hook execution, PBS provides a hooks interface, which is also implemented using Python.
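As a rough illustration of what such a hook looks like, the sketch below rejects a job that does not request a walltime. The calls pbs.event(), event.job, accept() and reject() follow the PBS hook API; the Fake* classes and the require_walltime() wrapper are hypothetical stand-ins added only so the logic can run outside PBS. Inside PBS the body would sit at the top level of the hook script after "import pbs", and accept()/reject() would end the hook rather than return.

```python
class FakeJob:
    """Hypothetical stand-in for the job object PBS attaches to an event."""
    def __init__(self, walltime=None):
        self.Resource_List = {"walltime": walltime}


class FakeEvent:
    """Hypothetical stand-in for the object returned by pbs.event()."""
    def __init__(self, job):
        self.job = job
        self.outcome = None

    def accept(self):
        self.outcome = ("accept", None)

    def reject(self, message):
        self.outcome = ("reject", message)


class FakePbs:
    """Hypothetical stand-in for the pbs module; exposes only event()."""
    def __init__(self, job):
        self._event = FakeEvent(job)

    def event(self):
        return self._event


def require_walltime(pbs):
    """Hook body: accept the job only if it requests a walltime."""
    e = pbs.event()
    if e.job.Resource_List["walltime"] is None:
        e.reject("a walltime request is required")
    else:
        e.accept()
    return e  # returned only so the stand-in result can be inspected


if __name__ == "__main__":
    print(require_walltime(FakePbs(FakeJob())).outcome)
    print(require_walltime(FakePbs(FakeJob("01:00:00"))).outcome)
```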

Cython is a programming language that is designed to be a superset of Python.

This document explains how the PBS hooks interface is designed and how Cython can help (or not) in making the hooks infrastructure simpler and faster.

Introduction to Cython

  1. The term Cython can mean one of two things, depending on the context in which it is used:

    1. A compiled programming language, designed to be a superset of the Python programming language.

    2. An optimising static compiler for both Python and the extended Cython language.

  2. What is Cython used for

    1. Typically, Cython is used for writing CPython extension modules that are compiled to C, to speed up the execution of Python code.

    2. Wrapping of external C libraries

    3. Embedding CPython into existing applications.
      Cython lets Python code call back and forth to and from C natively and interact efficiently with
      large data sets.

  3. Design

    1. The Cython language is close to Python; it is a superset that allows static typing of variables and class attributes. In
      other words, the programmer can declare C types on variables and class attributes. This minimizes the
      interaction with the Python runtime and allows the Cython compiler to generate optimized C code from Cython/Python
      code. Generating a standard Python module is a 2-step process:

      1. Cython/Python code is translated into C.

      2. The translated C code is then compiled through a C compiler into a shared library (.so on Linux, .pyd on
        Windows), which can then be loaded and used by regular Python code using the import statement.
        Optionally, one can use the embedded mode, which adds a main() function to the
        generated C code. The added main() function has the boilerplate code to load the module. This C code can then be
        used to generate an executable instead of a shared library.

  4. How fast is Cython

    1. Depending on the type of Python code (computational vs I/O), Cython can speed up execution by anywhere from 1.15x
      (15% improvement) to ~30x (study here). The speedup is largest for operations that do not involve Python’s internals.

  5. Limitations

    1. If the Cython compiler comes across Python code that cannot be translated completely to C code, it transforms that code
      into a series of C calls to Python internals. This takes the Python interpreter out of the execution loop, but provides only
      a modest improvement in performance.

    2. Being a superset of Python, Cython supports native Python data structures. For these, Cython simply calls the C APIs
      in the Python runtime that operate on those objects. This provides only a limited speedup.

  6. References:
    a) https://cython.org/
    b) https://en.wikipedia.org/wiki/Cython
    c) https://github.com/cython/cython
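The 2-step build described under Design can be sketched as a pair of command lines. The snippet below only assembles the commands, it does not run them. It assumes the cython command-line tool and gcc are on the PATH and uses "fill" as a placeholder module name. The exact compiler flags vary by platform, so treat this as an outline rather than a build recipe.

```python
import sysconfig


def build_commands(module_name):
    """Return the two command lines (as argument lists) for the 2-step build."""
    pyx_file = f"{module_name}.pyx"
    c_file = f"{module_name}.c"

    # Step 1: translate the Cython/Python source into C.
    translate = ["cython", "-3", pyx_file, "-o", c_file]

    # Step 2: compile the generated C into an importable shared library.
    include_dir = sysconfig.get_paths()["include"]       # location of Python.h
    suffix = sysconfig.get_config_var("EXT_SUFFIX") or ".so"
    compile_cmd = [
        "gcc", "-shared", "-fPIC",
        f"-I{include_dir}",
        c_file,
        "-o", module_name + suffix,
    ]
    return [translate, compile_cmd]


if __name__ == "__main__":
    for command in build_commands("fill"):
        print(" ".join(command))
```

Once the second command has produced the shared library, regular Python code can load it with a plain import statement.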

PBS Hooks Interface

The PBS hooks interface is divided into two parts, one implemented in C and the other in Python. The diagram below explains the interaction between these two parts.
In the diagram, green boxes correspond to Python modules implemented in C, and yellow boxes correspond to Python modules written in Python itself.

  1. libpython library

    1. Handles most of the interaction between a hook and the embedded interpreter.

    2. Starts and stops the Python interpreter.

    3. Loads the _pbs_ifl and _pbs_v1 modules.

    4. Converts PBS types to Python types and vice versa.

  2. The pbs_ifl and _pbs_ifl modules provide wrappers for the PBS IFL API to the hook. The _pbs_ifl module is created when the embedded interpreter is started/restarted.

  3. PBS V1 package provides the below modules

    1. _attr_types.py – Python representation of all internal PBS attribute structures.

    2. _base_types.py – Mapping of all Python based types to PBS attribute types.

    3. _exc_types.py – Exceptions for PBS/Python interaction

    4. _export_types.py – Helps the embedded interpreter import the Python types.

    5. _svr_types.py – Python types for PBS server objects (server, queue, job, reservation, vnode, events etc.)

  4. _pbs_v1 module – Implements wrappers for the Hooks interface, part of libpython library. Created when the embedded interpreter is started/restarted.
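The type conversion that libpython performs in both directions can be pictured with the simplified sketch below. It assumes each attribute arrives as a (name, resource, value) tuple. The tuple layout and the function names populate_object() and flatten_object() are illustrative only and do not mirror the actual C structures or libpython functions.

```python
from types import SimpleNamespace


def populate_object(entries):
    """Build a Python object from (name, resource, value) attribute entries."""
    obj = SimpleNamespace()
    for name, resource, value in entries:
        if resource is None:
            # Plain attribute: becomes a simple Python attribute.
            setattr(obj, name, value)
        else:
            # Resource-style attribute: grouped in a dict keyed by resource.
            if not hasattr(obj, name):
                setattr(obj, name, {})
            getattr(obj, name)[resource] = value
    return obj


def flatten_object(obj):
    """Convert the Python object back into a list of attribute entries."""
    entries = []
    for name, value in vars(obj).items():
        if isinstance(value, dict):
            entries.extend((name, resource, v) for resource, v in value.items())
        else:
            entries.append((name, None, value))
    return entries


if __name__ == "__main__":
    attrs = [
        ("Job_Name", None, "sim01"),
        ("Resource_List", "walltime", "01:00:00"),
        ("Resource_List", "ncpus", "4"),
    ]
    job = populate_object(attrs)
    print(job.Job_Name, job.Resource_List)
    print(flatten_object(job) == attrs)
```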

How Cython can help PBS

  • Improving hook execution performance
    For this, we can convert the PBS V1 package Python code to Cython and ship the modules as shared objects. These would be native code and would run faster than interpreted Python code.

  • Shipping of PBS hooks as a shared library instead of a Python script
    Cython can also be used to write our PBS hooks and ship them as shared objects. Some boilerplate C code will be needed for this.

  • Doing away with type conversion code
    Because Cython can operate directly on C data, most of the conversion code can be removed. Where hooks need Python data structures, the Cython module can perform the type conversion.
    The below example program illustrates how Cython seamlessly works with C data types.

Explanation:

fill.pyx - A Cython program that operates on a C struct. It provides a "public" function fill_print()

str_def.h - declares the C struct

Cprog.c - C program that initialises the Python interpreter and loads the module "fill".

The diagram below explains the flow.

Using a similar method, we can remove or refactor a number of functions in libpython that convert between Python classes and C structs.

  • Implement pbs_ifl module in Cython
    Currently, we use swig to create pbs_ifl.py, which provides wrappers to the _pbs_ifl methods. Implementing the same in Cython would remove the build dependency on swig and could bring some performance improvement.
    Items 1, 3 and 4 would mean that we convert the yellow boxes in the Hooks Interface diagram from Python to Cython.
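To give a feel for the type-conversion item above, the following sketch fills a C-layout struct from Python-level code and then exposes it as a Python dict. It uses the standard ctypes module as a stand-in for Cython's native struct access, so it illustrates the idea rather than the Cython mechanism itself. The JobInfo layout and the fill() and as_dict() helpers are hypothetical; the real struct declared in str_def.h is not reproduced in this document.

```python
import ctypes


class JobInfo(ctypes.Structure):
    """Hypothetical C struct layout: an int followed by a 32-byte char array."""
    _fields_ = [
        ("job_id", ctypes.c_int),
        ("name", ctypes.c_char * 32),
    ]


def fill(info, job_id, name):
    """Write values straight into the struct's C fields."""
    info.job_id = job_id
    info.name = name.encode("utf-8")


def as_dict(info):
    """Expose the struct contents as an ordinary Python data structure."""
    return {"job_id": info.job_id, "name": info.name.decode("utf-8")}


if __name__ == "__main__":
    info = JobInfo()
    fill(info, 42, "example")
    print(as_dict(info))
```

In Cython the struct would be declared from the C header and its fields accessed directly, without the per-field descriptors that ctypes needs here.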

Running site hooks using Cython (Experimental)

  • Cython does not execute Python statements; it produces a C file as output.

  • To convert the .c file to either a .so or an executable, a C compiler must be present.

  • To import and execute site hooks:

    • A C compiler (gcc for Linux and Build Tools for Windows) would be needed on all hosts.

    • Cython would also need to be present on all hosts.

    • The user-provided Python file is converted to a .c file using cython.

    • The resulting .c file is compiled into a .so or .pyd using the C compiler.

    • The .so is loaded using dlopen().

    • The address of the PyInit_<hookname> function is obtained using dlsym().

    • The module is loaded by calling the PyInit_<hookname> function.
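The loading steps above are described at the C level. For experimentation, the same sequence can be approximated from Python with importlib, whose extension-module loader performs the equivalent of the dlopen() call and the PyInit_<hookname> lookup internally. The helper names init_symbol() and load_compiled_hook() are hypothetical, and the sketch assumes the hook has already been compiled to a shared library.

```python
import importlib.util


def init_symbol(hook_name):
    """Name of the init function the loader looks up in the shared library."""
    return f"PyInit_{hook_name}"


def load_compiled_hook(hook_name, path):
    """Load a compiled hook module from an explicit file path."""
    spec = importlib.util.spec_from_file_location(hook_name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"no loader available for {path!r}")
    # For extension modules, creating the module opens the shared library
    # and calls its PyInit_<hook_name> function.
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


if __name__ == "__main__":
    print(init_symbol("myhook"))
```

The module name passed to load_compiled_hook() has to match the name the hook was compiled under, because the init symbol is derived from it.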

Comparison

The below table summarises the differences between implementation of PBS Hooks interface using Python and Cython.

| Criteria | Python | Cython |
|---|---|---|
| Performance | As Python runs interpreted code, it is slower. | Cython helps in generating native code which is faster. |
| Data interchange | A lot of C/Python API calls are needed to exchange information. C code calls C/Python API to convert between C struct and Python classes. | Cython helps in generating native code which is faster. Cython can simply copy data to/from C struct and Python classes. |
| Code complexity | More code to maintain as C code needs to set up each and every class and the attributes of those classes. | Little or no conversion code needed, so code maintenance needed is less. |
| New hook events | For any new hook event a switch is added to pbs_python_event_set() and main() of pbs_python.c. Similar code to existing events is written for the new event. | Addition of new hook events will be comparatively simpler as C-to-Python conversion will be done without making explicit calls to C/Python API. |
| Out of process hooks | Intermediate format is used to pass information. This intermediate format is converted to PBS attribute list from which Python classes are created using C/Python API. | Intermediate format will still be used, however, Python classes can be created without using C/Python API. |
| Memory leaks | Valgrind reports memory leaks | Valgrind reports memory leaks |
| Process crash | An in-process hook can cause a process crash | An in-process hook can cause a process crash |

Detail on code that will be cleaned up

pbs_python_svr_internal.c

pbs_python_setup_python_resource_type()

pbs_python_setup_<class>_class_attributes(), where class = vnode, resv, server, job, queue

pbs_python_load_python_types()

pbs_python_setup_attr_get_value_type()

_pps_getset_descriptor_object()

pbs_python_setup_resc_get_value_type()

pbs_python_setup_types_table()

pbs_python_populate_attributes_to_python_class()

pbs_python_populate_python_class_from_svrattrl()

pbs_python_populate_svrattrl_from_python_class()

_pps_helper_get_<class>(), where class = que, server, job, resv, vnode

pbs_python_event_set(), _pbs_python_event_set()

Benefits

  • Faster hook execution

  • No change to the external interface

  • Less code to maintain

  • Scalable code

  • Addition of future events would be simpler

Shortcomings

  • Does not replace the Python interpreter

  • Running site hooks would need Cython and a C compiler to be present on all hosts

Proposal

  • Implement the pbs_ifl and _pbs_v1 modules and the PBS V1 package using Cython

  • Remove code that does the conversion between C and Python data types using C/Python API.

Further Work

  • Experiment further on running site hooks