Links
- Link to discussion on Developer Forum: <link to your project's discussion>
Overview
A hook in PBS is a block of Python code that PBS executes at certain events (and sometimes periodically). To support hook execution, PBS has a hooks interface, which is itself partly implemented in Python.
Cython is a programming language that is designed to be a superset of Python.
This document explains how the PBS hooks interface is designed and how Cython can help (or not) in making the hooks infrastructure simpler and faster.
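For readers new to hooks, the following is a minimal sketch of what a hook body can look like. It assumes a job-submission style event and a hypothetical site policy (reject jobs that request no walltime); the function name `check_walltime` is illustrative and not part of PBS. Inside PBS the script obtains the event from `pbs.event()`. The `pbs` module exists only in the interpreter that PBS embeds, so the import is guarded to keep the sketch loadable elsewhere.

```python
def check_walltime(event):
    """Hypothetical policy: reject the job if it requests no walltime."""
    job = event.job
    if job.Resource_List["walltime"] is None:
        event.reject("walltime must be specified")
        return
    event.accept()


try:
    import pbs  # available only inside the interpreter embedded in PBS
except ImportError:
    pbs = None  # running outside PBS, e.g. while testing the policy

if pbs is not None:
    # Inside PBS: apply the policy to the event that triggered this hook.
    check_walltime(pbs.event())
```

Keeping the policy in a function that takes the event as a parameter is a design choice of this sketch: it lets the logic be exercised with a stand-in event object outside PBS.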
Introduction to Cython
- The term Cython can mean one of two things, depending on the context in which it is used:
- A compiled programming language, designed to be a superset of the Python programming language.
- An optimising static compiler for both Python and the extended Cython language.
- What is Cython used for
- Writing CPython extension modules that are compiled to C, typically to speed up the execution of Python code.
- Wrapping external C libraries.
- Embedding CPython into existing applications.
Cython lets the programmer write Python code that calls into C and back natively and that interacts efficiently with large data sets.
- Design
- The Cython language is close to Python; it is a superset that allows static typing of variables and class attributes, or in other words lets the programmer declare C types on them. This minimizes the interaction with the Python runtime and allows the Cython compiler to generate optimized C code from Cython/Python code. Generating a standard Python module is a 2-step process:
- Cython/Python code is translated into C.
- The translated C code is then compiled by a C compiler into a shared library (.so on Linux, .pyd on Windows), which can then be loaded and used by regular Python code using the import statement.
Optionally, one can use the embedded mode to generate C code that includes a main() function. The added main() function contains the boilerplate code to load the module. This C code can then be used to build an executable instead of a shared library.
- How fast is Cython
- Depending on the type of Python code (computational vs. I/O), Cython can speed up execution by anywhere from 1.15x (a 15% improvement) to ~30x (study here). The speedup is largest for operations that do not involve Python’s internals.
- Limitations
- If the Cython compiler comes across Python code that cannot be translated completely to C code, it transforms that code into a series of C calls to Python internals. This takes the Python interpreter out of the execution loop, but provides only a modest improvement in performance.
- Cython, being a superset, supports native Python data structures. For this, Cython simply calls the C APIs in the Python runtime that operate on those objects. This provides only a limited speedup.
- References:
a) https://cython.org/
b) https://en.wikipedia.org/wiki/Cython
c) https://github.com/cython/cython
PBS Hooks Interface
The PBS hooks interface is divided into two parts, one implemented in C and the other in Python. The diagram below shows the interaction between these two parts.
Green boxes correspond to Python modules implemented in C, yellow boxes correspond to Python modules written in Python itself.
- libpython library
- Handles most of the interaction between a hook and the embedded interpreter.
- Starting and stopping the Python interpreter.
- Loading the _pbs_ifl and _pbs_v1 modules.
- Converting PBS types to Python types and vice versa.
- The pbs_ifl and _pbs_ifl modules provide the hook with wrappers for the PBS IFL API. The _pbs_ifl module is created when the embedded interpreter is started/restarted.
- The PBS V1 package provides the following modules:
- _attr_types.py – Python representation of all internal PBS attribute structures.
- _base_types.py – Mapping of all Python based types to PBS attribute types.
- _exc_types.py – Exceptions for PBS/Python interaction
- _export_types.py – Helps the embedded interpreter import the Python types.
- _svr_types.py – Python types for PBS server objects (server, queue, job, reservation, vnode, events etc.)
- _pbs_v1 module – Implements wrappers for the Hooks interface, part of libpython library. Created when the embedded interpreter is started/restarted.
How Cython can help PBS
- Improving hook execution performance
For this, we can convert the PBS V1 package Python code to Cython and ship the modules as shared objects, which would be native code and would run faster than the interpreted Python code.
- Shipping of PBS hooks as a shared library instead of a Python script
Cython can also be used to write our PBS hooks and ship them as shared objects. Some boilerplate C code will be needed for this.
- Doing away with type conversion code
As Cython can operate directly on C data types, most of the conversion code can be removed. For providing Python data structures to hooks, the Cython module can perform the type conversion.
The example program below illustrates how Cython works seamlessly with C data types.
Explanation:
fill.pyx - A Cython program that operates on a C struct. It provides a "public" function fill_print().
str_def.h - Declares the C struct.
Cprog.c - A C program that initialises the Python interpreter and loads the module "fill".
The diagram below explains the flow.
Using a similar method, we can remove or refactor a number of functions in libpython that convert between Python classes and C structs.
- Implement pbs_ifl module in Cython
Currently, we use SWIG to create pbs_ifl.py, which provides wrappers to the _pbs_ifl methods. If the same is implemented using Cython, the build-time dependency on SWIG can be removed and some performance improvement can be achieved.
Items 1, 3 and 4 would mean converting the yellow boxes in the hooks interface diagram from Python to Cython.
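To make the type-conversion item above more concrete, here is a small runnable sketch of Python-level code reading and writing the fields of a C struct directly. It is not the fill.pyx example described above: the struct `JobInfo`, its fields, and the helpers `fill()` and `to_dict()` are hypothetical, and the standard-library `ctypes` module is used only as a stand-in so that the sketch runs without a compiler. In Cython the struct would instead be declared from its header (for example with a `cdef extern from` block) and accessed natively.

```python
import ctypes


class JobInfo(ctypes.Structure):
    """Hypothetical C struct, standing in for one declared in a header file."""
    _fields_ = [
        ("id", ctypes.c_int),
        ("ncpus", ctypes.c_int),
        ("name", ctypes.c_char * 32),
    ]


def fill(info, job_id, ncpus, name):
    """Write Python values straight into the struct's C fields."""
    info.id = job_id
    info.ncpus = ncpus
    info.name = name.encode("ascii")


def to_dict(info):
    """Read the C fields back as ordinary Python objects for a hook to use."""
    return {
        "id": info.id,
        "ncpus": info.ncpus,
        "name": info.name.decode("ascii"),
    }


info = JobInfo()
fill(info, 42, 8, "demo")
print(to_dict(info))
```

The point of the sketch is that no per-attribute conversion routine is written by hand: field access on the struct is the conversion. Whether the libpython functions listed later in this document can be reduced to this shape is an assumption that would need to be checked against each function.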
PBS' dependency on the embedded Python interpreter to execute hooks
- Cython does not execute Python statements; it produces a C file as output.
- To convert the .c file to either a .so or an executable, a C compiler must be present.
- To load the Python module in a C program, we need to include the <module>.h header.
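Because a module built from Cython output is still an extension module that the interpreter loads, it can be useful to check in which form a given module is being picked up. The sketch below uses only the standard `importlib` machinery; the helper name `module_kind` and the labels it returns are illustrative. It is shown with standard-library module names, and applying it to the PBS V1 package modules would require running it inside an environment where that package is importable.

```python
import importlib.machinery
import importlib.util


def module_kind(name):
    """Classify how the interpreter would load `name` (labels are illustrative)."""
    try:
        spec = importlib.util.find_spec(name)
    except ImportError:
        # Raised when a parent package of a dotted name is missing.
        return "missing"
    if spec is None:
        return "missing"
    if isinstance(spec.loader, importlib.machinery.ExtensionFileLoader):
        return "extension"  # shared object, e.g. one built from Cython output
    if isinstance(spec.loader, importlib.machinery.SourceFileLoader):
        return "source"  # plain .py file run by the interpreter
    if spec.origin == "built-in":
        return "built-in"  # compiled into the interpreter binary
    return "other"


# File suffixes this interpreter accepts for extension modules (platform-specific).
print(importlib.machinery.EXTENSION_SUFFIXES)
for name in ("json", "sys"):
    print(name, module_kind(name))
```

In all of these cases the module is loaded by a running Python interpreter, which is why compiling hooks or the V1 package with Cython does not remove the dependency on the embedded interpreter.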
Comparison
The below table summarises the differences between implementation of PBS Hooks interface using Python and Cython.
Criteria | Python | Cython |
---|---|---|
Performance | As Python runs interpreted code, it is slower. | Cython helps in generating native code which is faster. |
Data interchange | A lot of C/Python API calls are needed to exchange information. C code calls the C/Python API to convert between C structs and Python classes. | Cython can simply copy data between C structs and Python classes. |
Code complexity | More code to maintain, as C code needs to set up every class and the attributes of those classes. | Little or no conversion code needed, so less code maintenance is needed. |
New hook events | For any new hook event a switch is added to pbs_python_event_set() and main() of pbs_python.c. Code similar to that of existing events is written for the new event. | Addition of new hook events will be comparatively simpler, as C-to-Python conversion will be done without making explicit calls to the C/Python API. |
Out of process hooks | Intermediate format is used to pass information. This intermediate format is converted to PBS attribute list from which Python classes are created using C/Python API. | Intermediate format will still be used, however, Python classes can be created without using C/Python API. |
Memory leaks | Valgrind reports memory leaks | Valgrind reports memory leaks |
Process crash | An in-process hook can cause a process crash | An in-process hook can cause a process crash |
Detail on code that will be cleaned up
pbs_python_svr_internal.c | pbs_python_setup_python_resource_type(); pbs_python_setup_<class>_class_attributes(), where class = vnode, resv, server, job, queue; pbs_python_load_python_types(); pbs_python_setup_attr_get_value_type(); _pps_getset_descriptor_object(); pbs_python_setup_resc_get_value_type(); pbs_python_setup_types_table(); pbs_python_populate_attributes_to_python_class(); pbs_python_populate_python_class_from_svrattrl(); pbs_python_populate_svrattrl_from_python_class(); _pps_helper_get_<class>(), where class = que, server, job, resv, vnode; pbs_python_event_set(); _pbs_python_event_set() |
Summary of benefits and shortcomings
- Faster hook execution
- No change to the external interface
- Less code to maintain
- Scalable code
- Addition of future events would be simpler
- Does not replace the embedded Python interpreter