PP-468 Kerberos support
Overview:
This is a proposed new feature that adds Kerberos support to PBS Pro through a GSS-API layer. The Kerberos support covers distributing and renewing users' Kerberos credentials; the credentials themselves are supplied on demand by an external renew-tool. The renew-tool is configurable and is not part of this work. With this feature, Kerberos credentials can provide users with passwordless access between nodes (and the frontend).
This implementation distributes user credentials from the server to the superior mom and from there to the sister moms. To secure the distribution, the GSS layer encrypts the communication between clients and the server and between pbs_comm and its clients (the server and the moms). Since this work adds the Kerberos feature, GSS-API uses Kerberos as its mechanism. Note that the GSS layer could use a mechanism other than Kerberos in the future if needed.
The renew-tool obtains the user credentials on the PBS Pro server. The server sends them to the superior mom, which forwards them to the sister moms. This happens at job startup; while the job is running, the credentials are renewed after a configurable time, and they are destroyed when the job ends. The credentials are also cached on the server, so the server requests new credentials for a particular user at most once per configurable period. This is especially useful for renew-tools that access the KDC directly, because it significantly reduces the load on the KDC.
To provide GSS encryption, this feature requires a Kerberos host keytab in the default keytab location on each node (including the PBS Pro server). The host keytab is used to create the host credentials, which are used to establish the GSS context for encrypted communication between nodes (via pbs_comm). Valid user credentials are also required on the PBS Pro client side (e.g. qsub, qstat, qmgr, pbsnodes, ...) to establish the GSS context between a PBS Pro client and the PBS Pro server.
The implementation also supports OpenAFS. This support is autodetected: if the appropriate packages are installed at compile time, the OpenAFS code is activated, and whenever user credentials are received or renewed, the AFS login is performed (including the PAG).
GSS-API in PBS Pro:
First, GSS-API in short: GSS-API always provides authentication and can additionally provide encryption between client and server. The identity is therefore always verified by GSS-API, and we can simply check the obtained principal against the ACL. Encryption requires a GSS context on both the client and the server side; the context is then used to encrypt subsequent messages. To acquire the GSS context, valid credentials are needed on both sides. These credentials are used for the handshake, in which the client and the server exchange messages (so-called 'tokens') for as long as needed to establish the GSS context. The client initiates the context and sends the first token to the server. The server reads it, processes it, and responds with another token (if needed). The token exchange continues, either in a loop or asynchronously, until the GSS context is established on both the client and the server. Once the GSS context exists, it is used to encrypt further messages; this is called wrapping a message.
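The token exchange can be pictured as a simple loop. The following is a toy model only, not real GSS-API: the class ToyContext, its step() method, the number of tokens each side sends, and the XOR "wrapping" are all assumptions made for illustration so that the example runs without a KDC. In real code, the client and the server would repeatedly call gss_init_sec_context() and gss_accept_sec_context() instead.

```python
from typing import Optional


class ToyContext:
    """Toy stand-in for one side of a GSS context (NOT real GSS-API)."""

    def __init__(self, name: str, tokens_to_send: int) -> None:
        self.name = name
        self.remaining = tokens_to_send  # tokens this side still has to emit
        self.established = False
        self.key = 0x5A  # hypothetical shared secret; real GSS derives session keys

    def step(self, incoming: Optional[bytes]) -> Optional[bytes]:
        """Consume the peer's token (if any); return our next token or None."""
        if self.remaining == 0:
            self.established = True
            return None
        self.remaining -= 1
        if self.remaining == 0:
            self.established = True
        return f"{self.name}-token".encode()

    def wrap(self, message: bytes) -> bytes:
        """Placeholder for message wrapping; only valid after the handshake."""
        if not self.established:
            raise RuntimeError("context must be established before wrapping")
        return bytes(b ^ self.key for b in message)

    unwrap = wrap  # XOR is its own inverse


def handshake(client: ToyContext, server: ToyContext) -> int:
    """Exchange tokens until both sides are established; return the token count."""
    exchanged = 0
    token = client.step(None)        # the client always sends the first token
    while token is not None:
        exchanged += 1
        reply = server.step(token)   # the server processes it and may reply
        if reply is None:
            break
        exchanged += 1
        token = client.step(reply)   # the client processes the reply
    if not (client.established and server.established):
        raise RuntimeError("handshake did not complete")
    return exchanged
```

After handshake() returns, either side can wrap a message and the peer can unwrap it, which mirrors the order of operations described above: context first, wrapped messages afterwards.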
The GSS part follows similar logic for both TCP and TPP. Both use the same pbs_gss* routines for establishing the context and for wrapping and unwrapping messages. These routines can be found in Libutil/pbs_gss.c and use the gss_extra structure. This structure is attached either to the TCP connection structure or to the TPP connection structure and holds the GSS state (such as the GSS context).
- TCP part:
With TCP, the GSS context is established in a tight loop on the client side. On the server side, the GSS tokens are received asynchronously. That is, the client initiates the handshake, sends the first token, and, if another token is needed, waits for it. The server asynchronously reads and processes each token and, if needed, sends a reply token. TCP supports both cleartext and encrypted communication. Once the GSS context is established, only encrypted messages can be exchanged with that particular client.
- Implementation notes:
- A new cn_ready_func function is added. This function handles the handshake, and process_request() is called only if cn_ready_func returns true, which means that the handshake is finished and data for process_request() are ready.
- The regular dis_* handlers are replaced with new dis_gss_* handlers, and the dis_gss_* handlers call the regular handlers. This way the GSS layer is isolated and dis_gss_* is stacked on top of the regular dis_*. dis_gss_wflush is called before dis_wflush and wraps the message before it is sent.
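The stacking of dis_gss_* on the regular dis_* handlers can be sketched as follows. This is an illustration in Python, not the C implementation: the classes PlainChannel and GssChannel are hypothetical, and base64 encoding is used as a visible placeholder for the real GSS wrap operation.

```python
import base64


class PlainChannel:
    """Stands in for the regular dis_* layer: buffers data and flushes it."""

    def __init__(self) -> None:
        self.buffer = bytearray()
        self.wire = []  # what actually "left" the process, one entry per flush

    def write(self, data: bytes) -> None:
        self.buffer += data

    def wflush(self) -> None:
        if self.buffer:
            self.wire.append(bytes(self.buffer))
            self.buffer.clear()


class GssChannel:
    """Stands in for the dis_gss_* layer stacked on the regular layer."""

    def __init__(self, inner: PlainChannel, context_established: bool) -> None:
        self.inner = inner
        self.context_established = context_established
        self.pending = bytearray()

    def write(self, data: bytes) -> None:
        self.pending += data

    def wflush(self) -> None:
        # Wrap first (if a context exists), then hand over to the regular flush.
        payload = bytes(self.pending)
        self.pending.clear()
        if self.context_established:
            payload = base64.b64encode(payload)  # placeholder for the GSS wrap
        self.inner.write(payload)
        self.inner.wflush()
```

The point of the layering is that callers keep using the same write/flush interface, while the regular layer never needs to know whether the bytes it sends are wrapped.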
- TPP part:
With TPP, the message exchange is asynchronous on both sides. Once GSS is enabled, the communication is always encrypted and cleartext is not supported. The pbs_comm is the GSS server. The PBS server, the PBS moms, and the scheduler are GSS clients. GSS encryption also works between pbs_comms (in that case the comm that initiates the connection is the GSS client).
- Implementation notes:
- The GSS layer replaces the leaf or router tpp handlers (pkt_presend_handler, pkt_postsend_handler, pkt_handler, close_handler, post_connect_handler, timer_handler) with gss_* handlers, and the regular leaf or router tpp handlers are called from within the gss_* handlers.
- The handshake is asynchronous. Once it is finished, the leaf or router post_connect_handler is called and the encrypted communication begins.
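The deferred call of the regular post_connect_handler can be sketched as a small state machine. This is a hedged illustration only: the class GssHandlers, the tokens_needed counter, and the method bodies are assumptions and do not reflect the actual TPP code.

```python
from typing import Callable


class GssHandlers:
    """Wraps a regular post_connect handler and defers it until the handshake ends."""

    def __init__(self, inner_post_connect: Callable[[], None], tokens_needed: int) -> None:
        if tokens_needed < 1:
            raise ValueError("this sketch expects at least one handshake token")
        self.inner_post_connect = inner_post_connect
        self.tokens_needed = tokens_needed  # hypothetical: tokens still expected
        self.established = False

    def post_connect_handler(self) -> None:
        # The connection is up, but the regular handler has to wait:
        # the handshake tokens arrive later through pkt_handler().
        pass

    def pkt_handler(self, packet: bytes) -> None:
        if self.established:
            # Real code would unwrap the packet and call the regular pkt_handler.
            return
        self.tokens_needed -= 1
        if self.tokens_needed == 0:
            self.established = True
            self.inner_post_connect()  # encrypted communication begins here
```

For example, with tokens_needed set to 2, the regular handler runs only after the second handshake packet has been processed, and later data packets do not trigger it again.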
Figure 1 shows the communication within PBS Pro and which parts of it are covered by GSS.
Figure 1: PBS Pro GSS enabled schema (original figure comes from PBS Install Guide 19.2.3)
How to set up PBS Pro with Kerberos support for testing purposes:
This chapter shows how to set up PBS Pro with Kerberos support for testing. It is not a general Kerberos guide, and Kerberos knowledge is expected. The operating system used is CentOS 7 and the Kerberos implementation is MIT. Some commands may vary on other systems. The setup consists of a single node that runs all the components: Kerberos server, Kerberos client, PBS server, and PBS mom. In the examples, the node hostname is 'pbspro-mit' and the Kerberos realm is 'PBSPRO'. Change these to suit your environment.
- Create a test user:
[root@pbspro-mit ~]# useradd test
- Kerberos server setup:
- Install Kerberos server packages:
[root@pbspro-mit ~]# yum -y install krb5-libs krb5-server krb5-workstation
- Configure /etc/krb5.conf. An example:
[logging]
    default = FILE:/var/log/krb5libs.log
    kdc = FILE:/var/log/krb5kdc.log
    admin_server = FILE:/var/log/kadmind.log

[libdefaults]
    dns_lookup_realm = false
    dns_lookup_kdc = false
    ticket_lifetime = 8h
    renew_lifetime = 8h
    forwardable = true
    rdns = false
    pkinit_anchors = /etc/pki/tls/certs/ca-bundle.crt
    default_realm = PBSPRO

[realms]
    PBSPRO = {
        kdc = pbspro-mit
        admin_server = pbspro-mit
    }

[domain_realm]
    pbspro-mit = PBSPRO
- Configure /var/kerberos/krb5kdc/kdc.conf. An example:
[kdcdefaults]
    kdc_ports = 88
    kdc_tcp_ports = 88

[realms]
    PBSPRO = {
        acl_file = /var/kerberos/krb5kdc/kadm5.acl
        dict_file = /usr/share/dict/words
        admin_keytab = /var/kerberos/krb5kdc/kadm5.keytab
        supported_enctypes = aes256-cts:normal aes128-cts:normal des3-hmac-sha1:normal arcfour-hmac:normal camellia256-cts:normal camellia128-cts:normal des-hmac-sha1:normal des-cbc-md5:normal des-cbc-crc:normal
    }
- Now, the database needs to be created. Make up your own db_password.
[root@pbspro-mit ~]# kdb5_util create -s -P <db_password>
Loading random data
Initializing database '/var/kerberos/krb5kdc/principal' for realm 'PBSPRO',
master key name 'K/M@PBSPRO'
- Once the database is created, we will create the principals for the host and for the test user.
[root@pbspro-mit ~]# kadmin.local addprinc -randkey host/$(hostname -f)
[root@pbspro-mit ~]# kadmin.local addprinc -pw <user_password> test
- Now, we need to add the existing principals to the keytab. The user principal in the keytab is used by the testing renew-tool. It is possible to use a different keytab for users, but the host principal is expected in the default keytab on each PBS node (including the PBS server).
[root@pbspro-mit ~]# kadmin.local ktadd -norandkey -k /etc/krb5.keytab host/$(hostname -f)
Entry for principal host/pbspro-mit ...
[root@pbspro-mit ~]# kadmin.local ktadd -norandkey -k /etc/krb5.keytab test
Entry for principal test ...
- Enable and start the KDC server:
[root@pbspro-mit ~]# systemctl enable krb5kdc.service
Created symlink from /etc/systemd/system/multi-user.target.wants/krb5kdc.service to /usr/lib/systemd/system/krb5kdc.service.
[root@pbspro-mit ~]# systemctl start krb5kdc.service
- Kerberos client:
- Install Kerberos client packages:
[root@pbspro-mit ~]# yum -y install krb5-libs krb5-workstation
- Test obtaining credentials as the test user:
[root@pbspro-mit ~]# su test
[test@pbspro-mit root]$ cd
[test@pbspro-mit ~]$ kinit
Password for test@PBSPRO:
[test@pbspro-mit ~]$ klist
Ticket cache: FILE:/tmp/krb5cc_1000
Default principal: test@PBSPRO

Valid starting       Expires              Service principal
02/05/2019 09:25:18  02/05/2019 17:25:18  krbtgt/PBSPRO@PBSPRO
- Installation of PBS Pro with Kerberos support:
- Install the packages required by PBS Pro.
- Install the packages required for building PBS Pro with Kerberos support:
[root@pbspro-mit ~]# yum -y install krb5-libs krb5-devel libcom_err libcom_err-devel
- Check out PBS Pro with Kerberos support:
[root@pbspro-mit ~]# git clone https://github.com/PBSPro/pbspro.git
[root@pbspro-mit ~]# cd pbspro
- Build and install PBS Pro with Kerberos support:
[root@pbspro-mit pbspro]# ./autogen.sh
[root@pbspro-mit pbspro]# CFLAGS="-g -ggdb -Wall -Werror" ./configure --prefix=/opt/pbs --with-krbauth PATH_KRB5_CONFIG=/usr/bin/krb5-config
[root@pbspro-mit pbspro]# make
[root@pbspro-mit pbspro]# make install
[root@pbspro-mit pbspro]# chmod 4755 /opt/pbs/sbin/pbs_iff /opt/pbs/sbin/pbs_rcp
[root@pbspro-mit pbspro]# /opt/pbs/libexec/pbs_postinstall
- Enable pbs_mom:
[root@pbspro-mit ~]# sed -i 's/PBS_START_MOM=0/PBS_START_MOM=1/' /etc/pbs.conf
- Set PBS_AUTH_METHOD to GSS:
[root@pbspro-mit ~]# echo "PBS_AUTH_METHOD=GSS" >> /etc/pbs.conf
- Restart PBS service:
[root@pbspro-mit ~]# systemctl restart pbs
- Use qmgr to create default queue and add pbspro-mit as a pbs node.
- Build a PBS renew-tool. The renew-tool is used for obtaining user credentials. You are free to develop, use, and share your own renew-tool. The simple renew-tool used here is for testing purposes only; it obtains credentials from the keytab. Note: A proper renew-tool exists, krb525_renew (https://github.com/CESNET/krb525), but it consists of a client part and a server part, and the server part requires the KDC to be Heimdal. This limitation is the reason why the simple tool is used here. A guide for krb525_renew will be created on request.
- Clone the tool:
[root@pbspro-mit ~]# git clone https://github.com/PBSPro/pbspro.git
- Build the tool:
[root@pbspro-mit ~]# cd pbspro/src/unsupported/renew-test/
[root@pbspro-mit renew-test]# make
- Configure Kerberos in PBS Pro:
- Use following (or similar) qmgr commands:
[root@pbspro-mit ~]# PBSPRO_IGNORE_KERBEROS= qmgr
Max open servers: 49
Qmgr: set server acl_krb_realm_enable = True
Qmgr: set server acl_krb_realms = *@PBSPRO
Qmgr: set server acl_krb_submit_realms = *@PBSPRO
Qmgr: set server cred_renew_enable = True
Qmgr: set server cred_renew_tool = "/root/pbspro/src/unsupported/renew-test/renew-test"
Qmgr: set server cred_renew_period = 05:00:00
Qmgr: set server cred_renew_cache_period = 10:00:00
- Submit a job as the user 'test' and check the availability of credentials within the job:
[test@pbspro-mit ~]$ kinit
Password for test@PBSPRO:
[test@pbspro-mit ~]$ qsub -I
qsub: waiting for job 0.pbspro-mit to start
qsub: job 0.pbspro-mit ready

[test@pbspro-mit ~]$ klist
Ticket cache: FILE:/tmp/krb5cc_pbsjob_0.pbspro-mit
Default principal: test@PBSPRO

Valid starting       Expires              Service principal
02/05/2019 10:00:45  02/05/2019 18:00:45  krbtgt/PBSPRO@PBSPRO
- Add another pbs node into this infrastructure:
- Assuming the hostname of a new node is pbspro-mit-node1
- Use the KDC server to add a new principal into the database:
[root@pbspro-mit ~]# kadmin.local addprinc -randkey host/pbspro-mit-node1@PBSPRO
- Create a keytab for this host (use a different keytab location):
[root@pbspro-mit ~]# kadmin.local ktadd -norandkey -k /etc/krb5-node1.keytab host/pbspro-mit-node1@PBSPRO
- Copy the keytab to the default location on the new node (be careful to preserve the file permissions):
[root@pbspro-mit ~]# scp /etc/krb5-node1.keytab pbspro-mit-node1:/etc/krb5.keytab
- Copy krb5.conf to the new node:
[root@pbspro-mit ~]# scp /etc/krb5.conf pbspro-mit-node1:/etc/krb5.conf
- Install the Kerberos client packages on the new node:
[root@pbspro-mit-node1 ~]# yum -y install krb5-libs krb5-workstation
- Start the pbs service.
New Interfaces:
Interface: New option '--with-krbauth' to configure
- Visibility: Public
- Change Control: Stable
- Synopsis: Compilation option
- Details: This configure option enables the Kerberos and GSS code when PBS Pro is compiled. The code is guarded by the macro PBS_SECURITY == KRB5. It may be necessary to provide the correct path to the Kerberos libraries; the variable PATH_KRB5_CONFIG serves this purpose. PATH_KRB5_CONFIG should be set to the krb5-config binary of either Heimdal or MIT Kerberos.
- Examples:
./configure --prefix=/usr --with-krbauth PATH_KRB5_CONFIG=/usr/bin/krb5-config.heimdal
./configure --prefix=/usr --with-krbauth PATH_KRB5_CONFIG=/usr/bin/krb5-config.mit
Interface: New value 'GSS' to PBS_AUTH_METHOD in /etc/pbs.conf
- Visibility: Public
- Change Control: Stable
- Synopsis: PBS Pro configuration option
- Details: This value enables GSS authentication for clients. It is mandatory with Kerberos support.
Interface: New server attribute 'acl_krb_realm_enable'
- Visibility: Public
- Change Control: Stable
- Synopsis: Server attribute. Settable via qmgr.
- Details: This attribute enables Kerberos realm ACL checking against the new attribute acl_krb_realms. The attribute is boolean. Managers are allowed to set this attribute.
Interface: New server attribute 'acl_krb_realms'
- Visibility: Public
- Change Control: Stable
- Synopsis: Server attribute. Settable via qmgr.
- Details: This is an ACL list of realms allowed to access the server in any way. The attribute is a list. Managers are allowed to set this attribute. To act as a manager or operator, the appropriate principal must be listed in managers and its realm must be in acl_krb_realms.
- Example:
set server managers = vchlum@ADMIN.REALM
set server acl_krb_realms = *@REALM
set server acl_krb_realms += *@ADMIN.REALM
Interface: New server attribute 'acl_krb_submit_realms'
- Visibility: Public
- Change Control: Stable
- Synopsis: Server attribute. Settable via qmgr.
- Details: This is an ACL list of realms allowed to submit jobs into the system. This attribute is a list and should be a subset of acl_krb_realms. Both acl_krb_submit_realms and acl_krb_realms are needed because the PBS Pro server can be accessed with host credentials or with credentials of other services, which are not suitable for submitting jobs. Usually, a compute node running a mom is not a good identity for submitting jobs, but it is a suitable identity for accessing the server (e.g. the pbs_python hooks will need to be specified here in the future). Managers are allowed to set this attribute.
- Example:
set server acl_krb_submit_realms = *@REALM
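The interplay of the two ACL lists can be sketched as follows. The matching rule is an assumption: this sketch treats each entry as a shell-style wildcard pattern matched against the full principal, which is not confirmed by this document. The function names principal_allowed, may_access, and may_submit are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def principal_allowed(principal: str, acl: Iterable[str]) -> bool:
    """Return True if the principal matches any ACL pattern such as '*@REALM'."""
    return any(fnmatchcase(principal, pattern) for pattern in acl)


# Values taken from the examples in this section.
acl_krb_realms = ["*@REALM", "*@ADMIN.REALM"]
acl_krb_submit_realms = ["*@REALM"]


def may_access(principal: str) -> bool:
    """Access to the server requires a match in acl_krb_realms."""
    return principal_allowed(principal, acl_krb_realms)


def may_submit(principal: str) -> bool:
    """Submitting additionally requires a match in acl_krb_submit_realms."""
    return may_access(principal) and principal_allowed(principal, acl_krb_submit_realms)
```

With these values, vchlum@ADMIN.REALM may access the server (for example as a manager) but may not submit jobs, while a principal in REALM may do both.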
Interface: New server attribute 'cred_renew_enable'
- Visibility: Public
- Change Control: Stable
- Synopsis: Server attribute. Settable via qmgr.
- Details: Enables renewing credentials according to the time configured in cred_renew_period (see below). This option enables the renew task, which runs every 5 minutes; the renewals of individual jobs are spread across those 5 minutes to avoid a load spike caused by renewing. This attribute is boolean. Managers are allowed to set this attribute.
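One possible way to spread the renewals over the 5-minute interval is sketched below. The document only states that the renewals are scattered within the interval; the even spacing, the function name scatter_offsets, and the job IDs in the usage are assumptions for illustration.

```python
from typing import Dict, List

RENEW_TASK_INTERVAL = 300  # seconds; the renew task runs every 5 minutes


def scatter_offsets(job_ids: List[str], interval: int = RENEW_TASK_INTERVAL) -> Dict[str, float]:
    """Assign each job an evenly spaced start offset inside one interval."""
    if not job_ids:
        return {}
    step = interval / len(job_ids)
    # Job i is renewed i * step seconds after the renew task starts.
    return {job_id: index * step for index, job_id in enumerate(job_ids)}
```

With three jobs, for instance, the renewals would start 0, 100, and 200 seconds into the interval instead of all at once.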
Interface: New server attribute 'cred_renew_tool'
- Visibility: Public
- Change Control: Stable
- Synopsis: Server attribute. Settable via qmgr.
- Details: This attribute sets the renew-tool, which provides the Kerberos credential data in base64 in a specific format. This attribute is a string. Managers are allowed to set this attribute.
- The tool is expected to take one argument, the user's principal.
- The tool is expected to print three lines to stdout: the first line is 'Type: <credentials type>', the second line is 'Valid until: <timestamp in seconds since the Unix epoch>', and the third line is the credential data in base64.
- After base64 decoding, the credential data are read by krb5_rd_cred(). The data type is 'krb5_data'. Please see the Kerberos documentation for more info.
- The tool is not part of this work.
- Example of the tool:
- ~# ./krb525_renew vchlum@REALM
Type: Kerberos
Valid until: 1536960989
doICPDCCAjigAwI<... just base64 here ...>GqAjAA
- set server cred_renew_tool = /usr/bin/krb525_renew
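The three-line output format can be checked with a small parser such as the one below. This is a sketch for renew-tool authors, not the parser used by the server: the names RenewedCredential and parse_renew_tool_output are hypothetical, and the strictness of the checks is an assumption.

```python
import base64
from typing import NamedTuple


class RenewedCredential(NamedTuple):
    cred_type: str
    valid_until: int  # seconds since the Unix epoch
    data: bytes       # decoded blob, i.e. what krb5_rd_cred() would receive


def parse_renew_tool_output(text: str) -> RenewedCredential:
    """Parse 'Type: ...', 'Valid until: ...', and one line of base64 data."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) < 3:
        raise ValueError("expected three lines: Type, Valid until, base64 data")
    if not lines[0].startswith("Type:"):
        raise ValueError("first line must start with 'Type:'")
    if not lines[1].startswith("Valid until:"):
        raise ValueError("second line must start with 'Valid until:'")
    cred_type = lines[0][len("Type:"):].strip()
    valid_until = int(lines[1][len("Valid until:"):].strip())
    data = base64.b64decode(lines[2], validate=True)
    return RenewedCredential(cred_type, valid_until, data)
```

To test a tool by hand, its stdout can be captured, for example with subprocess.run([tool_path, principal], capture_output=True, text=True), and passed to this function; tool_path and principal are placeholders here.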
Interface: New server attribute 'cred_renew_period'
- Visibility: Public
- Change Control: Stable
- Synopsis: Server attribute. Settable via qmgr.
- Details: This attribute determines how long before the credentials expire a job's credentials are renewed. If the attribute is set to 1 hour, the credentials of jobs are renewed one hour before they expire. If the renewal fails for a particular job (e.g. the node is not available), the renew task (enabled with cred_renew_enable) retries in its next iteration (5 minutes later). This attribute is a time in seconds or in the format hh:mm:ss. The default value is 3600 (1 hour). Managers are allowed to set this attribute.
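The renewal condition can be written down as a short check. This is a sketch under assumptions: the helper names parse_period and renewal_due are hypothetical, and whether the boundary is inclusive is not stated in this document.

```python
def parse_period(value: str) -> int:
    """Convert '3600' or 'hh:mm:ss' into seconds."""
    parts = value.split(":")
    if len(parts) == 1:
        return int(parts[0])
    if len(parts) != 3:
        raise ValueError(f"expected seconds or hh:mm:ss, got {value!r}")
    hours, minutes, seconds = (int(part) for part in parts)
    return hours * 3600 + minutes * 60 + seconds


def renewal_due(now: int, credential_validity: int, cred_renew_period: int = 3600) -> bool:
    """True once 'now' falls inside the renew window before expiry.

    All values are in seconds; 'now' and 'credential_validity' are Unix
    timestamps. The inclusive comparison is an assumption.
    """
    return now >= credential_validity - cred_renew_period
```

For example, with credentials expiring at timestamp 5000 and the default period of 3600 seconds, renewal becomes due at timestamp 1400.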
Interface: New server attribute 'cred_renew_cache_period'
- Visibility: Public
- Change Control: Stable
- Synopsis: Server attribute. Settable via qmgr.
- Details: This attribute determines how old a particular user's cached credentials may be and still be used. Credentials are cached on the server per user in order to reduce the load on the KDC for renew-tools that access the KDC themselves. This way, a batch of jobs started shortly one after another gets the same credentials. If credentials are needed and the validity of the user's cached credentials is lower than cred_renew_cache_period, new credentials are requested using cred_renew_tool. Otherwise, the cached credentials are used. This attribute is a time in seconds or in the format hh:mm:ss. The default value is 7200 (2 hours). This attribute is supposed to be greater than cred_renew_period. Managers are allowed to set this attribute.
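The cache decision can be sketched as below. This is an illustration under assumptions: "validity" is interpreted as the remaining lifetime of the cached credentials, the class CredentialCache is hypothetical, and the fetch callable stands in for running cred_renew_tool.

```python
from typing import Callable, Dict, Tuple

Credential = Tuple[int, bytes]  # (valid_until as a Unix timestamp, credential data)


class CredentialCache:
    """Per-user cache that avoids calling the renew-tool for every job."""

    def __init__(self, fetch: Callable[[str], Credential], cache_period: int = 7200) -> None:
        self.fetch = fetch                # stands in for running cred_renew_tool
        self.cache_period = cache_period  # cred_renew_cache_period in seconds
        self.entries: Dict[str, Credential] = {}

    def get(self, principal: str, now: int) -> Credential:
        cached = self.entries.get(principal)
        if cached is not None and cached[0] - now >= self.cache_period:
            return cached                 # enough validity left: reuse
        fresh = self.fetch(principal)     # otherwise request new credentials
        self.entries[principal] = fresh
        return fresh
```

Several jobs of the same user started in quick succession therefore trigger a single call of the renew-tool, which is the load reduction described above.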
Interface: New job attribute 'credential_id'
- Visibility: Public
- Change Control: Stable
- Synopsis: Job attribute.
- Details: Job owner principal. This attribute is set after job submission and remains unchanged for the whole job life. If this attribute is not set, the job will not receive the credentials at all and it will run without credentials.
Interface: New job attribute 'Submit_Host'
- Visibility: Public
- Change Control: Stable
- Synopsis: Job attribute.
- Details: This job attribute is set to the hostname of the host from which the job is submitted. The attribute is set after job submission. This attribute is added to PBS Pro in general and does not depend on the Kerberos implementation.
Interface: New job attribute 'credential_validity'
- Visibility: Public
- Change Control: Stable
- Synopsis: Job attribute.
- Details: This attribute is updated when credentials are successfully sent to the superior mom. This attribute does not reflect the state of conveying credentials between the superior mom and the sister moms. The attribute is a timestamp in seconds since the Unix epoch and determines when the user credentials of the job expire.
Interface: New environment variable 'PBSPRO_IGNORE_KERBEROS'
- Visibility: Public
- Change Control: Stable
- Synopsis: Environment variable read by PBS Pro clients.
- Details: If this environment variable is set, the Kerberos ACL is ignored and the common ACL check is used instead. If this environment variable is set for qsub, no credential_id is set on the job and the job runs without credentials. This variable also allows root to access the PBS Pro server via qmgr from localhost without Kerberos credentials.
Interface: New error message 'No Kerberos credentials found.'
- Visibility: Public
- Change Control: Stable
- Synopsis: The error message returned by PBS Pro clients.
- Details: If a PBS Pro client finds no valid Kerberos credentials in the ccache file, it prints this error message to stderr. This can be bypassed by setting PBSPRO_IGNORE_KERBEROS, but that does not bypass the ACL.
Interface: Job environment variable 'KRB5CCNAME'
- Visibility: Public
- Change Control: Stable
- Synopsis: Job environment variable.
- Details: For each job with credential_id, this environment variable is set to the ccache file holding the Kerberos credentials. If the user specifies this variable, the user's value is ignored and overwritten. Within a job, the value is 'FILE:/tmp/krb5cc_pbsjob_<jobid>'.
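The naming rule can be expressed as a small helper. The function names job_ccache_name and job_environment are hypothetical, and leaving the user's value untouched when no credential_id is present is an assumption, since this document only describes the case with credential_id.

```python
from typing import Dict, Optional


def job_ccache_name(job_id: str) -> str:
    """Build the KRB5CCNAME value used within a job."""
    return f"FILE:/tmp/krb5cc_pbsjob_{job_id}"


def job_environment(user_env: Dict[str, str], job_id: str,
                    credential_id: Optional[str]) -> Dict[str, str]:
    """Return the job environment with KRB5CCNAME applied where appropriate."""
    env = dict(user_env)
    if credential_id:
        # A user-supplied KRB5CCNAME is overwritten for jobs with credential_id.
        env["KRB5CCNAME"] = job_ccache_name(job_id)
    return env
```

For the job 0.pbspro-mit from the setup chapter, this yields FILE:/tmp/krb5cc_pbsjob_0.pbspro-mit, which matches the klist output shown there.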
Interface: New job error code 'JOB_EXEC_FAIL_KRB5 = -23'
- Visibility: Public
- Change Control: Stable
- Synopsis: Job error code.
- Details: This job error code is returned if the job is about to start a task and the credentials have not been supplied to the mom. The error code is not used if credential_id is not present.
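The condition under which the error code applies can be sketched as follows. The function name check_credentials_before_task and the use of 0 as the success value are assumptions made for this illustration; only the value -23 comes from this document.

```python
from typing import Optional

JOB_EXEC_FAIL_KRB5 = -23
JOB_OK = 0  # hypothetical success value used only in this sketch


def check_credentials_before_task(credential_id: Optional[str],
                                  credentials_on_mom: bool) -> int:
    """Decide whether a task may start with respect to Kerberos credentials."""
    if credential_id is None:
        return JOB_OK              # job runs without credentials; no check
    if not credentials_on_mom:
        return JOB_EXEC_FAIL_KRB5  # credentials expected but not supplied
    return JOB_OK
```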
Interface: New file '/tmp/krb5cc_pbs_client'
- Visibility: Public
- Change Control: Stable
- Synopsis: File with host credentials.
- Details: This file is created on each node connected to pbs_comm and contains the Kerberos host ccache. The file is created from the host keytab (which is required to be present on the node) and is used by the GSS layer.