PP-479: Running subjobs to be able to survive a pbs_server restart
Target release | 18.1.1 |
---|---|
Jira | |
Document status | DRAFT |
Document owner | Shrinivas Harapanahalli |
Designer | Shrinivas Harapanahalli |
Developers | Shrinivas Harapanahalli |
QA | |
Overview:
Currently, when the server restarts, job arrays that have running subjobs are terminated, because subjobs are stored only in memory. With this RFE, running subjobs continue to run after a server restart. The change also stores the information that is unique to each subjob, such as run_count, resources_used, and comments. As a result, once a subjob has finished, qstat on that subjob no longer returns only the parent array's information.
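The core idea is to move each subjob's record from memory-only state into persistent storage, so it can be rebuilt when the server comes back up. The Python sketch below is a toy model of that idea, not the pbs_server implementation. Several things in it are assumptions: `sqlite3` stands in for the server's datastore, and the `subjob` table and its schema are hypothetical. The helpers `open_store`, `save_subjob`, and `recover_subjobs` are also hypothetical, as are the job ID format and the state letters. Only the per-subjob attributes (run_count, resources_used, comment) come from the overview.

```python
import json
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Subjob:
    """Per-subjob record; the attributes mirror those named in the overview."""
    jobid: str                      # hypothetical ID format, e.g. "123[4].svr"
    state: str                      # illustrative: "R" = running, "F" = finished
    run_count: int = 0
    resources_used: dict = field(default_factory=dict)
    comment: str = ""


def open_store(conn: sqlite3.Connection) -> None:
    """Create the hypothetical subjob table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS subjob ("
        " jobid TEXT PRIMARY KEY, state TEXT, run_count INTEGER,"
        " resources_used TEXT, comment TEXT)"
    )


def save_subjob(conn: sqlite3.Connection, sj: Subjob) -> None:
    """Insert or overwrite one subjob so its state is not held only in memory."""
    conn.execute(
        "INSERT OR REPLACE INTO subjob VALUES (?, ?, ?, ?, ?)",
        (sj.jobid, sj.state, sj.run_count,
         json.dumps(sj.resources_used), sj.comment),
    )
    conn.commit()


def recover_subjobs(conn: sqlite3.Connection) -> dict:
    """Simulated restart: rebuild the in-memory subjob table from the store."""
    rows = conn.execute(
        "SELECT jobid, state, run_count, resources_used, comment FROM subjob"
    )
    return {
        jobid: Subjob(jobid, state, run_count, json.loads(used), comment)
        for jobid, state, run_count, used, comment in rows
    }


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")   # a file path would persist across processes
    open_store(conn)
    save_subjob(conn, Subjob("123[1].svr", "R", 1, {"walltime": "00:05:00"}, "running"))
    save_subjob(conn, Subjob("123[2].svr", "F", 2, {"cput": "00:01:00"}, "finished"))
    in_memory = None                     # "restart": drop all in-memory state
    in_memory = recover_subjobs(conn)
    for sj in in_memory.values():
        print(sj)
```

In this model, the running subjob `123[1].svr` comes back in state `R` after the simulated restart. The finished subjob `123[2].svr` keeps its own run_count, resources_used, and comment, instead of falling back to the parent array's information.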