Parallelizing pipeline runs on HPC systems¶
This guide shows how to parallelize pipeline runs on HPC systems that use job schedulers supported by Nipoppy.
Currently, we have built-in support for the Slurm and SGE job schedulers. However, it is possible to manually add another job scheduler.
Important
Although the default template job script is designed to work with minimal user configuration, each HPC system is different, and some may require different/additional parameters to be set. See the Further customization section for how deeper configuration can be achieved.
If the default Slurm/SGE configurations do not work for you, please consider opening an issue on our GitHub repository so that we can improve our HPC support.
Configuring main HPC options¶
Global settings¶
The default global configuration file has two HPC-related fields that should be updated as needed:
{
    "SUBSTITUTIONS": {
        "[[NIPOPPY_DPATH_CONTAINERS]]": "[[NIPOPPY_DPATH_ROOT]]/containers",
        "[[HPC_ACCOUNT_NAME]]": ""
    },
    "DICOM_DIR_PARTICIPANT_FIRST": true,
    "CONTAINER_CONFIG": {
        "COMMAND": "apptainer",
        "ARGS": [
            "--cleanenv"
        ],
        "ENV_VARS": {
            "PYTHONUNBUFFERED": "1"
        }
    },
    "HPC_PREAMBLE": [
        "# (These lines can all be removed if not using HPC functionality.)",
        "# ========== Activate Python environment ==========",
        "# Here we need the command to activate your Python environment in an ",
        "# HPC job, for example:",
        "# - venv: source <PATH_TO_VENV>/bin/activate",
        "# - conda: source ~/.bashrc; conda activate <ENV_NAME>",
        "# ========== Set environment variables ==========",
        "export PYTHONUNBUFFERED=1"
    ],
    "PIPELINE_VARIABLES": {
        "BIDSIFICATION": {},
        "PROCESSING": {},
        "EXTRACTION": {}
    },
    "CUSTOM": {}
}
HPC_PREAMBLE¶
HPC_PREAMBLE is a list of Bash commands that should be executed at the beginning of every job.
Importantly, there should be a command for activating the Nipoppy Python environment.
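For example, if Nipoppy is installed in a Python virtual environment, the HPC_PREAMBLE could look like the snippet below (the venv path is hypothetical and should be replaced with the location of your own environment):
"HPC_PREAMBLE": [
    "# hypothetical path -- replace with the location of your own environment",
    "source /home/<USER>/venvs/nipoppy/bin/activate",
    "export PYTHONUNBUFFERED=1"
],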
[[HPC_ACCOUNT_NAME]]¶
The value of the [[HPC_ACCOUNT_NAME]] field in the SUBSTITUTIONS dictionary should be set to the account name/ID that jobs will be associated with.
By default this is passed as --account in Slurm systems and -q in SGE systems during job submission.
It can be left blank if these options are not needed.
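For example, on a Slurm cluster where jobs are charged to an allocation called def-mylab (a hypothetical name used here for illustration), the SUBSTITUTIONS block would become:
"SUBSTITUTIONS": {
    "[[NIPOPPY_DPATH_CONTAINERS]]": "[[NIPOPPY_DPATH_ROOT]]/containers",
    "[[HPC_ACCOUNT_NAME]]": "def-mylab"
},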
Attention
If your HPC system requires flags other than --account or -q to be set, you will have to modify the template job submission script: see the Further customization section for more information.
Pipeline-specific settings¶
Job time limit and CPU and memory requests can be configured separately for each pipeline via the HPC config file.
Look for this file inside the pipeline config directory at <NIPOPPY_PROJECT_ROOT>/pipelines/{bidsification,processing,extraction}/<PIPELINE_NAME>/<PIPELINE_VERSION> – it is most likely called hpc.json or hpc_config.json (see the pipeline's config.json file for the exact name).
The HPC config file should look similar to this:
{
    "ACCOUNT": "[[HPC_ACCOUNT_NAME]]",
    "TIME": "1:00:00",
    "CORES": "1",
    "MEMORY": "16G",
    "ARRAY_CONCURRENCY_LIMIT": ""
}
If the pipeline config directory has no HPC config file
You can create an HPC config file manually by copying the content above into a new file called (for example) hpc.json.
You will also need to add an "HPC_CONFIG_FILE" field for each step in the pipeline's config.json file:
    "STEPS": [
        {
            "INVOCATION_FILE": "invocation.json",
            "DESCRIPTOR_FILE": "descriptor.json",
            "HPC_CONFIG_FILE": "hpc.json"
        }
    ],
Set the fields in the HPC config file as needed (a filled-in example is shown after this list). Set/leave a field as an empty string if it is not needed.
- ACCOUNT: do not modify this field – the account name should be set in the global configuration file.
- TIME: time limit. Passed as --time in Slurm jobs and -l h_rt in SGE jobs.
- CORES: number of CPUs requested. Passed as --cpus-per-task in Slurm jobs and ignored in SGE jobs.
- MEMORY: amount of memory requested. Passed as --mem in Slurm jobs and -l h_vmem in SGE jobs.
- ARRAY_CONCURRENCY_LIMIT: maximum number of jobs in the array that can run at the same time. Set as part of the --array specification in Slurm jobs and passed as -tc in SGE jobs.
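As an illustration, a resource-heavy processing pipeline might use an HPC config file like the one below. The time, CPU, memory, and concurrency values are hypothetical and should be adapted to your pipeline and cluster:
{
    "ACCOUNT": "[[HPC_ACCOUNT_NAME]]",
    "TIME": "12:00:00",
    "CORES": "8",
    "MEMORY": "32G",
    "ARRAY_CONCURRENCY_LIMIT": "10"
}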
Submitting HPC jobs via nipoppy commands¶
To run a pipeline on an HPC, use the --hpc option to specify the HPC job scheduler when running the nipoppy bidsify, nipoppy process, or nipoppy extract commands:
$ nipoppy <SUBCOMMAND> \
--dataset <NIPOPPY_PROJECT_ROOT> \
--pipeline <PIPELINE_NAME> \
--hpc slurm
# other desired options
# ...
This will submit a job array (one job per participant/session to run) through the requested job scheduler.
Currently, only 'slurm' and 'sge' have built-in support, but it is possible to add a different job scheduler.
Tip
We recommend submitting a single job (i.e. by specifying both --participant-id and --session-id) the first time you launch jobs on an HPC.
This will make it easier to troubleshoot if any problem occurs.
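For example, a first test run for a single participant and session might look like this (the participant and session IDs below are placeholders to be replaced with real values from your dataset):
$ nipoppy <SUBCOMMAND> \
    --dataset <NIPOPPY_PROJECT_ROOT> \
    --pipeline <PIPELINE_NAME> \
    --hpc slurm \
    --participant-id <PARTICIPANT_ID> \
    --session-id <SESSION_ID>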
Troubleshooting¶
Below are some troubleshooting tips that might be helpful if your jobs are submitted successfully but fail before pipeline processing begins.
Slurm/SGE log files are written to <NIPOPPY_PROJECT_ROOT>/logs/hpc.
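For example, to find and inspect the most recent scheduler logs (standard shell commands; the exact file names depend on the scheduler and the job/task IDs):
$ ls -t <NIPOPPY_PROJECT_ROOT>/logs/hpc | head -n 5
$ tail -n 50 <NIPOPPY_PROJECT_ROOT>/logs/hpc/<LOG_FILE_NAME>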
If you see an error message complaining about the nipoppy command not existing, it is likely that your HPC_PREAMBLE does not have the right command(s) for activating your Nipoppy Python environment.
By default, the job script generated by Nipoppy is deleted upon successful job submission.
If you suspect that there is something wrong with the job script, rerun the nipoppy command you used to submit the job(s) with the --keep-workdir flag.
Then, the script can be found at <NIPOPPY_PROJECT_ROOT>/scratch/work/<PIPELINE_NAME>-<PIPELINE_VERSION>/<PIPELINE_NAME>-<PIPELINE_VERSION>/run_queue.sh.
Attention
Modifying <NIPOPPY_PROJECT_ROOT>/scratch/work/<PIPELINE_NAME>-<PIPELINE_VERSION>/<PIPELINE_NAME>-<PIPELINE_VERSION>/run_queue.sh will not have an effect on future job submissions.
Instead, you will need to modify the template job script itself.
Further customization¶
All fields in the HPC config file are passed to the Jinja template job script, which can be found at <NIPOPPY_PROJECT_ROOT>/code/hpc/job_script_template.sh.
The default template job script
#!/bin/bash

{#-
# This is a template for generating a job script that will be run on an HPC
# cluster. It is written using the Jinja templating language (see
# https://jinja.palletsprojects.com for more information).

# All variables starting with the "NIPOPPY_" prefix are set internally by
# Nipoppy and cannot be changed. Other (optional) variables can be defined in a
# pipeline's HPC config file (i.e., hpc.json). Additional variables can also be
# defined in the HPC config file for further customization.

# Lines surrounded by { # and # } (without spaces) are Jinja comments and will
# not be included in the final job script.
#}

{#-
# ----------------------------
# JOB SCHEDULER CONFIGURATIONS
# ----------------------------
# Below sections are for the Slurm and SGE job schedulers respectively.
# Depending on the value of the --hpc argument, only one of these will be used.
# Existing lines should not be modified unless you know what you are doing.
# New lines can be added to hardcode extra settings that are to be constant for
# every HPC job (no matter which pipeline).
#}
{%- if NIPOPPY_HPC == 'slurm' %}
{% set NIPOPPY_ARRAY_VAR = 'SLURM_ARRAY_TASK_ID' %}
# ===== Slurm configs =====
#SBATCH --job-name={{ NIPOPPY_JOB_NAME }}
#SBATCH --output={{ NIPOPPY_DPATH_LOGS }}/%x-%A_%a.out
#SBATCH --array=1-{{ NIPOPPY_COMMANDS | length }}
{%- if ARRAY_CONCURRENCY_LIMIT -%}
%{{ ARRAY_CONCURRENCY_LIMIT }}
{%- endif %}
{% if TIME -%}
#SBATCH --time={{ TIME }}
{%- endif -%}
{% if MEMORY %}
#SBATCH --mem={{ MEMORY }}
{%- endif -%}
{% if CORES %}
#SBATCH --cpus-per-task={{ CORES }}
{%- endif -%}
{% if ACCOUNT %}
#SBATCH --account={{ ACCOUNT }}
{%- endif %}
{% if PARTITION %}
#SBATCH --partition={{ PARTITION }}
{%- endif %}

{%- elif NIPOPPY_HPC == 'sge' %}
{% set NIPOPPY_ARRAY_VAR = 'SGE_TASK_ID' %}
# ===== SGE configs =====
#$ -N {{ NIPOPPY_JOB_NAME }}
#$ -o {{ NIPOPPY_DPATH_LOGS }}/$JOB_NAME_$JOB_ID_$TASK_ID.out
#$ -j y
#$ -t 1-{{ NIPOPPY_COMMANDS | length }}
{% if ARRAY_CONCURRENCY_LIMIT -%}
#$ -tc {{ ARRAY_CONCURRENCY_LIMIT }}
{%- endif -%}
{% if TIME %}
#$ -l h_rt={{ TIME }}
{%- endif -%}
{% if MEMORY %}
#$ -l h_vmem={{ MEMORY }}
{%- endif -%}
{% if ACCOUNT %}
#$ -q {{ ACCOUNT }}
{%- endif %}
{% endif %}

# for custom scripting
DPATH_ROOT="{{ NIPOPPY_DPATH_ROOT }}"
PIPELINE_NAME="{{ NIPOPPY_PIPELINE_NAME }}"
PIPELINE_VERSION="{{ NIPOPPY_PIPELINE_VERSION }}"
PIPELINE_STEP="{{ NIPOPPY_PIPELINE_STEP }}"
PARTICIPANT_IDS=({% for participant_id in NIPOPPY_PARTICIPANT_IDS %} "{{ participant_id }}"{% endfor %} )
SESSION_IDS=({% for session_id in NIPOPPY_SESSION_IDS %} "{{ session_id }}"{% endfor %} )
{#
# -------------------
# START OF JOB SCRIPT
# -------------------
# Below lines should not be modified unless you know what you are doing.
#}
{% if NIPOPPY_HPC_PREAMBLE_STRINGS -%}
# HPC_PREAMBLE from global config file
{% for NIPOPPY_HPC_PREAMBLE_STRING in NIPOPPY_HPC_PREAMBLE_STRINGS -%}
{{ NIPOPPY_HPC_PREAMBLE_STRING }}
{% endfor %}
{%- endif %}
# Nipoppy-generated list of commands to be run in job array
COMMANDS=( \
{% for command in NIPOPPY_COMMANDS -%}
    "{{ command }}" \
{% endfor -%}
)

# get command from list
# note that COMMANDS is zero-indexed (bash array)
# but the job array is one-indexed for compatibility with SGE
I_JOB=$(({{NIPOPPY_ARRAY_VAR}}-1))
COMMAND=${COMMANDS[$I_JOB]}

# for custom scripting
PARTICIPANT_ID=${PARTICIPANT_IDS[$I_JOB]}
SESSION_ID=${SESSION_IDS[$I_JOB]}

# print/run command
echo $COMMAND
eval $COMMAND
This template can be modified to hardcode job submission settings or to expose additional pipeline-specific configurations.
As an example, let's say we are interested in specifying the --nice option in Slurm jobs.
- To hardcode the same --nice value for all jobs/pipelines, add e.g. #SBATCH --nice=10 on a new line near the beginning of the template script (outside of any if block).
- To expose --nice as a parameter that can be set independently for each pipeline, instead add the following block:
  {% if NICE %}
  #SBATCH --nice={{ NICE }}
  {%- endif %}
  Then set "NICE" in a new field (alongside "TIME", "CORES", etc.) in a pipeline's HPC config file, as in the example below.
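Continuing the example, a pipeline's HPC config file with the (hypothetical) NICE field set might then look like this:
{
    "ACCOUNT": "[[HPC_ACCOUNT_NAME]]",
    "TIME": "1:00:00",
    "CORES": "1",
    "MEMORY": "16G",
    "ARRAY_CONCURRENCY_LIMIT": "",
    "NICE": "10"
}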
Support for other job schedulers¶
Job scheduling support in the Nipoppy package relies on the pysqa package, which can handle several other job schedulers in addition to Slurm and SGE.
To add support for another job scheduler supported by pysqa (e.g., Flux), follow these steps:
1. Navigate to <NIPOPPY_PROJECT_ROOT>/code/hpc.
2. Create a flux.yaml file. Refer to the existing slurm.yaml and sge.yaml for what the content of that file should be.
3. Update clusters.yaml to add flux as an additional cluster.
4. Update job_script_template.sh to add a section for Flux configs.
5. You should now be able to run nipoppy bidsify/process/extract with --hpc flux (see the example command below).
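Once these files are in place, job submission works the same way as for the built-in schedulers, only with a different --hpc value, for example:
$ nipoppy process \
    --dataset <NIPOPPY_PROJECT_ROOT> \
    --pipeline <PIPELINE_NAME> \
    --hpc flux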
See also the pysqa documentation for more information.
Important
If you have configured the Nipoppy HPC functionalities to work on a job scheduler other than Slurm/SGE, please consider opening an issue on our GitHub repository and contributing your additions back to the codebase.