Examples¶
Copy-paste configurations for common situations.
Minimal — one cluster, all defaults¶
[profiles.mycluster]
host = "mycluster" # resolved via ~/.ssh/config
Multi-cluster — appears as tabs¶
[defaults]
refresh_interval = 5
[profiles.dei]
host = "dei.login"
[profiles.cineca]
host = "login.cineca.it"
ssh.username = "auser"
ssh.key_filename = "~/.ssh/id_cineca"
refresh_interval = 10 # poll less often on remote/slow clusters
[profiles.hpc]
host = "hpc.university.edu"
ssh.jump_host = "bastion.university.edu"
Project-specific log patterns¶
[profiles.mycluster]
host = "mycluster"
log.default_pattern = "{work_dir}/logs/{job_id}.out"
log.specific_projects.ml_training = "{work_dir}/experiments/{job_id}.log"
log.specific_projects.simulations = "{work_dir}/sim_output/job_{job_id}/stdout.txt"
log.specific_projects.preprocessing = "{work_dir}/data/logs/{job_id}.out"
Complex nested directories¶
[profiles.supercomputer]
host = "supercomputer.edu"
ssh_timeout = 20
refresh_interval = 3
log.default_pattern = "{work_dir}/slurm_logs/{job_id}.out"
log.specific_projects.deep_learning = "{work_dir}/runs/{project_name}/logs/{job_id}.log"
log.specific_projects.molecular_dynamics = "{work_dir}/trajectories/job_{job_id}/output.log"
log.specific_projects.genomics = "{work_dir}/analysis/logs/slurm-{job_id}.out"
log.specific_projects.climate_model = "{work_dir}/model_runs/{job_id}/stdout.txt"
High-frequency monitoring¶
[defaults]
ssh_timeout = 5
refresh_interval = 2
[profiles.fast]
host = "fast-cluster.edu"
Note
More frequent updates increase load on both your machine and the cluster login node.
Configure ControlMaster auto in ~/.ssh/config for best performance — paramiko
ignores the directive itself, but any external ssh calls you make (e.g. for
debugging) will benefit.
Slow / unreliable network¶
[defaults]
ssh_timeout = 30
refresh_interval = 15
sacct_refresh_interval = 300
[profiles.remote]
host = "remote-cluster.edu"
GPU-heavy workload¶
When most of your jobs use GPUs and you want richer GPU telemetry, ensure the cluster
allows srun --jobid=<id> --overlap nvidia-smi from a login node — the detail screen
uses that to fill in per-GPU utilisation bars. No client-side configuration is
required; the call is attempted whenever a job declares GPUs in its ReqTRES.
Legacy JSON (still supported)¶
{
"remote_host": "cluster.example.edu",
"ssh_timeout": 10,
"refresh_interval": 5,
"log_paths": {
"default_pattern": "{work_dir}/logs/{job_id}.out",
"specific_projects": {
"ml_project": "{work_dir}/ml/logs/{job_id}.log"
}
}
}
The loader converts this to a single default profile. New deployments should start
from TOML.