Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
The Unreleased section lists changes that are not yet released but will ship in the next version.
[1.1.0] - Unreleased
Added
Persistent job history database: a local SQLite database (default ~/.config/slurmhub/jobs.db, configurable) records every observed run, a time series of its resource usage, and its log paths. It is managed with SQLAlchemy and migrated with Alembic at startup, so the schema stays forward-compatible. On by default; disable with [database] enabled = false.
Job history & analytics screen (H): a filterable, searchable table of past runs (by state, date range, favourite, current profile vs all profiles), plus a usage-aggregates view (a) showing GPU-hours, CPU-hours, memory GB·h, and average measured GPU utilisation, with a per-profile breakdown.
Favourites & notes: star runs (f) and annotate them (n) from the job detail and history screens; favourites are exempt from retention pruning.
Configurable retention: [database] retention_days prunes runs older than N days at startup (favourites are always kept); 0 keeps everything.
Measured-utilisation capture: an optional, slower-cadence pass records live GPU% and actual memory use for running jobs ([database] capture_utilization, utilization_interval).
New db/ subpackage (models, engine, repository, Alembic env + initial migration, demo seed) and new widgets/history_screen.py and widgets/note_input_screen.py.
58 new tests covering the DB models/repository, migrations, retention, threaded read/write, aggregates, the capture hook, the history and note-input screens, demo isolation, and [database] config parsing (475 total).
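The retention rule (delete runs older than retention_days at startup, never delete favourites, treat 0 as "keep everything") can be sketched as below. This is a minimal illustration using the standard-library sqlite3 module rather than the project's SQLAlchemy models; the runs table and its job_id, submitted_at, and favourite columns are hypothetical, and timestamps are assumed to be stored as ISO-8601 strings in one consistent format so that string comparison matches chronological order.

```python
import sqlite3
from datetime import datetime, timedelta


def prune_runs(conn: sqlite3.Connection, retention_days: int, now: datetime) -> int:
    """Delete non-favourite runs submitted more than retention_days ago.

    Returns the number of rows deleted. A retention_days of 0 keeps everything.
    Assumes a hypothetical table runs(job_id, submitted_at, favourite) where
    submitted_at holds ISO-8601 strings in one consistent format.
    """
    if retention_days <= 0:
        return 0  # 0 means "keep everything"
    cutoff = (now - timedelta(days=retention_days)).isoformat()
    # Favourites (favourite = 1) are never pruned, whatever their age.
    cur = conn.execute(
        "DELETE FROM runs WHERE favourite = 0 AND submitted_at < ?",
        (cutoff,),
    )
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE runs (job_id TEXT, submitted_at TEXT, favourite INTEGER)")
    now = datetime(2026, 6, 1)
    conn.executemany(
        "INSERT INTO runs VALUES (?, ?, ?)",
        [
            ("100", (now - timedelta(days=90)).isoformat(), 0),  # old: pruned
            ("101", (now - timedelta(days=90)).isoformat(), 1),  # old favourite: kept
            ("102", (now - timedelta(days=5)).isoformat(), 0),   # recent: kept
        ],
    )
    print(prune_runs(conn, retention_days=30, now=now))  # 1
```

Doing the filter in a single DELETE keeps pruning atomic and lets the database, not Python, decide which rows qualify.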
Changed
squeue capture now also requests submit time, allocated CPUs, and requested memory (%V|%C|%m); sacct switched to pipe-delimited --parsable2 output with Submit, NCPUS, and ReqMem (which also fixes parsing of job names and work dirs containing spaces). These fields power stable per-run identity and the CPU-hour and memory-hour aggregates; the live views are unchanged.
--demo now ships a seeded in-memory history database so the history and analytics screens can be demonstrated; it never touches ~/.config.
sqlalchemy>=2.0 and alembic>=1.13 added as runtime dependencies.
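Pipe-delimited output is what keeps names with spaces intact: with --parsable2, sacct separates fields with | and omits the trailing delimiter, so splitting on | instead of whitespace keeps a job name like "my training run" in one field. A minimal sketch of such a parser, assuming the header row is present (no --noheader); parse_parsable2 is a hypothetical name, not the project's sacct_parser API.

```python
from typing import Dict, List


def parse_parsable2(output: str) -> List[Dict[str, str]]:
    """Parse `sacct --parsable2` output (header row first) into one dict per row."""
    lines = [line for line in output.splitlines() if line]
    if not lines:
        return []
    header = lines[0].split("|")
    rows = []
    for line in lines[1:]:
        # --parsable2 uses '|' as the separator with no trailing '|', so a
        # well-formed row splits into exactly len(header) fields, and spaces
        # inside values (job names, work dirs) are left untouched.
        fields = line.split("|")
        if len(fields) != len(header):
            continue  # skip rows that would misalign columns (e.g. a '|' inside a value)
        rows.append(dict(zip(header, fields)))
    return rows
```

Keying each row by the header names also means the parser does not care in which order the fields were requested.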
[1.0.0] - 2026-05-22
First official public release.
Documentation
Sphinx documentation site under docs/ (MyST Markdown source), organised into Getting started, Configuration, Usage, and Reference guides. Built and published to GitHub Pages on every push to master by .github/workflows/docs.yml.
Screenshots generated from the real TUI via App.save_screenshot() (SVG), driven by a new --demo CLI flag that injects fixture data and removes the need for an SSH connection.
Existing flat docs/user-guide.md and docs/configuration-examples.md content migrated into the structured guide tree.
Internal development roadmap plan.md removed; release notes live in this file going forward.
Added
--demo CLI flag (slurmhub --demo) that launches the app against a built-in fixture dataset, for demos, tutorials, and generating documentation screenshots without a live Slurm cluster.
slurmhub.demo_data module with hand-crafted fixture outputs for squeue, sacct, scontrol, sinfo, sstat, and tail of log files.
DemoSSHClient (in slurmhub.ssh_wrapper), which implements the SSHClient interface against fixture data.
SSHClient.stream_command() method, extracted from the log viewer so streaming commands can be intercepted cleanly by DemoSSHClient and by future tests.
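The idea behind DemoSSHClient, a stand-in that satisfies the same interface as the real client but answers from canned fixture output, can be sketched as follows. The CommandRunner protocol, its single run() method, FixtureClient, and count_jobs are hypothetical simplifications; the actual SSHClient interface is not reproduced here.

```python
from typing import Dict, Protocol


class CommandRunner(Protocol):
    """Hypothetical minimal client interface: run a command, return its stdout."""

    def run(self, command: str) -> str: ...


class FixtureClient:
    """Fake client that answers from canned output keyed by program name."""

    def __init__(self, fixtures: Dict[str, str]) -> None:
        self._fixtures = fixtures

    def run(self, command: str) -> str:
        words = command.split()
        program = words[0] if words else ""
        if program not in self._fixtures:
            raise RuntimeError(f"no fixture for {program!r}")
        return self._fixtures[program]


def count_jobs(client: CommandRunner) -> int:
    """Caller code depends only on the interface, so it works with either client."""
    output = client.run("squeue -h -o %i")
    return sum(1 for line in output.splitlines() if line.strip())
```

Because callers only depend on the interface, swapping in the fixture client needs no changes to parsing or UI code, which is what makes a no-SSH demo mode practical.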
Added (Epic 16: UI review pass)
Persistent help screen (?): a ModalScreen lists the keybindings grouped by section for the current context (main, detail, dashboard, log). Replaces the old 10-second notify toast.
Cluster capacity strip on the main screen: the existing ClusterStatus line now also renders 240/512 CPU · 12/16 GPU · 1.4T/2T mem · 18 up · 1 down when sinfo data is available, so cluster-wide health is visible without opening the dashboard.
GPU column in the job table (between Time and Reason). Reuses SlurmJob.gpu_display.
Per-tab filter / search / sort isolation: state filter, name search, and sort mode are now stored on ProfileTab. Switching tabs preserves each profile’s view independently; the search input is re-seeded on tab switch.
D (Shift+D): toggle the bottom job-detail panel.
y to yank: copies the selected job ID (main screen) or cycles through job ID → stdout path → stderr path → work dir (detail screen) to the system clipboard via OSC 52 (works over SSH on iTerm2, WezTerm, kitty, Alacritty, and tmux with set-clipboard on).
c to scancel the selected job, guarded by a ConfirmScreen modal (Y/N). Works on the main screen and inside the detail screen. The captured SlurmJob is closed over so a cursor movement between confirm and execution can’t target the wrong job.
v to view the submitted batch script: the new BatchScriptScreen runs scontrol write batch_script <jobid> - and renders it read-only. Supports w to save to disk and y to copy the path to the clipboard.
In-log search: / opens a search bar in LogScreen; n/N jump between case-insensitive matches.
w: save the log buffer to disk (default ~/Downloads/<jobid>_<stream>.log).
y: copy a log line (the current match or the most recent line) to the clipboard via OSC 52.
Partial-fetch warning surfacing: when the sinfo, queue_stats, or pending_details sub-fetches fail but the main job fetch still succeeds, the failure is recorded in FetchResult.partial_errors, shown as a yellow ⚠ next to the ConnectionStatus dot, and surfaced via self.notify(...) (throttled to once every 5 minutes per profile).
Smarter status bar: shows visible vs total counts (5 of 18 shown) when a filter is active; sort mode is rendered as a word (Sort: time) instead of a glyph; a “/ search · ? help” hint appears when no filter is active.
Resizable JobDetail panel: CSS switched from height: 3 to height: auto; min-height: 2; max-height: 6 so PENDING job rows with reason/rank/QOS/priority/submit_time no longer overflow.
ConfirmScreen widget (widgets/confirm_screen.py), HelpScreen widget (widgets/help_screen.py), BatchScriptScreen widget (widgets/batch_script_screen.py), shared OSC 52 clipboard helper (widgets/_clipboard.py), and shared path helpers (widgets/_utils.py).
52 new tests covering the above (404 total).
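OSC 52 works over SSH because the escape sequence travels back through the terminal connection and the local terminal emulator, not the remote host, writes the clipboard. A minimal sketch of building the sequence (ESC ] 52 ; c ; base64 payload, terminated by BEL); osc52_sequence and copy_via_osc52 are hypothetical names, and how widgets/_clipboard.py actually hands the sequence to the Textual driver is not shown here.

```python
import base64
import sys
from typing import TextIO


def osc52_sequence(text: str) -> str:
    """Build an OSC 52 'set clipboard' sequence: ESC ] 52 ; c ; <base64> BEL."""
    payload = base64.b64encode(text.encode("utf-8")).decode("ascii")
    # 'c' selects the clipboard; BEL (\x07) terminates the sequence.
    return f"\x1b]52;c;{payload}\x07"


def copy_via_osc52(text: str, stream: TextIO = sys.stdout) -> None:
    """Emit the sequence; the local terminal emulator performs the copy."""
    stream.write(osc52_sequence(text))
    stream.flush()
```

Base64-encoding the payload keeps arbitrary text (paths with spaces, non-ASCII names) from being interpreted as further control characters.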
Added
Cluster dashboard screen (d key). Full-screen view with cluster-wide CPU/GPU/memory bars, a partition summary table (nodes idle/mixed/alloc/down, CPUs free/total, GPUs used/total, memory total, up/down state), and a scrollable per-node table (state, CPU allocation, free/total memory, GPU usage, reason). Refreshes via sinfo every 60 s in the background and on demand with r. Supports j/k/g/G for scrolling and Esc/q to return.
First-run setup wizard. When no config file exists under the default search path, an interactive ModalScreen collects profile name, SSH host, username, port, key path, and log pattern, optionally tests the connection via SSHClient.check_connection, and writes ~/.config/slurmhub/config.toml via ConfigLoader.save_toml. Includes an “Add another cluster?” confirmation so multiple profiles can be set up in one session. Passing --config or --host skips the wizard.
ConfigLoader.locate(): returns (path, found) for the resolved config path without touching the silent default-profile fallback. Lets the CLI decide whether to launch the wizard.
sinfo_parser module with PartitionStats, NodeStats, and ClusterCapacity dataclasses and a fetch_sinfo(client) helper that issues two pipe-delimited sinfo calls and stitches the results.
Shared widgets/_bars.py rendering helper (extracted from the job detail screen) reused by the dashboard.
66 new tests across test_sinfo_parser.py, test_cluster_dashboard.py, test_first_run_wizard.py, and test_cli.py (352 total).
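The CPU part of a capacity summary maps naturally onto sinfo's %C output field, which reports CPUs as allocated/idle/other/total. A hedged sketch of turning one such value into a "240/512 CPU" fragment; CpuCounts, parse_cpu_field, and format_cpu_strip are hypothetical names, and whether fetch_sinfo actually requests %C is an assumption.

```python
from dataclasses import dataclass


@dataclass
class CpuCounts:
    allocated: int
    idle: int
    other: int
    total: int


def parse_cpu_field(field: str) -> CpuCounts:
    """Parse sinfo's %C value, formatted as allocated/idle/other/total."""
    # Unpacking into four names raises ValueError on a malformed value.
    allocated, idle, other, total = (int(part) for part in field.strip().split("/"))
    return CpuCounts(allocated, idle, other, total)


def format_cpu_strip(field: str) -> str:
    """Render the CPU fragment of the capacity strip as '<allocated>/<total> CPU'."""
    counts = parse_cpu_field(field)
    return f"{counts.allocated}/{counts.total} CPU"
```

Keeping parsing (CpuCounts) separate from rendering lets the same numbers feed both a one-line strip and the dashboard's bars.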
Fixed
g/G navigation now moves the cursor to the first/last row of the job table. The previous binding was wrong on two counts: (1) it called scroll_home/scroll_end, which only adjust horizontal scroll for row-cursor DataTables, and (2) it listed the key as shift+g, but xterm-style terminals deliver Shift+G as the literal character G (no modifier in the key event). Bindings now use scroll_top/scroll_bottom and accept both G and shift+g, so they work in real terminals and in the Textual test pilot. The same key-form fix was applied to JobDetailScreen and LogScreen.
Bottom detail panel no longer stuck on “No job selected”: JobDetail is now seeded from the current cursor row inside _update_display. DataTable does not emit a CursorMoved event for its initial row-0 placement, so previously the panel stayed on its default text until the user pressed j/k. The empty-list case now reads “No jobs to display” instead of “No job selected”.
Added
h/l tab switching: Vim-style keys cycle through profile tabs (no-op with a single profile).
g/G in JobDetailScreen: scroll the detail body to top/bottom for consistency with the job table and log viewer.
7 new pilot tests in test_app.py covering j/k/g/G/h/l behavior.
[0.3.0] - 2026-04-12
Added
GPU column in job table: Shows allocated GPUs per job (e.g., 4x l40s) via the new gres field in the squeue format (%b).
Colored connection indicator: ConnectionStatus widget now shows a colored dot (green = connected, red = error, yellow = loading) with the profile name alongside the host.
Badge-style status bar: State counts render as colored badges with sort mode indicator icons.
Scrollable job detail screen: Detail body uses ScrollableContainer so long output doesn’t get cut off; j/k vim bindings for scrolling.
Section separators in detail screen: Horizontal line separators between Time, Memory, GPUs, Resources, Schedule, and Log Files sections.
Log viewer stream indicator: Header shows stream type (stdout/stderr) and follow mode (FOLLOW/PAUSED), updating dynamically on toggle.
Path truncation: Work directory paths in the job table are truncated to the last 2 components for readability.
12 new unit tests for GPU display, gres parsing, and path truncation (231 total).
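Truncating to the last two components can be sketched with PurePosixPath, which parses cluster-side paths with POSIX rules regardless of the local OS. The "…/" prefix and the truncate_path name are assumptions for illustration; the changelog does not specify how the truncation is marked.

```python
from pathlib import PurePosixPath


def truncate_path(path: str, keep: int = 2) -> str:
    """Keep only the last `keep` components of a cluster-side (POSIX) path."""
    # PurePosixPath("/a/b").parts == ("/", "a", "b"); drop the root marker.
    components = [part for part in PurePosixPath(path).parts if part != "/"]
    if len(components) <= keep:
        return path  # already short enough: show unchanged
    return "…/" + "/".join(components[-keep:])
```

For example, /home/alice/projects/llm/run3 becomes …/llm/run3, which keeps the part that usually distinguishes runs.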
Changed
squeue format string now includes the %b (gres) field by default.
SlurmJob dataclass has a new optional gres field and a gpu_display property.
LogScreen constructor accepts a stream parameter to distinguish stdout/stderr.
StatusBar.update_stats() accepts a sort_mode parameter.
ConnectionStatus has a new profile_name reactive field.
Enhanced CSS: accent background on connection bar, border on detail panel, styled table cursor/header rows, darker status bar.
Updated pyproject.toml description and version to 0.3.0.
Rewritten README with current feature set, TOML config examples, and navigation flow documentation.
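A gpu_display-style property has to turn the raw gres string into something like 4x l40s. A minimal sketch, assuming squeue's %b value looks like gpu:l40s:4, gres/gpu:l40s:4, or gpu:4 (the exact spelling varies between Slurm versions and sites); the regex and the "4x gpu" fallback for untyped requests are hypothetical, not the project's implementation.

```python
import re
from typing import Optional

# Assumed shapes of the gres string: "gpu:l40s:4", "gres/gpu:l40s:4", "gpu:4".
_GPU_RE = re.compile(r"gpu(?::(?P<type>[^:,(]+))?:(?P<count>\d+)")


def gpu_display(gres: Optional[str]) -> str:
    """Turn a raw gres string into a short label such as '4x l40s'.

    Returns "" when no GPU request is found (e.g. None, "", "N/A", "(null)").
    """
    if not gres:
        return ""
    match = _GPU_RE.search(gres)
    if match is None:
        return ""
    count, gpu_type = match.group("count"), match.group("type")
    # Untyped requests (gpu:4) get a generic label; this fallback is an assumption.
    return f"{count}x {gpu_type or 'gpu'}"
```

Using search() rather than match() lets the same pattern handle both the bare gpu:... form and the gres/ prefix.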
[0.2.0] - 2026-04-12
Added
Profile-based TOML configuration: Support for multiple cluster profiles, each with its own SSH, log, and Slurm settings. JSON backward compatibility preserved.
Paramiko SSH client: Replaced subprocess-based SSH with paramiko for programmatic connection management, connection reuse, key auth, and jump host support.
CLI with Click: Added --config, --profile, --host, and --list-profiles command-line options.
Tabbed multi-profile UI: Each cluster profile appears as its own tab when multiple profiles are configured.
Job detail panel: Shows selected job’s full info and resolved log path below the table.
Sacct caching: Historical job data is cached and only re-fetched every sacct_refresh_interval (configurable, default 60s), reducing SSH load.
Configurable log viewer: Log view command is now configurable per profile via log.view_command (default: tail -f {log_path}).
Interactive filtering: Filter jobs by state (1=Running, 2=Pending, 3=Completed, 4=Failed, 0=All) or by name search (/ key).
Column sorting: Press s to cycle sort by id, time, name, or state.
Status bar: Shows job counts by state, active filters, and search query.
Extracted widget modules: ConnectionStatus, JobTable, JobDetail, StatusBar, and FilterBar in the widgets/ package.
External CSS: Moved styles from inline Python to app.tcss.
New config dataclasses: SSHConfig, LogConfig, SlurmConfig, ProfileConfig, AppConfig.
LogPathResolver.resolve_view_command() for building configurable viewer commands.
config.example.toml with documented TOML configuration format.
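The sacct caching policy, reusing the last result until sacct_refresh_interval seconds have passed, can be sketched as a small interval cache. IntervalCache and its injectable clock are hypothetical names; the clock parameter exists only so the behaviour can be tested without sleeping.

```python
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class IntervalCache(Generic[T]):
    """Call `fetch` at most once per `interval` seconds; otherwise reuse the result."""

    def __init__(
        self,
        fetch: Callable[[], T],
        interval: float = 60.0,  # mirrors the documented 60s sacct default
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._interval = interval
        self._clock = clock
        self._value: Optional[T] = None
        self._fetched_at: Optional[float] = None

    def get(self) -> T:
        now = self._clock()
        stale = self._fetched_at is None or now - self._fetched_at >= self._interval
        if stale:
            self._value = self._fetch()
            self._fetched_at = now
        return self._value  # type: ignore[return-value]
```

time.monotonic is used as the default clock because, unlike wall-clock time, it cannot jump backwards and make a stale cache look fresh.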
Changed
SSH wrapper now uses paramiko.SSHClient instead of subprocess.run(["ssh", ...]).
Config format changed from flat JSON to profile-based TOML (JSON still supported for backward compatibility).
ConfigLoader.load() now returns AppConfig (with profiles) instead of a flat Config.
fetch_squeue_jobs() and fetch_sacct_jobs() now accept an SSHClient instead of a bare host string.
JobAggregator now takes an SSHClient instead of a host string.
LogPathResolver now takes a LogConfig instead of a Config.
Default refresh interval changed from 2s to 5s.
Test count increased from 139 to 160.
Removed
Old flat Config and LogPathConfig dataclasses (replaced by the profile-based system).
execute_ssh_command() and check_connection() standalone functions (replaced by the SSHClient class).
[0.1.1] - 2026-03-31
Fixed
Critical: Made refresh_data() non-blocking by running SSH fetches in a background worker thread, preventing UI freezes during data refresh.
Added an overlap guard to prevent concurrent refresh calls from piling up when SSH is slow.
Fixed sort_jobs_by_time() to parse time strings numerically instead of using broken string comparison (e.g. “9:00:00” no longer sorts after “10:00:00”). Supports MM:SS, HH:MM:SS, and D-HH:MM:SS formats.
Fixed a shell injection vulnerability in the log tail command by escaping the log path with shlex.quote().
Removed unused import re in sacct_parser.py.
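String comparison fails because it compares character by character, and "9" sorts after "1", so "9:00:00" lands after "10:00:00". Converting each value to seconds fixes the ordering. A minimal sketch covering the MM:SS, HH:MM:SS, and D-HH:MM:SS formats named in the entry above; parse_slurm_time and sort_by_elapsed are hypothetical names, and other Slurm spellings such as D-HH or UNLIMITED are deliberately rejected with ValueError here.

```python
from typing import List


def parse_slurm_time(value: str) -> int:
    """Convert MM:SS, HH:MM:SS or D-HH:MM:SS to a number of seconds."""
    days = 0
    clock = value.strip()
    if "-" in clock:
        day_part, clock = clock.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in clock.split(":")]  # non-numeric input raises ValueError
    if len(parts) == 2:
        hours = 0
        minutes, seconds = parts
    elif len(parts) == 3:
        hours, minutes, seconds = parts
    else:
        raise ValueError(f"unsupported time format: {value!r}")
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def sort_by_elapsed(times: List[str]) -> List[str]:
    """Numeric sort: '9:00:00' now comes before '10:00:00'."""
    return sorted(times, key=parse_slurm_time)
```

For instance, 1-02:03:04 is ((1 × 24 + 2) × 60 + 3) × 60 + 4 = 93784 seconds.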
[0.1.0] - 2026-01-26
Added
Initial project setup and structure.
Tests folder.
Basic dependency management with textual and rich.
Project plan in plan.md.
claude.md for context and instructions.
CHANGELOG.md for tracking changes.
Pre-commit hooks for code quality.
SSH wrapper module (ssh_wrapper.py) with timeout and connection error handling.
Comprehensive test suite for SSH wrapper functionality.
pytest as a development dependency for testing.
Squeue parser module (squeue_parser.py) for fetching and parsing Slurm job data.
Sacct parser module (sacct_parser.py) for fetching and parsing historical Slurm job data.
Job aggregator module (job_aggregator.py) for merging active and historical job data.
SlurmJob dataclass for structured job representation.
JobAggregator class for unified job data fetching with automatic deduplication.
Support for parsing pipe-delimited squeue output into JSON-compatible structures.
Support for parsing whitespace-delimited sacct output with flexible formatting.
Active jobs take precedence over historical data when duplicate job IDs exist.
Helper functions for filtering, sorting, and searching jobs.
Configuration system (config.py) with JSON-based config loading.
Config and LogPathConfig dataclasses for structured configuration.
ConfigLoader class with default path search and fallback to defaults.
Log path resolver (log_path_resolver.py) with token-based pattern matching.
LogPathResolver class supporting {job_id}, {work_dir}, and {project_name} tokens.
Project-specific log path patterns with automatic detection from work_dir.
Support for complex nested directory structures via configuration.
24 unit tests for squeue parser with mocked SSH commands.
29 unit tests for sacct parser covering various states and edge cases.
28 unit tests for job aggregator covering merge logic and utilities.
TUI application (app.py) built with the Textual framework.
SlurmhubApp main application class with reactive UI.
ConnectionStatus widget displaying host, status, and last update time.
JobTable custom DataTable with color-coded job states.
Automatic data refresh with configurable interval.
Async data fetching using Textual workers to keep UI responsive.
Rich styling for job states (green=RUNNING, yellow=PENDING, red=FAILED, etc.).
Keyboard shortcuts: q (quit), r (refresh), ? (help).
Real-time status indicators with emoji icons.
Vim-style navigation keybindings (j/k for up/down, g/G for top/bottom).
Log viewing feature with Enter key to tail job logs.
App suspension using a with self.suspend() block for seamless shell integration.
SSH tail command execution with the -t flag for proper TTY handling.
Automatic return to TUI after log viewing with state preservation.
Log path resolution integration for automatic log file discovery.
User-friendly error messages for missing log files or unresolved paths.
Comprehensive SSH setup documentation in README.
SSH key generation guide (ssh-keygen, ssh-copy-id).
SSH ControlMaster optimization guide for faster polling.
Step-by-step verification instructions.
Example configuration file (config.example.json).
Comprehensive configuration examples document (docs/configuration-examples.md):
Basic and standard configurations
SSH host alias usage
Project-specific log path patterns
Complex directory structures
High-frequency and slow-network configurations
Token reference and common patterns
Troubleshooting guide
Best practices
20 unit tests for config loader covering JSON parsing and defaults.
18 unit tests for log path resolver covering token replacement and patterns.
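The aggregator's precedence rule (when squeue and sacct both report a job ID, the active record wins) amounts to a keyed merge in which active jobs are inserted last. A minimal sketch with a hypothetical Job dataclass standing in for SlurmJob; merge_jobs is likewise an illustrative name, not the JobAggregator API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Job:
    """Hypothetical stand-in for SlurmJob with just the fields this sketch needs."""

    job_id: str
    state: str
    source: str  # "squeue" (active) or "sacct" (historical)


def merge_jobs(active: List[Job], historical: List[Job]) -> List[Job]:
    """Deduplicate by job ID; an active record replaces a historical one."""
    merged: Dict[str, Job] = {job.job_id: job for job in historical}
    # Active jobs are written last, so they overwrite any sacct entry with the same ID.
    for job in active:
        merged[job.job_id] = job
    return list(merged.values())
```

Preferring the squeue record makes sense because it reflects the job's live state, while sacct accounting data can lag behind for running jobs.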