Description
Feature update for slurm and pdsh
This update for slurm and pdsh fixes the following issues:
slurm was updated to version 24.11.1 using package slurm_24_11:
- Security issues fixed:
  - CVE-2024-48936: Fixed authentication handling in stepmgr that could permit an attacker to execute processes under other users' jobs (bsc#1236722)
  - CVE-2024-42511: Fixed vulnerability with switch plugins where a user could override the isolation between Slingshot VNIs or IMEX channels (bsc#1236726)
- Important remarks:
  - Slurm can be upgraded from version 23.02, 23.11 or 24.05 to version 24.11 without loss of jobs or other state information. Upgrading directly from an earlier version of Slurm will result in loss of state information.
  - If using the slurmdbd (Slurm DataBase Daemon) you must update this first.
  - The 24.11 slurmdbd will work with Slurm daemons of version 23.02 and above. You will not need to update all clusters at the same time, but it is very important to update slurmdbd first and have it running before updating any other clusters making use of it.
  - If using a backup DBD you must start the primary first to do any database conversion; the backup will not start until this has happened.
  - All SPANK plugins must be recompiled when upgrading from any Slurm version prior to 24.11.
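The upgrade rule in the remarks above can be expressed as a small pre-flight check. This is a minimal sketch, not a SUSE or SchedMD tool: the function names are hypothetical, and the set of source releases is copied from the remark about 23.02, 23.11 and 24.05. Maintenance updates within 24.11 itself are outside what the remark covers, so the sketch does not model them.

```python
# Hypothetical helper for illustration; the release set is taken from the
# "Important remarks" above and covers direct upgrades to 24.11 only.
STATE_PRESERVING_SOURCES = {(23, 2), (23, 11), (24, 5)}


def parse_release(version: str) -> tuple[int, int]:
    """Return (major, minor) from a version string such as '23.11.4'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def keeps_state_on_upgrade_to_24_11(current_version: str) -> bool:
    """True if a direct upgrade to 24.11 keeps jobs and other state."""
    return parse_release(current_version) in STATE_PRESERVING_SOURCES


if __name__ == "__main__":
    for version in ("22.05.9", "23.02.7", "24.05.4"):
        print(version, keeps_state_on_upgrade_to_24_11(version))
```

A site still on an older release would need to step through one of the listed releases first if job and state information must survive.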
- Highlights of changes:
  - Fixed issues related to the modified startup handling for slurmdbd: moved PID file to /run/slurmdbd (bsc#1236928)
  - Create slurm-owned log file on behalf of slurmdbd (bsc#1236929)
  - Added report AccountUtilizationByQOS to sreport.
  - AccountUtilizationByUser is able to be filtered by QOS.
  - Added autodetected gpus to the output of slurmd -C
  - Added ability to submit jobs with multiple QOS. These are sorted by priority, highest first.
  - Removed the instant on feature from switch/hpe_slingshot.
  - slurmctld: Changed incoming RPC handling to dedicated thread pool with asynchronous handling of I/O that can be configured via conmgr_* entries under SlurmctldParameters in slurm.conf.
- Configuration File Changes (see appropriate man page for details):
  - Added SchedulerParameters=bf_allow_magnetic_slot option. It allows jobs in magnetic reservations to be planned by backfill scheduler.
  - Added TopologyParam=TopoMaxSizeUnroll=# to allow --nodes=<min>-<max> for topology/block.
  - Added DataParserParameters slurm.conf parameter to allow setting default value for CLI --json and --yaml arguments.
  - Hardware collectives in switch/hpe_slingshot now require enable_stepmgr.
  - Added connection related parameters to slurm.conf under SlurmctldParameters:
    - conmgr_max_connections: Defaults to 150 connections.
    - conmgr_threads: Defaults to 64 threads for slurmctld.
    - conmgr_use_poll: Default is to use epoll in Linux.
    - conmgr_connect_timeout: Defaults to MessageTimeout.
    - conmgr_read_timeout: Defaults to MessageTimeout.
    - conmgr_wait_write_delay: Defaults to MessageTimeout.
    - conmgr_write_timeout: Defaults to MessageTimeout.
  - Added SlurmctldParameters=ignore_constraint_validation to ignore constraint/feature validation at submission.
  - Added SchedulerParameters=bf_topopt_enable option to enable experimental hook to control backfill.
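To show how the conmgr_* entries might be combined on a single SlurmctldParameters line, here is a minimal sketch. The helper name is hypothetical and not part of Slurm, and the comma-separated name=value layout is an assumption based on the usual slurm.conf option-list syntax; the slurm.conf man page remains the authority on the exact form.

```python
# Hypothetical helper for illustration; not part of Slurm.
def render_slurmctld_parameters(options: dict[str, object]) -> str:
    """Render a SlurmctldParameters line from option names and values.

    A value of None renders a bare flag (for example enable_stepmgr).
    """
    parts = []
    for name, value in options.items():
        parts.append(name if value is None else f"{name}={value}")
    # Assumed layout: options joined by commas on one line.
    return "SlurmctldParameters=" + ",".join(parts)


if __name__ == "__main__":
    # Values equal to the defaults quoted above, written out explicitly.
    print(render_slurmctld_parameters({
        "conmgr_max_connections": 150,
        "conmgr_threads": 64,
    }))
```

Writing the defaults out explicitly changes nothing by itself; it only gives a starting point for sites that want to tune connection limits or thread counts.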
- Command Changes (see man pages for details):
  - Remove srun --cpu-bind=rank.
  - Add '%b' as a file name pattern for the array task id modulo 10.
  - sacct: Respect --noheader for --batch-script and --env-vars.
  - Add sacctmgr ping command to query status of slurmdbd.
  - sbcast: Add --nodelist option to specify where files are transmitted to.
  - sbcast: Add --no-allocation option to transmit files to nodes outside of a job allocation.
  - slurmdbd: Add -u option. This is used to determine if restarting the DBD will result in database conversion.
  - Remove salloc --get-user-env.
  - scontrol: Add --json/--yaml support to listpids.
  - scontrol: Add liststeps.
  - scontrol: Add listjobs.
  - scontrol show topo: Show aggregated block sizes when using topology/block.
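The '%b' file name pattern mentioned above expands to the array task id modulo 10. The sketch below only models that arithmetic; the function name is hypothetical, it is not Slurm's own implementation, and it ignores every other file name pattern and any escaping rules.

```python
# Illustration only: models the described behaviour of '%b', not Slurm's code.
def expand_percent_b(pattern: str, array_task_id: int) -> str:
    """Replace every '%b' in pattern with the array task id modulo 10."""
    return pattern.replace("%b", str(array_task_id % 10))


if __name__ == "__main__":
    for task_id in (3, 10, 27):
        print(task_id, expand_percent_b("out_%b.log", task_id))
```

With this mapping, task ids 3, 10 and 27 land in out_3.log, out_0.log and out_7.log, so a large array spreads its output over at most ten files.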
- API Changes:
  - Remove burst_buffer/lua call slurm.job_info_to_string().
  - job_submit/lua: Add assoc_qos attribute to job_desc to display all potential QOS's for a job's association.
  - job_submit/lua: Add slurm.get_qos_priority() function to retrieve the given QOS's priority.
- SLURMRESTD Changes:
  - Removed fields deprecated in the Slurm-23.11 release from v0.0.42 endpoints.
  - Removed v0.0.39 plugins.
  - Set data_parser/v0.0.42+prefer_refs flag to default.
  - Add data_parser/v0.0.42+minimize_refs flag to inline single referenced schemas in the OpenAPI schema to get default behavior of data_parser/v0.0.41.
  - Rename v0.0.42 JOB_INFO field minimum_switches to required_switches to reflect the actual behavior.
  - Rename v0.0.42 ACCOUNT_CONDITION field assocation to association (typo).
  - Tag slurmdb/v0.0.42/jobs pid field deprecated.
- For details on the changes in this version update, consult the Slurm 24.11 changelog.
pdsh:
- Fix version test for munge build (bsc#1236156)
- Dropped Slurm support for s390x and i586: Slurm no longer builds for s390x or 32-bit
- Implementation of package pdsh-slurm_24_11 compatible with Slurm 24.11
Package list
SUSE Linux Enterprise High Performance Computing 15 SP3-LTSS
SUSE Linux Enterprise High Performance Computing 15 SP4-ESPOS
SUSE Linux Enterprise High Performance Computing 15 SP4-LTSS
SUSE Linux Enterprise High Performance Computing 15 SP5-ESPOS
SUSE Linux Enterprise High Performance Computing 15 SP5-LTSS
SUSE Linux Enterprise Module for HPC 15 SP6
SUSE Linux Enterprise Module for Package Hub 15 SP6
openSUSE Leap 15.6
References
- Link for SUSE-FU-2025:0660-1
- E-Mail link for SUSE-FU-2025:0660-1
- SUSE Security Ratings
- SUSE Bug 1236722
- SUSE Bug 1236726
- SUSE Bug 1236928
- SUSE Bug 1236929
- SUSE CVE CVE-2024-42511 page
- SUSE CVE CVE-2024-48936 page
Description
** RESERVED ** This candidate has been reserved by an organization or individual that will use it when announcing a new security problem. When the candidate has been publicized, the details for this candidate will be provided.
Affected products
References
- CVE-2024-42511
- SUSE Bug 1236726
Description
SchedMD Slurm before 24.05.4 has Incorrect Authorization. A mistake in authentication handling in stepmgr could permit an attacker to execute processes under other users' jobs. This is limited to jobs explicitly running with --stepmgr, or on systems that have globally enabled stepmgr via SlurmctldParameters=enable_stepmgr in their configuration.
Affected products
References
- CVE-2024-48936
- SUSE Bug 1236722