
Commit 2a36020

Waiman-Long authored and htejun committed
cgroup/cpuset: Don't invalidate sibling partitions on cpuset.cpus conflict
Currently, when setting a cpuset's cpuset.cpus to a value that conflicts with the cpuset.cpus/cpuset.cpus.exclusive of a sibling partition, the sibling's partition state becomes invalid. This is overly harsh and is probably not necessary.

The cpuset.cpus.exclusive control file, if set, will override the cpuset.cpus of the same cpuset when creating a cpuset partition. So cpuset.cpus has less priority than cpuset.cpus.exclusive in setting up a partition. However, it cannot override a conflicting cpuset.cpus file in a sibling cpuset and the partition creation process will fail. This is inconsistent. That will also make using cpuset.cpus.exclusive less valuable as a tool to set up cpuset partitions, as the users have to check whether such a cpuset.cpus conflict exists.

Fix these problems by making sure that once a cpuset.cpus.exclusive is set without failure, it will always be allowed to form a valid partition as long as at least one CPU can be granted from its parent, irrespective of the state of the siblings' cpuset.cpus values. Of course, setting cpuset.cpus.exclusive will fail if it conflicts with the cpuset.cpus.exclusive or the cpuset.cpus.exclusive.effective value of a sibling.

A partition can still be created by setting only cpuset.cpus without setting cpuset.cpus.exclusive. However, any conflicting CPUs in a sibling's cpuset.cpus.exclusive.effective and cpuset.cpus.exclusive values will be removed from its cpuset.cpus.exclusive.effective as long as there are still one or more CPUs left that can be granted from its parent. This CPU stripping is currently done in rm_siblings_excl_cpus().

The new code will now try its best to enable the creation of new partitions with only cpuset.cpus set without invalidating existing ones. However, it is not guaranteed that all the CPUs requested in cpuset.cpus will be used in the new partition even when all these CPUs can be granted from the parent. This is similar to the fact that cpuset.cpus.effective may not be able to include all the CPUs requested in cpuset.cpus. In this case, the parent may not be able to grant all the exclusive CPUs requested in cpuset.cpus to cpuset.cpus.exclusive.effective if some of them have already been granted to other partitions earlier.

With the creation of multiple sibling partitions by setting only cpuset.cpus, this does have the side effect that their exact cpuset.cpus.exclusive.effective settings will depend on the order of partition creation if there are conflicts. Due to the exclusive nature of the CPUs in a partition, it is not easy to make it fair other than the old behavior of invalidating all the conflicting partitions.

For example,

  # echo "0-2" > A1/cpuset.cpus
  # echo "root" > A1/cpuset.cpus.partition
  # cat A1/cpuset.cpus.partition
  root
  # cat A1/cpuset.cpus.exclusive.effective
  0-2
  # echo "2-4" > B1/cpuset.cpus
  # echo "root" > B1/cpuset.cpus.partition
  # cat B1/cpuset.cpus.partition
  root
  # cat B1/cpuset.cpus.exclusive.effective
  3-4
  # cat B1/cpuset.cpus.effective
  3-4

For users who want to be sure that they can get most of the CPUs they want, cpuset.cpus.exclusive should be used instead if they can set it successfully without failure. Setting cpuset.cpus.exclusive will guarantee that sibling conflicts from then onward are no longer possible.

To make this change, we have to separate out the is_cpu_exclusive() check in cpus_excl_conflict() into a cgroup v1 only cpuset1_cpus_excl_conflict() helper. The cpus_allowed_validate_change() helper is now no longer needed and can be removed.

Some existing tests in test_cpuset_prs.sh are updated and new ones are added to reflect the new behavior. The cgroup-v2.rst doc file is also updated to clarify what exclusive CPUs will be used when a partition is created.

Reported-by: Sun Shaojie <[email protected]>
Closes: https://lore.kernel.org/lkml/[email protected]/
Signed-off-by: Waiman Long <[email protected]>
Reviewed-by: Chen Ridong <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
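
The A1/B1 example in the commit message can be modeled with plain bitmasks. The following is only a rough sketch of the described outcome, not the kernel's rm_siblings_excl_cpus() code: the names model_cpumask_t, model_cpu_range() and model_grant_xcpus() are hypothetical, and a CPU mask is assumed to fit in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model type: bit n set means CPU n is in the mask. */
typedef uint64_t model_cpumask_t;

/* Mask covering CPUs lo..hi inclusive; callers must keep lo <= hi <= 63. */
static inline model_cpumask_t model_cpu_range(unsigned int lo, unsigned int hi)
{
	model_cpumask_t up_to_hi = (hi == 63) ? ~(model_cpumask_t)0
					      : (((model_cpumask_t)1 << (hi + 1)) - 1);
	model_cpumask_t below_lo = ((model_cpumask_t)1 << lo) - 1;

	return up_to_hi & ~below_lo;
}

/*
 * Model of forming a partition from cpuset.cpus alone: keep only the CPUs
 * the parent can grant, then strip the CPUs already claimed by siblings.
 * A result of 0 means no CPU is left, so the partition could not be valid.
 */
static inline model_cpumask_t model_grant_xcpus(model_cpumask_t requested,
						model_cpumask_t parent_xcpus,
						model_cpumask_t sibling_xcpus)
{
	return requested & parent_xcpus & ~sibling_xcpus;
}

/* Replays the commit message example, then the reverse creation order. */
static inline void model_grant_example(void)
{
	model_cpumask_t parent = model_cpu_range(0, 7);	/* assumed parent CPUs */
	model_cpumask_t a1, b1;

	/* A1 (0-2) created first, B1 (2-4) second: B1 is left with 3-4. */
	a1 = model_grant_xcpus(model_cpu_range(0, 2), parent, 0);
	b1 = model_grant_xcpus(model_cpu_range(2, 4), parent, a1);
	assert(a1 == model_cpu_range(0, 2));
	assert(b1 == model_cpu_range(3, 4));

	/* B1 created first instead: A1 is left with 0-1. */
	b1 = model_grant_xcpus(model_cpu_range(2, 4), parent, 0);
	a1 = model_grant_xcpus(model_cpu_range(0, 2), parent, b1);
	assert(b1 == model_cpu_range(2, 4));
	assert(a1 == model_cpu_range(0, 1));
}
```

Calling model_grant_example() from a test program exercises both creation orders, which illustrates why the commit message says the exact cpuset.cpus.exclusive.effective values depend on the order of partition creation.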
1 parent 6e6f13f commit 2a36020

5 files changed

Lines changed: 90 additions & 71 deletions


Documentation/admin-guide/cgroup-v2.rst

Lines changed: 23 additions & 10 deletions
@@ -2584,9 +2584,9 @@ Cpuset Interface Files
 	of this file will always be a subset of its parent's
 	"cpuset.cpus.exclusive.effective" if its parent is not the root
 	cgroup. It will also be a subset of "cpuset.cpus.exclusive"
-	if it is set. If "cpuset.cpus.exclusive" is not set, it is
-	treated to have an implicit value of "cpuset.cpus" in the
-	formation of local partition.
+	if it is set. This file should only be non-empty if either
+	"cpuset.cpus.exclusive" is set or when the current cpuset is
+	a valid partition root.
 
   cpuset.cpus.isolated
 	A read-only and root cgroup only multiple values file.
@@ -2618,20 +2618,33 @@ Cpuset Interface Files
 	There are two types of partitions - local and remote. A local
 	partition is one whose parent cgroup is also a valid partition
 	root. A remote partition is one whose parent cgroup is not a
-	valid partition root itself. Writing to "cpuset.cpus.exclusive"
-	is optional for the creation of a local partition as its
-	"cpuset.cpus.exclusive" file will assume an implicit value that
-	is the same as "cpuset.cpus" if it is not set. Writing the
-	proper "cpuset.cpus.exclusive" values down the cgroup hierarchy
-	before the target partition root is mandatory for the creation
-	of a remote partition.
+	valid partition root itself.
+
+	Writing to "cpuset.cpus.exclusive" is optional for the creation
+	of a local partition as its "cpuset.cpus.exclusive" file will
+	assume an implicit value that is the same as "cpuset.cpus" if it
+	is not set. Writing the proper "cpuset.cpus.exclusive" values
+	down the cgroup hierarchy before the target partition root is
+	mandatory for the creation of a remote partition.
+
+	Not all the CPUs requested in "cpuset.cpus.exclusive" can be
+	used to form a new partition. Only those that were present
+	in its parent's "cpuset.cpus.exclusive.effective" control
+	file can be used. For partitions created without setting
+	"cpuset.cpus.exclusive", exclusive CPUs specified in sibling's
+	"cpuset.cpus.exclusive" or "cpuset.cpus.exclusive.effective"
+	also cannot be used.
 
 	Currently, a remote partition cannot be created under a local
 	partition. All the ancestors of a remote partition root except
 	the root cgroup cannot be a partition root.
 
 	The root cgroup is always a partition root and its state cannot
 	be changed. All other non-root cgroups start out as "member".
+	Even though the "cpuset.cpus.exclusive*" and "cpuset.cpus"
+	control files are not present in the root cgroup, they are
+	implicitly the same as the "/sys/devices/system/cpu/possible"
+	sysfs file.
 
 	When set to "root", the current cgroup is the root of a new
 	partition or scheduling domain. The set of exclusive CPUs is

kernel/cgroup/cpuset-internal.h

Lines changed: 3 additions & 0 deletions
@@ -312,6 +312,7 @@ void cpuset1_hotplug_update_tasks(struct cpuset *cs,
 			    struct cpumask *new_cpus, nodemask_t *new_mems,
 			    bool cpus_updated, bool mems_updated);
 int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial);
+bool cpuset1_cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2);
 void cpuset1_init(struct cpuset *cs);
 void cpuset1_online_css(struct cgroup_subsys_state *css);
 int cpuset1_generate_sched_domains(cpumask_var_t **domains,
@@ -326,6 +327,8 @@ static inline void cpuset1_hotplug_update_tasks(struct cpuset *cs,
 			    bool cpus_updated, bool mems_updated) {}
 static inline int cpuset1_validate_change(struct cpuset *cur,
 				struct cpuset *trial) { return 0; }
+static inline bool cpuset1_cpus_excl_conflict(struct cpuset *cs1,
+				struct cpuset *cs2) { return false; }
 static inline void cpuset1_init(struct cpuset *cs) {}
 static inline void cpuset1_online_css(struct cgroup_subsys_state *css) {}
 static inline int cpuset1_generate_sched_domains(cpumask_var_t **domains,

kernel/cgroup/cpuset-v1.c

Lines changed: 19 additions & 0 deletions
@@ -373,6 +373,25 @@ int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial)
 	return ret;
 }
 
+/*
+ * cpuset1_cpus_excl_conflict() - Check if two cpusets have exclusive CPU conflicts
+ *                                to legacy (v1)
+ * @cs1: first cpuset to check
+ * @cs2: second cpuset to check
+ *
+ * Returns: true if CPU exclusivity conflict exists, false otherwise
+ *
+ * If either cpuset is CPU exclusive, their allowed CPUs cannot intersect.
+ */
+bool cpuset1_cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2)
+{
+	if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))
+		return cpumask_intersects(cs1->cpus_allowed,
+					  cs2->cpus_allowed);
+
+	return false;
+}
+
 #ifdef CONFIG_PROC_PID_CPUSET
 /*
  * proc_cpuset_show()
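
The v1 rule added above is small enough to try out in user space. The sketch below restates it over 64-bit masks. It is an illustration under that assumption rather than kernel code, and model_v1_cpus_excl_conflict() is a hypothetical name that stands in for the real helper, which takes struct cpuset pointers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * User-space model of the v1 rule: if either cpuset is CPU exclusive,
 * their allowed CPUs must not intersect. Bit n of a mask is CPU n.
 */
static inline bool model_v1_cpus_excl_conflict(uint64_t cpus1, bool excl1,
					       uint64_t cpus2, bool excl2)
{
	if (excl1 || excl2)
		return (cpus1 & cpus2) != 0;

	return false;
}

static inline void model_v1_example(void)
{
	/* CPUs 0-2 (exclusive) vs CPU 2: overlap on CPU 2, so a conflict. */
	assert(model_v1_cpus_excl_conflict(0x07, true, 0x04, false));
	/* Same masks with neither cpuset exclusive: overlap is allowed. */
	assert(!model_v1_cpus_excl_conflict(0x07, false, 0x04, false));
	/* Both exclusive but disjoint (0-1 vs 2-3): no conflict. */
	assert(!model_v1_cpus_excl_conflict(0x03, true, 0x0c, true));
}
```

Note that the check is symmetric: it does not matter which of the two cpusets carries the exclusive flag.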

kernel/cgroup/cpuset.c

Lines changed: 26 additions & 54 deletions
@@ -129,6 +129,17 @@ static bool force_sd_rebuild;
  * For simplicity, a local partition can be created under a local or remote
  * partition but a remote partition cannot have any partition root in its
  * ancestor chain except the cgroup root.
+ *
+ * A valid partition can be formed by setting exclusive_cpus or cpus_allowed
+ * if exclusive_cpus is not set. In the case of partition with empty
+ * exclusive_cpus, all the conflicting exclusive CPUs specified in the
+ * following cpumasks of sibling cpusets will be removed from its
+ * cpus_allowed in determining its effective_xcpus.
+ * - effective_xcpus
+ * - exclusive_cpus
+ *
+ * The "cpuset.cpus.exclusive" control file should be used for setting up
+ * partition if the users want to get as many CPUs as possible.
  */
 #define PRS_MEMBER		0
 #define PRS_ROOT		1
@@ -616,27 +627,25 @@ static inline bool cpusets_are_exclusive(struct cpuset *cs1, struct cpuset *cs2)
  * Returns: true if CPU exclusivity conflict exists, false otherwise
  *
  * Conflict detection rules:
- * 1. If either cpuset is CPU exclusive, they must be mutually exclusive
- * 2. exclusive_cpus masks cannot intersect between cpusets
- * 3. The allowed CPUs of a sibling cpuset cannot be a subset of the new exclusive CPUs
+ *  o cgroup v1
+ *    See cpuset1_cpus_excl_conflict()
+ *  o cgroup v2
+ *    - The exclusive_cpus values cannot overlap.
+ *    - New exclusive_cpus cannot be a superset of a sibling's cpus_allowed.
  */
 static inline bool cpus_excl_conflict(struct cpuset *trial, struct cpuset *sibling,
 				      bool xcpus_changed)
 {
-	/* If either cpuset is exclusive, check if they are mutually exclusive */
-	if (is_cpu_exclusive(trial) || is_cpu_exclusive(sibling))
-		return !cpusets_are_exclusive(trial, sibling);
-
-	/* Exclusive_cpus cannot intersect */
-	if (cpumask_intersects(trial->exclusive_cpus, sibling->exclusive_cpus))
-		return true;
+	if (!cpuset_v2())
+		return cpuset1_cpus_excl_conflict(trial, sibling);
 
 	/* The cpus_allowed of a sibling cpuset cannot be a subset of the new exclusive_cpus */
 	if (xcpus_changed && !cpumask_empty(sibling->cpus_allowed) &&
 	    cpumask_subset(sibling->cpus_allowed, trial->exclusive_cpus))
 		return true;
 
-	return false;
+	/* Exclusive_cpus cannot intersect */
+	return cpumask_intersects(trial->exclusive_cpus, sibling->exclusive_cpus);
 }
 
 static inline bool mems_excl_conflict(struct cpuset *cs1, struct cpuset *cs2)
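
The two cgroup v2 rules that remain in cpus_excl_conflict() can be checked against the selftest failure cases with a bitmask model. This is a sketch only: model_v2_cpus_excl_conflict() is a hypothetical stand-in that takes raw 64-bit masks instead of struct cpuset pointers, and the sibling's state is passed in explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * User-space model of the v2 rules. Bit n of a mask is CPU n.
 *  - A non-empty sibling cpus_allowed must not be a subset of the newly
 *    written exclusive_cpus (checked only when exclusive_cpus changed).
 *  - The exclusive_cpus of the two cpusets must not overlap.
 */
static inline bool model_v2_cpus_excl_conflict(uint64_t trial_xcpus,
					       uint64_t sibling_xcpus,
					       uint64_t sibling_cpus,
					       bool xcpus_changed)
{
	if (xcpus_changed && sibling_cpus != 0 &&
	    (sibling_cpus & ~trial_xcpus) == 0)
		return true;

	return (trial_xcpus & sibling_xcpus) != 0;
}

static inline void model_v2_example(void)
{
	/* exclusive 3-5 would swallow a sibling's cpuset.cpus 4-5: conflict */
	assert(model_v2_cpus_excl_conflict(0x38, 0, 0x30, true));
	/* exclusive 5 only overlaps the sibling's cpuset.cpus 4-5: allowed */
	assert(!model_v2_cpus_excl_conflict(0x20, 0, 0x30, true));
	/* exclusive 0-3 vs sibling exclusive 3-5: overlap on CPU 3, conflict */
	assert(model_v2_cpus_excl_conflict(0x0f, 0x38, 0x30, true));
}
```

The first two cases mirror the "X3-5" failure row and the "X5" success row of the test matrix, where the sibling B1 has cpuset.cpus set to 4-5.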
@@ -2312,43 +2321,6 @@ static enum prs_errcode validate_partition(struct cpuset *cs, struct cpuset *tri
 	return PERR_NONE;
 }
 
-static int cpus_allowed_validate_change(struct cpuset *cs, struct cpuset *trialcs,
-					struct tmpmasks *tmp)
-{
-	int retval;
-	struct cpuset *parent = parent_cs(cs);
-
-	retval = validate_change(cs, trialcs);
-
-	if ((retval == -EINVAL) && cpuset_v2()) {
-		struct cgroup_subsys_state *css;
-		struct cpuset *cp;
-
-		/*
-		 * The -EINVAL error code indicates that partition sibling
-		 * CPU exclusivity rule has been violated. We still allow
-		 * the cpumask change to proceed while invalidating the
-		 * partition. However, any conflicting sibling partitions
-		 * have to be marked as invalid too.
-		 */
-		trialcs->prs_err = PERR_NOTEXCL;
-		rcu_read_lock();
-		cpuset_for_each_child(cp, css, parent) {
-			struct cpumask *xcpus = user_xcpus(trialcs);
-
-			if (is_partition_valid(cp) &&
-			    cpumask_intersects(xcpus, cp->effective_xcpus)) {
-				rcu_read_unlock();
-				update_parent_effective_cpumask(cp, partcmd_invalidate, NULL, tmp);
-				rcu_read_lock();
-			}
-		}
-		rcu_read_unlock();
-		retval = 0;
-	}
-	return retval;
-}
-
 /**
  * partition_cpus_change - Handle partition state changes due to CPU mask updates
  * @cs: The target cpuset being modified
@@ -2408,15 +2380,15 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
 	if (cpumask_equal(cs->cpus_allowed, trialcs->cpus_allowed))
 		return 0;
 
-	if (alloc_tmpmasks(&tmp))
-		return -ENOMEM;
-
 	compute_trialcs_excpus(trialcs, cs);
 	trialcs->prs_err = PERR_NONE;
 
-	retval = cpus_allowed_validate_change(cs, trialcs, &tmp);
+	retval = validate_change(cs, trialcs);
 	if (retval < 0)
-		goto out_free;
+		return retval;
+
+	if (alloc_tmpmasks(&tmp))
+		return -ENOMEM;
 
 	/*
 	 * Check all the descendants in update_cpumasks_hier() if
@@ -2439,7 +2411,7 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
 	/* Update CS_SCHED_LOAD_BALANCE and/or sched_domains, if necessary */
 	if (cs->partition_root_state)
 		update_partition_sd_lb(cs, old_prs);
-out_free:
+
 	free_tmpmasks(&tmp);
 	return retval;
 }
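
The update_cpumask() hunks above move validation ahead of alloc_tmpmasks(), so a rejected change returns before anything is allocated and the out_free label is no longer needed. A minimal user-space sketch of that ordering follows. All model_* names are hypothetical, calloc() stands in for the kernel allocator, and the allocation counter exists only to make the absence of leaks observable.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's struct tmpmasks. */
struct model_tmpmasks {
	unsigned long *scratch;
};

/* Number of scratch buffers currently allocated (for the leak check). */
static int model_live_allocs;

static inline int model_alloc_tmpmasks(struct model_tmpmasks *tmp)
{
	tmp->scratch = calloc(1, sizeof(*tmp->scratch));
	if (!tmp->scratch)
		return -ENOMEM;
	model_live_allocs++;
	return 0;
}

static inline void model_free_tmpmasks(struct model_tmpmasks *tmp)
{
	free(tmp->scratch);
	tmp->scratch = NULL;
	model_live_allocs--;
}

/* Validate first, allocate second: early returns need no cleanup path. */
static inline int model_update(bool change_is_valid)
{
	struct model_tmpmasks tmp;

	if (!change_is_valid)
		return -EINVAL;		/* nothing allocated yet */

	if (model_alloc_tmpmasks(&tmp))
		return -ENOMEM;

	/* ... work that needs the scratch masks would go here ... */

	model_free_tmpmasks(&tmp);
	return 0;
}

static inline void model_update_example(void)
{
	assert(model_update(false) == -EINVAL);
	assert(model_live_allocs == 0);
	assert(model_update(true) == 0);
	assert(model_live_allocs == 0);
}
```

The trade-off is that every exit taken after the allocation still has to reach the free call, which the patched function does by falling through to free_tmpmasks().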

tools/testing/selftests/cgroup/test_cpuset_prs.sh

Lines changed: 19 additions & 7 deletions
@@ -269,7 +269,7 @@ TEST_MATRIX=(
 	" C0-3:S+ C1-3:S+ C2-3 . X2-3 X3:P2 . . 0 A1:0-2|A2:3|A3:3 A1:P0|A2:P2 3"
 	" C0-3:S+ C1-3:S+ C2-3 . X2-3 X2-3 X2-3:P2 . 0 A1:0-1|A2:1|A3:2-3 A1:P0|A3:P2 2-3"
 	" C0-3:S+ C1-3:S+ C2-3 . X2-3 X2-3 X2-3:P2:C3 . 0 A1:0-1|A2:1|A3:2-3 A1:P0|A3:P2 2-3"
-	" C0-3:S+ C1-3:S+ C2-3 C2-3 . . . P2 0 A1:0-3|A2:1-3|A3:2-3|B1:2-3 A1:P0|A3:P0|B1:P-2"
+	" C0-3:S+ C1-3:S+ C2-3 C2-3 . . . P2 0 A1:0-1|A2:1|A3:1|B1:2-3 A1:P0|A3:P0|B1:P2"
 	" C0-3:S+ C1-3:S+ C2-3 C4-5 . . . P2 0 B1:4-5 B1:P2 4-5"
 	" C0-3:S+ C1-3:S+ C2-3 C4 X2-3 X2-3 X2-3:P2 P2 0 A3:2-3|B1:4 A3:P2|B1:P2 2-4"
 	" C0-3:S+ C1-3:S+ C2-3 C4 X2-3 X2-3 X2-3:P2:C1-3 P2 0 A3:2-3|B1:4 A3:P2|B1:P2 2-4"
@@ -318,7 +318,7 @@ TEST_MATRIX=(
 	# Invalid to valid local partition direct transition tests
 	" C1-3:S+:P2 X4:P2 . . . . . . 0 A1:1-3|XA1:1-3|A2:1-3:XA2: A1:P2|A2:P-2 1-3"
 	" C1-3:S+:P2 X4:P2 . . . X3:P2 . . 0 A1:1-2|XA1:1-3|A2:3:XA2:3 A1:P2|A2:P2 1-3"
-	" C0-3:P2 . . C4-6 C0-4 . . . 0 A1:0-4|B1:4-6 A1:P-2|B1:P0"
+	" C0-3:P2 . . C4-6 C0-4 . . . 0 A1:0-4|B1:5-6 A1:P2|B1:P0"
 	" C0-3:P2 . . C4-6 C0-4:C0-3 . . . 0 A1:0-3|B1:4-6 A1:P2|B1:P0 0-3"
 
 	# Local partition invalidation tests
@@ -388,10 +388,10 @@ TEST_MATRIX=(
 	" C0-1:S+ C1 . C2-3 . P2 . . 0 A1:0-1|A2:1 A1:P0|A2:P-2"
 	" C0-1:S+ C1:P2 . C2-3 P1 . . . 0 A1:0|A2:1 A1:P1|A2:P2 0-1|1"
 
-	# A non-exclusive cpuset.cpus change will invalidate partition and its siblings
-	" C0-1:P1 . . C2-3 C0-2 . . . 0 A1:0-2|B1:2-3 A1:P-1|B1:P0"
-	" C0-1:P1 . . P1:C2-3 C0-2 . . . 0 A1:0-2|B1:2-3 A1:P-1|B1:P-1"
-	" C0-1 . . P1:C2-3 C0-2 . . . 0 A1:0-2|B1:2-3 A1:P0|B1:P-1"
+	# A non-exclusive cpuset.cpus change will not invalidate its siblings partition.
+	" C0-1:P1 . . C2-3 C0-2 . . . 0 A1:0-2|B1:3 A1:P1|B1:P0"
+	" C0-1:P1 . . P1:C2-3 C0-2 . . . 0 A1:0-1|XA1:0-1|B1:2-3 A1:P1|B1:P1"
+	" C0-1 . . P1:C2-3 C0-2 . . . 0 A1:0-1|B1:2-3 A1:P0|B1:P1"
 
 	# cpuset.cpus can overlap with sibling cpuset.cpus.exclusive but not subsumed by it
 	" C0-3 . . C4-5 X5 . . . 0 A1:0-3|B1:4-5"
@@ -417,6 +417,14 @@ TEST_MATRIX=(
 	" CX1-4:S+ CX2-4:P2 . C5-6 . . . P1:C3-6 0 A1:1|A2:2-4|B1:5-6 \
 	A1:P0|A2:P2:B1:P-1 2-4"
 
+	# When multiple partitions with conflicting cpuset.cpus are created, the
+	# latter created ones will only get what are left of the available exclusive
+	# CPUs.
+	" C1-3:P1 . . . . . . C3-5:P1 0 A1:1-3|B1:4-5:XB1:4-5 A1:P1|B1:P1"
+
+	# cpuset.cpus can be set to a subset of sibling's cpuset.cpus.exclusive
+	" C1-3:X1-3 . . C4-5 . . . C1-2 0 A1:1-3|B1:1-2"
+
 	# old-A1 old-A2 old-A3 old-B1 new-A1 new-A2 new-A3 new-B1 fail ECPUs Pstate ISOLCPUS
 	# ------ ------ ------ ------ ------ ------ ------ ------ ---- ----- ------ --------
 	# Failure cases:
@@ -427,7 +435,7 @@ TEST_MATRIX=(
 	# Changes to cpuset.cpus.exclusive that violate exclusivity rule is rejected
 	" C0-3 . . C4-5 X0-3 . . X3-5 1 A1:0-3|B1:4-5"
 
-	# cpuset.cpus cannot be a subset of sibling cpuset.cpus.exclusive
+	# cpuset.cpus.exclusive cannot be set to a superset of sibling's cpuset.cpus
 	" C0-3 . . C4-5 X3-5 . . . 1 A1:0-3|B1:4-5"
 )
 
@@ -477,6 +485,10 @@ REMOTE_TEST_MATRIX=(
 	. . X1-2:P2 X4-5:P1 . X1-7:P2 p1:3|c11:1-2|c12:4:c22:5-6 \
 	p1:P0|p2:P1|c11:P2|c12:P1|c22:P2 \
 	1-2,4-6|1-2,5-6"
+	# c12 whose cpuset.cpus CPUs are all granted to c11 will become invalid partition
+	" C1-5:P1:S+ . C1-4:P1 C2-3 . . \
+	. . . P1 . . p1:5|c11:1-4|c12:5 \
+	p1:P1|c11:P1|c12:P-1"
 )
 
 #
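
The Pstate column in the matrices above uses tokens such as A1:P1 or B1:P-2. The helper below is a hedged reading aid rather than part of the selftest: it assumes the numbers follow the kernel's PRS_* partition-root-state constants (only PRS_MEMBER and PRS_ROOT appear in this diff; the other three values are an assumption here) and maps them to the strings that cpuset.cpus.partition reports. The name model_prs_name() is hypothetical.

```c
#include <assert.h>
#include <string.h>

/*
 * Map a partition root state number, as used in the "P<n>" test tokens,
 * to the corresponding cpuset.cpus.partition value. Values other than
 * 0 and 1 are assumed from the kernel's PRS_* constants, which this
 * diff does not show.
 */
static inline const char *model_prs_name(int prs)
{
	switch (prs) {
	case 0:
		return "member";
	case 1:
		return "root";
	case 2:
		return "isolated";
	case -1:
		return "root invalid";
	case -2:
		return "isolated invalid";
	default:
		return "unknown";
	}
}

static inline void model_prs_example(void)
{
	/* "B1:P-2" became "B1:P2" in the first updated test row. */
	assert(strcmp(model_prs_name(-2), "isolated invalid") == 0);
	assert(strcmp(model_prs_name(2), "isolated") == 0);
}
```

Read this way, the updated rows show sibling partitions staying at P1 or P2 where the old expectations had them dropping to P-1 or P-2.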
