From: Davidlohr Bueso <davidlohr.bueso <at> hp.com>
Subject: [PATCH v2 0/4] ipc: reduce ipc lock contention
Newsgroups: gmane.linux.kernel
Date: Tuesday 5th March 2013 09:35:49 UTC

The following set of patches is based on the discussion about holding the
ipc lock unnecessarily, such as for permissions and security checks:


Patch 1/4: Remove the bogus comment from ipc_checkid() requiring that
the ipc lock be held before calling it. Also simplify the function
return. This is a new patch, not present in the RFC.
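
As a rough illustration of the "simplify the function return" part, the
check can be modelled in user space as below. This is a sketch and not the
patch itself: struct ipc_perm_model is a hypothetical stand-in for struct
kern_ipc_perm, SEQ_MULTIPLIER is assumed to be 32768, and the "verbose"
form is only a plausible picture of the code before the change.

```c
#include <assert.h>

/* Assumption: SEQ_MULTIPLIER is 32768 (IPCMNI) as in kernels of this era. */
#define SEQ_MULTIPLIER 32768

/* Hypothetical stand-in for the kernel's struct kern_ipc_perm. */
struct ipc_perm_model {
	unsigned long seq;	/* sequence number folded into the user-visible id */
};

/* Verbose form: explicit branches that produce a boolean result. */
static int checkid_verbose(const struct ipc_perm_model *ipcp, int id)
{
	if ((unsigned long)id / SEQ_MULTIPLIER != ipcp->seq)
		return 1;
	return 0;
}

/* Simplified form: return the comparison directly. */
static int checkid_simple(const struct ipc_perm_model *ipcp, int id)
{
	return (unsigned long)id / SEQ_MULTIPLIER != ipcp->seq;
}
```

Both forms return nonzero when the sequence number encoded in the id does
not match the object, and neither needs the lock for the comparison itself.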

Patch 2/4: Introduce functions to obtain the ipc object without holding
the lock. Two functions, ipc_obtain_object() and
ipc_obtain_object_check() are created, which are analogous to
ipc_lock() and ipc_lock_check(). This patch was acked by Michel
Lespinasse and reviewed by Chegu Vinod.

Patch 3/4: Introduce ipcctl_pre_down_nolock() function, which is a
lockless version of ipcctl_pre_down(). This function is common to sem,
msg and shm and does some common checking for IPC_RMID and IPC_SET
commands. The older version was kept but calls the lockless version
without breaking the semantics, and is hence transparent to users. This
was suggested by Linus. Once all users are updated, the
ipcctl_pre_down() function can be removed.
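
The compatibility arrangement described for patch 3/4 might look roughly
like this in user space. Every name here (struct toy_obj,
toy_pre_down_nolock(), toy_pre_down()) is hypothetical, an integer flag
stands in for the ipc spinlock, and the checks are simplified placeholders
rather than the real IPC_RMID/IPC_SET logic.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an ipc object; not the kernel's structure. */
struct toy_obj {
	int locked;	/* 1 while the (modelled) ipc lock is held */
	int owner_uid;
};

/*
 * Lockless variant: performs only the common checks.
 * Returns 0 on success or a negative errno and never touches o->locked.
 */
static int toy_pre_down_nolock(struct toy_obj *o, int caller_uid)
{
	if (o == NULL)
		return -EINVAL;
	if (caller_uid != 0 && caller_uid != o->owner_uid)
		return -EPERM;
	return 0;
}

/*
 * Older entry point, kept for callers that have not been converted yet.
 * It keeps the old contract: on success the object comes back locked.
 */
static int toy_pre_down(struct toy_obj *o, int caller_uid)
{
	int err = toy_pre_down_nolock(o, caller_uid);

	if (err)
		return err;
	o->locked = 1;	/* models taking the ipc lock */
	return 0;
}
```

Unconverted callers keep seeing a locked object on success, while converted
callers can use the lockless variant and decide for themselves when to lock.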

Patch 4/4: Use the new, lockless, functions introduced above to only
hold the ipc lock when necessary. The idea is simple: only check ipc
security and permissions within the rcu read region, *without* holding
the ipc lock. This patch was acked by Michel Lespinasse and reviewed by
Chegu Vinod.
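
The idea in patch 4/4 can be sketched in user space as follows. This is a
single-threaded model under stated assumptions: struct toy_sem,
toy_perm_ok() and toy_setval() are hypothetical, the lock is a counter
rather than a spinlock, and the RCU read-side section that would surround
the whole operation in the kernel is only mentioned in a comment.

```c
#include <assert.h>
#include <errno.h>

/* Toy lock that records how often it was taken; stands in for the ipc lock. */
struct toy_lock {
	int held;
	int acquisitions;
};

/* Hypothetical semaphore-like object. */
struct toy_sem {
	struct toy_lock lock;
	int owner_uid;
	int value;
};

/* Permission check; reads the object but takes no lock. */
static int toy_perm_ok(const struct toy_sem *s, int caller_uid)
{
	return caller_uid == 0 || caller_uid == s->owner_uid;
}

/*
 * Update the object's value. In the kernel this body would run inside an
 * rcu_read_lock()/rcu_read_unlock() pair, which is omitted in this model.
 */
static int toy_setval(struct toy_sem *s, int caller_uid, int value)
{
	/* Step 1: permission check, done without the lock. */
	if (!toy_perm_ok(s, caller_uid))
		return -EACCES;

	/* Step 2: hold the lock only around the actual update. */
	s->lock.held = 1;
	s->lock.acquisitions++;
	s->value = value;
	s->lock.held = 0;
	return 0;
}
```

A request that fails the permission check returns before the lock is ever
touched, which is where the reduction in contention comes from.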

Changes since v1 (RFC):
- Add patches 1 and 3.

- Patch 2: In ipc_lock(), instead of checking the return of
ipc_obtain_object_check() against NULL, use IS_ERR(). Suggested by
Michel Lespinasse.

- Patch 2,4: In order for the rcu read lock/unlock calls to be paired up
more obviously, force the user to call rcu_read_lock *before* calling
ipc_obtain_object[_check](). Suggested by Michel Lespinasse.

- Patch 4: Return ERR_CAST() in sem_obtain_object[_check]() instead of a
cast to struct sem_array *. Suggested by Linus.

- Patch 4: Change open coded spin_lock calls to ipc_object_lock in
semaphore code. Suggested by Linus.

- Patch 4: Added an 'out_wakeup' label to semctl_main() and semtimedop()
so the functions can return without calling sem_unlock (and hence
spin_unlock) when the lock is not held.

- More tests: For the past few days I've been running this patchset on
my own laptop and on 2- and 8-socket machines running my Oracle
swingbench workloads. I have not encountered any issues so far. The main
fix, suggested by Linus, was for the bogus ipcctl_pre_down() changes that
did not update the callers.
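
The IS_ERR() and ERR_CAST() items in the list above rely on the kernel
convention of encoding a negative errno in a pointer value. The following
user-space re-creation is only a sketch: the helpers follow the spirit of
include/linux/err.h (MAX_ERRNO of 4095), while struct toy_perm, struct
toy_sem_array, toy_obtain() and toy_sem_obtain() are hypothetical names
made up for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

/* Recover the errno value from an error pointer. */
static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

/* True if ptr lies in the top MAX_ERRNO values of the address space. */
static inline int IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Pass an error pointer through under a different pointer type. */
static inline void *ERR_CAST(const void *ptr)
{
	return (void *)ptr;
}

/* Hypothetical generic part of an ipc object. */
struct toy_perm {
	int id;
};

/* Hypothetical semaphore array; perm is the first member on purpose. */
struct toy_sem_array {
	struct toy_perm perm;
	int nsems;
};

static struct toy_sem_array toy_table[1] = { { { 0 }, 4 } };

/* Generic lookup: returns the object or an error pointer, never NULL. */
static struct toy_perm *toy_obtain(int id)
{
	if (id != 0)
		return ERR_PTR(-EINVAL);
	return &toy_table[0].perm;
}

/* Typed wrapper: forwards errors with ERR_CAST() instead of a struct cast. */
static struct toy_sem_array *toy_sem_obtain(int id)
{
	struct toy_perm *p = toy_obtain(id);

	if (IS_ERR(p))
		return ERR_CAST(p);
	/* Valid because perm is the first member of struct toy_sem_array. */
	return (struct toy_sem_array *)p;
}
```

Because a failed lookup yields an error pointer rather than NULL, a caller
that compares the result against NULL would treat the failure as success;
IS_ERR() is the check that matches the convention.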

Ok, some numbers...

1) With Rik's semop-multi.c microbenchmark we can see the following:

Baseline (3.9-rc1):
cpus 4, threads: 256, semaphores: 128, test duration: 30 secs
total operations: 151452270, ops/sec 5048409

+  59.40%            a.out  [kernel.kallsyms]  [k] _raw_spin_lock
+   6.14%            a.out  [kernel.kallsyms]  [k] sys_semtimedop
+   3.84%            a.out  [kernel.kallsyms]  [k] avc_has_perm_flags
+   3.64%            a.out  [kernel.kallsyms]  [k] __audit_syscall_exit
+   2.06%            a.out  [kernel.kallsyms]  [k]
+   1.86%            a.out  [kernel.kallsyms]  [k] ipc_lock

With this patchset:
cpus 4, threads: 256, semaphores: 128, test duration: 30 secs
total operations: 273156400, ops/sec 9105213

+  18.54%            a.out  [kernel.kallsyms]  [k] _raw_spin_lock
+  11.72%            a.out  [kernel.kallsyms]  [k] sys_semtimedop
+   7.70%            a.out  [kernel.kallsyms]  [k] ipc_has_perm.isra.21
+   6.58%            a.out  [kernel.kallsyms]  [k] avc_has_perm_flags
+   6.54%            a.out  [kernel.kallsyms]  [k] __audit_syscall_exit
+   4.71%            a.out  [kernel.kallsyms]  [k] ipc_obtain_object_check
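
To put the two semop-multi runs side by side, the arithmetic can be checked
with a few lines of C. The totals and the 30 second duration are copied
from the output above; the helper names are made up for the example. The
ratio works out to roughly 1.8x the baseline throughput.

```c
#include <assert.h>

/* Figures copied from the semop-multi runs reported above. */
static const double baseline_ops = 151452270.0;	/* 3.9-rc1 */
static const double patched_ops  = 273156400.0;	/* with this patchset */
static const double duration_s   = 30.0;

/* Throughput in operations per second for a given total. */
static double ops_per_sec(double total_ops)
{
	return total_ops / duration_s;
}

/* Ratio of patched to baseline throughput (same duration in both runs). */
static double speedup(void)
{
	return patched_ops / baseline_ops;
}
```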

2) On an Oracle swingbench DSS (data mining) workload the improvements
are not as exciting as with Rik's benchmark, but we can still see some
positive numbers. For an 8-socket machine the following are the
percentages of %sys time incurred in the ipc lock:

Baseline (3.9-rc1):
100 swingbench users: 8.74%
400 swingbench users: 21.86%
800 swingbench users: 84.35%

With this patchset:
100 swingbench users: 8.11%
400 swingbench users: 19.93%
800 swingbench users: 77.69%
