Features Download
From: Paul Menage <paul <at> paulmenage.org>
Subject: Re: [RFC] cgroup: syscalls limiting subsystem
Newsgroups: gmane.linux.kernel.lsm
Date: Wednesday 19th October 2011 05:26:06 UTC (over 6 years ago)
On Tue, Oct 18, 2011 at 5:21 PM, Łukasz Sowa  wrote:
> Hi,
> Currently, I'm writing BSc thesis about security in modern Linux.
> Together with my thesis mentor, I decided that as practical part of my
> work I'll implement cgroup subsystem that allows to limit particular
> (defined by number) syscalls for groups of processes. The patch is ready
> for first review and we decided that I may try to push it to the
> mainline - so here it is.

The major problem with this approach is that denying/allowing system
calls based purely on the syscall number is very coarse. I'd guess
that most vulnerabilities in a system can be exploited just using
system calls that almost all applications need in order to get regular
work done (open, write, exec ,mmap, etc) which limits the utility of
only being able to turn them off by syscall number. So overall I don't
think you'll find very much support for this patch.

Having said that, to address the patch itself:

> Things that I dislike or particularly need comments are:
> 1. The asm parts where I push/pop callee-modified registers are ugly.
> I'm aware of macros (for x64) like SAVE_ARGS and RESTORE_ARGS but they
> simply don't work for me because they modify EFLAGS registers (because
> of sub instruction I suppose), forcing me to write uglier and less
> efficient code (additional jump needed). I can elaborate on this, if
> someone's in doubt. Maybe another version of the SAVE_ARGS/RESTORE_ARGS
> macro is needed?

Can't you hook into the ptrace callpath? That's already implemented on
every architecture. Set the thread bit that triggers diverting to
syscall_trace_enter() only when any of the thread's syscalls are
denied, and then you don't have to work in assembly.

> 2. Performance. It's not that bad: I measured 5% more sys time for
> process on root level, and 8% more sys time for processes on first
> level. However, I think it may and should be improved but currently
> I have no idea how to do it.

Do you mean for all processes, or just for processes that had deny
bits set? If the former, then 8% is a non-starter, I think. But if you
hook into the ptrace as I suggested above, you can probably get it to
zero overhead for threads in allow-all cgroups, and I suspect people
would scream much less about slowing down threads that are being
specifically limited anyway.

> 3. Naming convention - it's not bad either but I don't like the names -
> 'scs' abbreviation sound a little bit silly but full form (syscalls)
> makes lines far too long.

I'd suggest syscall rather than scs.

> +       switch (cftype->private) {
> +       case SCS_CGROUP_ALLOW:
> +               for (bit = 0; bit < NR_syscalls; ++bit) {
> +                       if (__scs_cgroup_perm(scg, bit))
> +                               seq_printf(seq_file,
"%d\n", bit);

Since this is presumably meant to be only used programmatically (as
syscall numbers can be different on different platforms, so you'll
need programs to translate between names and numbers) I wouldn't
bother with the \n - a plain space will be fine for the process
reading it, and less destructive to screen real-estate if someone
happens to cat it by hand.

> +       switch (cftype->private) {
> +       case SCS_CGROUP_ALLOW:
> +               if (__scs_cgroup_perm(parent_scg, number)) {
> +                       write_seqlock(&scg->seqlock);
> +                       set_bit(number, scg->syscalls_bitmap);
> +                       write_sequnlock(&scg->seqlock);
> +               } else {
> +                       return -EPERM;
> +               }
> +               break;
> +       case SCS_CGROUP_DENY:
> +               write_seqlock(&scg->seqlock);
> +               clear_bit(number, scg->syscalls_bitmap);
> +               write_sequnlock(&scg->seqlock);
> +               break;

Deny needs to clear the bit in all descendants as well. And there
would need to be synchronization between checking the parent bit
during an ALLOW write, and someone changing the parent bit to a DENY.

To unsubscribe from this list: send the line "unsubscribe
linux-security-module" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
CD: 17ms