Features Download

From: Lennart Poettering <mzwnpx <at> 0pointer.de>
Subject: Re: Unchecked malloc()s in libjack/thread.c and libjack/intclient.c
Newsgroups: gmane.comp.audio.jackit
Date: Thursday 12th November 2009 23:02:31 UTC (over 9 years ago)
On Thu, 12.11.09 10:07, Stéphane Letz ([email protected]) wrote:

> > if you wasted more hours on IRC fons, you'd know that we've already
> > chatted with the submitter, argued about the point of checking malloc
> > returns, discussed the current status of null pointer exploits on
> > linux and elsewhere, and finally agreed to apply the patches :)
> > 
> > --p
> If we go in this direction, would it be better to go the direction
PulseAudio (for instance) is following, that is redefining a protected
malloc like in:
> http://git.0pointer.de/?p=pulseaudio.git;a=blob;f=src/pulse/xmalloc.c;h=e17a354a465f21a7503aae8167d468a63d2b737a;hb=dad36acea34f91cc1d4a7f8e4909369d2225525b
> and use that (possibly with a redefined free) everywhere needed?

My 5¢ on the OOM situation:

The OOM situation has been discussed quite often in various
projects. On modern systems it has become pretty clear that for normal
userspace software only an aborting malloc() makes sense for a couple
of reasons:

OOM handling creates a substantial number of error paths that need to
be tested, but seldomly are. i.e. when in an inner routine a malloc()
fails you need to rollback the entire operation which can be very hard
to get right, and since this is seldomly tested for almost never
works. OOM error paths add a substantial amount of code to most
projects and that code is seldomly tested and verified. Also very
interesting here are Havoc's notes on the OOM handling in D-Bus:


And note that I can list you a couple of places were D-Bus'
transactional rollback code still didn't get it right, although this
is much much better and systematically tested than most other
projects. (e.g. https://bugs.freedesktop.org/show_bug.cgi?id=21259)

And then there's the big thing that malloc() returning NULL seldomly
means what people think it means. On modern systems RAM exhaustion is
signalled differently, certainly not via malloc(), since that
allocates address space, not actual memory. So if malloc() returns
NULL this is a sign of address space exhaustion or by hitting some
resource limit (whcih is actually more or less the same thing). In
that case it seldomly makes sense trying to go on, and especially the
often suggested solution such as "malloc() in a loop" is not going to
help a tiny bit.

If the system is truely short on RAM the kernel will kill a process
via the OOM killer, in an unfriendly way. I'd say it only makes sense
for a process to treat address space exhaustion the same way as true
OOM is treated: by killing itself.

Also, almost all sensible libraries enforce an aborting malloc()
anyway. For example, glib/gtk works exclusively with aborting
malloc, and that's the same for quite a few libraries. That means the
entirety of your GNOME desktop, or even many system daemons such as
HAL or DeviceKit-disks abort on OOM, and if you think it is worth
handling OOM in your user software one might wonder what the point is
if the underlying services dont't.

Now, the often heard argument for handling malloc() returning NULL
properly is that it is that users don't want their data lost and the
app should exit cleanly saving everything that there is to save. In my
opinion that is a completely artificial issue. A program should always
be written in a way that the data it can lose is minimal, and that
includes when it is terminated abruptly by a power outage, or by being
killed by OOM. So auto-save the user data from time to time anyway,
dont wait until youre officially fucked by malloc() returning
FULL. Also, it is incredibly hard to write code that is still able to
write data safely to disk when you cannot allocate memory while doing

So in summary, OOM-safety is wrong:

- Because it increases your code size by 30%-40%
- You're trying to be more catholic than the pope, since various
  systems services you build on and interface with aren't OOM-safe
- You are trying to solve the wrong problem. Real OOM wil be signalled
  via SIGKILL, not malloc() returning NULL.
- You are trying to solve the wrong problem. Make sure your app never loses
  data, not only when malloc() returns NULL
- You can barely test the OOM codepaths

Does that mean it never makes sense to write OOM-safe userspace code?
No. For very low-level system daemons, such as udev or the init system
itself clean OOM handling is important. And in some embedded
appliances that is true too. But it is very easy to identify these
cases: these are those programs that modify their own oom_adj value in
/proc or appliances where memory overcommit is disabled anyway. (And
recursively, the few libraries that are used by those daemons should
be OOM-safe, too, i.e. libc, dbus)

I don't think that Jack or libjack qualify as that.

So, say no to OOM handling! Saves you time, makes your code faster and
nicer and shorter!

Or to say in a more sarcastic way: the most visible effect the extra
code you have to write for making things OOM-safe will be that due to
higher memory/address space consumption the OOM situation will be
coming earlier then without it.


Lennart Poettering                        Red Hat, Inc.
lennart [at] poettering [dot] net
   GnuPG 0x1A015CC4
CD: 4ms