From: Linus Torvalds <torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b <at> public.gmane.org>
Subject: Re: [Bug #14388] keyboard under X with 2.6.31
Newsgroups: gmane.linux.kernel.kernel-testers
Date: Wednesday 14th October 2009 01:03:40 UTC
On Tue, 13 Oct 2009, Paul Fulghum wrote:
> 
> This is correct, the last buffer is not passed to tty_buffer_free()
> if it is the last in the list so tail is maintained.
> There is no free space in it so no new data can be added.
> There is no place where tail is null while the spinlock
> is released in preparation for calling receive_buf.
> I still can't spot any flaw in the current locking.

Do you even bother reading my emails?

Let me walk through an example of where the locking F*CKS UP, exactly 
because it's broken.

	thread1		thread2		thread3

	flush_to_ldisc
	set_bit(TTY_FLUSHING)
	buf.head = NULL
	...
	..release lock..
	.. sleep in ->receive_buf ..

			flush_to_ldisc
			set_bit(TTY_FLUSHING)
			.. head==NULL ..
			clear_bit(TTY_FLUSHING)
			.. release lock ..

					tty_ldisc_flush()
					-> tty_buffer_flush()
					TTY_FLUSHING not set!
					-> __tty_buffer_flush()
					-> tty->buf.tail = NULL

and now you're screwed. See? You have 'buf.tail' and 'buf.head' both 
being NULL, and look what happens in 'tty_buffer_request_room()' in that 
case if some new data comes in? Right: it will add the new buffer to 
both tail and head.

And notice how 'thread1' is still inside flush_to_ldisc()! The buffer that 
got added will be overwritten by the old one, and now tail and head no 
longer match. Or another flush_to_ldisc() comes in, and now it won't be a 
no-op any more, and it will find the new data, and run ->receive_buf 
concurrently with the old receive_buf from thread1.

And the whole reason was that there were some very odd locking rules: 
buf.head=NULL meant "don't flush", and "TTY_FLUSHING is set" meant "don't 
clear 'buf.head'", but the "don't flush" case still cleared TTY_FLUSHING 
(after not flushing), and it all got messed up.

I could just have fixed it (move the "clear_bit(TTY_FLUSHING)" bit up), 
but the fact is, once you fix that, it then becomes obvious that 
"buf.head=NULL" really is the wrong thing to test in the first place, and 
we should just use TTY_FLUSHING instead, and simply _remove_ the odd 
"buf.head=NULL is special" case. Which is what my patch did.

> Your statement that the locking is too clever/subtle is
> clearly true since I am struggling to work this out again.

I have to say that the only case I could make up that is _clearly_ a bug 
is the above very contrived example. I don't really think something like 
the above happens in reality. But it's an example of bad locking, and what 
happens when the locking logic isn't obvious.

There may be other cases where the locking fails, and I just didn't find 
them. 

Or the patch may simply not fix anything in practice, and nobody has ever 
actually triggered the bad locking in real life. I dunno. I just do know 
that the locking was too damn subtle.

Any time people do ad-hoc locking with "clever" schemes, it's almost 
invariably buggy. So the rule is: just don't do that. Make the locking 
rules "obvious".  Don't have subtle rules about "if head is NULL, then 
we're not going to add any new buffers to it, except if tail is also 
NULL". Because look above what happens, and see how complicated it was to 
even see the bug.

			Linus
 