From: Arjan van de Ven <arjan <at> infradead.org>
Subject: Re: RFC: starting a kernel-testers group for newbies
Newsgroups: gmane.linux.kernel
Date: Wednesday 30th April 2008 14:15:26 UTC
On Thu, 1 May 2008 01:13:46 -0700
Andrew Morton wrote:

> On Wed, 30 Apr 2008 00:03:38 -0700 Arjan van de Ven wrote:
> > > First of all:
> > > I 100% agree with Andrew that our biggest problems are in
> > > reviewing code and resolving bugs, not in finding bugs (we
> > > already have far too many unresolved bugs).
> > 
> > I would argue instead that we don't know which bugs to fix first.
> How about "a bug which we just added"?  One which is repeatable. 
> Repeatable by a tester who is prepared to work with us on resolving
> it. Those bugs.
> Rafael has a list of them.  We release kernels when that list still
> has tens of unfixed regressions dating back up to a couple of months.

I know he does. But I will still argue that if that is all we work from,
and treat all of those equally, we're doing the wrong thing.
I'm sorry, but I really do not consider "ext4 doesn't compile on m68k",
which is on that list, to be as relevant as an "i915 drm driver crashes"
bug which has been hitting us for a while and is not on that list, just
based on the total user base for either of those.

Does that mean nobody should fix the m68k bug?
Someone who cares about m68k should certainly work on it, or, if it's easy
for an ext4 developer, sure. But if the ext4 person has to spend 8 hours
figuring out cross compilers, I say we're doing something very wrong here.
(No offense to the m68k people, but there are just a few of you; maybe I
should have picked voyager instead.)

Maybe that's a "boggle" for you, but for me it's symptomatic of where we
are today: we don't make (effective) prioritization decisions. Such
decisions are hard, because they effectively mean telling people "I'm
sorry, but your bug is not yet important". That's unpopular, especially
if the reporter is very motivated on lkml. It will also involve a certain
amount of non-quantifiable judgement calls, which means we won't always
be right. Another hard thing is that lkml is a very self-selective
audience. A bug may be reported there three times but never hit
otherwise, while another bug might not be reported at all (or only once)
while thousands and thousands of people are hitting it.

Not that we're doing all that badly; we ARE fixing the bugs (at least
the oopses/warnings) that are frequently hit. So I wouldn't blindly say
we're doing a bad job at prioritizing. I would rather say that if we
focus only on what is left afterwards without doing a reality check,
we'll *always* have a negative view of quality, since there will
*always* be bugs we don't fix. Linux has well over ten million users
(many more if you count embedded). A lot of them will have "standard"
hardware, and a bunch of them will have "weird" stuff. Cosmic rays
happen. So do overclocking and bad DIMMs. And some BIOSes are just
weird, etc. etc. If we do not prioritize effectively, we'll be stuck
forever chasing ghosts, or we'll be stuck saying "our quality sucks"
forever without making progress.

Another trap is to only look at what goes wrong, not at what goes
right... we tend to only see what goes wrong on lkml, and it's easy to
fall into doomthinking that way.
Are we doing worse on quality? My (subjective) opinion is that we are
doing better than last year. We are more focused on quality. We are
fixing the bugs that people hit most. We are fixing most of the
regressions (yes, not all). Subsystems are seeing flat or lower bug
counts/bug rates. Take ACPI: the number of outstanding bugs *halved*
over the last year. Of course you can pick a single bug and say "but
this one did not get fixed", but that just loses the big picture (and
proves the point :). All of this with a growing userbase and a rate of
development that's a bit faster than last year as well.

Can we do better? Always. More testing will help, both by detecting
things early and by letting us figure out which bugs are important.
Just saying "more testing is not relevant because we're not even fixing
the bugs we have now" is simply incorrect.
More testers help. A wider range of hardware/usages allows us to better
find the hard-to-track-down bugs. More testers means more people willing
to see if they can diagnose the bugs at least somewhat themselves, via
bisection or otherwise. That's important,
because that's the part of the problem that scales well with a growing