From: Tejun Heo <htejun <at> gmail.com>
Subject: Why we were seeing so many spurious NCQ completions
Newsgroups: gmane.linux.ide
Date: Friday 7th December 2007 02:18:00 UTC
Hello, all.

This has been going on for quite some time now, but I finally succeeded
in reproducing the problem and finding out what has been going on.  It
wasn't the drive's or the controller's fault.  The spurious completion
detection logic was wrong, which makes all of this my fault.  :-)

The attached patch induces NCQ spurious completions by inserting
artificial delays during IRQ handling.  The following is the log with
the patch applied.

A [ 1125.478813] ata35: MON issue=0x0 SAct=0x1 sactive=0x3 SDB FIS=004040a1:00000002
B [ 1125.480248] ata35: MON issue=0x4 SAct=0x6 sactive=0x7 SDB FIS=004040a1:00000001
C [ 1125.481614] ata35: MON issue=0x0 SAct=0x5 sactive=0x7 SDB FIS=004040a1:00000002
D [ 1125.481704] ata35: YYY 0x2 -> 0x4
E [ 1125.481722] ata35: XXX issue=0x0 SAct=0x1 sactive=0x1 SDB FIS=004040a1:00000004
F [ 1125.483087] ata35: MON issue=0x0 SAct=0x0 sactive=0x1 SDB FIS=004040a1:00000001
G [ 1125.484297] ata35: MON issue=0x4 SAct=0x6 sactive=0x7 SDB FIS=004040a1:00000001

MON lines are printed on each SDB FIS, while the YYY line indicates that
the SDB FIS RX area changed during the artificial delay.  The XXX line
marks the condition which triggers spurious NCQ completion detection -
invoking EH is disabled for debugging.
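
To make the bitmasks in the log easier to read, here is a small decoding
sketch in Python.  It is not part of the debug patch.  It assumes that
issue and SAct are the controller's command-issue and SActive registers,
that sactive is the driver's own bitmap of outstanding tags, and that the
second dword of the SDB FIS carries the tag bits being completed, which
is how the walkthrough in this mail reads them.  The names decode() and
tags() are made up for the illustration, and a log line wrapped across
two lines has to be rejoined before it is passed in.

```python
import re

# Matches the register dump that follows "MON" or "XXX" in the log.
LINE_RE = re.compile(
    r"issue=(0x[0-9a-f]+) SAct=(0x[0-9a-f]+) sactive=(0x[0-9a-f]+) "
    r"SDB FIS=([0-9a-f]{8}):([0-9a-f]{8})"
)


def tags(mask):
    """Return the list of tag numbers whose bit is set in mask."""
    return [t for t in range(32) if mask & (1 << t)]


def decode(line):
    """Decode one MON/XXX log line into per-field tag lists."""
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError("not a MON/XXX register line: %r" % line)
    issue, sact, sactive = (int(m.group(i), 16) for i in (1, 2, 3))
    fis_tags = int(m.group(5), 16)  # assumed: second dword = completed tags
    return {
        "still_issuing": tags(issue),
        "hw_active": tags(sact),
        "driver_in_flight": tags(sactive),
        "fis_completes": tags(fis_tags),
        # Tags the driver still tracks but the hardware has already cleared.
        "done_unprocessed": tags(sactive & ~sact),
    }
```

Feeding it line A gives fis_completes == [1] and done_unprocessed == [1],
i.e. the FIS and the registers agree.  For line E, fis_completes is [2]
while done_unprocessed is empty: the FIS names a tag the driver has
already retired.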

Here's what happens.

1. On A, NCQ commands 0 and 1 are in flight - command 0 is still being
   transmitted to the device.  The first SDB FIS indicates completion
   of command 1.

2. Between A and B, the driver issues NCQ commands 1 and 2.  0 is
   still in flight.

3. On B, command 0 completes and the drive sends a completion for it.

4. Between B and C, the driver issues NCQ command 0.

5. On C, command 1 completes and the drive sends a completion for it.

6. On D, the drive then completes command 2 and sends a completion for
   it.  This updates the SDB FIS RX area and, as the driver is still
   in the IRQ handler, sets the IRQ pending bit again.  Note that the
   YYY line is printed *before* the commands are actually completed.
   So, after printing the YYY line, the driver completes both commands
   1 and 2.

7. On E, the IRQ handler is invoked again because of the IRQ pending
   status set in #6.  However, the completions contained in the SDB
   FIS which triggered this invocation have already been processed,
   i.e. the completion for command 2 was handled in the previous IRQ
   handler invocation.  So this time the IRQ handler has nothing to
   do, but the SDB FIS RX area shows that this IRQ is for an SDB FIS
   which includes the completion for command 2, and that triggers the
   spurious NCQ completion condition.

8. It goes on.
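
Steps 5 to 7 can be replayed with a toy model.  The following Python
sketch is only an illustration under stated assumptions, not the driver
code: the Port class, its field names and the ordering inside
irq_handler() (acknowledge the IRQ, then the artificial delay, then read
the active mask) are all made up to mirror the sequence described above,
and the "flagged" test is a simplified stand-in for the spurious
completion check (nothing to complete, yet the RX area names completed
tags).

```python
class Port:
    """Toy model of one port: controller state plus driver bookkeeping."""

    def __init__(self, in_flight):
        self.hw_sact = in_flight   # controller's view of outstanding tags
        self.sactive = in_flight   # driver's view of outstanding tags
        self.rx_fis = 0            # tag bits of the most recent SDB FIS
        self.irq_pending = False

    def device_completes(self, mask):
        """The drive sends an SDB FIS completing the tags in mask."""
        self.hw_sact &= ~mask
        self.rx_fis = mask         # overwrites whatever FIS was there
        self.irq_pending = True

    def irq_handler(self, delay=None):
        """Run one handler invocation; return (completed_mask, flagged)."""
        self.irq_pending = False   # acknowledge the interrupt first
        if delay is not None:
            delay()                # window in which the drive may send more
        done = self.sactive & ~self.hw_sact
        self.sactive &= ~done      # retire everything the hardware cleared
        # Simplified check: nothing to do, but the RX area names tags.
        flagged = done == 0 and self.rx_fis != 0
        return done, flagged


def replay():
    """Replay log lines C, D and E with tags 0, 1 and 2 in flight."""
    port = Port(in_flight=0x7)
    port.device_completes(0x2)     # C: command 1 completes
    # D: command 2 completes while the handler is still running.
    first = port.irq_handler(delay=lambda: port.device_completes(0x4))
    pending_after_first = port.irq_pending
    second = port.irq_handler()    # E: re-invoked for the pending IRQ
    return first, pending_after_first, second
```

In this model the first invocation retires tags 1 and 2 together
(mask 0x6) and leaves the IRQ pending.  The second invocation finds
nothing to complete while the RX area still shows 0x4, so the check
fires even though neither the drive nor the controller did anything
wrong.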

So, trying to detect spurious completions using IRQ and RX FIS area
turns out to be stupid as they aren't interlocked.  I'll soon post a
patch to remove the spurious completion check and the blacklist that
resulted from it.

Thanks a lot.

-- 
tejun
 