Subject: tg3 update available...
Newsgroups: gmane.linux.hardware.dell.poweredge
Date: 2003-02-14 22:58:08 GMT

Ladies and gentlemen,

For your weekend stress-testing pleasure, the latest release of tg3 is available from

    http://people.redhat.com/jgarzik/pub/legolas2-7.x/ (redhat 7.x)
    http://people.redhat.com/jgarzik/pub/legolas2-8.0/ (redhat 8.0)

These kernel rpms are based on the latest errata kernel (2.4.18-24), and like the other rpms I have posted, they are unofficial, not for production use, have not passed Red Hat Q/A, etc., etc.

This version of tg3 adds several workarounds for hardware bugs in the BroadCom chips, and all users are recommended to use this driver version (tg3 v1.4).

Addressing the recent furor on this list...

Bad things come in spades, it seems. I had to disappear due to a family medical emergency, and was on the left coast all last week. And of course that's when the errata furor happens. :/

[BIG DISCLAIMER: My presence on this list, and my statement following, are purely of my own choice. This is my own opinion, not my employer's, and not representative of any sales agreements, contracts, etc.]

1) "arrrrrrgh!!!! why not bcm5700?????"

This is a big question, but it's critical you guys understand the answer.

A small but not insignificant part of it is political: BroadCom doesn't release hardware docs, and didn't work [past tense] with the Linux community. So from Red Hat's perspective, we get no support, just "pray that each new release works." That is unacceptable.

On a technical level, there are several issues with all BroadCom drivers that prevent us from shipping them. The big one is an interrupt stack corruption issue that is remotely trigger-able. Read: bad guys can crash your production box, with bcm5700. Or it might crash if you're just unlucky, and hit the same condition.
But there are other operational and portability bugs too, which bite us on other Red Hat platforms such as IA64 [where bcm5700 simply doesn't work at all].

2) "But bcm5700 works for me and tg3 doesn't! Ship it!"

The tg3 that went out in the kernel errata was, at the time, the result of successful stress testing on a bunch of systems. Several non-tg3 issues that potentially played a role in the problems people were seeing were also identified and resolved.

BroadCom unexpectedly turned up the day after the errata kernel was cut, with a list of NIC hardware bugs tg3 needed to work around. We integrated these fixes, but clearly the timing was awful, as these fixes did not make it into the errata kernel. Those fixes are in tg3 version 1.4, posted in the above rpms.

Anyway, for the kernel errata, Red Hat had two drivers, one of which was known to have serious bugs and simply fails to work on many of Red Hat's systems [bcm5700], and one of which successfully passed stress tests on our local systems [tg3]. The choice was obvious, though in hindsight, judging from this list, more could have been done.

3) What's the deal with "noapic"?

That changes the way interrupts are delivered, from the "new way" back to the "old way." [more details upon request]

Why does it solve some people's tg3 problems? Because they are seeing problems unrelated to tg3, that tg3 is just fast enough to trigger.

4) What are good temporary solutions?

Running a uniprocessor kernel instead of SMP will likely fix all NIC-related problems. "noapic" on an SMP kernel helps, but the deadlock potential is still there, just decreased. Using an e1000 NIC and driver is also very unlikely to trigger a freeze.

5) Is there hope, Luke?

Yes! Just today, Red Hat has identified two non-tg3 problems that are likely causes of several of the reported freezes. These problems can trigger on bcm5700, e1000, and other NICs, but are much more likely to trigger on tg3 because it is very aggressive with its net stack usage.
Add that to the above BroadCom-requested tg3 fixes, and the future is looking very rosy.

Questions, comments and flames are all appreciated. I'm only the kernel hacker tasked to fix net drivers, not a Red Hat Support guy, but I'll do the best I can to address people's concerns.

Jeff

P.S. The "RPMS" subdirectory of http://people.redhat.com/jgarzik/tg3/tg3-1.4/ is simply a symlink to the above "legolas2-8.0" kernel, and is exactly the same.

_______________________________________________
Linux-PowerEdge mailing list
Linux-PowerEdge <at> dell.com
http://lists.us.dell.com/mailman/listinfo/linux-poweredge
Please read the FAQ at http://lists.us.dell.com/faq or search the list archives at http://lists.us.dell.com/htdig/