OpenSUSE Beta Can Brick Intel e1000e Network Cards

Posted by timothy on Tue Sep 23, 2008 08:23 AM
from the price-of-progress dept.
An anonymous reader writes "Some Intel cards don't just not work with the new OpenSUSE beta, they can get bricked as well. Check your hardware before you install!" The only card mentioned as affected is the Intel e1000e, and it's not just OpenSUSE for which this card is a problem, according to this short article: "Bug reports for Fedora 9 and 10 and Linux Kernel 2.6.27rc1 match the symptoms reported by SUSE users."

Related Stories

[+] e1000e Bug Squashed — Linux Kernel Patch Released 111 comments
ruphus13 writes "As mentioned earlier, there was a kernel bug in the alpha/beta version of the Linux kernel (up to 2.6.27 rc7), which was corrupting (and rendering useless) the EEPROM/NVM of adapters. Thankfully, a patch is now out that prevents writing to the EEPROM once the driver is loaded, and this follows a patch released by Intel earlier in the week. From the article: 'The Intel team is currently working on narrowing down the details of how and why these chipsets were affected. They also plan on releasing patches shortly to restore the EEPROM on any adapters that have been affected, via saved images using ethtool -e or from identical systems.' This is good news as we move towards a production release!"
[+] openSUSE Launches 11.1 173 comments
Novell has unveiled their latest release to the openSUSE line with 11.1. Offering both updates and new features, Novell continues to push for more openness and transparency. The new release includes Linux kernel 2.6.27, Python 2.6, Mono 2.0, OpenOffice 3.0, and many others. "[...] Our choice was also influenced by impressive changes that are transpiring in the openSUSE community, which is growing rapidly and is also becoming more open, inclusive, and transparent. Last month, the project announced its first community-elected board, a major milestone in its advancement towards community empowerment. This is a very good openSUSE release and it delivers some very impressive enhancements. The distro has evolved tremendously in the past two releases and is becoming a very solid and usable option for regular users."
This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • by psergiu (67614) on Tuesday September 23 2008, @08:27AM (#25119197)
    Any decent firmware for a device should not allow the user to accidentally destroy the device. Looks like Intel skipped QA.
    • by reashlin (1370169) on Tuesday September 23 2008, @08:30AM (#25119227)
      Any decent device driver should also not be writing to the firmware, which I'm guessing is how the device can become bricked.
      • Re: (Score:3, Informative)

        by Anonymous Coward

        It appears that the bug is a combination of memory-mapped control registers which can enable flash writing and another (graphics related) problem which causes random data to be written to that area. The driver itself does not attempt to rewrite the firmware.

      • by jandrese (485) <kensama@vt.edu> on Tuesday September 23 2008, @09:43AM (#25120387) Homepage Journal
        Except for the multitude of cards that require you to basically reflash the firmware as part of the initialization? Cheap 802.11 cards are notorious for this, and it's a pain because it means you have to ship a binary blob with the driver and all of the licensing headaches that entails.
      • Re: (Score:3, Insightful)

        Err, what's the point of having firmware if your driver can't talk to it?

        Also, this isn't about firmware, it's about non-volatile memory. The chip uses it to store things like its MAC address.

        You fail.

        • How can an invalid MAC address brick the hardware? Lesson one is to check user input, no matter if it's strings in a URL or user-written data in NVRAM.
          • by ChrisJones (23624) <cmsj-slashdot@tensh u . net> on Tuesday September 23 2008, @10:09AM (#25120793) Homepage Journal

            The NVM is checksummed. If the checksum fails, the driver refuses to initialise the card.
            It seems that something is able to write garbage data to the NVM, leaving all of its settings broken.
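A rough sketch of the checksum gate described above, in Python for readability (the real driver is C). It assumes the e1000-family convention that the first 64 16-bit NVM words must sum to 0xBABA; the helper names `nvm_checksum_ok` and `fix_checksum` and the sample image contents are made up for illustration and are not driver functions.

```python
# Illustrative model only. The constants follow the e1000-family convention
# as I understand it; the function names and sample data are hypothetical.
NVM_SUM = 0xBABA        # expected 16-bit sum of the checksummed region
CHECKSUM_WORD = 0x3F    # index of the last word, which holds the checksum


def nvm_checksum_ok(words):
    """Return True if words 0x00..0x3F sum to NVM_SUM modulo 2**16."""
    return (sum(words[:CHECKSUM_WORD + 1]) & 0xFFFF) == NVM_SUM


def fix_checksum(words):
    """Return a copy whose checksum word makes the region valid."""
    fixed = list(words)
    partial = sum(fixed[:CHECKSUM_WORD]) & 0xFFFF
    fixed[CHECKSUM_WORD] = (NVM_SUM - partial) & 0xFFFF
    return fixed


image = fix_checksum([0x1234] * 64)   # made-up NVM contents
corrupted = list(image)
corrupted[5] ^= 0x0001                # a single flipped bit
```

With this model, `nvm_checksum_ok(image)` holds while `nvm_checksum_ok(corrupted)` does not, which is why one garbage write is enough to make a driver refuse the card.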

            This isn't some database API where you get to do lots of nice high-level verification, this is twiddling bits in hardware. Of course it should be properly protected, and my discussions with Intel about this suggest that it is, and that something else is at work here, but until they release a fix, we won't know for sure.

            Also, their own DOS tools to restore NIC EEPROMs actually break the laptop NICs to the point that they won't enumerate on the PCI bus, so there is literally no hope of recovery unless you happen to have a BIOS update which will rewrite all of the memory the NIC uses.

          • Re: (Score:3, Insightful)

            There's no way that a device can check all possible combinations of input for crash-inducing behaviour. If you think it can, go read "GÃdel, Escher, Bach". In fact, go read it anyway, it's awesome.

            Also there's a difference between a NIC and a web site - the NIC's API input is coming from its owner, the web site's customer is not. If you're a piece of hardware, you do what your owner tells you.

            • Re: (Score:3, Interesting)

              Can you get us an ISBN for that book? The non-ASCII character in there got mangled by slashdot (or my browser) and all search results based on my assumptions are trash.

    • by arkhan_jg (618674) on Tuesday September 23 2008, @08:39AM (#25119367)

      A few years back, Mandrake merged a kernel patch in their new release that would accidentally brick certain LG CDROM drives using old firmware versions when it checked if it had writing capabilities. This was largely LG's fault for re-using a valid command code to mean 'start flashing me now' instead, and of course, no firmware was then forthcoming, leaving the drive in an unusable state.

      LG ended up replacing old affected drives, and the kernel patch was rewritten. Mandrake bore the brunt of the reputation hit though for quite a while, which I suspect will happen to SuSE.

      The e1000e driver is the new one for PCIe-based Intel PRO/1000 chipsets; the old PCI and PCI-X cards, which use the original e1000 driver, are unaffected. Still, that's going to be quite a lot of cards affected.

    • I don't think that's entirely possible.

      Consider a cooling system. Is there any reason that shouldn't be software-controlled? If it is, the user could conceivably turn off all fans, thus overheating the device.

      Sure, you could take away enough functionality that the user can't do that. But that's the tradeoff -- functionality. No decent gun wouldn't allow you to shoot yourself in the foot.

      But then, I do think Linux should be in Intel's QA, especially for something like a network card.

    • Gun and air-conditioning aside, devices should not allow accidental bricking or physical damage unless it is inherent in the function of the hardware.

      For cases of loading bad firmware, the "load new firmware" instruction should have a few failsafes like magic words or what-not so it isn't accidentally invoked.
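The "magic words" idea above could look something like this toy model. Everything here is hypothetical: the `FlashController` class, the unlock values 0xAA55 and 0x55AA, and the method names are invented for the sketch and do not describe any real Intel part.

```python
class FlashController:
    """Toy device that only accepts new firmware after an unlock sequence."""

    UNLOCK_SEQUENCE = (0xAA55, 0x55AA)   # hypothetical magic words

    def __init__(self):
        self.firmware = b"factory"
        self._progress = 0               # how many magic words seen so far

    def write_command(self, value):
        """Feed one command word; any stray write restarts the sequence."""
        seq = self.UNLOCK_SEQUENCE
        if self._progress < len(seq) and value == seq[self._progress]:
            self._progress += 1
        else:
            self._progress = 1 if value == seq[0] else 0

    def flash(self, image):
        """Install image only if the full sequence was just written."""
        unlocked = self._progress == len(self.UNLOCK_SEQUENCE)
        self._progress = 0               # the unlock is single-use
        if unlocked:
            self.firmware = image
        return unlocked
```

A random scribble is unlikely to hit both words in order, so a stray write followed by `flash()` returns False and leaves the factory firmware in place.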

      Even better, hardware devices should have a failsafe firmware burned on silicon that can be reactivated by flipping a switch, setting a jumper, or some other hardware-action-required setting. This "fa

      • For cases of loading bad firmware, the "load new firmware" instruction should have a few failsafes like magic words or what-not so it isn't accidentally invoked.

        So you are saying that QA engineers could learn something from BDSM? Um, sign me up for that training class!
          • I bet it's not as painful as the alternatives you mentioned!
          • Google it, should be pretty apparent what BDSM is.

            He is talking about a safe word for when things get a bit out of hand - but the other way around; a word to allow things to get out of hand.

  • !Bricked (Score:5, Funny)

    by Anonymous Coward on Tuesday September 23 2008, @08:30AM (#25119229)

    Why won't people stop using the word brick to mean things that aren't bricked! All you have to do is use a quasi-negative reverse transponder linked to your flux capacitor to generate an inverse tachyon field, connect it to the JTAG while chanting Siaynoq and it will come right up. Sheesh!

    • What's your definition? From TFA:

      The problem is described as "a serious issue with the potential to damage the network card in a way that it cannot be used any longer".

      I thought that was pretty much textbook "bricking".

    • Unfortunately, due to inferior materials used in the chip's casing, exposing the device to a sufficiently strong inverse tachyon field will cause protonium breakdown which will in turn cause an endothermic reaction, which in turn will fracture the silicon along the sharp drop-offs in the resulting thermal gradient. As a side-effect of the presence of the inverse tachyons, the failure will happen in the near future rather than immediately. In other words, your device will work on the testbench but by the t

    • You forgot the deflector dish.
    • Re:!Bricked (Score:5, Funny)

      by clickety6 (141178) on Tuesday September 23 2008, @10:13AM (#25120859)

      I also hate it when people call these things bricked incorrectly.

      Bricked XBOXen, bricked PSPs, bricked iPhones and now bricked network cards.

      People, these things are not bricked! Believe me. I've tried building houses and garden walls out of them and they are absolutely fecking useless as bricks!

      Please use the correct term in future. These items are not bricked, they are just FUBI (fecked up by incompetence)

      Thank you...

  • by OverlordQ (264228) on Tuesday September 23 2008, @08:36AM (#25119319) Journal

    I hate it when people keep incorrectly using brick . . . . wait, what? They used it right? Oh . . . my bad, carry on.

      • Except that that's not true.
        You *may* be saved by a BIOS update, but only if that update happens to include the LAN option ROM and NVM area.
        All of the publicly available BIOS images for my Thinkpad X300 do *not* include the LAN portions and so were unable to rescue my corrupted NVM.
        Ironically, Intel do ship rescue tools for this sort of thing, but while they run on Laptop parts, they are not supposed to, so trying to use that actually made things worse and the NIC refused to initialise and didn't even app

  • Kernel fix, perhaps? (Score:4, Informative)

    by Anonymous Coward on Tuesday September 23 2008, @08:38AM (#25119341)

    Kernel 2.6.27-rc7 has a changelog entry that reads:

    Christopher Li (1):
                e1000: prevent corruption of EEPROM/NVM

    • by neonprimetime (528653) on Tuesday September 23 2008, @08:46AM (#25119463)
      From: Christopher Li

      Andrey reports e1000 corruption, and that a patch in vmware's ESX fixed it.

      The EEPROM corruption is triggered by concurrent access of the EEPROM read/write. Putting a lock around it solve the problem.


      link [kernel.org]
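A minimal sketch of the "put a lock around it" fix quoted above, as a Python model rather than the actual kernel patch. The assumption here (mine, not the changelog's) is that an access is a two-step operation through a shared address register, so two unsynchronised callers could interleave; the `Eeprom` class and its fields are hypothetical.

```python
import threading


class Eeprom:
    """Toy EEPROM whose accesses go through one shared address register."""

    def __init__(self, size):
        self._cells = [0] * size
        self._addr = 0                    # shared state between the two steps
        self._lock = threading.Lock()     # serialises every read and write

    def write(self, addr, value):
        with self._lock:                  # step 1 and step 2 happen atomically
            self._addr = addr
            self._cells[self._addr] = value

    def read(self, addr):
        with self._lock:
            self._addr = addr
            return self._cells[self._addr]


def fill(rom, base, count):
    """Write each address in [base, base + count) with its own index."""
    for i in range(base, base + count):
        rom.write(i, i)
```

Without the lock, one thread could change `_addr` between another thread's two steps and land a value in the wrong cell, which is the kind of corruption the changelog entry describes.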
        • e1000 and e1000e are separate drivers.
          Some of the devices which are now supported by e1000e were previously supported by e1000, so it's all a bit confusing.

          AIUI, if the part is PCI Express, it's now in e1000e.

  • Oh great. (Score:5, Interesting)

    by gandhi_2 (1108023) on Tuesday September 23 2008, @08:51AM (#25119543)
    Remember when Dell told customers that installing Linux on their computers voided the warranty?

    Remember how everyone on /. called bullshit?

    This doesn't look good for our cause.

    • This doesn't look good for our cause.

      It doesn't, but it does get me thinking: Given that it's possible for a badly behaved driver running in the kernel to stamp over NVRAM rendering hardware useless, how many pieces of bricked hardware have I thrown out over the years that were bricked because of a freak coincidence involving a rogue driver?

      Bear in mind that many drivers in Windows run quite close to ring 0 of the kernel and they would be just as capable of causing such a problem.

      • Isn't there an option to buy support from Canonical?

        (Disclaimer: I work for Canonical, but not in the bits that produce or support Ubuntu)

  • by ronch (82516) on Tuesday September 23 2008, @08:51AM (#25119553)

    I work on the e1000 team (including the e1000e driver), and here is what we know. A panic in another driver (believed to be the gfx driver, but we are not certain) scribbles over the NIC/LOM non-volatile memory (NVM). This is only happening with the 2.6.27-rc kernels on ICHx systems. Since the NIC/LOM NVM is part of the whole BIOS image, other things in the system could be affected by this driver panic as well. An update of the system BIOS will restore the NIC/LOM to an operational state. We have some patches under test right now that we will be releasing later today to protect the NIC/LOM NVM. That should help narrow down who is scribbling over NVM.

    • Re: (Score:3, Funny)

      by Anonymous Coward

      This post was way too helpful and informative.

      It doesn't belong in the comments of a Slashdot article!

    • Would you like some toast with your acronym soup? The scary thing is that it is in fact readable...

    • Re: (Score:2, Funny)

      You should take their crayons away until they learn to stop scribbling everywhere.

      -

    • Re: (Score:2, Informative)

      I work on the e1000 team (including the e1000e driver), and here is what we know. A panic in another driver (believed to be the gfx driver, but we are not certain) scribbles over the NIC/LOM non-volatile memory (NVM). This is only happening with the 2.6.27-rc kernels on ICHx systems. Since the NIC/LOM NVM is part of the whole BIOS image, other things in the system could be affected by this driver panic as well. An update of the system BIOS will restore the NIC/LOM to an operational state.

      In other words, as usual, the device is NOT bricked.

    • ronch: very interesting.

      I was given to understand that writing to the NVM on these parts was controlled by a hardware lock bit and so it shouldn't be possible for something to scribble over it?

      • Most Ethernet drivers expose interfaces for writing to their NVM, specifically for ethtool to do its work.

      • by T.E.D. (34228) on Tuesday September 23 2008, @02:14PM (#25125389)

        I've written an E1000 driver (for a realtime program on a different OS). The issue is that once you have the base address registers for the card mapped into system memory, they are there. There's no super-special secret mechanism devoted to writing only to the flash RAM.

        Oversimplifying things a great deal (so experts out there, please don't roast me over the nitty details), every PCI device in your system presents software drivers with up to 6 "Base Address Registers" (BARs). Most PCI devices really only use one or two of those. This is (mostly) the device-driver's only window into the PCI device.

        At bootup the system places the physical address [wikipedia.org] of the device's control registers and memory into its BARs. When the device driver starts up later, it grabs those physical addresses and maps them into virtual memory [wikipedia.org] so that software can get at them.

        Once this is done, *all* the device's control registers are available to software. If one of these registers can command the card to write data to flash (as one of the control registers on the E1000 does), then the proper (or improper) value written to that memory location by *anyone* will cause a flash write. It's that simple.
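To make that concrete, here is a toy Python model of a mapped register window. The register offsets, the `CTRL_WRITE` bit, and the `MappedNic` and `stray_writer` names are all invented for illustration; they are not the real E1000 register layout.

```python
class MappedNic:
    """Toy NIC: a register window plus non-volatile memory (NVM)."""

    REG_NVM_ADDR = 0x10   # hypothetical register offsets
    REG_NVM_DATA = 0x14
    REG_NVM_CTRL = 0x18
    CTRL_WRITE = 0x1      # hypothetical "commit data to NVM" bit

    def __init__(self):
        self.nvm = [0xFFFF] * 64
        self.regs = {}

    def mmio_write(self, offset, value):
        """Every store into the mapped window lands here, whoever issued it."""
        self.regs[offset] = value
        if offset == self.REG_NVM_CTRL and value & self.CTRL_WRITE:
            addr = self.regs.get(self.REG_NVM_ADDR, 0) % len(self.nvm)
            self.nvm[addr] = self.regs.get(self.REG_NVM_DATA, 0) & 0xFFFF


def stray_writer(nic, garbage):
    """Stands in for unrelated buggy code scribbling into the window."""
    for offset, value in garbage:
        nic.mmio_write(offset, value)
```

The device has no way to tell whether `mmio_write` was called by its own driver or by something else that scribbled into the mapping; the right value at the right offset commits to NVM either way.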

  • Lesson finished (Score:2, Insightful)

    by Anonymous Coward

    What do we learn from this incident?

    1. Beta is not for the common people.
    2. Programmers are humans, and humans are erroneous.
    3. "This program is distributed in the hope that it will be useful,
            but WITHOUT ANY WARRANTY; [...]"

    • 1. Beta is not for the common people.

      Is that why Betamax (consumer video tape format) died and Betacam (professional video tape format family) lived?

    • 3. "This program is distributed in the hope that it will be useful,
                      but WITHOUT ANY WARRANTY; [...]"

      Don't run any Microsoft or Apple operating system either, then. Both of them expressly declare there is no warranty.

  • by AdamWill (604569) on Tuesday September 23 2008, @02:44PM (#25125877)
    This can also affect Mandriva Linux 2009 pre-releases. To be clear, the bug is in the upstream kernel itself, not in any code specific to any distribution.
    It affects any 2.6.27rc kernel, whether it's in a distribution or a clean upstream build.
    We have posted a full, detailed notification of the issue [mandriva.com] for Mandriva users.
    • I can see how that will work well: "we need to change the bios update procedure. Where it says:
      "click on bios update, click yes, click yes, really, come back in 2 min"
      we need:
      "get on your knees, pull out computer from under desk
      get screw driver, take off cover, remove dust
      find flashlight and small needle nose pliers, locate jumper
      take out graphic card to access jumper hiding under oversized heatsink, move jumper, reinsert graphic card
      boot, click on bios update, click yes, click yes, really, come back
      • Are you talking about the Linux driver? e1000e isn't at all a closed driver, it's in the kernel.org source, which is published under the GPL v2. That's about as un-closed as you can get.