
The Many Paths To Data Corruption

Runnin'Scared writes "Linux guru Alan Cox has a write-up on KernelTrap in which he describes all the ways data can get corrupted when being written to or read from a hard disk drive. Much of the information applies to all operating systems. He prefaces his comments by noting that the details are entirely device specific, then dives right into a fascinating and somewhat disturbing trace of the data path from the drive, through the cable, into the bus, main memory and CPU cache. He also discusses the transfer of data via TCP and cautions, 'unfortunately lots of high performance people use checksum offload which removes much of the end to end protection and leads to problems with iffy cards and the like. This is well studied and known to be very problematic but in the market speed sells not correctness.'"
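To make Cox's point about checksum offload concrete, here is a minimal sketch of the 16-bit ones'-complement "Internet checksum" that TCP uses, written in the style of RFC 1071. It assumes a plain byte string and ignores the TCP pseudo-header, and the function name `internet_checksum` is hypothetical. When the host computes this sum, the check roughly covers the path from host memory onward; with offload the card computes or verifies it, so data damaged on the host side of that point can still carry a valid checksum. The sketch also shows a known weakness of the sum itself: swapped 16-bit words go unnoticed.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement checksum in the style of RFC 1071 (no TCP pseudo-header)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]      # add the next big-endian 16-bit word
        total = (total & 0xFFFF) + (total >> 16)   # fold the carry back in (end-around carry)
    return ~total & 0xFFFF


if __name__ == "__main__":
    payload = b"\x00\x01\xf2\x03\xf4\xf5\xf6\xf7"
    good = internet_checksum(payload)
    print(hex(good))  # 0x220d

    # A single flipped bit changes the sum, so the receiver would notice it.
    flipped = bytes([payload[0] ^ 0x01]) + payload[1:]
    print(internet_checksum(flipped) != good)  # True

    # Swapping two 16-bit words leaves the sum unchanged, so this corruption slips through.
    swapped = payload[2:4] + payload[0:2] + payload[4:]
    print(internet_checksum(swapped) == good)  # True
```

Because addition is commutative, any reordering of whole 16-bit words produces the same checksum; stronger end-to-end checks (CRCs or cryptographic hashes computed by the application) do not share that blind spot.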

  • End-to-end (Score:5, Informative)

    by Intron ( 870560 ) on Friday September 14, 2007 @05:59PM (#20609847)
    Some enterprise server systems use end-to-end protection, meaning each data block is extended with check data. If you write 512 bytes of data plus 12 bytes or so of check data and carry that through all of the layers, it can keep data corruption from going undetected. The check data usually includes the block's address, so data written with a correct CRC but in the wrong place will also be discovered. It is bad enough to have data corrupted by a hardware failure; it is much worse not to detect it.
  • Hello ZFS (Score:5, Informative)

    by Wesley Felter ( 138342 ) <wesley@felter.org> on Friday September 14, 2007 @06:09PM (#20609971) Homepage
    ZFS's end-to-end checksums detect many of these types of corruption; as long as ZFS itself, the CPU, and RAM are working correctly, no other errors can silently corrupt ZFS data.

    I am looking forward to the day when all RAM has ECC and all filesystems have checksums.
  • Re:Hello ZFS (Score:4, Informative)

    by harrkev ( 623093 ) <kevin@harrelson.gmail@com> on Friday September 14, 2007 @06:52PM (#20610409) Homepage

    I am looking forward to the day when all RAM has ECC and all filesystems have checksums.
    Not gonna happen. The problem is that ECC memory costs more, simply because there is 12.5% more memory (8 check bits for every 64 data bits). Most people are going to go for whatever is cheapest.

    But ECC is available. If it is important to you, pay for it.
  • by E-Lad ( 1262 ) on Friday September 14, 2007 @08:39PM (#20611497)
    Give this blog entry a read:
    http://blogs.sun.com/elowe/entry/zfs_saves_the_day_ta

    And you'll understand :)
  • by Anonymous Coward on Saturday September 15, 2007 @01:29AM (#20613369)
    Note: The newer Intel P965 chipset does not support ECC memory while their older 965x does. It's a crying shame, too, given that the P965 was designed for Core 2 Duo and Quad Core CPUs.

    You meant 975x, not 965x. The successor of 975x is X38 (Bearlake-X) chipset supporting ECC DRAM. It should debut this month.
  • by Anonymous Coward on Saturday September 15, 2007 @04:30AM (#20614299)
    Sad given that ECC logic is so simple it's basically FREE.

    What's worse? It IS free!
    Motherboard chips (e.g. south bridge, north bridge) are generally limited in size NOT by the transistors inside but by the number of IO connections. There's silicon to burn, so to speak, and therefore plenty of room to add features like this.

    How do I know this? Oh wait, my company made them... We never had to worry about state-of-the-art process technology because it wasn't worth it. We could afford to be several generations behind for exactly this reason.
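Intron's end-to-end scheme, with check data that also records the block's address, can be sketched as follows. This is a minimal illustration under stated assumptions: the 12-byte trailer layout (a CRC-32 plus an 8-byte logical block address) is hypothetical and does not follow any particular standard, and `protect`/`verify` are made-up names.

```python
import struct
import zlib

SECTOR = 512
# Hypothetical 12-byte trailer: CRC-32 (4 bytes) followed by the intended LBA (8 bytes).
TRAILER = struct.Struct(">IQ")


def protect(data: bytes, lba: int) -> bytes:
    """Append check data that covers both the payload and the address it belongs at."""
    if len(data) != SECTOR:
        raise ValueError("expected a 512-byte sector")
    crc = zlib.crc32(data + lba.to_bytes(8, "big"))
    return data + TRAILER.pack(crc, lba)


def verify(block: bytes, expected_lba: int) -> bytes:
    """Return the payload, or raise OSError if the data or its location is wrong."""
    data, trailer = block[:SECTOR], block[SECTOR:]
    crc, lba = TRAILER.unpack(trailer)
    if lba != expected_lba:
        # Valid-looking block, wrong place: a misdirected write or read.
        raise OSError(f"misdirected block: tagged for LBA {lba}, read at LBA {expected_lba}")
    if zlib.crc32(data + lba.to_bytes(8, "big")) != crc:
        raise OSError(f"checksum mismatch at LBA {expected_lba}")
    return data


if __name__ == "__main__":
    block = protect(b"A" * SECTOR, lba=42)
    print(len(block))  # 524
    for candidate, where in ((b"B" + block[1:], 42), (block, 43)):
        try:
            verify(candidate, where)
        except OSError as err:
            print(err)
```

Because the address is part of the protected data, a block that is internally consistent but sitting at the wrong location is still rejected, which a plain per-block CRC would miss.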
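Part of what makes ZFS's checksums end-to-end, as Wesley Felter describes, is that each block's checksum is kept in the parent block pointer rather than next to the data. The toy model below sketches that idea; `BlockPointer` and `Disk` are hypothetical names, and SHA-256 is used because it is one of the checksum algorithms ZFS can be configured with, not because this mirrors ZFS's on-disk format.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class BlockPointer:
    """Hypothetical parent-held pointer: where the child lives and what it should hash to."""
    address: int
    checksum: bytes


class Disk:
    """Toy block store standing in for the whole drive/cable/controller path."""

    def __init__(self) -> None:
        self.blocks: dict[int, bytes] = {}

    def write(self, address: int, data: bytes) -> BlockPointer:
        self.blocks[address] = data
        # The checksum goes back to the caller (the parent), not onto the disk beside the data.
        return BlockPointer(address, hashlib.sha256(data).digest())

    def read(self, ptr: BlockPointer) -> bytes:
        data = self.blocks[ptr.address]
        # Verify against the checksum the parent recorded at write time.
        if hashlib.sha256(data).digest() != ptr.checksum:
            raise OSError(f"checksum mismatch reading block {ptr.address}")
        return data


if __name__ == "__main__":
    disk = Disk()
    disk.write(7, b"old contents")
    # Simulate a lost ("phantom") write: the parent now expects new data, but the drive kept the old.
    expected = BlockPointer(7, hashlib.sha256(b"new contents").digest())
    try:
        disk.read(expected)
    except OSError as err:
        print(err)  # checksum mismatch reading block 7
```

A checksum stored alongside the block would not catch this case, since the stale data and its stale checksum still agree with each other.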
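harrkev's 12.5% figure comes from the common ECC DIMM arrangement of 8 check bits per 64 data bits (a SECDED Hamming code). A short sketch of the arithmetic: a Hamming code needs the smallest r with 2^r >= m + r + 1 check bits to correct one error in m data bits, and one extra overall parity bit adds double-error detection. The function name is hypothetical.

```python
def secded_check_bits(data_bits: int) -> int:
    """Check bits for a SECDED Hamming code: Hamming bits plus one overall parity bit."""
    r = 0
    while 2 ** r < data_bits + r + 1:  # smallest r that can address every bit position
        r += 1
    return r + 1


if __name__ == "__main__":
    for width in (8, 32, 64, 128):
        extra = secded_check_bits(width)
        print(f"{width:>3} data bits + {extra} check bits -> {extra / width:.1%} overhead")
    # The 64-bit line reads: " 64 data bits + 8 check bits -> 12.5% overhead"
```

Wider words amortize the check bits better, which is why the overhead for a 64-bit memory bus is far lower than it would be for byte-wide protection.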
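To illustrate the claim that ECC logic is simple, here is a sketch of the textbook Hamming(7,4) code, which corrects any single flipped bit using nothing but XORs. It is a smaller relative of the 72/64 SECDED code used on ECC DIMMs, not that code itself, and the function names are hypothetical; in hardware each XOR chain would be a small gate tree.

```python
def hamming74_encode(nibble: int) -> list[int]:
    """Encode 4 data bits into a 7-bit Hamming codeword (index 0 = position 1)."""
    d = [(nibble >> i) & 1 for i in range(4)]  # d[0] is the least significant data bit
    p1 = d[0] ^ d[1] ^ d[3]  # covers positions 1, 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]  # covers positions 2, 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]  # covers positions 4, 5, 6, 7
    # Layout by position: 1=p1 2=p2 3=d0 4=p4 5=d1 6=d2 7=d3
    return [p1, p2, d[0], p4, d[1], d[2], d[3]]


def hamming74_decode(code: list[int]) -> int:
    """Correct up to one flipped bit and return the 4 data bits."""
    c = code[:]  # work on a copy so the caller's list is untouched
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 | (s2 << 1) | (s4 << 2)  # 1-based position of the bad bit, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1
    return c[2] | (c[4] << 1) | (c[5] << 2) | (c[6] << 3)


if __name__ == "__main__":
    for value in range(16):
        word = hamming74_encode(value)
        for bit in range(7):
            damaged = word[:]
            damaged[bit] ^= 1
            assert hamming74_decode(damaged) == value
    print("every single-bit error in every codeword was corrected")
```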
