Thread: bit error = lost file, reduce BER how?

  1. #1
    Xtreme Mentor
    Join Date: Sep 2006
    Posts: 3,246
    Quote Originally Posted by m^2 View Post
    So RAID controller just flips a bit on one drive, not necessarily the correct one? Therefore I don't see any significant advantage of RAID 6 over RAID 5 here.

    What happens if there's an error on the drive that the internal ECC can't correct? The drive returns an error message and RAID can use another drive to get the data, right?

    BTW do you know anything about the chance of getting bit corruption when you have RAID + ECC memory?
    I mean an (undetected) error on the HDD / in memory / in the cables etc.?
    I guess that if the industry doesn't care, it's a very minor thing, but it would still be good to know the problem.
    I'll try:

    RAID 5: when data and parity disagree, you have to tell it which one to favor, parity or original data. This means no real protection other than preventing corrupt volumes.
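
    To illustrate the RAID 5 point, here is a minimal Python sketch (not from the original poster). It assumes plain byte-wise XOR parity across a made-up stripe of three data drives; `xor_parity` and the sample bytes are hypothetical names and values chosen for the demo. It shows that a silent bit flip makes the parity check fail, that the check looks the same no matter which drive was hit, and that a block can still be rebuilt when the drive itself reports the error.

```python
from functools import reduce

def xor_parity(blocks):
    """Byte-wise XOR of equal-length blocks (RAID 5 style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Hypothetical stripe: three data drives, two bytes each (made-up contents).
data = [b"\x10\x20", b"\x0f\x0f", b"\xaa\x55"]
parity = xor_parity(data)

# Silently flip one bit on drive 1; the drive reports no read error.
corrupted = [data[0], bytes([data[1][0] ^ 0x04, data[1][1]]), data[2]]

# XOR of all blocks plus parity should be all zeros; here it is not.
syndrome = xor_parity(corrupted + [parity])
mismatch_detected = any(syndrome)

# The same flip on drive 0 instead gives exactly the same syndrome,
# so parity alone cannot tell which block (or the parity itself) is wrong.
alt = [bytes([data[0][0] ^ 0x04, data[0][1]]), data[1], data[2]]
same_syndrome = xor_parity(alt + [parity]) == syndrome

# If drive 1 reports an unreadable sector, its block is simply
# the XOR of the surviving data blocks and the parity.
rebuilt = xor_parity([data[0], data[2], parity])

print(mismatch_detected, same_syndrome, rebuilt == data[1])
```

    So with a single parity the array knows that something is wrong, but has to be told (or has to guess) where.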

    RAID 6, there are 3 bits and the odds of 2 bits being bad are small, therefore when the bad data is fixed, odds are good it really was the bad data and you aren't doing the equivalent of a coin toss.
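
    A rough Python sketch of how two parities can point at the bad drive. This is an illustration under stated assumptions, not a description of any particular controller: it assumes the common P+Q scheme over GF(2^8) with polynomial 0x11d and generator 2, that exactly one data byte in the stripe is wrong, and that P and Q themselves are intact. The names `pq` and `locate_single_error` are hypothetical.

```python
# GF(2^8) log/antilog tables, polynomial 0x11d, generator 2 (assumed scheme).
EXP = [0] * 510
LOG = [0] * 256
x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    x <<= 1
    if x & 0x100:
        x ^= 0x11d
for i in range(255, 510):
    EXP[i] = EXP[i - 255]

def gf_mul(a, b):
    """Multiply two field elements using the log tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]

def pq(data):
    """P = XOR of all data bytes, Q = XOR of g^i * data[i] (one byte per drive)."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d
        q ^= gf_mul(EXP[i], d)
    return p, q

def locate_single_error(data, p, q):
    """Return (drive index, error pattern), or None if the stripe is consistent.

    Only valid if at most one data byte is wrong and P and Q are good.
    """
    p_now, q_now = pq(data)
    sp, sq = p ^ p_now, q ^ q_now
    if sp == 0 and sq == 0:
        return None
    if sp == 0 or sq == 0:
        raise ValueError("mismatch is in P or Q, not in a data drive")
    # For an error e on drive z: sp = e and sq = g^z * e, so z = log(sq) - log(sp).
    return (LOG[sq] - LOG[sp]) % 255, sp

stripe = [0x11, 0x22, 0x33, 0x44, 0x55, 0x66]   # made-up bytes, six data drives
p, q = pq(stripe)

damaged = list(stripe)
damaged[4] ^= 0x08                               # silent bit flip on drive 4
drive, error = locate_single_error(damaged, p, q)
damaged[drive] ^= error                          # undo the flip
print(drive, damaged == stripe)
```

    If two or more drives are wrong at once, the index this returns is meaningless, which is where the "odds of 2 being bad are small" assumption comes in. Whether a given controller actually does this on a scrub, rather than just recomputing parity, is a separate question.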

    As far as RAM bit errors, the original Corsair rule I recall is: 1 bit error occurs in 256MB of RAM every month.
    4GB = 16 bit errors/month
    8GB = 32 bit errors/month
    16GB = 64 bit errors/month
    32GB = 128 bit errors/month

    However, other more recent sources maintain 1 bit error per gigabyte per month, so that cuts those numbers by a factor of 4, i.e.:
    4GB = 4 bit errors/month
    8GB = 8 bit errors/month
    16GB = 16 bit errors/month
    32GB = 32 bit errors/month

    Which is more correct? My guess is that it's somewhere in between, governed by the quality of the RAM, amount of overclock beyond stated specs, the luck of the draw, and a host of other issues.
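
    For anyone who wants to plug in their own RAM size, the arithmetic behind both rules of thumb can be sketched in a few lines of Python. The function name `errors_per_month` is made up for this example, 1 GB is taken as 1024 MB, and the two rates are just the figures quoted above, not measured data.

```python
def errors_per_month(ram_gb, mb_per_error):
    """Expected bit errors per month, given one error per `mb_per_error` MB per month."""
    return ram_gb * 1024 / mb_per_error

for size_gb in (4, 8, 16, 32):
    corsair = errors_per_month(size_gb, 256)    # 1 error per 256MB per month
    recent = errors_per_month(size_gb, 1024)    # 1 error per GB per month
    print(f"{size_gb}GB: {corsair:.0f} vs {recent:.0f} bit errors/month")
```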


    Anyone with more up to date or correct info please feel free to correct me.
    Last edited by Speederlander; 12-29-2008 at 09:28 AM.

  2. #2
    Xtreme Addict
    Join Date: Mar 2008
    Posts: 1,163
    Quote Originally Posted by Speederlander View Post
    I'll try:

    RAID 5: when data and parity disagree, you have to tell it which one to favor, parity or original data. This means no real protection other than preventing corrupt volumes.

    RAID 6, there are 3 bits and the odds of 2 bits being bad are small, therefore when the bad data is fixed, odds are good it really was the bad data and you aren't doing the equivalent of a coin toss.
    RAID 6 of 3 drives doesn't make sense. You most likely have 6-24 drives and have to pick 1 of them. The assumption that there's only 1 error is reasonable. With it, the second parity lets you halve (?) the number of suspect drives, so you have an 8-33% chance of guessing correctly.

    Quote Originally Posted by Speederlander View Post
    As far as RAM bit errors, the original Corsair rule I recall is: 1 bit error occurs in 256MB of RAM every month.
    4GB = 16 bit errors/month
    8GB = 32 bit errors/month
    16GB = 64 bit errors/month
    32GB = 128 bit errors/month

    However, other more recent sources maintain 1 bit error per gigabyte per month, so that cuts those numbers by a factor of 4, i.e.:
    4GB = 4 bit errors/month
    8GB = 8 bit errors/month
    16GB = 16 bit errors/month
    32GB = 32 bit errors/month

    Which is more correct? My guess is that it's somewhere in between, governed by the quality of the RAM, amount of overclock beyond stated specs, the luck of the draw, and a host of other issues.


    Anyone with more up to date or correct info please feel free to correct me.
    Thanks, I knew the Corsair data, but not the other one. But this applies to non-ECC memory. ECC has some error rate too: a double error is not correctable. Taking the data above and calculating the chance of a double error from it would be wrong, though, as ECC memory has more modules. How many? Depends on the implementation.

    Anyway, that's not what I was looking for. I'm curious what happens if you take the computer as a whole, add the standard protection (RAID + ECC memory + ECC controller cache), pump data through it constantly for a year or 2 and then verify the output. I didn't expect you to know of such data, but it doesn't hurt to ask.

  3. #3
    Xtreme Mentor
    Join Date
    Sep 2006
    Posts
    3,246
    Quote Originally Posted by m^2 View Post
    RAID 6 of 3 drives doesn't make sense.
    Where did I say it was only three drives?
