Thanks.
I briefly read Intel's Intelligent RAID 6 Theory Overview and Implementation, and that's enough.
Thank you for the code, stevecs. Testing now.
The script runs great, but I ran into a small problem. The creation of the md5 hash value completes and logs the value in a file within /var/log/md5sum. However, when the check is run it produces the error message you defined in the Md5CheckFileArray function...
Any ideas? I tried running a check after editing the same file and creating a new md5 value, but received the same error. I also verified the md5 values in the output file were different.

Code:
./md5check.sh: INFO: Md5CheckFileArray: reading in old md5sum output file...
./md5check.sh: ERROR: Md5CheckFileArray: old md5sum file does not exist
./md5check.sh: INFO: starting Md5Check...
Thanks for your help.
Yeah, you can ignore the INFOs; they just tell you which functions are being entered (it /IS/ only an after-dinner first pass. :P). The ERROR basically says there is no previous output to compare the file system against. In the first draft it is hard-coded to write a file called md5sum.<yearmonth>01 (I must have forgotten to fix that); the plan was to just search $HASHDIR for the newest output file and use that instead. Anyway, I ran it against about 900,000 files here, both unicode-16 and normal file names (multi-byte characters), so that's clean at least, and it does multi-threading (which was the main thing I was going for).
Create the directory /var/log/md5sum, then run it first in create mode:
md5check.sh <directory> create
then run it in check mode:
md5check.sh <directory> check
I've been e-mailing Eric Gerbier (author of afick, http://afick.sourceforge.net/) to see if we can merge ideas. Mine above works well with large files (100MiB+); I was able to push 800MiB/s throughput on a 2.4GHz quad core, which fully saturated the Areca 1280ML. With small files, though, the spawning of md5sum is a killer. Eric uses perl, which has md5/sha1 built in (no spawning penalty), but he doesn't have any multi-threading (so he's limited to about 200-250MiB/sec max on a 2.4GHz core2). He is also already using GDBM, which is what I was going to do next in mine.
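To illustrate the spawning penalty (generic shell here, not the actual md5check.sh or afick internals): forking one md5sum process per small file is dominated by process startup, while batching many files per invocation, and running batches in parallel with GNU xargs -P, avoids most of it:

```shell
#!/bin/sh
DIR="$1"

# Worst case: one fork/exec of md5sum per file -- fine for a few huge
# files, painful for hundreds of thousands of small ones.
find "$DIR" -type f -exec md5sum {} \; > /dev/null

# Better: hand md5sum up to 256 files per invocation (far fewer processes).
find "$DIR" -type f -print0 | xargs -0 -n 256 md5sum > /dev/null

# Better still: run 4 such batches concurrently (GNU xargs -P), which is
# roughly the parallelism the multi-threaded script gets from its workers.
find "$DIR" -type f -print0 | xargs -0 -n 256 -P 4 md5sum > /dev/null
```

All three variants produce the same set of checksums; only the process count and wall-clock time differ.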
That same code would take me several days to hack together, and it definitely wouldn't have multi-threading support. As a grateful tester, I was just hoping you knew how to fix the output file discrepancy. :)
I've subscribed to the afick project on sourceforge. It already has a nice feature set and implementation of multi-threading would make it a very nice program for detecting bit errors.
Like enteon said, this thread has made me totally re-evaluate my storage infrastructure.
@gogeta, try it again. I fixed the directory creation stuff, and also made the time stamps include hour/min/sec so you can do multiple runs in a day (not a problem here, as it takes a long time to run against many files).
The 1.5TB drives are desktop drives: they have a bit error rating of 10^14, a duty cycle of 8 hours/day (all desktop drives do), and an MTBF of 750,000 hours, which calculates out to a chance of NOT being able to read all sectors of ~11% for a single drive. Contrast that with the 1TiB nearline drives (NS series), which have a bit error rate of 10^15 (an order of magnitude better), a 24-hour duty cycle, and an MTBF of 1,500,000 hours; their chance of failing to read all sectors on a drive is about 1%. Drives this large really need much higher BER ratings (enterprise is 10^16): you don't need that high a rating on a small drive, but you DO need it on huge ones.
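The arithmetic behind those percentages can be sanity-checked with a quick back-of-envelope, treating the quoted BER as the probability of an unrecoverable error per bit read and ignoring duty cycle/MTBF:

```shell
# P(at least one unrecoverable read error reading the whole drive)
#   = 1 - (1 - BER)^bits  ~=  1 - exp(-BER * bits)
awk 'BEGIN {
  # 1.5 TB desktop drive, BER 1 in 10^14: 1.5e12 bytes * 8 bits/byte
  desktop  = 1 - exp(-1e-14 * 1.5e12 * 8)
  # 1 TB nearline drive, BER 1 in 10^15
  nearline = 1 - exp(-1e-15 * 1.0e12 * 8)
  printf "desktop:  ~%.1f%%\n", desktop  * 100   # ~11.3%
  printf "nearline: ~%.1f%%\n", nearline * 100   # ~0.8%
}'
```

So the order-of-magnitude better BER, combined with the smaller capacity, drops the full-read failure chance from roughly 11% to under 1%, matching the figures above.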
Basically, you get what you pay for, to some degree. Best (lowest error rating) would be enterprise drives (normally the high-rpm, 300GiB-or-less SAS ones these days) and bunches of them; next would be nearline drives; last would be desktop. And the absolute last would be the very large desktop ones (or I guess laptop drives should fall near the end as well, but generally people don't use them for long-term storage).
:) A NAS w/ laptop drives!?!? Wow, talk about Russian roulette with an automatic.