Some numbers: 40GiB - 40GB = 2813MiB. At 512KiB per block, that works out to ~5626 blocks, roughly the number of reallocated sectors before the spare area would reach 0 on the Intel 320 drive.
The 40GB model has 48GiB of NAND (20% more), while the 80GB and 160GB models have 88GiB and 176GiB. If roughly 10% extra is needed for parity, we can assume the drive has about 4GB of additional spare area (plus or minus 1-2GB, since the parity data usage is unknown) that is being used right now. That would mean maybe 8000-10000 more spare blocks. It is also possible that another complete die in one package is wearing out, so the number of reallocated sectors might stabilize again around 8192.
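For anyone who wants to redo the spare-area arithmetic, here's a quick sketch (the raw NAND size is my assumption, not Intel's spec):

```python
# Rough sketch of the spare-area math above (assumed figures, not Intel's spec).
GiB = 1024**3
GB  = 1000**3
MiB = 1024**2
KiB = 1024

user_capacity_gb = 40   # advertised capacity (decimal GB)
raw_nand_gib     = 40   # assumed NAND before counting the extra ~20% die

binary_minus_decimal = raw_nand_gib * GiB - user_capacity_gb * GB
print(binary_minus_decimal / MiB)           # ~2813 MiB of "free" spare area
print(binary_minus_decimal // (512 * KiB))  # ~5626 blocks of 512 KiB
```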
Not sure, but it looks like something is being misinterpreted: the data is written sequentially, and the only part that is not sequential is the small part at the end of the loop, which is limited to 100MiB per loop.
Kingston SSDNow 40GB (X25-V)
398.93TB Host writes
Reallocated sectors : 11
MD5 OK
33.36MiB/s on avg (~46 hours)
--
Corsair Force 3 120GB
01 94/50 (Raw read error rate)
05 2 (Retired Block count)
B1 57 (Wear range delta)
E6 100 (Life curve status)
E7 53 (SSD Life left)
E9 183160 (Raw writes)
F1 243906 (Host writes)
104.17MiB/s on avg (~144 hours)
power on hours : 722
144 hours = 6 days; it has written > 51TiB in a single session, which is pretty good.
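For the curious, the > 51TiB figure checks out if you assume the average speed held for the whole session:

```python
# Quick sanity check of the "> 51 TiB in one session" figure, assuming a
# constant 104.17 MiB/s over the full 144 hours.
avg_mib_s = 104.17
hours = 144
written_tib = avg_mib_s * 3600 * hours / 1024**2
print(f"{written_tib:.1f} TiB")   # ~51.5 TiB
```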
I'll restart the computer and re-activate C-States shortly.
Anvil,
Are you still planning on adding user-adjustable parameters for the end of the loop?
I put a different Windows installation on the Mushkin. This installation was larger than the previous one on there, and as a result far fewer files were being generated per loop.
The last four crashes I've encountered have all occurred during the file-deletion phase. I reduced the free space to bring the number of files back on par with what was being generated before, to see if this helps. I'm grasping at straws here, but below 10,000 files I think the pause isn't long enough for the Mushkin. I'm testing it with more files per loop now to see what happens.
Mushkin Chronos Deluxe 60 Update, Day 25
05 2 (Retired Block Count)
B1 16 (Wear Range Delta)
F1 234538 (Host Writes)
E9 180877 (NAND Writes)
E6 100 (Life Curve)
E7 10 (Life Left)
129.42MB/s on avg
Intel RST drivers, Asus M4g-z
574 Hours Work (28hrs since the last update)
Time 23 days 22 hours
6 GiB Minimum Free Space
I think the sudden increase in Wear Range Delta is attributable to the decrease in free space I'm using to achieve more than 10,000 files per loop.
The 320 is still not dead? Has the Samsung arrived yet?
Yes, you can adjust the pause. The delay while deleting files is still not configurable by the end user, but it scales at 500ms per 500 files.
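A minimal sketch of how that 500ms-per-500-files pacing scales (illustrative only, not the actual ASU code; the function name and the round-up are simplifications):

```python
# Illustrative pacing: the delete pause grows with the number of files in the loop.
def delete_pause_ms(file_count: int, ms_per_chunk: int = 500, chunk: int = 500) -> int:
    chunks = (file_count + chunk - 1) // chunk   # round up to whole 500-file chunks
    return chunks * ms_per_chunk

print(delete_pause_ms(10_000))   # 10,000 files -> 10,000 ms pause
print(delete_pause_ms(4_200))    #  4,200 files ->  4,500 ms pause
```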
Both are now running on the ASRock Z68, C States Enabled.
Kingston SSDNow 40GB (X25-V)
400.33TB Host writes
Reallocated sectors : 12 (1 up)
MD5 OK
36.63MiB/s on avg (~11 hours)
--
Corsair Force 3 120GB
01 88/50 (Raw read error rate)
05 2 (Retired Block count)
B1 59 (Wear range delta)
E6 100 (Life curve status)
E7 52 (SSD Life left)
E9 186411 (Raw writes)
F1 248238 (Host writes)
106.99MiB/s on avg (~11 hours)
power on hours : 735
Wear Range Delta is as high as it was before it started decreasing; let's see what happens later today.
M225->Vertex Turbo 64GB Update:
521.28 TiB (573.15 TB) total
1357.30 hours
10602 Raw Wear
118.38 MB/s avg for the last 63.81 hours (on W7 x64)
MD5 OK
C4 - Erase Failure Block Count (Realloc Sectors) went from 8 to 9.
(1=Bank 6/Block 2406; 2=Bank 3/Block 3925; 3=Bank 0/Block 1766; 4=Bank 0/Block 829; 5=Bank 4/Block 3191; 6=Bank 7/Block 937; 7=Bank 7/Block 1980; 8=Bank 7/Block 442; 9=Bank 7/Block 700)
Bumping down free space to generate over 10K files per loop seems to have done the trick -- so far.
No more crashing on file deletes... bonus.
320 still going strong. The reallocated sectors are still quickly rising but the SSD seems fine...
I just got the Samsung 5 minutes ago. First I have to fix the broken power connector. I probably won't get to it today, but I should get it done tomorrow. Johnw - what read tests do you want done?
After the read tests, I will attempt to write a pattern onto it for mapping the translation algorithm (which should take forever considering its state), then pop the NAND off and read it directly.
One_Hertz,
Do you have to make your own Flash Translation Layer for each drive?
First, see if you can read anything on the filesystem at all. Just before I broke the SATA power connector, I was unable to get the BIOS to even recognize the SSD. It reached write exhaustion on 2011-Aug-20, but I was still able to read files from it. Then I left it unpowered for a month and tried to read it again, but the BIOS would not recognize the drive. I fiddled with it a little with no luck, and then the SATA power connector broke while I was trying to get it recognized on another computer.
The idea was originally to check the MD5 of the ~40GB file on the SSD every month for a year, since consumer SSDs are supposed to be able to retain data, unpowered, for a year after write exhaustion. It seemed like the Samsung did not even last a month, but it would be good for you to double-check. If you do manage to mount the filesystem read-only, then please compute an MD5 checksum on the ~40GB file.
If you are unable to mount the filesystem at all, then you can proceed with whatever tests you would like to try.
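If the filesystem does mount read-only, something like this chunked MD5 will handle a file that size without eating RAM (the mount point and filename are placeholders):

```python
# Compute MD5 of a very large file in fixed-size chunks.
import hashlib

def md5_of_file(path: str, chunk_size: int = 8 * 1024 * 1024) -> str:
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()

print(md5_of_file("/mnt/samsung_ro/static_40gb.bin"))  # placeholder path
```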
Hi
First of all, thank you for doing this great project - I've been following this thread every day for weeks now with great interest.
Having an SF-22xx drive myself, I've been especially interested in this aspect of the thread. I've owned a Force GT 120GB for a bit more than a month now and I've only had two crashes where the drive disappeared from the BIOS. While not much compared to other folks, it's still annoying, and maybe even more annoying knowing I've got a... fragile... piece of hardware with the potential to cause trouble when it feels like doing so.
So, may we have a recap on the SF issue and can you say anything concrete on the matter in terms of causes/workarounds at this time?
Also, I don't mind helping out if you have some specific BIOS settings/workloads/etc. to try out if it brings the world closer to solving the infamous SF problem. This is my only PC though and it's not running 24/7, so I may not have the same resources, but I'll do what I can if needed.
What I have observed around the SF bug myself is that it has only happened during an overclock with an OFFSET vcore while running the BOINC client with the specific settings of 100% CPUs used and 60% CPU time. This configuration results in an erratic load on the processor, going literally from 0% to 100% and back down to 0% in under a second, with the vcore being thrown around like mad. The first time it crashed within 2 hours of this load, and the second time within 10 hours. C3/C6 states were disabled, although C1E and EIST were enabled.
At the moment I'm testing the same CPU workload but with a static vcore.
Folmer
Take a look here; this is where they have been working on those problems: http://www.xtremesystems.org/forums/...nd-workarounds
Really interesting to see the results of the 320 and the Samsung. Can't wait. How long will the M4 last now? Any estimates? Maybe 1 PB?
Based on what happened to your drive (and what appears to be happening to One_Hertz's drive), I suspect that is based on MWI exhaustion, not the physical exhaustion of the NAND. It's a grey area, however, that manufacturers should confirm.
I am interested to see if/how well the Samsung managed static data. One_Hertz, is this something you can determine?
@ Anvil, I monitored "normal" activity today using DiskMon, and when I plugged in the stats it came out as 37% random/63% sequential for writes, with an average write size of 64KB. Still not sure why ASU came out as mostly random. Will check into it more later.
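For reference, this is roughly how the trace was bucketed into sequential vs random: a write counts as sequential if it starts exactly where the previous write ended. The record layout is illustrative; DiskMon's actual export format may differ.

```python
# Classify a single-drive write trace into sequential vs random.
def classify_writes(events):
    """events: iterable of (offset_bytes, length_bytes) write records in time order."""
    seq = rand = 0
    expected_next = None
    for offset, length in events:
        if expected_next is not None and offset == expected_next:
            seq += 1
        else:
            rand += 1
        expected_next = offset + length
    total = seq + rand
    return seq / total, rand / total

seq_pct, rand_pct = classify_writes([(0, 65536), (65536, 65536), (10_000_000, 4096)])
print(f"{seq_pct:.0%} sequential / {rand_pct:.0%} random")
```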
FoLmEr,
There is a new thread dedicated to this very issue here:
http://www.xtremesystems.org/forums/...=1#post4975129
Could you post your entire system specs there?
And
Are you using the GT as a system drive?
Anything relevant, and perhaps not so relevant, could be useful (one day).
Ideally, if the SSD is accessible through the controller, I would write the logical block address of each sector as that sector's content. Then I would read all the NAND chips directly and have all the data, except it would be in the order it is actually stored on the flash. The contents of the sectors will allow me to take a very basic look at how the wear-leveling algorithm works. Most likely it will be such a complex mess that I won't be able to figure much out myself, but who knows... it did fail first. I mainly just want to take it apart to see how easily the NAND itself could be read. It might read just fine, or it might be a sea of ECC errors. I will see.
I'll try this. Hopefully it is still accessible.
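A rough sketch of that LBA-stamping idea, assuming the drive still accepts raw writes (device path and sector count are placeholders; this wipes whatever is on the drive, and a real run would need proper alignment/direct-I/O handling):

```python
# Stamp every 512-byte sector with its own LBA so a raw NAND dump later
# reveals which logical address the FTL mapped to each physical page.
import struct

SECTOR = 512

def stamp_sectors(dev_path: str, total_sectors: int) -> None:
    with open(dev_path, "rb+") as dev:
        for lba in range(total_sectors):
            payload = struct.pack("<Q", lba) * (SECTOR // 8)  # repeat the LBA to fill the sector
            dev.seek(lba * SECTOR)
            dev.write(payload)

# stamp_sectors("/dev/sdX", 78_125_000)   # ~40 GB drive; only run on a sacrificial SSD
```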
Cheers for the heads-up on the SF thread. I'll keep my SF business in there.
Ao1,
Maybe the 3xnm flash in the Samsung is just really consistent and got worn out mostly at the same time. What do you mean by MWI exhaustion vs. wearing the flash out?
What I was trying to say was that MWI is based on the theoretical P/E cycles for NAND. What this thread is showing is that actual P/E capability is significantly higher. The Samsung looked like it kept writing until the P/E capability was physically depleted. The one-year data retention duration, however, is most likely based on expiry of the theoretical P/E cycles rather than on physical depletion.
Regarding wear-out, I'm just curious to see if/how well the Samsung rotated the static data. The high sustained sequential write speeds might be helped by low controller overhead in the form of W/A & WL.
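To illustrate the distinction, an MWI-style "life left" value is just a countdown against the rated cycle count, something like this back-of-the-envelope (the 5000-cycle rating and the GiB-per-count assumption for E9 are mine, not a spec):

```python
# Life-left as a countdown of average P/E cycles used vs the *rated* cycle count,
# not the point where the NAND physically stops erasing.
def media_wearout_indicator(nand_writes_gib: float, capacity_gib: float,
                            rated_pe_cycles: int = 5000) -> float:
    avg_pe_used = nand_writes_gib / capacity_gib
    return max(0.0, 100.0 * (1 - avg_pe_used / rated_pe_cycles))

# e.g. ~180,877 GiB of NAND writes on ~64 GiB of flash rated for 5000 cycles:
print(media_wearout_indicator(180_877, 64))   # ~43% left by the rated-cycle math
```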
Kingston SSDNow 40GB (X25-V)
401.63TB Host writes
Reallocated sectors : 12
MD5 OK
34.83MiB/s on avg (~22 hours)
--
Corsair Force 3 120GB
01 90/50 (Raw read error rate)
05 2 (Retired Block count)
B1 61 (Wear range delta)
E6 100 (Life curve status)
E7 51 (SSD Life left)
E9 189633 (Raw writes)
F1 252531 (Host writes)
107.04MiB/s on avg (~22 hours)
power on hours : 747
B1 is still climbing; I expected it to stay in the 50s.