It was only online for about 2 months. Keep in mind though that my drive also did not have very much static data (just a few MB). I agree that this was probably a case of the controller pooping out.
Update:
m4
632.6673 TiB
2303 hours
Avg speed 91.02 MiB/s.
AD gone from 254 to 245.
P/E 11010.
MD5 OK.
Still no reallocated sectors
Kingston V+100
It dropped out during the night. I won't be able to reconnect it until tomorrow since I'm away this weekend. Anvil was talking about an updated ASU so we can restore the log ourselves when this happens. I'll ask him to help me unless the new version of ASU is finished.
Both drives are now running on the X58 which is configured w/o power savings and runs OROM 11 + RST 11 alpha.
I'll play a bit with the Z68 setting during this session.
Both drives were allowed to idle for 12 hours and so there isn't much progress.
Wear Range Delta is stuck at 57, so no decrease while idling.
Kingston SSDNow 40GB (X25-V)
380.19TB Host writes
Reallocated sectors : 11
MD5 OK
38.69MiB/s on avg (~4.5 hours)
--
Corsair Force 3 120GB
01 92/50 (Raw read error rate)
05 2 (Retired Block count)
B1 56 (Wear range delta)
E6 100 (Life curve status)
E7 66 (SSD Life left)
E9 136592 (Raw writes)
F1 181934 (Host writes)
106.33MiB/s on avg (~4.5 hours)
It has dropped a bit in avg speed, but that is mainly because I have activated MD5 for this session.
power on hours : 544
I had a Fusion-io ioXtreme that had 307.8 TB of writes. DOM was Jan 4 2010, and I posted it on Nov 9 2010.
See thread: http://www.xtremesystems.org/forums/...=1#post4619346
My current SSDs that I'm testing (Vertex 3 240 GB and RevoDrive 3 240 GB) are already at about 1.4 TB of writes and I've only had them for 5 days, so I'm at about 300 GB/day (at least for the RevoDrive 3). If I hadn't had unexpected power losses, I'm sure it would be higher by now.
So....14 TB of data is NOTHING to me.
*edit*
a) The Fusion-io ioXtreme was only 80 GB. And b) I think it was on a PCIe x4 connector (same as my RevoDrive 3 now).
Anvil, I've been trying to find out a bit more about how the static data levelling process works. This white paper is the only one I can find that talks about how static wear levelling is implemented, but even here it is not discussed in great detail. If I read the following correctly, the process requires idle time before it can start. (I guess you could also read the 1st trigger as the period the static data has sat idle, rather than how long the drive has sat idle.)
The reason I think static data levelling is not working (as well as intended) on the SF drive is the high difference between the most and least worn blocks. The SF controller is supposed to keep this within a very low threshold, and the only reason I can think of why it is not working is a lack of idle time. Obviously, to get back to a low threshold you would need to write, pause, write, pause etc. until the wear could be evenly distributed, which would take a lot of time once the threshold has got beyond a certain point, as your drive appears to have done.
“Static wear levelling addresses the blocks that are inactive and have data stored in them. Unlike dynamic wear levelling, which is evaluated each time a write flush buffer command is executed, static wear levelling has two trigger mechanisms that are periodically evaluated. The first trigger condition evaluates the idle stage period of inactive blocks. If this period is greater than the set threshold, then a scan of the ECT is initiated.
The scan searches for the minimum erase count block in the data pool and the maximum erase count block in the free pool. Once the scan is complete, the second level of triggering is checked by taking the difference between the maximum erase count block found in the free pool and the minimum erase count block found in the data pool, and checking if that result is greater than a set wear-level threshold. If it is greater, then a block swap is initiated by first writing the content of the minimum erase count block found in the data pool to the maximum erase count block found in the free pool.
Next, each block is re-associated in the opposite pool. The minimum erase count block found in the data pool is erased and placed in the free pool, and the maximum erase count block, which now has the contents of the other block’s data, is now associated in the data block pool. With the block swap complete, the re-mapping of the logical block address to the new physical block address is completed in the FTL. Finally, the ECT is updated by associating each block to its new groups”.
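To check my own understanding, here is a rough Python sketch of how I read that two-trigger flow. All names, data structures and threshold values are mine, invented purely for illustration; they are not from the paper or from any real controller firmware.

```python
# A rough sketch of the two-trigger static wear-levelling flow as I read the
# quote above. Every name, structure and threshold here is my own guess for
# illustration; nothing is taken from SandForce or from the paper itself.
from dataclasses import dataclass

IDLE_THRESHOLD_S = 3600       # trigger 1: data block untouched for this long (assumed)
WEAR_DELTA_THRESHOLD = 100    # trigger 2: allowed erase-count gap (assumed)

@dataclass
class Block:
    id: int
    erase_count: int = 0      # this block's entry in the ECT
    last_write: float = 0.0   # timestamp of the last write to this block
    logical: int = -1         # LBA currently mapped here (-1 = none)

def static_wear_level(data_pool, free_pool, ftl, now):
    # Trigger 1: is any block holding (static) data idle past the threshold?
    idle = [b for b in data_pool if now - b.last_write > IDLE_THRESHOLD_S]
    if not idle or not free_pool:
        return
    # Scan the ECT: least-worn block in the data pool, most-worn in the free pool.
    cold = min(idle, key=lambda b: b.erase_count)
    hot = max(free_pool, key=lambda b: b.erase_count)
    # Trigger 2: only act if the wear gap exceeds the wear-level threshold.
    if hot.erase_count - cold.erase_count <= WEAR_DELTA_THRESHOLD:
        return
    # Swap: the static data moves to the worn block, pools and FTL are updated.
    hot.logical = cold.logical
    cold.erase_count += 1                            # the cold block gets erased...
    cold.logical = -1
    data_pool.remove(cold); free_pool.append(cold)   # ...and joins the free pool
    free_pool.remove(hot);  data_pool.append(hot)    # the worn block now holds the data
    ftl[hot.logical] = hot.id                        # remap the LBA to its new physical block
```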
Anyway, I was going to experiment with wear levelling today, but my V2 got hit with the time warp bug. First I noticed that files within folders had lost their integrity. It was as if I had run a secure erase from within Windows but had not then rebooted. The file structure remained, but I could not copy or open files.
The drive then "disappeared". It could not be seen in the BIOS or by the OS. After physically disconnecting and then reconnecting the drive it reappeared. The folder structure was complete and all files could be opened/copied. Only one problem: the file structure had reverted to a previous point in time, i.e. recently created folders had disappeared.
Static data rotation might just be working, but in a totally different way than everybody expects for SF. I interpret B1 (WRD) as the difference in percent between the most worn and least worn block. For 136592 GB written, if we consider 128 GB as one complete cycle, we get ~1067 P/E cycles on average. For these values there might be a block with 1300 P/E cycles and another one with 572 cycles: (1300-572)/1300 = 0.56.
Now, the controller might be programmed to postpone data rotation as much as possible to avoid increased wear, but still to achieve a wear range delta of 5% (or any other value) by the end of the estimated P/E cycles. This would explain why the value increased suddenly and is now slowly decreasing.
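Spelled out as a tiny calculation (assuming my reading of B1 is right; the block extremes are of course made-up numbers):

```python
# The same arithmetic as above; a sketch of how I read B1, assuming it is the
# percentage gap between the most and least worn block, relative to the most
# worn one. The 1300/572 extremes are hypothetical, not actual readings.
nand_writes_gb = 136592          # E9 raw/NAND writes reported by the drive
capacity_gb = 128                # treating one full capacity fill as one cycle

avg_pe = nand_writes_gb / capacity_gb        # ~1067 average P/E cycles
max_pe, min_pe = 1300, 572                   # hypothetical most/least worn blocks
wrd = (max_pe - min_pe) / max_pe * 100       # ~56, i.e. B1 = 56
print(round(avg_pe), round(wrd))             # -> 1067 56
```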
My Mushkin dropped from about 26 down to 9-10, but maybe we're misinterpreting it. Maybe even 50+ is nothing on a 120GB drive. If my drive peaked at 27, and the 120GB Force 3 peaks at around twice that (as in, that's the high water mark before it drops back down), then perhaps that's normal.
I haven't seen much in the way of interpreting Wear Range Delta.
Ao1,
I've read that paper before (or something similar). Not sure how to interpret the lack of WRD movement while idling; I even restarted the computer and there was nothing monitoring the drives, no CDI or SSDLife, so it was definitely idling (and it was a secondary/spare drive).
And if it wasn't doing static wear leveling, the WRD would make no sense imho.
I think it's somewhere along the lines of what sergiu explains, and there should be no problem in addressing static wear leveling on the fly.
Let's see if it keeps on decreasing; it's still at 56 and it's been writing for >9 hours since moving it to the other rig.
I've read about the time warp; have you been using the drive, or has it been sitting there just idling?
Keep us updated on that issue.
My drive never gets a chance to idle since it's either writing at 120+ MB/s or disconnected. I only have 17 percent static data anyway, as I used a fresh Win7 install on it from when I was using it in a laptop. Today WRD has advanced to 11, so perhaps it will go back up to 20+.
Mushkin Chronos Deluxe 60 Update, Day 18
05 2 (Retired Block Count)
B1 11 (Wear Range Delta) - going up
F1 168250 (Host Writes)
E9 129720 (NAND Writes)
E6 100 (Life Curve)
E7 32 (Life Left)
128.23MB/s on avg
RST drivers, Intel DP67BG P67
415 hours of work (23hrs since the last update)
Time: 16 days 7 hours
12GiB minimum free space, 11720 files per loop
SSDLife expects 7 days to 0 MWI
Most likely a controller failure for the Corsair drive. Too bad, because it even had LTT removed. So I guess with the Mushkin disconnecting, we won't get to see how SF performs!? Maybe SF is just not suitable for this kind of testing. Or maybe Anvil's drive will give us some results without the controller crapping out.
I read a note that wear levelling via static data rotation can potentially impact performance, so one could assume that idle time is not required; however, I also read that the SF controller is supposed to maintain the wear delta within a few % of the maximum lifetime wear rating of the NAND.
Vapor reports a value of 3 in post #1,899, which seems about right for how the SF is supposed to behave. AFAIK Vapor's SF has run continuously. So how could the F3 get to 58? Whether the P/E rating is 5K or 3K, the difference between the least and most worn blocks is huge and would take a lot of writes (not just idle time) to rebalance.
3 = (166.7 / 5,000) x 100 - Vapor
11 = (550 / 5,000) x 100 - the highest I saw before aborting
58 = (2,900 / 5,000) x 100 - Anvil
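For what it's worth, here is the same arithmetic in a couple of lines, assuming B1 really is the erase-count spread over the rated P/E count (the spread figures are just back-calculated from the reported B1 values, not actual readings):

```python
# The same formula as the three lines above: B1 read as erase-count spread
# divided by the rated P/E count. The spread values are back-calculated from
# the reported B1 numbers, so this is only a consistency check, not new data.
RATED_PE = 5000

def wear_range_delta(spread_pe, rated_pe=RATED_PE):
    return spread_pe / rated_pe * 100

print(wear_range_delta(166.7))   # ~3  - Vapor
print(wear_range_delta(550))     # 11  - the highest I saw before aborting
print(wear_range_delta(2900))    # 58  - Anvil's Force 3
```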
It seems there are two triggers for static data rotation:
• Length of time that data remains static before it is moved
• The maximum threshold between least and most worn blocks
If SF works on the first trigger, it would be reasonable for the delta to get quite high if the drive is written to heavily. That does not seem to match up with what Vapor reports. :shrug:
I don't think you should worry about results with the Mushkin or the Force 3. It's not like total drive failures are common, but I'm not really surprised one drive (the F40-A) died from something other than wearing out the NAND. The chances of it happening to another drive in the near future are minuscule.
I've bought a new motherboard; it should be here later this week. While I think it's ridiculous to have to buy hardware that works with the SF2281 rather than just having a drive that works with anything (you know, like it's supposed to), I'm committed to figuring out what the hell is going on. I can always reboot once a day, as this should stop the dropouts from occurring, but that's not a very good solution (I'm not always going to be here to babysit the Mushkin). I've made some other changes this weekend, and as a result I'm already past my own personal record for consecutive time between drive failures. If I can make it to 40-50 hours I may just be on to something.
The new motherboard is a Maximus IV Gene-Z, so that makes one P67, one H67, and one Z68 board I'll have tested on, with the only hardware difference being the motherboards. The P67 and H67 were basically the same, at no more than 30 hours of running. Either the Mushkin lasts no more than 30 hours with the new motherboard, or it just works. Like I said, in the meantime I'm trying something else, but it would be hilarious if the Mushkin worked for the next 3 days until the Gene-Z gets here. It would also be hilarious if people have been swapping out the wrong components the whole time...
Kingston SSDNow 40GB (X25-V)
382.22TB Host writes
Reallocated sectors : 11
MD5 OK
36.12MiB/s on avg (~21 hours)
--
Corsair Force 3 120GB
01 90/50 (Raw read error rate)
05 2 (Retired Block count)
B1 55 (Wear range delta)
E6 100 (Life curve status)
E7 64 (SSD Life left)
E9 141282 (Raw writes)
F1 188176 (Host writes)
106.62MiB/s on avg (~21 hours)
power on hours : 561
--
Wear Range Delta is slowly decreasing; nothing else of importance, except that it's still running.
Ao1,
We'll just have to monitor the Force 3; the next few days should tell us the trend, although it looks like a slow process.
Some parameters are surely related to the capacity and free space.
Maybe the continual creating and deleting of "fresh" files disturbs the advanced logic of the SF controller.
All power saving technologies enabled, same RST 10.6 drivers. It's pulling 46.9w from the wall right now.
34 hours straight... still too early to tell, but still a record for me.
I don't want to jinx myself by proclaiming that I've found a fix, so the most I'll say at the moment is that it SHOULD have already crashed, or SHOULD crash any moment...
:D
:)
Let's hope it works out and is reproducible!
M225->Vertex Turbo 64GB Update:
466.85 TiB (513.31 TB) total
1245.34 hours
9609 Raw Wear
118.46 MB/s avg for the last 64.76 hours (on W7 x64)
MD5 OK
C4-Erase Failure Block Count (Realloc Sectors) from 6 to 7.
(Bank 6/Block 2406; Bank 3/Block 3925; Bank 0/Block 1766; Bank 0/Block 829; Bank 4/Block 3191; Bank 7/Block 937; Bank 7/Block 1980)
The boot drive just disconnected, a Force GT on fw 1.3.
I've got a screenshot from 30 min ago, so avg speed etc. is pretty close; the other values are the current ones, taken from CDI.
Kingston SSDNow 40GB (X25-V)
382.91TB Host writes
Reallocated sectors : 11
MD5 OK
35.33MiB/s on avg (~27 hours)
--
Corsair Force 3 120GB
01 90/50 (Raw read error rate)
05 2 (Retired Block count)
B1 55 (Wear range delta)
E6 100 (Life curve status)
E7 64 (SSD Life left)
E9 143026 (Raw writes)
F1 190497 (Host writes)
MD5 OK
106.65MiB/s on avg (~27 hours)
power on hours : 568
Here is a summary of the B1 data posted for SF drives. There is not enough info on what happened with Vapor's SF drive. Vapor also did a lot of testing on compression before getting started with the endurance app, which would have made a difference; plus it has modified f/w.
Anyway, when the endurance app is running all the time, B1 appears to only increase. I'm wondering if SynbiosVyse's drive failed because B1 got too high, i.e. the difference between the least and most worn blocks exceeded the P/E cycle capability.
130 = (6,500 / 5,000) x 100
130 = (3,900 / 3,000) x 100
Edit: Looking at SynbiosVyse's posts, he mentions he had hardly any static data (a few MB?), yet B1 was still able to increase significantly above the target SF value. :shrug:
Update:
m4
641.5496 TiB
2331 hours
Avg speed 90.95 MiB/s.
AD gone from 245 to 240.
P/E 11165.
MD5 OK.
Still no reallocated sectors
Kingston V+100
The drive is up again but I need to restore the log before I can restart the test.
The real hero here is the M4. Too many problems and headaches with the SF. I wonder if any SSD will beat it in the future. Probably the C300 :)
Now I'm starting to get nervous... 45 hours straight without disconnects.