The system froze almost 2 hours ago; I'll try the add-on controller next.
edit:
It was within a minute of starting a new loop (1580 files created).
The drive is now connected to a Marvell SE9120 6Gb/s controller (in RAID mode).
Just to compare speeds... what is everyone's Estimated/Current MiB/s min and max for a loop? I'll start with the M225.
It bursts to 155-160+ for the first few seconds, then settles down into the low 150s to upper 140s.
About 75% of the way through the loop it starts falling slowly, ending the loop in the upper 130s.
Evidently my random writes kill my total Avg MB/s.
The Force 3 does about 115-117MiB/s throughout the loop, that is on the PCH 6Gb/s controller.
The "X25-V" starts off at full speed (40MB/s+) and then gradually slows down during the loop, the biggest falloff is during the last 8-10GB.
I'll restart my system and take notes on the first loop.
edit:
I've been playing a bit with the drivers and they are just awful.
Using the Marvell drivers I got disconnects in seconds, I have not looked for updated drivers, will do that later.
Using MSAHCI the speed is down from 115-117MiB/s to <90MiB/s (a bit early to say, though, as it's based on 1 loop).
If the speed doesn't improve I'm dropping the Marvell controller (unless I can find an updated driver that works).
Will do some more tests before I decide what to do.
It starts at 157, but in less than a second it drops to 137, 136, 135, where it stays until the end (135).
So really you could say it stays within the 135 range for the entirety of the loop; it never drops below that on the P67 SATA III ports (the Avg MB/s under Total now lists 121.84, so maybe that includes the randoms and the pause/delete time). On the SATA III ports it averages 124 MB/s instantaneous estimated speed, and about 116 MB/s for the Avg MB/s rating.
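Just to illustrate why those two numbers can differ, here's a toy back-of-the-envelope sketch in Python with made-up figures (not the app's actual bookkeeping): an average that counts the slow random-write phase and the pause/delete time will always read lower than the in-loop sequential estimate.
Code:
# Toy numbers only -- purely to show the effect, not measured values.
seq_gib, seq_speed = 9.0, 135.0    # sequential phase: GiB written, MiB/s
rand_gib, rand_speed = 1.0, 70.0   # random-write phase: GiB written, MiB/s
pause_s = 5.0                      # pause + file-delete time per loop, seconds

seq_s = seq_gib * 1024 / seq_speed
rand_s = rand_gib * 1024 / rand_speed
overall = (seq_gib + rand_gib) * 1024 / (seq_s + rand_s + pause_s)
print(f"in-loop estimate ~{seq_speed:.0f} MiB/s, overall average ~{overall:.0f} MiB/s")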
EDIT
I've also tried something a little different. About 4 hrs ago, I reflashed the 3.20 FW on the Mushkin. I then uninstalled the Mushkin from Device Manager. After a reboot, Windows wants to reinstall the driver, which it did, but then you have to reboot again before it becomes active. Maybe this will help in some way -- maybe. I'm really just grasping at straws here.
I originally intended to purchase another 2281 to run as a regular system drive. I decided against it because the problems keep getting worse. So I bought a used X25-M 80GB; no SandForce in that one.
m4:
The loop starts at 111 MiB/s and stays around 109 MiB/s for the whole loop, but once in a while the speed drops to around 80 MiB/s. The speed is better after Anvil adjusted the pause; in the earlier versions it dropped to around 50 MiB/s.
Kingston V+100:
The loop starts at 190 MiB/s but drops rapidly to around 120 MiB/s and then slowly continues down to 89 MiB/s.
Mushkin Chronos Deluxe 60 Update
05 0 (Retired Block Count)
B1 20 (Wear Range Delta, up from 19)
F1 84776 (Host Writes)
E9 65351 (NAND Writes)
E6 100 (Life Curve)
E7 67 (Life Left)
Average 122.25 MB/s.
215 Hours Work Time
Last 6 hrs on a 6Gbps port
12GiB Minimum Free Space
SSDlife expects 17 days to 0 MWI
Attachment 120682
EDIT
Notice that the Power Cycle count is now at two -- that looks like the only SMART value that was reset after the FW reflash.
I hate to hear about your SF issues, Chris, but seriously I would advise you to get a refund. From what I have seen, this will not get better, unfortunately.
C300 Update
445.58TiB host writes, 1 MWI, 7532 raw wear indicator, 2048/1 reallocations, 62.65MiB/sec, MD5 OK
SF-1200 nLTT
295.563TiB host writes, 220.094TiB NAND writes, 10 MWI, 3521.5 raw wear (equiv.), wear range delta 3, 55.6MiB/sec, MD5 OK
I'd really hate to just give up on it though - that's not really my style, and it would be amazing if I could get around these problems. This thing is incredible for a 60GB drive, but I'd seriously reconsider any decision to buy a SF drive at this point, except from OWC -- they have a 30-day no-questions-asked money-back policy. The vendor I bought the Mushkin from has an exchange-only policy -- but maybe I can sweet-talk them if it comes down to it.
Well, I think that a lot of manufacturers have been bending the rules a bit with the SF issue. From what I have read, many are allowing it.
If there was some kind of assurance that one or two out of every ten SF22xx drives were subject to the error, trying to get a new one wouldn't be a bad idea -- but if data like that exists, no one is talking. I have a few weeks left to make a decision like that, and I know there are many, many SF2281s out there happily working with 1155 boards without incident. There just isn't much real information about it since everyone with inside knowledge is bound by ironclad NDAs, but if it's some SF incompatibility with my hardware then drive swapping won't matter. I'm going to roll with it for a while longer before making any decisions, but if it keeps up like this I'll invoke the nuclear option.
EDIT
It finally crashed again, so I tried it in my AMD system. I couldn't even get the system to boot with the drive in the machine, even as a secondary. I don't know what the hell's going on, so I put it in my laptop.
I just read AnandTech's review on the Intel 710 HET MLC. On the second page I read this:
I looked at the SMART data for my Intel drives and it looks like it works. I don't know how to reset the attribute with smartmontools, but I'm working on it. This could be a useful tool for Anvil and One_Hertz.
Quote:
Thankfully we don't need to just take Intel's word, we can measure ourselves. For the past couple of years Intel has included a couple of counters in the SMART data of its SSDs. SMART attribute E2h gives you an accurate count of how much wear your current workload is putting on the drive's NAND. To measure all you need to do is reset the workload timer (E4h) and run your workload on the drive for at least 60 minutes. Afterwards, take the raw value in E2h, divide by 1024 and you get the percentage of wear your workload put on the drive's NAND. I used smartmontools to reset E4h before running a 60 minute loop of our SQL benchmarks on the drive, simulating about a day of our stats DB workload.
Once the workloads finished looping I measured 0.0145% wear on the drive for a day of our stats DB workload. That works out to be 5.3% of wear per year or around 18.9 years before the NAND is done for. I'd be able to find more storage in my pocket before the 710 died due to NAND wear running our stats DB.
For comparison I ran the same test on an Intel SSD 320 and ended up with a much shorter 4.6 year lifespan. Our stats DB does much more than just these two tasks however - chances are we'd see failure much sooner than 4.6 years on the 320. An even heavier workload would quickly favor the 710's MLC-HET NAND.
http://www.anandtech.com/show/4902/i...200gb-review/3
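The arithmetic in that quote is easy to sanity-check. A quick Python sketch (the raw E2h value and the length of the measurement window below are example inputs, not anything I've measured):
Code:
# Wear projection from Intel's timed-workload attributes, per the quote above.
def projected_years(raw_e2, workload_hours):
    wear_pct = raw_e2 / 1024                   # E2h raw value / 1024 = % NAND wear
    wear_per_year = wear_pct * (24 * 365) / workload_hours
    return 100 / wear_per_year                 # years until 100% wear at this rate

# Example: a raw E2h of 15 over a day-long window is ~0.0146% wear,
# which works out to roughly 19 years, in the same ballpark as AnandTech's 18.9.
print(projected_years(raw_e2=15, workload_hours=24))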
Nice find! Very interesting :)
I've never used smartmontools, but somehow you can reset certain SMART attributes with it. I'm trying to figure that part out now, or whether just disconnecting the drive resets the counter. I did some cross-checking and I think it's in the datasheet for one of the series of drives (it works for all of them).
^ Intel talked about this at IDF 2010. The web seminar can be found here. (How do you measure endurance)
Kingston SSDNow 40GB (X25-V)
358.91TB Host writes
Reallocated sectors : 9
MD5 OK
36.45MiB/s on avg (~12 hours)
--
I've moved the Corsair to an ASUS M4E-Z68 and it can't keep up the same pace as the ASRock MB.
It'll have to do for this weekend though; the Marvell controller on the ASRock was terrible and resulted in an avg of <70MiB/s.
I did manage to get it stable by downloading a new driver and, just as important, changing the SATA cable.
(the one I started off with was bad; I'd never used it before)
Corsair Force 3 120GB
01 90/50 (Raw read error rate)
05 2 (Retired Block count)
B1 47 (Wear range delta)
E6 100 (Life curve status)
E7 80 (SSD Life left)
E9 85962 (Raw writes)
F1 114552 (Host writes)
94.81MiB/s on avg (~11 hours), so, down more than 10MiB/s on avg.
power on hours : 328
The Force 3 disconnected 30 minutes ago (in the middle of a loop).
In 12 hours it wrote 4026 GiB, avg MiB/s 94.64.
So, no improvement on the other Z68 rig; will be moving it to an X58.
Well Christopher, at least we now firmly know that SF2 has not been fixed. 10% failure due to incompatibility alone sounds ridiculously high to me. Lesson to learn: avoid SF drives until a major vendor jumps onto it. Avoid for now unless you like to RMA all day long, back and forth (one guy on the OCZ forums RMAed his drive 3 times and it still failed, LOL).
The Force 3 is now up and running on my super-rig :)
(X58, 3Gb/s ICH10R, iaStor 9.6.0.1014)
Speed looks to be on par or better than the M4E-Z68 rig, but let's give it a few loops...
Avoid is too harsh a word, considering that for each drive with issues there are nine drives that are working. Better advice would be: evaluate your options carefully, and if you cannot afford the risk, consider SSDs with other controllers. I know of SandForce-based SSDs deployed in desktops that are used as servers (around-the-clock usage), and in 3 months there was only one restart, the root cause being an unstable overclock, during which one drive was temporarily not recognized.
Update with a milestone
m4:
574.7837 TiB
2117 hours
Avg speed 88.77 MiB/s.
AD gone from 27 to 22.
P/E 10044.
MD5 OK.
Still no reallocated sectors
Attachment 120702, Attachment 120703
Kingston V+100
92.3074 TiB
367 hours
Avg speed 76.74 MiB/s.
AD gone from 114 to 105.
P/E ?.
MD5 OK.
Attachment 120700, Attachment 120701
My Mushkin has run without incident on my C2D laptop (I think it's the ICH8M). Speeds drop from 123 MB/s avg on my 1155 system to 94 MB/s avg. I'm using the RST drivers as well.
The SATA II ports on the 1155 rig were good for just as much speed as the SATA III ports, so maybe the much slower laptop helps stability. I couldn't even get my AMD system to boot with the drive attached to a controller... which is strange (it's the SB710).
I certainly don't think it's every SF2281 drive, but I think some hardware just doesn't work with the drives due to defects, out-of-spec hardware, or one of twenty other reasons. Until and unless this drive doesn't work with any hardware I have, I won't be giving up on it. So if it's not stable in this laptop, then I'll get a PCIe Marvell 6G controller. If that doesn't work, then I start thinking about more extreme options.
I heard that if you put a SATA bus analyzer in the chain to try to analyze SF stability issues, you end up stopping the instability you're trying to detect. Maybe I can find one cheap on eBay :rolleyes:
EDIT
Forget that... they're seriously expensive... who would have thought?
PS
Here is the link to the Intel 320 PDF for enterprise use if you want to see what Intel has to say on the matter of workload timers.
http://www.intel.com/content/dam/doc...n-addendum.pdf
Well, the overclocked server + SandForce SSD was my idea, but I did not implement the solution myself, as it is over 1000 km away. I could only "crash" the server remotely, nothing more :). Either way, the guys are very happy with the choice. They have not complained about problems and they noticed that swapping takes place very fast. Having an i7 2600K @ 4.8GHz, 16GB RAM and a Vertex 3 240GB hosting VMs (in-house usage) for half the price they would have paid for a real server with slow HDDs is a good deal for them.
I'm not sure how much stock I would put in the analyzer causing the issue to stop. If they can't even explain how it is happening, or make it reproducible, how would they be sure the analyzer is stopping it?
So much FUD coming out around this issue! Hard to sift out the truth.
If the problem can't be located via a bus analyzer, then it will be impossible to fix.
I was joking about that ;)
Seriously, that's what SandForce said to AnandTech a while back. I don't believe anyone who claims to know what's going on with any sort of specificity. Every other controller seems to work okay (with maybe the Kingston V+100s being some kind of exception) with every 1155 board, X58 and AMD southbridge. I'm going out on a limb here, but I think maybe the problem is with SandForce :rolleyes:
Here's some more on the Intel workload timers
From the smartctl man page:
Quote:
Example for Intel X18-M/X25-M G2 SSDs only: The subcommand 0x40 ('-t vendor,0x40') clears the timed workload related SMART attributes (226, 227, 228).
http://smartmontools.sourceforge.net...martctl.8.html
Kingston SSDNow 40GB (X25-V)
360.19TB Host writes
Reallocated sectors : 10 (up 1 again)
MD5 OK
34.42MiB/s on avg (~24 hours)
--
Corsair Force 3 120GB
01 90/50 (Raw read error rate)
05 2 (Retired Block count)
B1 47 (Wear range delta)
E6 100 (Life curve status)
E7 79 (SSD Life left)
E9 88562 (Raw writes)
F1 118022 (Host writes)
94.70MiB/s on avg (~9 hours)
power on hours : 339
Let's see how long it lasts on the X58.
--
I'm happy with my SF based drives, including the SF2 based ones. (and I've got a few :))
Mushkin Chronos Deluxe 60 Update
05 2 (Retired Block Count) <-- this has increased from 0
Also Program Fail and Erase Fail (B5 and B6) increased to 1, likely increasing retired block count.
B1 25 (Wear Range Delta, up from 20)
F1 92732 (Host Writes)
E9 71505 (NAND Writes)
E6 100 (Life Curve)
E7 64 (Life Left)
Average 93.23 MB/s
239 Hours Work Time (9 days, 23 hours)
Last 19.7 hrs on a 3Gbps port (C2D laptop)
12GiB Minimum Free Space
SSDlife expects 16 days to 0 MWI
Attachment 120706
It looks like it will run forever in the laptop, but it's much (soul-crushingly) slower than on the P67's 3Gbps ports.
We'll see, but if (when) it crashes I'm trying something else.
EDIT: I wasn't even paying attention. At some point I had two reallocations with one program fail and one erase fail. I'll keep an eye on it.
** Apparently, I'm not S.M.A.R.T. enough to reset the workload timer on my Intels with Smartctl... **
Update
m4:
581.5452 TiB
2138 hours
Avg speed 88.74 MiB/s.
AD gone from 22 to 18.
P/E 10156.
MD5 OK.
Still no reallocated sectors
Attachment 120728, Attachment 120729
Kingston V+100
98.1974 TiB
389 hours
Avg speed 76.91 MiB/s.
AD gone from 105 to 98.
P/E ?.
MD5 OK.
Attachment 120727, Attachment 120726
Anybody want to bet the m4 goes all the way to 1PB?
Did you try this command to reset the values?
smartctl -t vendor,0x40 /dev/pd0
(pd0 is the drive identifier and is user-variable; in my case pd0 = the C: drive on SATA port 0)
If the command works you should see this:
C:\Program Files (x86)\smartmontools\bin>smartctl -t vendor,0x40 /dev/pd0
smartctl 5.41 2011-06-09 r3365 [i686-w64-mingw32-win7(64)-sp1] (sf-win32-5.41-1)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "SMART EXECUTE OFF-LINE IMMEDIATE subcommand 0x40".
Drive command "SMART EXECUTE OFF-LINE IMMEDIATE subcommand 0x40" successful.
After an hour the values should change. I just ran it to make sure it worked. I only ran it for 80 mins and the amount of writes was so low it did not even register. (This is on an X25-M, btw)
(E2) 226 – 0 (0/1024 = 0.000%).
(E3) 227 – 79 (79% Reads)
(E4) 228 – 80 (80 minutes)
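If anyone wants to script the whole check instead of reading the values by hand, something along these lines should work with smartmontools installed. It's only a rough sketch: the parsing is deliberately naive, the device name is the same /dev/pd0 as above, and it assumes the documented meanings of attributes 226 (E2) and 228 (E4).
Code:
import re
import subprocess

DEVICE = "/dev/pd0"   # same device naming as in the smartctl example above

def read_attrs(device):
    """Return {attribute_id: raw_value} parsed from 'smartctl -A'."""
    out = subprocess.run(["smartctl", "-A", device],
                         capture_output=True, text=True).stdout
    attrs = {}
    for line in out.splitlines():
        m = re.match(r"\s*(\d+)\s+\S+.*?(\d+)\s*$", line)
        if m:
            attrs[int(m.group(1))] = int(m.group(2))
    return attrs

attrs = read_attrs(DEVICE)
wear_pct = attrs.get(226, 0) / 1024   # E2: timed workload media wear, % = raw / 1024
minutes = attrs.get(228, 0)           # E4: timed workload timer, in minutes
if minutes >= 60:                     # Intel: values are not valid before 60 minutes
    per_year = wear_pct * (60 * 24 * 365) / minutes
    print(f"{wear_pct:.4f}% wear in {minutes} min, roughly {per_year:.2f}% per year")
else:
    print("Workload timer has been running for less than an hour")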
Kingston SSDNow 40GB (X25-V)
362.43TB Host writes
Reallocated sectors : 10
MD5 OK
33.47MiB/s on avg (~44 hours)
--
Corsair Force 3 120GB
01 86/50 (Raw read error rate)
05 2 (Retired Block count)
B1 49 (Wear range delta)
E6 100 (Life curve status)
E7 78 (SSD Life left)
E9 93488 (Raw writes)
F1 124591 (Host writes)
94.06MiB/s on avg (~29 hours).
power on hours : 359
Looking good, the next few hours will tell if there is a difference wrt SF2 based drives on SnB and X58.
I'll update the firmware on the Corsair when it either disconnects or it manages to stay online for more than 33-35 hours. (33-35hours was the maximum on SnB)
The new 1.3.2 fw was released a few days ago, not sure if it helps but it is worth trying.
Here is a screenshot using GSmartControl. If you hover the mouse over the SMART values, info boxes appear. For SSD Life Left it seems to confirm that a value of 10 is a milestone, but it seems it will go to zero once the reserved blocks get used up. (It's the same for SF1 & SF2 drives)
Attachment 120733
I'm not sure how to read the Retired Block Rate value, which it states is the estimated remaining life. In the Intel spec sheets they state that out of 4,096 blocks at least 3,936 must be available throughout the endurance life of the product. Maybe that is the value that determines MRB?
65536 is the factory default afaik, so it sounds like you have correctly set the values.
From the product manual that Christopher linked it states:
3. Run the workload to be evaluated for at least 60 minutes (otherwise the drive wear
attributes will not be available).
I only checked my values after an hour, and at that point they had changed from the default to the new values.
EDIT:
smartmontools needs to be version 5.41 or newer to be able to use the Intel reset command.
I've got 5.42, will check in an hour or so.
I got an error message, but when I looked all the timer values had been reset to 65536.
They've been there for the past 18 hrs or so... I swapped the motherboard in my SB rig to an H67. The H67 has hot-swapping and Advanced LPM on/off options in the BIOS, and I moved the Mushkin to it, with an X25-M as the system drive. The Mushkin made it overnight, but I'm not expecting any miracles -- the Dell D630 notebook is just too slow, and whether it's on the SnB SATA II or SATA III ports, it's just a massive difference from the laptop's SATA II.
The X25's timer didn't seem to reset on power-off (and then I cloned an image to it, then tried resetting the timer). My desktop only writes about 2.0 GB a day to the system drive... probably not enough to increase the timer on an 80GB, but then I kept getting error messages, so I was surprised it was reset at all.
No reads yet, will check in 30minutes or so, an MD5 check is due in 4 loops.
Attachment 120741
It's already at 105 in 90 minutes
105 / 1024 = 0.1025%
Not sure if one should take the minutes into account.
In section 3 of the product specs Intel provide some example use cases.
EDIT: I made a calculator with Excel. Just change the figures in red to the correct SMART values and everything else will update automatically.
Attachment 120747
Attachment 120748
EDIT (changed TiB to GiB)
Anvil,
Do your Z68 motherboards have hot-plugging and Advanced Link Power Management as UEFI options?
EDIT:
I looked at a couple of Z68 ASRock manuals and didn't see those items mentioned. I don't know if your X58 has them.
I don't know if they are presently helping me or not, but the Mushkin didn't crash overnight. Unfortunately, I have to take it down for a reboot later, but I'm not getting my hopes up that it will run indefinitely without dropping out.
On the ASUS I can enable/disable hot-plugging, can't remember on the ASRock.
edit:
Will check on next boot.
Are you saying that disabling Hot-plugging helped?
(mine has been Enabled on the Asus and if it is an option it has definitely been enabled on the ASRock as well)
I'm pretty sure there are no such options on the X58. (Gigabyte X58A-UD7)
The Force 3 has been running for almost 32hours, I'll let it run through the night.
I don't know if it has helped yet, just that I've made it further than I did with the other mobo since switching the Mushkin to a secondary.
I looked at the ASRock Extreme 4 Z68 manual and didn't see it. Like my Intel DP67BG, hot-plugging is enabled by default and there is no Aggressive LPM.
At this point all I can say is it increases the amount of time between disconnects (for me). Unfortunately I have to take the system down for a few minutes later as I'd like to see how far it gets uninterrupted.
Right now it's only been running non-stop for 18 hrs. It's actually slightly faster on the H67 as well, 2 MB/s faster on average for the last 18 hrs.
I'll let it run through the night; there are virtually no reads, just the MD5, and E3 is at 0.
@Christopher
I checked on the ASRock and there are no options on LPM or Hot plugging.
On the ASUS Z68 one can Enable/Disable SMART and Hot Plug. (SMART is a global option, Hot Plug is per drive)
Mushkin Chronos Deluxe 60 Update
05 2 (Retired Block Count)
B1 21 (Wear Range Delta, down from 25)
F1 102285 (Host Writes)
E9 78711 (NAND Writes)
E6 100 (Life Curve)
E7 60 (Life Left)
Average 124.95 MB/s
261 Hours Work Time (10 days, 21 hours; 22 hrs since the last update)
Last 18.85 hrs on a 6Gbps port (Biostar TH67+)
11GiB Minimum Free Space; 11,500 files per loop, 12.9 loops per hour
SSDlife expects 15 days to 0 MWI
Attachment 120749
EDIT:
The Avg MB/s just keeps creeping up. After 20 hrs it's gone from 124 to 125.02 MB/s. I was kind of under the impression that one Cougar Point chipset was the same as another (and that boards, apart from the H67/P67 video and OC differences, should be very similar -- but they aren't), but it just doesn't seem true given both my understanding of the SF2281 issues and my own observations. This uATX Biostar even boots absurdly fast. If you blink, you'll miss it. The Intel DP67BG ATX is actually far more energy efficient though.
This is it: the Force 40-A disappeared sometime today. The SMART data is unobtainable. The drive is not seen by the machine, SSDLife, CrystalDiskInfo, etc.
These are the final stats from Anvil's app:
156.52 hours (since changing to 46%)
240.8065 TiB written
57.86 MB/s
MD5 ok 193/0
Weird, a total failure?
Maybe it'll show up in another machine one day and the firmware can be reflashed / secure erased or something?
How many reallocated sectors did it have?
Maybe it's 'panic locked'? Does that happen to 1200s?
So many questions...
The last time I had access to the SMART data, it was still at 0 reallocated sectors. I can try to see if the drive comes back; especially, see if it shows up in Parted Magic and maybe do a secure erase.
I mean, based on my recent experiences, you may want to try something as simple as unplugging the drive for a few minutes. I don't know if you have physical access to it or not (you may be remote accessing the testing system - I don't recall if you've mentioned it).
Removing power from the drive for a while and then powering it back on -- a manual power cycle -- could help, as simple a solution as it is.
The drive has been working flawlessly for quite a while, so I'd be surprised if it was really dead.
Maybe it's just gone on strike due to the harsh working conditions :shrug:
I think you will end up hitting it against the wall in more extreme cases.
LOL made my day !
Seems like the SSD obeys quantum mechanics: if it is being watched, it reacts differently. I tell you, this whole SF problem is a complete joke. Two years and still not fixed is pretty laughable.
Surely this is just the effect of SF controller panic mode.
Kingston SSDNow 40GB (X25-V)
364.29TB Host writes
Reallocated sectors : 10
MD5 OK
34.75MiB/s on avg (~12 hours)
--
Corsair Force 3 120GB
01 92/50 (Raw read error rate)
05 2 (Retired Block count)
B1 48 (Wear range delta)
E6 100 (Life curve status)
E7 76 (SSD Life left)
E9 97381 (Raw writes)
F1 129780 (Host writes)
93.96MiB/s on avg (~45 hours).
power on hours : 375
SF provides a number of features that might not always be utilised or configured in the same way (e.g. the temp sensor). LED signals might be combined on one LED to show activity and a fault, or they might be on separate LEDs. LED Fault and Activity may be configured as:
50% duty cycle blink, half-second period (ie HIGH 250msec, LOW 250msec, repeat)
50% duty cycle blink, one-second period (ie HIGH 500msec, LOW 500msec, repeat)
50% duty cycle blink, two-second period (ie HIGH 1 sec, LOW 1 sec, repeat)
50% duty cycle blink, three-second period (ie HIGH 2 sec, LOW 2 sec, repeat)
Fault: 100% ON
Activity: 100% ON (valid option only if “ACTIVE” means “PHY Ready”)
If the drive has entered a panic condition (or other fault mechanism) the LED activity should behave in one of the ways described above.
A panic condition is related to a serious firmware event. Up to 4 “slots” are available for panic events to be logged. Once all 4 slots are full, no further panic conditions can be recorded. Personally I would ditch an SSD that entered a panic state.
When the 4 "slots" are all used, how are they cleared?
User fixable or?
It requires specialist hardware.
Your Corsair is dead?
M225->Vertex Turbo 64GB Update:
414.13 TiB (455.34 TB) total
1136.52 hours
8647 Raw Wear
117.97 MB/s avg for the last 64.32 hours (on W7 x64)
MD5 OK
C4-Erase Failure Block Count (Realloc Sectors) at 6.
(Bank 6/Block 2406; Bank 3/Block 3925; Bank 0/Block 1766; Bank 0/Block 829; Bank 4/Block 3191; Bank 7/Block 937)
Attachment 120760
The Corsair is still doing fine on the X58.
It is noticeably slower in 3Gb/s mode but it looks to be stable.
50 hours without issues suggests that there is something up with the drive in 6Gb/s mode, or that the drive is having issues with the SnB chipset.
I'll leave it running a few more hours and then I'll update fw + move it back to the SnB rig, if it fails I'll try disabling Hot Plugging.
Attachment 120761
Attachment 120762
The drive just disconnected, so it took almost 51 hours on the X58.
Not sure what triggered it; there's a lot of activity on this rig, but it still shouldn't disconnect.
I'm about to upgrade the fw.
1PB would be around 15890 P/E cycles at the current rate.
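For what it's worth, that projection seems to fall straight out of the latest m4 update above, assuming 1 PB is taken as 1000 decimal TB (a quick sketch, my own arithmetic):
Code:
# Rough check of the 1 PB projection using the latest m4 figures above.
tib_written = 581.5452          # host writes, TiB
pe_cycles = 10156               # average P/E cycles at that point

tb_written = tib_written * 2**40 / 1e12     # TiB -> decimal TB, ~639 TB
pe_per_tb = pe_cycles / tb_written          # ~15.9 P/E cycles per TB written
print(round(pe_per_tb * 1000))              # ~15,880 P/E at 1 PB, close to 15890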
Since I had to take my desktop with the H67 down for a few minutes last night, the Mushkin's been running non-stop. So all I can say is that it's more stable on the H67, not completely stable. It might only last 40 or so hours uninterrupted -- I can't really say yet.
In the first few days I had the drive as the boot device, it only crashed once every 30 hrs. After that I moved it to a secondary position, where disconnects were much more frequent -- every night (and every day). At least for the last two days, when I check the system in the morning the drive is still working... that in itself is pretty nice, and if I can keep up the pace I'll pass the Force 3 in writes/host writes in a few days. With all the disconnects overnight, then the testing in the much slower laptop, I was only getting about 8-9 TB a day.
What part of this H67 board is helping is impossible to say. It could be the disabled per-device hot-plugging, or the disabled Advanced Link Power Management (also in the UEFI). It could just be that the SATA ports themselves are operating in spec, or whatever. Who knows? It might not even be stable yet. I just don't understand what the hell is going on -- but I'll know more in a few days.
Seems to me this app is the most consistent/repeatable usage case for causing SF-2200 failure....
I believe so.
What I find most interesting is that most of the blame is placed on power-saving chipset functions, like the ability to partially shut down the drive. Why would a drive operating at 100% go into a power-saving mode? I would assume that happens when the system is idling, as is the drive -- not when the drive is interleaving writes across many devices, which is when it uses the most power. It really seems like the drive just "gives up" after a while under endurance loads. So if you want to make sure your SF2281 is actually stable, the best thing to do is endurance test it for months on end...
If power-saving chipset functions are to blame, then a simple solution would be to disable every power-saving option possible. This might explain why it is so hard to reproduce and why it does not affect all drives, as these settings are usually configured differently from one computer to another.
I just realized a funny thing... I have had my laptop on the max power profile from the moment I bought my SSD, as my CPU is undervolted and there is no significant gain in keeping it running at lower frequencies. That might be the reason why I have never experienced any issues.
Why don't you enable power savings :)
Using these exact drives, it certainly looks that way.
It would be tempting to try one of the other new-gen drives like the m4, but then B.A.T has had no similar experiences with it...
My drive has been running for almost 3 hours with the new fw (1.3.2) on the original SnB rig, let's see if the new fw makes a difference.
I formatted the drive as a test, next time I might SE the drive.
The fw upgrade reset the power on counter on my drive as well.
Maybe you just solved the SF issue: power-saving features!? SF should pay up a big stash :p:
I hope it works out for the Force 3, but I'm more of the opinion that the motherboard matters the most. There may be some edge cases where stability can be achieved some other way, but it seems like it works with some mainboards and doesn't work with others. A possible exception is if you have ALPM (very few boards have this) and hot-plugging disabled in the UEFI... I'd love to test this some more, but I'm going to see what happens as is. It's only been 40 hrs on this H67, and only 20 without a reboot. I think the mark to beat is 51 hrs stable, right? That's the longest run of consecutive hours either 2281 has been active.
EDIT
The Avg MB/s has crept up to an astounding 127.28 MB/s while the wear range delta has plunged to 16 from the mid 20s. That's a substantial increase in speed.
@bulanula
Power savings (and whatnot) have been mentioned frequently on the OCZ forums; I've not tried disabling them myself as it hasn't been an issue, until now.
@christopher
51 hours on the X58/ICH10R, I'll have to do a check on the AMD rig though, could be that the first runs were longer...
Like Anvil says, the m4 has been performing without a hitch. The same can't be said of the Kingston V+100, but it has behaved nicely for a week now. On the other hand, I'm using an AMD chipset, but I don't know if that has anything to do with it.
Today's update
m4
590.0843 TiB
2166 hours
Avg speed 88.75 MiB/s.
AD gone from 18 to 13.
P/E 10300.
MD5 OK.
Still no reallocated sectors
Attachment 120772, Attachment 120773
Kingston V+100
105.6310 TiB
417 hours
Avg speed 77.03 MiB/s.
AD gone from 98 to 93.
P/E ?.
MD5 OK.
Attachment 120770, Attachment 120771
I just noticed B.A.T's SSDLife warning says "too many bad cells and projected lifetime is less than one month".
I'm certain that both are false :yepp:
Lets hope so :)
SSDLife is just reading value 202
I know... just really not applicable.
I'm really, really impressed with its performance.
EDIT
I emailed someone at OCZ to ask what the deal was with my brand-new Vertex Turbo 120. I asked if it was possible for a drive manufactured in the last few months to have 50nm Samsung flash (if that was in fact what it was using). I posted the identity tool result of 081102. I said I had bought the drive a little over two weeks ago, new, from an e-tailer. It had the pre-May plastic chassis (it doesn't fit in many laptops as the case dimensions are too large).
They sent me the datasheet for some Samsung NAND and replied, "hope this helps".
I'm not sure it does... on the bright side, it's nice that I can just ask random questions to people at OCZ.
Mushkin Chronos Deluxe 60 Update
05 2 (Retired Block Count)
B1 16 (Wear Range Delta, down from 21)
F1 113531 (Host Writes)
E9 87533 (NAND Writes)
E6 100 (Life Curve)
E7 55 (Life Left, down from 60)
Average 127.02 MB/s (up from 124.95)
261 Hours Work Time (10 days, 21 hours; 22 hrs since the last update)
Last 23.64 hrs on a 6Gbps port (Biostar TH67+)
11GiB Minimum Free Space; 11,500 files per loop, 12.9 loops per hour
SSDlife expects 14 days to 0 MWI
Attachment 120781
Hi Johnw, have you tried reading from the Samsung recently? I'm not 100% sure, but I believe current SSDs use dynamic wear levelling rather than static wear levelling. Dynamic wear levelling excludes static data, which means that the NAND holding static data on your SSD would have no wear. It would be interesting to run the endurance test on another Samsung without static data to see how much longer it would last.
After about 30 hours of consecutive running, the Mushkin disconnected -- but I actually think it was something I did. I ejected a USB HDD enclosure a few hours before, and then disconnected it. I left for a few hours and when I came back to the system CrystalDiskInfo wasn't running anymore. I tried relaunching it several times. After a few seconds it popped up while the Mushkin disconnected. When I had ejected the USB drive earlier, it didn't fully dismount which CDI doesn't like. Somehow in that condition, trying to relaunch CDI may have done it. What's weird is hotplugging is disabled in the UEFI, but RST still acted like hotplugging was enabled. So I removed the RST drivers, reverted to MSAHCI, and registry hacked the two SATA III ports to internal only. The disconnect could have been nothing more than coincidence, but I'm not so certain. Of course, the two other SSDs didn't go anywhere -- only the SF drive.
Kingston SSDNow 40GB (X25-V)
367.12TB Host writes
Reallocated sectors : 10
MD5 OK
35.60MiB/s on avg (~16 hours)
--
Corsair Force 3 120GB
01 90/50 (Raw read error rate)
05 2 (Retired Block count)
B1 51 (Wear range delta)
E6 100 (Life curve status)
E7 75 (SSD Life left)
E9 103714 (Raw writes)
F1 138188 (Host writes)
107.07MiB/s on avg (~17 hours).
power on hours : 400
Both are running off the ASRock SnB Z68 rig.
It looks like I had forgotten to disable Security Essentials scanning on the X58, so, it could have had higher throughput.
(will test later)
@christopher
Most of the disconnects have happened in conjunction with me checking the status or some other heavy I/O (when using the computer)
It has happened more than once, so I'm not sure what it means. All power savings are on on my systems; I'd be willing to try disabling those features if necessary.
Anvil,
I'm starting to think the drives either like your motherboard or they hate it. End of story.
One consideration is the fact that I used the OS installation from the DP67BG mainboard. Windows 7 doesn't protest much (Office needs reactivation though), but I'm just running out of ideas. I'm running it with MSAHCI on this H67 for a while. I don't think the disconnect I had was a coincidence, but I'm just going to try the H67 + MSAHCI combo for a little while. It didn't work very well with the other board, and MSAHCI is 6 MB/s slower on average as well. I'm pretty sure I caused it, but the drive is just really "sensitive". I'd had another similar circumstance with the other board too. I'll take a crash every thirty hours over every eighteen, but I won't be happy about it.
M225->Vertex Turbo 64GB Update:
420.32 TiB (462.14 TB) total
1149.38 hours
8760 Raw Wear
117.04 MB/s avg for the last 15.32 hours (on W7 x64)
MD5 OK
C4-Erase Failure Block Count (Realloc Sectors) at 6.
(Bank 6/Block 2406; Bank 3/Block 3925; Bank 0/Block 1766; Bank 0/Block 829; Bank 4/Block 3191; Bank 7/Block 937)
Attachment 120795
Yes, I tried and failed, then had a small catastrophe.
The Samsung met its "write death" on Aug 20, so I figured I would check it again for readability on September 20. I plugged the drive in and powered up my computer, but neither Windows nor the BIOS could see the Samsung SSD. I tried rebooting a couple times to no avail.
Then I powered down and disconnected the SSD, and brought it to another computer with an eSATA port and an external SATA power connector. Disaster! When I went to plug in the power to the SSD, my hand slipped and bent the connector sideways, snapping off the plastic ridge of the SATA power connector. The metal pieces are still there (and still soldered to the PCB), but they are not stabilized by the plastic ridge. I found that it is still possible to get the SSD to power up by working a SATA power connector onto the metal pieces at the right position (they have a warp/bend to them that actually helps a little), but I am not certain that they are all making contact. But it is enough that when I "hot plug" the SSD, Intel RST notices something, although it never manages to mount the SSD.
I've been contemplating trying to repair the connector, but I have not yet come up with a good plan. Possibly I can super-glue the plastic ridge back on, but it is going to be difficult to get it lined up properly I think (it is in two pieces). I'm also thinking about trying to solder another SATA power connector on (if I can salvage one from a dead HDD), but there is a lot of solder there and if I get it hot enough to desolder, I am worried I might disturb some of the other components on the SSD PCB. So I haven't done anything yet.
Actually, if anyone reading this is experienced at this sort of thing, and would like to contribute to this thread, I'd be happy to send the SSD to you for repair and then you could keep it (if you are willing to try the read-only tests yourself), or send it back, whichever works best for you.
Johnw - I have a pretty high-end forensics/data recovery lab over here :). A SATA connector is very easy for me to repair. Furthermore, I actually have the ability to take the NAND chips right off the Samsung and try to read them directly with a specialized device to see how bad it is :). I will be doing that to my Intel when it finally dies.
I am in Canada though.
I'm not convinced about that; it's more like some SSDs are OK and some are not (could be a combo, of course).
My 240GB SF-2281s have never caused BSODs, just the 60GB Agility and the 120GB Force 3.
(not sure about the Force GT 120GB, it might have had issues)
None of the 240GB drives have been used in Endurance testing though.
---
Kingston SSDNow 40GB (X25-V)
368.10TB Host writes
Reallocated sectors : 10
MD5 OK
34.43MiB/s on avg (~25 hours)
--
Corsair Force 3 120GB
01 92/50 (Raw read error rate)
05 2 (Retired Block count)
B1 52 (Wear range delta)
E6 100 (Life curve status)
E7 74 (SSD Life left)
E9 106187 (Raw writes)
F1 141480 (Host writes)
107.02MiB/s on avg (~25 hours).
power on hours : 409
Although the Samsung performed admirably, I can't help thinking that it should have flagged a warning (via SMART) once a critical endurance threshold had been reached, and then switched the drive to read-only after a warning period. At least it would then have failed gracefully.
According to JEDEC218A, “The SSD manufacturer shall establish an endurance rating for an SSD that represents the maximum number of terabytes that may be written by a host to the SSD.” It then outlines integrity conditions that the SSD must retain after the maximum amount of data has been written:
1) The SSD maintains its capacity
2) The SSD maintains the required UBER for its application class
3) The SSD meets the required functional failure requirement (FFR) for its application class
4) The SSD retains data with power off for the required time for its application class
The requirement for retention of data in a powered-off condition is specified as 1 year for Client applications and 3 months for Enterprise (subject to temperature boundaries).
I’m really not sure why the MWI appears to be so conservative. Does it really represent a point in time when the endurance threshold to maintain integrity (according to JEDEC specs) has passed? The Samsung wrote over 3 ½ times the data required to expire the MWI. Are you really supposed to throw it away when the MWI expires?
It will be really interesting to see what One_Hertz can uncover on the condition of the NAND.
Anyway I came across an interesting paper from SMART Modular Technologies. This is the second time I’ve seen compressibility referred to as data randomness. Anyone know the issues related to why randomness of data is linked to compressibility?
All compression relies on finding some sort of pattern in the data, usually various kinds of repetition. Random data, by definition, has no pattern. Therefore, truly random data cannot be compressed.
Also, data that has already been highly compressed will no longer have patterns that can be exploited for further compression. That is what they mean by high entropy.
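You can see the effect for yourself in a couple of lines (a throwaway sketch, nothing SandForce-specific):
Code:
import os
import zlib

repetitive = b"endurance " * 10_000        # low entropy: the same pattern over and over
random_data = os.urandom(len(repetitive))  # high entropy: no pattern to exploit

for name, data in (("repetitive", repetitive), ("random", random_data)):
    ratio = len(zlib.compress(data, 9)) / len(data)
    print(f"{name}: compresses to {ratio:.1%} of the original size")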
Today's update
m4
597.4063 TiB
2190 hours
Avg speed 88.82 MiB/s.
AD gone from 13 to 09.
P/E 10423.
MD5 OK.
Still no reallocated sectors
Attachment 120820, Attachment 120821
Kingston V+100
111.9978 TiB
441 hours
Avg speed 77.11 MiB/s.
AD gone from 93 to 83.
P/E ?.
Attachment 120818, Attachment 120819
Thanks for that, John. So it looks like the XceedIOPS SSDs employ compression techniques. I wonder how the SMT compression compares to SF on a like-for-like basis.
1. Mixed random write workload—Medium degree of compressibility, write operations aligned on 4K boundaries, random starting LBAs. With a 28% over‐provisioned XCeedIOPS SSD, the average Write Amplification is approximately 1.0.
2. Database write workload—Highly compressible data, write operations aligned on 4K boundaries, random starting LBAs. With a 28% over‐provisioned XCeedIOPS SSD, the average Write Amplification is approximately 0.75.
3. Video Server workload—Minimal compressible data, write operations aligned on 4K boundaries, random starting LBAs. Represents a generic worst‐case write workload. With a 28% over‐provisioned XCeedIOPS SSD, the average Write Amplification is approximately 4.0.
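As an aside, the same metric falls out of the SMART numbers being posted in this thread: for the SandForce drives, dividing NAND writes (E9) by host writes (F1) gives the observed write amplification, assuming both counters are reported in the same unit (they appear to be). A quick sketch using the most recent Force 3 figures above:
Code:
# Observed write amplification from the SF SMART counters reported above.
host_writes_f1 = 141480   # attribute F1 (host writes) from the latest Force 3 update
nand_writes_e9 = 106187   # attribute E9 (NAND/raw writes) from the same update

wa = nand_writes_e9 / host_writes_f1
print(f"write amplification ~{wa:.2f}")   # ~0.75 with this test's compressible data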
With regard to the Samsung, surely only enough blocks have to go bad that the drive can't maintain its own ECC or data-integrity scheme. Not every block, or even 25% of blocks, could be bad... right?
Anvil,
I have to think that a lot of SF2281 drives are just BSODs waiting to happen... I'm not sure how it is that some drives seem to have problems, but endurance testing seems to tease it out. I wouldn't be at all surprised to learn that SF just can't handle days on end of endurance loads, regardless of motherboard. However, I've seen marked improvement with the H67, if not complete rock-solid stability. With a normal desktop load, some motherboards and drives just don't work well together.
I really want another Mushkin Chronos Deluxe to play with, but I'm running out of systems to use in such a small space. I'd need a bigger apartment for another system. Might be worth it though.
The Force 3 disconnected 35 minutes ago and it was one of those times where it left the system completely frozen. (it's the 2nd time iirc)
(it froze in the middle of a loop)
So, it does not look like the new firmware has made any difference so far.
I'll probably secure erase it one of these days.
@christopher
Yeah, you better get a bigger apartment :p:
It's hard to tell; if the usage pattern is to restart every day, one might not get hit by the issue at all. Too many factors.
In my case most disconnects are somewhere between 24-35 hours.
I can safely say that the freeze is easily reproducible on several systems.
There are still a few more options to try in my case, like disabling power saving, will give it a try.
Anvil, try animal sacrifice first -- power saving options second...
:)
Someone mentioned Voodoo science on the OCZ forums, so, still a few more options left. :ROTF:
You have to cut the head off the chicken very carefully...
I've been reading the OCZ forums frequently of late, including the "voodoo magic" thread.
Goat sacrifices aren't recommended with 3.20/2.13 FW... don't forget to properly clean the motherboard's aura by burning sage first, but only on the vernal equinox.
LOL. We will live to see it fixed, or recalled.
hopefully.
OCZ is the poster child for spontaneous bluescreen interruptus, only because it's lonely at the top -- they must sell more 2281s than everyone else put together. There won't be a recall because no one knows why it happens. As long as SandForce can lay this at the feet of Intel, nothing is going to happen. The fact that other drives don't have the same problem is a further indictment of SandForce.
I still want another one....