I think that is correct, since the threshold is clearly labeled as 10 in the SMART attributes. With SMART attributes, a failure is defined as the normalized value reaching the threshold value.
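In code terms that rule is just a comparison; here is a minimal sketch (using the E7 = 10 / threshold = 10 case from the update below):
[CODE]
# Minimal sketch of the SMART pass/fail rule described above: an
# attribute counts as "failed" once its normalized value drops to
# (or below) its threshold.
def smart_attribute_failed(normalized_value: int, threshold: int) -> bool:
    return normalized_value <= threshold

# E7 (SSD Life Left) normalized to 10 against a threshold of 10:
print(smart_attribute_failed(10, 10))  # True
[/CODE]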
This will be my last update at 100%. I will be switching to 46% after this post.
585.77 hours
209.7130 TiB written
40.72 MB/s
MD5 ok
05: 0
B1: 119
E7: 10%
E9: 170816
EA/F1: 215808
F2: 384
Any news on testing an SLC drive?
I restarted the process on both SSDs an hour ago, for no specific reason; I just wanted to get ahead of the SF drive doing its disappearance act. (It had been running for about 33 hours without any issues.)
Kingston SSDNow 40GB (X25-V)
345.41TB Host writes
Reallocated sectors : 8
MD5 OK
35.43MiB/s on avg (~1 hour)
--
Corsair Force 3 120GB
01 94/50 (Raw read error rate)
05 2 (Retired Block count)
B1 35 (Wear range delta)
E6 100 (Life curve status)
E7 88 (SSD Life left)
E9 56554 (Raw writes)
F1 75376 (Host writes)
107.62MiB/s on avg (~1 hour)
Uptime 216 hours. (power on hours)
SSDLife estimates lifetime to be 1 month 17 days. (November 13th)
M225->Vertex Turbo 64GB Update:
371.99 TiB (411.69 TB) total :D Another one past the 400 TB mark.
1054.44 hours
7922 Raw Wear
118.41 MB/s avg for the last 64.09 hours (on W7 x64)
MD5 OK
C4-Erase Failure Block Count (Realloc Sectors) from 4 to 6.
(Bank 6/Block 2406; Bank 3/Block 3925; Bank 0/Block 1766; Bank 0/Block 829; Bank 4/Block 3191; Bank 7/Block 937)
Attachment 120533
The SF drive was disconnected when I got home, so it ran for only about 1 hour after restarting the app.
Looks like a proper disconnect is needed, not just restarting the app.
After the crash early last night, it looks like it crashed again, probably somewhere around 4am or 5am. The really annoying thing is, the system crashes and reboots, but the BIOS can't find the OS, so it just sits at a black screen with the error message until I get there to physically turn it off/on again. After a soft reboot it still can't find the OS, so I have to physically turn the system off.
I was running it with 5 GiB minimum free space, so maybe it's hanging on the delete section. I was kinda trying to get that to happen, but only while I was sitting in front of the system.
The last time I checked before the overnight crash it had crept up to a little over 124MBs. I'm still really liking this drive. OWC makes a 240GB version of this drive but it uses the 2282 SF controller, and is probably the fastest drive out there at the moment.
I stopped testing for a while, and while the drive was idling just after a reboot, I noticed that Raw Read Error Rate is probably just a timer. It increases at the same pace while idling as it does under endurance loads. If you open CrystalDiskInfo and then just continually refresh the info with F5 you'll see it tick up every second.
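If you'd rather not sit there hitting F5, a script can do the polling; this is a rough sketch assuming smartmontools is installed, and the device path is a placeholder:
[CODE]
# Poll SMART attribute 1 (Raw_Read_Error_Rate) once per second to see
# whether its raw value climbs while the drive idles.
import re
import subprocess
import time

DEVICE = "/dev/sda"  # placeholder device path

while True:
    out = subprocess.run(
        ["smartctl", "-A", DEVICE], capture_output=True, text=True
    ).stdout
    for line in out.splitlines():
        if re.match(r"\s*1\s+Raw_Read_Error_Rate", line):
            print(time.strftime("%H:%M:%S"), line.split()[-1])
    time.sleep(1)
[/CODE]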
m4 update:
Here is the last screenshot from before the rig restarted and I lost contact with it.
538.0406 TiB
1759 hours
Avg speed 91.11 MiB/s.
AD gone from 46 to 42.
P/E 9430.
MD5 OK.
Still no reallocated sectors
Attachment 120551Attachment 120552Attachment 120553
I have restarted ASU and will bring the next update tomorrow evening.
Kingston:
I'm going to try a SE and put the SSD on the eSATA controller to see if the problem goes away.
Mushkin Chronos Deluxe 60
05 0
Retired Block Count
B1 17
Wear Range Delta
F1 47754
Host Writes
E9 36770
NAND Writes
E6 100
Life Curve
E7 83
Life Left
Average 124.87MBs.
118 Hours Work Time
12GiB Minimum Free Space
SSDlife expects 21 days to 0 MWI
Attachment 120554
With that new firmware, the M4 is really the best thing going at the 64 and 128 capacities, but the fact that not a single sector has been reallocated is maybe more impressive. I hope the Mushkin keeps up (still holding steady at 0, but it's real early in the game...). But doesn't the Force 3 have the exact same NAND in it? It's already got two reallocations.
EDIT
I called Mushkin to try and order another drive and they told me they're completely sold out and that they won't have any more drives out there for another couple weeks. I guess it isn't just me out here singing their praises. Of course, Mushkin/Patriot/OWC/FutureStorageUK/EdgeTech are using the same American assembly facilities making the same drives, so it might be hard to fill all the demand.
My guess is it has to do with burn-in procedures. It is possible to run some tests on the flash after the circuit board of the SSD is assembled, identify marginal blocks, and mark them failed. I'm pretty sure Samsung does something like this on the 470 SSDs, since SMART attribute 178 started at 72 (instead of 99 or 100) and then held steady at 72 until near the end of the SSD's lifetime (when it began rapidly decreasing). I think this indicates that a number of flash blocks equal to about 28% of the reserve count were failed before the SSD was shipped, with the result that all the other flash blocks were tested to be very robust.
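To put a number on that inference (just the arithmetic from the paragraph above, nothing new):
[CODE]
# If attribute 178 (reserve block count, normalized) starts at 72
# rather than 99/100, the implied share of the reserve consumed by
# factory burn-in is:
initial_value = 72
factory_failed_fraction = (100 - initial_value) / 100
print(f"{factory_failed_fraction:.0%}")  # 28%
[/CODE]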
Judging from the behavior of the m4, I'd guess that Micron did something similar with a burn-in test.
But maybe some of the other SSD brands do not fail the marginal flash blocks during burn-in. So some other models may have reallocated "sectors" sooner than SSDs that do fail the marginal blocks during burn-in.
Maybe the Micron *AAA and AAB flash (sync vs async) are just graded and binned in such a way that all of the premium flash is just 50% more awesome. That, or SMART doesn't reflect some reallocations, or initial bad blocks get zeroed out. If it turns out that the M4 really just doesn't have any bad blocks after all these writes, then Crucial deserves some sort of official award -- I know my perception of 25nm flash has certainly changed based on its performance.
Christopher, have you been able to figure out why your system is having BSOD?
Does your drive have the latest firmware? Try to figure out what version number of SandForce's firmware equates to that version of Mushkin's firmware.
It's 3.20 FW, which should be the latest SandForce standard FW revision.
To be fair, I've not actually "seen" a BSOD. I just walk away for a while, come back, and the system has restarted... but not back to Windows, as the OS can't be found by the BIOS. I'm not sure whether it would actually cause a BSOD or the system would just spontaneously restart. I'm confident that it has nothing to do with anything but some kind of massive TRIM-induced lockup. Maybe I'll actually be sitting in front of the system the next time it happens, but I've upped the Min GiB free to 12, so we'll see if it makes a difference.
Another thing to consider is that sometimes the drive feels quite warm... so if it feels warm outside, it might be sizzling hot inside. Maybe the controller is overheating?
Based on what I was able to gather from the first time this happened, looking at the numbers, it seemed that it happened just at the end of the loop. Unfortunately, the ASU running totals don't seem to update during the loops, only after you quit ASU, so if it hangs you lose the totals back to the program launch. If the numbers updated during the loop it might help to pin down when it occurs.
Factory-shipped defects in NAND are perfectly normal and are not an issue as long as they are within certain parameters. With IMFT NAND I believe an invalid block is one that contains one or more bad bits. Out of 4,096 blocks at least 3,936 must be available throughout the endurance life of the product. [Speculation: maybe that is why the MWI appears so conservative.] Invalid blocks are identified following worst-case condition testing in the factory and are marked 00h, which enables a bad block table to be created so that the controller can avoid using them.
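Restating those datasheet figures as a quick calculation (same numbers as above):
[CODE]
# Factory bad-block budget from the quoted spec: 4,096 blocks, of
# which at least 3,936 must remain valid over the endurance life.
total_blocks = 4096
min_valid_blocks = 3936
max_bad_blocks = total_blocks - min_valid_blocks
print(max_bad_blocks, f"({max_bad_blocks / total_blocks:.1%} of the blocks)")
# 160 (3.9% of the blocks)
[/CODE]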
I understand that it's normal, which is why I'm wondering whether Intel and Micron (possibly others) either have 1) some process to zero out factory-identified bad blocks in the SMART data, OR 2) really good NAND processes that just generate 100% good blocks (or a process to effectively bin NAND). Only my Indilinx drives have bad blocks now, and that's because they came from the factory like that, so I have to wonder which scenario it is.
I'm pretty sure it is (1), as I explained previously (except I don't think Intel does it, at least not to the extent that Samsung and Micron do). Assuming you are talking about the SSD level, and not the bare flash level. As I said in my previous post, I think it is done after the SSD circuit board has been assembled.
I guess Ao1 didn't see my post.
It is a rather arbitrary choice whether to include factory failed/reallocated blocks in the initial SMART attribute values. Samsung apparently chose to include it. Perhaps Micron chose not to.
I don't think it is necessarily "awesome" compared to the other SSDs. It is likely they have similar quality of flash, but some have just chosen to fail the marginal blocks early, and others have chosen to let the marginal blocks be used until they fail, which they tend to do earlier than the other blocks. Either way, the longevity of the SSD is primarily determined by the longevity of the vast majority of the flash blocks, not by a small number of marginal blocks that may or may not be counted as reallocated sectors during the early part of the SSD's lifetime.
Micron does have it as a SMART value. "Factory Bad Block Count" is present in the C300 and m4. It's a seemingly static number.
Cool.
I was looking for some CDI posts for the Microns. I gave away my M4 to a family member, so I don't have one to look at anymore.
So how many factory bad blocks did the Microns have? I don't remember seeing the full smart info, but I'm going to look back and see if it was posted.
Just for the record, is it irrelevant that a drive hasn't reallocated any flash after 1/2 a PB?
I don't think so, but I guess it's a matter of opinion. I just take that to mean that whatever flash wasn't factory flagged bad is really consistent.
Yes, so do I. The problem comes in when you say that the m4 is awesome because it has no reallocated sectors, while some other SSD is not so good because it does have reallocated sectors. That is a conclusion drawn from insufficient information. Most likely, the m4 had all of the marginal blocks failed at the factory, while the other SSD may not have gone through as stringent a test for bad blocks at the factory (or any test at all). Without knowing that, you are comparing apples and oranges.
I wasn't saying that the M4 is better because it has no reallocations, just that it has no reallocations and Micron rates the NAND at 3000 P/E cycles (and I think that's pretty interesting, if not -- dare I say it -- awesome). You'd assume that some of the flash would die after a few thousand P/E cycles and some would die after a few hundred, but that on balance most flash would be pretty resilient. I don't think a drive is inferior because it has some reallocated blocks after many, many GB worth of writes. That would be silly - that's what it's supposed to do. I do think the M4 is impressive just because you would expect it to [have many reallocations], and it doesn't, which in this sample size of one doesn't mean much, but would still tend to indicate that either it's going to last a long time or it will start having a rash of reassignments all at once. The controller doesn't know when a block is getting close to failure, so it must be the flash, right?
EDIT
I went back to the beginning and found the initial M4 CDI screen cap. It looks like it had 58 factory bad blocks.
Sounds like the crashing of the drive is a typical SF crash. Your description of the restarts is about on par with everything that I have read on the ongoing SF issues.
I poked around the interwebs, but I mostly saw descriptions of hard freezes/BSODing and not spontaneous resets, where the drive shows in the BIOS, but the UEFI can't find the OS (until a full manual power cycle). It just won't happen while I'm sitting in front of it. I feel pretty confident that under my typical desktop and laptop usage, I'd not encounter the issue. I made a post about it in the Mushkin forums, but that's probably not the place to go. Unsurprisingly, no one else there is encountering issues while writing 10+ TB a day :rolleyes:
The Mushkin is my sole experience with Sandforce, so I did some looking around while I was waiting on it, and found that most users on the OCZ forums were getting freezes and then BSODs. I'm looking around now -- but I'm thinking I should probably use another drive as the OS drive like Anvil is. Anvil said a few posts back that he was having a similar problem, but his drive is secondary. I'm using the Mushkin as the system drive while pounding it, which is probably just asking for trouble.
My post was based on an Intel NAND spec sheet. The raw NAND is subject to onerous testing at the factory. If a bad bit is detected the block is marked as bad. When the NAND is assembled into a SSD the controller maps out the bad blocks. It is possible for a SSD vendor to remove the defect value that the factory sets if it detects a bad bit. In addition a read disturb event might flip the value over time, however the defect map should prevent the SSD from using it, so in theory this should not be an issue.
When the SSD is assembled the bad blocks need to be mapped out. Maybe additional NAND testing is carried out, but I suspect only the functionality of the SSD as an assembled unit is checked. The defect list will of course grow over time. The controller picks this up and adds any defects to the map.
I'll check out the Samsung/Toshiba spec sheets later, but I believe it will be the same procedure. The ONFI specifications set out some of the above.
I'm 100% sure that this is the SF bug that leads to a BSOD if the drive is a boot drive; if it's not the boot drive then it will not BSOD, though it does need to be power cycled.
The drive is powered off, or it goes into a state where it stops counting hours; I noticed that the other day when my drive disconnected. It should have counted 4-5 hours, but it stopped accumulating when it disconnected, about 1 hour after restarting the test.
It continues to accumulate writes, it's just that there are separate counters, one detailed and one with the summary.
The summary just needs to be updated; I can do that for you, or you can wait for an update where you can do it yourself.
It's not an issue for other drives, but it's very handy keeping the SF2-based drive as a secondary drive, until the issue is resolved that is.
Kingston SSDNow 40GB (X25-V)
348.03TB Host writes
Reallocated sectors : 8
MD5 OK
32.22MiB/s on avg (~25 hours)
--
Corsair Force 3 120GB
01 90/50 (Raw read error rate)
05 2 (Retired Block count)
B1 36 (Wear range delta)
E6 100 (Life curve status)
E7 86 (SSD Life left)
E9 61910 (Raw writes)
F1 82510 (Host writes)
104.73MiB/s on avg (~18 hours)
Uptime 237 hours. (power on hours)
SSDLife estimates lifetime to be 1 month 17 days. (November 14th)
Anvil,
Regarding the disconnect, it's basically just like the drive gets turned off -- vaguely like unplugging the power to the drive while it's operating. I'm going to get it set up today as a secondary.
I've been watching it like a hawk since the last crash, hoping to catch it in the act. I'm not sure what else to try aside from dumping the Intel RST drivers.
Anvil, is there any way to extrapolate more detailed time information from the SMART data? Like power-on minutes and seconds in addition to hours?
Typical SandForce controller behaviour. It just disappears out of the blue. That is why I chose to avoid SF controllers. Why did no major vendor take on SF? Only small rebranders like Corsair, OCZ, Mushkin, etc. Because it is too untested. At least that is my opinion.
Everybody is trying to make a quick buck with untested SandForce controllers. Only controllers like Intel's or Samsung's are really tested properly, by big companies. SF cannot afford that. With SF, the vendors (OCZ mostly) and the customers test the drives, so the profit is bigger. You are basically opting in to be guinea pigs with SF.
I expect he meant Corsair, not Crucial :)
Using it as a secondary drive is much less hassle; just make sure that you have quick/easy access to the drive for a physical disconnect, otherwise there is no option but to reboot.
Yes, there is a way to read minutes and seconds: the OCZ Toolbox or smartmontools, or maybe both.
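For the smartmontools route, something like this should surface whatever sub-hour detail the drive exposes (a sketch; the device path is a placeholder, and not every drive reports minutes or seconds):
[CODE]
# Dump the power-on time attribute via smartctl; on some drives the
# raw value is in minutes, or carries "Xh+Ym" style detail.
import subprocess

out = subprocess.run(
    ["smartctl", "-A", "/dev/sdb"],  # placeholder device
    capture_output=True, text=True,
).stdout
for line in out.splitlines():
    if "Power_On" in line:
        print(line.strip())
[/CODE]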
To be fair to SF, I have no idea why this is happening and could have more to do with endurance testing. I've made a few adjustments and hopefully this will stop the crash behavior.
I may pick up another 2281 to use as a normal drive to see what happens -- the 2281 may be uniquely unsuitable for endurance testing in this fashion.
I decided to try uninstalling the Intel RST drivers... Maybe something will happen (or not). If anything it seems to be running a little faster, but time will tell. At the Mushkin forums it was suggested that disabling TRIM might stop the behavior, but that's a huge performance penalty... The drive just can't keep up and I lose about 16MBs, so that's a no go.
M225->Vertex Turbo 64GB Update:
381.21 TiB (419.15 TB) total
1068.49 hours
8046 Raw Wear
117.69 MB/s avg for the last 16.69 hours (on W7 x64)
MD5 OK
C4-Erase Failure Block Count (Realloc Sectors) at 6.
(Bank 6/Block 2406; Bank 3/Block 3925; Bank 0/Block 1766; Bank 0/Block 829; Bank 4/Block 3191; Bank 7/Block 937)
Attachment 120569
Come on, this behaviour (SandForce sudden death syndrome, to be precise) has been present all the way from SF1 to SF2. It is not related to any endurance test, otherwise the other controllers being tested would have done the same. It is an SF-specific issue that they just cannot be bothered to fix. SF is the fastest out there ATM, but if you want reliability, go Intel. This kind of stuff happens with fast SF2 drives. And they even dared to say all was fixed in SF2. What a joke!
Don't get me wrong, I'm not hating on SF. After all, it really is the fastest controller out there and one of the most advanced (it easily gets a typical 0.6 WA through compression). The problem is with the vendors (like OCZ) wrecking SF's reputation and trying to shaft consumers to make a quick buck. Somebody big like Intel should buy out SF and drop OCZ out the gate. SF + big company resources like Intel's or Samsung's = pure win.
If uninstalling RST doesn't do it, maybe try it on the 3gbps port?
Disabling TRIM is an interesting idea, but for endurance testing, and with it also being your boot drive, IDK if it's so prudent; WA will go up and speeds will go down (and who knows how far down), and things could get really ugly for your system's seat-of-the-pants speed.
Yeah, after 1 loop speeds had dropped dramatically. I shudder to think how bad it could get.
EDIT: I tried it on a SATA II port for a while, and speeds only went down by 2 or 3 MBs, though I only ran it for a few hours on the second day.
Well, as fate would have it, I had a front-row seat to a BSOD not an hour ago... but with a completely different and repeatable cause.
I use my mouse and keyboard connected to the USB hub on my monitor. Last night I decided to turn the monitor off, so that if the system restarted and was left sitting at a BIOS prompt for hours on end, it wouldn't burn in my screen. So I turned the monitor back on, and while the system was recognizing the USB devices, I got a hang/freeze followed by a BSOD/spontaneous reboot. Unlike my previous reboots, the system booted from the drive without a full manual power cycle.
This sounds more like the SF BSODs I've heard about. I'm about to try to replicate it.
I agree in general, except with a slight modification of my own:
Quote: "SF + big company resources like Intel's or Samsung = pure win."
SF (without DuraWrite) + big company resources like Intel's or Samsung's = pure win.
I agree, I feel the Intel or the M4 will rule them all though :)
Quote: "I expect all the SSDs to outlast the Samsung 470."
Sorry for the double post.
Replication would be fantastic! That is the main thing with the SF bug: it usually isn't replicable. It happens spontaneously. SF has stated that if someone can get a replicable BSOD with their drive, they will pay handsomely for the entire computer it happens on.
Quote: "This sounds more like the SF BSODs I've heard about. I'm about to try to replicate it."
I have been thinking of late that it seems SF isn't really being totally forthcoming about the issue... all these separate teams working on it and no one can find the root cause? Get real!
I'm wondering: if it were an inherent issue with the controller, would SF be liable for all this mess (as in, a mass recall, huge payments to manufacturers)?
And if so... would they disclose it if they knew it was their fault, and expose themselves to liability lawsuits? :shrug:
m4:
546.0762 TiB
2024 hours
Avg speed 90.89 MiB/s.
AD gone from 42 to 38.
P/E 9565.
MD5 OK.
Still no reallocated sectors
Attachment 120572Attachment 120573
Kingston:
I'm going to use the next 2 hours to SE and change controller.
There it went again...
Hard lock this time, no blue screen.
EDIT:
On the bright side, I've figured out what ASU write errors 0 and 6 are....
Just as an aside, and I swear this is true... not making this up... when my system hard locked, it killed my 802.11N router through Ethernet...
I'm starting to wish I had the M4...
I got one whiff of the horsesh*t getting served up over at the OCZ SF2281 forums, and decided there's something rotten in Denmark. My experiences with OCZ's Indilinx forums were good, but the attitude really shifts once you look at the Vertex 3 owners getting miffed at the questionable support practices and general lack of information. Seriously, one moderator (who shall remain nameless because he seems to be a decent guy) made a series of statements that blew my mind... Imagine all of these brand new Vertex 3 owners discovering that the shiny new drive they just shelled out for will cause constant BSODs and prolonged irritation. Then the one support outlet, a forum moderator, says that he knows 5 scenarios/configurations that will cause BSODing (implying that it's easy to replicate), and that he has a solution for three of them... but he can't tell you. Other statements, like it's probably all Intel's fault so complain to them, that SF didn't know this was happening but OCZ was the first to show them what was wrong weeks ago, and that Corsair sucks and wasn't the first to educate SandForce... all sorts of stuff that comes off really badly to (rightfully) concerned customers. Maybe all these other companies would have similar customer perception problems if they had been SF's high-profile, preferred partner. The new FW releases do seem to really help, and Mushkin's forums don't seem to be rife with SF BSOD problems... so maybe it is fairly limited now. At this point, I don't know enough to say whether a normal usage scenario would result in the same problems. I do know enough to understand that it must really, really suck to be the one person getting all of the customer complaints, but that doesn't excuse some of what you'll see if you dig into the support forums.
I'm thinking there is something fundamentally flawed with the SF2281, and that after a few FW revisions it doesn't rear its ugly head as often, but they'll never be able to solve it completely without a new processor and new firmware. Not to say that SF is acting in bad faith, but it's clear there's something going on.
I had a hard lock as well, totally unexpected. (20 min ago)
Write error results are standard Windows error codes returned by WriteFile or CreateFile.
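For anyone who wants to look those codes up themselves, ctypes can translate them on Windows; error 6, for instance, is the standard "handle is invalid" code:
[CODE]
# Translate Win32 error codes to text (Windows-only) -- e.g. the
# 0 and 6 mentioned a few posts up.
import ctypes

for code in (0, 6):
    print(code, "->", ctypes.FormatError(code))
# 0 -> The operation completed successfully.
# 6 -> The handle is invalid.
[/CODE]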
Well, I am very happy I chose to avoid SF right now. There have been fundamental problems with SF drives since the first generation. Just look at Newegg reviews. Even if they are not reliable, the failure rate is much higher than for other SSDs. Too bad they have not ironed it out by SF2.
The OCZ forums used to be my daily entertainment until a couple of weeks back, when I realised that SF2 had not fixed these issues and the joke was getting old already. Just look at Newegg and other OCZ forum threads and you realise that SF has fundamental problems that are left unfixed. OCZ issues hundreds of FW updates but the issues still seem to be present. Credit has to be given to some of the forum staff, as they are doing all they can, but the management should be fired, methinks. OCZ (SandForce is the problem here mainly, as the Indilinx drives were quite decent after the inclusion of TRIM and the lowering of WA) SSD: free Easter egg inside - random crashes and lockups FTW!
Kingston V+100 update:
I've done a SE, copied back all the data, and set up ASU just as before the last dropout. I've also put the SSD on the eSATA controller (JMicron JMB36X) to make sure they don't interfere with each other. Here is an AS SSD test on that controller:
Attachment 120578
Here is the last update from the Kingston V+100:
Attachment 120579
Kingston V+100
70.8428 TiB
? hours
Avg speed 74.28 MiB/s.
AD gone from 129 to ?.
P/E ?.
The only problem with the JMicron controller is the lack of SMART values. The Kingston won't show up in SSDLife Pro or CDI.
I got it to crash twice in ten minutes, but can't get it to happen again.
Mushkin Chronos Deluxe 60 Update
05 0
Retired Block Count
B1 19
Wear Range Delta
F1 59206
Host Writes
E9 45616
NAND Writes
E6 100
Life Curve
E7 78
Life Left
Average 122.90MBs.
147 Hours Work Time
12GiB Minimum Free Space, 3 crashes today
SSDlife expects 19 days to 0 MWI
Attachment 120580
I'm going to make some changes to the configuration tonight. Unfortunately, it might not crash for days after this, or it could crash 3 minutes from now, but if I can get 72hrs of uninterrupted activity I feel like I can declare some small victory.
Updated charts :)
Host Writes So Far
Attachment 120583
Attachment 120584
Normalized Writes So Far
The SSDs are not all the same size; these charts normalize for available NAND capacity.
Attachment 120585
Attachment 120586
Write Days So Far
Not all SSDs write at the same speed; these charts factor out write speeds and look at endurance as a function of time.
Attachment 120587
Attachment 120588
Host Writes vs. NAND Writes and Write Amplification
Based on reported NAND writes (or NAND cycles calculated from the wear SMART values) divided by total host writes; a quick worked example follows the attachments below.
Attachment 120589
Attachment 120590
Attachment 120591
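As a concrete instance of that calculation, using the Force 3 numbers posted earlier in the thread (E9 = NAND writes, F1 = host writes; assuming both raw counters use the same unit):
[CODE]
# WA = NAND writes / host writes, from an earlier Force 3 update.
nand_writes = 56554  # E9 raw value
host_writes = 75376  # F1 raw value
print(f"WA = {nand_writes / host_writes:.2f}")  # ~0.75, below 1.0 thanks to compression
[/CODE]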
Vapor,
Thanks for updating the charts... it's good to see actual progress :clap:.
Are normalized writes (host writes / available NAND) based on the total amount of flash or the usable amount?
The two SF 2281 drives have about 12.8% spare area, while most others are ~7% (is the SF-1200 60GB ~7%?). I couldn't find it when I looked back, but I remember seeing the information somewhere.
UPDATE:
I set the Mushkin up as a secondary drive about an hour ago, and it just disconnected... this is irritating.
Normalized writes are based on user available NAND...so for the Sandforces, that's 40/60/120GB.
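So the normalization is simply host writes divided by user-visible capacity; a quick illustration (approximate figures pulled from the updates above):
[CODE]
# Normalized writes = host writes / user-visible capacity.
host_writes_gb = {"Force 3 120GB": 91286, "X25-V 40GB": 350910}  # approx
capacity_gb = {"Force 3 120GB": 120, "X25-V 40GB": 40}
for drive in host_writes_gb:
    ratio = host_writes_gb[drive] / capacity_gb[drive]
    print(f"{drive}: ~{ratio:.0f}x capacity written")
[/CODE]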
Kingston SSDNow 40GB (X25-V)
350.91TB Host writes
Reallocated sectors : 8
MD5 OK
36.61MiB/s on avg (~15 hours)
--
Corsair Force 3 120GB
01 92/50 (Raw read error rate)
05 2 (Retired Block count)
B1 39 (Wear range delta)
E6 100 (Life curve status)
E7 84 (SSD Life left)
E9 68494 (Raw writes)
F1 91286 (Host writes)
105.53MiB/s on avg (~15 hours)
Uptime 261 hours. (power on hours)
SSDLife estimates lifetime to be 1 month 16 days. (November 14th)
Let's see how long it lasts before another disconnect; the one last night was very strange, as it left the PC totally unresponsive.
It's still working :)
What is interesting is that the Wear Range Delta has changed from 39 to 38, the first time it has decreased.
Too bad the Intel X25-V is such a slow writer. We might have to wait a long time till it dies :)
It's not an issue at all IMHO; it's a low-capacity SSD, you don't do many sequential writes on one, and for random writes they do very well.
I've got 3x (1x Kingston and 2x Intel) running as boot drives, the one thing I do notice is that 40GB is on the small side and so I might have to replace one or two of them with a 60-64GB drive.
I moved the Mushkin last night to a SATA II notebook. I didn't have a Win7 image for it, so I had to install Win7 from a flash drive.
It took two attempts. The drive disconnected/became unresponsive halfway through file expansion the first time. I gave it the benefit of the doubt and just assumed that it could have been something else (but not likely).
It's been running nonstop since, but at a greatly reduced speed -- 26MBs+ less average.
Anvil, I have seen the wear range delta go down with the Mushkin, but infrequently.
You'd have to pry my X25-V's out of my cold, dead hands. I love those things. They are perfectly capable of giving you the random performance and read performance to make a great boot drive, and are nifty when RAIDed. I tried to buy two more at an auction, but got out bid. I think they're a better boot drive than any Indilinx was at the time.
Kingston SSDNow 40GB (X25-V)
351.99TB Host writes
Reallocated sectors : 8
MD5 OK
34.91MiB/s on avg (~25 hours)
--
Corsair Force 3 120GB
01 90/50 (Raw read error rate)
05 2 (Retired Block count)
B1 38 (Wear range delta)
E6 100 (Life curve status)
E7 84 (SSD Life left)
E9 71321 (Raw writes)
F1 95054 (Host writes)
106.29MiB/s on avg (~25 hours)
power on hours : 271.
@Christopher
I've checked where/when it stops in the loop and it is not any particular part; it does look to occur mostly during deleting files, or within a few thousand files after a new loop is started.
I've seen it happen at the end of the loop as well, so it is not tied to any fixed part; it just looks more likely to happen when the drive is stressed.
If/when :) it disconnects again, have a look in the TEST folder; there will be remains of the previous session.
(You'll know where it was by looking at the number of files and the file names.)
Anvil,
Also you can tell just by the available free space.
The first three crashes did occur at the end of the loop during deleting, but after that the crashes were different as I've explained before. I've moved the Mushkin from the laptop back to the main rig, so we'll see...
Also, I initially tried using the Mushkin on one of the SATA II ports on my P67 motherboard and speeds were only a few MBs less. On a Core 2 Duo's SATA II ports, you lose >20%.
I have a good feeling about the drive's prospects from now on.
Update
m4:
553.9945 TiB
2049 hours
Avg speed 91.17 MiB/s.
AD gone from 38 to 34.
P/E 9696.
MD5 OK.
Still no reallocated sectors
Attachment 120603Attachment 120604
Kingston V+100
77.1746 TiB
? hours
Avg speed 73.26 MiB/s.
AD gone from 129 to ?.
P/E ?.
MD5 OK.
Anyone got a tip how I can extract the smartinfo when the Kingston runs from the Jmicron controller?
Mushkin Chronos Deluxe 60 Update
05 0
Retired Block Count
B1 18 (Down from 19)
Wear Range Delta
F1 67092
Host Writes
E9 51710
NAND Writes
E6 100
Life Curve
E7 75
Life Left
Average 123.44MBs.
171 Hours Work Time
12GiB Minimum Free Space, 0 crashes today
SSDlife expects 19 days to 0 MWI
Attachment 120605
One thing that would be a tiny bit helpful (if someone has the inclination to do so) would be a list of who is running which SSD, as I am getting lost on a few of them.
We need a chart!
A chart for the chart.
With the multiple entries, it could be useful. For instance, before Vapor's chart update, I didn't realize how many drives were actually running.
Anvil
Force 3 120GB
X25-V 40GB (Kingston flavored)
One Hertz
Intel 320 40GB
B.A.T.
M4 64GB
Kingston V+100 64GB
Bluestang
M225 -> Vertex Turbo 64GB
Vapor
C300 64GB
SF-1200 nLTT 64GB
SymbiosVyse
Corsair Force F40-A 40GB
Christopher
Mushkin Chronos Deluxe 64GB
-- Retired --
johnw
Samsung 470 64GB
Those are the currently running drives, and owners, that I can think of off the top of my head.
SymbiosVyse doesn't ever put the name of the drive or an SSDLife/CDinfo shot, so I can never remember what he's running.
The Samsung's not running anymore, on account of being dead and all.
CT, Does that help?
Vapor, could you please label the charts somewhere with this info?
459.8TB. 4145 reallocated sectors. Erase failure at 96. Reserve space at 26.
I had another disconnect. Since it's a secondary drive there was no hang and I was just able to unplug the drive's power and plug it back in.
Is it possible that turning off hot-swapping for Intel RST would alleviate some of my problems?
I've heard it mentioned as a "fix" but not in a way that gives me confidence -- and at least now I can fix it without a reboot.
There is also some LPM stuff... I don't know, as I don't have any SF drives, but there is a long, detailed process you might try over at OCZ.
I did look at the OCZ forums... I kinda ranted about it the other day.
I've seen it mentioned in the Mushkin and OCZ forum -- I think there's a BIOS setting for it, so next time I restart the machine I'll try disabling it.
EDIT
My Intel DP67BG doesn't have BIOS options for it. Disabling hot plug stops the drives from being treated as external drives, according to OCZ. Furthermore, they claim Intel's chipset 'madness' is to blame, saying that not all P67 chipset 6Gbps ports are the same, and that they don't manage power properly.
Sounds fishy as hell, but maybe there is something to it. What kind of power management do the drivers try to do besides turn the drive on or off? It does kinda act like an external drive at times. I'll see if I can disable hot plug through regedit.
Looks like the Intel BIOS just enables it by default -- which is good, because OCZ says to enable hot plug. All my LPM stuff is fixed anyway in the registry... so that doesn't really leave much. It crashes more with the msahci drivers, so that doesn't leave much I can do besides try a different motherboard, or get a Marvell-controlled PCIe adapter to put the drive on.
What is the next SSD victim?
Kingston SSDNow 40GB (X25-V)
353.66TB Host writes
Reallocated sectors : 8
MD5 OK
--
This time it lasted for ~33 hours.
It disconnected again last night at 07:01 CET, so about 6 hours ago, and again while deleting files; this time all files were deleted.
I'll move it to another rig this weekend just to see what happens. (another Z68)
Corsair Force 3 120GB
01 120/50 (Raw read error rate)
05 2 (Retired Block count)
B1 40 (Wear range delta)
E6 100 (Life curve status)
E7 83 (SSD Life left)
E9 73936 (Raw writes)
F1 98538 (Host writes)
power on hours : 281.
I've lost about 18 hours in the past week to disconnects... It disconnected again last night about an hour after I went to bed. The worst time it can happen.
This weekend I'm going to swap out motherboards for an H67 and just hope that helps.
I might try one of the other (non-SB/PCH) ports on the current rig; not sure what it is, but it's one of the typical add-on controllers.
If that doesn't work out I'll try the LSI 9211, if SMART still works, that is.
It's annoying alright but I'm not giving up on the drive.
No, I won't be giving up either, but that doesn't mean I'm at all pleased with some of the nonsensical explanations and voodoo-magic solutions. I think this is how many of the Vertex 3 owners felt after the product's launch, and part of me wonders if some of the SF 2281 controllers are just faulty. If worst comes to worst, I'll swap it out under warranty and try another... but that's the worst-case scenario.
Until I switch to another motherboard, I've switched to a SATA II port, so we'll see if it helps. I tried it last week, but just to see what kind of speeds I'd achieve.
With the P67's 6gbps port I was getting 122 - 124 MBs avg.
With the P67's 3gbps port I'm getting 117 - 119 MBs avg.
With a C2D ICH8m SATA II laptop I was getting about 94 - 96 MBs.
I can live with 118MBs if it means I can make it through the night, but I'm not sure that SATA II ports are any less prone.
I would be interested to know if the issues are more hardware related or a combination of hardware and software (drivers with issues that occur under certain conditions). To rule out software, you could install a Linux distribution and run any program that generates continuous writes, since the SMART parameters are already tracking everything. I know this might be too far-fetched a suggestion, but it might be worth trying.
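Something as simple as this would cover the write load (a platform-neutral sketch in the spirit of that suggestion, not ASU's actual workload; the path and file size are placeholders):
[CODE]
# Bare-bones continuous write/verify/delete loop.
import hashlib
import os

TARGET = "/mnt/testdrive/endurance"  # placeholder mount point
FILE_SIZE = 64 * 1024 * 1024         # 64 MiB per file
os.makedirs(TARGET, exist_ok=True)

loop = 0
while True:
    loop += 1
    path = os.path.join(TARGET, f"loop{loop:06d}.bin")
    data = os.urandom(FILE_SIZE)     # incompressible payload
    expected = hashlib.md5(data).hexdigest()
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())         # make sure it actually hits the drive
    with open(path, "rb") as f:
        assert hashlib.md5(f.read()).hexdigest() == expected, "MD5 mismatch"
    os.remove(path)                  # free the space for the next pass
[/CODE]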
Update
m4:
561.2012 TiB
2073 hours
Avg speed 88.88 MiB/s.
AD gone from 34 to 30.
P/E 9818.
MD5 OK.
Still no reallocated sectors
Attachment 120636Attachment 120637
Kingston V+100
80.6501 TiB
323 hours
Avg speed 79.01 MiB/s.
AD gone from 129 to 116.
P/E ?.
MD5 OK.
Attachment 120638Attachment 120639
While I was at work, ASU reported errors and froze. I could open folders and files on the SSD, but ASU was unresponsive. I shut it down and tried to open it again. This time the start screen came up, but the program froze while trying to load information. I powered down the rig and tried a restart. The next problem was that the JMicron option ROM couldn't find the Kingston, so I disconnected it and restarted again.
I moved the Kingston to the SB850 and got help from Anvil to restore the log, and now everything runs normally again. The MD5 check showed no errors, so I don't know what caused all of this.
Here is a screenshot of ASU with the error message.
Attachment 120640
Kingston SSDNow 40GB (X25-V)
355.18TB Host writes
Reallocated sectors : 8
MD5 OK
36.97MiB/s on avg (~12 hours)
--
Corsair Force 3 120GB
01 90/50 (Raw read error rate)
05 2 (Retired Block count)
B1 42 (Wear range delta)
E6 100 (Life curve status)
E7 82 (SSD Life left)
E9 77332 (Raw writes)
F1 103064 (Host writes)
107.48MiB/s on avg (~12 hours)
power on hours : 293
@sergiu
Not very likely; it would be the last thing I'd try. If there is something to the SF/SB platform mix that stinks, I'll make sure to test on other SB computers, and then go down the X58 route as well, before trying another OS/platform.
Mushkin Chronos Deluxe 60 Update
05 0
Retired Block Count
B1 19 (Up from 18)
Wear Range Delta
F1 74681
Host Writes
E9 57563
NAND Writes
E6 100
Life Curve
E7 72
Life Left
Average 116.41MBs. P67 SATA II
119 Hours Work Time
12GiB Minimum Free Space
SSDlife expects 18 days to 0 MWI
Attachment 120644
I've lost a lot of time with the issues I've had. I'm running it on SATA II until it crashes again, then I'm switching motherboards to see if it helps.
EDIT
I've heard that the Marvell-controlled SATA III ports have problems too, but instead of switching to another 1155 board, I think I'll get a PCIe SATA III controller. If that doesn't work, I'll buy some kind of HBA. If that doesn't work, I'm flying to SandForce HQ to yell at someone.
Looks like the SATA II ports don't help either; I just got my first disconnect on the P67's SATA II ports.
I've increased PCH voltage... not sure if that'll help...
Anvil, what kind of Z68 board are you using?
96.10 hours
228.8066 TiB written
57.86 MB/s
MD5 ok
05: 0
B1: 130
E7: 10%
E9: 186624
EA/F1: 235456
F2: 384
@christopher
It's currently running off an ASRock Extreme4 Z68; either I will change ports or it will run off an Asus M4E-Z for the weekend only.
@synbiosvyse
96 hours can't be right, that is just 4 days and you've been running for months? (or is it this session only)
Otherwise it looks great: no LTT and no reallocated sectors. One would think there would have been movement on 05 by now.
Looks like the M4 still has some way to go.
Kingston SSDNow 40GB (X25-V)
356.41TB Host writes
Reallocated sectors : 9 (from 8)
MD5 OK
35.08MiB/s on avg (~23 hours)
--
Corsair Force 3 120GB
01 95/50 (Raw read error rate)
05 2 (Retired Block count)
B1 44 (Wear range delta)
E6 100 (Life curve status)
E7 81 (SSD Life left)
E9 80407 (Raw writes)
F1 107163 (Host writes)
107.51MiB/s on avg (~23 hours)
power on hours : 304
Update
m4:
565.0567 TiB
2085 hours
Avg speed 89.57 MiB/s.
AD gone from 30 to 27.
P/E 9880.
MD5 OK.
Still no reallocated sectors
Attachment 120652Attachment 120653
Kingston V+100
83.8624 TiB
335 hours
Avg speed 77.14 MiB/s.
AD gone from 116 to 114.
P/E ?.
MD5 OK.
Attachment 120654Attachment 120655
There have been no problems since yesterday so I'm crossing my fingers that it will continue.
Yeah, that's since I switched back to 46%. I had been running 0-fill for a while at the beginning, so I ran 100% to help offset that. Personally I find it a little confusing that the app resets the timer even if you set it to count totals. So if you stop the app, it still carries over the data written, but it resets the timer. Have you considered changing this? Or perhaps at least have two timers, a global one and a session-only one.
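For what it's worth, the two-timer idea could be as simple as persisting the global total between sessions; a sketch with made-up names, not how ASU actually does it:
[CODE]
# Keep a session timer alongside a persisted global timer.
import json
import os
import time

STATE_FILE = "asu_timer_state.json"  # hypothetical state file

carried = 0.0
if os.path.exists(STATE_FILE):
    with open(STATE_FILE) as f:
        carried = json.load(f)["global_seconds"]

session_start = time.monotonic()

def elapsed():
    session = time.monotonic() - session_start
    return {"session": session, "global": carried + session}

def save_state():  # call on clean shutdown
    with open(STATE_FILE, "w") as f:
        json.dump({"global_seconds": elapsed()["global"]}, f)
[/CODE]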
Overall I've been very impressed with this drive. The speeds are still almost the same as day one, and no reallocated sectors. The MWI has been stable at 10% for a little while now. I will continue to monitor this and note when it drops.
WOW, imagine what the results of the C300 will be if the M4 got this far! Really impressive, and no SF-like bugs of random disappearances, etc.
Dang, she's going to crack 10k on that M4. 650TB with no issues, I'd say, at least before it runs out on SSDLife Pro.
At least he's not testing a 128Gig drive... that thing would be going for months more!
That assumes the 34nm NAND is better, and maybe it is, but from what I've seen so far it might be a moot point -- anything's better than random dropouts.
I'm rapidly starting to lose all the goodwill I felt towards the Mushkin. Not that it's Mushkin's fault -- it's probably SandForce's, but that doesn't make it any less annoying. It dropped out last night again. I'm not really sure that anything I try is going to help.
Mine is past 30 hours; I'm sort of expecting it to disconnect within a few hours. (That would fit most of the other disconnects.)
I saw this problem with the SF-1200s last year (on Corsairs). Disabling TRIM did not help in most cases. Have you tried disabling TRIM? I just don't think it will fix the problem. I think there's a worse underlying problem with the firmware.
I don't think it will either. I tried it for a little while, but from what I've seen with users with the exact same problem, it won't help. What I was saying was, if it did help but dropped avg speed 25%, it would still be a net gain. Every night, at some point after I go to sleep, it drops out. So I'm losing several hours every night. My backup AM3 rig only has SATA II, but I'm considering switching to it as well instead of another 1155 board.
I only started looking into TRIM because the first three crashes happened during the delete section of the loop. Since then it's happened seemingly at random. I'm not going to give up, but I'm thinking RMA. I'm also thinking about trying a Samsung 470 with double the spare area to see if it would last twice as long, but not until I get the SF sorted out.
It's not worth it to disable TRIM in my opinion.
You may have mentioned this before but what SATA controller are you using again? If you have an Intel SATA II (like an ICH7-ICH10 or something) then see if that works.
I would definitely return it for a refund and get a whole new drive, or at the very least RMA it and get a new drive of the same model to see if there is any difference.
Yeah, I tried it on my P67's SATA II port. While I didn't lose much speed, it didn't help either; it disconnected just the same. Right now none of these drives are in stock, so I wouldn't think of RMAing until I can get another one. Plus, if I RMA it and the drive gets tested, it will work, so they're just going to send the drive back. I'm not sure how best to work this out, but there is definitely something wrong.
Next time it crashes, I'm putting it in the AM3 SATA II ports on my backup system.
EDIT
If anyone has any ideas of another drive to test, I'm all ears.