All C0 Yorkfields have a potential instability problem, confirmed by Intel

Cronos

02-28-2008, 09:25 PM

From the latest "Intel® Core™2 Extreme Processor QX9650 and Intel® Core™2 Quad Processor Q9000 Series Specification Update"
http://www.intel.com/design/processor/specupdt/318727.htm

Erratum AV51

AV51. Front Side Bus GTLREF Margin Results Are Reduced for Die-to-Die Data Transfers in Intel® Core™2 Extreme Processor QX9650, Which Can Lead to Unpredictable System Behavior

Problem: In a synthetic testing environment, Intel has observed that some processor, chipset, and motherboard configurations may experience reduced Front Side Bus (FSB) voltage margin during some certain die-to-die data transfers. This combination of configurations and data transfers is rare. This lower voltage margin could lead to FSB data bit errors, which can lead to unpredictable system behavior.

Implication: When this erratum occurs, it leads to FSB marginality in the system during processor die-to-die transactions, which can lead to unpredictable system behavior. Intel has not observed this erratum with any commercially available software.

Workaround: None identified.

Status: For the steppings affected, see the Summary Tables of Changes.

Fixed in C1.

I'd say there's a pretty good chance this erratum can be triggered by 64-bit Linpack, though I have not tested it myself.

Warboy

02-28-2008, 09:54 PM

ouch

emoners

02-28-2008, 10:25 PM

Guess waiting for the new quads will take longer than I thought... :(

JumpingJack

02-28-2008, 10:32 PM

I'd say, pretty good chance this errata can be triggered in Linpack 64b, though i have not tested it myself.

Unlikely; this is a signaling problem within the package. http://download.intel.com/design/processor/datashts/31872602.pdf: GTLRef sets the reference level for the common signals on the FSB, so if GTLRef is marginal (i.e. not enough voltage; it is a differential signal bus), then the die-to-die communications can be disrupted. Intel's spec for the signal is +/- 0.1 volts depending on whether it is high or low... if this voltage is marginal, the erratum states that core-to-core comms can be interrupted.
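To make the GTLRef idea concrete, here is a minimal Python sketch of a receiver that reads a 1 when the line sits above the reference and a 0 when it sits below. The 1.2 V termination voltage, the 2/3 reference ratio and the 30 mV noise dip are assumed round numbers for illustration, not figures from Intel's datasheet. It shows why a reduced margin matters: the same small dip that a healthy signal shrugs off flips a marginal one.

```python
# Illustrative sketch only: all voltages are assumed round numbers,
# not values taken from Intel's datasheet.
VTT = 1.2                 # assumed bus termination voltage (V)
GTLREF = 2 / 3 * VTT      # assumed reference level at 2/3 of VTT (about 0.8 V)


def receive_bit(v_in: float, gtlref: float = GTLREF) -> int:
    """A GTL-style receiver compares the line voltage against GTLREF."""
    return 1 if v_in > gtlref else 0


def margin(v_in: float, gtlref: float = GTLREF) -> float:
    """Distance between the line voltage and the reference; bigger is safer."""
    return abs(v_in - gtlref)


healthy_high = GTLREF + 0.10   # a 'high' sitting 100 mV above the reference
marginal_high = GTLREF + 0.02  # a 'high' sitting only 20 mV above it
noise = -0.03                  # hypothetical 30 mV dip from crosstalk or droop

for name, v in [("healthy", healthy_high), ("marginal", marginal_high)]:
    # Both lines are meant to carry a 1; only the margin differs.
    print(name, "margin:", round(margin(v), 3), "V, read as:", receive_bit(v + noise))
```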

Stressing the CPU with software does not trigger this... pumping up the bus speed, with the associated voltage drop, may. I.e., if the board vendors do not supply enough voltage, then die-to-die signaling can become a problem and trigger the erratum.

In my opinion, it would appear that Xbit got this original December rumor right:
http://www.xbitlabs.com/news/mainboards/display/20071221231218_Mainboards_Found_Guilty_of_Delaying_Intel_s_New_Quad_Core_Microprocessors.html

I seriously doubt that you would see a problem if you put this CPU into a high-quality board... at least, that is my interpretation of the erratum.

Cronos

02-28-2008, 11:24 PM

Stressing the CPU with software does not trigger this.... pumping up bus speed and the associated voltage drop may ... i.e. if the board vendors do not give enough supply voltage, then the signal die to die can be a problem and would trigger the problem.

You HAVE TO stress the CPU with software to trigger the error. But you are right: the higher the FSB, the more likely this is to happen. In fact, this may be one of the causes of the relatively low FSB wall on many Yorkfields.

Let's wait for mass availability of the newer C1 stepping; all Q9450/Q9550 chips are supposed to be based on C1. Another strong reason not to buy the super-expensive QX9650 now.

hersounds

02-28-2008, 11:53 PM

Sorry, but what about Wolfdale? Does it have bugs too, or only the temperature issue?

Leeghoofd

02-28-2008, 11:57 PM

Could this be why one of my QX9650s needs more juice at stock speed than the other to be stable? 1.19 V for 3 GHz is really crap... it clocks up nicely to 4 GHz at 1.38 V, but at stock it really worries me... VID is the same for both, 1.125 V...

antari

02-29-2008, 02:18 AM

I'd say, pretty good chance this errata can be triggered in Linpack 64b, though i have not tested it myself.

:rolleyes: Do you ever not talk about Linpack 64-bit? :stick:

Cronos

02-29-2008, 02:43 AM

:rolleyes: Do you ever not talk about Linpack 64-bit? :stick:

I'll stop as soon as everyone starts using it :) I am selfish here: the more people routinely use Linpack, the more reliable information I will have.

MarlboroMan

02-29-2008, 05:10 AM

sorry but what about wolfdale ? have bugs too? only the temp?

No, it's a bug in the link between the two dies. Wolfdales are single-die, so there is no such bug at all... (some high temps, though..)
That's why Intel is NOT selling cheap quad-cores, and only a limited amount of the X Edition.
On March 3 we will start to see some benches of ES C-1 stepping chips...

The question is:
Will it be an overclocking monster, like when the Q6600 series changed from B-3 to G-0?

The QX9650 C-0 stepping has some kind of low FSB wall; that's why you always see extreme benches with a high multiplier and a low FSB...

Description of Change to the Customer:

Reason for Revision: Correct the post conversion MM number
The Boxed Intel® Core™2 Extreme Processor QX9650 will undergo the following changes for the C-0 to C-1 stepping conversion:
- New SSPEC and MM numbers for the converting product
- CPUID will change from: 0x00010676 to: 0x00010677
- C-0 package is pin compatible with C-1 package
- There are no changes to Electrical, Mechanical and Thermal processor Specifications.
- Intel anticipates no changes to customer platforms designed to Intel guidelines.
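The CPUID change listed above (0x00010676 to 0x00010677) is just the stepping field ticking over. As a sketch, this Python function splits a CPUID leaf-1 EAX signature into its fields using Intel's documented bit layout; the C-0/C-1 labels come from the notice above, and the function name is hypothetical.

```python
def decode_cpuid_signature(eax: int) -> dict:
    """Split a CPUID leaf-1 EAX value into display family, model and stepping."""
    stepping = eax & 0xF
    model = (eax >> 4) & 0xF
    family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF
    # Intel's convention: the extended family is added only for family 0xF,
    # the extended model is prepended for families 0x6 and 0xF.
    display_family = family + ext_family if family == 0xF else family
    display_model = (ext_model << 4) | model if family in (0x6, 0xF) else model
    return {"family": display_family, "model": display_model, "stepping": stepping}


for label, sig in [("C-0", 0x00010676), ("C-1", 0x00010677)]:
    d = decode_cpuid_signature(sig)
    print(label, "family", hex(d["family"]), "model", hex(d["model"]), "stepping", d["stepping"])
```

Family and model stay the same (0x6 / 0x17); only the stepping ID moves from 6 to 7, which matches the notice's statement that there are no feature-set changes.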

Customer Impact of Change and Recommended Action:

There are no feature set changes between the C-0 and C-1 steppings.
The Intel® Core™2 Extreme Processor QX9650 C-1 stepping will require motherboard manufacturers to update their BIOS to support the new C-1 processor stepping. Minimal re-qualification and/or validation is expected for the stepping conversion. Processor engineering samples will be made available to channel motherboard manufacturers prior to general customer availability to facilitate BIOS updates and validation activities.

Product        Stepping  Boxed part number  S-spec  MM#
QX9650 3 GHz   C-0       BX80569QX9650      SLAN3   894493
QX9650 3 GHz   C-1       BX80569QX9650      SLAWN   897412

jonny_ftm

02-29-2008, 05:29 AM

This is all Intel's dirty marketing.
The rumors that Intel's Yorkfield was delayed because of bugs came out in early December from anonymous internal Intel sources, just when AMD revealed the bug in its quad-core CPUs. AMD released enough data on the erratum for motherboard manufacturers to implement a BIOS workaround. That workaround caused a large loss of performance and contributed to the "death" of AMD. AMD, at least, was fair and honest. Also, their bug appeared only in very rare circumstances.

Now Intel: they immediately and OFFICIALLY denied any bug affecting the QX9650 CPUs and claimed the Yorkfield delays were due to the delay of the X48 chipset and to their roadmap. They firmly denied any bug affecting the QX9650. As a result, the QX9650 stocks were sold at unbelievable prices and their credibility increased, while AMD was pushed into the abyssal zone.

Now they come and say: hey, we discovered a bug in the QX9650, but of course we won't recall them.

The loser: the customer.
The winner: rich Intel. Sadly, customers are too poor to take them to court. They would surely lose the verdict.

That's what happens when there is no more competition on the market. Intel already did that with the Pentium (do you recall?), and now they don't hesitate to do it again...

JumpingJack

02-29-2008, 06:51 AM

You HAVE TO stress CPU with software to trigger the error. But you are right, the higher the FSB, the more likely this will happen. In fact, this may be one of the causes for relatively low FSB wall on many Yorkfields.

Lets wait for mass availability of newer C1 stepping, all Q9450/Q9550 are supposed to be based on C1. Another strong reason not to buy now super expensive QX9650.

With this 'bug', not really... it is not a logic error, it is a physical error. Certainly, if you stress it with software the chance will increase, but this is a marginality in the signaling within the package... it could happen just sitting there.

C'DaleRider

02-29-2008, 08:00 AM

OOooooooooo, the sky is falling....let's go hang Intel for lying.

Of course, what everyone is missing is that this erratum was documented by Intel on Dec. 20, 2007, so this is NOTHING NEW!!!! Then again, why not get outraged today by something that's been known for almost two months....and has been public for as long.

As for the "rumor" that was linked: if one spends $1000 on a CPU, why in the heck would one then cheap out and buy a 4-layer PCB motherboard, an admittedly cheaper motherboard, vs. a better-built 6-layer PCB motherboard? That's what I gleaned from the "rumor": that the more cheaply built motherboards have some signal noise problems:

"...issues with quad-core code-named Yorkfield processors occur on affordable mainboards that utilize 4-layer print-circuit boards (PCBs) and do not affect expensive platforms that are based on 6-layer PCBs."

"Since many mainboards based on Intel P35 chipset that are based on 4-layer PCB are already available and are utilized by large system vendors.... As a result, Intel has decided to slightly alter its chips so that they could work in existing infrastructure..."

I notice that nowhere in that rumor is any problem mentioned with X38/X48-based motherboards... only P35 boards.....

So, cry the sky is falling........cause it ain't. It's only hysterical anti-Intel fanbois crying about nothing new.........

Mav451

02-29-2008, 08:09 AM

Regarding 4 vs. 6 layers: since my DFI board is 6-layer, the problem shouldn't occur, even though I'm on P35 (not X38), right?

*actually this is all moot, if I buy a C1 anyway haha.

jonny_ftm

02-29-2008, 09:30 AM

I didn't see where you got your sources for affected motherboards based on PCB layers :confused:
It doesn't appear in Intel's errata sheets.

Also, Intel is just like any manufacturer: when there's no competition, they put themselves in a position to abuse it. And why the hell didn't they officially announce it in December? Well, just to sell their chips, as most people would have waited for new revisions, period.

celemine1Gig

02-29-2008, 12:08 PM

The real question, instead of moaning and whining about how bad Intel is, should be:

Who has already encountered that stability issue at stock speed?

If you can name more people than you can count on your fingers, then it might perhaps, remotely, seem to be a real problem. ;) Think about it.

Do you really believe they would still be selling these CPUs if they had serious stability issues?

JumpingJack

02-29-2008, 04:57 PM

Quite true, errata are always taken out of context by those who do not understand what they mean....

Intel has not observed this erratum with any commercially available software.

Essentially, they created it in the lab under specific conditions and cannot observe it using any commercial software... it is a marginality in the package; if the board gives it enough voltage, it will never be observed.

If you have a QX9650, just use a high-quality board and sell any of that ECS junk.

btdvox

02-29-2008, 05:18 PM

OK, most of the stuff said here confused the crap out of me lol.
I have a QX9650. It says C0 in CPU-Z. I have it running on my EVGA 780i at 400 FSB (1600 QDR). I don't think this is an "extreme high" FSB, but it's more than the 333 stock FSB.

Will I encounter any of these issues? What are the issues anyway? I never understood GTL refs and just left the GTLREF voltages on auto.

Anyone who could shed some light on this would be much appreciated. I'm kinda worried now after spending 1K :(

BTW, when I search for 780i, I see that the reference XFX motherboard is a 6-layer PCB, so I am assuming the EVGA, which is supposed to be identical, is also 6-layer?

Cronos

02-29-2008, 05:30 PM

This erratum has two implications.

First, the probability of error showing itself is considerably higher on low-end boards than on high-end ones.

And second, even on a high-end board, we can expect reduced FSB OC potential from all C0 stepping Yorkfields, which is in fact supported so far by experimental data.

The possible good news for all those waiting for the more affordable Q9450/9550 is that we can expect the C1 stepping to be a better FSB clocker. But this is only speculation for now; only practice will tell.

A higher FSB is especially important for those who want to realize DDR3's full potential, as having fast DDR3 with a slow FSB is totally pointless.
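A rough way to see the DDR3 point: peak FSB bandwidth is the bus clock x 4 (quad-pumped) x 8 bytes (64-bit data bus), while each 64-bit DDR3 channel also moves 8 bytes per transfer. This Python sketch assumes dual-channel memory and compares peak figures only; real-world throughput is lower on both sides.

```python
def fsb_bandwidth_gbs(bus_clock_mhz: float) -> float:
    """Peak FSB bandwidth: 4 transfers per bus clock, 8 bytes per transfer."""
    return bus_clock_mhz * 4 * 8 / 1000


def ddr_bandwidth_gbs(mt_per_s: float, channels: int = 2) -> float:
    """Peak DRAM bandwidth: 8 bytes per transfer per 64-bit channel."""
    return mt_per_s * 8 * channels / 1000


for fsb in (333, 400, 450, 500):
    print(f"FSB {fsb} MHz: {fsb_bandwidth_gbs(fsb):.1f} GB/s")
print(f"Dual-channel DDR3-1600: {ddr_bandwidth_gbs(1600):.1f} GB/s")
```

Even at 500 MHz the FSB's peak stays well under what dual-channel DDR3-1600 can supply, so the bus, not the memory, sets the ceiling.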

mrcape

02-29-2008, 05:34 PM

Low multipliers on the new, affordable quads are such a bummer, especially if they can't do 500+ FSB comfortably.
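Why 500+ FSB matters is plain arithmetic: core clock = FSB x multiplier, so a low multiplier needs a high FSB to reach the same clock. The stock multipliers below (Q9450 x8, Q9550 x8.5, QX9650 x9) are assumed retail values, and the 4 GHz target is only an example.

```python
import math

# Assumed stock multipliers; the QX9650 is additionally unlocked upward.
MULTIPLIERS = {"Q9450": 8.0, "Q9550": 8.5, "QX9650": 9.0}


def core_clock_mhz(fsb_mhz: float, multiplier: float) -> float:
    """Core clock is simply the bus clock times the multiplier."""
    return fsb_mhz * multiplier


def fsb_needed_mhz(target_mhz: float, multiplier: float) -> int:
    """Smallest whole-MHz FSB that reaches the target core clock."""
    return math.ceil(target_mhz / multiplier)


for cpu, mult in MULTIPLIERS.items():
    stock = core_clock_mhz(333, mult)
    print(f"{cpu}: {stock:.0f} MHz at 333 FSB, needs {fsb_needed_mhz(4000, mult)} FSB for 4 GHz")
```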

btdvox

02-29-2008, 06:07 PM

Couple of questions:
I still don't get what the error is. Is it that it won't have high FSB potential because of the FSB voltage problem?
What are these errors, and would you know if you got one, as in your computer keeps rebooting or BSODing, etc.?

Secondly, I don't see anywhere stating that it only has to do with 4-layer PCBs. Sorry to ask these questions, but I think I am in the clear. I'm only using DDR2, so while a higher FSB would always be nice, I'm sticking with my 400 FSB, which I don't consider very high, as I've been using this setting for the last year and a half on my motherboards.

Also, if someone could show where it states that it only affects 4-layer PCBs, that would be great, as that would pretty much rule me out (as stated above, I don't care too much about going higher than 400 FSB): I have a 780i, and seeing as the 680i was a 6-layer PCB, I can only assume the 780i is too LOL.

Funny thing is, this erratum has been out since Dec 20th, but this is the first I'm seeing of it, and it states in many places that they've only seen this in experimental situations. I'm assuming that if my chip were having issues, I wouldn't be able to get it Prime95-stable for 8 hrs on both Blend and Small FFT (which is what I use to "test" for stability, and it passed)..

Thanks for any help in advance!

Cronos

02-29-2008, 06:29 PM

The voltage margin between the GTL low and high states is too low in C0, which may lead, for certain FSB transactions, to the CPU not properly distinguishing 0 from 1 in the data stream. As a result, data corruption may occur.
This may be greatly aggravated by low-quality, noisy PCBs on some cheap boards, and by FSB overclocking even on high-quality motherboards.
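One way to see why a modest loss of margin matters so much is the textbook Gaussian-noise model: if random noise with RMS value sigma rides on the line, the chance that a sample crosses the reference is the Gaussian tail Q(margin / sigma). This is a generic model, not Intel's analysis, and the 15 mV noise figure is hypothetical.

```python
import math


def bit_error_rate(margin_mv: float, noise_rms_mv: float) -> float:
    """Probability that Gaussian noise pushes a sample across the reference.

    Uses Q(x) = 0.5 * erfc(x / sqrt(2)) with x = margin / sigma.
    """
    x = margin_mv / noise_rms_mv
    return 0.5 * math.erfc(x / math.sqrt(2))


noise = 15.0  # hypothetical RMS noise on the line, in mV
for margin in (100, 75, 50):
    print(f"margin {margin} mV -> error probability per bit ~ {bit_error_rate(margin, noise):.1e}")
```

Because the tail falls off so steeply, halving the margin in this model raises the per-bit error probability by several orders of magnitude, which fits the "fine on a good board, flaky on a noisy one" picture.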

An actual occurrence will not necessarily lead to any visible errors, as most errors go unnoticed, and only the most severe (and rare) ones lead to bluescreens/hangs.

JumpingJack

02-29-2008, 08:17 PM

K Most the stuff said here confused the crap out of me lol.
I have a QX 9650. It says C0 on CPUZ. I have it running on my EVGA 780i. I run it at 400 FSB ( 1600QDR). I dont think this is a "extreme high" fsb but its more than than the 333 stock fsb.

Will I encounter any of these issues? What are the issues anyways- i never got GTL refs and just left the volts to auto for GTLrefs.

Anyone who could shed some light on this would be much apprciated Im kinda worried now after spending 1K:(

btw when i search 780i - i see that the reference xfx motherboard is 6 layer PCB- so i am assuming the EVGA which is suppose to be identical is also 6 layer?

Actually, you could do us all a favor and run prime95 for 48 hours and let us know... :) The errata is not entirely clear, the xbit article seems to suggest the lower end 4 layer PCBs can cause the problem.

However, I doubt you will see anything... it has been produced in the lab, and all the 9650 reviews, even the overclocking reviews, all claim stability even for the 4+ GHz OCs.

Errata, as mentioned above, are almost always over blown. Don't take that as they don't exist, but just because an errata entry exists does not mean that you will lock up ever day.

JumpingJack

02-29-2008, 08:24 PM

So here is the story... a rumor broke out at the end of November/beginning of December that Intel was delaying its quad launch. People speculated, and there were wild rumors all over the net. Here is the Xbit article: http://xbitlabs.com/news/mainboards/display/20071221231218_Mainboards_Found_Guilty_of_Delaying_Intel_s_New_Quad_Core_Microprocessors.html

Here is another piece of speculation:
http://www.engadget.com/2007/12/19/intel-to-delay-yorkfield-chips-because-of-amds-struggles/

Several stories were written; Xbitlabs has a unique story explaining that Intel discovered that if the FSB signal is marginal, the processor could lock up on lower-quality boards, typically 4-layer PCBs (el cheapo boards).

This makes sense... if the erratum were a logic problem, then dual cores should suffer too, but in the quad cores there are two bus agents (the two dies) sharing the bus. Think of it like you and your wife each using a hair dryer: run one hair dryer and it's fine, but run them together, pull too much, and you trip a breaker. The extra die in the package adds an extra load to the bus, which creates a voltage drop... if the board does not have the margin for that voltage drop, then the voltages will not be enough to generate the appropriate differential relative to GTLREF.

Your board is not in this class, so you have no worries. You are getting worried over nothing. Question: does it boot? Have you run software on it?
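The hair-dryer analogy can be put into numbers with a simple DC caricature: two loads drawing current through a shared path lose voltage to the path's resistance (Ohm's law), and a board with a lower-resistance path keeps more margin. Real GTL bus loading is mostly about termination, capacitive load and reflections rather than a single resistor, and every value below is hypothetical.

```python
from typing import Sequence


def node_voltage(v_supply: float, r_shared_ohm: float, currents_a: Sequence[float]) -> float:
    """Voltage left at the loads after the IR drop across a shared path."""
    return v_supply - r_shared_ohm * sum(currents_a)


# All values are hypothetical, chosen only to make the analogy concrete.
V_SUPPLY = 1.2  # supply at the source (V)
V_MIN = 0.95    # minimum the receivers need to keep enough margin (V)
I_AGENT = 0.3   # current drawn per bus agent / die (A)

for board, r_shared in [("weaker board", 0.5), ("stronger board", 0.3)]:
    for agents in (1, 2):
        v = node_voltage(V_SUPPLY, r_shared, [I_AGENT] * agents)
        status = "OK" if v >= V_MIN else "below minimum"
        print(f"{board}, {agents} agent(s): {v:.3f} V ({status})")
```

In this toy model the weaker board is fine with one agent but drops below the minimum with two, while the stronger board handles both.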

Also, Intel publishes on a fixed monthly schedule... if any new errata are found, they will appear on the website under technical documents and specification updates: http://www.intel.com/design/processor/specupdt/318727.htm

If a new revision is created, it will be posted per the following 2008 schedule: January 16, February 13, March 12, April 16, May 14, June 11, July 16, August 13, September 10, October 15, November 12, December 10

You probably had not heard of it because the Inquirer has not picked up on the update yet.

Jack

btdvox

02-29-2008, 08:30 PM

Sounds way too finicky to me. Seems like it's just a bunch of PR vs. PR statements: they found something, tried to test for a fault, and found one in probably the oddest situation.
Anyway, I can already tell you that I am running the chip on my 780i @ 400 FSB and have already primed for a day. I have also overclocked and run Prime95 for 8 hours on both Blend and Small FFT.

To tell you the truth, it was the most pleasurable and easy OC I have ever done, pretty much because I knew what I wanted my 24/7 target to be and what volts I should try. I'm stable on my 780i, which has bad vdroop for any chip. I'm running my FSB voltage @ 1.4 and my CPU @ 1.475 (1.42 when idle in Windows and 1.368 under load) and have had no issues. I first tried 1.45 but got a Prime95 crash, then upped the voltage, and am now stable.
I have almost the same results as AnandTech has on their website with the QX9650, and I had really heard NOTHING bad about this chip (in fact, read any of the reviews; I've read about 10-15, and you'll see everyone raved about this chip and how it overclocks) until this erratum (which does sound like only systems of a certain spec are having issues). They ran their 9650 @ 4.2 GHz with a watercooling setup (which I have too: PA 120.2 and D-Tek FuZion) and got temps of 68 at load (mine loads at 67 lol).

Data corruption is something you can usually spot, and the errors are usually caught by Windows, or by a disk check if it's on the hard drive.

So as for some of you saying "you prob won't even know it's happening": a fault like that is very, very unlikely, and no offense, but if no one knows it's happening and everything is working OK, it probably means everything is OK.

I'm not the only one with this chip, and someone stated they probably made a limited run. If you check the X38 or 780i boards, you'll see A LOT of people with QX9650s, so it's not really limited... and of course a lot of others have G0 Q6600s.

btdvox

02-29-2008, 08:33 PM

Actually, you could do us all a favor and run prime95 for 48 hours and let us know... :) The errata is not entirely clear, the xbit article seems to suggest the lower end 4 layer PCBs can cause the problem.

However, I doubt you will see anything... it has been produced in the lab, and all the 9650 reviews, even the overclocking reviews, all claim stability even for the 4+ GHz OCs.

Errata, as mentioned above, are almost always over blown. Don't take that as they don't exist, but just because an errata entry exists does not mean that you will lock up ever day.

I understand it doesn't exist; in fact they probably only saw it with a certain setup or set of motherboards, but if I get any lock-ups because of this, I will be dealing with it. If it were an E8400 or a Q6600 I wouldn't care as much, but I spent a grand on this chip.
That brings me to my next point: if somehow I do see something, which I doubt, or if things go on under my nose, which is highly unlikely as problems occur when you "notice them" lol,
What does intel do about this?

btdvox

02-29-2008, 08:40 PM

Lol, third post in a row. Thanks, though, for the info; I get it now: if the die-to-die voltage is too low, obviously errors will happen. But wouldn't running Prime or Orthos find these instabilities like they do for everything else?

As stated above, my system has been primed for a while to make sure of a good 24/7 overclock, as I don't care to "push" the limits of my chip lol.
I decided to run an FSB of 400 instead of 300 (to get DDR2-1200). I originally had it at QDR 1200 (FSB 300) to get 1:1 RAM but wanted a little higher FSB lol. Glad I didn't go up to 420-450, which was my original plan :) Though I'm sure I'd still have no problems haha.

Thanks for the help.
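On whether Prime or Orthos would catch this: torture tests catch silent corruption by computing results whose correct answers are known and comparing against them. Below is a minimal Python sketch of that principle, using a made-up deterministic workload (a 64-bit LCG) and a SHA-256 "golden" checksum; the function names are hypothetical. This is not how Prime95 works internally, and a Python loop will not load the FSB the way a real stress test does; it only shows why a single flipped bit gets caught once results are checked.

```python
import hashlib


def workload(seed: int, rounds: int = 10_000) -> bytes:
    """Deterministic integer workload: the same seed must always give the same bytes."""
    x = seed
    out = bytearray()
    for _ in range(rounds):
        x = (x * 6364136223846793005 + 1442695040888963407) % (1 << 64)  # 64-bit LCG step
        out += x.to_bytes(8, "little")
    return bytes(out)


def stress_pass(seed: int, golden: str) -> bool:
    """One pass: recompute the workload and compare its hash with the known-good one."""
    return hashlib.sha256(workload(seed)).hexdigest() == golden


GOLDEN = hashlib.sha256(workload(42)).hexdigest()  # reference result computed once
failures = sum(not stress_pass(42, GOLDEN) for _ in range(20))
print("mismatches over 20 passes:", failures)

corrupted = bytearray(workload(42))
corrupted[1000] ^= 0x01  # simulate one flipped bit somewhere in the results
print("flipped bit caught:", hashlib.sha256(corrupted).hexdigest() != GOLDEN)
```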

HDCHOPPER

02-29-2008, 08:45 PM

v-droop mods anyone?

JumpingJack

02-29-2008, 08:57 PM

Answering this takes a bit of understanding of bus technology. Real quick: I am writing this assuming you are not knowledgeable in the topic; my intention is not to be demeaning in any way.

Parallel buses are just what the name says: everything goes in parallel. You have one physical wire for each bit, plus some timing lines and voltage lines. However, parallel buses have a few drawbacks: they generate EMI (electromagnetic interference) and can suffer from crosstalk, i.e. one wire will create interference on the trace right next to it. As the frequency goes up and the voltages to drive it go up, the risk of crosstalk goes up. Study your motherboard and look at the traces on the top layer... you will see some zigzags, loop-arounds and wiggles. These are design tricks that do primarily two things: a) match the length, and therefore the signal timing, of each line to all the other lines in the bus and b) break up any potential EMI from running too many lines, or too far (in length), in parallel.
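A first-order way to put numbers on crosstalk: when an aggressor line swings, a quiet, undriven neighbour picks up roughly the swing scaled by Cc / (Cc + Cg), where Cc is the coupling capacitance between the two traces and Cg is the victim's capacitance to ground. This charge-sharing estimate is a textbook worst case, and the capacitances below are hypothetical; it does show why spacing traces apart (smaller Cc) helps.

```python
def coupled_noise_v(aggressor_swing_v: float, c_coupling_pf: float, c_ground_pf: float) -> float:
    """First-order capacitive crosstalk (charge sharing) on a quiet, undriven victim line."""
    return aggressor_swing_v * c_coupling_pf / (c_coupling_pf + c_ground_pf)


SWING = 1.2  # hypothetical aggressor swing (V)
# Hypothetical capacitances: tightly packed traces vs. traces spaced further apart.
for label, cc, cg in [("tight spacing", 1.0, 4.0), ("wider spacing", 0.25, 4.0)]:
    print(f"{label}: ~{coupled_noise_v(SWING, cc, cg):.3f} V coupled onto the victim")
```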

The best example, in fact, is hard drives. Older PATA drives (up to UDMA/33) used 40-conductor IDE ribbons. As drive and chipset vendors worked to increase data rates, crosstalk on the 40-conductor ribbon became too high, so UDMA/66 and faster modes moved to an 80-conductor ribbon (still with a 40-pin connector), where the extra 40 wires are grounds interleaved between the signal lines to suppress crosstalk. Ultimately, the PATA interface simply could not be designed with high enough reliability at higher frequencies, and problems (other than crosstalk) drove the industry to serial comms for HDs.

The Intel FSB is a 64-bit data bus, meaning they route at least 64 data lines between the chipset and CPU, so each transfer carries 8 bytes of data (and the FSB is quad-pumped, with four transfers per bus clock)... as Intel ramps up the FSB, they also must contend with crosstalk, so little annoyances like this do not surprise me. Recall that above I mentioned the wiggles and zigzags... this is one reason why some motherboard makers' boards OC the FSB better than others: simply better electrical engineering in stabilizing the FSB signaling.

Now, going away from EMI and x-talk -- parallel busses do have the advantage of enabling multple bus agents easily. As opposed to Serial busses which can also have multiple bus agents, but it is harder to negotiate on those busses, so many high performance serial interconnects are simply point to point (i.e. SATA ports only give one drive -- port replication is hard and expensive). This also allows Intel to package 2 die within the CPU and have uniform access to memory (we can get into all the pro's and con's of an MCM in a different discussion). Nonetheless, adding multiple bus agents to the parallel bus, there also must be mechansims to manage bus contentions ... i.e. the bus mastering algorithms must also account for traffic on the bus from all agents.
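The bus-mastering point can be illustrated with a generic round-robin arbiter: each cycle, the grant goes to the next requesting agent after the one that last won, so two dies that both keep requesting simply alternate. Intel's actual FSB arbitration uses its own signals and rules; this Python sketch only shows the general idea, and the class name is hypothetical.

```python
from typing import Optional, Sequence


class RoundRobinArbiter:
    """Grants a shared bus to one requesting agent at a time, rotating priority."""

    def __init__(self, n_agents: int) -> None:
        self.n = n_agents
        self.last = n_agents - 1  # start so that agent 0 has priority first

    def grant(self, requests: Sequence[bool]) -> Optional[int]:
        # Search agents starting just after the last winner, wrapping around.
        for offset in range(1, self.n + 1):
            agent = (self.last + offset) % self.n
            if requests[agent]:
                self.last = agent
                return agent
        return None  # nobody is requesting the bus this cycle


arb = RoundRobinArbiter(2)  # e.g. the two dies of a quad-core package
print([arb.grant([True, True]) for _ in range(4)])  # both dies keep requesting
```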

What does this mean, even if your processor is sitting idle, activity is occuring on the bus to manage any requests or status of the bus agents. Hence, my statement, you do not necessarily need to stress the CPU for this errata in order to trigger a physical marginality in the bus signalling.

If you built your system, idled it, stressed it, and it is stable, you have no worries. In fact, you should expect a lock-up now and then. I have never had a computer that never locked up or BSODed, but when one does, I don't run off to the errata assuming the CPU is bad; it could have been any number of problems, usually software, sometimes hardware. An occasional issue is nothing abnormal, and unless it happens every hour on the hour (well, I exaggerate), say every few days, a random lock-up here and there is nothing to fret over.

EDIT: I dug through some old bookmarked links and found some papers on crosstalk; it is in fact a problem in many areas:
http://www.sigda.org/Archives/ProceedingArchives/Date/papers/2000/date00/pdffiles/06c_2.pdf
http://dropzone.tamu.edu/~jhu/teaching/elen689Spring05/CongASPDAC01.pdf
http://www.sigrity.com/papers/2005/s17p6.pdf (Intel paper discussing power validation, but also briefly mentions info on crosstalk)

Jack

HDCHOPPER

02-29-2008, 09:06 PM

THANK YOU, JumpingJack!
That's why I read these forums: good info...

btdvox

02-29-2008, 09:07 PM

^^^ Thanks for the explanation.
Yeah, I've had the PC idle for about 6-8 hours with no issues or lock-ups (assuming you mean it freezes and needs a restart). I've had some petty things happen, but they have explanations.
The only main issues I've had are Prime95 crashes when testing for stability (which surprised me, as they weren't BSODs or freezes). I haven't had a Prime95 core fail with an error, though; it just crashed. I don't think this has to do with the issue, as it was most likely my CPU not being stable and needing more Vcore, lol.
I will be doing my final stress test using Prime95 Blend today to make sure everything is OK.
Thanks.
(Is it odd that Prime95 crashes and doesn't give an error? I had that happen 3 times, expecting a certain error, such as a rounding one, etc.)

Cronos

02-29-2008, 09:19 PM

What does this mean? Even if your processor is sitting idle, activity is occurring on the bus to manage requests and track the status of the bus agents. Hence my statement: you do not necessarily need to stress the CPU to trigger the physical marginality in the bus signalling that this erratum describes.

I don't understand why you insist on the obvious and at the same time deny the obvious.
In a hypothetical, absolutely idle state (which never happens in reality), your GTL voltage levels stay constant, whether 0 or 1, it does not matter. Sure, there is some background noise, but the probability of data corruption from misreading a 0 as a 1 (or vice versa) is practically nonexistent.

You have to pump some data through your FSB to keep it constantly switching from 0 to 1 and vice versa to trigger the error, and for that you have to load your CPU with some job. Moreover, to increase the probability of an error, you have to run several processes simultaneously and make them share common memory or otherwise exchange data.
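
Both positions fit a standard back-of-the-envelope model of a threshold receiver: if background noise is Gaussian with standard deviation σ, the chance that a bit is misread is Q(margin / σ), and crosstalk from neighbors that are actively switching subtracts from the margin. The sketch below is that textbook model in Python, not Intel's analysis; the margin, noise, and crosstalk values are hypothetical. It shows why a quiet bus with full margin almost never errs, while a reduced margin (the situation AV51 describes) combined with heavy switching raises the error rate by many orders of magnitude.

```python
import math


def q_function(x: float) -> float:
    """Tail probability of a standard normal variable, P(Z > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def bit_error_probability(margin_mv: float, noise_sigma_mv: float,
                          crosstalk_mv: float = 0.0) -> float:
    """Chance that random noise pushes one received bit across the reference level.

    margin_mv:      distance between the nominal signal level and the receiver's
                    reference (e.g. GTLREF); all numbers here are illustrative.
    noise_sigma_mv: standard deviation of Gaussian background noise.
    crosstalk_mv:   worst-case voltage coupled in from switching neighbors;
                    it eats directly into the margin and is 0 when they are quiet.
    """
    effective_margin = max(margin_mv - crosstalk_mv, 0.0)
    return q_function(effective_margin / noise_sigma_mv)


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the trend.
    for margin in (150.0, 100.0):
        quiet = bit_error_probability(margin, 20.0)
        busy = bit_error_probability(margin, 20.0, crosstalk_mv=50.0)
        print(f"margin {margin:.0f} mV: quiet {quiet:.1e}, switching {busy:.1e}")
```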

JumpingJack

02-29-2008, 09:22 PM

THANK YOU, JumpingJack!
That's why I read these forums: good info...

I am by no means a bus expert :) ... however, I did spend a bit of time reading up on this some time ago, quite embarrassingly actually. When the UltraDMA/100 hard drives started showing up I bought one, and when installing it I simply grabbed the closest cable, a 40-pin, 40-wire one.

When I started it up and measured the throughput, it was running at UDMA/33... I scratched my head, forgot about it, then went back and rebuilt the computer. This time I grabbed the 40-pin / 80-wire cable... voilà... duh. :)

I was curious as to why this was the case... and I have pretty much spilled my guts on all that I know. :)

David Kanter is much more knowledgeable on this topic; he wrote great articles on bus technology here:
http://www.realworldtech.com/page.cfm?ArticleID=RWT082807020032 (CSI)
http://www.realworldtech.com/page.cfm?ArticleID=RWT011303183140&p=1 (Bus basics)

Jack

JumpingJack

02-29-2008, 09:24 PM

I don't understand why you insist on the obvious and at the same time deny the obvious.
In a hypothetical, absolutely idle state (which never happens in reality), your GTL voltage levels stay constant, whether 0 or 1, it does not matter. Sure, there is some background noise, but the probability of data corruption from misreading a 0 as a 1 (or vice versa) is practically nonexistent.
You have to pump some data through your FSB to keep it constantly switching from 0 to 1 and vice versa to trigger the error, and for that you have to load your CPU with some job. Moreover, to increase the probability of an error, you have to run several processes simultaneously and make them share common memory or otherwise exchange data.

The bus master will drive the bus to keep track of the bus agents' status... a CPU sitting idle does not mean no activity is occurring on the bus, just not much of it.

This erratum relates to a physical problem with the signalling, not a logical one; hence, software does not need to be running for the bus to be disrupted. That is all I am saying: a small but finite probability exists that a marginal voltage on the bus will cause a hiccup.
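
The "small but finite probability" here and Cronos's point that more switching means more chances to fail can both be expressed with one formula: if each transfer independently fails with probability p, the chance of at least one failure in N transfers is 1 - (1 - p)^N. A minimal Python sketch; independence between transfers is an assumption, and the per-transfer probability and idle traffic rate are made-up illustrative values.

```python
import math


def prob_at_least_one_error(p_per_transfer: float, transfers: float) -> float:
    """P(at least one failure) = 1 - (1 - p)^N, assuming independent transfers.

    log1p/expm1 keep the result accurate when p is tiny and N is huge.
    """
    return -math.expm1(transfers * math.log1p(-p_per_transfer))


if __name__ == "__main__":
    p = 1e-15              # hypothetical per-transfer error probability
    seconds = 3600         # one hour
    busy_rate = 1.6e9      # FSB1600 at 100% utilisation (transfers per second)
    idle_rate = 1e6        # hypothetical light background traffic
    print("busy hour:", prob_at_least_one_error(p, busy_rate * seconds))
    print("idle hour:", prob_at_least_one_error(p, idle_rate * seconds))
```

The same tiny p gives very different odds depending on how much traffic the bus carries, which is the crux of the idle-versus-load disagreement.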

Your bolded statement is false.

jack

Cronos

02-29-2008, 09:26 PM

(Is it odd that Prime95 crashes and doesn't give an error? I had that happen 3 times, expecting a certain error, such as a rounding one, etc.)

Well, that's because not every error leads to a rounding error, and there are so many possible errors that no single test program can possibly trigger them all. What is worse, there is often no way to even detect every error that is triggered.
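
To illustrate why not every corruption shows up as a rounding error: roughly speaking, Prime95's round-off check asks whether floating-point results stay close to the whole numbers they are supposed to be, so a flipped bit only trips it if it moves a value far enough. The hypothetical Python sketch below flips single bits of an IEEE-754 double and applies a simple relative-tolerance check (not Prime95's actual test): a flip in the lowest mantissa bit slips through unnoticed, while a flip in the exponent is caught. A flip in, say, a pointer or an instruction would more likely crash the program outright, which may help explain crashes that come without any reported error.

```python
import struct


def flip_bit(x: float, bit: int) -> float:
    """Return x with one bit (0 = least significant) of its IEEE-754 double inverted."""
    (raw,) = struct.unpack("<Q", struct.pack("<d", x))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", raw ^ (1 << bit)))
    return flipped


def passes_tolerance_check(expected: float, actual: float,
                           rel_tol: float = 1e-9) -> bool:
    """Hypothetical result check: accept values within a relative tolerance."""
    return abs(actual - expected) <= rel_tol * abs(expected)


if __name__ == "__main__":
    value = 12345.678
    # Bits 0-51 are the mantissa, 52-62 the exponent, 63 the sign.
    for bit in (0, 30, 52, 62):
        corrupted = flip_bit(value, bit)
        print(f"bit {bit:2d}: {corrupted!r:>24}  "
              f"caught={not passes_tolerance_check(value, corrupted)}")
```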

Cronos

02-29-2008, 09:32 PM

The bus master will drive the bus to keep track of the bus agents' status... a CPU sitting idle does not mean no activity is occurring on the bus, just not much of it.

Status is set and monitored through constant levels, not switching. Only activity initiated by some agent leads to bus switching activity.

Anyway, this is not really important, as an absolutely idle bus state never happens.
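
The distinction between holding a level and switching can be made concrete. Crosstalk is driven by lines that change state, so one rough proxy for how hard a data pattern stresses the bus is how many lines toggle between consecutive transfers, and especially how many neighboring lines switch in opposite directions, the classic worst case for capacitive coupling. A hypothetical Python sketch, treating bit i and bit i+1 as adjacent wires (an assumption; real board routing differs):

```python
MASK64 = (1 << 64) - 1


def toggles(prev: int, curr: int) -> int:
    """How many of the 64 data lines change level between two consecutive transfers."""
    return bin((prev ^ curr) & MASK64).count("1")


def opposing_neighbor_pairs(prev: int, curr: int) -> int:
    """Adjacent line pairs where one line rises while its neighbor falls.

    Assumes bit i and bit i+1 are physically adjacent wires (a simplification).
    """
    rising = ~prev & curr & MASK64
    falling = prev & ~curr & MASK64
    # Bit i is set when lines i and i+1 switch in opposite directions.
    pairs = (rising & (falling >> 1)) | (falling & (rising >> 1))
    return bin(pairs).count("1")


if __name__ == "__main__":
    held = 0x0123456789ABCDEF
    patterns = {
        "idle (same word held)": (held, held),
        "all lines rise together": (0, MASK64),
        "0x55.. -> 0xAA.. (neighbors oppose)": (0x5555555555555555, 0xAAAAAAAAAAAAAAAA),
    }
    for name, (a, b) in patterns.items():
        print(f"{name}: {toggles(a, b)} toggles, "
              f"{opposing_neighbor_pairs(a, b)} opposing pairs")
```

A held bus produces no toggles at all, while the alternating pattern flips every line with each neighbor moving the opposite way.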

JumpingJack

02-29-2008, 09:35 PM

Status is set and monitored through constant levels, not switching. Only activity initiated by some agent leads to bus switching activity.

Anyway, this is not really important, as an absolutely idle bus state never happens.

OK, OK... you're right, I am wrong... I learned something new. Thanks.

Cheers
Jack

btdvox

03-01-2008, 01:04 AM

Well, that's because not every error leads to a rounding error, and there are so many possible errors that no single test program can possibly trigger them all. What is worse, there is often no way to even detect every error that is triggered.

I just meant to say that rounding is one of them; I have had many errors in the past with Prime and Orthos :P haha

Anyway, my last 3 questions for you guys, and then I'm done with the subject, as talking about it makes me more anxious for no reason lol:

1) Do we know that the X-bit labs article is talking directly about the erratum?
2) If there were such an error, don't you think we would be seeing people talk about it other than in this one erratum? I have searched a lot of the main sites I visit and not one hiccup about these chips, other than some people who put 1.6 volts into them and fried them. Literally, I have not seen one report about this error or anything that could be this error...
3) I'm not used to reading these documents really, but people above stated they are usually meant for a certain audience, and this one, for example, was observed only in the labs and not with any software... The erratum itself states there is no workaround, and all they are doing is creating a new revision so that it works with lower-end mobos, so to speak. Will this fix the problem? lol. I guess what I'm also asking, not just for me but for everyone: what FSB speed would be too high? I see a lot of review sites using 333 and 400 (AnandTech uses a 400 FSB in their QX9650 review with their X38).

OK, and lastly, before you all get bored with my comments: it states any C0 Yorkfields. Isn't the QX9770 a C0 Yorkfield? According to Tom's Hardware it is, and it uses a 1600 FSB...
It looks like the erratum was probably talking about older motherboards and the VRM not being able to push out enough juice, leading to FSB failures from not being able to push all the data through (GTLREF).

HDCHOPPER

03-01-2008, 01:11 AM

Thanks for the links, JumpingJack.

JumpingJack

03-01-2008, 01:23 AM

I just meant to say that rounding is one of them; I have had many errors in the past with Prime and Orthos :P haha

Anyway, my last 3 questions for you guys, and then I'm done with the subject, as talking about it makes me more anxious for no reason lol:

1) Do we know that the X-bit labs article is talking directly about the erratum?
2) If there were such an error, don't you think we would be seeing people talk about it other than in this one erratum? I have searched a lot of the main sites I visit and not one hiccup about these chips, other than some people who put 1.6 volts into them and fried them. Literally, I have not seen one report about this error or anything that could be this error...
3) I'm not used to reading these documents really, but people above stated they are usually meant for a certain audience, and this one, for example, was observed only in the labs and not with any software... The erratum itself states there is no workaround, and all they are doing is creating a new revision so that it works with lower-end mobos, so to speak. Will this fix the problem? lol. I guess what I'm also asking, not just for me but for everyone: what FSB speed would be too high? I see a lot of review sites using 333 and 400 (AnandTech uses a 400 FSB in their QX9650 review with their X38).

OK, and lastly, before you all get bored with my comments: it states any C0 Yorkfields. Isn't the QX9770 a C0 Yorkfield? According to Tom's Hardware it is, and it uses a 1600 FSB...
It looks like the erratum was probably talking about older motherboards and the VRM not being able to push out enough juice, leading to FSB failures from not being able to push all the data through (GTLREF).

1) No, I am inferring, based on the timing of the article's publication and the appearance of the erratum. Of the several rumors that popped up about why Intel delayed the mainstream quads, this one makes the most sense.

2) Errata are always overblown; people hear the word 'mistake' or 'defective' and assume the worst. Intel stated in the errata report that they have not observed it with any commercially available software. The AMD TLB erratum was way overblown as well. All CPUs have errata: AMD's K8 has something like 157 (some fixed, some not), C2D has 90 or over 100 (I have not counted in a while), and this CPU (the QX9650) has something like 67, with more no doubt to turn up. This does not make the CPU defective; it simply means that under highly specific conditions a very specific bit pattern can trigger an error. Most errors are trapped and corrected by the MCA (Machine Check Architecture) and/or are benign, and in some cases they can be fixed with a BIOS update carrying new microcode. Cronos is just trying to scare you.

These chips overclock like a bat out of Hades; Cronos again was misrepresenting the data above, and they are achieving remarkable results:
http://www.anandtech.com/cpuchipsets/intel/showdoc.aspx?i=3184 (probably the most thorough on the net)
http://www.legitreviews.com/article/583/11/ (4 Ghz/ 400 FSB stable)
http://www.techspot.com/review/75-intel-core2-extreme-qx9650/page6.html (4 GHz / 400 FSB stable)
http://www.ocworkbench.com/2007/intel/Core-2-Extreme-QX9650/b6.htm (4 GHz / 445 FSB stable)

3) See my response above. When AMD or Intel prepare a new CPU for launch, it undergoes a battery of very thorough tests; in fact, there is a whole field of study dedicated to sorting, testing, and validating these types of devices, so you can be very sure that by the time a company is ready to stamp its name on a part, it has been thoroughly put through the wringer. Launch does not stop them from testing and analyzing the CPUs, though, and as they discover highly specific failure cases, they publish errata. When they do discover a problem, they estimate the probability that it could occur, and if they feel it is necessary they will fix it in a new stepping, but that does not necessarily mean the existing product is bad. It is sort of like software: you may use MS Excel and discover a bug. Does that mean MS recalls Excel, or that you can't use Excel? Of course not; you just know not to do that particular series of things again, to avoid the bug.

Errata should not bother you. In fact, I always look for errata and check them often; many times they give good info that helps me understand what the heck is going on (for example, AMD just, finally, updated their errata, and erratum 319 now helps me make sense of my temps). Errata are simply obscure problems, some more severe than others, but for the most part they should not send you into panic mode.

Jack

JumpingJack

03-01-2008, 01:24 AM

Thanks for the links, JumpingJack.

You are welcome.

btdvox

03-01-2008, 02:06 PM

^^ JumpingJack, thanks a lot for your knowledge.
It seems the X-bit article is talking about the issues in that erratum specifically, and it would make sense for them to delay their mainstream chips so they work with older mobos. But as the article stated, why would anyone buy a $1000 QX9650 and pair it with an old mobo that technically was stated not to work with Yorkfields in the first place, lol.
Anyway, it makes sense. And yeah, I did a lot of research before I bought the chip, and about 10 review sites got AMAZING OCs, and all of them stated they did 15-24 hr testing; even new motherboard reviews usually test with a QX9650.

I can back that up, as I have mine at 4.2 GHz (400 x 10.5) and it runs like a dream; in fact, funnily enough, I have the exact same temps as AnandTech (68 °C at load).
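
The clock figures in this thread follow from simple arithmetic: the core clock is the FSB base clock times the CPU multiplier, and the quoted "FSB" rating is four transfers per base clock. The stock entries below use the published QX9650 (3.0 GHz, FSB1333) and QX9770 (3.2 GHz, FSB1600) specs; the function names are illustrative, and this is only a calculator, not a statement about what any given chip will reach.

```python
def core_clock_mhz(fsb_base_mhz: float, multiplier: float) -> float:
    """Core frequency = FSB base clock x CPU multiplier."""
    return fsb_base_mhz * multiplier


def fsb_rating_mts(fsb_base_mhz: float) -> float:
    """The quoted 'FSB' figure: four transfers per base clock, in MT/s."""
    return fsb_base_mhz * 4


if __name__ == "__main__":
    configs = {
        "QX9650 stock (FSB1333)": (333.33, 9),
        "QX9770 stock (FSB1600)": (400.0, 8),
        "400 x 10.5 overclock": (400.0, 10.5),
    }
    for name, (fsb, mult) in configs.items():
        print(f"{name}: {fsb:.0f} x {mult} = {core_clock_mhz(fsb, mult):.0f} MHz, "
              f"{fsb_rating_mts(fsb):.0f} MT/s bus")
```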

I know a lot of these things get blown out of proportion because some people want to make them seem worse than they are, but I didn't know there were so many errata! haha.
The thread title should state more specifically that all Yorkfields have instability problems with older 4-layer PCB mobos, lol, or when pushing 475-500 FSB (which would have instability issues anyway, as these chips don't go that high, as the reviews show...).

Thanks again, JumpingJack. I'm now 8 hours Prime95 stable on Blend and Small FFTs :)

-iceblade^

03-01-2008, 06:06 PM

ha ha... at least AMD are honest about their flaws...

btdvox

03-01-2008, 07:16 PM

ha ha... at least AMD are honest about their flaws...

You'd be a fool to think AMD's PR is any different from Intel's.

All companies are the same: they want one thing only, your money.

OBR

04-08-2008, 09:59 AM

From a good source I know, this problem is not fixed in the C1 revision... :mad:

BenchZowner

04-08-2008, 11:15 AM

From a good source I know, this problem is not fixed in the C1 revision... :mad:

The problem is... I've never faced this issue, and every single guy I know with a QX9650/QX9770 has yet to face it.

I'm not bothered at all.

camouflage

04-08-2008, 12:23 PM

:D I'm happy with mine - sometimes the C0 rev. shows the "low voltage bug":p: :

http://img257.imageshack.us/img257/8111/48ghzgq1.jpg

:up:

tenax

04-08-2008, 03:44 PM

Well, if there's truly a quad bug that can cause major dysfunction, my ES 9450 with a B1 stepping (the only B1 I've seen) should have been hit with it, I'd think. I've seen enough data on the 9450 and its Xeon equivalent by now to say that, aside from the fact that my sensors are all locked, I can overclock my FSB as much as the next guy without errors (and it has been running 24/7 stable since I built my system in early January).