Also affects the single-core A64 "DH-E6".
Still using the ADA4400DAA6CD CCBWE 0517MPMW
No, it affects dual cores, both Toledo/Opteron (JH-E6) and Manchester (BH-E4), at 2.2GHz and higher. (In reply to NickK.)
No single cores.
See page 11.
But this is not likely to be triggered by any power saving stuff, as the problem only happens when going from FULL LOAD to STPCLK.
And there's no mention of "frying your CPU".
Last edited by terrace215; 06-21-2005 at 09:30 PM.
Looks like AMD is going the way of Intel. I'm sure their genius engineers will fix this in no time. I fully trust AMD.
Hector Ruiz... circa 2011
You're quite right. Perhaps reading the forums at 5:30am isn't such a good idea! (In reply to terrace215.)
'Also effects single core A64 "DH-E6" too.' -- It affects
BH-E4 (X2s only)
and JH-E6 (X2s and dual-core Opterons).
"wonder if this could be related to p4 dualcores cooking mobos due to power transients when going into thermal protection modes (its the motherboards at fault really since power transients are caused by the vcore vregs not being able to cope with the power requirements of the CPU).
Besides, if going from full load to all stop is that dangerous, then so is going from all stop to full load." --
I agree, it seems likely.
Personally I am very disappointed at the penny-pinching,
quality-corner-cutting philosophy of consumer electronics design including
motherboards. As a PCB designer I've always been inclined to select
PCBs with full ground plane & power planes for each major voltage when
designing boards a lot simpler and lower speed than 2GHz + PC motherboards!
It really boggles my mind that they'd have such a push to try to get PCBs
down to 4 layers for Socket 939 motherboards with PCI, AGP, DDR,
etc. etc. that's all quite high speed and high pin count.
I very much suspect that it's more of a matter of *luck* than *engineering*
that they work as well as they do. Even though STPCLK / STPGNT type
power throttling is an easy to generate extreme test case of power load
fluctuation from ~ Zero power to full load, I suspect that many software
situations involving CPU / I/O bursts do about as much variation of loading
on the time scale of nanoseconds to microseconds. If the EMI design and
power filtering, regulation, capacitors, PSU, etc. aren't up to the task of
providing clean transitions from STPCLK to Full Run, I suspect the system
isn't REALLY stable running common software under stress / bursty conditions
either, and that's probably a major reason for things crashing / flaking /
burning out on many systems.
What's needed is just better quality motherboards, 6+ layer PCBs, more
conservative design margins of capacitors / power regulators, thicker PCB
power traces, etc.
I'm still trying to figure out all the common cases when this could happen.
I think from what I've read that
(a) STPCLK# can be set in chipset hardware semi-automatically
(e.g. via CPU overheat detection & then resultant clock-throttling to protect
the CPU's temperature but allow continued operation at a stuttered percentage
of normal clock operating time).
(b) The STPCLK# signal is asserted by the chipset, and software can also
trigger it by writing certain chipset registers to enter the STPCLK state.
(c) I think STPCLK# is the "hardware signal" (which software can also
trigger) that causes the STPGNT state in the CPU. In other words, STPGNT is
what the CPU does when it sees STPCLK# active. This is a little confusing,
since the two are sometimes referred to as if they're quasi-independent, but
oftentimes a causal link seems to be implied.
(d) I've often seen it said that ACPI power state (C2) is what's effectively
in operation when the AMD CPU is in STPCLK# STPGNT state. I think
that's the usual case anyway from what I've read. I think the ACPI BIOS
writers might be at liberty to define what C1, C2, C3, S1, S2 etc. actually
do as long as they follow the overall rules of ACPI but I guess C2 is commonly
chosen to be the "STPCLK, STPGNT" mode.
(e) I've further seen it said that (S1) ACPI state is, for example, commonly
implemented using the STPCLK / STPGNT mode which would really mean
that in this case (S1) causes (C2) mode which is set up to work via STPCLK.
e.g. ACPI specification 3.0, p. 414:
"15.1.1.1 Example 1: S1 Sleeping State Implementation
This example references an IA processor that supports the stop grant state through the assertion of the
STPCLK# signal. When SLP_TYPx is programmed to the S1 value (the OEM chooses a value, which is
then placed in the \_S1 object) and the SLP_ENx bit is subsequently set, the hardware can implement an S1
state by asserting the STPCLK# signal to the processor, causing it to enter the stop grant state.
In this case, the system clocks (PCI and CPU) are still running. Any enabled wake event causes the
hardware to de-assert the STPCLK# signal to the processor whereby OSPM must first invalidate the CPU
caches and then transition back into the working state."
Given these findings it seems quite possible that, depending on the
BIOS involved, one could enter this problematic STPCLK# state from
"full speed" (>= 2000 MHz CPU clock) mode based on:
(a) BIOS / chipset / utility software thermal monitoring detection of
overheating on the CPU or other component.
(b) Any kind of power management / ACPI / APM type BIOS function or utility
that causes STPCLK# which might include (C2) or (S1) states depending
on your utility / BIOS settings.
I'm just not sure about "Cool N Quiet". I know there are tables of
power / clock / VCore settings and transition timings that it uses to
work. It's possible that some of those might use STPCLK to work; if
so I'd guess it'd be the ones that operate the CPU at "almost full speed"
but still reduced speed by some percentage like 88% or 75% of full speed.
At least on Intel chips this kind of clock dividing does indeed use
STPCLK to "burst" the clock between full speed and totally off in a given
duty cycle to achieve an average effective percentage throttling.
If AMD64 X2 CnQ uses this too then it's seemingly a risk for this
bug also.
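
To put rough numbers on the duty-cycle idea above, here is a minimal Python sketch. It only does the arithmetic: average effective clock for a given "running" fraction, and how many stop/start transitions per second that implies. The function names and the 50 microsecond throttling period are made up for illustration; the real period depends on the chipset, and nothing here confirms that CnQ on the X2 actually works this way.

```python
def effective_clock_mhz(full_clock_mhz: float, duty_cycle: float) -> float:
    """Average clock when the core alternates between full speed and stopped.

    duty_cycle is the fraction of time the clock is running (0.0 to 1.0).
    """
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0.0 and 1.0")
    return full_clock_mhz * duty_cycle


def transitions_per_second(period_us: float) -> float:
    """Load transitions per second for a given throttling period.

    Each period contains one full-load-to-stop and one stop-to-full-load
    transition, hence the factor of 2.
    """
    if period_us <= 0:
        raise ValueError("period_us must be positive")
    return 2 * 1_000_000 / period_us


if __name__ == "__main__":
    # Hypothetical example: 2200 MHz part throttled to 75% with an
    # assumed 50 microsecond throttling period.
    print(effective_clock_mhz(2200, 0.75), "MHz average")
    print(transitions_per_second(50.0), "load transitions per second")
```

The point of the second function is that even a modest throttling percentage means the Vcore regulator sees a full-load-to-zero step (and back) thousands of times a second, and each one starts from full clock speed.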
What Socket 939 motherboards have the most robustly designed
Vcore power regulators with the most number of "phases"?
I've heard about 3-phase, 4-phase, 5-phase, etc. power converters
for Vcore generation but I don't know what common motherboards
for X2 / FX-53 / FX-55 Socket 939 really have the best regulator capacity.
Anyone know? How about for S-939 AGP boards?
AGP boards -- Asus A8V deluxe vs. MSI Neo2 plat vs. NVidia NF3 Ultra D?
Every time I see a reference to someone "over-volting" something to
"increase overclock stability" I sort of grimace and laugh. As an EE I know
that "too many volts" usually just fries ICs, and doesn't really help them
work "better/faster" in most ways. I think the MAIN effect of
"increasing the volts" to demonstrably increase platform stability is really
just to provide greater CHARGE / ENERGY into the "too-small" power filter
capacitors so that when there's a large fast-rising load like a high MHz burst
of I/O or computation that there's a better chance there'll be enough energy
in the little filter capacitors to sustain at least the minimally necessary
current / voltage to keep the electronics working correctly until the relatively
slow (in comparison) voltage regulators can squirt out more energy / current
to recharge the voltage on the filter capacitors up to the proper level again.
Thus what's really needed isn't MORE voltage, it's BIGGER / BETTER / MORE
capacitors and voltage regulators to provide more current / energy more
quickly to the CPU / chips when those bursts of power are needed. If
this were done one should be able to run at 100% nominal Vcore / Vdimm / etc.
and get as much of an overclock as the actual speed of the ICs allows, with
no risk of overclocking causing transient undervolting.
More Vcore regulator phases with higher quality and amperage FETs per
phase should help this.
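
A back-of-the-envelope sketch of the capacitor argument above, in Python. It uses the ideal-capacitor relation dV = I * dt / C: if the load jumps by I amps and the regulator needs dt seconds to respond, the output capacitors alone have to supply the current and the voltage sags by dV. This ignores ESR, ESL and trace inductance, which matter a lot on a real board. All numbers (40 A step, 5 microsecond response, 3300 uF of bulk capacitance) are hypothetical, not measurements of any motherboard.

```python
def droop_volts(load_step_a: float, response_time_s: float,
                capacitance_f: float) -> float:
    """Voltage sag on an ideal capacitor bank during a load step.

    dV = I * dt / C, assuming the regulator supplies nothing extra
    until response_time_s has elapsed.
    """
    return load_step_a * response_time_s / capacitance_f


def capacitance_for_droop(load_step_a: float, response_time_s: float,
                          max_droop_v: float) -> float:
    """Capacitance needed to keep the sag within max_droop_v."""
    return load_step_a * response_time_s / max_droop_v


if __name__ == "__main__":
    step_a = 40.0       # hypothetical load step in amps
    response_s = 5e-6   # hypothetical regulator response time
    bulk_f = 3300e-6    # hypothetical total output capacitance

    print("droop with 3300 uF:", droop_volts(step_a, response_s, bulk_f), "V")
    print("droop with 6600 uF:", droop_volts(step_a, response_s, 2 * bulk_f), "V")
    print("capacitance for 50 mV droop:",
          capacitance_for_droop(step_a, response_s, 0.05), "F")
```

Under these assumptions, doubling the capacitance (or halving the regulator response time, which is roughly what more phases buy you) halves the sag, whereas raising Vcore only moves the starting point higher so the same sag is less likely to drop below what the CPU needs.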
Evidently this kind of STPCLK transient causing power regulator
instability has been around for some time:
http://unixmafia.port5.com/news/00211001.html
"Overheating problems on some KT133(A) Motherboards, namely Asus
Last update: 2002-12-05
...The problem
The BIOS on several via KT133 and KT133A motherboards disable the HLT , STPCLK and STPGNT instructions of the processor (apparantly, some A7M266 boards have it too). These states are responsable for power savings (and heat production) under the APM and ACPI specifications. The boards that have these disabled therefor do not completely implement those specifications, although they do claim so in their propaganda.
This can have several reasons, but one of the most common is to hide the fact that a particular board has an inferior power-supply. Which seems to be the case with my Asus.
Asus uses a 2-phase power supply with 4 capacitators onboard, while most boards have a 3-phase supply with 6 capacitators. Especially the STPCLK instruction, which calls the C2 Power Management state of the Athlon processor, puts a heavy burden on the power supply, because switching between the lowest and highest power consumption can occur several times a second. The disadvantage of the C2 state is that it can interfere in realtime applications like video and audio, because it takes the processor a fraction of time to come out of it. Asus hides behind 'choppy audio' as a reason to disable both C1 and C2. If this were valid, why not supply a BIOS option to turn it on or off? "
I suspect this is the way it is happening. (In reply to synergy.)
In theory, adding more voltage until the described effect sets in (a steep, nonlinear increase in the voltage needed) shows how much "power" your board is capable of handling.
For example:
A processor scales well with voltage on one board, but not as well with another.
This implies that the board is at fault, and if you place an X2 on the weaker board you may exceed the maximum the board can deliver even at default settings (assuming the first CPU tested was a single-core A64).
This problem isn't exactly new either. Older KT133/KT133A boards only supported up to a certain CPU due to the power draw higher models required. Even KT266/266A had this problem. Some stopped at a certain point just because the manufacturer was too lazy to update the BIOS microcode (the board was fully capable of going beyond the "max" suggested CPU), but many (Epox 8kta3+, Asus A7V133, MSI...) couldn't handle it; some couldn't even handle the recommended CPUs.
What I'm trying to point out is that this is not a new issue at all, and it's not being hyped at all if you remember past experiences.
EDIT
I should mention that I very much dislike temperature protection that is dependent on the BIOS.
If a small IC were used to monitor PWM/choke/CPU temperatures (accurately) and connected to the power switch (or a secondary one), so it could power off the machine even if the CPU had failed and was locked up, this would be a much better solution.
Better yet, if a temperature-resistant co-processor were placed on the motherboard or CPU die that would keep functioning and maintain power loads (a high-power resistor may be required intermittently) to reduce transients when the "master" processor(s) failed, we wouldn't have this problem at all.
And why does Asus always skimp on parts? If memory serves correctly, they completely removed the vref circuitry from their GeForce 3 boards, feeding 3.3v direct to the I/O of the memory chips (not VDDR, which can be taken as high as 3.8v at times with voltage modifications).
Last edited by STEvil; 06-21-2005 at 10:48 PM.
All along the watchtower the watchmen watch the eternal return.
I think it's "underhyped" in that many motherboards have
very marginal quality / capacity design for their PCBs, voltage
regulators, and filters.
AMD designed the X2 so that it'd be compatible with the Socket 939
"design envelope" of heatsink and motherboard design so that it should
be supportable properly on all well designed motherboards and PSUs of
adequate capacity with only a BIOS change.
However the X2 *does* consume just a bit more power than ANY other
AMD Socket 939 CPU that has ever existed including the FX-53/FX-55 or whatever, and certainly uses a fair bit more power than a common
single core Athlon 64.
So as STEvil said, any motherboard that is "marginal" with respect to
stability / current / voltage regulation for ANY other AMD processor
even under overclocking conditions will perform even worse and more
marginally / unstably given the use of the even more power demanding X2.
The main issue that limits a PC's stress-test or overclock stability is
probably EMI (noise causing data corruption) and voltage / current
limitations causing logic glitches, undue stress on the ICs, and degraded
electrical waveforms.
If one model of motherboard can consistently overclock a given CPU
to a higher frequency while using lower (e.g. stock / nominal) voltages than
another model of motherboard, the one that achieves the best overclock
at lowest voltages must have superior timing PCB layout and EMI / power
design.
This STPCLK issue is probably a decent "acid test" of whether a motherboard's
Vcore regulator phases and PCB / capacitor layout can keep the CPU
happy even under frequent and extreme load spikes, such as going from
STPCLK mode to RUN mode. It wouldn't make a bad "benchmark" type stress
test, really, if someone actually made software to stimulate this
"on off on off" behavior many times a second and check for
corruption / glitches.
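
A rough idea of what such an "on off on off" test could look like, as a minimal Python sketch. It alternates short idle periods with bursts of deterministic computation and compares each burst's result against a reference value, counting mismatches. Big caveat: whether an idle sleep actually makes the chipset assert STPCLK# depends entirely on the OS, BIOS and power management settings, so this is only an approximation of the load pattern, not a confirmed way to trigger the erratum. The function names, burst size and idle time are all made up for illustration.

```python
import hashlib
import time


def burst_checksum(rounds: int) -> str:
    """Run a deterministic CPU-bound burst and return its final digest."""
    data = b"stpclk-burst"
    for _ in range(rounds):
        data = hashlib.sha256(data).digest()
    return data.hex()


def run_bursts(cycles: int, rounds: int, idle_s: float) -> int:
    """Alternate idle and full-load bursts; return the number of mismatches.

    A mismatch means a burst produced a different digest than the
    reference run, which on healthy hardware should never happen.
    """
    expected = burst_checksum(rounds)
    mismatches = 0
    for _ in range(cycles):
        # Idle phase: the OS MAY drop the CPU into a low-power state here.
        time.sleep(idle_s)
        # Load phase: sudden jump back to full computation.
        if burst_checksum(rounds) != expected:
            mismatches += 1
    return mismatches


if __name__ == "__main__":
    # Hypothetical parameters: 1000 on/off cycles, 1 ms idle between bursts.
    bad = run_bursts(cycles=1000, rounds=20000, idle_s=0.001)
    print("mismatched bursts:", bad)
```

A real tool would want one such loop pinned to each core and a way to confirm the CPU really entered the stop-grant state, neither of which this sketch attempts.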
Will it fry your CPU / motherboard? Well the CPU could get overvolted
if the Vcore regulators / traces "ring / spike too high" when the load
is suddenly shut off (STPCLK happens from full load).
It could also overvolt / ring too high when the CPU is stopped and all of
a sudden there's a huge load demand (STPCLK is removed and the CPU starts
running again) and the regulator drives the maximum possible voltage onto the
CPU to compensate for the detected undervolting due to greatly increased load.
At the least it's a high stress on the "pass current" and dynamic loading of
the MOSFETs and caps, and could cause these to fail quickly.
At the worst, it could glitch the CPU in a way that might fry it due to either
overvolting spikes or an undervolting "reverse voltage" situation where VIO is
at a normal level while Vcore is very much too low.
I agree with STEvil that "over-temperature" protection isn't something
that should normally be triggered (since even overclocked X2s tend to run
fairly cool), and that it's not the best thing to rely on anyway. However
if you DO have a fan
failure or over clock generation glitch or whatever that DOES cause the CPU
to run way too hot, you'll be in for a potentially nasty surprise
(possibly smoke, flames, melted solder, ruined CPU / motherboard) if
the "last resort" C2 / STPCLK thermal throttling does NOTHING because,
according to AMD's suggested workaround for the problem, the BIOS
just DISABLED the STPCLK mode!
And in a more common / serious situation, I am still unconvinced that:
(a) Cool N Quiet or other ACPI functions on BIOSs that *DON'T* disable
STPCLK mode cannot trigger this kind of crash inducing glitch if they're
trying to run the CPU at 90%, 80%, 75% or similar frequency since
in those cases one could still be at 2000, 2200 MHz clock and maybe using
STPCLK throttling to achieve the reduction.
(b) that other kinds of power saving states other than CnQ won't enter
C2 or other states that use STPCLK.
Furthermore, even supposing STPCLK is disabled in BIOS, and
even if CnQ can't cause this problem, any motherboard that'll fail
Vcore regulation BECAUSE of the STPCLK issue is *STILL GUARANTEED*
to be unstable under stress testing of running normal software, since
there is very likely *SOME* combination of software and I/O related
events that'll cause a similarly large swing from low CPU activity to high
CPU activity, producing too large a Vcore current demand transition
and STILL crashing / corrupting the PC. And THAT is what may cause system
crashing / flakiness on a day-to-day gaming / computing / overclocking
basis.
Personally I want a motherboard that is well enough designed that
it CAN pass the STPCLK test at 2200+ MHz and not glitch!
I agree with what you have said as well, synergy. However, we can't place all the blame on the mobo manufacturers for cost-cutting. In my opinion, most of the blame should be placed on the consumer. (In reply to synergy.)
Why? Because in my experience the public at large is a bunch of cheap bas$%ds. People are ALWAYS complaining about how much something costs. They want the best quality and best performance but they don't want to pay for it. Need proof? Overclocking. Long-time AMD users are probably at fault too.
Now you can't blame anyone for wanting to get the best price/performance ratio (who doesn't?). No one wants to pay more than they have to, or more than they feel is fair. But having worked in the high-performance aftermarket auto industry for many years (anyone ever heard of Neuspeed?), then PC and Mac computer hardware, and now having years of experience as a Realtor in California I have dealt with literally thousands of people. Some don't mind paying the premium that quality usually demands, but most will attempt to grind you down.
Rant over.
No, I am not Xtreme but I do have a sweet machine!
Lian Li V1000 Plus II Silver w/Cougar CF-V12HPB Fan x 4 | Core i7 3930K | Thermalright True Spirit 120 w/2x Cougar CF-V12HPB Push-Pull | ASUS Sabertooth X79 | 4 x 8GB Patriot Viper 3 Intel Extreme Masters Limited Edition DDR3 1600 | MSI GTX670 Power Edition | 2x Raptor 300GB RAID 0 | LG BHLS20 Blu-ray | Corsair 850TX | Samsung SyncMaster 204B Silver | 64MB X-Fi PCIe| Logitech G15 Keyboard & G7 Cordless Mouse | Logitech Z-640 5.1 Speakers | APC Smart-UPS 1500 Black | Windows 7 Ultimate SP1