AMD Updates Errata Spec



JumpingJack
02-25-2008, 07:24 PM
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/41322.pdf

Published today....

Erratum 298 is now documented, kudos to Justapost for figuring it all out.... There was some initial ambiguity about whether the fix was working on 298 or 254, but they share the MSR fix for one entry.

Erratum 319 is a good one to know; I was going bonkers wondering why my Phenom would idle below room temperature.



The internal thermal sensor used for CurTmp (F3xA4[31:21]), hardware thermal control (HTC), software thermal control (STC), and the sideband temperature sensor interface (SB-TSI) may report inconsistent values.

Potential Effect on System

HTC, STC and SB-TSI do not provide reliable thermal protection. This does not affect THERMTRIP.

NOTE: This will not affect the processor, it will not hit a thermal runaway condition... PROCHOT will still trip with the external sensor control.
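
For anyone who wants to look at the raw register, here is a minimal sketch of pulling the CurTmp field out of a 32-bit F3xA4 value. The bit range [31:21] comes from the erratum text above; the 0.125 degree step size is an assumption taken from the Family 10h BKDG and the Linux k10temp driver, and the function name is made up for illustration. Per erratum 319, the decoded number may simply be wrong on affected parts.

```python
CURTMP_SHIFT = 21          # CurTmp occupies F3xA4[31:21]
CURTMP_MASK = 0x7FF        # the field is 11 bits wide
DEGREES_PER_STEP = 0.125   # assumption: step size per the Family 10h BKDG


def decode_curtmp(f3xa4: int) -> float:
    """Return the CurTmp field of a raw 32-bit F3xA4 value, in degrees.

    The result is on the processor's temperature control scale and is
    only as trustworthy as the sensor that produced it.
    """
    raw = (f3xa4 >> CURTMP_SHIFT) & CURTMP_MASK
    return raw * DEGREES_PER_STEP
```

A hypothetical register value of `160 << 21` would decode to 160 steps, so `decode_curtmp(160 << 21)` gives 20.0. The lower 21 bits of the register are ignored by the shift and mask.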

informal
02-26-2008, 02:01 AM
Release date Sept. 2007 :confused: . No erratum #309 in it, at least not in the file I DLed from the link Jack gave :shrug:

KTE
02-26-2008, 02:42 AM
Thanks Jack. The link is not working for the updated guide. ;)

There are many errata; anyone who reads the guides will see this.

I mentioned the temperature issue maybe a week back. It was in the 3.16 revision guide, which software developers have already had for a while. Some Phenoms are known to report inconsistent and inaccurate values; a user only needs to be honest and do basic testing to find out (for example, compare the temperature of your 10W TDP NB with that of the 95W TDP Phenom). As I said, I know from AMD that if you throw 1.45V on a stock Phenom CPU on air, it will not actually be idling below 40C at 20C heatsink ambient, not a chance.

New BIOSes and AOD were being worked on to fix these, the last I knew.

justapost
02-26-2008, 04:02 AM
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/41322.pdf

Published today....

Looks like AMD changed the index page but not the revision guide. :)


Erratum 298 is now documented, kudos to Justapost for figuring it all out.... There was some initial ambiguity about whether the fix was working on 298 or 254, but they share the MSR fix for one entry.
I wonder why 298 became so public; 254 was known long before, and disabling the TLB cache has been required since then.
It could be that 254 occurs much more seldom than 298. However, neither has anything to do with virtualisation, as far as I understand the errata descriptions.

KTE
02-26-2008, 05:00 AM
They have another major erratum they need to fix. Their RD790 clock generator has major problems.

justapost
02-26-2008, 05:20 AM
They have another major erratum they need to fix. Their RD790 clock generator has major problems.
What type of problem? Do you have a source for it? :rolleyes:

KTE
02-26-2008, 05:25 AM
I have the experience and the evidence. Look at everyone's posts around the web, look at guys claiming crazy speeds, look at my latest post in the MSI thread. ;)

JumpingJack
02-26-2008, 05:36 PM
Release date Sept. 2007 :confused: . No erratum #309 in it, at least not in the file I DLed from the link Jack gave :shrug:

Wow... they pulled it and put the old one up. It was fresh for a while; they revved it to 3.16 and dated it 2/25.
http://www.amd.com/us-en/Processors/TechnicalResources/0,,30_182_739_15343,00.html you can see the HTM label here.

My cache files still had a copy, so when I click on it on my main rig it is still 3.16. I just saved it and will email it to you if you want.

Errata always contain useful info, not sure why they changed it.

EDIT: I just tried the link again, and it appears the right rev.


Jack

JumpingJack
02-26-2008, 07:43 PM
It could be that 254 occurs much more seldom than 298. However, neither has anything to do with virtualisation, as far as I understand the errata descriptions.

That is what I read into them, though I have not tried to set up any VMs to test the 'bug'. My experience has been that all these errata are irrelevant... I have not seen any issues over 6 weeks now.

I have been running F@H extensively on it, and I have also been running all the Futuremark benchmarks and SiSoft Sandra. Over this time, only two hard locks -- one I convinced myself was graphics related, as the card eventually went bad. The other was likely software related... it is hard to tell. But all in all, the system is rock solid stable.

These two particular errata, though, seem to be squarely related to resource sharing and to windows in time where the cache can be mismanaged if one or more cores are out of sync...

justapost
02-27-2008, 03:41 AM
That is what I read into them, though I have not tried to set up any VMs to test the 'bug'. My experience has been that all these errata are irrelevant... I have not seen any issues over 6 weeks now.

I have been running F@H extensively on it, and I have also been running all the Futuremark benchmarks and SiSoft Sandra. Over this time, only two hard locks -- one I convinced myself was graphics related, as the card eventually went bad. The other was likely software related... it is hard to tell. But all in all, the system is rock solid stable.

These two particular errata, though, seem to be squarely related to resource sharing and to windows in time where the cache can be mismanaged if one or more cores are out of sync...
Thank you for sending me the pdf JumpingJack.:up:

EDIT: Tried to DL on another PC which never accessed the AMD site and I'm still getting the old version of the pdf. Could be some faulty proxy cache in between.

I contacted one of the people who were affected by the TLB bug. He said he ran gromacs together with another scientific simulation app on a single node.
I asked for some sort of test case, and he answered that it happened during productive simulations and he has no special test case to reproduce the erratum. :shrug:
I compiled linpack under Debian and had it running on a single node, shared between the host and two xen-oss guest systems.
I also installed xen-express and ran prime95 on different flavours of Windows guest OSes together with a Linux guest OS running stress (a simple stress test app under Linux).
Had only one hangup, which was more likely related to the fglrx driver.



309 Processor Core May Execute Incorrect Instructions on Concurrent L2 and Northbridge Response

Description
Under a specific set of internal timing conditions, an instruction fetch may receive responses from the L2 and the Northbridge concurrently. When this occurs, the processor core may execute incorrect instructions.

Potential Effect on System
Unpredictable system behavior.

Suggested Workaround
BIOS should set MSR C001_1023h[23].


That register needs some further inspection. :)
I was looking for some registers with L3 timings, but the BKDG does not mention them. Those C001_10xxh registers look like a good starting point.

Did you notice that they changed the fix for erratum 254? Now it only sets bit #21 in the MSR C001_1023h register.
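
A rough sketch of how one might check those workaround bits from Linux, in case it helps. The MSR address and the bit numbers (21 for 254, 23 for 309) are the ones from the revision guide quoted above; the helper names are made up, and `read_msr` assumes the Linux `msr` kernel module is loaded and that you are root. Treat it as an illustration, not as a tested tool.

```python
import os
import struct

MSR_ADDR = 0xC0011023      # MSR C001_1023h, per the revision guide
ERRATUM_254_BIT = 21       # updated workaround for erratum 254
ERRATUM_309_BIT = 23       # suggested workaround for erratum 309


def bit_is_set(value: int, bit: int) -> bool:
    """Return True if the given bit of value is 1."""
    return (value >> bit) & 1 == 1


def with_bit_set(value: int, bit: int) -> int:
    """Return value with the given bit forced to 1 (what the BIOS would write)."""
    return value | (1 << bit)


def read_msr(cpu: int, addr: int) -> int:
    """Read a 64-bit MSR through /dev/cpu/N/msr.

    Assumption: Linux with the msr module loaded, run as root.
    """
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        # The file offset selects the MSR address; each MSR is 8 bytes.
        return struct.unpack("<Q", os.pread(fd, 8, addr))[0]
    finally:
        os.close(fd)
```

With that, `bit_is_set(read_msr(0, MSR_ADDR), ERRATUM_309_BIT)` would tell you whether the BIOS applied the 309 workaround on core 0. The MSR is per core, so a real check would loop over every CPU.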

JumpingJack
02-27-2008, 08:15 PM
Thank you for sending me the pdf JumpingJack.:up:


You are welcome. EDIT: it is very odd. I tried downloading this from a colleague/friend's computer and got the old rev; then I came home, downloaded it on my own computer, and got the new rev. Not knowing exactly how Adobe Reader caches files, I thought the document might have been posted for a short period and then pulled, so that I was just loading a cached copy. So I tried downloading on a 'fresh' build that had never seen the file... not so, it also downloaded the 3.16 version. So this is very odd.



That register needs some further inspection. :)
I was looking for some registers with L3 timings, but the BKDG does not mention them. Those C001_10xxh registers look like a good starting point.

Did you notice that they changed the fix for erratum 254? Now it only sets bit #21 in the MSR C001_1023h register.

I leave it in your (obviously) capable hands -- you have already demonstrated the skill and mastery of CrystalCPUID to extract useful info :) ...

One thing I noted about AOD is that it changes other MSRs than just the two in question for erratum 298, depending on yellow/red. Elsewhere a review site stated that the difference was disabling some power saving features in lieu of performance... not sure how correct that is, but I wonder if this is one of them.

Again, nonetheless, 309 sounds bad but, as is the case with all errata, the rate of occurrence is likely so low that it would never exhibit itself or be noticed. (Linus Torvalds posted his take at RWT when the C2D TLB errata hit the headlines -- paraphrased: 'AMD/Intel subject CPUs to such rigorous testing before releasing them, it is unlikely most any of the errata are of any significance' -- http://www.realworldtech.com/forums/index.cfm?action=detail&id=80552&threadid=80534&roomid=2)

When I dig through errata, though, I always find interesting info that is good to know, like erratum 319... ok, so temps are inaccurate... what else is new? It also points out that the redundancy protects against thermal runaway, which is nice and assuages any concerns. My point -- publishing errata is not a bad thing. :)

jack

KTE
02-28-2008, 04:06 AM
One thing I noted about AOD is that it changes other MSRs than just the two in question for erratum 298, depending on yellow/red. Elsewhere a review site stated that the difference was disabling some power saving features in lieu of performance... not sure how correct that is, but I wonder if this is one of them.
Yes it does, Achim and I have both verified this in testing before. ;)

The performance difference from yellow to red is minor, if there is any at all (boost mainly enables extra performance enhancing algorithms affecting anything very heavily L3 cache dependent). MSRs would be changed, as they hold the respective values which software manipulates. If you can use RWEverything or OllyDbg to set breakpoints, you can see which MSR is affected by the button (with lengthy investigations).
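
If you can dump the MSRs before and after pressing the button, a small diff can stand in for some of that investigation. This is only a sketch under the assumption that you already have two snapshots as address-to-value dictionaries (how you capture them is up to your tool); the function name is made up.

```python
def diff_msr_snapshots(
    before: dict[int, int], after: dict[int, int]
) -> dict[int, tuple[int, int, int]]:
    """Map each changed MSR address to (old value, new value, changed-bit mask).

    Only addresses present in both snapshots are compared.
    """
    changed = {}
    for addr in sorted(before.keys() & after.keys()):
        old, new = before[addr], after[addr]
        if old != new:
            # XOR leaves a 1 in every bit position that flipped.
            changed[addr] = (old, new, old ^ new)
    return changed
```

The XOR mask is the useful part: a result like `0x200000` for a given address points straight at bit 21 of that register.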

KTE
03-06-2008, 11:49 AM
The 3.16 guide hasn't been intentionally pulled as we assumed; I asked AMD myself. It is up, but there's been a mixup, as 4 separate links host the same content. The revision guide can be downloaded here if you're looking for an AMD link: http://vincent.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/41322.pdf

:)