Thread: AMD Updates Errata Spec

  1. #1
    Xtreme Mentor
    Join Date
    Mar 2006
    Posts
    2,978

    AMD Updates Errata Spec

    http://www.amd.com/us-en/assets/cont...docs/41322.pdf

    Published today....

    Errata 298 is now documented, kudos to Justapost for figuring it all out. There was some initial ambiguity about whether the fix addressed 298 or 254, but the two share the MSR fix for one entry.

    Erratum 319 is a good one to know. I was going bonkers wondering why my Phenom would idle below room temperature.


    The internal thermal sensor used for CurTmp (F3xA4[31:21]), hardware thermal control (HTC), software thermal control (STC), and the sideband temperature sensor interface (SB-TSI) may report inconsistent values.

    Potential Effect on System

    HTC, STC and SB-TSI do not provide reliable thermal protection. This does not affect THERMTRIP.
    NOTE: This will not affect the processor, it will not hit a thermal runaway condition... PROCHOT will still trip with the external sensor control.
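
    For anyone decoding the raw register by hand, here is a minimal sketch of pulling the CurTmp field out of a 32-bit F3xA4 value. The field position [31:21] is from the erratum text above; the 0.125 degree-per-step scaling is an assumption taken from my reading of the Family 10h BKDG, and the function name and sample value are made up for illustration. Per erratum 319 the number that comes out may itself be inconsistent.

```python
CURTMP_SHIFT = 21         # CurTmp occupies bits 31:21 of F3xA4 (from the erratum text)
CURTMP_MASK = 0x7FF       # 11-bit field
DEGREES_PER_STEP = 0.125  # ASSUMPTION: 1/8 degree per step, per the Family 10h BKDG


def decode_curtmp(f3xa4: int) -> float:
    """Return the CurTmp field of a raw 32-bit F3xA4 value, scaled to degrees."""
    raw = (f3xa4 >> CURTMP_SHIFT) & CURTMP_MASK
    return raw * DEGREES_PER_STEP


if __name__ == "__main__":
    sample = 0x0A000000   # hypothetical register value, not read from real hardware
    print(decode_curtmp(sample))
```

    The lower 21 bits are simply discarded by the shift, so other fields in the same register do not disturb the result.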
    One hundred years from now It won't matter
    What kind of car I drove What kind of house I lived in
    How much money I had in the bank Nor what my cloths looked like.... But The world may be a little better Because, I was important In the life of a child.
    -- from "Within My Power" by Forest Witcraft

  2. #2
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Release date Sept. 2007. No erratum #309 in it, at least not in the file I downloaded from the link Jack gave.

  3. #3
    Xtreme Mentor
    Join Date
    May 2007
    Posts
    2,792
    Thanks Jack. The link is not working for the updated guide.

    There are many errata; anyone who reads the guides will see this.

    I mentioned the temperature issue maybe a week back. It was in the 3.16 revision guide, which software developers have had for a while. Some Phenoms are known to report inconsistent and inaccurate values; a user only needs to be honest and do basic testing to find out (for example, compare the temperature of your 10W TDP NB with that of the 95W TDP Phenom). As I said, I know from AMD that if you put 1.45V through a stock Phenom CPU on air, it will not actually be idling below 40C at 20C heatsink ambient, not a chance.

    The new BIOSes and AOD were working on fixing these, last I knew.

  4. #4
    Xtreme Addict
    Join Date
    Sep 2007
    Location
    Munich, DE
    Posts
    1,401
    Quote Originally Posted by JumpingJack View Post
    Looks like AMD changed the index page but not the revision guide.
    Quote Originally Posted by JumpingJack View Post
    Errata 298 is now documented, kudos to Justapost for figuring it all out. There was some initial ambiguity about whether the fix addressed 298 or 254, but the two share the MSR fix for one entry.
    I wonder why 298 became so public; 254 was known long before, and disabling the TLB cache has been required since then.
    It may be that 254 occurs much more rarely than 298. However, neither has anything to do with virtualisation, as far as I understand the errata descriptions.

  5. #5
    Xtreme Mentor
    Join Date
    May 2007
    Posts
    2,792
    They have another major erratum they need to fix. Their RD790 clock generator has major problems.

  6. #6
    Xtreme Addict
    Join Date
    Sep 2007
    Location
    Munich, DE
    Posts
    1,401
    Quote Originally Posted by KTE View Post
    They have another major erratum they need to fix. Their RD790 clock generator has major problems.
    What type of problem? Do you have a source for it?

  7. #7
    Xtreme Mentor
    Join Date
    May 2007
    Posts
    2,792
    I have the experience and the evidence. Look at everyone's posts around the web, look at guys claiming crazy speeds, look at my latest post in the MSI thread.

  8. #8
    Xtreme Mentor
    Join Date
    Mar 2006
    Posts
    2,978
    Quote Originally Posted by informal View Post
    Release date Sept. 2007. No erratum #309 in it, at least not in the file I downloaded from the link Jack gave.
    Wow... they pulled it and put the old one up. It was fresh for a while; they revved it to 3.16 and dated it 2/25.
    http://www.amd.com/us-en/Processors/..._15343,00.html you can see the HTM label here..

    My cache files still had a copy, so when I click on it on my main rig it is still 3.16. I just saved it and will email it to you if you want.

    Errata always contain useful info; not sure why they changed it.

    EDIT: I just tried the link again, and it appears to be the right rev.


    Jack
    Last edited by JumpingJack; 02-26-2008 at 07:41 PM.

  9. #9
    Xtreme Mentor
    Join Date
    Mar 2006
    Posts
    2,978
    Quote Originally Posted by justapost View Post
    It may be that 254 occurs much more rarely than 298. However, neither has anything to do with virtualisation, as far as I understand the errata descriptions.
    That is what I read into them, though I have not tried to set up any VMs to test the 'bug'. My experience has been that all these errata are irrelevant... I have not seen any issues over 6 weeks now.

    I have been running F@H extensively on it, and I have also been running all the Futuremark benchmarks and SiSoft Sandra. Over this time, only two hard locks -- one I convinced myself was graphics related, the card eventually went bad. The other was likely software related... it is hard to tell. But all in all, the system is rock solid stable.

    These two particular errata, though, seem to be squarely related to resource sharing and to windows in time where the cache can be mismanaged if one or more cores are out of sync...

  10. #10
    Xtreme Addict
    Join Date
    Sep 2007
    Location
    Munich, DE
    Posts
    1,401
    Quote Originally Posted by JumpingJack View Post
    That is what I read into them, though I have not tried to set up any VMs to test the 'bug'. My experience has been that all these errata are irrelevant... I have not seen any issues over 6 weeks now.

    I have been running F@H extensively on it, and I have also been running all the Futuremark benchmarks and SiSoft Sandra. Over this time, only two hard locks -- one I convinced myself was graphics related, the card eventually went bad. The other was likely software related... it is hard to tell. But all in all, the system is rock solid stable.

    These two particular errata, though, seem to be squarely related to resource sharing and to windows in time where the cache can be mismanaged if one or more cores are out of sync...
    Thank you for sending me the pdf JumpingJack.

    EDIT: Tried to download on another PC that never accessed the AMD site and I'm still getting the old version of the PDF. It may be some faulty proxy cache in between.

    I contacted one of the people who were affected by the TLB bug. He said he ran GROMACS together with another scientific simulation app on a single node.
    I asked for some sort of test case, and he answered that it happened during production simulations and that he has no special test case to reproduce the erratum.
    I compiled Linpack under Debian and had it running on a single node, shared between the host and two xen-oss guest systems.
    I also installed xen-express and ran Prime95 on different flavours of Windows guest OSes together with a Linux guest OS running stress (a simple stress test app for Linux).
    Had only one hang, which was more likely related to the fglrx driver.

    309 Processor Core May Execute Incorrect Instructions on Concurrent L2 and Northbridge Response

    Description
    Under a specific set of internal timing conditions, an instruction fetch may receive responses from the L2 and the Northbridge concurrently. When this occurs, the processor core may execute incorrect instructions.

    Potential Effect on System
    Unpredictable system behavior.

    Suggested Workaround
    BIOS should set MSR C001_1023h[23].
    That register needs some further inspection.
    I was looking for some registers with L3 timings, but the BKDG does not mention them. Those C001_10xxh registers look like a good starting point.

    Did you notice that they changed the fix for erratum 254? Now it only sets bit #21 in the MSR C001_1023h register.
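
    To make the bit numbers concrete, here is a rough sketch of what "set MSR C001_1023h[23]" amounts to as a read-modify-write. The MSR address and the bit numbers 23 (erratum 309) and 21 (erratum 254) come from the text above; the helper names are hypothetical. `read_msr` assumes a Linux box with the `msr` kernel module loaded and root access, and it only reads. Writing the MSR is the BIOS's job and is deliberately not attempted here.

```python
import os
import struct

MSR_C001_1023 = 0xC0011023   # MSR named in the erratum 309 workaround
ERRATUM_309_BIT = 23         # "BIOS should set MSR C001_1023h[23]"
ERRATUM_254_BIT = 21         # bit mentioned for the revised erratum 254 fix
MASK_64 = 0xFFFFFFFFFFFFFFFF


def with_bit_set(value: int, bit: int) -> int:
    """Return the 64-bit MSR value with the given bit forced to 1."""
    return (value | (1 << bit)) & MASK_64


def bit_is_set(value: int, bit: int) -> bool:
    """Report whether the given bit is 1 in the MSR value."""
    return bool((value >> bit) & 1)


def read_msr(address: int, cpu: int = 0) -> int:
    """Read a 64-bit MSR through /dev/cpu/<cpu>/msr (ASSUMES Linux msr driver, root)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        return struct.unpack("<Q", os.pread(fd, 8, address))[0]
    finally:
        os.close(fd)


if __name__ == "__main__":
    current = 0x0  # hypothetical starting value, not read from hardware
    patched = with_bit_set(current, ERRATUM_309_BIT)
    print(hex(patched), bit_is_set(patched, ERRATUM_254_BIT))
```

    Calling `bit_is_set(read_msr(MSR_C001_1023), ERRATUM_309_BIT)` on a live system would be one way to check whether a given BIOS applies the workaround.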
    Last edited by justapost; 02-27-2008 at 04:13 AM.

  11. #11
    Xtreme Mentor
    Join Date
    Mar 2006
    Posts
    2,978
    Quote Originally Posted by justapost View Post
    Thank you for sending me the pdf JumpingJack.
    You are welcome. EDIT: It is very odd. I tried downloading this from a colleague's computer and got the old rev; I came home, downloaded on my own computer, and got the new rev. I then tried downloading on a 'fresh' build, thinking (not knowing exactly how Adobe Reader may cache files) that the document had been posted for a short period and then pulled, and that I was loading a cached copy. Not so: the fresh build, which had never seen the file, also downloaded the 3.16 version. So this is very odd.

    That register needs some further inspection.
    I was looking for some registers with L3 timings, but the BKDG does not mention them. Those C001_10xxh registers look like a good starting point.

    Did you notice that they changed the fix for erratum 254? Now it only sets bit #21 in the MSR C001_1023h register.
    I leave it in your (obviously) capable hands -- you have already demonstrated the skill and mastery of CrystalCPUID to extract useful info ...

    One thing I noted about AOD is that it changes other MSRs than just the two in question for erratum 298, depending on yellow/red. Elsewhere a review site stated that the difference was disabling some power saving features in favor of performance... not sure how correct, but I wonder if this is one of them.

    Again, nonetheless, 309 sounds bad but, as is the case with all errata, the rate of occurrence is likely so low that it would never exhibit itself or be noticed. (Linus Torvalds posted his take at RWT when the C2D TLB errata hit the headlines, paraphrased: 'AMD/Intel subject CPUs to such rigorous testing before releasing them, it is unlikely most any of the errata are of any significance' -- http://www.realworldtech.com/forums/...80534&roomid=2)

    When I dig through errata, though, I always find interesting info that is good to know, like erratum 319. OK, so temps are inaccurate... what else is new? It also points out that the redundancy protects against thermal runaway, which is nice and assuages any concerns. My point: publishing errata is not a bad thing.

    jack
    Last edited by JumpingJack; 02-27-2008 at 08:25 PM.

  12. #12
    Xtreme Mentor
    Join Date
    May 2007
    Posts
    2,792
    Quote Originally Posted by JumpingJack View Post
    One thing I noted about AOD is that it changes other MSRs than just the two in question for erratum 298, depending on yellow/red. Elsewhere a review site stated that the difference was disabling some power saving features in favor of performance... not sure how correct, but I wonder if this is one of them.
    Yes it does; Achim and I have both verified this in testing before.

    Performance difference is minor from yellow to red, if any at all (boost mainly enables extra performance-enhancing algorithms affecting anything very heavily L3 cache dependent). MSRs would be changed, as they hold the respective values which the software manipulates. If you can use RW-Everything or OllyDbg to set breakpoints, you can see which MSR is affected by the button (with lengthy investigation).
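
    If breakpoints are too much work and you can dump the MSRs before and after pressing the button, diffing the two snapshots is another way to see what changed. A sketch only: the snapshot dicts, the function name, and the sample values are invented for illustration, and how you collect the dumps (RW-Everything, CrystalCPUID, etc.) is left open.

```python
def diff_msr_snapshots(before: dict, after: dict) -> dict:
    """Map each changed MSR address to the list of bit positions that flipped.

    An MSR missing from one snapshot is treated as 0 in that snapshot.
    """
    changes = {}
    for address in sorted(set(before) | set(after)):
        flipped = before.get(address, 0) ^ after.get(address, 0)
        if flipped:
            changes[address] = [bit for bit in range(64) if (flipped >> bit) & 1]
    return changes


if __name__ == "__main__":
    # Hypothetical dumps, NOT real AOD behaviour: address -> 64-bit value.
    before = {0xC0011023: 0x200000, 0xC0011022: 0x0}
    after = {0xC0011023: 0xA00000, 0xC0011022: 0x0}
    for address, bits in diff_msr_snapshots(before, after).items():
        print(f"MSR {address:08X}h: bits {bits} changed")
```

    XOR of the two values leaves a 1 only where a bit differs, so unchanged registers drop out without any special handling.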

  13. #13
    Xtreme Mentor
    Join Date
    May 2007
    Posts
    2,792
    Revision guide 3.16 hasn't been intentionally pulled as we assumed; I asked AMD myself. It is up, but there's been a mix-up, as 4 separate links host the same content. The revision guide can be downloaded here if you're looking for an AMD link: http://vincent.amd.com/us-en/assets/...docs/41322.pdf

