Thread: AMD Zambezi news, info, fans !

  1. #11
    Xtreme Member
    Join Date
    Apr 2007
    Location
    Serbia
    Posts
    102
    Quote Originally Posted by xsecret View Post
    Was a P4 slower than a P3? Sometimes no, sometimes yes, depending on the software. Absolute performance is not that important for AMD. The most important thing is money. And just money. Spending gazillions of dollars in R&D to reach the performance of a CPU sold in very low quantities (and generating very little income) like the 990X is ridiculous. Bulldozer must solve two problems: 1/ be able to gain performance (through frequency increases) in the mid-term without spending more gazillions on another µarch; 2/ compete with Intel *mainstream* CPUs (not the Extreme CPUs) at a similar price/performance ratio.
    I agree. But the BD architecture has fewer tradeoffs than Netburst. I expect each module to deliver at least the same level of performance as K10, not 40-50% lower. Look at the horrific Chinese wPrime results: it is 65% slower than a Thuban core, per core and per clock. Something is wrong here. I still can't believe that it is true.

    Quote Originally Posted by xsecret View Post
    High raw throughput for an FP unit is nice. But in order to use this power in real-world applications, you need a frontend able to feed it correctly.
    The average IPC of most workloads isn't much more than 1 on a Thuban core. A 4-way front end is more than enough to feed two threads.

    And keep in mind the horribly slow L1 write-through, probably added in order to remove a bottleneck in frequency scaling. Write-through means you're writing from the frontend to the L2 "through" the L1.
    No, that's not what WT means.
    Write-through means that every write to the cache causes a synchronous write to the backing store. Because L2 is slower than L1, L1 must wait for L2 to write out the data. But there is the WCC (Write Coalescing Cache) to hold data for writing out later. I can't see why a WT cache policy would be such an issue for the BD core. The ratio between loads and stores is around 2:1: for every two loads, we have one store.
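To illustrate why a coalescing buffer takes most of the sting out of write-through, here is a rough Python sketch that counts how many writes reach L2 with and without one. Everything in it is an assumption for illustration only: the 64-byte line size, the 4-line buffer, the LRU eviction and the function names are mine, not the documented behaviour of the real WCC.

```python
from collections import OrderedDict

LINE_BYTES = 64  # assumed cache-line size, for illustration only


def l2_writes_plain_write_through(store_addrs):
    """Plain write-through: every store is forwarded to L2 on its own."""
    return len(store_addrs)


def l2_writes_with_coalescing(store_addrs, buffer_lines=4):
    """Write-through behind a small coalescing buffer (hypothetical model).

    Stores to a line already in the buffer merge into it. A line is
    written to L2 only when it is evicted (LRU) or at the final flush.
    """
    buffer = OrderedDict()  # line number -> present, ordered by recency
    l2_writes = 0
    for addr in store_addrs:
        line = addr // LINE_BYTES
        if line in buffer:
            buffer.move_to_end(line)    # merge into the buffered line
            continue
        if len(buffer) == buffer_lines:
            buffer.popitem(last=False)  # evict the least recently used line
            l2_writes += 1
        buffer[line] = True
    return l2_writes + len(buffer)      # flush what is still buffered


# 32 sequential 8-byte stores, touching four 64-byte lines
stores = [i * 8 for i in range(32)]
print(l2_writes_plain_write_through(stores))  # 32
print(l2_writes_with_coalescing(stores))      # 4
```

With sequential stores, the buffer turns 32 separate L2 writes into 4 line writes in this toy model, which is why WT alone does not have to make L1 stores as slow as L2.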

    So, seen from the frontend, the L1 write bandwidth is as "slow" as the L2 write bandwidth.
    Not quite, because of WCC.

    The last µarch to use that horrible trick was Netburst, with high frequencies in mind. Bulldozer comes with an L1 WT too, and that point alone could explain many disappointments from a performance point of view.
    Again, I don't think WT is the problem. L1D is WT, the WCC is a write buffer, and L2 is probably WB. Because of the WCC there can be some issues with multiple write-out streams. Also, we don't know how the WCC behaves when two integer cores are writing data. There is probably WCC thrashing.
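A toy Python model of the thrashing I mean, with the caveat that it is pure speculation: the 4-line buffer shared by both cores, the LRU policy and the store patterns are all made up, since nobody outside AMD knows how the real WCC behaves.

```python
from collections import OrderedDict
from itertools import chain


def l2_writes(store_lines, buffer_lines=4):
    """Count line writes reaching L2 behind an LRU coalescing buffer.

    Hypothetical model: a store to a buffered line merges, and a line
    goes to L2 only on eviction or at the final flush.
    """
    buffer = OrderedDict()
    writes = 0
    for line in store_lines:
        if line in buffer:
            buffer.move_to_end(line)    # coalesce with the buffered line
            continue
        if len(buffer) == buffer_lines:
            buffer.popitem(last=False)  # evict the least recently used line
            writes += 1
        buffer[line] = True
    return writes + len(buffer)


# Each core loops over its own three lines, four times.
core_a = [0, 1, 2] * 4
core_b = [100, 101, 102] * 4

# Both cores storing at once: their streams interleave in one buffer.
interleaved = list(chain.from_iterable(zip(core_a, core_b)))

print(l2_writes(core_a) + l2_writes(core_b))  # 6
print(l2_writes(interleaved))                 # 24
```

Each stream alone fits in the buffer and coalesces perfectly (3 line writes each). Interleaved, the six lines cycle through a four-line LRU buffer, so every store evicts a line the other core is about to reuse and nothing coalesces at all.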
    Last edited by drfedja; 09-13-2011 at 08:45 AM.
    "That which does not kill you only makes you stronger." ---Friedrich Nietzsche
    PCAXE