Thread: AMD Ontario APU pictured,die size ~77mm^2

  1. #1
    Xtreme Addict
    Join Date
    Jan 2005
    Posts
    1,366
    Quote Originally Posted by informal View Post
    Yes.
    I'll believe it when I see it. I have some doubts about these tests. The maximum integer instruction throughput for Bobcat at 1.6 GHz is 3.2 GIPS (limited by its two decoders). I am very sceptical that Bobcat can reach nearly its maximum instruction throughput in this synthetic test (while Conroe and Athlon 64 can't). Still very impressive if true.

  2. #2
    Xtreme Addict
    Join Date
    Apr 2007
    Posts
    2,128
    Quote Originally Posted by kl0012
    I'll believe it when I see it. I have some doubts about these tests. The maximum integer instruction throughput for Bobcat at 1.6 GHz is 3.2 GIPS (limited by its two decoders). I am very sceptical that Bobcat can reach nearly its maximum instruction throughput in this synthetic test (while Conroe and Athlon 64 can't). Still very impressive if true.
    Even if it had, say, 8 decoders it wouldn't be any faster. In theory it could run at 12.8 "GIPS", but in practice it would only run faster where the code actually exposes ILP > 2, which I believe is quite rare with the given code.
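
The 3.2 and 12.8 "GIPS" figures quoted in this thread are simply decoder count times clock speed. A minimal sketch of that arithmetic, assuming one instruction per decoder per cycle (the function name `peak_gips` is made up for illustration, and the result is an upper bound on the front end, not a measured result):

```python
def peak_gips(decoders, clock_ghz):
    """Upper bound on instruction throughput in GIPS.

    Assumes every decoder delivers one instruction on every cycle,
    which real code rarely sustains.
    """
    return decoders * clock_ghz


# The two cases discussed in the thread, both at 1.6 GHz.
for decoders in (2, 8):
    print(f"{decoders} decoders at 1.6 GHz: {peak_gips(decoders, 1.6):.1f} GIPS peak")
```

Whether a wider front end gets anywhere near that bound depends on how much ILP the code exposes, which is the actual point of disagreement here.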

    But haters gonna hate.

  3. #3
    Xtreme Addict
    Join Date
    Jan 2005
    Posts
    1,366
    Quote Originally Posted by Calmatory
    Even if it had, say, 8 decoders it wouldn't be any faster. In theory it could run at 12.8 "GIPS", but in practice it would only run faster where the code actually exposes ILP > 2, which I believe is quite rare with the given code.

    But haters gonna hate.
    But a CPU with only two decoders doesn't necessarily have the same IPC as a CPU with 4 decoders, even when it executes code with ILP <= 2. A simple example (a sequence of 4 arithmetic operations):
    a = b + c
    a = a + d
    e = g + h
    e = e + f
    A CPU with 4 decoders can execute the first and third instructions in the same cycle, while a CPU with 2 decoders will need one more cycle for that. Of course, in reality things are a bit more complex because of the out-of-order buffer, but again, I really doubt Bobcat has a bigger OoO instruction window than Conroe/Athlon 64.
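
The four-instruction example can be checked with a toy model. This is only a sketch under strong assumptions that are not from the thread: instructions are decoded in program order, `decode_width` per cycle; every operation has a one-cycle latency; there are unlimited execution units; and only true (read-after-write) dependencies matter, as if registers were renamed. The names `schedule` and `PROGRAM` are made up for illustration.

```python
def schedule(program, decode_width):
    """Return the earliest issue cycle of each instruction.

    program: list of (dest, src1, src2) register-name tuples.
    Toy model: in-order decode of `decode_width` instructions per cycle,
    1-cycle latency, unlimited execution units, RAW dependencies only.
    """
    last_writer = {}  # register name -> index of the latest instruction writing it
    issue = []
    for i, (dest, *sources) in enumerate(program):
        ready = i // decode_width  # cycle in which this instruction is decoded
        for reg in sources:
            if reg in last_writer:
                # Must wait until the producing instruction has finished.
                ready = max(ready, issue[last_writer[reg]] + 1)
        issue.append(ready)
        last_writer[dest] = i
    return issue


# a = b + c; a = a + d; e = g + h; e = e + f
PROGRAM = [("a", "b", "c"), ("a", "a", "d"), ("e", "g", "h"), ("e", "e", "f")]

for width in (4, 2):
    cycles = schedule(PROGRAM, width)
    print(f"{width} decoders: issue cycles {cycles}, total {max(cycles) + 1} cycles")
```

In this model the 4-wide machine issues the first and third instructions together and finishes in 2 cycles, while the 2-wide machine needs 3. A real core with an out-of-order window running a loop would behave differently, so this says nothing about steady-state throughput.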
    Last edited by kl0012; 09-08-2010 at 08:53 AM.

  4. #4
    Xtreme Addict
    Join Date
    Apr 2007
    Posts
    2,128
    Quote Originally Posted by kl0012
    But a CPU with only two decoders doesn't necessarily have the same IPC as a CPU with 4 decoders, even when it executes code with ILP <= 2. A simple example (a sequence of 4 arithmetic operations):
    a = b + c
    a = a + d
    e = g + h
    e = e + f
    A CPU with 4 decoders can execute the first and third instructions in the same cycle, while a CPU with 2 decoders will need one more cycle for that. Of course, in reality things are a bit more complex because of the out-of-order buffer, but again, I really doubt Bobcat has a bigger OoO instruction window than Conroe/Athlon 64.
    Two words: vectorization and SIMD.

  5. #5
    Xtreme Addict
    Join Date
    Jan 2005
    Posts
    1,366
    Quote Originally Posted by Calmatory View Post
    Two words: vectorization and SIMD.
    Vectorization is good, but it is not a panacea. Replace the third operation in my example with "mul", "and", "shift", "test" or "sub" and SIMD won't help (while these ops are still independent). But my point was simple: the bigger the pool of uops available for execution, the better the chance the CPU's OoO logic has to exploit ILP. This is why I'm surprised by the Bobcat results (if they are real). I would guess that they have used a loop buffer, but such a buffer would consume a lot of space on the CPU die.
    Last edited by kl0012; 09-09-2010 at 12:47 AM.
