Page 1 of 3
Results 1 to 25 of 52

Thread: SC09: Intel speaks about 3D Web, demonstrates LRB.

  1. #1
    Xtreme Addict
    Join Date
    Jan 2005
    Posts
    1,366

    SC09: Intel speaks about 3D Web, demonstrates LRB.

    Here it is:
    http://www.theregister.co.uk/2009/11...ttner_keynote/

    On the SGEMM single precision, dense matrix multiply test, Rattner showed Larrabee running at a peak of 417 gigaflops with half of its 80 cores activated; and with all of the cores turned on, it was able to hit 805 gigaflops. As the keynote was winding down, Rattner told the techies to overclock it, and was able to push a single Larrabee chip up to just over 1 teraflops, which is the design goal for the initial Larrabee co-processors.
    80 cores? Probably the author's mistake. More interesting is the 805 GFLOPS in SGEMM. As a reference point, Tesla (GTX280) hits 370 GFLOPS on the same task.
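For context on how such figures are usually derived: SGEMM throughput is conventionally counted as 2·M·N·K floating-point operations (one multiply and one add per inner-product term) divided by run time. The minimal Python sketch below follows that convention. The helper names are hypothetical, and the only real inputs are the 417 and 805 GFLOPS figures quoted from The Register above; it checks how close the full chip comes to ideal 2x scaling over the half chip.

```python
def sgemm_flops(m: int, n: int, k: int) -> int:
    """Conventional flop count for C = A @ B, A is m x k and B is k x n."""
    # Each of the m*n outputs is a length-k dot product: k multiplies + k adds.
    return 2 * m * n * k


def gflops(m: int, n: int, k: int, seconds: float) -> float:
    """Achieved GFLOPS for one SGEMM call that took `seconds` to run."""
    return sgemm_flops(m, n, k) / seconds / 1e9


def scaling_efficiency(perf_small: float, perf_large: float, resource_ratio: float) -> float:
    """Fraction of ideal linear scaling achieved when resources grow by resource_ratio."""
    return perf_large / (perf_small * resource_ratio)


# Figures quoted in the article: half the cores enabled vs. all cores enabled.
half_chip = 417.0
full_chip = 805.0
eff = scaling_efficiency(half_chip, full_chip, 2.0)
print(f"scaling efficiency: {eff:.1%}")
```

With those inputs the full chip delivers about 96.5% of twice the half-chip result, i.e. near-linear scaling, regardless of what the real core count turns out to be.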

  2. #2
    I am Xtreme
    Join Date
    Dec 2008
    Location
    France
    Posts
    9,060
    So Fermi is faster from a GPGPU point of view, then?
    Quote Originally Posted by jayhall0315
    If you are really extreme, you never let informed facts or the scientific method hold you back from your journey to the wrong answer.

  3. #3
    Xtreme Addict
    Join Date
    Jan 2005
    Posts
    1,366
    Quote Originally Posted by zalbard
    So Fermi is faster from a GPGPU point of view, then?
    We'll see. But if I remember correctly, Fermi only doubled the single-precision resources of GT200. Also, SGEMM depends heavily on memory bandwidth, so doubling FP resources doesn't always double performance. Intel probably designed LRB with a lot of memory bandwidth.
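How much SGEMM actually leans on memory bandwidth depends largely on data reuse. A common technique is cache blocking (tiling): an output tile is accumulated while tiles of A and B are streamed past it, so every loaded element feeds many multiply-adds. The sketch below uses NumPy as a stand-in for a real GPU or Larrabee kernel; the function names and the 64-element default tile are illustrative assumptions, not anything Intel or NVIDIA published.

```python
import numpy as np


def blocked_matmul(a, b, tile=64):
    """C = A @ B computed tile by tile (a sketch of cache blocking)."""
    a = np.asarray(a, dtype=np.float32)
    b = np.asarray(b, dtype=np.float32)
    m, k = a.shape
    k2, n = b.shape
    if k != k2:
        raise ValueError("inner dimensions must match")
    c = np.empty((m, n), dtype=np.float32)
    for i in range(0, m, tile):
        for j in range(0, n, tile):
            # One output tile stays resident while we stream over the shared k
            # dimension; each loaded A/B element is reused `tile` times.
            acc = np.zeros((min(tile, m - i), min(tile, n - j)), dtype=np.float32)
            for p in range(0, k, tile):
                acc += a[i:i + tile, p:p + tile] @ b[p:p + tile, j:j + tile]
            c[i:i + tile, j:j + tile] = acc
    return c


def tile_flops_per_byte(tile, bytes_per_elem=4):
    """Idealized arithmetic intensity of one tile x tile product (ignores C writes)."""
    flops = 2 * tile ** 3                              # tile^3 multiply-adds
    bytes_loaded = 2 * tile * tile * bytes_per_elem    # one A tile + one B tile
    return flops / bytes_loaded
```

In this idealized model intensity grows linearly with tile size (tile/4 flops per byte for float32), which is why a well-blocked SGEMM can approach compute limits while a naive one stays bandwidth-bound.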

  4. #4
    I am Xtreme
    Join Date
    Jul 2007
    Location
    Austria
    Posts
    5,485
    On the SGEMM single precision...overclock...just over 1 teraflops
    Failage?

    I dunno what the HD5870 or the GTX 285 gets on that benchmark, but 1 TFLOP SP is weaksauce on Larrabee's side.

  5. #5
    all outta gum
    Join Date
    Dec 2006
    Location
    Poland
    Posts
    3,390
    So Larrabee still fails to run rasterized DirectX graphics with acceptable performance.
    www.teampclab.pl
    MOA 2009 Poland #2, AMD Black Ops 2010, MOA 2011 Poland #1, MOA 2011 EMEA #12

    Test bench: empty

  6. #6
    Xtreme Member
    Join Date
    Feb 2008
    Posts
    340
    ...and for the OpenGL environment being pushed by Advanced Micro Devices and others.
    I spy a typo.

    I wish there were pictures or more information on this. They might be trying to reach people who already have Xeon-based computing systems. Again, I wish they'd put out more details.
    αποστασία

  7. #7
    Xtreme Addict
    Join Date
    Jan 2005
    Posts
    1,366
    Quote Originally Posted by Hornet331
    Failage?

    I dunno what the HD5870 or the GTX 285 gets on that benchmark, but 1 TFLOP SP is weaksauce on Larrabee's side.
    Failage? Big win, imho. This time it is "real GFLOPS".
    As I said, Tesla C1060 (GTX280), with a theoretical peak of 1 TFLOP, hits only 370 GFLOPS in matrix multiplication.
    http://www.idre.ucla.edu/events/2009...avid_Tesla.pdf

  8. #8
    I am Xtreme
    Join Date
    Jul 2007
    Location
    Austria
    Posts
    5,485
    Quote Originally Posted by kl0012
    Failage? Big win, imho. This time it is "real GFLOPS".
    As I said, Tesla C1060 (GTX280), with a theoretical peak of 1 TFLOP, hits only 370 GFLOPS in matrix multiplication.
    http://www.idre.ucla.edu/events/2009...avid_Tesla.pdf
    Well, I said I didn't know how to compare it to others, so if the GTX 280 really hits 370 GFLOPS, it doesn't look that bad. Any numbers on the HD4xxx/5xxx?

  9. #9
    Xtreme Addict
    Join Date
    Jan 2008
    Location
    Puerto Rico
    Posts
    1,374
    Quote Originally Posted by xoqolatl
    So Larrabee still fails to run rasterized DirectX graphics with acceptable performance.
    Well, what else can you expect from Intel in the graphics department?
    ░█▀▀ ░█▀█ ░█ ░█▀▀ ░░█▀▀ ░█▀█ ░█ ░█ ░░░
    ░█▀▀ ░█▀▀ ░█ ░█ ░░░░█▀▀ ░█▀█ ░█ ░█ ░░░
    ░▀▀▀ ░▀ ░░░▀ ░▀▀▀ ░░▀ ░░░▀░▀ ░▀ ░▀▀▀ ░

  10. #10
    Xtreme Addict
    Join Date
    Jul 2009
    Posts
    1,023
    Well, I think this is a win: chances are your app is x86, not Open_L, so Larrabee wins from the consumer's perspective.
    i7 920 @ 4GHz 1.25v
    GTX 470 @ 859MHz 1062mv

  11. #11
    Xtreme Addict
    Join Date
    Jan 2005
    Posts
    1,366
    Another interesting paper:
    http://techresearch.intel.com/UserFi...2009_FINAL.PDF

    Results Summary: Our parallel implementation of ray-casting delivers close to 5.8x performance improvement on quad-core Nehalem over an optimized scalar baseline version running on a single core Harpertown. This enables us to render a large 750x750x1000 dataset in 2.5 seconds. In comparison, our optimized Nvidia GTX280 implementation achieves from 5x to 8x speed-up over the scalar baseline. In addition, we show, via detailed performance simulation, that a 16-core Intel Larrabee [26] delivers around 10x speed-up over single core Harpertown, which is on average 1.5x higher performance than a GTX280 at half the flops. At higher core count, performance is dominated by the overhead of data transfer, so we developed a lossless SIMD-friendly compression algorithm that allows 32-core Intel Larrabee to achieve a 24x speed-up over the scalar baseline.
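The speedups in that summary can be cross-checked with a few lines of arithmetic. This sketch only uses numbers from the quoted summary; since they are rounded ("around 10x", "on average 1.5x", "close to 5.8x"), the derived values are rough, and the baseline-time estimate assumes the 2.5 s figure and the 5.8x figure describe the same run.

```python
# Speedups over the single-core Harpertown scalar baseline, as quoted.
nehalem_quad = 5.8        # quad-core Nehalem
lrb16 = 10.0              # simulated 16-core Larrabee
lrb16_vs_gtx280 = 1.5     # "on average 1.5x higher performance than a GTX280"
lrb32 = 24.0              # simulated 32-core Larrabee, with the new compression scheme

# GTX280 speedup implied by the Larrabee comparison; should fall inside 5x-8x.
gtx280_implied = lrb16 / lrb16_vs_gtx280

# Doubling the cores gives more than 2x here, but compression was added at
# 32 cores, so this is not a pure core-scaling number.
core_doubling_gain = lrb32 / lrb16

# ASSUMPTION: 2.5 s Nehalem render time and 5.8x refer to the same workload.
baseline_seconds = 2.5 * nehalem_quad

print(f"implied GTX280 speedup: {gtx280_implied:.2f}x")
print(f"16 -> 32 cores: {core_doubling_gain:.1f}x")
print(f"implied scalar baseline time: {baseline_seconds:.1f} s")
```

The implied GTX280 figure (about 6.7x) is consistent with the 5x to 8x range the paper reports for its own GTX280 implementation.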

  12. #12
    I am Xtreme
    Join Date
    Dec 2007
    Posts
    7,750
    They just said single precision was just over a TFLOP?
    And how big is an 80-core chip?

  13. #13
    Xtreme Cruncher
    Join Date
    May 2009
    Location
    Bloomfield
    Posts
    1,968
    ^^ Intel doesn't like to tell the truth about GPGPU or ray tracing, so don't trust any blogs, articles, or whitepapers from them.
    Quote Originally Posted by Hornet331
    Well, I said I didn't know how to compare it to others, so if the GTX 280 really hits 370 GFLOPS, it doesn't look that bad. Any numbers on the HD4xxx/5xxx?
    I have seen 880 GFLOPS with dense matrix multiplication on a 4870, where the peak in that situation would be 960 GFLOPS. I would expect a 5870 to be a lot faster too. NVIDIA definitely needs to fix something here, probably memory access, but shoddy code is always going to run very slowly.
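A quick way to compare the "real vs. theoretical" numbers traded in this thread is the fraction of peak each result reaches. The sketch uses only the achieved/peak pairs quoted so far (Tesla C1060: 370 of a rounded 1 TFLOP; HD 4870: 880 of 960 GFLOPS, as reported above); the helper name is hypothetical.

```python
def fraction_of_peak(achieved_gflops: float, peak_gflops: float) -> float:
    """Achieved throughput as a fraction of the stated theoretical peak."""
    return achieved_gflops / peak_gflops


# (achieved, peak) in GFLOPS, as quoted in this thread
quoted = {
    "Tesla C1060 (GT200), SGEMM": (370.0, 1000.0),
    "HD 4870, dense matrix multiply": (880.0, 960.0),
}

for name, (achieved, peak) in quoted.items():
    print(f"{name}: {fraction_of_peak(achieved, peak):.0%} of peak")
```

That gives 37% for the Tesla figure and about 92% for the 4870 figure, which is the gap the posters are arguing about.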

  14. #14
    Xtreme Addict
    Join Date
    Apr 2004
    Posts
    1,640
    Quote Originally Posted by xoqolatl
    So Larrabee still fails to run rasterized DirectX graphics with acceptable performance.
    Why, because they didn't demonstrate it at an HPC event?
    DFI LANParty DK 790FX-B
    Phenom II X4 955 BE (1003GPMW) @ 3.8GHz (19x200) w/1.36v
    -cooling: Scythe Mugen 2 + AC MX-2
    XFX ATI Radeon HD 5870 1024MB
    8GB PC2-6400 G.Skill @ 800MHz (1:2) 5-5-5-15 w/1.8v
    Seagate 1TB 7200.11 Barracuda
    Corsair HX620W


    Support PC gaming. Don't pirate games.

  15. #15
    Xtreme Mentor
    Join Date
    Apr 2005
    Posts
    2,550
    Well, Larrabee is definitely on the path of Merced!
    Adobe is working on Flash Player support for 64-bit platforms as part of our ongoing commitment to the cross-platform compatibility of Flash Player. We expect to provide native support for 64-bit platforms in an upcoming release of Flash Player following the release of Flash Player 10.1.

  16. #16
    Xtreme CCIE
    Join Date
    Dec 2004
    Location
    Atlanta, GA
    Posts
    3,842
    Quote Originally Posted by article
    Intel is also cracking the issue of sharing data between Core and Xeon CPUs and Larrabee GPU co-processors. Future Core and Xeon chips will be able to create a virtual shared memory pool that both the CPU and GPU can access so datasets are not crunched down, serialized, and moved over the PCI-Express bus from the CPU to the GPU and then back again after calculations are done. The shared virtual memory allows the CPU and GPU to work off the same data in sequence without any movement, which should radically improve performance and smooth out simulations.
    ^ This looks interesting to me. It's also a good way to go for Intel, because nVidia has no way to do it if Intel locks them out. Of course, AMD could do it too now, and they may even be able to do it better at the moment.

    As for performance... people can write all the blogs and run all the task-specific benches they want; I want to see actual numbers.
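To put a rough number on what the shared virtual memory pool would save, here is a back-of-envelope estimate of PCI-Express copy time. Everything beyond the dataset dimensions (taken from the Intel ray-casting paper quoted in post #11) is an assumption: one float32 per voxel, the PCIe 2.0 x16 theoretical rate of about 8 GB/s per direction, and a result of the same size copied back.

```python
def transfer_seconds(num_bytes: float, gbytes_per_s: float) -> float:
    """Time to move num_bytes at a sustained rate of gbytes_per_s (1 GB = 1e9 bytes)."""
    return num_bytes / (gbytes_per_s * 1e9)


voxels = 750 * 750 * 1000   # dataset size from the ray-casting paper
bytes_per_voxel = 4         # ASSUMPTION: one float32 per voxel
pcie_gb_per_s = 8.0         # ASSUMPTION: PCIe 2.0 x16 theoretical, per direction

one_way = transfer_seconds(voxels * bytes_per_voxel, pcie_gb_per_s)
round_trip = 2 * one_way    # ASSUMPTION: same-sized data copied back afterwards

print(f"one way: {one_way:.3f} s, round trip: {round_trip:.3f} s")
```

Under those assumptions a one-way copy takes about 0.28 s and a round trip about 0.56 s per pass; real PCIe transfers usually fall short of the theoretical rate, so actual overhead would be higher.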
    Dual CCIE (Route\Switch and Security) at your disposal. Have a Cisco-related or other network question? My PM box is always open.

    Xtreme Network:
    - Cisco 3560X-24P PoE Switch
    - Cisco ASA 5505 Firewall
    - Cisco 4402 Wireless LAN Controller
    - Cisco 3502i Access Point

  17. #17
    Xtreme Member
    Join Date
    Feb 2008
    Posts
    340
    I too think the prospect of a shared memory pool is simply awesome. As with EVERY SINGLE HARDWARE RELEASE EVER EVER EVER, you won't know until it's in the hands of the consumer. All other debate is just silly.

  18. #18
    Xtreme Member
    Join Date
    Sep 2002
    Posts
    445
    Quote Originally Posted by xoqolatl
    So Larrabee still fails to run rasterized DirectX graphics with acceptable performance.
    OK? They just demonstrated 1 TFLOP on a single preproduction chip running x86. Can anyone else do that?
    Quote Originally Posted by Ket
    Erm, its a little weird how a lot of peeps dont have a case for their PC.....essentially thats a cheat because in a case things always run hotter, yet ppl will claim their OC "stable"

    Sorry, in my book nothing is valid unless its in a case, and hence, a "normal" environment, by all means go nuts on cooling not a problem, but an open top setup with an OC ppl claim to be stable when in all reality inside a PC it probably won't be? Thats just unacceptable to me.

  19. #19
    Xtreme Addict
    Join Date
    Jan 2005
    Posts
    1,366
    Quote Originally Posted by Hornet331
    Well, I said I didn't know how to compare it to others, so if the GTX 280 really hits 370 GFLOPS, it doesn't look that bad. Any numbers on the HD4xxx/5xxx?
    Well, I did a little Google research and found that, despite all the "stream computing" hype from AMD, there are almost no official performance numbers. Still, there are some 4870 numbers floating around on the AMD forums. So here they are:
    Up to 200 GFLOPS using OpenCL:
    http://forums.amd.com/devforum/textt...readid=120413&
    540 GFLOPS using IL/Brook+ (from AMD official):
    http://forums.amd.com/forum/messagev...hreadid=105221
    Some guy stated he was able to extract 880 GFLOPS using the L1 texture caches, but there's no confirmation that this method is usable in general:
    http://cerberus.fileburst.net/showthread.php?t=54842
    Also, a nice summary by an AMD guy:
    Although 7XX has multiple methods to access memory (a lot more than 2 if you read the ISA doc), OpenCL currently only has one, as the OpenCL programming model is pointer based, so all data has to be fully coherent (this is ignoring images, which are read_only or write_only, not both). This does not allow the use of the texture unit in the same way that Brook+/IL can use the texture unit. Brook+ does not allow you to alias pointers (unless you explicitly allow it), and with IL you do so at your own risk. Writing to memory and reading from that same memory with the texture unit does not produce deterministic behavior. OpenCL requires that all writes and reads to global memory are coherent, so this approach is not feasible. This is a performance hit compared to a streaming model because the GPU is natively a streaming device. There is another performance hit for the R7XX since it was not designed with OpenCL in mind; our new HD5XXX series was.
    One of the goals of the Stream SDK is to provide a full software stack for many different types of programmers.
    That means if you want performance, AMD provides CAL/IL to do that. If you want ease of programming to the streaming model, we also provide Brook+ to do that. If you want to program in the same language across multiple devices from the same source, OpenCL.
    I think the biggest advantage of Larrabee is its ISA, so you don't need to deal with various proprietary APIs. Also, its memory model (coherent caches, general-purpose memory hierarchy) allows much greater flexibility in code development.
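The aliasing point in the AMD quote can be illustrated with a toy CPU-side analogy. This is plain Python, not AMD's texture path or any real GPU API, and the function names are hypothetical: when a kernel reads from the same buffer it writes, the result depends on the order in which elements are processed, and on a GPU that order is not defined. A streaming (out-of-place) kernel gives the same answer in any order.

```python
def smooth_out_of_place(src):
    """Streaming style: read only from src, write only to a separate buffer."""
    out = list(src)
    for i in range(1, len(src)):
        out[i] = src[i - 1] + src[i]
    return out


def smooth_in_place(buf, order):
    """Aliased style: reads and writes hit the same buffer, in the given order."""
    for i in order:
        buf[i] = buf[i - 1] + buf[i]
    return buf


data = [1, 1, 1, 1]
n = len(data)
reference = smooth_out_of_place(data)
forward = smooth_in_place(list(data), range(1, n))          # i = 1, 2, 3
backward = smooth_in_place(list(data), range(n - 1, 0, -1))  # i = 3, 2, 1

print("out of place:", reference)
print("in place, forward:", forward)
print("in place, backward:", backward)
```

Front-to-back in place gives [1, 2, 3, 4] because later elements see already-updated neighbours, back-to-front gives [1, 2, 2, 2], and the out-of-place version always gives [1, 2, 2, 2]. Forbidding aliasing, as Brook+ does by default, is what makes the streaming result order-independent.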

  20. #20
    Xtreme Enthusiast
    Join Date
    Dec 2007
    Posts
    816
    Here is what you need to know: Here
    The rest is not important...

    Pommm pom pom pom
    DrWho, The last of the time lords, setting up the Clock.

  21. #21
    Xtreme Member
    Join Date
    Feb 2008
    Posts
    340
    Sigh, I got excited for...that.....

  22. #22
    Xtreme Mentor
    Join Date
    Jul 2008
    Location
    Shimla , India
    Posts
    2,631
    If you tune the IL right, the RV770 can put out a lot.

    I tried Brook+ DP, but I got two different types of instructions and the IL compiler did not combine them. I tried several workarounds, but it was a no-go.
    Coming Soon

  23. #23
    Xtreme Member
    Join Date
    Feb 2008
    Posts
    340
    Honestly all I want to know is what the pricing is going to look like.

  24. #24
    I am Xtreme
    Join Date
    Jul 2007
    Location
    Austria
    Posts
    5,485
    Quote Originally Posted by kl0012
    *snip*
    So if you go with the current best case (besides the theoretical 880 GFLOPS), Intel's Larrabee can pull off twice the performance of an HD4870. Now it remains to be seen how much more performance the HD5870 brings to the table. If the scaling is the same, it should be around 1.2 TFLOPS.
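The ~1.2 TF guess can be reproduced by scaling the 4870's measured SGEMM result by the ratio of the two cards' theoretical single-precision peaks. The shader counts and clocks below (HD 4870: 800 stream processors at 750 MHz; HD 5870: 1600 at 850 MHz) are public spec-sheet figures, not from this thread, and the helper name is hypothetical. The 540 GFLOPS input is the IL/Brook+ figure quoted in post #19.

```python
def peak_sp_gflops(stream_processors: int, clock_mhz: float, flops_per_clock: int = 2) -> float:
    """Theoretical SP peak: one multiply-add (2 flops) per stream processor per clock."""
    return stream_processors * clock_mhz * flops_per_clock / 1000.0


hd4870_peak = peak_sp_gflops(800, 750)     # spec-sheet values, not from the thread
hd5870_peak = peak_sp_gflops(1600, 850)
measured_4870 = 540.0                      # IL/Brook+ SGEMM figure from post #19

# Naive projection: assume the 5870 reaches the same fraction of its peak.
projected_5870 = measured_4870 * hd5870_peak / hd4870_peak

print(f"HD4870 peak {hd4870_peak:.0f}, HD5870 peak {hd5870_peak:.0f} GFLOPS")
print(f"projected HD5870 SGEMM: {projected_5870:.0f} GFLOPS")
```

The projection lands at 1224 GFLOPS, matching the estimate above. It scales only by ALU peak and ignores memory bandwidth, which grew less than peak throughput between the two cards, so treat it as an optimistic bound.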

  25. #25
    Registered User
    Join Date
    Jun 2007
    Location
    Croatia
    Posts
    34
    Anyone wondering about the 40 deactivated cores?
    It seems to me that they didn't have a spare NPP (nuclear power plant) nearby to power up the whole chip!

    If half of the chip is already at the 300W limit, then all this comparing with the 4870/5870 is useless.
    LRB with half the chip active is on par with the 4870 performance-wise, and on par with the 5970 (8x 4870) TDP-wise.
    So when it finally comes out, it will be, say... 10x slower than R900 (a very optimistic scenario for Intel), even if they cut power in half and bring a 50% speedup in clocks.

    As a GPU, it would finally be nice to see LRB hit TWO-DIGIT FPS in any modern game.
