Results 1 to 25 of 59

Thread: Vector processing on Nehalem?

  1. #1
    Xtreme Addict
    Join Date
    Apr 2005
    Location
    Wales, UK
    Posts
    1,195

    Vector processing on Nehalem?

    http://theinquirer.net/default.aspx?article=35245

    http://en.wikipedia.org/wiki/Vector_processing

    Can anyone explain exactly what this will mean to us when applied?
    Last edited by onewingedangel; 10-20-2006 at 03:50 AM.

  2. #2
    Xtreme Cruncher
    Join Date
    Feb 2005
    Location
    Cleveland, Ohio
    Posts
    1,750
    Quote Originally Posted by brentpresley
    It would mean GREATLY improved vector graphics (Vista Interface and ANY Game that uses them), freeing up the GPU(s) for more texture mapping, shading, etc.

    It could also speed up certain scientific calculations (matrices, etc.) by ORDERS OF MAGNITUDE (10X+).

    This could VERY well be the first step in Intel's incorporation of Graphics Tech into the CPU.

    I would love that; the scientific aspect of it would be fantastic, especially once the people coding for it figured out precisely how to use it. F@H and the like would see another 10X boost to complement the 40% boost they got from using ATI's GPUs.

  3. #3
    Xtreme Addict
    Join Date
    Jul 2004
    Location
    U.S of freakin' A
    Posts
    1,931
    I guess they mean Nehalem could get a REAL vector unit. Right now, Intel uses SSEn, which is basically an FPU that can do vector instructions as well.

    If Intel takes technology from the Alpha EV8 for Vector, we could see a dedicated vector unit on the die which would be much more powerful than SSEn.

    The benefits would be that anything which could be vectorized would see a massive speed up..

    Faster encoding/decoding, frames per second, :banana::banana::banana::banana: surfing

  4. #4
    YouTube Addict
    Join Date
    Aug 2005
    Location
    Klaatu barada nikto
    Posts
    17,574
    So basically they are cloning the AIM AltiVec engine
    Fast computers breed slow, lazy programmers
    The price of reliability is the pursuit of the utmost simplicity. It is a price which the very rich find most hard to pay.
    http://www.lighterra.com/papers/modernmicroprocessors/
    Modern Ram, makes an old overclocker miss BH-5 and the fun it was

  5. #5
    Xtreme Member
    Join Date
    Jun 2006
    Posts
    289
    Jeez guys, chill with the vectoring. REAL-TIME RAYTRACING on an 8-core Tulsa :O !!! Anyone see the implication here?

    Imagine NFS with that level of detail on the WHOLE scene. Since Clovertown is indeed faster than that Tulsa setup, and Yorkfield is faster than Clovertown... dun dun duuuunnnnn, I say 2 years till we have ray-traced games. Maybe 3 till ray-traced, fully vectored games. Wow, that means I'd have to learn calculus to even conceive a vector engine ;O NOO!

  6. #6
    Xtreme Member
    Join Date
    Sep 2006
    Posts
    144
    Uh, Brent, you're confused. Vector graphics simply means the use of geometrical encodings to represent graphic objects rather than pixel bitmaps. A line defined by its endpoints instead of being defined by the full set of pixels between them at a particular resolution.

    Has nothing to do with vector processing, which is the ability of an instruction to perform the same operation on multiple operands at the same time.
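
    To make that definition concrete, here is a minimal sketch in C. It assumes an x86 compiler that provides the SSE intrinsics header <xmmintrin.h>; the function names add_scalar and add_vector are made up for illustration. The scalar loop does one add per iteration, while each _mm_add_ps does four single-precision adds with one instruction.

```c
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics, x86 only */
#endif

/* Scalar version: one add per loop iteration. */
void add_scalar(const float *a, const float *b, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* Vector version: each _mm_add_ps adds four floats with one instruction.
 * When SSE is not available at compile time, only the scalar tail runs. */
void add_vector(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
#if defined(__SSE__)
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);   /* load 4 floats, any alignment */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    for (; i < n; i++)                     /* leftover elements */
        out[i] = a[i] + b[i];
}
```

    Both functions give the same results; only the number of elements handled per instruction differs.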

  7. #7
    YouTube Addict
    Join Date
    Aug 2005
    Location
    Klaatu barada nikto
    Posts
    17,574
    Quote Originally Posted by brentpresley
    I'm not confused mate. Did a thesis project on Molecular Modeling of Protein structures and worked on the SW coding for it. We completely bypassed the driver layer to use the HW directly and trust me, the math for vector-based graphics is almost identical to the math for vector processing (they are both matrix calculations). Heck, the app I wrote would LOVE this feature in a processor. I wouldn't even have to recode it (unlike SSEn, which we would have to recode for)!
    What instruction set did you code it for?

  8. #8
    YouTube Addict
    Join Date
    Aug 2005
    Location
    Klaatu barada nikto
    Posts
    17,574
    Quote Originally Posted by brentpresley
    Plain vanilla C baby. Runs on Mac OS 9, OS X, Win95 - Vista, Linux, and several Unix variants.

    All I would need is a compiler update to take advantage of these features.
    Then you could have set the compiler to optimize it for SSE without much recoding.

  9. #9
    Xtreme Member
    Join Date
    Sep 2006
    Posts
    144

    And javascript is built on java

    j/k, that one always bugs me.

    Sure, you can use vector instructions to crunch the matrices representing vector graphic objects, but the two uses of the word aren't related.
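
    A rough sketch of where the two meanings meet, in plain C. The types Point and Affine2D and the function transform_points are hypothetical names, not from anyone's project. A vector-graphics line is stored as its two endpoints, and moving or scaling it means applying the same small matrix to every point. Each iteration is independent of the others, which is the kind of loop vector instructions are good at.

```c
#include <stddef.h>

/* A vector-graphics primitive is described by points, not pixels. */
typedef struct { float x, y; } Point;

/* 2x2 linear part plus a translation: p' = M * p + t */
typedef struct { float m00, m01, m10, m11, tx, ty; } Affine2D;

/* Apply the same transform to every point. No iteration depends on
 * another, so the loop is a candidate for SIMD execution. */
void transform_points(const Affine2D *t, const Point *in, Point *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float x = in[i].x, y = in[i].y;
        out[i].x = t->m00 * x + t->m01 * y + t->tx;
        out[i].y = t->m10 * x + t->m11 * y + t->ty;
    }
}
```

    For example, scaling by 2 and shifting by (1, 1) turns the line (0,0)-(3,4) into (1,1)-(7,9).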

    Sounds like a fascinating project. When you bypassed the driver, was this an OpenGL driver? Would've expected that to be reasonably well optimized for every major platform.

  10. #10
    Xtreme Member
    Join Date
    Sep 2006
    Posts
    144

    Sounds like a false economy

    If you went to the trouble to optimize the performance of this math intensive app, it seems counterproductive to ignore the huge benefits from SSE/SSE2 that the compiler can give almost for free.

    Presumably it was only a few inner loops that really needed turboing, and a runtime selection of different code paths wouldn't make much of a dent in the 650KB. Not as if it would require significant Q/A either.
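
    Something along these lines is what I mean by runtime selection. This is only a sketch: mul_generic, mul_sse2, cpu_has_sse2 and select_mul are made-up names, the "SSE2" path is a stand-in with the same loop body, and the CPU check assumes GCC or Clang on x86, where __builtin_cpu_supports is available. On other compilers the sketch simply picks the generic path.

```c
#include <stddef.h>

typedef void (*mul_fn)(const float *, const float *, float *, size_t);

/* Baseline path: plain C, runs anywhere. */
static void mul_generic(const float *b, const float *c, float *a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] * c[i];
}

/* Stand-in for the tuned path. In a real build this would live in a
 * source file compiled with SSE2 enabled; here it is the same loop. */
static void mul_sse2(const float *b, const float *c, float *a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] * c[i];
}

/* Nonzero when the running CPU reports SSE2 (GCC/Clang on x86 only). */
static int cpu_has_sse2(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse2");
#else
    return 0;
#endif
}

/* Pick the code path once at startup, then call through the pointer. */
mul_fn select_mul(void)
{
    return cpu_has_sse2() ? mul_sse2 : mul_generic;
}
```

    Only the few hot inner loops need a second copy, so the cost in binary size stays small.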

    Such is the PHB.

  11. #11
    Xtreme Member
    Join Date
    Sep 2006
    Posts
    144

    Wow, talk about trial by fire!

    Heckuva project to cut your teeth on. I can understand how real-world requirements would affect your choices, but if this was all in C then I don't see the boss's reluctance to compile with various optimization flags.

    And really, why wasn't it written in FORTRAN?

  12. #12
    Xtreme Addict
    Join Date
    Apr 2005
    Location
    Wales, UK
    Posts
    1,195
    Quote Originally Posted by n-sanity
    Jeez guys, chill with the vectoring. REAL-TIME RAYTRACING on an 8-core Tulsa :O !!! Anyone see the implication here?

    Imagine NFS with that level of detail on the WHOLE scene. Since Clovertown is indeed faster than that Tulsa setup, and Yorkfield is faster than Clovertown... dun dun duuuunnnnn, I say 2 years till we have ray-traced games. Maybe 3 till ray-traced, fully vectored games. Wow, that means I'd have to learn calculus to even conceive a vector engine ;O NOO!
    Look at the FPS - great for generating rendered objects for use in 3D modelling, CGI, etc., but 1/20th the speed needed for a remotely playable game. Look for raytracing to make a big impact in the fields I mentioned above over the next few years, but real-time gaming is a good while off yet.

    Also, the 3.73GHz Tulsa is an awesome chip - better than even a stock 2.93GHz X6800 in a lot of tasks. Its monstrous cache makes it a performance beast when a task isn't penalised too much by latency, such as streaming and rendering apps.
    Last edited by onewingedangel; 10-20-2006 at 10:42 AM.

  13. #13
    YouTube Addict
    Join Date
    Aug 2005
    Location
    Klaatu barada nikto
    Posts
    17,574
    Quote Originally Posted by onewingedangel
    Look at the FPS - great for generating rendered objects for use in 3D modelling, CGI, etc., but 1/20th the speed needed for a remotely playable game. Look for raytracing to make a big impact in the fields I mentioned above over the next few years, but real-time gaming is a good while off yet.

    Also, the 3.73GHz Tulsa is an awesome chip - better than even a stock 2.93GHz X6800 in a lot of tasks. Its monstrous cache makes it a performance beast when a task isn't penalised too much by latency, such as streaming and rendering apps.
    True; however, hardware vector engines will crunch it a lot faster than just letting the processor do it.
    Quote Originally Posted by brentpresley
    LOL - b/c NEITHER of us knew Fortran.

    Compiler flags weren't the only thing at the time for SSE (IIRC). We ran code profiling w/ some Intel tools and they were telling us we would have to un-nest a lot of the loops, etc. for SSE to work properly. So we said screw it.

    After that project, I swore I was not coding again.
    A lot of nested loops; that is just plain bad programming.

  14. #14
    Xtreme Member
    Join Date
    Sep 2006
    Posts
    144

    Very interesting

    With a parallelizable program so CPU-dependent, I bet he's really looking forward to affordable quad-core.

    You must've loved working on that project and seeing it further developed on all the platforms as the hardware just gets better and cheaper.

  15. #15
    YouTube Addict
    Join Date
    Aug 2005
    Location
    Klaatu barada nikto
    Posts
    17,574
    Quote Originally Posted by brentpresley
    You really are a piece of work, you know that? Come back when YOU'VE written something as complex as that AFTER ONE programming class.

    You know, you have ZERO social skills. Did your parents ever tell you that if you can't say something nice to keep your G D@MN mouth shut!
    YES, I have zero social skills, but that should have been covered in the first 3 chapters of your book, regardless of the language. Hell, it is practically page 1 material.

  16. #16
    Xtreme Member
    Join Date
    Sep 2006
    Posts
    144

    Nothing wrong w/nested loops

    It makes it clear to the human what's going on, but more important it makes it really clear to the compiler's optimizer how the data's being accessed. It can then use that information to replace entire spans of code with auto-parallelized equivalents taking advantage of SSEx and/or multiple threads.

    I think the compiler in use at the time just wasn't very smart.
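
    Here is the sort of nested loop I have in mind, as a sketch only (scale_rows and its parameters are invented for the example). The inner loop walks both arrays with unit stride and the C99 restrict qualifiers promise the arrays don't overlap, which is the information an auto-vectorizer looks for. Whether a particular compiler actually vectorizes it depends on the version and flags; GCC, for instance, has -ftree-vectorize for this.

```c
#include <stddef.h>

/* Scale every row of a row-major rows-by-cols matrix by its own factor.
 * Outer loop: one row at a time. Inner loop: contiguous elements of that
 * row, each computed independently of its neighbours. */
void scale_rows(size_t rows, size_t cols,
                const float *restrict in,
                const float *restrict scale,
                float *restrict out)
{
    for (size_t r = 0; r < rows; r++) {
        float s = scale[r];                 /* loop-invariant for the row */
        for (size_t c = 0; c < cols; c++)
            out[r * cols + c] = s * in[r * cols + c];
    }
}
```

    Nothing about the nesting hides the access pattern; if anything it spells it out.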

  17. #17
    Xtreme Member
    Join Date
    Sep 2006
    Posts
    144

    You might be pleasantly surprised

    Something like:
    Code:
    for (i = 0; i < 10000; i++)
        a[i] = b[i] * c[i];
    might be rewritten by a smart compiler into:
    Code:
    /* thread 1: upper half of the index range */
    for (i = 5000; i < 10000; i++)
        a[i] = b[i] * c[i];

    /* thread 2: lower half, with its own loop counter */
    for (iprime = 0; iprime < 5000; iprime++)
        a[iprime] = b[iprime] * c[iprime];
    with the two loops run as simultaneous threads. Kewl, eh?
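
    If the compiler won't split the loop on its own, you can ask for it explicitly. A minimal sketch using an OpenMP pragma (multiply_arrays is a made-up name; this assumes a compiler with OpenMP support, e.g. GCC with -fopenmp). Without OpenMP enabled the pragma is ignored and the loop just runs serially, so the result is the same either way.

```c
/* Element-wise product of two arrays. With OpenMP enabled, the
 * iterations are divided among the available threads; each thread
 * gets its own private copy of the loop counter i. */
void multiply_arrays(float *a, const float *b, const float *c, int n)
{
    int i;
    #pragma omp parallel for
    for (i = 0; i < n; i++)
        a[i] = b[i] * c[i];
}
```

    This works because no iteration reads anything another iteration writes.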

  18. #18
    Xtreme Member
    Join Date
    Sep 2006
    Posts
    144

    PathScale and PGI

    http://www.pgroup.com/
    http://www.pathscale.com/

    But bring lotsa $$$, esp. for a sitewide license.

  19. #19
    Banned
    Join Date
    Oct 2005
    Posts
    1,533
    Quote Originally Posted by Carfax
    I guess they mean Nehalem could get a REAL vector unit. Right now, Intel uses SSEn, which is basically an FPU that can do vector instructions as well.

    If Intel takes technology from the Alpha EV8 for Vector, we could see a dedicated vector unit on the die which would be much more powerful than SSEn.

    The benefits would be that anything which could be vectorized would see a massive speed up..

    Faster encoding/decoding, frames per second, :banana::banana::banana::banana: surfing
    So I am not understanding. In Nehalem, you're saying Intel could add a vector unit, but it would also keep the SSEn units. As we are all aware, Intel has added the SSE4 instruction set, with 30 instructions in Penryn and another 20 in Nehalem, for a total of 50 instructions. Would this still work with the vector units?

  20. #20
    YouTube Addict
    Join Date
    Aug 2005
    Location
    Klaatu barada nikto
    Posts
    17,574
    Quote Originally Posted by Turtle 1
    So I am not understanding. In Nehalem, you're saying Intel could add a vector unit, but it would also keep the SSEn units. As we are all aware, Intel has added the SSE4 instruction set, with 30 instructions in Penryn and another 20 in Nehalem, for a total of 50 instructions. Would this still work with the vector units?
    Let me explain it this way:
    SSE, SSE2, SSE3...SSEn
    http://en.wikipedia.org/wiki/Streaming_SIMD_Extensions
    do both floating-point and SIMD (aka vector) math.
    http://en.wikipedia.org/wiki/SIMD
    By separating the floating-point unit from the vector unit, they can massively improve performance for BOTH,
    since the floating-point unit can specialize in floating-point math (and not have to worry about vector math)
    and the vector unit only has to deal with vectors.
    AltiVec/VMX (the name depends on whether you ask Motorola or IBM)
    basically does exactly that.
    What I am hoping for is that they follow the AltiVec design, which is VASTLY superior to ANY Intel/AMD Streaming SIMD Extension.

  21. #21
    Xtreme Addict
    Join Date
    Dec 2004
    Location
    Flying through Space, with armoire, Armoire of INVINCIBILATAAAAY!
    Posts
    1,939
    Quote Originally Posted by LOE
    vector processor has nothing to do with vector graphics
    don't be foolish

    they mean vector in a mathematical way
    The vectors in vector graphics are also used in a mathematical way. How else? Vector geometry is a mathematical tool.
    Sigs are obnoxious.

  22. #22
    Banned
    Join Date
    Oct 2005
    Posts
    1,533
    Quote Originally Posted by nn_step
    Let me explain it this way:
    SSE, SSE2, SSE3...SSEn
    http://en.wikipedia.org/wiki/Streaming_SIMD_Extensions
    do both floating-point and SIMD (aka vector) math.
    http://en.wikipedia.org/wiki/SIMD
    By separating the floating-point unit from the vector unit, they can massively improve performance for BOTH,
    since the floating-point unit can specialize in floating-point math (and not have to worry about vector math)
    and the vector unit only has to deal with vectors.
    AltiVec/VMX (the name depends on whether you ask Motorola or IBM)
    basically does exactly that.
    What I am hoping for is that they follow the AltiVec design, which is VASTLY superior to ANY Intel/AMD Streaming SIMD Extension.
    Nice links, nn. Now, if I understand this correctly, vector units need their own registers to operate efficiently. True or false? Is it possible that the Russian company Intel bought a while back will aid Intel with a much better compiler that could overcome the FPU and vector units trying to use the registers at the same time? Anyone?
    Last edited by Turtle 1; 10-21-2006 at 03:08 PM.

  23. #23
    YouTube Addict
    Join Date
    Aug 2005
    Location
    Klaatu barada nikto
    Posts
    17,574
    Quote Originally Posted by Turtle 1
    Nice links, nn. Now, if I understand this correctly, vector units need their own registers to operate efficiently. True or false? Is it possible that the Russian company Intel bought a while back will aid Intel with a much better compiler that could overcome the FPU and vector units trying to use the registers at the same time? Anyone?
    Ideally speaking, you would want 32 registers JUST for the vector unit, 128 or 256 bits wide apiece.
    That will double the space needed for floating-point/vector math, but you will get (in theory) up to 4 times the processing power, which SHOULD make it a floating-point/vector monster.
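
    To put rough numbers on that, a small sketch in C (regfile_bytes and lanes are helper names made up here). It only works out how much storage a register file of a given shape needs and how many elements fit in one register. Reading the "4 times" figure as four 32-bit floats per 128-bit register is my assumption.

```c
/* Bytes needed for a register file of `count` registers, each `bits` wide. */
unsigned regfile_bytes(unsigned count, unsigned bits)
{
    return count * bits / 8;
}

/* Number of `elem_bits`-wide elements that fit in one `reg_bits` register. */
unsigned lanes(unsigned reg_bits, unsigned elem_bits)
{
    return reg_bits / elem_bits;
}
```

    So regfile_bytes(32, 128) is 512 bytes and regfile_bytes(32, 256) is 1024 bytes, while lanes(128, 32) is 4 and lanes(256, 32) is 8 single-precision values per register.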

  24. #24
    Xtreme Addict
    Join Date
    Apr 2006
    Location
    City of Lights, The Netherlands
    Posts
    2,381
    It better be a monster, this chip could be (or will be) mine one day.

  25. #25
    Banned
    Join Date
    Oct 2005
    Posts
    1,533
    This is some good info on what SSE4 brings with it in the Intel 45nm processor. I think it is looking great.

    http://download.intel.com/technology...ions-paper.pdf
