This is some good info on what SSE4 brings with it in the Intel 45nm processor. I think it is looking great.
http://download.intel.com/technology...ions-paper.pdf
Hey, very interesting stuff, thanks. I'm struck that almost all of the new SSE4 vector instructions appear to be integer. Just round and dot product for floats?
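For anyone who wants to poke at the float side, the dot product shows up as the DPPS instruction, which you can reach from C through the _mm_dp_ps intrinsic. This is just a minimal sketch; the values and the -msse4.1 flag are illustrative, not from the whitepaper:

/* Minimal sketch of the SSE4.1 dot-product instruction (DPPS) via _mm_dp_ps.
   Compile with something like: gcc -msse4.1 dot.c */
#include <smmintrin.h>
#include <stdio.h>

int main(void)
{
    __m128 a = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);  /* lanes: 1,2,3,4 */
    __m128 b = _mm_set_ps(8.0f, 7.0f, 6.0f, 5.0f);  /* lanes: 5,6,7,8 */

    /* 0xF1: multiply all four lanes, write the sum to the lowest lane only */
    __m128 dp = _mm_dp_ps(a, b, 0xF1);

    printf("dot product = %f\n", _mm_cvtss_f32(dp)); /* 1*5+2*6+3*7+4*8 = 70 */
    return 0;
}

Before SSE4 you would have done this with a multiply plus a chain of horizontal adds or shuffles, so the one-instruction version is mostly a convenience and latency win rather than new capability.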
Funny, for all of Intel's braggadocio about leading the way in instruction set innovation, they neglected to mention AMD's groundbreaking 3DNow! and especially x86-64.
I didn't understand the significance of Application Targeted Accelerators. How are they different from any other special purpose CISC instructions? Are they updateable in microcode or something?
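For what it's worth, the ATA instructions (CRC32 and POPCNT in SSE4.2) end up exposed through ordinary compiler intrinsics like everything else, they're just fixed-function rather than general vector ops. A minimal sketch, assuming a compiler with -msse4.2 support (the input values are made up):

/* Minimal sketch of the SSE4.2 "Application Targeted Accelerator"
   instructions CRC32 and POPCNT via intrinsics.
   Compile with something like: gcc -msse4.2 ata.c */
#include <nmmintrin.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    const char *msg = "hello";
    unsigned int crc = 0;

    /* Accumulate a CRC-32C checksum one byte at a time */
    for (size_t i = 0; i < strlen(msg); i++)
        crc = _mm_crc32_u8(crc, (unsigned char)msg[i]);

    /* Count the set bits in a word with a single instruction */
    int bits = _mm_popcnt_u32(0xF0F0u);

    printf("crc32c = %08x, popcount(0xF0F0) = %d\n", crc, bits);
    return 0;
}

So they behave like any other special-purpose CISC instruction; the "accelerator" label seems to be marketing for hard-wiring a few hot loops (checksums, bit counting, string scanning) rather than anything microcode-updatable.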
X6800 will be about 1.5 times faster in rendering / encoding. Do I have to explain why?
Originally Posted by onewingedangel
Very wrong. L2/L3 cache size and/or latency do not affect rendering performance, because good rendering algorithms fit their working set into L1. Even on an L1 miss into L2, even the smallest L2 is more than enough, so there is almost no difference between something with ~256KB of L2 and any larger L2/L3. If you look at actual benchmarks you will see that this is indeed the case in the vast majority of them.
Originally Posted by onewingedangel
Actually, with Core 2 Duo, Altivec is no longer superior to SSE2, as Core 2's throughput is similar to Altivec's when working on 32-bit single precision. SSE2 is actually superior to Altivec in several ways, since it can do 64-bit double-precision math while Altivec cannot.
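To illustrate the double-precision point: SSE2 packs two 64-bit doubles per 128-bit register, which Altivec simply has no equivalent for. A minimal sketch with made-up values:

/* Minimal sketch of SSE2 double-precision SIMD: two 64-bit doubles
   are processed per 128-bit register. Compile with: gcc -msse2 pd.c */
#include <emmintrin.h>
#include <stdio.h>

int main(void)
{
    __m128d a = _mm_set_pd(2.0, 1.0);   /* lanes: {1.0, 2.0} */
    __m128d b = _mm_set_pd(4.0, 3.0);   /* lanes: {3.0, 4.0} */

    __m128d sum = _mm_add_pd(a, b);     /* {4.0, 6.0} */

    double out[2];
    _mm_storeu_pd(out, sum);
    printf("%f %f\n", out[0], out[1]);
    return 0;
}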
Originally Posted by nn_step
If Intel decides to improve on the vector capabilities of future processors, there's plenty that they can do.
They'll need to increase bandwidth though, as bandwidth is a severe limitation for vector performance.
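A rough illustration of the bandwidth point: a STREAM-style triad written with SSE intrinsics only does 2 flops per element (one multiply, one add) but moves about 12 bytes per element, so on arrays bigger than the caches the memory bus, not the vector units, sets the pace. The function and array names here are just illustrative:

/* STREAM-style triad: a[i] = b[i] + s * c[i].
   Each group of 4 floats needs roughly 48 bytes of memory traffic
   (two loads plus one store) for 8 flops, which is why wide vector
   units stall on bandwidth once the data no longer fits in cache. */
#include <xmmintrin.h>
#include <stddef.h>

void triad(float *a, const float *b, const float *c, float s, size_t n)
{
    __m128 vs = _mm_set1_ps(s);
    for (size_t i = 0; i + 4 <= n; i += 4) {
        __m128 vb = _mm_loadu_ps(b + i);
        __m128 vc = _mm_loadu_ps(c + i);
        _mm_storeu_ps(a + i, _mm_add_ps(vb, _mm_mul_ps(vs, vc)));
    }
    /* Remaining n % 4 elements would be handled by a scalar tail loop. */
}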
There were some threads today talking about vector processing, and it seems we have moved to another level, so I thought I would bump this back up front. I think we can add to this thread now.
This is the reason I went and dug up this thread, guys. Run with it and let's talk about it as enthusiasts.
I believe this guy was right. He just missed by a couple of generations.
Let's everyone forget about Intel vs. AMD on this subject.
This belongs in this thread. If you read up on Elbrus tech, you will understand the implications of what this is all about in relation to the thread title.
http://techknowledger.blogspot.com/2...r_archive.html :coffee:
Ummm, Turtle, that page is talking about getting rid of x86 and Conroe being a super RISC, which in theory one could design software specifically for, with massive performance. However, Conroe is still x86.
True, I am talking about maybe Nehalem C, more likely Geshner. Nehalem 2 is supposed to scale to 16 cores, Geshner to 32 cores. I think there's a very good chance, as in theory there could be a dedicated compiler core. We know there's shared cache now, so there's room for dedicated cores to do whatever Intel wants, using CSI. VLIW (EPIC) is a strong possibility.
Vector processing in desktops? Nice we're going to be owning super computers.
Very true, however it is gonna eat cache like no tomorrow, and compiling a whole OS will take a while. If I remember correctly, FX!32 took about 7 days to finish :eek: After that it was untouchable.
Originally Posted by Turtle 1
However, most people aren't willing to wait for that.
You had better reread that link. It translates just a bit slower, and that's configured without a dedicated core or a shared cache. Adding those speeds up the translation to where it won't be any slower than a present single-threaded x86 read. With parallel reads and multiple threads it will be even faster yet. Intel can put a lot of shared cache on die without x86 decoders on each processor.
__________________________________________________
The New Architecture
To reduce power you need to reduce the number of transistors, especially ones which don’t provide a large performance boost. Switching to VLIW means they can immediately cut out the hefty X86 decoders.
Out-of-order hardware will go with it, as it is huge, consumes masses of power, and in VLIW designs is completely unnecessary. The branch predictors may also go on a diet or even be removed completely, as the Elbrus compiler can handle even complex branches.
With the x86 baggage gone the hardware can be radically simplified - the limited architectural registers of x86 will no longer be a limiting factor. Intel could use a design with a single large register file covering integer, floating point and even SSE; 128 x 64-bit registers sounds reasonable (SSE registers could map to 2 x 64-bit registers).
Turtle, I am talking about the amount of time it took for the system to convert from x86 native to Alpha native.
The conversion would only have to take place once, though, before the new binary is ready to be used, so if the conversion can happen in the background while you're just using the net etc., I think it would be quite feasible. The only issue may be HDD space, as having all your games, programs etc. that you run off CDs stored on your HDD would eat up space.
If the conversion had to be done in real time, it would take up so much processing power that it would be better to just focus on improving x86 performance, up until the point where real-time conversion can take place as a background process consuming few resources.
The idea of a software abstraction layer converting instructions is a good one somewhere down the line, but I'm not sure we're at that point quite yet.
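To make the translate-once idea above a bit more concrete, here is a toy sketch of a translation cache in the spirit of FX!32. Every name in it is hypothetical, and a real binary translator works on whole code blocks with far more bookkeeping; this only shows why the expensive step happens once per block and why later runs are cheap:

/* Toy sketch of the translate-once, cache-the-result idea discussed above
   (in the spirit of FX!32). All names are hypothetical. */
#include <stdint.h>
#include <stdio.h>

#define CACHE_SIZE 1024

typedef struct {
    uint64_t x86_addr;      /* address of the original x86 block */
    const void *native;     /* pointer to the translated native code */
} cache_entry;

static cache_entry cache[CACHE_SIZE];

/* Pretend translator: in reality this is the expensive step that a
   background pass (or a dedicated core) would perform once per block. */
static const void *translate_block(uint64_t x86_addr)
{
    printf("translating block at %#llx (slow path)\n",
           (unsigned long long)x86_addr);
    return (const void *)(uintptr_t)x86_addr; /* stand-in for real code */
}

const void *lookup_or_translate(uint64_t x86_addr)
{
    cache_entry *e = &cache[x86_addr % CACHE_SIZE];
    if (e->native == NULL || e->x86_addr != x86_addr) {
        e->x86_addr = x86_addr;
        e->native = translate_block(x86_addr); /* pay the cost once */
    }
    return e->native; /* later executions reuse the cached translation */
}

The disk-space worry above maps directly onto this: persisting the cache between runs is what trades HDD space for not having to re-translate.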
Remember when Intel said they'd have Itaniums on desktops by now; perhaps we shouldn't forget that their ultimate goal is to move away from x86 (to something that AMD doesn't have a legal right to use). Such a move wouldn't harm AMD's hardware, they too would be free to innovate then, but you must remember the amount of money Intel spends on compilers and software that benefits users of all x86 hardware, Intel or AMD. Should AMD have to compete with Intel on the software front, I don't think it's a battle they could win alone (would IBM bail them out again?)
True, but on a 16-core processor, with one core used strictly as a compiler and lots of shared cache, it's very feasible. As far as computing power goes, 16 cores gives more than enough, as long as the instructions can be translated in real time. I see Intel going this route; we will just have to watch development. I just see so many good things with Intel going this way. Plus they will finally not have to share with AMD. Intel spends billions on research and then has to give tech away, which is not good.
Isn't that like saying Transmeta was right, Turtle?
We're not talking about Transmeta. Intel has owned Elbrus tech since '04.
That gave Intel plenty of time to tweak. Anyway, a 16-core CPU with a dedicated Elbrus compiler and shared cache across all cores should translate x86 in real time. With lots of vector power, I see a raytracing monster coming from Intel.
If you read the article you know what Transmeta's shortcomings were. Intel has the tech to overcome those shortcomings.
The idea here isn't to point out where Transmeta failed, but to discuss how Intel will overcome those shortcomings, just as the article was talking about. All of the shortcomings discussed in that article, Intel has the tech to overcome. Then Intel will be free of the shackles which bind them to x86 and MS. A processor that can run any OS in true 64-bit binaries is a wonderful thing, for all of us.
Yes we are, especially since they were the first ones to really make a good code-morphing product.
No, really, we're not. Intel is going to use the Elbrus compiler, which isn't related to Transmeta's code-morphing product. Either way, Intel has always been good at taking others' failed ideas (in the case of Elbrus, they didn't have the infrastructure or the money to take this tech to the next level; Intel does) and transforming that tech into viable products. Elbrus has been working on this tech since 1980 -> see SPARC.
Originally Posted by nn_step
Who's going to break it down and explain why the vector processor matters to average consumers?
Is it something that will make Far Cry 3 look nice and pretty?
Well, if, and I say if, Intel moves towards raytracing for gaming, a vector processor is a big deal, and not just for gaming. In gaming, however, raytracing running at 50fps will be a spectacular sight to see, better than any of today's GPUs.
Well, it depends on how well it is used, but a 5-15fps increase isn't a stretch.
Originally Posted by zabomb4163
LOE, we are simply discussing where Intel may be heading. Since AMD has come out and said they will not be competing with Intel on the number of cores in a CPU, AMD and Intel seem to be heading down different roads.
Originally Posted by LOE
If we're talking an Intel 16-32 core CPU, Nehalem C and Geshner, it would have been much better for you to say: yeah, if Intel makes 1 or 2 cores dedicated as an FPU processor, this is very feasible, and would likely be 30000 times faster than a 2.6 Conroe. Add to that cores dedicated to vector processing. So yeah, you're right, I am saying it will be XXX more powerful than Conroe.
I don't know where this came from; I just pulled it out of Word, but it's thread related, so I thought I would paste it in here.
As a result of what I've learned, coming from a realtime OpenGL/Direct3D background, I feel that raytracing (or other predominantly non-realtime techniques such as radiosity) will likely make some kind of appearance on future generations of graphics hardware. Optimised raytracing is amazingly fast, and handles incredibly detailed scenes at a logarithmic expense of processing power. In other words, it is far more efficient at rendering massive scenes than current techniques. While optimised raytracing is very memory intensive, RAM is relatively cheap; graphics processing power is not.
Standard realtime graphics systems in use today have a linear response to scene complexity. Because of this, Graphics Processing Units are getting immensely fast, but even with the advent of realtime Shaders, there are still many effects that cannot be achieved in realtime - some of which are achieved as part of the core algorithm of raytracing. Optimised raytracing can happily render the entire Earth from space in life-like detail, accounting for all surface detail and reflecting light off the moon - approximately the same number of rays are cast as when rendering a single low-detail teapot.
But enough hype; raytracing has many downfalls, such as massive tree structures (for optimisation) that require equally massive amounts of memory, a linear response to the number of lights in a scene, and the fact that indirect lighting isn't usually accounted for. Newer radiosity-based techniques such as Global Illumination (GI) have already been implemented in realtime software (but slowly, and on very simple scenes), which does help solve the indirect lighting problem of raytracing. Within a few generations of 3D accelerators I wouldn't be surprised to see some GI, raytracing or a hybrid system appear in hardware. Until then, non-realtime raytracing is being constantly improved, and is still critical to many sectors, including the film industry, architectural firms, and other creative industries.
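To put the "logarithmic expense" claim above in concrete terms, here is a minimal, illustrative sketch of the kind of bounding-volume tree traversal those massive tree structures enable: a ray that misses a node's box skips the entire subtree, so work per ray grows roughly with tree depth rather than with scene size. Structures and names are made up, not from any real renderer:

/* Illustrative bounding-volume-hierarchy traversal.
   Assumes non-zero ray direction components for brevity. */
#include <stddef.h>

typedef struct { float min[3], max[3]; } aabb;
typedef struct bvh_node {
    aabb bounds;
    struct bvh_node *left, *right;  /* both NULL at a leaf */
    int primitive;                  /* triangle index at a leaf */
} bvh_node;

/* Slab test: does the ray (origin o, direction d) hit the box? */
static int ray_hits_box(const float o[3], const float d[3], const aabb *b)
{
    float tmin = -1e30f, tmax = 1e30f;
    for (int i = 0; i < 3; i++) {
        float inv = 1.0f / d[i];
        float t0 = (b->min[i] - o[i]) * inv;
        float t1 = (b->max[i] - o[i]) * inv;
        if (t0 > t1) { float t = t0; t0 = t1; t1 = t; }
        if (t0 > tmin) tmin = t0;
        if (t1 < tmax) tmax = t1;
    }
    return tmax >= tmin && tmax >= 0.0f;
}

/* Returns a hit primitive index, or -1. Whole subtrees are culled with
   one box test, which is where the logarithmic behaviour comes from. */
int traverse(const bvh_node *n, const float o[3], const float d[3])
{
    if (n == NULL || !ray_hits_box(o, d, &n->bounds))
        return -1;
    if (n->left == NULL && n->right == NULL)
        return n->primitive;  /* leaf: a real tracer would test the triangle */
    int hit = traverse(n->left, o, d);
    return hit >= 0 ? hit : traverse(n->right, o, d);
}

The memory cost mentioned above is the flip side of this: the tree has to hold a box for every node, which is why optimised raytracing trades RAM for compute.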
The Itanium uses something called software pipelining which, from what I hear, is superior to vectorization as it's more broadly applicable.
But real-time ray tracing in games is many years off. If Intel decides to pursue this, it will most definitely be based on a specialized processor, and not a general purpose one like Nehalem.
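On the software pipelining mentioned a couple of posts up: it is really a compiler scheduling trick, which Itanium backs with rotating registers and predication, but a hand-rolled C loop can at least show the shape of it by overlapping the load for the next iteration with the work of the current one. This is only a sketch of the idea, not how an Itanium compiler actually emits it:

/* Hand-rolled sketch of software pipelining for dst[i] = src[i] * k.
   The load for iteration i+1 is issued while iteration i is still being
   finished, so the loop body always has independent work in flight. */
void scale(float *dst, const float *src, float k, int n)
{
    if (n <= 0)
        return;
    float cur = src[0];                /* prologue: prime the pipeline */
    for (int i = 0; i < n - 1; i++) {
        float next = src[i + 1];       /* stage 1: load for the next iteration */
        dst[i] = cur * k;              /* stage 2: compute/store the current one */
        cur = next;
    }
    dst[n - 1] = cur * k;              /* epilogue: drain the pipeline */
}

Unlike vectorization, this works even when the loop body has branches or irregular memory access, which is presumably what "more broadly applicable" is getting at.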