http://theinquirer.net/default.aspx?article=35245
http://en.wikipedia.org/wiki/Vector_processing
Can anyone explain exactly what this will mean to us when applied?
Quote:
Originally Posted by brentpresley
I would love that; the scientific aspect of it would be fantastic, especially once the people coding for it figured out precisely how to use it. F@H and the like would see another 10X boost to complement the 40% boost they got from using ATI's GPUs
I guess they mean Nehalem could get a REAL vector unit. Right now, Intel uses SSEn, which is basically an FPU that can do vector instructions as well.
If Intel takes technology from the Alpha EV8 for vectors, we could see a dedicated vector unit on the die, which would be much more powerful than SSEn.
The benefit would be that anything which can be vectorized would see a massive speedup:
faster encoding/decoding, frames per second, :banana::banana::banana::banana: surfing ;)
So basically they are cloning the AIM AltiVec engine
Jeez guys, chill with the vectoring: REAL TIME RAYTRACING on an 8-core Tulsa :O !!! !!! !!! Anyone see the implication here?
http://graphics.cs.uni-sb.de/~wald/P...ages/teas2.jpg
Imagine NFS with that level of detail on the WHOLE scene. Since Clovertown is indeed faster than that Tulsa setup, and Yorkfield is faster than Clovertown... dun dun duuuunnnnn, I say 2 years till we have ray-traced games. Maybe 3 till ray-traced, fully vectored games. Wow, that means I'd have to learn calculus n :banana::banana::banana::banana: to even conceive a vector engine ;O NOO!
Uh, Brent, you're confused. Vector graphics simply means the use of geometric encodings to represent graphic objects rather than pixel bitmaps: a line defined by its endpoints instead of by the full set of pixels between them at a particular resolution.
Has nothing to do with vector processing, which is the ability of an instruction to perform the same operation on multiple operands at the same time.
What instruction set did you code it for?
Quote:
Originally Posted by brentpresley
Then you could have had the compiler optimize it for SSE without much recoding :rolleyes:
Quote:
Originally Posted by brentpresley
j/k, that one always bugs me.
Sure, you can use vector instructions to crunch the matrices representing vector graphic objects, but the two uses of the word aren't related.
Sounds like a fascinating project. When you bypassed the driver, was this an OpenGL driver? Would've expected that to be reasonably well optimized for every major platform.
If you went to the trouble to optimize the performance of this math intensive app, it seems counterproductive to ignore the huge benefits from SSE/SSE2 that the compiler can give almost for free.
Presumably it was only a few inner loops that really needed turboing, and a runtime selection of different code paths wouldn't make much of a dent in the 650KB. Not as if it would require significant Q/A either.
Such is the PHB. :doh:
Heckuva project to cut your teeth on. I can understand how real-world requirements would affect your choices, but if this was all in C then I don't see the boss's reluctance to compile with various optimization flags.
And really, why wasn't it written in FORTRAN? :hitself:
Look at the FPS - great for generating rendered objects for use in 3D modelling, CGI etc., but 1/20th the speed needed for a remotely playable game. Look for raytracing to make a big impact in the fields I mentioned above over the next few years, but real-time gaming is a good while off yet.
Quote:
Originally Posted by n-sanity
Also the 3.73GHz Tulsa is an awesome chip - better than even a stock 2.93GHz X6800 in a lot of tasks. Its monstrous cache makes it a performance beast when a task isn't penalised too much by latency, such as streaming and rendering apps.
True, however hardware vector engines will seriously crunch a lot faster than just letting the processor crunch it.
Quote:
Originally Posted by onewingedangel
A lot of nested loops, that is just plain bad programming.
Quote:
Originally Posted by brentpresley
With a parallelizable program so CPU-dependent, I bet he's really looking forward to affordable quad-core.
You must've loved working on that project and seeing it further developed on all the platforms as the hardware just gets better and cheaper.
YES, I have zero social skills, but that should have been covered in the first 3 chapters of your book, regardless of the language. Hell, it is practically page 1 material.
Quote:
Originally Posted by brentpresley
It makes it clear to the human what's going on, but more importantly it makes it really clear to the compiler's optimizer how the data is being accessed. It can then use that information to replace entire spans of code with auto-parallelized equivalents taking advantage of SSEx and/or multiple threads.
I think the compiler in use at the time just wasn't very smart.
Something like:
Code:
for (i = 0; i < 10000; i++)
    a[i] = b[i] * c[i];
might be rewritten by a smart compiler into:
Code:
for (i = 5000; i < 10000; i++)
    a[i] = b[i] * c[i];
for (iprime = 0; iprime < 5000; iprime++)
    a[iprime] = b[iprime] * c[iprime];
broken into two simultaneous threads. Kewl, eh?
http://www.pgroup.com/
http://www.pathscale.com/
But bring lotsa $$$, esp. for a sitewide license.
So I am not understanding. You're saying that in Nehalem, Intel could add a vector unit but also keep the SSEn units. As we are all aware, Intel has added the SSE4 instruction set to Penryn (30 instructions), and Nehalem adds another 20 for a total of 50 instructions. Would this still work with the vector units?
Quote:
Originally Posted by Carfax
Let me explain it this way:
Quote:
Originally Posted by Turtle 1
SSE, SSE2, SSE3...SSEn
http://en.wikipedia.org/wiki/Streaming_SIMD_Extensions
These do floating point and SIMD (aka vector) math:
http://en.wikipedia.org/wiki/SIMD
By separating the floating point unit from the vector unit, they can massively improve performance for BOTH,
since the floating point unit can specialize in floating point math (and not have to worry about vector math)
and the vector unit only has to deal with vectors.
Now AltiVec/VMX (the name depends on whether you ask Motorola or IBM)
basically does exactly that.
Now what I am hoping for is that they follow the AltiVec design, which is VASTLY superior to ANY Intel/AMD Streaming SIMD Extension.
The vectors in vector graphics are also used in a mathematical way. How else? Vector geometry is a mathematical tool. :stick:
Quote:
Originally Posted by LOE
Nice links, nn. Now if I understand this correctly: vector units, to operate efficiently, need their own registers. True or false? Is it possible that the Russian company Intel bought a while back will aid Intel with a much better compiler that could overcome the FPU and vector units trying to use the registers at the same time? Anyone!
Quote:
Originally Posted by nn_step
Ideally speaking you would want 32 registers JUST for the vector unit, 128 or 256 bits wide apiece.
Quote:
Originally Posted by Turtle 1
Which will cause a doubling of the space needed for floating point/vector math, but you will get up to (in theory) 4 times the processing power. Which SHOULD make it a floating point/vector monster.
It better be a monster, this chip could be (or will be) mine one day.
This is some good info on what SSE4 brings with it in the Intel 45nm processor. I think it is looking great.
http://download.intel.com/technology...ions-paper.pdf
Hey, very interesting stuff, thanks. I'm struck that almost all of the new SSE4 vector instructions appear to be integer. Just round and dot product for floats?
Funny, for all of Intel's braggadocio about leading the way in instruction set innovation, they neglected to mention AMD's groundbreaking 3DNow! and especially x86-64.
I didn't understand the significance of Application Targeted Accelerators. How are they different from any other special purpose CISC instructions? Are they updateable in microcode or something?
X6800 will be about 1.5 times faster in rendering/encoding. Do I have to explain why?
Quote:
Originally Posted by onewingedangel
Very wrong. L2/L3 cache size and/or latency do not affect rendering performance. This is because good rendering algorithms will fit their data set into L1. And even if there is a miss into L2, even the smallest L2 will be more than enough, so there is almost no difference from something with ~256KB L2 to any size of L2/L3. If you take a look at actual benchmarks you will see that this is indeed the case for the vast majority.
Quote:
Originally Posted by onewingedangel
Actually, with Core 2 Duo, AltiVec is no longer superior to SSE2, as Core 2's throughput is similar to AltiVec's when working on 32-bit singles. SSE2 is actually superior to AltiVec in many ways, since it can do 64-bit double precision math while AltiVec cannot.
Quote:
Originally Posted by nn_step
If Intel decides to improve on the vector capabilities of future processors, there's plenty that they can do.
They'll need to increase bandwidth though, as bandwidth is a severe limitation for vector performance.
There were some threads talking about vectoring today; it seems we've moved to another level, so I thought I would bump this back up front, as I think we can add to this thread now.
This is the reason I went and got this thread, guys. Go with it and let's talk about it as enthusiasts.
I believe this guy was right. He just missed by a couple of generations.
Let's everyone on this subject forget about Intel vs. AMD.
This belongs in this thread. If you read up on Elbrus tech, you will understand the implications of what this is all about in relationship to the thread title.
http://techknowledger.blogspot.com/2...r_archive.html :coffee:
Ummm, Turtle, that page is talking about getting rid of x86 and Conroe being a super RISC, for which one could in theory design software specifically, with massive performance. However, Conroe is still x86.
True, I am talking maybe Nehalem C, more likely Gesher. Nehalem 2 is supposed to scale to 16 cores, Gesher to 32 cores. I think there's a very good chance, as in theory there could be a dedicated compiler core. We know there's shared cache now; there's room for dedicated cores to do whatever Intel wants, using CSI. VLIW (EPIC) is a strong possibility.
Vector processing in desktops? Nice, we're going to be owning supercomputers.
Very true, however it is gonna eat cache like no tomorrow, and compiling a whole OS will take a while. If I remember correctly, FX!32 took about 7 days to finish :eek: After that it was untouchable.
Quote:
Originally Posted by Turtle 1
However most people aren't willing to wait for that
You better reread that link. It translates just a bit slower, and that's configured without a dedicated core or shared cache. This speeds up the translation to where it won't be any slower than a present single-threaded x86 read. With parallel reads and multiple threads it will be even faster yet. Intel can put a lot of shared cache on die without x86 decoders on each processor.
__________________________________________________ __________________________________________________ _________
The New Architecture
To reduce power you need to reduce the number of transistors, especially ones which don’t provide a large performance boost. Switching to VLIW means they can immediately cut out the hefty X86 decoders.
Out of order hardware will go with it, as it is huge, consumes masses of power and in VLIW designs is completely unnecessary. The branch predictors may also go on a diet or even get removed completely as the Elbrus compiler can handle even complex branches.
With the X86 baggage gone the hardware can be radically simplified - the limited architectural registers of the x86 will no longer be a limiting factor. Intel could use a design with a single large register file covering integer, floating point and even SSE, 128 x 64 bit registers sounds reasonable (SSE registers could map to 2 x 64 bit registers).
Turtle, I am talking about the amount of time it took for the system to convert from x86 native to Alpha native.
The conversion would only have to take place once before the new binary is ready to be used, so if the conversion can happen in the background when you're just using the net etc., I think it would be quite feasible. The only issue may be HDD space, as having all your games, programs etc. that you run off CDs stored on your HDD would eat up space.
If the conversion could be done in real time, it would take up so much processing power that it would be better to just focus on improving x86 performance, up until the point where real-time conversion can take place as a background process consuming few resources.
The idea of a software abstraction layer converting instructions is a good one somewhere down the line, but I'm not sure we're at that point quite yet.
Remember when Intel said they'd have Itaniums on desktops by now; perhaps we shouldn't forget that their ultimate goal is to move away from x86 (to something that AMD doesn't have a legal right to use). Such a move wouldn't harm AMD's hardware; they too would be free to innovate then. But you must remember the amount of money Intel spends on compilers and software that benefits users of all x86 hardware, Intel or AMD. Should AMD have to compete with Intel on the software front, I don't think it's a battle they could win alone (would IBM bail them out again?)
True, but on a 16-core processor, with one core used strictly as a compiler and much shared cache, it's very feasible. As far as computing power, 16 cores gives more than enough. As long as the instructions can be done in real time, I see Intel going this route. We will just have to watch development. I just see so many good things with Intel going this route. Plus they will finally not have to share with AMD; Intel spends billions on research and then has to give the tech away, which is not good.
isn't that like saying Transmeta was right turtle?
We're not talking about Transmeta. Intel has owned Elbrus tech since '04,
which gave Intel plenty of time to tweak it. Anyway, a 16-core CPU with a dedicated Elbrus compiler and shared cache on all cores should translate x86 in real time. With lots of vector power, I see a raytracing monster coming from Intel.
If you read the article you know what Transmeta's shortcomings were. Intel has the tech to overcome those shortcomings.
The idea here isn't to point out where Transmeta failed, but to discuss how Intel will overcome those shortcomings, just as the article was saying. All of the shortcomings discussed in that article, Intel has the tech to overcome. Then Intel will be free of the shackles
which bind them to x86 and MS. A processor that can run any OS in true 64-bit binaries is a wonderful thing, for all of us.
Yes we are, esp. since they were the first ones to really make a good code-morphing product.
No, really, we're not. Intel is going to use the Elbrus compiler, which isn't related to Transmeta's morphing product. Either way, Intel has always been good at taking others' failed ideas (in the case of Elbrus, they didn't have the infrastructure or the money to take this tech to the next level; Intel does) and transforming that tech into viable products. Elbrus has been working on this tech since 1980 -> see SPARC.
Quote:
Originally Posted by nn_step
Who's going to break it down and explain why the vector processor matters to average consumers?
Is it something that will make Far Cry 3 look nice and pretty?
Well, if, and I say if, Intel moves towards raytracing for gaming, the vector processor is a big deal. But not just for gaming. In gaming, however, raytracing running at 50fps will be a spectacular sight to see, better than any of today's GPUs.
Well, it depends on how well it is used. But a 5-15fps increase isn't a stretch.
Quote:
Originally Posted by zabomb4163
LOE, we are simply discussing where Intel may be heading, since AMD has come out and said they will not be competing with Intel on the number of cores in a CPU. AMD and Intel seem to be heading down different roads.
Quote:
Originally Posted by LOE
If in an Intel 16-32 core CPU (Nehalem C and Gesher). It would have been much better for you to say: yeah, if Intel makes 1 or 2 cores dedicated as an FPU processor, this is very feasible, and would likely be 30000 times faster than a 2.6 Conroe. Add to that cores dedicated to vector processing. So yes, you're right, I am saying it will be XXX times more powerful than Conroe.
I don't know where this came from; I just pulled it out of Word, but it's thread related so I thought I would paste it in here.
As a result of what I've learned, coming from a realtime OpenGL/Direct3D background, I feel that raytracing (or other predominantly non-realtime techniques such as radiosity) will likely make some kind of appearance on future generations of graphics hardware. Optimised raytracing is amazingly fast, and handles incredibly detailed scenes at a logarithmic expense of processing power. In other words, it is far more efficient at rendering massive scenes than current techniques. While optimised raytracing is very memory intensive, RAM is relatively cheap, graphics processing power is not.
Standard realtime graphics systems in use today have a linear response to scene complexity. Because of this, Graphics Processing Units are getting immensely fast, but even with the advent of realtime Shaders, there are still many effects that cannot be achieved in realtime - some of which are achieved as part of the core algorithm of raytracing. Optimised raytracing can happily render the entire Earth from space in life-like detail, accounting for all surface detail and reflecting light off the moon - approximately the same number of rays are cast as when rendering a single low-detail teapot.
But enough hype, raytracing has many downfalls, such as massive tree structures (for optimisation) that require equally massive amounts of memory, a linear response to the number of lights in a scene, and the fact that indirect lighting isn't usually accounted for. Newer radiosity-based techniques such as radiosity-based Global Illumination (GI) have already been implemented in realtime software (but slowly, and on very simple scenes), which does help solve the indirect lighting problem of raytracing. Within a few generations of 3D accelerators I wouldn't be surprised to see some GI, raytracing or a hybrid system appear in hardware. Until then, non-realtime raytracing is being constantly improved, and is still critical to many sectors, including the film industry, architectural firms, and other creative industries.
The Itanium uses something called software pipelining, which from what I hear, is superior to vectorization as it's more applicable.
But, real time ray tracing in games is many years off. If Intel decides to pursue this, it will most definitely be based on a specialized processor, and not a general purpose one like Nehalem.
YES, CAN YOU TELL US WHAT NEHALEM C IS? We're really speculating on Gesher, but Nehalem C is said to be a 16-core CPU, so it may be applicable to Nehalem C. With 16 cores on the CPU, I see no reason Intel can't include, as you say, a specialized processor. I would like to see a link that says Nehalem C is a general purpose processor.
Quote:
Originally Posted by Carfax
You seem to be saying you know what Nehalem C is. It would be nice if you could elaborate on this and enlighten the rest of us. As for me, I am just speculating, as that's what this thread is.
This article touches a bit on what we're discussing here, even though it's described as an x86 design, which really makes more sense for the Nehalem processor. But Gesher, I believe, is going into a whole new realm of processors.
The Otellini keynote stressed the proposition that processor performance is once again an important market factor. This is a return to a familiar message for Intel, and a departure from the "platformization" strategy that Intel has maintained through years of market share gains by AMD. Otellini's presentation also stressed the power-efficiency of the new processors. This move by Intel also highlights the continuing relevance of Moore's Law — which holds that the transistor density of integrated circuits will double every 18 months — and suggests that 16-core or even 32-core devices will likely be available by 2010.
http://www.theinquirer.org/default.aspx?article=35342
Even though I may be getting ahead of myself on Gesher, what follows Gesher is extremely exciting also. It's hard to say where the industry is going, but it's a lot of fun trying to figure it out or speculate on it. I really like the slides in this link. Eye-popping stuff here.
http://www.tgdaily.com/picturegaller...0609264-1.html
Can we please try to avoid mixing EPIC and x86, Turtle. They are not directly comparable.
NN - it is applicable here. That's what the Elbrus compiler is: VLIW (EPIC). Do you think that Intel is going to announce to the world, yeah, we're ditching x86 in favor of VLIW (EPIC)? I don't. I do believe Intel has a great desire to get away from the x86 processor. That could happen with Nehalem C but more likely Gesher.
This would give intel the ability to run any OS and not suffer a performance hit.
The logic of this is simple. Intel spends billions on x86 processor research, and every time they come up with a design that improves performance, AMD is allowed to copy it. This is bad for Intel. By leaving the x86 processor they no longer have the monkey on their back.
I have no idea what Nehalem C is and I am simply speculating.
Quote:
Originally Posted by Turtle 1
I think it's safe to say that no general purpose processor would be capable of real time ray tracing at high resolution and acceptable frame rates.
But, after reading that article at the Inq, it's possible Intel has some plans to seriously beef up the vectorization in future cores.
As it stands right now, no x86 based processor uses a true vector engine. SSEn functions as both an FPU and a vector engine, but it's not a true vector engine.
In the future, Intel could put real vector units (similar to Cell) on their processors and this would dramatically increase floating point capability.
The biggest problem would be supplying these units with enough bandwidth, but who knows?
As the article states, there's XDR, and I'm sure Intel has some other tricks up its sleeve to be able to feed the units..
This is probably the future of computing. Putting both specialized and general purpose cores together on the same piece of silicon..
Gesher is the 2010 model, so by then, desktop processors could be approaching 1 TFLOP in processing power, which should be enough to run a real time ray traced engine at high resolution.
Be careful to differentiate between the x86 instruction set and the hardware that processes it. While Intel and AMD have an agreement to share significant updates to the core x86 instruction set, they have no obligation to share the hardware that processes those instructions. The internal pipelines of AMD and Intel CPUs are already very different, and a certain degree of 'code morphing' already goes on, such as with micro- and macro-op fusion, but at this time this takes place in dedicated hardware that decodes the x86 instructions and feeds the pipeline, not in a software form that may be used in future products (and is used for running x86 on Itaniums). The software solution is superior in that it may allow for universal compatibility, with different recompilers used to convert between various instruction sets, not just external instructions -> internal instructions.
Now you see where I am coming from. It is extremely interesting and holds much promise. I believe in this thread I have the white paper link on Intel's SSE4; it's very interesting. But don't just read the top, read the entire white paper, it's really good stuff.
Quote:
Originally Posted by Carfax
Turtle, you are talking about something that would take over a decade to unfold. And even if you have specialized logic for direct conversion, it is still going to suffer from the same problems as a common micro-decode engine.
edit:
I advise you to read this article:
http://www.aceshardware.com/read.jsp?id=60000308
it is a little outdated but it'll help
Quote:
Originally Posted by Carfax
In the future, Intel could put real vector units (similar to Cell) on their processors and this would dramatically increase floating point capability.
The biggest problem would be supplying these units with enough bandwidth, but who knows?
As the article states, there's XDR, and I'm sure Intel has some other tricks up it's sleeve to be able to feed the units..
__________________________________________________ __________________________________________________ __________________
You may find this interesting, as I do. I believe Intel will introduce bits and pieces of these tech advances in each new generation, so it's possible we could find this tech on Gesher.
http://www.tgdaily.com/picturegaller...0609264-7.html
Come on NN, why do you think it's 10 years away? As you know yourself, AMD wants to add coprocessors on separate sockets, as does Intel. Then we have Geneseo. This stuff isn't 10 years off, only 1 year, ultimately making it onto the multi-core CPU sooner rather than later.
Quote:
Originally Posted by nn_step
That was a long time ago and things have changed a lot.
However, there was one company which took a more radical approach and while its processor wasn’t exactly blazing fast it was faster than those using the stripped back approach, what’s more it didn’t include the x86 instruction decoder. That company was Transmeta and its line of processors weren’t x86 at all, they were VLIW (Very Long Instruction Word) processors which used "code morphing" software to translate the x86 instructions into their own VLIW instruction set.
Transmeta, however, made mistakes. During execution, its code morphing software would have to keep jumping in to translate the x86 instructions into their VLIW instruction set. The translation code had to be loaded into the CPU from memory and this took up considerable processor time lowering the CPU’s potential performance. It could have solved this with additional cache or even a second core but keeping costs down was evidently more important. The important thing is Transmeta proved it could be done, the technique just needs perfecting.
Intel on the other hand can and do build multicore processors and have no hesitation in throwing on huge dollops of cache. The Itanium line, also VLIW, includes processors with a whopping 9MB of cache. Intel can solve the performance problems Transmeta had because this new processor is designed to have multiple cores and while it may not have 9MB it certainly will have several megabytes of cache.
Most interesting though is the E2K compiler technology which allows it to run x86 software. This is exactly the sort of technology Intel needs, and since last year they have had access to it and employ many of its designers.
You can of course expect all these cores to support 64 bit processing and SSE3, you can also expect there to be lots of them. Intel’s current Dothan cores are already tiny but VLIW cores without out of order execution or the large, complex, x86 decoders leave a very small, very low power core. Intel will be able to make processors stuffed to the gills with cores like this.
Intel will now be free to do as it pleases: with x86 decoding done in software, Intel can change the hardware at will. If the processor is weak in a specific area the next generation can be modified without worrying about backwards compatibility. Apart from the speedup nobody will notice the difference. It could even use different types of cores on the same chip for different types of problems.