Can Llano do AVX?

**saaya** · 04-27-2010, 06:41 AM

nn_step, but there are specific apps that would benefit from AVX, right? IIRC Intel already plans 256-bit AVX in Haswell...

And about CPUs... well, I read that x86 instructions are broken down into actual simple math instructions, which are then executed by specialized logic for those simple math instructions. Originally x86 CPUs had kind of general-purpose logic, but that was not as efficient as breaking x86 down into basic math and having specialized math logic that then works on that... This is what I read a while ago, please correct me if I'm wrong.
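To illustrate the idea in rough terms, here is a toy Python sketch. Everything in it is hypothetical and simplified: the `MicroOp` format, the `decode_add` function and the `tmp0` register are made up for illustration, and real decoders (with their micro-op encodings, fusion rules and so on) are vendor-specific and largely undocumented. It only shows how one read-modify-write instruction such as `add [rbx], eax` can become separate load, ALU and store operations.

```python
from typing import NamedTuple


class MicroOp(NamedTuple):
    kind: str    # "load", "alu" or "store" (hypothetical categories)
    dst: str
    srcs: tuple


def decode_add(dst: str, src: str) -> list[MicroOp]:
    """Split a toy x86-style `add dst, src` into simpler micro-ops.

    A memory destination (written as "[addr]") becomes load -> add -> store;
    a register destination needs only a single ALU micro-op.
    """
    if dst.startswith("["):
        tmp = "tmp0"  # hypothetical internal register holding the loaded value
        return [
            MicroOp("load", tmp, (dst,)),
            MicroOp("alu", tmp, (tmp, src)),
            MicroOp("store", dst, (tmp,)),
        ]
    return [MicroOp("alu", dst, (dst, src))]


for uop in decode_add("[rbx]", "eax"):
    print(uop)
```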

**informal** · 04-27-2010, 06:54 AM

AVX is already a 256-bit ISA extension to the 128-bit SIMD we have today.

**saaya** · 04-27-2010, 08:14 AM

Originally Posted by informal

AVX is already a 256-bit ISA extension to the 128-bit SIMD we have today.

Oh, then it's 512-bit for Haswell? I just remember that it's going to double the width once more.

**~~terrace215~~** · 04-27-2010, 08:55 AM

Originally Posted by saaya

Oh, then it's 512-bit for Haswell? I just remember that it's going to double the width once more.

Well, "currently" we're at 128b with officially launched parts.

AVX starts at 256b with Intel Sandy Bridge. BD (Bulldozer) is also going to support it. So that's a doubling.
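Counting lanes makes the doubling concrete. A minimal Python sketch, assuming only the register widths discussed here (128-bit XMM registers for SSE, 256-bit YMM registers for AVX):

```python
# Register widths in bits for the two SIMD generations discussed in the thread.
REGISTER_BITS = {"SSE (XMM)": 128, "AVX (YMM)": 256}
ELEMENT_BITS = {"float32": 32, "float64": 64}


def lanes(register_bits: int, element_bits: int) -> int:
    """Number of elements a single SIMD register holds."""
    return register_bits // element_bits


for name, reg in REGISTER_BITS.items():
    for elem, bits in ELEMENT_BITS.items():
        print(f"{name}: {lanes(reg, bits)} x {elem}")
```

So one AVX instruction operates on 8 single-precision (or 4 double-precision) values where SSE handles 4 (or 2).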

You've heard Haswell is going to 512b FP vector width?

**Dresdenboy** · 04-27-2010, 09:01 AM

Originally Posted by terrace215

You've heard Haswell is going to 512b FP vector width?

I've heard something like that too. It might be based on the assumption that Haswell could support Larrabee's SIMD ISA.

**superrugal** · 04-27-2010, 09:30 AM

It's interesting. It seems AMD has never disclosed whether Llano will be equipped with AVX, SSE5 or other features, but the changes in Llano don't look small. Will Llano have some extra pipeline stages compared to K10?
I guess it's very possible that the changes are being made for AVX or SSE5.

**saaya** · 04-27-2010, 10:23 AM

Hmmm, I'm not sure about 512-bit for Haswell, but I remember from IDF 2008 or 2009 that Haswell would double the AVX width of SB...
I think I've seen 512, but I can't be sure, it's been a while...
And yes, there was a lot of talk about Haswell having elements of LRB; some interpreted that as it having an LRB block as an IGP, others as a true hybrid design...
Back then Intel's plan was to get game devs hooked on LRB, then merge LRB with the CPUs, and bring graphics back to the CPU that way...
Without LRB, I'm curious what Intel's plans are now...

**nn_step** · 04-27-2010, 02:34 PM

Originally Posted by saaya

nn_step, but there are specific apps that would benefit from AVX, right? IIRC Intel already plans 256-bit AVX in Haswell...

And about CPUs... well, I read that x86 instructions are broken down into actual simple math instructions, which are then executed by specialized logic for those simple math instructions. Originally x86 CPUs had kind of general-purpose logic, but that was not as efficient as breaking x86 down into basic math and having specialized math logic that then works on that... This is what I read a while ago, please correct me if I'm wrong.

The applications that benefit the most from AVX are those that are already embarrassingly parallel. Embarrassingly parallel applications might as well skip the CPU and go straight to the GPU, because that is the workload it was designed for.

Now, in the edge cases of code that is parallel but has considerable choke points, AVX can in theory improve performance, but not double it as you would expect from nearly doubling the computational resources.
So logically, we can expect only 5-80% utilization of a FULL AVX unit (which is going to eat a lot of transistors).
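One way to see why doubling the vector width does not double performance is Amdahl's law. The sketch below is illustrative only: the vectorizable fractions are made-up inputs, not measurements, and `width_factor=2.0` assumes the 256-bit unit delivers exactly twice the throughput of the 128-bit one on the vectorized part. (The 5-80% range above is nn_step's own estimate, not an output of this formula.)

```python
def amdahl_speedup(parallel_fraction: float, width_factor: float) -> float:
    """Amdahl's law: overall speedup when only `parallel_fraction` of the
    runtime is accelerated by `width_factor` and the rest stays serial."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / width_factor)


# Going from 128-bit SSE to 256-bit AVX doubles peak SIMD throughput (factor 2).
for p in (0.5, 0.8, 0.95, 1.0):
    print(f"vectorizable fraction {p:.0%}: speedup {amdahl_speedup(p, 2.0):.2f}x")
```

With half of the runtime vectorizable, the twice-as-wide unit yields only about 1.33x overall; the full 2x appears only when the code is 100% vectorizable.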

Now if we plan on getting the most benefit out of an AVX unit, it would need to be shared between two or more threads.

**Helmore** · 04-27-2010, 03:11 PM

Originally Posted by nn_step

Now if we plan on getting the most benefit out of an AVX unit, it would need to be shared between two or more threads.

Smells like Bulldozer


**Dresdenboy** · 04-28-2010, 10:41 AM

Originally Posted by Helmore

Smells like Bulldozer


Yeah, because of "shared"

But Sandy Bridge's FP units are shared as well.

Any kind of sharing is good for longer latency pipelined units.
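A simplified model of why sharing helps pipelined units: if each thread is a single chain of dependent operations, one thread can only issue a new op once the previous result comes back, so a unit with latency L that accepts one op per cycle sits mostly idle. The sketch assumes fully independent threads and a hypothetical latency of 4 cycles; real SMT scheduling is far more complex.

```python
def utilization(latency: int, threads: int) -> float:
    """Fraction of issue slots a fully pipelined unit fills when every thread
    runs one chain of dependent ops (each op needs the previous result).

    A single thread can issue only once every `latency` cycles; independent
    threads fill the remaining slots, up to one op per cycle.
    """
    return min(1.0, threads / latency)


for t in (1, 2, 4):
    print(f"latency 4, {t} thread(s): {utilization(4, t):.0%}")
```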

**saaya** · 04-28-2010, 11:55 PM

Originally Posted by nn_step

The applications that benefit the most from AVX are those that are already embarrassingly parallel. Embarrassingly parallel applications might as well skip the CPU and go straight to the GPU, because that is the workload it was designed for.

Now, in the edge cases of code that is parallel but has considerable choke points, AVX can in theory improve performance, but not double it as you would expect from nearly doubling the computational resources.
So logically, we can expect only 5-80% utilization of a FULL AVX unit (which is going to eat a lot of transistors).

Now if we plan on getting the most benefit out of an AVX unit, it would need to be shared between two or more threads.

Well, you know Intel...

They had LRB on one side and AVX on the other and wanted them to meet at some point... and if anything went wrong, one or the other would serve as a backup plan... which was a smart strategy, as LRB did fail.

So highly parallel code, let's say video compression, will benefit a lot, but it won't be twice as fast... thanks, that's good to know!

Originally Posted by Dresdenboy

Sandy Bridge's FP units are shared as well.

They are? How? Is there anything public to read about this?

**Helmore** · 04-29-2010, 12:10 AM

Originally Posted by saaya

They are? How? Is there anything public to read about this?

He is referring to SMT (Hyper-Threading): two threads sharing the same execution resources, including the FP units.

**saaya** · 04-29-2010, 01:08 AM

Originally Posted by Helmore

He is referring to SMT (Hyper-Threading): two threads sharing the same execution resources, including the FP units.

ahhhhhh gotcha...

**zir_blazer** · 04-29-2010, 01:25 AM

Besides AVX, could those units be used for other purposes? AMD wanted to introduce its own x86 instruction set extensions a few years ago, and there isn't a clear picture of how it intended to do so (everything points to introducing them with Bulldozer, but maybe it can do so earlier with Llano). Besides, AMD isn't up to date on supporting Intel's standards: it currently lacks SSSE3, SSE4.1 and SSE4.2 support.
AMD wanted to introduce SSE5, but after Intel announced AVX, AMD revised its proposed extension to make sure it doesn't overlap or conflict with the new Intel AVX opcodes. The revised instruction set was broken into three smaller extensions: XOP (which, if I understand properly, groups the old SSE5 instructions that had no AVX equivalent), CVT16 (which converts 32-bit single-precision floating-point numbers to 16-bit half precision and vice versa), and FMA (Fused Multiply-Add). Adding to the mess is the fact that there are two proposed versions of FMA: FMA3, which works with 3 operands, and FMA4, which works with 4 operands (long live redundancy!), and, predictably, AMD and Intel have each sided with one of them.
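A Python sketch of what "fused", the 3- vs 4-operand split and CVT16 mean, under stated assumptions: `fused_mul_add` emulates a fused multiply-add by computing exactly with `fractions.Fraction` and rounding once; `fma3`/`fma4` are hypothetical helpers that model only the operand forms (FMA3 overwrites one of its sources, shown here for one of its several orderings; FMA4 writes a separate destination); and `cvt_to_half` uses `struct`'s IEEE 754 half-precision format `"e"` to mimic the narrowing that CVT16 does in hardware (Python floats are doubles, not 32-bit singles).

```python
import struct
from fractions import Fraction


def unfused_mul_add(a: float, b: float, c: float) -> float:
    # Two roundings: the product is rounded to a double, then the sum is rounded again.
    return a * b + c


def fused_mul_add(a: float, b: float, c: float) -> float:
    # One rounding: a*b + c is computed exactly, then rounded once (FMA semantics).
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def fma4(regs: dict, dst: str, a: str, b: str, c: str) -> None:
    # 4-operand form: separate destination, all three sources are preserved.
    regs[dst] = fused_mul_add(regs[a], regs[b], regs[c])


def fma3(regs: dict, acc: str, b: str, c: str) -> None:
    # 3-operand form: the first source doubles as destination and is overwritten.
    regs[acc] = fused_mul_add(regs[acc], regs[b], regs[c])


def cvt_to_half(x: float) -> float:
    # Round-trip through IEEE 754 binary16, losing precision like a CVT16 narrowing.
    return struct.unpack("<e", struct.pack("<e", x))[0]


a, b, c = 1 + 2**-30, 1 - 2**-30, -1.0
print(unfused_mul_add(a, b, c))  # product rounds to 1.0, so the result is 0.0
print(fused_mul_add(a, b, c))    # the exact answer -2**-60 survives

regs = {"x": a, "y": b, "z": c, "d": 0.0}
fma4(regs, "d", "x", "y", "z")   # x, y, z unchanged
fma3(regs, "x", "y", "z")        # x now holds the result
print(regs)
print(cvt_to_half(0.1))          # nearest half-precision value to 0.1
```

The practical difference between the two forms is register pressure: with FMA3 the compiler must copy a source first if it still needs it, while FMA4 never destroys an input.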
Apart from Intel's AVX, AMD's extensions seem to be missing in action for the most part. I'd have thought AMD could at least include its very own extensions, or at least the missing SSSE3/SSE4 ones.

**saaya** · 04-29-2010, 01:36 AM

I think AMD can do most if not all of it, some better, some less efficiently... they just aren't promoting it, because as soon as they do, Intel will step on their toes again.

I think they are just waiting for Intel to push whatever it thinks makes sense and then support it as well... and maybe announce that they support a few extra instructions on top of that...

Whatever AMD claims it will do, Intel will speak up with a much louder voice and proclaim that it will support the same or even more, and call it something else...
So AMD is doing the smart thing and playing the waiting game until Intel puts its cards on the table, I think...

**nn_step** · 04-29-2010, 07:16 PM

Originally Posted by saaya

Well, you know Intel...

They had LRB on one side and AVX on the other and wanted them to meet at some point... and if anything went wrong, one or the other would serve as a backup plan... which was a smart strategy, as LRB did fail.

So highly parallel code, let's say video compression, will benefit a lot, but it won't be twice as fast... thanks, that's good to know!

They are? How? Is there anything public to read about this?

Actually, video compression, when applied to large enough raw video, is embarrassingly parallel, and the same could be said for most video work. But honestly, because it is so parallel, and GPUs are very common and extremely good at embarrassingly parallel work, video tends to use GPUs instead of SIMD units.
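A toy Python sketch of why this kind of work is embarrassingly parallel: the input is cut into chunks that share no state, each chunk is encoded independently, and the results are simply concatenated. The run-length "encoder" and the byte data are hypothetical stand-ins for a real codec and real frames; a real encoder would typically cut chunks at keyframe boundaries, since frames inside a group of pictures depend on each other. `ThreadPoolExecutor` only illustrates the decomposition here; under CPython's GIL, pure-Python work like this would need processes (or native code) to actually run in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def rle_encode(data: bytes) -> list[tuple[int, int]]:
    """Toy stand-in for an encoder: run-length encode one chunk as (byte, count) pairs."""
    runs: list[tuple[int, int]] = []
    for value in data:
        if runs and runs[-1][0] == value:
            runs[-1] = (value, runs[-1][1] + 1)
        else:
            runs.append((value, 1))
    return runs


def split(data: bytes, chunk_size: int) -> list[bytes]:
    """Cut the input into independent, fixed-size chunks."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def encode_parallel(data: bytes, chunk_size: int, workers: int = 4) -> list[list[tuple[int, int]]]:
    """Encode every chunk independently; no chunk needs another chunk's result."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(rle_encode, split(data, chunk_size)))


frames = bytes([0] * 10 + [255] * 6 + [0] * 8)  # hypothetical "raw video" bytes
print(encode_parallel(frames, chunk_size=8))
```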

The applications that would actually see improvements from SIMD would be compression and encryption, but even those tend to see an order of magnitude better performance if there is explicit hardware support [which is A LOT cheaper than doubling the SIMD unit].

Once GPUs start supporting IEEE 754 (and a proper standard), there will be absolutely no reason for the CPU to have SIMD [except for legacy reasons].

**saaya** · 04-30-2010, 04:42 AM

Originally Posted by nn_step

Once GPUs start supporting IEEE 754 (and a proper standard), there will be absolutely no reason for the CPU to have SIMD [except for legacy reasons].

Unless you're Intel and you don't HAVE a proper GPU.

That's why AVX exists to begin with, doesn't it?
If Intel had a proper GPU, AVX would be part of OpenCL or DirectCompute, I guess...

**~~terrace215~~** · 04-30-2010, 09:19 AM

Originally Posted by saaya

Unless you're Intel and you don't HAVE a proper GPU.

Even the Arrandale GPU ain't bad on a performance PER WATT basis, and SB's iGPU will probably be quite competitive perf/W-wise.

You can of course argue about how much power (and thus performance) gets allocated to an iGPU.

Presumably Intel considered this when designing SB.

**sergiojr** · 04-30-2010, 10:00 AM

Originally Posted by terrace215

Even the Arrandale GPU ain't bad on a performance PER WATT basis, and SB's iGPU will probably be quite competitive perf/W-wise.

As Intel IGPs don't support OpenCL, DX Compute or CUDA, they don't qualify to be mentioned in this thread.

**saaya** · 05-01-2010, 03:31 AM

Originally Posted by terrace215

Even the Arrandale GPU ain't bad on a performance PER WATT basis, and SB's iGPU will probably be quite competitive perf/W-wise.

For 3D, perf per watt is "OK"... for 2D? Hell no!

CUDA? DirectCompute? OpenCL? Pff, who cares?
What would you need them for again? Oh right, there aren't really any apps for them at all :P The few apps that do exist would be so slow on an Intel IGP that it's pointless supporting them...

Just like IGP/entry-level DX11 GPUs... as if they could actually render anything DX11 at double-digit fps :P

**informal** · 05-01-2010, 04:03 AM

Llano's GPU and SB's are not in the same league performance- and feature-wise. While Llano will have (practically) a GPGPU onboard, SB will have a tweaked IGP from Arrandale. One could argue that it (SB's IGP) will be enough for the average consumer (as it probably will be), but Llano will stomp it in games and whenever OpenCL/DirectX is used. There is a chance Intel will do a complete rework of the GPU that goes into SB, but to expect it to be even close to Llano's performance in games is a pipe dream IMO.

**zir_blazer** · 05-01-2010, 04:43 AM

Originally Posted by informal

Llano's GPU and SB's are not in the same league performance- and feature-wise. While Llano will have (practically) a GPGPU onboard, SB will have a tweaked IGP from Arrandale. One could argue that it (SB's IGP) will be enough for the average consumer (as it probably will be), but Llano will stomp it in games and whenever OpenCL/DirectX is used. There is a chance Intel will do a complete rework of the GPU that goes into SB, but to expect it to be even close to Llano's performance in games is a pipe dream IMO.

You are thinking wrong if you are underestimating Intel. I think no one would have expected Clarkdale's GPU to be able to compete with the Radeon 3200/4200 IGPs, considering that Intel's history in GPUs had made it the graphics industry's permanent punching bag and laughingstock. The Sandy Bridge IGP is NOT to be underestimated.
Remember that both AMD and Intel are not only processor makers but full platform providers (processors, chipsets and GPUs; OEMs like getting it all together). Intel seems willing to take the GPU part of its platform seriously; otherwise it would later be at a SERIOUS disadvantage should AMD set a strong baseline of GPU performance even on its cheapest platforms. What would happen if Intel stuck to its old, stinky IGP in the full platform war? AMD would have an overall slower processor that is still stupidly fast for the vast majority of mainstream users, but a GPU that would crush Intel's GMA in epic fashion to make the difference. From Core 2 Duo onwards, Intel is as greedy as always, but not stupid anymore.
The real advantage of the Fusion GPU, whether it is a direct derivative of a current budget GPU or specifically designed, is that it will come with everything ATI's experience has to offer (including software development tools, driver compatibility, the always-mentioned features that are currently used by no one, etc.). That is something Intel currently doesn't have, but it still has much more money to throw at the problem should it get ambitious about GPU R&D.

**informal** · 05-01-2010, 04:48 AM

480 Cypress-class SPs, my friend... If they come close to that even with a "dual-core IGP" as the rumors suggest, I'll tip my hat to them!

**zir_blazer** · 05-01-2010, 05:07 AM

Originally Posted by informal

480 Cypress-class SPs, my friend... If they come close to that even with a "dual-core IGP" as the rumors suggest, I'll tip my hat to them!

Basically, that would place it slightly above the Radeon 5570/5670 but some way below the 5750, which have 400 and 720 SPs respectively. That means we can speculate fairly accurately about Fusion GPU performance, with two caveats. Having the GPU on the same piece of silicon as the CPU basically eliminates their communication latency, which is a benefit; however, how much of an impact will it have that the GPU shares memory bandwidth with the other cores, with higher latency than a video card's own VRAM? Well, that is all that's left to know about the Fusion GPU, besides actual numbers.
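The comparison is simple arithmetic on SP counts and memory buses. A Python sketch, assuming the usual peak-rate formula for these VLIW5 Radeons (one multiply-add, i.e. 2 FLOPs, per SP per clock) and the commonly quoted reference clocks and memory configurations for the HD 5670 and HD 5750; these figures are not from the thread. Llano's GPU clock is unknown, so the 600 MHz below is purely hypothetical, as is the dual-channel DDR3-1333 system memory used for the shared-bandwidth comparison.

```python
def peak_gflops(stream_processors: int, clock_mhz: float) -> float:
    # Each stream processor can issue one multiply-add (2 FLOPs) per clock.
    return stream_processors * 2 * clock_mhz / 1000


def bandwidth_gbs(bus_bits: int, transfers_mt_s: float, channels: int = 1) -> float:
    # Bytes per transfer * millions of transfers per second * channels -> GB/s.
    return bus_bits / 8 * transfers_mt_s * channels / 1000


print("HD 5670:", peak_gflops(400, 775), "GFLOPS")          # assumed 775 MHz reference clock
print("HD 5750:", peak_gflops(720, 700), "GFLOPS")          # assumed 700 MHz reference clock
print("Llano @ 600 MHz (hypothetical):", peak_gflops(480, 600), "GFLOPS")
print("HD 5670 GDDR5, 128-bit:", bandwidth_gbs(128, 4000), "GB/s")
print("DDR3-1333, dual channel (shared with CPU):", bandwidth_gbs(64, 1333, channels=2), "GB/s")
```

Per clock, 480 SPs is 20% more than 400; the unknown clock decides the rest. Under these assumptions, the shared system memory offers roughly a third of the HD 5670's dedicated bandwidth, which is exactly the open question raised above.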
Now... what do we know about Sandy Bridge? Do we have even a remote idea of its performance? If not, I would still hold back until more info surfaces. The worst thing you can do is say that you are I N V I N C I B L E and get owned before finishing the sentence.

BTW... where the hell is Hans de Vries? His input would be useful in this thread after so many days.

**ajaidev** · 05-01-2010, 05:48 AM

SB does not have a dual-core IGP. Well, it's tweaked and some additions have been made, but it's not dual core "per se". Performance is improved quite a bit over the old one, but it's still not something that can defeat Llano's Cypress-class GPU.

@informal the GPU is not the one from Arrandale directly; yes, it has similarities, but it has much more.
