
Thread: Intel Larrabee Roadmap 48 cores in 2010

  1. #1
    Registered User
    Join Date
    Jun 2006
    Posts
    61

    Intel Larrabee Roadmap 48 cores in 2010

    http://pc.watch.impress.co.jp/docs/2.../kaigai364.htm

    Development is handled by the CPU architecture team

    Larrabee, which was long assumed to be a discrete GPU, is in fact not a GPU at all. It is a many-core CPU specialized for stream computing, which processes large amounts of data in parallel. Gelsinger puts it this way:

    "Larrabee is a highly parallel machine. We load it with a very large number of cores; it will become our first many-core product."

    Larrabee sticks to the x86 instruction set architecture

    Larrabee's biggest distinguishing point is that it is a highly parallel processor built on an extended IA (x86) instruction set architecture. That sets it far apart from GPUs and other stream processors, which use their own proprietary instruction sets.

    "The Larrabee core is IA instruction set compatible, which we consider a very important feature. On top of that, floating-point instructions are added to the instruction set, an extension specialized for highly parallel workloads. In addition, cache coherency is maintained across the cores, which share cache. That cache coherency is very important when you think about programmability. Special-purpose units and I/O are also on board.

    "Larrabee is by no means a GPGPU (general-purpose GPU); it does not live in the traditional graphics-pipeline space. It is a general-purpose processor, in other words a processor aimed at uses where IA programmability matters. At the same time, thanks to the instruction set extensions, it can answer specialized workloads." (Gelsinger)

    Unlike a GPU, then, Larrabee is not a product that reworks a graphics pipeline for general-purpose use; it takes a more broadly applicable approach. That is why it keeps backward compatibility with the IA (x86) instruction set architecture: starting from a general-purpose IA core, it is a processor whose microarchitecture has been extended toward stream-style floating-point computing.
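    To make this concrete, here is an illustrative sketch (ours, not the article's) of the kind of data-parallel floating-point stream workload being described: a SAXPY loop in C++ whose iterations are all independent, so the work maps naturally onto many cores or wide vector units.

    Code:
    #include <cstddef>
    #include <vector>

    // Illustrative SAXPY-style stream kernel: y = a*x + y. No iteration
    // depends on any other, so the loop can be split across many cores or
    // packed into wide SIMD floating-point instructions.
    void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
        for (std::size_t i = 0; i < x.size(); ++i)
            y[i] = a * x[i] + y[i];          // no cross-iteration dependency
    }

    int main() {
        std::vector<float> x(1024, 1.0f), y(1024, 2.0f);
        saxpy(3.0f, x, y);                   // every y[i] becomes 5.0f
    }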

    In fact, Intel's graphics core team also has a discrete GPU plan. That design differs completely from Larrabee in both architecture and implementation; it is said to be a discrete version of Intel graphics. Since a discrete GPU is easy to derive from the graphics-integrated chipsets of the CSI generation, this is a natural progression. Intel has been pursuing the project for some time, but since no concrete product roadmap has surfaced, there is also a chance it will be dropped.

    A parallel processor that takes a different approach from the GPU

    Larrabee's performance at graphics processing is unknown. Because graphics processing is steadily shifting toward the efficient execution of shader programs, there is a good chance a Larrabee-type architecture will prove advantageous there. The graphics pipeline, however, also contains stages where fully fixed-function units are more effective, such as rasterization, filtering, and raster operations, along with stages where semi-fixed units work well. Running all of these on general-purpose processors is mostly wasteful in performance per watt.

    For that reason, Larrabee's graphics efficiency depends on how much GPU-style hardware it carries. Within its circuit budget, there is also a chance it has dedicated hardware for some small units.

    What is clear is that the current Larrabee is not something focused on graphics; if anything, it is an architecture aimed at non-graphics workloads.



    ==================================================



    http://www.tgdaily.com/content/view/32447/113/

    Intel aims to take the pain out of programming future multi-core processors

    Santa Clara (CA) – The switch from single-threaded to multi-threaded applications to take advantage of the capabilities of multi-core processors is taking much longer than initially expected. Now we see concepts of much more advanced multi-cores such as heterogeneous processors surfacing – which may force developers to rethink how to program applications again. Intel, however, says that programming these new processors will require a “minimal” learning curve.

    As promising as future microprocessors with perhaps dozens of cores sound, there appears to be a huge challenge for developers to actually take advantage of the capabilities of these CPUs. Both AMD and Intel believe that we will be using highly integrated processors, combining traditional CPUs with graphics processors, general-purpose graphics processors and other types of accelerators that may open up a whole new world of performance for the PC on your desk.

    AMD recently told us that it will take several years for programmers to exploit those new features. While Fusion - a processor that combines a regular CPU and a graphics core - is expected to launch late in 2009 or early in 2010, users aren't likely to see functionality that differs from a processor with an attached integrated graphics chipset. AMD believes that it will take about two years, or until 2011, before the acceleration features of a general-purpose GPU are exploited by software developers.

    Intel told us today that the company will be taking an approach that will make it relatively easy for developers to take advantage of this next generation of processors. The company aims to “hide” the complexity of a heterogeneous processor and provide an IA-like look and feel to the environment. Accelerators that are integrated within the chip are treated as processor functional units that can be addressed with ISA extensions and a runtime library. Intel compares this approach with the way multimedia extensions (MMX) were integrated into Intel’s instruction set back in 1996.
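    As a purely hypothetical C++ sketch of that idea (the names below are invented for illustration and are not a real Intel API), the application calls one ordinary function and a runtime library decides whether to route the work to an accelerator unit or to a plain-IA fallback:

    Code:
    #include <cmath>
    #include <cstddef>

    // Hypothetical sketch only: has_accelerator() stands in for a CPUID-style
    // probe, and the "accelerated" branch for a runtime-library dispatch to
    // an on-chip accelerator functional unit.
    bool has_accelerator() { return false; }         // probe stub for this sketch

    void transform_scalar(float* data, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)          // plain-IA fallback path
            data[i] = std::sqrt(data[i]);
    }

    void transform(float* data, std::size_t n) {
        if (has_accelerator()) {
            // a real runtime library would dispatch to the accelerator here
        }
        transform_scalar(data, n);                   // this sketch always runs on IA
    }

    int main() {
        float v[4] = {1.0f, 4.0f, 9.0f, 16.0f};
        transform(v, 4);                             // v becomes {1, 2, 3, 4}
    }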

    As a result, Intel hopes that developers will be able to understand these new processors quickly and develop applications almost immediately. “It is a very small learning curve,” a representative told us today. “We are talking about weeks, rather than years.”

    Nvidia, which is also intensifying its efforts in the massively parallel computing space, is pursuing a similar idea with its CUDA architecture, which allows developers to process certain applications - or portions of them - through a graphics card: Instead of requiring a whole new programming model, CUDA can be used through a C++-based model and a few extensions that help programmers access the horsepower of an 8-series GeForce GPU.
    Last edited by coffeetime; 06-13-2007 at 06:55 PM.

  2. #2
    Xtreme News Addict
    Join Date
    May 2005
    Location
    Winnipeg, Manitoba, Canada
    Posts
    2,065
    "There's no chance that the iPhone is going to get any significant market share. No chance." -- Microsoft CEO Steve Ballmer

  3. #3
    Xtreme Enthusiast
    Join Date
    Sep 2006
    Posts
    881
    That would be freaking cool, if software can utilize them all.

  4. #4
    Xtreme Addict
    Join Date
    Apr 2006
    Location
    Pleasant Hill, MO
    Posts
    1,211
    did somebody say reverse hyper-threading?

    I read that in there, I swear.

    Ryan
    "Political Correctness is a doctrine fostered by a delusional, illogical, liberal minority, and rabidly promoted by an unscrupulous mainstream media, which holds forth the proposition that it is entirely possible to pick up a turd by the clean end."

    Abit IP35 Pro
    Intel Core 2 Quad 6600 @ 3200 w/ Tuniq Tower
    2x2gb A-Data DDR2 800
    AMD/ATi HD 4870

  5. #5
    Xtreme Guru adamsleath's Avatar
    Join Date
    Nov 2006
    Location
    Brisbane, Australia
    Posts
    3,803
    um



    5-10 years from now (Intel Developer Forum pic)

    the first pic in this thread is bogus imo.
    but interesting pics nonetheless...who will use all those cores simultaneously, and how...?

    no arguments from me that multi-multi cores is the way of the future....

    i'm visualising 10 people at a LAN all using one server box with no separate computers needed
    Last edited by adamsleath; 06-12-2007 at 08:15 PM.
    i7 3610QM 1.2-3.2GHz

  6. #6
    YouTube Addict
    Join Date
    Aug 2005
    Location
    Klaatu barada nikto
    Posts
    17,574
    I wonder how long it'll take them to realize how hard it is to make use of all those cores
    Fast computers breed slow, lazy programmers
    The price of reliability is the pursuit of the utmost simplicity. It is a price which the very rich find most hard to pay.
    http://www.lighterra.com/papers/modernmicroprocessors/
    Modern Ram, makes an old overclocker miss BH-5 and the fun it was

  7. #7
    I am Xtreme
    Join Date
    Oct 2004
    Location
    U.S.A.
    Posts
    4,743
    oh look they have something to replace moore's law with.


    Asus Z9PE-D8 WS with 64GB of registered ECC ram.|Dell 30" LCD 3008wfp:7970 video card

    LSI series raid controller
    SSDs: Crucial C300 256GB
    Standard drives: Seagate ST32000641AS & WD 1TB black
    OSes: Linux and Windows x64

  8. #8
    YouTube Addict
    Join Date
    Aug 2005
    Location
    Klaatu barada nikto
    Posts
    17,574
    Quote Originally Posted by safan80 View Post
    oh look they have something to replace moore's law with.
    umm, he stated that the number of transistors on an integrated circuit for minimum component cost doubles every 24 months, which doesn't exactly have anything to do with how complex or simple the cores are, nor with the number of cores involved
    Fast computers breed slow, lazy programmers
    The price of reliability is the pursuit of the utmost simplicity. It is a price which the very rich find most hard to pay.
    http://www.lighterra.com/papers/modernmicroprocessors/
    Modern Ram, makes an old overclocker miss BH-5 and the fun it was

  9. #9
    Xtreme Cruncher
    Join Date
    Jun 2006
    Location
    Iowa State
    Posts
    877
    Maybe they should start making software that uses 4 cores efficiently...then start talking about 100+ cores

    Still very cool though.

  10. #10
    Xtreme Addict
    Join Date
    Jul 2005
    Location
    ATX
    Posts
    1,004
    Whoever figures out how to reverse-hyperthread in any fashion will become a wealthy wo/man.

    Get to work.

  11. #11
    Xtreme Mentor
    Join Date
    Aug 2006
    Location
    HD0
    Posts
    2,646
    Quote Originally Posted by FghtinIrshNvrDi View Post
    did somebody say reverse hyper-threading?

    I read that in there, I swear.

    Ryan
    it's called making it all one core.

    that is reverse hyper threading.

    the problem with that is that you end up with underutilized parts of the core.

    my solution is just to add a bunch of functional units and allow cores to share them.

  12. #12
    YouTube Addict
    Join Date
    Aug 2005
    Location
    Klaatu barada nikto
    Posts
    17,574
    Quote Originally Posted by xlink View Post
    it's called making it all one core.

    that is reverse hyper threading.

    the problem with that is that you end up with underutilized parts of the core.

    my solution is just to add a bunch of functional units and allow cores to share them.
    Let's see: the approach of the IPC wall.
    The Clock speed wall (already hit that)
    The Thread wall (human limits)
    and now the User wall.
    Will people notice a benefit from more?
    Fast computers breed slow, lazy programmers
    The price of reliability is the pursuit of the utmost simplicity. It is a price which the very rich find most hard to pay.
    http://www.lighterra.com/papers/modernmicroprocessors/
    Modern Ram, makes an old overclocker miss BH-5 and the fun it was

  13. #13
    Xtreme Guru adamsleath's Avatar
    Join Date
    Nov 2006
    Location
    Brisbane, Australia
    Posts
    3,803
    ...perhaps a clever programmer could find a way to code that runs a program like a RAID 0...but as i have no idea how to code or how a cpu functions i suppose i'll just receive a flat NO on that one

    just a bunch of gobbledy gook and no results.

    so, will you be able to encode a dvd video in like 1 microsecond???
    i7 3610QM 1.2-3.2GHz

  14. #14
    YouTube Addict
    Join Date
    Aug 2005
    Location
    Klaatu barada nikto
    Posts
    17,574
    Quote Originally Posted by adamsleath View Post
    ...perhaps a clever programmer could find a way to code that runs a program like a RAID 0...but as i have no idea how to code or how a cpu functions i suppose i'll just receive a flat NO on that one

    just a bunch of gobbledy gook and no results.

    so, will you be able to encode a dvd video in like 1 microsecond???
    If you can find a way to parallel process
    A = B + C
    You can be a rich man
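    To spell out why (an illustrative C++ sketch, nothing official): a single A = B + C is one operation with a hard data dependency, so there is nothing to split. A long sum, by contrast, can be split, because addition is associative.

    Code:
    #include <iostream>
    #include <numeric>
    #include <vector>

    int main() {
        // A = B + C is a single dependent operation: nothing to parallelize.
        // Summing a million numbers CAN be split: sum the two halves
        // separately (potentially on two cores), then combine.
        std::vector<double> v(1 << 20, 1.0);
        auto mid = v.begin() + v.size() / 2;

        double left  = std::accumulate(v.begin(), mid, 0.0);   // core 0's half
        double right = std::accumulate(mid, v.end(), 0.0);     // core 1's half

        std::cout << left + right << '\n';   // 1048576, printed as 1.04858e+06
    }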
    Fast computers breed slow, lazy programmers
    The price of reliability is the pursuit of the utmost simplicity. It is a price which the very rich find most hard to pay.
    http://www.lighterra.com/papers/modernmicroprocessors/
    Modern Ram, makes an old overclocker miss BH-5 and the fun it was

  15. #15
    Xtreme Guru adamsleath's Avatar
    Join Date
    Nov 2006
    Location
    Brisbane, Australia
    Posts
    3,803
    http://en.wikipedia.org/wiki/Parallel_programming_model

    Example parallel programming models

    Parallel programming models include:

    * POSIX Threads
    * PVM
    * MPI
    * OpenMP
    * TBB
    * Charm++
    * Cilk
    * Global Arrays
    * HPF
    * SHMEM
    * Stream processing
    * Pipelining
    * Partitioned global address space: UPC, Co-array Fortran, Titanium
    * Occam (programming language)
    * Ease (programming language)
    * Erlang (programming language)
    * Linda coordination language

    more languages than i can poke a stick at.
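    for illustration, the simplest shared-memory model on that list (plain threads) boils down to something like this minimal C++ sketch (example only, not from the wiki page):

    Code:
    #include <cstddef>
    #include <iostream>
    #include <thread>
    #include <vector>

    // Minimal shared-memory threading sketch (the POSIX Threads / std::thread
    // model from the list above): four threads each sum their own slice.
    int main() {
        const std::size_t kPerThread = 250;
        std::vector<int> data(4 * kPerThread, 1);
        long sums[4] = {0, 0, 0, 0};

        std::vector<std::thread> workers;
        for (std::size_t t = 0; t < 4; ++t)
            workers.emplace_back([&, t] {
                for (std::size_t i = t * kPerThread; i < (t + 1) * kPerThread; ++i)
                    sums[t] += data[i];      // each thread writes only its own slot
            });

        for (auto& w : workers) w.join();    // wait for all four slices

        std::cout << sums[0] + sums[1] + sums[2] + sums[3] << '\n';   // 1000
    }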

    anyway all i'm really interested in is PC games...pathetic really.
    Last edited by adamsleath; 06-13-2007 at 12:20 AM.
    i7 3610QM 1.2-3.2GHz

  16. #16
    YouTube Addict
    Join Date
    Aug 2005
    Location
    Klaatu barada nikto
    Posts
    17,574
    You don't seem to be able to understand that there is an absolute limit to how parallel you can thread something
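    That limit even has a name: Amdahl's law. If only a fraction p of a program can run in parallel, then n cores give at most 1 / ((1 - p) + p/n) speedup. A quick illustrative calculation:

    Code:
    #include <cstdio>
    #include <initializer_list>

    // Amdahl's law: with parallel fraction p, the best possible speedup on
    // n cores is 1 / ((1 - p) + p / n). The serial part caps the speedup no
    // matter how many cores are added.
    int main() {
        const double p = 0.90;               // say 90% parallel, 10% serial
        for (int n : {2, 4, 8, 48}) {
            double speedup = 1.0 / ((1.0 - p) + p / n);
            std::printf("%2d cores -> %4.2fx\n", n, speedup);
        }
        // even with infinite cores the limit is 1 / (1 - p) = 10x for p = 0.90
    }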
    Fast computers breed slow, lazy programmers
    The price of reliability is the pursuit of the utmost simplicity. It is a price which the very rich find most hard to pay.
    http://www.lighterra.com/papers/modernmicroprocessors/
    Modern Ram, makes an old overclocker miss BH-5 and the fun it was

  17. #17
    Xtreme Guru adamsleath's Avatar
    Join Date
    Nov 2006
    Location
    Brisbane, Australia
    Posts
    3,803
    depends what the something is; what sort of algorithms.

    like i said i have no idea of programming limitations.

    but there are many examples of parallel compute already crunching away on highly repetitive sequences i spose..like dna / protein stuff?

    the more complex the sequence of algorithms the harder it is to split up and recompile?

    absolutely limited by what? the language?


    PC hardware might not be capable of handling parallel processing, in which case it's a total waste of time....unless you want to encode dvd videos in 1 microsecond

    High-performance computing (HPC) clusters

    High-performance computing (HPC) clusters are implemented primarily to provide increased performance by splitting a computational task across many different nodes in the cluster, and are most commonly used in scientific computing. Such clusters commonly run custom programs which have been designed to exploit the parallelism available on HPC clusters. HPCs are optimized for workloads which require jobs or processes happening on the separate cluster computer nodes to communicate actively during the computation. These include computations where intermediate results from one node's calculations will affect future calculations on other nodes.

    One of the most popular HPC implementations is a cluster with nodes running Linux as the OS and free software to implement the parallelism. This configuration is often referred to as a Beowulf cluster.

    Microsoft offers Windows Compute Cluster Server as a high-performance computing platform to compete with Linux.[1]

    Many software programs running on High-performance computing (HPC) clusters use libraries such as MPI which are specially designed for writing scientific applications for HPC computers.
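    for a flavor of what MPI code looks like, here is a minimal illustrative sketch (example only, not from the article): every node computes a partial result and MPI_Reduce combines them on one node.

    Code:
    #include <mpi.h>
    #include <cstdio>

    // Minimal MPI sketch: each rank (cluster node/process) computes a partial
    // value and MPI_Reduce sums them onto rank 0 -- the message-passing style
    // the excerpt above describes for HPC clusters.
    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);

        int rank = 0, size = 1;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int partial = rank + 1;              // stand-in for real per-node work
        int total   = 0;
        MPI_Reduce(&partial, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            std::printf("sum over %d nodes = %d\n", size, total);

        MPI_Finalize();
    }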
    Last edited by adamsleath; 06-13-2007 at 12:29 AM.
    i7 3610QM 1.2-3.2GHz

  18. #18
    Xtreme Guru adamsleath's Avatar
    Join Date
    Nov 2006
    Location
    Brisbane, Australia
    Posts
    3,803
    tell me what the absolute limit is, and then the hardware manufacturers can downgrade to that level.

    or we could have 10 people all sharing one Multi-multicore PC

    i do not understand, and yet no-one can explain it to me.

    Parallel processing is used right now...but not in the PC?
    Last edited by adamsleath; 06-13-2007 at 12:39 AM.
    i7 3610QM 1.2-3.2GHz

  19. #19
    YouTube Addict
    Join Date
    Aug 2005
    Location
    Klaatu barada nikto
    Posts
    17,574
    Quote Originally Posted by adamsleath View Post
    tell me what the absolute limit is, and then the hardware manufacturers can downgrade to that level.
    That wall depends on several factors
    1) the application
    2) the Number of Users
    3) the Number of applications running concurrently
    Fast computers breed slow, lazy programmers
    The price of reliability is the pursuit of the utmost simplicity. It is a price which the very rich find most hard to pay.
    http://www.lighterra.com/papers/modernmicroprocessors/
    Modern Ram, makes an old overclocker miss BH-5 and the fun it was

  20. #20
    Xtreme Guru adamsleath's Avatar
    Join Date
    Nov 2006
    Location
    Brisbane, Australia
    Posts
    3,803
    so a 48 core pc can be a Linux cluster?

    anyway any ideas on what a 48 core pc could be used for?

    3/ the number of "single threaded" applications running concurrently?

    is there such a thing as a "multithreaded" application? - well actually i know there are.

    and is there a way to further split up the threads into smaller parts to be offloaded to the various cores and recombined at the other end, and what is the "overhead"?

    it seems to me that some applications are coded in such a way that it makes parallelism difficult or impossible, while others can be split up and recombined.

    it seems that multithreaded applications are not necessarily parallel compute, but different types of algorithms being processed simultaneously with one another.

    can you clarify/correct/illuminate?
    Last edited by adamsleath; 06-13-2007 at 12:53 AM.
    i7 3610QM 1.2-3.2GHz

  21. #21
    I am Xtreme
    Join Date
    Oct 2004
    Location
    U.S.A.
    Posts
    4,743
    Quote Originally Posted by nn_step View Post
    umm, he stated that the number of transistors on an integrated circuit for minimum component cost doubles every 24 months, which doesn't exactly have anything to do with how complex or simple the cores are, nor with the number of cores involved
    the point I was trying to make is that they're no longer trying to just double the transistor count in a single core or cpu. They're trying to make a lot of cores each with their own thing to do and that's a good thing.


    Asus Z9PE-D8 WS with 64GB of registered ECC ram.|Dell 30" LCD 3008wfp:7970 video card

    LSI series raid controller
    SSDs: Crucial C300 256GB
    Standard drives: Seagate ST32000641AS & WD 1TB black
    OSes: Linux and Windows x64

  22. #22
    Xtreme Addict
    Join Date
    Nov 2006
    Posts
    1,402
    AMD, Intel, please don't just make more cores, make better cores

  23. #23
    YouTube Addict
    Join Date
    Aug 2005
    Location
    Klaatu barada nikto
    Posts
    17,574
    Quote Originally Posted by madcho View Post
    AMD, Intel, please don't just make more cores, make better cores
    I'm voting for cleaner Processor design and cleaner/leaner software
    Fast computers breed slow, lazy programmers
    The price of reliability is the pursuit of the utmost simplicity. It is a price which the very rich find most hard to pay.
    http://www.lighterra.com/papers/modernmicroprocessors/
    Modern Ram, makes an old overclocker miss BH-5 and the fun it was

  24. #24
    YouTube Addict
    Join Date
    Aug 2005
    Location
    Klaatu barada nikto
    Posts
    17,574
    Quote Originally Posted by LOE View Post
    I think its about time people write software that can write software

    there are many things that computers do a lot better than people, simply cause our brains are working in a totally different way

    it is obvious that a lot of parallel cores are better than a single core, no matter how big it is, and remember, big means complex, and that's a bad thing

    it is just too hard for a human brain to come up with making general-purpose tasks work in parallel. I personally welcome multicores, cause most of the software I use can take advantage of 128+ cores, but those are not apps that an average person uses in his everyday life
    They have already done that but it doesn't perform remotely as well as software written by talentless morons
    Fast computers breed slow, lazy programmers
    The price of reliability is the pursuit of the utmost simplicity. It is a price which the very rich find most hard to pay.
    http://www.lighterra.com/papers/modernmicroprocessors/
    Modern Ram, makes an old overclocker miss BH-5 and the fun it was

  25. #25
    teh 0wnage
    Join Date
    Dec 2004
    Posts
    633
    Obviously nn has skipped this slide...


    They are developing it to make exploiting parallelism easier
    Last edited by P_1; 06-13-2007 at 05:46 AM.
