
Thread: Intel Larrabee Roadmap 48 cores in 2010

  1. #1
    Registered User
    Join Date
    Jun 2006
    Posts
    61

    Intel Larrabee Roadmap 48 cores in 2010

    http://pc.watch.impress.co.jp/docs/2.../kaigai364.htm

    Development is handled by the CPU architecture team

    Larrabee, which was long assumed to be a discrete GPU, is in fact not a GPU at all. It is a many-core CPU specialized for stream computing, which processes large amounts of data in parallel. Gelsinger puts it this way:

    "Larrabee is a highly parallel machine. We load it with a very large number of cores; it will become our first many-core product."

    Larrabee sticks to the x86 instruction set architecture

    Larrabee's biggest distinguishing point is that it is a highly parallel processor built on an extended IA (x86) instruction set architecture. That sets it far apart from GPUs and other stream processors, which use their own proprietary instruction sets.

    "The Larrabee core is IA instruction set compatible, which we consider a very important feature. On top of that, floating-point instructions are added to the instruction set, an extension specialized for highly parallel workloads. In addition, cache coherency is maintained across the cores, which share cache. That cache coherency is very important when you think about programmability. Special-purpose units and I/O are also on board.

    "Larrabee is by no means a GPGPU (general-purpose GPU); it does not live in the traditional graphics-pipeline space. It is a general-purpose processor, in other words a processor aimed at uses where IA programmability matters. At the same time, thanks to the instruction set extensions, it can answer specialized workloads." (Gelsinger)

    Unlike a GPU, then, Larrabee is not a product that reworks a graphics pipeline for general-purpose use; it takes a more broadly applicable approach. That is why it keeps backward compatibility with the IA (x86) instruction set architecture: starting from a general-purpose IA core, it is a processor whose microarchitecture has been extended toward stream-style floating-point computing.
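    To make this concrete, here is an illustrative sketch (ours, not the article's) of the kind of data-parallel floating-point stream workload being described: a SAXPY loop in C++ whose iterations are all independent, so the work maps naturally onto many cores or wide vector units.

    Code:
    #include <cstddef>
    #include <vector>

    // Illustrative SAXPY-style stream kernel: y = a*x + y. No iteration
    // depends on any other, so the loop can be split across many cores or
    // packed into wide SIMD floating-point instructions.
    void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
        for (std::size_t i = 0; i < x.size(); ++i)
            y[i] = a * x[i] + y[i];          // no cross-iteration dependency
    }

    int main() {
        std::vector<float> x(1024, 1.0f), y(1024, 2.0f);
        saxpy(3.0f, x, y);                   // every y[i] becomes 5.0f
    }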

    In fact, Intel's graphics core team also has a discrete GPU plan. That design differs completely from Larrabee in both architecture and implementation; it is said to be a discrete version of Intel graphics. Since a discrete GPU is easy to derive from the graphics-integrated chipsets of the CSI generation, this is a natural progression. Intel has been pursuing the project for some time, but since no concrete product roadmap has surfaced, there is also a chance it will be dropped.

    A parallel processor that takes a different approach from the GPU

    Larrabee's performance at graphics processing is unknown. Because graphics processing is steadily shifting toward the efficient execution of shader programs, there is a good chance a Larrabee-type architecture will prove advantageous there. The graphics pipeline, however, also contains stages where fully fixed-function units are more effective, such as rasterization, filtering, and raster operations, along with stages where semi-fixed units work well. Running all of these on general-purpose processors is mostly wasteful in performance per watt.

    For that reason, Larrabee's graphics efficiency depends on how much GPU-style hardware it carries. Within its circuit budget, there is also a chance it has dedicated hardware for some small units.

    What is clear is that the current Larrabee is not something focused on graphics; if anything, it is an architecture aimed at non-graphics workloads.



    ==================================================



    http://www.tgdaily.com/content/view/32447/113/

    Intel aims to take the pain out of programming future multi-core processors

    Santa Clara (CA) – The switch from single-threaded to multi-threaded applications to take advantage of the capabilities of multi-core processors is taking much longer than initially expected. Now we see concepts of much more advanced multi-cores such as heterogeneous processors surfacing – which may force developers to rethink how to program applications again. Intel, however, says that programming these new processors will require a “minimal” learning curve.

    As promising as future microprocessors with perhaps dozens of cores sound, there appears to be a huge challenge for developers to actually take advantage of the capabilities of these CPUs. Both AMD and Intel believe that we will be using highly integrated processors, combining traditional CPUs with graphics processors, general-purpose graphics processors and other types of accelerators that may open up a whole new world of performance for the PC on your desk.

    AMD recently told us that it will take several years for programmers to exploit those new features. While Fusion - a processor that combines a regular CPU and a graphics core - is expected to launch late in 2009 or early in 2010, users aren't likely to see functionality that differs from a processor with an attached integrated graphics chipset. AMD believes that it will take about two years, or until 2011, before the acceleration features of a general-purpose GPU are exploited by software developers.

    Intel told us today that the company will be taking an approach that will make it relatively easy for developers to take advantage of this next generation of processors. The company aims to “hide” the complexity of a heterogeneous processor and provide an IA-like look and feel to the environment. Accelerators that are integrated within the chip are treated as processor functional units that can be addressed with ISA extensions and a runtime library. Intel compares this approach with the way multimedia extensions (MMX) were integrated into Intel’s instruction set back in 1996.
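    As a purely hypothetical C++ sketch of that idea (the names below are invented for illustration and are not a real Intel API), the application calls one ordinary function and a runtime library decides whether to route the work to an accelerator unit or to a plain-IA fallback:

    Code:
    #include <cmath>
    #include <cstddef>

    // Hypothetical sketch only: has_accelerator() stands in for a CPUID-style
    // probe, and the "accelerated" branch for a runtime-library dispatch to
    // an on-chip accelerator functional unit.
    bool has_accelerator() { return false; }         // probe stub for this sketch

    void transform_scalar(float* data, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)          // plain-IA fallback path
            data[i] = std::sqrt(data[i]);
    }

    void transform(float* data, std::size_t n) {
        if (has_accelerator()) {
            // a real runtime library would dispatch to the accelerator here
        }
        transform_scalar(data, n);                   // this sketch always runs on IA
    }

    int main() {
        float v[4] = {1.0f, 4.0f, 9.0f, 16.0f};
        transform(v, 4);                             // v becomes {1, 2, 3, 4}
    }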

    As a result, Intel hopes that developers will be able to understand these new processors quickly and develop applications almost immediately. “It is a very small learning curve,” a representative told us today. “We are talking about weeks, rather than years.”

    Nvidia, which is also intensifying its efforts in the massively parallel computing space, is pursuing a similar idea with its CUDA architecture, which allows developers to process certain applications - or portions of them - through a graphics card: Instead of requiring a whole new programming model, CUDA can be used through a C++-based model and a few extensions that help programmers access the horsepower of an 8-series GeForce GPU.
    Last edited by coffeetime; 06-13-2007 at 06:55 PM.

  2. #2
    Xtreme News Addict
    Join Date
    May 2005
    Location
    Winnipeg, Manitoba, Canada
    Posts
    2,065
    "There's no chance that the iPhone is going to get any significant market share. No chance." -- Microsoft CEO Steve Ballmer

  3. #3
    Xtreme Enthusiast
    Join Date
    Sep 2006
    Posts
    881
    That would be freaking cool, if software can utilize them all.

  4. #4
    Xtreme Addict
    Join Date
    Apr 2006
    Location
    Pleasant Hill, MO
    Posts
    1,211
    did somebody say reverse hyper-threading?

    I read that in there, I swear.

    Ryan
    "Political Correctness is a doctrine fostered by a delusional, illogical, liberal minority, and rabidly promoted by an unscrupulous mainstream media, which holds forth the proposition that it is entirely possible to pick up a turd by the clean end."

    Abit IP35 Pro
    Intel Core 2 Quad 6600 @ 3200 w/ Tuniq Tower
    2x2gb A-Data DDR2 800
    AMD/ATi HD 4870

  5. #5
    Xtreme Guru adamsleath's Avatar
    Join Date
    Nov 2006
    Location
    Brisbane, Australia
    Posts
    3,803
    um



    5-10 years from now (Intel Developer Forum pic)

    the first pic in this thread is bogus imo.
    but interesting pics nonetheless...who will use all those cores simultaneously, and how...?

    no arguments from me that multi-multi cores is the way of the future....

    i'm visualising 10 people at a LAN all using one server box with no separate computers needed
    Last edited by adamsleath; 06-12-2007 at 08:15 PM.
    i7 3610QM 1.2-3.2GHz

  6. #6
    YouTube Addict
    Join Date
    Aug 2005
    Location
    Klaatu barada nikto
    Posts
    17,574
    I wonder how long it'll take them to realize how hard it is to make use of all those cores
    Fast computers breed slow, lazy programmers
    The price of reliability is the pursuit of the utmost simplicity. It is a price which the very rich find most hard to pay.
    http://www.lighterra.com/papers/modernmicroprocessors/
    Modern Ram, makes an old overclocker miss BH-5 and the fun it was

  7. #7
    I am Xtreme
    Join Date
    Oct 2004
    Location
    U.S.A.
    Posts
    4,743
    oh look they have something to replace moore's law with.


    Asus Z9PE-D8 WS with 64GB of registered ECC ram.|Dell 30" LCD 3008wfp:7970 video card

    LSI series raid controller
    SSDs: Crucial C300 256GB
    Standard drives: Seagate ST32000641AS & WD 1TB black
    OSes: Linux and Windows x64

  8. #8
    YouTube Addict
    Join Date
    Aug 2005
    Location
    Klaatu barada nikto
    Posts
    17,574
    Quote Originally Posted by safan80 View Post
    oh look they have something to replace moore's law with.
    umm, he stated that the number of transistors on an integrated circuit for minimum component cost doubles every 24 months, which doesn't exactly have anything to do with how complex or simple the cores are, nor with the number of cores involved
    Fast computers breed slow, lazy programmers
    The price of reliability is the pursuit of the utmost simplicity. It is a price which the very rich find most hard to pay.
    http://www.lighterra.com/papers/modernmicroprocessors/
    Modern Ram, makes an old overclocker miss BH-5 and the fun it was

  9. #9
    Xtreme Cruncher
    Join Date
    Jun 2006
    Location
    Iowa State
    Posts
    877
    Maybe they should start making software that uses 4 cores efficiently...then start talking about 100+ cores

    Still very cool though.

  10. #10
    Xtreme Addict
    Join Date
    Jul 2005
    Location
    ATX
    Posts
    1,004
    Whoever figures out how to reverse-hyperthread in any fashion will become a wealthy wo/man.

    Get to work.

  11. #11
    Xtreme Mentor
    Join Date
    Aug 2006
    Location
    HD0
    Posts
    2,646
    Quote Originally Posted by FghtinIrshNvrDi View Post
    did somebody say reverse hyper-threading?

    I read that in there, I swear.

    Ryan
    it's called making it all one core.

    that is reverse hyper threading.

    the problem with that is that you end up with underutilized parts of the core.

    my solution is just to add a bunch of functional units and allow cores to share them.

  12. #12
    YouTube Addict
    Join Date
    Aug 2005
    Location
    Klaatu barada nikto
    Posts
    17,574
    Quote Originally Posted by xlink View Post
    it's called making it all one core.

    that is reverse hyper threading.

    the problem with that is that you end up with underutilized parts of the core.

    my solution is just to add a bunch of functional units and allow cores to share them.
    Let's see: the approach of the IPC wall.
    The Clock speed wall (already hit that)
    The Thread wall (human limits)
    and now the User wall.
    Will people notice a benefit from more?
    Fast computers breed slow, lazy programmers
    The price of reliability is the pursuit of the utmost simplicity. It is a price which the very rich find most hard to pay.
    http://www.lighterra.com/papers/modernmicroprocessors/
    Modern Ram, makes an old overclocker miss BH-5 and the fun it was

  13. #13
    Xtreme Guru adamsleath's Avatar
    Join Date
    Nov 2006
    Location
    Brisbane, Australia
    Posts
    3,803
    ...perhaps a clever programmer could find a way to code that runs a program like a RAID 0...but as i have no idea how to code or how a cpu functions i suppose i'll just receive a flat NO on that one

    just a bunch of gobbledy gook and no results.

    so, will you be able to encode a dvd video in like 1 microsecond???
    i7 3610QM 1.2-3.2GHz

  14. #14
    YouTube Addict
    Join Date
    Aug 2005
    Location
    Klaatu barada nikto
    Posts
    17,574
    Quote Originally Posted by adamsleath View Post
    ...perhaps a clever programmer could find a way to code that runs a program like a RAID 0...but as i have no idea how to code or how a cpu functions i suppose i'll just receive a flat NO on that one

    just a bunch of gobbledy gook and no results.

    so, will you be able to encode a dvd video in like 1 microsecond???
    If you can find a way to parallel process
    A = B + C
    You can be a rich man
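    To spell out why (an illustrative C++ sketch, nothing official): a single A = B + C is one operation with a hard data dependency, so there is nothing to split. A long sum, by contrast, can be split, because addition is associative.

    Code:
    #include <iostream>
    #include <numeric>
    #include <vector>

    int main() {
        // A = B + C is a single dependent operation: nothing to parallelize.
        // Summing a million numbers CAN be split: sum the two halves
        // separately (potentially on two cores), then combine.
        std::vector<double> v(1 << 20, 1.0);
        auto mid = v.begin() + v.size() / 2;

        double left  = std::accumulate(v.begin(), mid, 0.0);   // core 0's half
        double right = std::accumulate(mid, v.end(), 0.0);     // core 1's half

        std::cout << left + right << '\n';   // 1048576, printed as 1.04858e+06
    }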
    Fast computers breed slow, lazy programmers
    The price of reliability is the pursuit of the utmost simplicity. It is a price which the very rich find most hard to pay.
    http://www.lighterra.com/papers/modernmicroprocessors/
    Modern Ram, makes an old overclocker miss BH-5 and the fun it was

  15. #15
    Xtreme Guru adamsleath's Avatar
    Join Date
    Nov 2006
    Location
    Brisbane, Australia
    Posts
    3,803
    http://en.wikipedia.org/wiki/Parallel_programming_model

    Example parallel programming models

    Parallel programming models include:

    * POSIX Threads
    * PVM
    * MPI
    * OpenMP
    * TBB
    * Charm++
    * Cilk
    * Global Arrays
    * HPF
    * SHMEM
    * Stream processing
    * Pipelining
    * Partitioned global address space: UPC, Co-array Fortran, Titanium
    * Occam (programming language)
    * Ease (programming language)
    * Erlang (programming language)
    * Linda coordination language

    more languages than i can poke a stick at.
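    for illustration, the simplest shared-memory model on that list (plain threads) boils down to something like this minimal C++ sketch (example only, not from the wiki page):

    Code:
    #include <cstddef>
    #include <iostream>
    #include <thread>
    #include <vector>

    // Minimal shared-memory threading sketch (the POSIX Threads / std::thread
    // model from the list above): four threads each sum their own slice.
    int main() {
        const std::size_t kPerThread = 250;
        std::vector<int> data(4 * kPerThread, 1);
        long sums[4] = {0, 0, 0, 0};

        std::vector<std::thread> workers;
        for (std::size_t t = 0; t < 4; ++t)
            workers.emplace_back([&, t] {
                for (std::size_t i = t * kPerThread; i < (t + 1) * kPerThread; ++i)
                    sums[t] += data[i];      // each thread writes only its own slot
            });

        for (auto& w : workers) w.join();    // wait for all four slices

        std::cout << sums[0] + sums[1] + sums[2] + sums[3] << '\n';   // 1000
    }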

    anyway all i'm really interested in is PC games...pathetic really.
    Last edited by adamsleath; 06-13-2007 at 12:20 AM.
    i7 3610QM 1.2-3.2GHz

  16. #16
    YouTube Addict
    Join Date
    Aug 2005
    Location
    Klaatu barada nikto
    Posts
    17,574
    You don't seem to be able to understand that there is an absolute limit to how parallel you can thread something
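    That limit even has a name: Amdahl's law. If only a fraction p of a program can run in parallel, then n cores give at most 1 / ((1 - p) + p/n) speedup. A quick illustrative calculation:

    Code:
    #include <cstdio>
    #include <initializer_list>

    // Amdahl's law: with parallel fraction p, the best possible speedup on
    // n cores is 1 / ((1 - p) + p / n). The serial part caps the speedup no
    // matter how many cores are added.
    int main() {
        const double p = 0.90;               // say 90% parallel, 10% serial
        for (int n : {2, 4, 8, 48}) {
            double speedup = 1.0 / ((1.0 - p) + p / n);
            std::printf("%2d cores -> %4.2fx\n", n, speedup);
        }
        // even with infinite cores the limit is 1 / (1 - p) = 10x for p = 0.90
    }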
    Fast computers breed slow, lazy programmers
    The price of reliability is the pursuit of the utmost simplicity. It is a price which the very rich find most hard to pay.
    http://www.lighterra.com/papers/modernmicroprocessors/
    Modern Ram, makes an old overclocker miss BH-5 and the fun it was

  17. #17
    Xtreme Guru adamsleath's Avatar
    Join Date
    Nov 2006
    Location
    Brisbane, Australia
    Posts
    3,803
    depends what the something is; what sort of algorithms.

    like i said i have no idea of programming limitations.

    but there are many examples of parallel compute already crunching away on highly repetitive sequences i spose..like dna / protein stuff?

    the more complex the sequence of algorithms the harder it is to split up and recompile?

    absolutely limited by what? the language?


    PC hardware might not be capable of handling parallel processing, in which case it's a total waste of time....unless you want to encode dvd videos in 1 microsecond

    High-performance computing (HPC) clusters

    High-performance computing (HPC) clusters are implemented primarily to provide increased performance by splitting a computational task across many different nodes in the cluster, and are most commonly used in scientific computing. Such clusters commonly run custom programs which have been designed to exploit the parallelism available on HPC clusters. HPCs are optimized for workloads which require jobs or processes happening on the separate cluster computer nodes to communicate actively during the computation. These include computations where intermediate results from one node's calculations will affect future calculations on other nodes.

    One of the most popular HPC implementations is a cluster with nodes running Linux as the OS and free software to implement the parallelism. This configuration is often referred to as a Beowulf cluster.

    Microsoft offers Windows Compute Cluster Server as a high-performance computing platform to compete with Linux.[1]

    Many software programs running on High-performance computing (HPC) clusters use libraries such as MPI which are specially designed for writing scientific applications for HPC computers.
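    for a flavor of what MPI code looks like, here is a minimal illustrative sketch (example only, not from the article): every node computes a partial result and MPI_Reduce combines them on one node.

    Code:
    #include <mpi.h>
    #include <cstdio>

    // Minimal MPI sketch: each rank (cluster node/process) computes a partial
    // value and MPI_Reduce sums them onto rank 0 -- the message-passing style
    // the excerpt above describes for HPC clusters.
    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);

        int rank = 0, size = 1;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int partial = rank + 1;              // stand-in for real per-node work
        int total   = 0;
        MPI_Reduce(&partial, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            std::printf("sum over %d nodes = %d\n", size, total);

        MPI_Finalize();
    }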
    Last edited by adamsleath; 06-13-2007 at 12:29 AM.
    i7 3610QM 1.2-3.2GHz

  18. #18
    Xtreme Guru adamsleath's Avatar
    Join Date
    Nov 2006
    Location
    Brisbane, Australia
    Posts
    3,803
    tell me what the absolute limit is, and then the hardware manufacturers can downgrade to that level.

    or we could have 10 people all sharing one Multi-multicore PC

    i do not understand, and yet no-one can explain it to me.

    Parallel processing is used right now...but not in the PC?
    Last edited by adamsleath; 06-13-2007 at 12:39 AM.
    i7 3610QM 1.2-3.2GHz

  19. #19
    YouTube Addict
    Join Date
    Aug 2005
    Location
    Klaatu barada nikto
    Posts
    17,574
    Quote Originally Posted by adamsleath View Post
    tell me what the absolute limit is, and then the hardware manufacturers can downgrade to that level.
    That wall depends on several factors
    1) the application
    2) the Number of Users
    3) the Number of applications running concurrently
    Fast computers breed slow, lazy programmers
    The price of reliability is the pursuit of the utmost simplicity. It is a price which the very rich find most hard to pay.
    http://www.lighterra.com/papers/modernmicroprocessors/
    Modern Ram, makes an old overclocker miss BH-5 and the fun it was

  20. #20
    Xtreme Guru adamsleath's Avatar
    Join Date
    Nov 2006
    Location
    Brisbane, Australia
    Posts
    3,803
    so a 48 core pc can be a Linux cluster?

    anyway any ideas on what a 48 core pc could be used for?

    3/ the number of "single threaded" applications running concurrently?

    is there such a thing as a "multithreaded" application? - well actually i know there are.

    and is there a way to further split up the threads into smaller parts to be offloaded to the various cores and recombined at the other end, and what is the "overhead"?

    it seems to me that some applications are coded in such a way that it makes parallelism difficult or impossible, while others can be split up and recombined.

    it seems that multithreaded applications are not necessarily parallel compute, but different types of algorithms being processed simultaneously with one another.

    can you clarify/correct/illuminate?
    Last edited by adamsleath; 06-13-2007 at 12:53 AM.
    i7 3610QM 1.2-3.2GHz

  21. #21
    I am Xtreme
    Join Date
    Oct 2004
    Location
    U.S.A.
    Posts
    4,743
    Quote Originally Posted by nn_step View Post
    umm, he stated that the number of transistors on an integrated circuit for minimum component cost doubles every 24 months, which doesn't exactly have anything to do with how complex or simple the cores are, nor with the number of cores involved
    the point I was trying to make is that they're no longer trying to just double the transistor count in a single core or cpu. They're trying to make a lot of cores each with their own thing to do and that's a good thing.


    Asus Z9PE-D8 WS with 64GB of registered ECC ram.|Dell 30" LCD 3008wfp:7970 video card

    LSI series raid controller
    SSDs: Crucial C300 256GB
    Standard drives: Seagate ST32000641AS & WD 1TB black
    OSes: Linux and Windows x64

  22. #22
    Xtreme Addict
    Join Date
    Nov 2006
    Posts
    1,402
    AMD, Intel, please don't just make more cores, make better cores

  23. #23
    YouTube Addict
    Join Date
    Aug 2005
    Location
    Klaatu barada nikto
    Posts
    17,574
    Quote Originally Posted by madcho View Post
    AMD, Intel, please don't just make more cores, make better cores
    I'm voting for cleaner Processor design and cleaner/leaner software
    Fast computers breed slow, lazy programmers
    The price of reliability is the pursuit of the utmost simplicity. It is a price which the very rich find most hard to pay.
    http://www.lighterra.com/papers/modernmicroprocessors/
    Modern Ram, makes an old overclocker miss BH-5 and the fun it was

  24. #24
    YouTube Addict
    Join Date
    Aug 2005
    Location
    Klaatu barada nikto
    Posts
    17,574
    Quote Originally Posted by LOE View Post
    I think its about time people write software that can write software

    there are many things that computers do a lot better than people, simply cause our brains are working in a totally different way

    it is obvious that a lot of parallel cores are better than a single core, no matter how big it is, and remember, big means complex, and that's a bad thing

    it is just too hard for a human brain to come up with making general-purpose tasks work in parallel. I personally welcome multicores, cause most of the software I use can take advantage of 128+ cores, but those are not apps that an average person uses in his everyday life
    They have already done that but it doesn't perform remotely as well as software written by talentless morons
    Fast computers breed slow, lazy programmers
    The price of reliability is the pursuit of the utmost simplicity. It is a price which the very rich find most hard to pay.
    http://www.lighterra.com/papers/modernmicroprocessors/
    Modern Ram, makes an old overclocker miss BH-5 and the fun it was

  25. #25
    teh 0wnage
    Join Date
    Dec 2004
    Posts
    633
    Obviously nn has skipped this slide...


    They are developing it to make exploiting parallelism easier
    Last edited by P_1; 06-13-2007 at 05:46 AM.
