
Thread: K10 Folding@Home SMP Performance?

  1. #26
    Xtreme Cruncher
    Join Date
    Jul 2006
    Posts
    1,374
    I wonder sometimes what the engineers were thinking when they designed this chip. This isn't derogatory either. I'm just wondering if we have another Cell processor on our hands: a chip that is designed for a purpose outside the mainstream, such that only special programming will help it realize its true potential. It is like ATI's latest GPU. On paper, it looked to be very well engineered. It works, and works well for the most part, but it seems as though there is a disconnect between the engineers and the programmers. Dunno, just some pondering on my part.

    OT: I hope that F@H can be optimized to make the most of this processor. Perhaps the difference in times can be attributed to the L2 size? Again, if the L3 doesn't do the job, then it's going to hurt the performance of the processor. If I'm correct, then Phenom does the opposite of Intel: pushing the data into L2 from the L3 in a more organized fashion, while Intel pulls in from a larger L2 pool. Unless newer chipsets can improve the efficiency of the L3 somehow, Core will outperform Phenom in F@H...

  2. #27
    Xtreme Member
    Join Date
    Apr 2007
    Posts
    118
    sp33: I don't know a lot about the F@H program itself. Could you please tell me if there is a "standard test molecule" in the program, which would mean that you are both running the same system? Or is there no such thing, and you both exchanged job IDs to run the same job? Or, the worst (and then useless) scenario, are you both using different jobs? :P

  3. #28
    Xtreme Enthusiast
    Join Date
    Jun 2007
    Posts
    546
    Quote Originally Posted by JohannesRS View Post
    sp33: I don't know a lot about the F@H program itself. Could you please tell me if there is a "standard test molecule" in the program, which would mean that you are both running the same system? Or is there no such thing, and you both exchanged job IDs to run the same job? Or, the worst (and then useless) scenario, are you both using different jobs? :P
    To the best of my knowledge, you mainly have to focus on the project #. The run and gen numbers are, for the most part, irrelevant when doing a comparison. I'm assuming this because I have around 10 P2653 WUs that all ran in the same time, even though their run and gen numbers differ.

    Quote Originally Posted by xVeinx View Post
    I wonder sometimes what the engineers were thinking when they designed this chip. This isn't derogatory either. I'm just wondering if we have another Cell processor on our hands: a chip that is designed for a purpose outside the mainstream, such that only special programming will help it realize its true potential. It is like ATI's latest GPU. On paper, it looked to be very well engineered. It works, and works well for the most part, but it seems as though there is a disconnect between the engineers and the programmers. Dunno, just some pondering on my part.

    OT: I hope that F@H can be optimized to make the most of this processor. Perhaps the difference in times can be attributed to the L2 size? Again, if the L3 doesn't do the job, then it's going to hurt the performance of the processor. If I'm correct, then Phenom does the opposite of Intel: pushing the data into L2 from the L3 in a more organized fashion, while Intel pulls in from a larger L2 pool. Unless newer chipsets can improve the efficiency of the L3 somehow, Core will outperform Phenom in F@H...
    I know one reason why the K10 architecture isn't performing as well in Folding@Home: the cache. F@H loves a big L2 cache; most of the current Intel chips have at least 4MB, and the Kentsfield has 8MB. F@H is really data intensive and the ability to fetch more and more instructions is key. However, it will be interesting to see if the Stanford team decides to exploit the K10 L3 and integrated memory controller and optimize for that architecture.

    The other reason could be that Kyosen's results were run on single-channel RAM. A while back, someone discovered that dual-channel vs single-channel RAM has a big impact on performance. I think there were figures as high as a 50% difference just between single and dual channel.

    It'll come down to this:

    Big L2 Cache vs IMC + L3 Cache
    Last edited by Start; 11-06-2007 at 10:16 PM.

  4. #29
    Xtreme Member
    Join Date
    Apr 2007
    Posts
    118
    thanks for the info, sp33.

    Moreover, I would find it strange for F@H to be so cache sensitive. The main original core (gromacs) is not. But it's possible that this happens, since they modified the program a bit (a lot) and recompiled it with Intel compilers (which I guess would make executable files that are really cache hungry). The original gromacs is compiled with gcc, but to make sure it exploits the best of each architecture, the most important parts of it were written entirely in assembly (to make proper use even of SSE2 instructions) for the most used platforms. Of course, that includes both Intel and AMD 64-bit processors.

    Btw, if someone has a Linux Kentsfield, and one of our friends with Barcelonas also has Linux installed, we could give it a try. At least a single-core run (it scales pretty well, but having to install the MPI packages is not fun AT ALL) would be easily done.

    Do we have the two volunteers here?

    P.S.: is dual-channel still a problem with Barcelonas? Even on the L1N? bastards! :p

  5. #30
    Team Japan
    Join Date
    Nov 2003
    Location
    Tokyo, Japan
    Posts
    345
    Quote Originally Posted by SP33DFR34K View Post
    The other reason could be that Kyosen's results were run on single-channel RAM. A while back, someone discovered that dual-channel vs single-channel RAM has a big impact on performance. I think there were figures as high as a 50% difference just between single and dual channel.
    Though CPU-Z shows single channel,
    it also shows that the D(ram)C(ontroller) Mode is unganged.
    Ganged and unganged are still unfamiliar terms,
    but when 2 or more memory modules are running in unganged mode on K10,
    we can think of it as superior rather than equivalent to "conventional" dual channel mode,
    in my understanding.
    And that's my case this time... I used 2x 1GB memory modules in unganged mode.

  6. #31
    Xtreme Enthusiast
    Join Date
    Apr 2007
    Posts
    772
    F@H isn't that cache sensitive. I have run it on an E2160 (1MB) and an E6600 (4MB) clocked at the same speed (2.4GHz), and the results on the same WUs were within 4-5% of each other. I would not call that cache dependent.
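
    A rough way to put a number on a same-clock comparison like this one is to express the gap as a fraction of the faster run. The sketch below is only an illustration: the function name and the two frame times are made up for the example and are not measurements from this thread.

```python
def relative_gap(time_a_s: float, time_b_s: float) -> float:
    """Return how far the slower run trails the faster one, as a fraction of the faster time."""
    fast, slow = sorted((time_a_s, time_b_s))
    if fast <= 0:
        raise ValueError("frame times must be positive")
    return (slow - fast) / fast


# Hypothetical per-frame times in seconds (illustrative only, not from the thread).
small_cache_s = 625.0
large_cache_s = 600.0

gap = relative_gap(small_cache_s, large_cache_s)
print(f"slower run trails by {gap:.1%}")
```

    This only makes sense if both machines ran the same project at the same clock, which is the condition described in the post above.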

    What F@H is optimized for is SSE2/3 code. AMD X2 CPUs used to get trounced compared to C2D at the same clock in F@H, and with AMD's improvements to the SSE performance of K10 they are now up to par with C2D/C2Q. They didn't take the lead it looks like, but it is encouraging that they are competitive again.

  7. #32
    Xtremely High Voltage Sparky's Avatar
    Join Date
    Mar 2006
    Location
    Ohio, USA
    Posts
    16,040
    Quote Originally Posted by mstp2009 View Post
    F@H isn't that cache sensitive. I have run it on an E2160 (1MB) and an E6600 (4MB) clocked at the same speed (2.4GHz), and the results on the same WUs were within 4-5% of each other. I would not call that cache dependent.

    What F@H is optimized for is SSE2/3 code. AMD X2 CPUs used to get trounced compared to C2D at the same clock in F@H, and with AMD's improvements to the SSE performance of K10 they are now up to par with C2D/C2Q. They didn't take the lead it looks like, but it is encouraging that they are competitive again.
    Depends on the units really. There is this one project that my friend's X2 3800+ @ 2.7GHz would finish in time but with not much time to spare, while my Opteron 165 @ 2.8GHz has more time left. I think it is probably due to a mix of the extra 100MHz and 2X the L2. Then on others it didn't make a hill of beans of difference; it was just slightly faster on my Opteron due to the 100MHz extra speed.


    Sp33d, thanks for that test, brings it more into perspective
    BTW, what team do you fold for?
    The Cardboard Master
    Crunch with us, the XS WCG team
    Intel Core i7 2600k @ 4.5GHz, 16GB DDR3-1600, Radeon 7950 @ 1000/1250, Win 10 Pro x64

  8. #33
    Xtreme Member
    Join Date
    Jul 2007
    Location
    Finland
    Posts
    105
    Quote Originally Posted by mstp2009 View Post
    F@H isn't that cache sensitive. I have run it on an E2160 (1MB) and an E6600 (4MB) clocked at the same speed (2.4GHz), and the results on the same WUs were within 4-5% of each other. I would not call that cache dependent.
    Hmmm.. a lot of projects run much faster on 4 meg L2 per dual core, Conroe and Kentsfield. 26xx regular console, 1498, and 26xx SMP clients are way slower on 2 meg L2 or less per dual core. My best on 2620,2621 is 3:30 a frame (Kents, 2.96 GHz). Have yet to get a 26xx SMP project on the Kentsfield.
    Last edited by JVguest; 11-07-2007 at 01:20 PM.
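
    The per-frame times quoted in this thread can be turned into a rough points-per-day figure. The sketch below assumes a work unit is split into 100 frames (one per 1% of progress) and that the machine folds around the clock. `parse_frame_time`, `points_per_day`, and the point value of 1000 are hypothetical names and numbers chosen for illustration; they do not come from the F@H client or from this thread.

```python
SECONDS_PER_DAY = 86_400


def parse_frame_time(text: str) -> int:
    """Convert an 'm:ss' frame time such as '3:30' into seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + int(seconds)


def points_per_day(frame_seconds: float, wu_points: float, frames_per_wu: int = 100) -> float:
    """Estimate points per day from the time per frame.

    frames_per_wu=100 is an assumption (one frame per 1% of the work unit).
    """
    if frame_seconds <= 0 or frames_per_wu <= 0:
        raise ValueError("frame time and frame count must be positive")
    wu_seconds = frame_seconds * frames_per_wu
    return wu_points * SECONDS_PER_DAY / wu_seconds


# '3:30' is the frame time quoted above; the 1000-point WU value is made up.
frame_s = parse_frame_time("3:30")
print(points_per_day(frame_s, wu_points=1000.0))
```

    Because the result scales linearly with the point value, comparing two machines on the same project only needs the ratio of their frame times.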

  9. #34
    Xtreme Member
    Join Date
    Dec 2003
    Location
    Blacksburg, VA
    Posts
    329
    Quote Originally Posted by SparkyJJO View Post
    Depends on the units really. There is this one project that my friend's X2 3800+ @ 2.7GHz would finish in time but with not much time to spare, while my Opteron 165 @ 2.8GHz has more time left. I think it is probably due to a mix of the extra 100MHz and 2X the L2. Then on others it didn't make a hill of beans of difference; it was just slightly faster on my Opteron due to the 100MHz extra speed.


    Sp33d, thanks for that test, brings it more into perspective
    BTW, what team do you fold for?
    I believe that's the 2610
    Gigabyte MA790X-UD4P//Dell Latitude D620
    Phenom II BE 940//Core 2 Duo 2.0Ghz
    2x2GB OCZ Reaper//1GB PC5400
    640GB WD Blue//80gb 7200 rpm hdd
    Radeon 4850//Geforce 7300 Go

  10. #35
    Xtreme Member
    Join Date
    Apr 2007
    Posts
    118
    Are there differences between the SSE implementations of K10 and Kentsfield?

    I just remembered that a few minutes before reading it being said here. It can be a possibility. That and intel compiler making huge use of caches by default.

  11. #36
    Xtreme Enthusiast
    Join Date
    Jun 2007
    Posts
    546
    Quote Originally Posted by JohannesRS View Post
    Are there differences between the SSE implementations of K10 and Kentsfield?

    I just remembered that a few minutes before reading it being said here. It can be a possibility. That and intel compiler making huge use of caches by default.
    I'm pretty sure of that, with no doubt in my mind about it.

  12. #37
    Xtreme Member
    Join Date
    Apr 2007
    Posts
    118
    @sp33: ok, but is it enough of a difference to explain that gap in performance? Sorry, I only know the total time difference, not the relative slowdown, so I can't see how much slower AMD's SSE implementation would need to be to cause that delay.

  13. #38
    Banned
    Join Date
    Nov 2006
    Posts
    46
    Quote Originally Posted by SP33DFR34K View Post
    How about using it for what it is, a gaming console.

    Instead of a job.

    Folding at home, what aloada crap, you wanna do that, get a computer and do it.

    Failed.

    The PS3 is a computer - it's more than just a console.

    It's also one of the best for folding.

    It would be cheaper, too.


    Quote Originally Posted by xanvincent View Post
    Isn't Folding computing? So isn't folding for the computer to compute?

    And normally people fold when they aren't using their computer, ie. during sleep, to put wasted CPU cycles to good use.
    Or you could just turn the computer off to save electricity.

    You could also say the same for the PS3.

    Quote Originally Posted by SparkyJJO View Post

    nice comeback lol
    If that was nice then my George Bush isn't a retard.


    Quote Originally Posted by SparkyJJO View Post
    Yes.

    Folding@home is another use for computers. Don't know what his problem was, computers are used mostly for jobs anyway
    Computers are mostly used for jobs, you're right, though compared with a K10 system, which will probably cost the earth to buy and run, the PS3 would be a better idea, price for price.

  14. #39
    Xtreme Cruncher
    Join Date
    Feb 2005
    Location
    Finland
    Posts
    1,031
    Quote Originally Posted by Equil|briuM View Post
    the PS3 would be a better idea, price for price.
    You're making one fatally wrong assumption.
    You should take into account that the PS3 and quads are folding/computing different kinds of WUs or calculations, so the data output is (very) different. Now, I'm not saying one is more valuable than the other.

  15. #40
    Xtreme Enthusiast
    Join Date
    Apr 2007
    Posts
    772
    Quote Originally Posted by Equil|briuM View Post
    Failed.

    The PS3 is a computer - it's more than just a console.

    It's also one of the best for folding.

    It would be cheaper, too.
    You OBVIOUSLY don't fold.

    The PS3 is VERY limited in the types of WUs it can fold (just like GPUs). For this reason, it doesn't generate many points for its folding.

    Right now, in folding, Quads are KING. They generate anywhere from 1500-4000PPD (depending only on clock speed).

  16. #41
    Xtreme Cruncher
    Join Date
    Feb 2005
    Location
    Finland
    Posts
    1,031
    Quote Originally Posted by mstp2009 View Post
    You OBVIOUSLY don't fold.
    Maybe he's folding with PS3...
    ...but I'm not sure why he's buying Yorkfield... for gaming perhaps

  17. #42
    Xtreme Enthusiast
    Join Date
    Mar 2007
    Posts
    761
    Quote Originally Posted by sc00p View Post
    Maybe he's folding with PS3...
    ...but I'm not sure why he's buying Yorkfield... for gaming perhaps

  18. #43
    Xtremely High Voltage Sparky's Avatar
    Join Date
    Mar 2006
    Location
    Ohio, USA
    Posts
    16,040
    Quote Originally Posted by Equil|briuM View Post
    If that was nice then my George Bush isn't a retard.

    Computers are mostly used for jobs, you're right, though compared with a K10 system, which will probably cost the earth to buy and run, the PS3 would be a better idea, price for price.
    I don't think Bush is a retard. Not the most brilliant person around, but not a retard.

    The K10 is supposed to be cheaper than the Q6600 IIRC. Plus, if the rumors about low voltage are true, I doubt it'll cost a ton to run either. So no, I do not think it'll "cost the earth to buy and run" - this isn't QuadFX.

    I'm sure people will fold on it instead of the PS3. Me, for example. I don't own any game consoles; I can't afford that and a computer. I use the computer only, so I fold on it. Hey, even if I did have both I'd still fold on both. Like sc00p said, they all crunch different work. They all have a purpose.

  19. #44
    Registered User
    Join Date
    Dec 2005
    Posts
    21
    Quote Originally Posted by Equil|briuM View Post
    How about using it for what it is, a computer.

    Instead of a job.

    Folding at home, what aloada crap, you wanna do that, get a PS3 and do it.
    actually, i echo equil's thoughts. folding is a waste of time/electricity/money. that's just me, but i respect those who do it for a good cause

  20. #45
    Registered User
    Join Date
    Sep 2007
    Posts
    43

    Lol

    Unless you're comparing the exact same WU, it means nothing. The Bracy 1.7 does a WU per day, actually about 22 hours, on SUSE Linux, and it doesn't load the processor unless I use chrt 60 ./fah6 -smp
    Last edited by Viper666; 11-14-2007 at 07:26 PM.

  21. #46
    Registered User
    Join Date
    Nov 2005
    Posts
    1
    Does anyone want to try the holy grail of WUs: 2928, 2929? They're very rare, but if you can receive one,
    I'd like to see how many PPD Barcelona gets on these WUs.

    _ftp://dualamd.no-ip.org/292x.7z
    Note: those wus are used, so don't try uploading for credit.
