Thread: This is what happens to my work computer...

  1. #1
    Xtreme Member
    Join Date
    Apr 2006
    Location
    Ontario
    Posts
    349

    This is what happens to my work computer...



    Didn't know where else to put it so it ended up in general hardware.

    This is what happens to my work computer.

    Specs:
    Intel Core i7 980X (3.33 GHz, 6-cores, TurboBoost enabled, HTT disabled, EIST disabled, C1E disabled, auto-OC'd to 3.46 GHz)
    Gigabyte GA-X58A-UD3R
    6x 4 GB Kingston DDR3-1333
    OCZ Vertex2 Extended 90 GB SSD SATA2
    Fusion-io ioXtreme 80 GB PCIe x4 SSD card
    Western Digital 2 TB 7,200 rpm SATA2
    nVidia Quadro FX1800

    This is while running a simulation (finite element analysis) that had, I think, 1.23 million elements, 1.9 million nodes, and 5.4 million degrees of freedom (DOFs).

    I ended up using all of my RAM except for 812 kB (out of 24 GB), created a 24.7 GB swap file, AND STILL ran out of space.

    The simulation ultimately created about 80 GB of data during the run (and it's still running). I ended up having to move it over to the WD drive, more space, but A LOT slower, so right now it's running about TEN TIMES slower compared to running it entirely off the SSDs.

    No, I can't put the swap file onto the card because the card isn't recognized by BIOS as a storage device (no SATA/SAS host bridge chip). That means that I can't boot off of it either.

    Peak write speeds are somewhere like 300 MB/s. Highest read speed I've been able to get from it is about 811 MB/s (while doing an analysis/simulation).

    And my system has already burnt out a power supply, and has technically gone through five upgrades (the Fusion-io card, the new power supply, the RAM going from 6 GB -> 12 GB -> 24 GB, and the 2 TB drive). It was originally built only as a CAD design station. Now, my computer is faster than the one that the analyst uses. (I probably beat the crap out of mine far more, and wayyyy harder, than anybody else in the company right now.)
    flow man:
    du/dt + u dot del u = - del P / rho + v vector_Laplacian u
    {\partial\mathbf{u}\over\partial t}+\mathbf{u}\cdot\nabla\mathbf{u} = -{\nabla P\over\rho} + \nu\nabla^2\mathbf{u}

  2. #2
    I am Xtreme zanzabar's Avatar
    Join Date
    Jul 2007
    Location
    SF bay area, CA
    Posts
    15,871
    so u need new hardware for it?

    maybe quad mangy with all 32 slots having 4GB sticks so 128GB of ram and 48 cores

    or a quick fix go raid0 on a pair of SLC SSDs, as u will be writing u need the slc over the mlc
    Last edited by zanzabar; 11-08-2010 at 06:08 PM.
    5930k, R5E, samsung 8GBx4 d-die, vega 56, wd gold 8TB, wd 4TB red, 2TB raid1 wd blue 5400
    samsung 840 evo 500GB, HP EX 1TB NVME , CM690II, swiftech h220, corsair 750hxi

  3. #3
    Xtreme Cruncher
    Join Date
    Dec 2008
    Location
    Los Angeles/Hong Kong
    Posts
    3,058
    Or maybe a SR-2 rig?
    Team XS: xs4s.org



  4. #4
    I am Xtreme zanzabar's Avatar
    Join Date
    Jul 2007
    Location
    SF bay area, CA
    Posts
    15,871
    Quote Originally Posted by lkiller123 View Post
    Or maybe a SR-2 rig?
    the sr2 will only go 92GB of ram wont it and its going to cost close to price of a quad mangy for the MB/CPUs to go with 6 core intel or 12 core amd

  5. #5
    Xtreme Member
    Join Date
    Apr 2006
    Location
    Ontario
    Posts
    349
    Quote Originally Posted by zanzabar View Post
    so u need new hardware for it?

    maybe quad mangy with all 32 slots having 4GB sticks so 128GB of ram and 48 cores

    or a quick fix go raid0 on a pair of SLC SSDs, as u will be writing u need the slc over the mlc
    We actually do have a quad Magny Cours on order right now. It will be 48-cores, with 128 GB of RAM, and an OCZ Vertex2 240 GB SSD SATA2 (I think, I don't think that any of them are SATA3 yet).

    Half of the RAM will be used as swap. At least that's the plan for now, but that might change because the work that I'm doing is evolving.

    The speed isn't so much the issue as capacity. Well...okay...speed is a little bit of an issue, but even MLC SSD will be better than any mechanically rotating disk. What we DON'T know, however, is what happens to the SSD with continual thrashing like that.

    For a lot of what I am doing right now, it's very much a "80 gig(abytes) up, 80 gig-down." (Meaning, I write 80 GB to disk, and then I will purge the data as fast as it can create it.)

    So while most people probably don't generate that much data, I kinda do so, maybe like every hour or so, and in 80 GB "blocks".

    The simulation that I'm doing right now, which is just a small, quick test one, has already read/written 510 GB of data and it's only been running for 45 minutes. My current estimate predicts it's got about another hour and 15 minutes to go. That will put the read/write estimate (by the time it's done) at 1.36 TB. And that's "small" for me.
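
    The 1.36 TB figure above works out as a straight-line extrapolation. A minimal sketch of that arithmetic, assuming the read/write rate seen in the first 45 minutes holds for the rest of the run (the function names are made up for illustration, and decimal units are assumed, 1 GB = 1000 MB):

```python
# Back-of-the-envelope extrapolation of total scratch I/O for a run.
# Assumption: the read/write rate observed so far stays constant,
# which a real solver run does not guarantee.

def projected_io_gb(io_so_far_gb: float, elapsed_min: float, remaining_min: float) -> float:
    """Linearly extrapolate total I/O (in GB) to the end of the run."""
    rate_gb_per_min = io_so_far_gb / elapsed_min
    return rate_gb_per_min * (elapsed_min + remaining_min)


def average_rate_mb_s(io_gb: float, elapsed_min: float) -> float:
    """Average combined read/write rate in MB/s (decimal units)."""
    return io_gb * 1000.0 / (elapsed_min * 60.0)


if __name__ == "__main__":
    total_gb = projected_io_gb(510.0, 45.0, 75.0)
    print(f"projected total: {total_gb:.0f} GB ({total_gb / 1000:.2f} TB)")
    print(f"average rate so far: {average_rate_mb_s(510.0, 45.0):.0f} MB/s")
```

    With the numbers from this post, that is 510 GB over 45 minutes, or roughly 189 MB/s averaged over reads and writes together, carried forward over the full two hours.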

    Quote Originally Posted by zanzabar View Post
    the sr2 will only go 92GB of ram wont it and its going to cost close to price of a quad mangy for the MB/CPUs to go with 6 core intel or 12 core amd
    Did a little bit of research, actually the EVGA SR-2 only has a maximum of 48 GB of RAM.

    AFAIK, the nVidia C2070 Tesla CANNOT do SLI, so it won't really be much of help.

    And it uses Xeons rather than my Core i7 980X (which, if that were the case, I'd rather go with a Supermicro or Tyan board).

    *edit*
    I ended up running for 3.26 hours today, and reading/writing 2.1 TB of data.
    Last edited by alpha754293; 11-09-2010 at 09:40 AM.

  6. #6
    Xtreme Member
    Join Date
    Apr 2006
    Location
    Ontario
    Posts
    349


    Briefly, very briefly shot up to 1600 MB/s while doing a simulation (FEA) run.

  7. #7
    Registered User
    Join Date
    Oct 2009
    Location
    On the Internets!
    Posts
    5
    Hey Alpha, if you don't mind, I'm gonna pass this thread on to some of our techmonkies here at Fusion-io- I'm sure they'd be interested in that cool 1.6GB/s spike. Do you have any more data from that specific period of usage that you could provide, or any info that might help them replicate the spike? What version of Windows & Service Pack are you running?

    Also, I noticed you're using an old 1.2.7.x driver. Upgrading to the latest driver should cut down on the amount of memory the ioXtreme uses. You can head on over to http://support.fusionio.com/ to grab 'em.
    Last edited by Terrence; 11-10-2010 at 12:47 PM. Reason: Forgot to explicitly mention that I'm a FIO employee. Really should get around to putting that in my sig...
    Management Consultant for Fusion-io.

  8. #8
    Xtreme Addict
    Join Date
    Sep 2010
    Location
    US, MI
    Posts
    1,680
    Looks like you have a virus or a bunch of them with that many processes.

  9. #9
    Xtreme Member
    Join Date
    Apr 2006
    Location
    Ontario
    Posts
    349
    Quote Originally Posted by Terrence View Post
    Hey Alpha, if you don't mind, I'm gonna pass this thread on to some of our techmonkies here at Fusion-io- I'm sure they'd be interested in that cool 1.6GB/s spike. Do you have any more data from that specific period of usage that you could provide, or any info that might help them replicate the spike? What version of Windows & Service Pack are you running?

    Also, I noticed you're using an old 1.2.7.x driver. Upgrading to the latest driver should cut down on the amount of memory the ioXtreme uses. You can head on over to http://support.fusionio.com/ to grab 'em.
    Sorry, I would have responded sooner. I didn't see that there were replies although it's supposed to email me.

    Anyways, yea, sure go on and pass it along.

    Umm, to answer your questions; sorta yes and no.

    The Fusion-io card was used as the working directory/scratch drive for an Ansys FEA run. Ummm...I don't really remember how many nodes/elements/etc. there were (oh wait, n/m, it's in my initial post). I was also using the direct/Sparse solver (technically distributed Sparse, cuz I'm solving using distributed memory parallel (a.k.a. MPI) across 4 out of the 6 cores).

    So, what happens is that all of the working matrices and stuff are scratched onto the disk (Fusion-io card). More typical bandwidths that I see are between 100-300 MB/s (using distributed PCG solver). My previous peak speed on it was 811 MB/s.

    If you want to recreate, you can probably do so in a variety of ways:

    Create a moderately complex Ansys model with a few non-linearities and solve it using the Sparse/distributed Sparse solver. Use the Fusion-io as your working directory. (I actually do my runs from command line rather than from the GUI because it tends to run a little bit faster; so write your input file out to the card, make a directory, and then launch the run from command line.)

    If you're using MATLAB, you can probably do something very similar, minus the parallelization. If you set your working directory to the card as well, it should do something very similar. Depending on what you're doing and how you've coded it, you can write it such that all of the matrices have to be scratched to disk (card) and read back in. That's one way of getting it. If you re-read some of the matrices multiple times, that's a way of getting the spike as well. (Sometimes it's artificial, sometimes it's not; if, for example, you need to re-read the stiffness matrix a few times, I wouldn't say that it's unrealistic.)

    You could also write the program using Fortran/C/C++. Same/similar idea, sparse solver (or full). Scratch to disk/card. Reread back in.
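
    The scratch-then-reread idea can be tried without any solver at all. Below is a rough Python sketch, not what Ansys or MATLAB actually do internally: it writes one block of random bytes (a stand-in for a matrix) to a scratch directory, reads it back a few times, and reports approximate MB/s. The helper name `scratch_roundtrip`, the file name, and the sizes are all made up for illustration.

```python
import os
import tempfile
import time


def scratch_roundtrip(scratch_dir: str, size_mb: int = 64, rereads: int = 3) -> dict:
    """Write a block of bytes to scratch_dir, re-read it, report rough MB/s.

    Hypothetical helper for illustration; not part of any solver.
    """
    payload = os.urandom(size_mb * 1024 * 1024)  # stand-in for a matrix
    path = os.path.join(scratch_dir, "scratch_matrix.bin")

    # Time the write, including the flush to the device.
    start = time.perf_counter()
    with open(path, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())
    write_s = time.perf_counter() - start

    # Time several full re-reads of the same file.
    start = time.perf_counter()
    intact = True
    for _ in range(rereads):
        with open(path, "rb") as f:
            intact = intact and (f.read() == payload)
    read_s = time.perf_counter() - start

    os.remove(path)
    return {
        "write_mb_s": size_mb / write_s,
        "read_mb_s": size_mb * rereads / read_s,
        "data_intact": intact,
    }


if __name__ == "__main__":
    # A temporary directory is used here; point it at the card instead to test it.
    with tempfile.TemporaryDirectory() as scratch:
        print(scratch_roundtrip(scratch, size_mb=64, rereads=3))
```

    One caveat with a toy like this: repeated reads of a small file will usually be served from the operating system's file cache in RAM, so the read figure can come out well above what the device itself can deliver. That may be one way a brief spike shows up, though nothing here establishes that it is what happened in the run above.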

    Can't give you much more information than that because I'm not entirely sure HOW it happened exactly. And my current simulations are different problems now.

    *edit*
    I'm not too concerned about the memory usage for the driver. When you're eating up 24 GB, I doubt that a few hundred kB or even a few MB would make a difference.

    Quote Originally Posted by NEOAethyr View Post
    Looks like you have a virus or a bunch of them with that many processes.
    No. 65 processes. That's fairly normal for me.

    What you don't see is my task bar, which my boss has commented "...I don't think that you have enough windows opened there."

    I think right now, I have like 10-20 Windows explorer windows open (I forget), ummm....three cygwin windows, one command prompt window, two sessions of CATIA V5, Mozilla Firefox (with...uh...4 tabs open. Only.) Ansys. Ansys. Four Excel windows. Maybe a PDF or two. I don't think that I have any Powerpoints or Word files open though. Coretemp. Sound/tray. Umm...nvidia tray. Maybe the HP plotter and printer driver ("pre-spooler"). No MATLAB though.

    65 processes is about the average for me. 60-90 anyways.

    And remember that cygwin spawns like at least one cmd.exe process and two bash.exe processes each. So...yea.

    Nothing too out of the ordinary for me (other than just having 4 tabs open in Mozilla, instead of my usual 125).
    Last edited by alpha754293; 11-11-2010 at 04:47 PM.

  10. #10
    Xtreme Member Gilhooley's Avatar
    Join Date
    Nov 2006
    Posts
    164
    hmm, your most important info is missing - what OS?.. anyway, sounds like the classic MS bug, with an older OS you can use dyncache http://www.microsoft.com/downloads/e...displaylang=en and with a newer OS an app like O&O Clevercache might work.
    Q9650@4000 - Apogee GTX, Gigabyte X48-DS5, 8GB Corsair Dominator XMS2-8500, GTX480 El cheapo Asetek block, Audiophile 192 + Adam-A7, Win7

  11. #11
    NooB MOD
    Join Date
    Jan 2006
    Location
    South Africa
    Posts
    5,799
    Dude, you can free up 20GB right now by closing Firefox...
    Xtreme SUPERCOMPUTER
    Nov 1 - Nov 8 Join Now!


    Quote Originally Posted by Jowy Atreides View Post
    Intel is about to get athlon'd
    Athlon64 3700+ KACAE 0605APAW @ 3455MHz 314x11 1.92v/Vapochill || Core 2 Duo E8500 Q807 @ 6060MHz 638x9.5 1.95v LN2 @ -120'c || Athlon64 FX-55 CABCE 0516WPMW @ 3916MHz 261x15 1.802v/LN2 @ -40c || DFI LP UT CFX3200-DR || DFI LP UT NF4 SLI-DR || DFI LP UT NF4 Ultra D || Sapphire X1950XT || 2x256MB Kingston HyperX BH-5 @ 290MHz 2-2-2-5 3.94v || 2x256MB G.Skill TCCD @ 350MHz 3-4-4-8 3.1v || 2x256MB Kingston HyperX BH-5 @ 294MHz 2-2-2-5 3.94v

  12. #12
    Xtreme Member
    Join Date
    Apr 2006
    Location
    Ontario
    Posts
    349
    Quote Originally Posted by Gilhooley View Post
    hmm, your most important info is missing - what OS?.. anyway, sounds like the classic MS bug, with an older OS you can use dyncache http://www.microsoft.com/downloads/e...displaylang=en and with a newer OS an app like O&O Clevercache might work.
    Well, the screenshot of the Task Manager ought to be the most telling sign as to what OS it is.

    However, to answer your question, this is Windows XP Professional 64-bit (I forget if there's a SP on it or not...).

    And what "classic MS bug" is that?

    The eating of the RAM and "caching" (it's not really caching, it's more swapping/paging) is because of the size of the problem.

    To put it into perspective, here is the memory usage from the Ansys performance guide:


    Distributed ANSYS Dsparse Direct Solver Using p Cores
    RAM: 1 GB/million DOF on master node, 0.7 GB/million DOF on all other nodes
    I/O: 10 GB/million DOF * 1/p
    Double memory estimates for non-symmetric systems
    Add 30% for 3D models with higher order elements

    So, for RAM, on 5.4 million DOFs (adding the master and other-node figures, 1 + 0.7 = 1.7 GB/MDOF):
    1.7 GB/MDOF * 5.4 = 9.18 GB.
    Add 30% for higher order elements = 11.934 GB.
    Double it for non-symmetric systems = 23.868 GB.

    For I/O (disk) on 5.4 million DOFs (on 4 cores)
    10 GB/MDOF * 1/p = 10*5.4*1/4 = 13.5 GB
    Add 30% for higher order elements = 17.55 GB*
    Double it for non-symmetric systems = 35.1 GB*

    (*Actually, it doesn't really say what happens for file/disk I/O for non-symmetric/higher-order elements).
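
    For anyone who wants to plug in their own problem size, the estimate above can be sketched in a few lines of Python. This only reproduces the arithmetic in this post and is not an official Ansys sizing tool: the function name is hypothetical, the 1.7 GB/MDOF figure is the combined master + other-node number used above, and applying the 30% and doubling factors to disk I/O is an assumption, as the footnote says.

```python
def dsparse_estimate(mdof: float, cores: int,
                     higher_order: bool = True,
                     nonsymmetric: bool = True,
                     ram_gb_per_mdof: float = 1.7) -> tuple:
    """Return (ram_gb, io_gb) using the rules of thumb quoted in this post.

    ram_gb_per_mdof = 1.7 is the poster's combined figure (1 GB/MDOF on the
    master plus 0.7 GB/MDOF for other nodes), not a number from the guide itself.
    """
    ram_gb = ram_gb_per_mdof * mdof
    io_gb = 10.0 * mdof / cores  # 10 GB/MDOF * 1/p

    if higher_order:
        # Add 30% for 3D models with higher order elements.
        ram_gb *= 1.3
        io_gb *= 1.3  # assumption: the guide does not say this applies to I/O
    if nonsymmetric:
        # Double the estimates for non-symmetric systems.
        ram_gb *= 2.0
        io_gb *= 2.0  # assumption: the guide does not say this applies to I/O

    return ram_gb, io_gb


if __name__ == "__main__":
    ram, io = dsparse_estimate(mdof=5.4, cores=4)
    print(f"RAM: {ram:.3f} GB, scratch I/O: {io:.1f} GB")
```

    For 5.4 million DOFs on 4 cores this gives the same 23.868 GB of RAM and 35.1 GB of disk I/O as worked out by hand above.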

    The point of it is that it doesn't surprise me because of the large problem that I'm solving, and also the nature of the problem.

    Quote Originally Posted by [XC] Oj101 View Post
    Dude, you can free up 20GB right now by closing Firefox...
    I wish. But I doubt it though. Not with...like 4-6 tabs anyways. Chrome is the WORST BTW....

  13. #13
    Xtreme Member
    Join Date
    Apr 2006
    Location
    Ontario
    Posts
    349
    I do have SP1 installed (apparently).

  14. #14
    Xtreme Member
    Join Date
    Apr 2006
    Location
    Ontario
    Posts
    349


    48-core partially went live today. Didn't have all of the RAM installed yet. Started testing it though.

  15. #15
    Xtreme Addict
    Join Date
    Oct 2006
    Posts
    2,141
    uuuhhhhh.... Thats insane.
    Rig 1:
    ASUS P8Z77-V
    Intel i5 3570K @ 4.75GHz
    16GB of Team Xtreme DDR-2666 RAM (11-13-13-35-2T)
    Nvidia GTX 670 4GB SLI

    Rig 2:
    Asus Sabertooth 990FX
    AMD FX-8350 @ 5.6GHz
    16GB of Mushkin DDR-1866 RAM (8-9-8-26-1T)
    AMD 6950 with 6970 bios flash

    Yamakasi Catleap 2B overclocked to 120Hz refresh rate
    Audio-GD FUN DAC unit w/ AD797BRZ opamps
    Sennheiser PC350 headset w/ hero mod

  16. #16
    I am Xtreme zanzabar's Avatar
    Join Date
    Jul 2007
    Location
    SF bay area, CA
    Posts
    15,871
    can u w-prime it, im sure u would get alot of HWbot points


    edit- yes w-prime, not 2 prime
    Last edited by zanzabar; 11-21-2010 at 02:42 PM.

  17. #17
    NooB MOD
    Join Date
    Jan 2006
    Location
    South Africa
    Posts
    5,799
    W-prime

  18. #18
    Registered User Utroz's Avatar
    Join Date
    Nov 2002
    Location
    Maine
    Posts
    68
    I wonder what the 48-core system would get in Cinebench 11, maybe run everest memory bench, even a cpuid screen shot would be cool. Let us know how much faster it is in simulations compared to your 6 core intel system.
    File Server


    Super Old system
    http://valid.x86-secret.com/show_oc.php?id=371866

  19. #19
    Xtreme Enthusiast
    Join Date
    Feb 2009
    Location
    Montreal
    Posts
    791
    out of curiosity, why do you disable HT? Also, wouldn't you benefit from buying a better ssd like a c300 or intel 160gb or something bigger for your needs?

    Anyway, this seems like a strange thing but why has nobody suggested you overclock that 980x a bit in the thread yet? :P I'd think you could get a good 20-30% reduction in solve time.

  20. #20
    I am Xtreme zanzabar's Avatar
    Join Date
    Jul 2007
    Location
    SF bay area, CA
    Posts
    15,871
    Quote Originally Posted by antiacid View Post
    out of curiosity, why do you disable HT? Also, wouldn't you benefit from buying a better ssd like a c300 or intel 160gb or something bigger for your needs?

    Anyway, this seems like a strange thing but why has nobody suggested you overclock that 980x a bit in the thread yet? :P I'd think you could get a good 20-30% reduction in solve time.
    the intel ssds are worse and the c300 can be better but its about equal overall.

    and overclocking would not help he was out of memory

  21. #21
    Xtreme Member
    Join Date
    Dec 2008
    Location
    Raleigh, NC
    Posts
    318
    Quote Originally Posted by alpha754293 View Post
    http://img819.imageshack.us/img819/4083/17189327.jpg

    48-core partially went live today. Didn't have all of the RAM installed yet. Started testing it though.
    I feel this is appropriate:


  22. #22
    Xtreme Member
    Join Date
    Apr 2006
    Location
    Ontario
    Posts
    349
    Quote Originally Posted by antiacid View Post
    out of curiosity, why do you disable HT? Also, wouldn't you benefit from buying a better ssd like a c300 or intel 160gb or something bigger for your needs?

    Anyway, this seems like a strange thing but why has nobody suggested you overclock that 980x a bit in the thread yet? :P I'd think you could get a good 20-30% reduction in solve time.
    The 980X was actually slightly OC'd from 3.33 GHz stock to 4 GHz. I forget if I had kept it that way when I ran this test. But eventually it was OC'd (because CATIA LOVESSS a 4GHz processor).

    I disable HTT for two reasons. The first one is that with HTT, my logical core count goes up to 12, which means for running Ansys, I would need TWO licenses instead of one. And the logical cores can only give me a 10% increase in performance, which I don't think is worth preventing other people from being able to do their runs. (Yea, that company had a wonky license set up. I've since then changed jobs and we have an actual cluster running LSF Platform for job submissions.)

    The second is that if I don't disable HTT, and only set it to run with 8 processors, that results in a LOT of thread/process migration. (BAD. Super BAD). So, without HTT, I can assign 4 cores for the FEA (6 (all) in the event of an emergency when they need the results badly), and that leaves two for me to design with at the same time.

    Quote Originally Posted by zanzabar View Post
    the intel ssds are worse and the c300 can be better but its about equal overall.

    and overclocking would not help he was out of memory
    Yup. That's typically what happens or what can happen. It depends on how I set up the run.

    Quote Originally Posted by nascasho View Post
    I feel this is appropriate:

    LOL. That's funny.

    Sorry for the super late responses guys. XS used to email me whenever I get a reply and now it doesn't.
