Thread: Ryzen-Test & Stress-Run Cause Segmentation Faults On Zen CPUs [Update: AMD Confirmed]

  1. #1
    StyM (Join XS BOINC Team)
    Join Date
    Mar 2006
    Location
    Tropics
    Posts
    9,468

    http://www.phoronix.com/scan.php?pag...est-Stress-Run

    Besides using ryzen-test to hammer the CPU and easily reproduce the issue, I also decided to use phoronix-test-suite stress-run. The stress-run command within the Phoronix Test Suite has been used by enterprise customers for stress testing / burn-ins of hardware and checking for stability. Rather than benchmarking for performance, stress-run executes multiple test profiles in parallel to fully load the system with whatever workloads you would like. Running PTS_CONCURRENT_TEST_RUNS=4 TOTAL_LOOP_TIME=60 phoronix-test-suite stress-run build-linux-kernel build-php build-apache pgbench apache redis will have the Phoronix Test Suite continually run four different benchmarks simultaneously for a period of 60 minutes. As soon as one test finishes, another is fired up. The stress-run algorithm picks tests from your set at random, but it also looks at the test profiles so that, if the tests stress multiple subsystems, all of those subsystems are kept under load. The Phoronix Test Suite's stress-run functionality isn't advertised as much as its other features, but it is very useful for loading up a system with plenty of real-world workloads concurrently.
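For anyone who wants to script the run quoted above, here is a minimal sketch in Python. The two environment variables, the stress-run subcommand, and the test names are taken from the article; the helper names (build_stress_run, as_shell_line, run) are made up for illustration and are not part of the Phoronix Test Suite.

```python
import os
import shlex
import shutil
import subprocess

# Test profiles named in the quoted article.
TESTS = ["build-linux-kernel", "build-php", "build-apache",
         "pgbench", "apache", "redis"]


def build_stress_run(tests, concurrent_runs=4, loop_minutes=60):
    """Return (env, argv) for a stress-run invocation (hypothetical helper)."""
    env = {
        "PTS_CONCURRENT_TEST_RUNS": str(concurrent_runs),  # tests run at once
        "TOTAL_LOOP_TIME": str(loop_minutes),              # minutes, per the article
    }
    argv = ["phoronix-test-suite", "stress-run", *tests]
    return env, argv


def as_shell_line(env, argv):
    """Render the invocation as a single shell command line."""
    assignments = [f"{key}={shlex.quote(value)}" for key, value in env.items()]
    return " ".join(assignments + [shlex.quote(arg) for arg in argv])


def run(env, argv):
    """Run the command if phoronix-test-suite is installed; return its exit code."""
    if shutil.which(argv[0]) is None:
        raise FileNotFoundError(f"{argv[0]} not found on PATH")
    # Merge the two settings into the current environment for the child process.
    return subprocess.run(argv, env={**os.environ, **env}).returncode


if __name__ == "__main__":
    env, argv = build_stress_run(TESTS)
    # Only print the command here; call run(env, argv) to actually start the load.
    print(as_shell_line(env, argv))
```

Printing instead of running by default is deliberate: the stress run loads the machine for the full loop time, so it seems safer to check the command line first and then call run(env, argv) by hand.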

    We'll see now if AMD will provide public comments or investigate further, as they now have another reproducible test case that trips up the Ryzen chips in just a few minutes, even with SMT disabled and running at DDR4-2133. Whether this affects just Ryzen or also Threadripper and Epyc remains unclear. While there are many Windows reviewers out there now with Threadripper, it doesn't look like AMD will be sending any Threadripper samples to Phoronix, at least in the immediate days ahead, but I have asked if I can at least get SSH access to a TR system for a few hours to be able to run some Linux benchmarks. We'll see. For the Epyc server processors as well, no samples are available according to a motherboard vendor that has been trying to get them on my behalf.

    Just to reiterate, while this problem is easy to cause under very heavy workloads, under normal Linux desktop workloads and even normal benchmarking, I haven't run into any Ryzen problems. I will be running some more Ryzen stress-tests today.

  2. #2
    Xtreme Member
    Join Date
    Jun 2008
    Location
    /dev/null
    Posts
    286
    Thank you for posting this.
    I was just about to buy a 1300X, but since I use Linux for everything except benchmarking I will read up on this issue further before I buy anything.

  3. #3
    Xtreme Enthusiast
    Join Date
    Dec 2005
    Posts
    746
    I've not had any issues under Linux with Epyc. Currently running Ubuntu 16.04.02 with the 4.10 kernel just fine.
    Heat: 50 - 0 - 0 under "Argus333"

  4. #4
    Xtreme Enthusiast
    Join Date
    Feb 2010
    Posts
    578

  5. #5

  6. #6
    Xtreme Enthusiast
    Join Date
    Oct 2012
    Posts
    687
    Seems weird.
    So AMD is saying Ryzen is affected, but only early CPUs, and not TR and EPYC, even though TR also uses the B1 stepping. So there had to be some changes to the silicon, either hardware-wise or programming-wise, even within the same revision.
    Intel 5960X@4.2Ghz[Prime stable]@4.5 [XTU stable] 1.24v NB@3.6ghz Asrock X99 Extreme 3 4x8GB Corsair Vengeance@3200 16-17-17
    Sapphire nitro+ VEGA 56 Samsung SSD 850 256GB Crucial MX100 512GB HDD:WD10TB WD:8TB Seagate8TB

  7. #7
    Xtreme Member
    Join Date
    May 2009
    Location
    Portugal
    Posts
    317
    Really weird. If it was deliberately corrected at some point and there won't be a BIOS erratum fix for affected CPUs, at least they could indicate the affected batches for recall. It won't be nice for an early Ryzen adopter to find they have a defective CPU a couple of years later, when some other heavily threaded app starts to error out.
    Strix X470-F, 1.2.0.6b | 5800X3D + Galahad 360, 3xP28 | 4x8GB Flare X 3200C14 @3200C14 1T+GDM | Strix 2070S A8G @1830/1750 | SB Z | SN750 500GB, MX500 1TB, DT01 2TB | O11D XL: 6xNB PL-2 | RM750
